MIT Covid19 Datathon

Last week, I had the opportunity to participate as a mentor in the MIT COVID-19 Datathon, working on the coronavirus infodemic to identify myths, misinformation, and fake news associated with the pandemic. Out of 2,750 participants and 412 teams, our team qualified as one of the top 10 semifinalists.

The MIT COVID-19 Datathon is a week-long virtual event where teams of data scientists, clinicians, public health professionals, and other subject matter experts come together to develop meaningful insights from existing datasets to influence policy and decision making in the public and private sectors.

The Datathon had five tracks:

Measuring the Impact of Policies around COVID-19
Misinformation during the Pandemic
Disparities in Health Outcomes from COVID-19
Epidemiology of COVID-19
Megacity Pandemic Response in NYC

Our team worked on the Misinformation during the Pandemic track, which covers fake news, propaganda, clickbait, and incorrectly reported news.

Some of the resources from the datathon are listed below; they should be helpful for researchers, academics, and practitioners. They come courtesy of the MIT Datathon participants, organizers, and mentors. Huge thanks to the organizers, Leo Anthony Celi and Freddy Nguyen, MD, PhD, and all others.

Resources & References

Facebook COVID misinformation
https://ai.facebook.com/blog/using-ai-to-detect-covid-19-misinformation-and-exploitative-content

Daily summaries of the latest COVID-19 literature and research

https://www.covid19lst.org/

Coronavirus Tech Handbook

https://coronavirustechhandbook.com/contents

Federal COVID-19 Legislative Tracker

https://coronavirus.skoposlabs.com/#skopos

COVID-19 Critical Care Planning Resources Library

COVID-19: living map of the evidence

http://eppi.ioe.ac.uk/cms/Default.aspx?tabid=3765

Source for COVID-19 modeling data

Semantic search engine on Covid-19 literature - Yale Advanced Covid-19 Research group

https://p2med.shinyapps.io/Cov19-Monkey

Misinformation spread on Twitter - dataset: A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration.

https://doi.org/10.5281/zenodo.3723939

Statistics and visualizations: http://www.panacealab.org/covid19

https://github.com/thepanacealab/covid19_twitter

The COVID-19 Public Datasets on BigQuery

https://cloud.google.com/blog/products/data-analytics/free-public-datasets-for-covid19

NYC Health and Mental Hygiene

https://github.com/nychealth/coronavirus-data

COVID-19 Symptom Survey - Facebook Data for Good

https://dataforgood.fb.com/docs/covid-19-symptom-survey-request-for-data-access/

The COVID-19 symptom surveys

https://covid19researchdatabase.org/

CDC's work using the social vulnerability index

https://svi.cdc.gov

UK: Coronavirus (COVID-19) - Office for National Statistics

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases

SafeGraph - foot traffic data, credit card transaction data, and social distancing metrics

https://www.safegraph.com/covid-19-data-consortium

https://docs.safegraph.com/docs/social-distancing-metrics

COVID deaths by race/ethnicity:

https://www.cdc.gov/nchs/nvss/vsrr/covid_weekly/index.htm

The COVID Tracking Project

https://covidtracking.com/race

Census data county level summaries

https://github.com/JieYingWu/COVID-19_US_County-level_Summaries

COVID-19 Open Research Dataset (CORD-19):

https://pages.semanticscholar.org/coronavirus-research

2019 Novel Coronavirus COVID-19 (2019-nCoV) Epidemiological Data Repository by Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE):

https://github.com/CSSEGISandData/COVID-19

European Centre for Disease Prevention and Control (ECDC) - Epidemiological Data

https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distributioncovid-19-cases-worldwide

https://globalepidemics.org/our-data/hospital-capacity/

Italy COVID-19 Data

https://github.com/pcm-dpc/COVID-19

WHO Data - COVID-19 Cases & Deaths in China (by Province) and other countries:

https://data.humdata.org/dataset/coronavirus-covid-19-cases-data-for-china-and-the-rest-of-theworld

ACAPS COVID-19: Government Measures Dataset:

https://data.humdata.org/dataset/acaps-covid19-government-measures-dataset

World Bank Indicators of Interest - regarding population health and healthcare systems worldwide, relevant to the COVID-19 Outbreak:

https://data.humdata.org/dataset/world-bank-indicators-of-interest-to-the-covid-19-outbreak

GenBank COVID-19 Genetic Sequences:

https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs

Nextstrain - COVID-19 Genomics Database: https://nextstrain.org/ncov

https://aws.amazon.com/marketplace/pp/prodview-a2ev4blctqkwc?qid=1585087643935

U.S. State-Level and County-Level COVID-19 Count Data (cases and deaths):

https://github.com/nytimes/covid-19-data

U.S. State-Specific Projections for Hospital Resource Utilization:

http://www.healthdata.org/covid/

COVID-19 Twitter Datasets

#1 - http://www.panacealab.org/covid19/

#2 - https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset

#3 - https://github.com/echen102/COVID-19-TweetIDs

Unacast Social Distancing Scoreboard:

https://www.unacast.com/covid19/social-distancing-scoreboard

Collection of COVID-19 Data APIs (variety of data sources):

https://covid-19-apis.postman.com

New York City COVID-19 Dataset:

https://www1.nyc.gov/site/doh/covid/covid-19-data.page

Definitive Healthcare U.S. hospital capacity data (number of beds, ICU beds, ventilator capacity by state/county).

Version 1 (GitHub): https://github.com/rsowers-dhc/covid19

Version 2 (AWS):

https://aws.amazon.com/marketplace/pp/USA-Hospital-Beds-COVID-19-Definitive-Healthcare/prodview-yivxd2owkloha

Amazon Web Services (AWS) Data Lake with Public COVID-19 Datasets - includes several datasets on this list, which are stored, updated, and ready for analysis on AWS:

https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/

Government responses to the pandemic

https://github.com/saudiwin/corona_tscs

COVID exposure indices

https://github.com/COVIDExposureIndices/COVIDExposureIndices

Nielsen News Exposure data - data on household news consumption, disaggregated geographically

https://www.nielsen.com/us/en/insights/report/2018/q1-2018-total-audience-report/

Weekly Opinion Tracking - weekly tracking poll of opinion on how the US is handling COVID-19

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/XJLZIN&version=1.0

Covid countryinfo - Relevant variables to predict COVID's progression.

https://www.kaggle.com/koryto/countryinfo

Individual case details from Hong Kong, Singapore, South Korea, and the Philippines

https://www.dolthub.com/repositories/Liquidata/corona-virus/data/master/case_details

Prediction market estimates for COVID impact (deaths, etc.)

https://goodjudgment.io/covid/dashboard

COVID-19 in South Korea - detailed        

https://www.kaggle.com/kimjihoo/coronavirusdataset

Containment and mitigation measures for COVID-19

http://epidemicforecasting.org/containment

Impact of measures of national significance, such as social distancing, movement restrictions, public health measures, social and economic measures, and lockdowns

http://epidemicforecasting.org/containment

COVID-19 Community Mobility Reports

https://www.google.com/covid19/mobility/

NHS data sets   

https://digital.nhs.uk/data-and-information/publications/clinical-indicators/ccg-outcomes-indicator-set/current#data-sets

ICU beds per county/state          

https://www.kaggle.com/jaimeblasco/icu-beds-by-county-in-the-us

Doctors and nurses per capita for 40 countries

https://www.kaggle.com/antgoldbloom/doctors-and-nurses-per-1000-people-by-country

Hospital beds (per 1,000 people) - WHO

https://data.worldbank.org/indicator/sh.med.beds.zs

WHO - Immunization coverage estimates by country: immunization coverage among 1-year-olds (%), a good proxy for health system performance

https://apps.who.int/gho/data/node.main.A824?lang=en

Weather temperature - Global Surface Summary of the Day (GSOD) - NOAA

https://www.kaggle.com/noaa/noaa-global-surface-summary-of-the-day

https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C00516

COVID19 India Complete Data - COVID19India.org

https://www.covid19india.org/

covidcaremap - US Hospital Facility Bed Capacity Map.

https://www.covidcaremap.org/maps/us-healthcare-system-capacity/#3.5/38/-96

Canada COVID relevant healthcare data

https://www.cihi.ca/en/covid-19-resources

Project COVIEWED Coronavirus News Corpus - A corpus of news articles submitted to /r/Coronavirus subreddit

https://www.kaggle.com/trtmio/project-coviewed-subreddit-coronavirus-news-corpus

Facebook Data for Good: CrowdTangle COVID-19 Live Displays

https://apps.crowdtangle.com/public-hub/covid19

#NLP #BERT #datascience #mit #covid19 #covid19solutions #mitcovid19Datathon #datathon
