Last week, I had the opportunity to participate as a mentor in the MIT COVID-19 Datathon, working on the coronavirus infodemic to identify myths, misinformation, and fake news associated with the pandemic. Out of 2,750 participants and 412 teams, our team qualified as one of the top 10 semi-finalists.
The MIT COVID-19 Datathon is a week-long virtual event where teams of data scientists, clinicians, public health professionals, and other subject matter experts come together to develop meaningful insights from existing datasets to influence policy and decision making in the public and private sectors.
The Datathon had five tracks:
Measuring the Impact of Policies around COVID-19
Misinformation during the Pandemic
Disparities in Health Outcomes from COVID-19
Epidemiology of COVID-19
Megacity Pandemic Response in NYC
Our team focused on the Misinformation during the Pandemic track, which covers fake news propaganda, clickbait, and incorrectly reported news.
Some of the resources from the Datathon are listed below. They should be helpful for researchers, academics, and practitioners, and are shared courtesy of the MIT Datathon participants, organizers, and mentors. Huge thanks to the organizers Leo Anthony Celi, Freddy Nguyen, MD, PhD, and all the others.
Resources & References
Facebook COVID misinformation
https://ai.facebook.com/blog/using-ai-to-detect-covid-19-misinformation-and-exploitative-content
Daily summaries of the latest COVID-19 literature and research
Coronavirus Tech Handbook
https://coronavirustechhandbook.com/contents
Federal COVID-19 Legislative Tracker
https://coronavirus.skoposlabs.com/#skopos
COVID-19 Critical Care Planning Resources Library
COVID-19: living map of the evidence
http://eppi.ioe.ac.uk/cms/Default.aspx?tabid=3765
Source for COVID-19 modeling data
Semantic search engine on COVID-19 literature - Yale Advanced COVID-19 Research group
https://p2med.shinyapps.io/Cov19-Monkey
Misinformation spread in Twitter - dataset: A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration.
https://doi.org/10.5281/zenodo.3723939
Statistics and visualizations: http://www.panacealab.org/covid19
https://github.com/thepanacealab/covid19_twitter
The COVID-19 Public Datasets on BigQuery
https://cloud.google.com/blog/products/data-analytics/free-public-datasets-for-covid19
NYC Health and Mental Hygiene
https://github.com/nychealth/coronavirus-data
COVID-19 Symptom Survey - Facebook Data for Good
The COVID-19 symptom surveys
CDC's work using the social vulnerability index
UK: Coronavirus (COVID-19) - Office for National Statistics
https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases
SafeGraph - foot traffic, credit card transaction, and social distancing metrics data
https://www.safegraph.com/covid-19-data-consortium
https://docs.safegraph.com/docs/social-distancing-metrics
COVID deaths by race/ethnicity:
https://www.cdc.gov/nchs/nvss/vsrr/covid_weekly/index.htm
The COVID Tracking Project
https://covidtracking.com/race
Census data county level summaries
https://github.com/JieYingWu/COVID-19_US_County-level_Summaries
COVID-19 Open Research Dataset (CORD-19):
https://pages.semanticscholar.org/coronavirus-research
2019 Novel Coronavirus COVID-19 (2019-nCoV) Epidemiological Data Repository by Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE):
https://github.com/CSSEGISandData/COVID-19
European Centre for Disease Prevention and Control (ECDC) - Epidemiological Data
Italy COVID-19 Data
https://github.com/pcm-dpc/COVID-19
WHO Data - COVID-19 Cases & Deaths in China (by Province) and other countries:
https://data.humdata.org/dataset/coronavirus-covid-19-cases-data-for-china-and-the-rest-of-theworld
ACAPS COVID-19: Government Measures Dataset:
https://data.humdata.org/dataset/acaps-covid19-government-measures-dataset
World Bank Indicators of Interest - regarding population health and healthcare systems worldwide, relevant to the COVID-19 Outbreak:
https://data.humdata.org/dataset/world-bank-indicators-of-interest-to-the-covid-19-outbreak
GenBank COVID-19 Genetic Sequences:
https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs
Nextstrain - COVID-19 Genomics Database:
https://nextstrain.org/ncov
https://aws.amazon.com/marketplace/pp/prodview-a2ev4blctqkwc?qid=1585087643935
U.S. State-Level and County-Level COVID-19 Count Data (cases and deaths):
https://github.com/nytimes/covid-19-data
U.S. State-Specific Projections for Hospital Resource Utilization:
http://www.healthdata.org/covid/
COVID-19 Twitter Datasets
#1 - http://www.panacealab.org/covid19/
#2 - https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset
#3 - https://github.com/echen102/COVID-19-TweetIDs
Unacast Social Distancing Scoreboard:
https://www.unacast.com/covid19/social-distancing-scoreboard
Collection of COVID-19 Data APIs (variety of data sources):
https://covid-19-apis.postman.com
New York City COVID-19 Dataset:
https://www1.nyc.gov/site/doh/covid/covid-19-data.page
Definitive Healthcare U.S. hospital capacity data (number of beds, ICU beds, ventilator capacity by state/county).
Version 1 (GitHub): https://github.com/rsowers-dhc/covid19
Version 2 (AWS):
Amazon Web Services (AWS) Data Lake with Public COVID-19 Datasets - includes several datasets on this list, which are stored, updated, and ready for analysis on AWS:
https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/
Government responses to the pandemic
https://github.com/saudiwin/corona_tscs
COVID exposure indices
https://github.com/COVIDExposureIndices/COVIDExposureIndices
Nielsen News Exposure data - Data on household news consumption, disaggregated geographically
https://www.nielsen.com/us/en/insights/report/2018/q1-2018-total-audience-report/
Weekly Opinion Tracking - Opinion on how the US is handling COVID-19 -- weekly tracking poll
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/XJLZIN&version=1.0
Covid countryinfo - Relevant variables to predict COVID's progression.
https://www.kaggle.com/koryto/countryinfo
Individual case details from Hong Kong, Singapore, South Korea, and the Philippines
https://www.dolthub.com/repositories/Liquidata/corona-virus/data/master/case_details
Prediction market estimates for COVID-19 impact (deaths, etc.)
https://goodjudgment.io/covid/dashboard
COVID-19 in South Korea - detailed
https://www.kaggle.com/kimjihoo/coronavirusdataset
Containment and mitigation measures for COVID-19
http://epidemicforecasting.org/containment
Impact based on measures of national significance, such as social distancing, movement restrictions, public health measures, social and economic measures, and lockdowns
http://epidemicforecasting.org/containment
COVID-19 Community Mobility Reports
https://www.google.com/covid19/mobility/
NHS data sets
ICU beds per county/state
https://www.kaggle.com/jaimeblasco/icu-beds-by-county-in-the-us
Doctors and nurses per capita for 40 countries
https://www.kaggle.com/antgoldbloom/doctors-and-nurses-per-1000-people-by-country
Hospital beds (per 1,000 people) - WHO
https://data.worldbank.org/indicator/sh.med.beds.zs
WHO - Immunization coverage estimates by country: immunization coverage among 1-year-olds (%) - a good estimate for health system performance
https://apps.who.int/gho/data/node.main.A824?lang=en
Weather temperature - Global Surface Summary of the Day - GSOD NOAA
https://www.kaggle.com/noaa/noaa-global-surface-summary-of-the-day
https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C00516
COVID19 India Complete Data - COVID19India.org
covidcaremap - US Hospital Facility Bed Capacity Map.
https://www.covidcaremap.org/maps/us-healthcare-system-capacity/#3.5/38/-96
Canada COVID relevant healthcare data
https://www.cihi.ca/en/covid-19-resources
Project COVIEWED Coronavirus News Corpus - A corpus of news articles submitted to /r/Coronavirus subreddit
https://www.kaggle.com/trtmio/project-coviewed-subreddit-coronavirus-news-corpus
Facebook Data for Good: CrowdTangle COVID-19 Live Displays
https://apps.crowdtangle.com/public-hub/covid19
#NLP #BERT #datascience #mit #covid19 #covid19solutions #mitcovid19Datathon #datathon