Database for Researchers and Thinking Teams: Fight and Win the Corona Challenge
All databases in one central place!
Since the Corona epidemic broke out, many research institutes and governments have published databases containing vital information for the general public. These databases enable research groups and thinking teams to analyze the data and generate insights that can help develop future vaccines and create a more effective routine for coping with similar epidemics in the future.
Because it is difficult to keep track of all the databases that deal with the Corona virus, I wrote this article to gather the relevant ones in one central place. You are welcome to share this article with your friends. I update it daily with new initiatives and new databases, so it is worth coming back here often.
The main purpose of this blog post is to organize, in an orderly and efficient manner, the databases scattered across the web, including initiatives by leading research organizations around the world, and to make all freely available databases accessible to the scientific and technological research community for open and collaborative work.
If you know of another important repository, do not hesitate to add it in the comments or using this form.
Datasets & Data Challenges:
In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 47,000 scholarly articles, including over 36,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.
The White House Office of Science and Technology Policy (OSTP) pulled together a coalition of research groups and companies (including Kaggle) to prepare the COVID-19 Open Research Dataset (CORD-19) in an attempt to address key open scientific questions on COVID-19. Those questions are drawn from the National Academies of Sciences, Engineering, and Medicine (NASEM) and the World Health Organization (WHO).
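As a rough idea of what a first text-mining pass over such a corpus might look like, here is a minimal Python sketch that ranks abstracts against a question using a simple TF-IDF score. The records, field names (`title`, `abstract`) and the scoring formula are my own assumptions for illustration; they are not the CORD-19 schema or any method used by the coalition.

```python
import math
import re
from collections import Counter

# Hypothetical records for illustration only; not real CORD-19 entries.
PAPERS = [
    {"title": "Paper A", "abstract": "Incubation period estimates for a novel coronavirus."},
    {"title": "Paper B", "abstract": "Transmission dynamics and incubation in household clusters."},
    {"title": "Paper C", "abstract": "Vaccine candidates targeting the coronavirus spike protein."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, papers):
    """Return papers that match the query, best TF-IDF score first."""
    docs = [Counter(tokenize(p["abstract"])) for p in papers]
    terms = set(tokenize(query))
    scored = []
    for paper, counts in zip(papers, docs):
        score = 0.0
        for term in terms:
            # Document frequency: how many abstracts contain the term.
            df = sum(1 for doc in docs if term in doc)
            if df:
                idf = math.log(len(docs) / df) + 1.0
                score += counts[term] * idf
        if score > 0:
            scored.append((score, paper))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [paper for _, paper in scored]


for paper in rank("incubation period", PAPERS):
    print(paper["title"])
```

A real analysis would swap the toy list for the dataset's own metadata and would likely use a proper NLP library, but the ranking idea stays the same.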
Governments are taking a wide range of measures in response to the COVID-19 outbreak. The Oxford COVID-19 Government Response Tracker (OxCGRT) aims to track and compare government responses to the coronavirus outbreak worldwide rigorously and consistently.
Systematic information on which governments have taken which measures, and when, can help decision-makers and citizens understand the robustness of governmental responses in a consistent way, aiding efforts to fight the pandemic. The OxCGRT systematically collects information on several different common policy responses governments have taken, scores the stringency of such measures, and aggregates these scores into a common Stringency Index.
The data is collected from public sources by a team of dozens of Oxford University students and staff from every part of the world.
OxCGRT collects publicly available information on 11 indicators of government response (S1-S11). The first seven indicators (S1-S7) record policies such as school closures and travel bans on an ordinal scale; the remainder (S8-S11) are financial indicators such as fiscal or monetary measures. For a full description of the data and how it is collected, see this working paper.
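To make the aggregation step more concrete, here is a minimal Python sketch of how ordinal indicators could be rescaled and combined into a single 0-100 score. The maximum value assigned to each indicator and the plain unweighted mean are assumptions for illustration, not the official OxCGRT formula, which is described in the working paper.

```python
from statistics import mean

# Hypothetical maximum ordinal value per indicator (an assumption for this
# sketch, not taken from the OxCGRT codebook).
INDICATOR_MAX = {"S1": 2, "S2": 2, "S3": 2, "S4": 1, "S5": 1, "S6": 2, "S7": 3}


def stringency_index(scores):
    """Rescale each ordinal score to 0-100 and return the unweighted mean.

    Indicators missing from `scores` are treated as 0 (no measure recorded).
    """
    rescaled = []
    for name, max_value in INDICATOR_MAX.items():
        value = scores.get(name, 0)
        if not 0 <= value <= max_value:
            raise ValueError(f"{name}={value} is outside 0..{max_value}")
        rescaled.append(100 * value / max_value)
    return mean(rescaled)


# Made-up scores for three indicators; the rest default to 0.
example = {"S1": 2, "S2": 1, "S7": 3}
print(round(stringency_index(example), 1))
```

Rescaling before averaging keeps an indicator with more ordinal levels from dominating the index, which is one plausible reason to score measures on a common scale.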
From the World Health Organization: On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.
Daily information on the affected people can therefore give some interesting insights once it is made available to the broader data science community.
Johns Hopkins University has made an excellent dashboard using the affected cases data. The data is extracted from the associated Google Sheets and made available here.
The MIDAS Coordination Center released an online portal for COVID-19 modeling research. The portal improves navigation and search of COVID-19 information. Moving forward we will use the online portal as the landing page for COVID-19 data and information and the COVID-19 GitHub repository for sharing computable (CSV) files with data, parameter estimates, software, and metadata. All community contribution functionality of this repository will be maintained, so continue to send pull requests or issues for questions or contributions!
The COVID-19 pandemic continues to have a devastating effect on the health and well-being of the global population. A critical step in the fight against COVID-19 is effective screening of infected patients, with one of the key screening approaches being radiological imaging using chest radiography. It was found in early studies that patients present abnormalities in chest radiography images that are characteristic of those infected with COVID-19. Motivated by this, a number of artificial intelligence (AI) systems based on deep learning have been proposed and results have been shown to be quite promising in terms of accuracy in detecting patients infected with COVID-19 using chest radiography images. However, to the best of the authors’ knowledge, these developed AI systems have been closed source and unavailable to the research community for deeper understanding and extension, and unavailable for public access and use. Therefore, in this study we introduce COVID-Net, a deep convolutional neural network design tailored for the detection of COVID-19 cases from chest radiography images that is open source and available to the general public. We also describe the chest radiography dataset leveraged to train COVID-Net, which we will refer to as COVIDx and is comprised of 16,756 chest radiography images across 13,645 patient cases from two open access data repositories. Furthermore, we investigate how COVID-Net makes predictions using an explainability method in an attempt to gain deeper insights into critical factors associated with COVID cases, which can aid clinicians in improved screening. 
By no means a production-ready solution, the hope is that the open access COVID-Net, along with the description on constructing the open source COVIDx dataset, will be leveraged and built upon by both researchers and citizen data scientists alike to accelerate the development of highly accurate yet practical deep learning solutions for detecting COVID-19 cases and accelerate treatment of those who need it the most.
Note: The COVID-Net models provided here are intended to be used as reference models that can be built upon and enhanced as new data becomes available. They are currently at a research stage and not yet intended as production-ready models (not meant for direct clinical diagnosis), and we are working continuously to improve them as new data becomes available. Please do not use COVID-Net for self-diagnosis and seek help from your local health authorities.
This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).
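Case time series are often shared in a wide layout, with one row per location and one column per date, which is awkward for most analysis. Here is a minimal Python sketch that reshapes such a file into one record per location and day. The column names, date format and sample values below are assumptions for illustration, so check them against the files in the repository before relying on this.

```python
import csv
import io
from datetime import datetime

# Illustrative sample in an assumed wide layout: one row per location,
# one column per date. Place names and numbers are made up.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Exampleland,10.0,20.0,0,2,5
North,Samplestan,30.0,40.0,1,1,3
"""

# Columns that describe the location rather than a date (assumed names).
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def wide_to_long(text):
    """Turn one-column-per-date rows into one record per location and day."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            day = datetime.strptime(column, "%m/%d/%y").date()
            records.append(
                {
                    "country": row["Country/Region"],
                    "province": row["Province/State"],
                    "date": day,
                    "count": int(value),
                }
            )
    return records


for record in wide_to_long(SAMPLE)[:3]:
    print(record)
```

Once the data is in this long form, filtering by country or plotting counts over time becomes a simple loop or group-by.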
Countries around the globe share an increasing number of hCoV-19 genome sequences.
Laboratories around the world are generating, at an unprecedented pace, more and more genome sequences and related clinical and epidemiological data for the newly emerging coronavirus (hCoV-19), which are rapidly made available via GISAID. The pandemic virus was first identified in late December 2019 in Hubei Province, where patients were suffering from respiratory illnesses such as pneumonia. Since then, hCoV-19 has been detected across the globe.
The genome sequences of hCoV-19 are crucial to design and evaluate diagnostic tests, to track and trace the ongoing outbreak and to identify potential intervention options.
The dataset contains the latest available public data on COVID-19 including a daily situation update, the epidemiological curve and the global geographical distribution (EU/EEA and the UK, worldwide). On 12 February 2020, the novel coronavirus was named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) while the disease associated with it is now referred to as COVID-19. ECDC is closely monitoring this outbreak and providing risk assessments to guide EU Member States and the EU Commission in their response activities.
We are facing an unprecedented public health crisis with the Coronavirus (Covid-19) outbreak. We believe that data-driven decisions, and people working together for the greater good, are the best way through this difficult time. Right now, it’s more important than ever to have the resources to answer critical questions that matter to your organization and people. This includes having access to timely, detailed, and trustworthy data to think quickly and move fast. We have gathered the power of our Tableau Community and our technology to create a free Covid19 Data Resource Hub to help you make confident decisions with data.
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
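Because these files hold cumulative counts, a common first step is to turn them into daily increases. Here is a minimal Python sketch of that step. The `date,state,cases` column layout and all sample values are assumptions for illustration (only the starting date and state come from the description above), so verify the actual headers in the repository.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows in an assumed date,state,cases layout. The counts are
# made up and "Examplestate" is a hypothetical name.
SAMPLE = """date,state,cases
2020-01-21,Washington,1
2020-01-22,Washington,1
2020-01-23,Washington,3
2020-01-23,Examplestate,1
2020-01-24,Examplestate,2
"""


def daily_new_cases(text):
    """Convert cumulative counts into per-day increases for each state."""
    by_state = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        by_state[row["state"]].append((row["date"], int(row["cases"])))

    result = {}
    for state, series in by_state.items():
        series.sort()  # ISO dates sort chronologically as plain strings
        previous = 0  # assumes each series starts at the first reported case
        increases = []
        for day, cumulative in series:
            increases.append((day, cumulative - previous))
            previous = cumulative
        result[state] = increases
    return result


for state, increases in daily_new_cases(SAMPLE).items():
    print(state, increases)
```

Keep in mind that a cumulative series can be revised downward when a source corrects its numbers, in which case this difference will show a negative value for that day.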
Courses, Visualization & More:
This project class investigates and models COVID-19 using tools from data science and machine learning. We will introduce the relevant background for the biology and epidemiology of the COVID-19 virus. Then we will critically examine current models that are used to predict infection rates in the population as well as models used to support various public health interventions (e.g. herd immunity and social distancing). The core of this class will be projects aimed to create tools that can assist in the ongoing global health efforts. Potential projects include data visualization and education platforms, improved modeling and predictions, social network and NLP analysis of the propagation of COVID-19 information, and tools to facilitate good health behavior, etc. The class is aimed toward students with experience in data science and AI, and will include guest lectures by biomedical experts.
Grading:
- Class participation (20%)
- Scribing lectures (10%)
- Course project (70%)
Prerequisites:
- Background in machine learning and statistics (CS229, STATS216 or equivalent).
- Some biological background is helpful but not required.
Note from the editor-in-chief of the article: Towards Data Science is a Medium publication focused primarily on data science research and machine learning. Because I am not a healthcare professional or an epidemiology researcher, my words in this article should not be interpreted as professional advice, but only as a guide to databases for use by the scientific and technological research community. For more extended information about the Corona virus, you can click here.