# Covid-19 Public Data Collaboration Project

This project aggregates data from various public data sources to better understand the spread and effect of covid-19. The goal is to provide a central place where data, analysis, and discussion can be conducted and shared by a global community struggling to make sense of the current public health emergency.

See the [dashboard](https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/runs/Dashboard.run.ipynb) for a summary of the global data.

## Getting started and working with the project

The simplest way to start is to create an account (or log in) and fork the project. Next, feel free to [start an interactive environment](https://renkulab.io/projects/covid-19/covid-19-public-data/environments/new) and use the hosted JupyterLab or RStudio to explore the data. A summary of the data is given below.

Please, please, please consider contributing cool results from your fork back to the project! If you don't know how, or just need help with some of the git-heavy aspects of this, drop us a line [on Discourse](https://renku.discourse.group) or [open an issue](https://renkulab.io/projects/covid-19/covid-19-public-data/collaboration/issues) and someone will be able to help out.

The environment image allows you to work in Python or R in JupyterLab or RStudio/Shiny.

### Working with the data

A summary of the datasets available in this project is in the table below.

To work more efficiently with the data, we have implemented a set of "converters" that standardize the various datasets to a common subset of useful fields. Each converter knows the details of its dataset and produces a view of it that is homogenized with the others. In this way, data of very different origins can be visualized with the same simple commands.
For example, to work with the JHU-CSSE country-level data as well as the more detailed dataset on Spain:

```python
from covid_19_utils.converters import CaseConverter

# Create a converter, passing it the path to the data/atlas directory
converter = CaseConverter('./data/atlas')

# Read each dataset and convert it to the common, standardized format
jhu_df = converter.read_convert('./data/covid-19_jhu-csse')
spain_df = converter.read_convert('./data/covid-19-spain')
```

The resulting DataFrames have exactly the same structure, so they can be used interchangeably in any analysis or plotting code. See the [standardization notebook](https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/notebooks/process/standardize_datasets.ipynb) for a more complete example.

### Updating your branch or fork

The data in the master branch of this project is updated daily. How can you keep your fork or branch up to date?

We recommend that you do not change the files and directories that are updated automatically, so as to avoid merge conflicts as much as possible. This includes the datasets in the `data/` directory and the notebooks in `notebooks/` and `runs/`. For notebooks especially, the easiest way to avoid conflicts is to create a new directory for your own work.

When you are ready to pull in changes from master, run the following from a terminal while working on your branch or fork:

```
# Only needed the first time: register the parent repository as "upstream"
git remote add upstream https://renkulab.io/gitlab/covid-19/covid-19-public-data.git
# Download the latest changes and merge them into your current branch
git fetch upstream
git merge upstream/master
```

This will sync your branch or fork with the latest changes from the master branch of the parent repository.
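Before merging, it can help to check whether your own commits touch any of the automatically updated directories (`data/`, `notebooks/`, `runs/`). The following is a minimal sketch that is not part of the project: `risky_paths` and `changed_since_upstream` are hypothetical helpers, and they assume `upstream` has already been added and fetched as shown above. The check is deliberately conservative: it flags every path under those directories, so files in a new directory of your own under `notebooks/` are reported too; treat the output as a prompt to double-check, not as a verdict.

```python
import subprocess

# Directories that are updated automatically on master (as listed in this README).
AUTO_UPDATED_DIRS = ("data/", "notebooks/", "runs/")


def risky_paths(changed_files):
    """Return the changed files that live in automatically updated directories.

    Hypothetical helper: `changed_files` is any iterable of repository-relative
    paths, e.g. the lines printed by `git diff --name-only`.
    """
    return [p for p in changed_files if p.startswith(AUTO_UPDATED_DIRS)]


def changed_since_upstream(ref="upstream/master"):
    """List files changed on the current branch since it diverged from `ref`.

    Uses the three-dot form of `git diff`, which compares HEAD against the
    merge base with `ref`, i.e. only your own changes.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{ref}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]
```

For example, `risky_paths(changed_since_upstream())` lists the files worth reviewing before running `git merge upstream/master`.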
## Dataset Summary

<table class="table">
  <thead>
    <tr>
      <th>Source</th>
      <th>Dataset</th>
      <th>Location</th>
      <th>Example</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a href="https://github.com/CSSEGISandData/COVID-19">Covid-19 Data Repository at JHU CSSE</a></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/datasets/f6726a5b-f973-45d5-b873-30fa0dff772f/">covid-19_jhu-csse</a></td>
      <td><code>data/covid-19_jhu-csse</code></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/runs/Dashboard.run.ipynb">dashboard</a></td>
    </tr>
    <tr>
      <td><a href="https://covidtracking.com/">covidtracking.com</a></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/datasets/c8bec148-5332-4602-9dc3-e39bbe92ed67/">covidtracking</a></td>
      <td><code>data/covidtracking</code></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/runs/covidtracking-dashboard.ipynb">notebook</a></td>
    </tr>
    <tr>
      <td><a href="https://github.com/nytimes/covid-19-data">New York Times Covid-19 Data</a></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/datasets/dcac07eb-4c9c-40c5-b541-5072c8302750/">covid-19-us-nyt</a></td>
      <td><code>data/covid-19-us-nyt</code></td>
      <td>N/A</td>
    </tr>
    <tr>
      <td><a href="https://github.com/openZH/covid_19">Swiss Cantonal Data</a></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/datasets/c9295d7a-0380-4a1b-8731-5c36d76cb8e7/">openzh-covid-19</a></td>
      <td><code>data/openzh-covid-19</code></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/runs/openzh-covid-19-dashboard.run.ipynb">notebook</a></td>
    </tr>
    <tr>
      <td><a href="https://github.com/pcm-dpc/COVID-19">Covid-19 data for Italy</a></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/datasets/286c58b1-dbbc-4caa-a23a-fcb001d5ac51/">covid-19-italy</a></td>
      <td><code>data/covid-19-italy</code></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/runs/italy-covid-19.ipynb">notebook</a></td>
    </tr>
    <tr>
      <td><a href="https://github.com/itoledor/coronavirus.git">Covid-19 data for Chile</a></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/datasets/e7bc5616-1e7c-44a9-995f-bce3cba304b5/">covid-19-chile</a></td>
      <td><code>data/covid-19-chile</code></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/notebooks/examples-R/covid19-chile.ipynb">notebook</a></td>
    </tr>
    <tr>
      <td><a href="https://github.com/datadista/datasets.git">Covid-19 data for Spain</a></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/datasets/4de0e2e6-c748-4aaf-a2ac-4a3fb0257ed1/">covid-19-spain</a></td>
      <td><code>data/covid-19-spain</code></td>
      <td>N/A</td>
    </tr>
    <tr>
      <td><a href="https://github.com/echen102/COVID-19-TweetIDs">Covid-19 tweet IDs</a></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/datasets/0fc08252-cb39-4b59-bc82-9b213ec0bec6/">covid-19-tweet-ids</a></td>
      <td><code>data/covid-19-tweet-ids</code></td>
      <td>N/A</td>
    </tr>
  </tbody>
</table>

### Covid-19 Data Repository at JHU CSSE

This is a global Covid-19 dataset, updated regularly from the [Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)](https://github.com/CSSEGISandData/COVID-19). The [dashboard](https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/runs/Dashboard.run.ipynb) summarizes this data in combination with population data from the World Bank.

### Covid tracking crowdsourcing project

[Covid tracking](https://covidtracking.com) is a crowd-sourced dataset of US state-level data. It is updated by hand by an army of volunteers.

### New York Times Covid-19 Dataset

The [New York Times Covid-19 Dataset](https://github.com/nytimes/covid-19-data) provides open access to data on covid-19 cases and deaths per U.S. state and county.
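The JHU CSSE dashboard combines case counts with World Bank population figures, and the derived *Case population rates* dataset likewise relates cases to population. The core arithmetic is a per-capita rate, `rate = cases × 100,000 / population`. The sketch below only illustrates that arithmetic: the function name `cases_per_100k`, the dict-based inputs, and the per-100,000 scale are assumptions, not what `ToRates.ipynb` or the dashboard actually implement. In practice the country names of the two sources must be matched first; this sketch assumes that has already been done.

```python
def cases_per_100k(cases, population, per=100_000):
    """Compute cases per `per` inhabitants for each country.

    Hypothetical helper: `cases` and `population` are dicts keyed by the same
    country names. Countries without a usable population figure (missing or 0)
    are skipped instead of raising a ZeroDivisionError.
    """
    rates = {}
    for country, count in cases.items():
        pop = population.get(country)
        if not pop:
            continue  # no usable population data for this country
        # Multiply before dividing to keep round numbers exact where possible
        rates[country] = count * per / pop
    return rates
```

For example, 500 cases in a population of 1,000,000 gives `cases_per_100k({"A": 500}, {"A": 1_000_000}) == {"A": 50.0}`.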
### Covid-19 Data for Swiss Cantons

The [Swiss cantonal data](https://github.com/openZH/covid_19) is collected by the Zürich statistical office. Parts of it are updated manually, while other parts are becoming automated.

### Case data for Italy

Detailed data compiled by the [Civil Protection of Italy](https://github.com/pcm-dpc/COVID-19).

### Covid-19 related tweet IDs

A collection of tweet IDs related to covid-19 from https://github.com/echen102/COVID-19-TweetIDs.

### General

- World Bank total population: https://data.worldbank.org/indicator/SP.POP.TOTL
- Country centroids: https://worldmap.harvard.edu/data/geonode:country_centroids_az8

## Derived Dataset Summary

<table class="table">
  <thead>
    <tr>
      <th>Dataset</th>
      <th>Location</th>
      <th>Code</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Case population rates</td>
      <td><code>data/covid-19_rates</code></td>
      <td><a href="https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/notebooks/process/ToRates.ipynb">notebooks/process/ToRates.ipynb</a></td>
    </tr>
  </tbody>
</table>

## Contributing

If you are interested in working on this project, we would love to get contributions. We would especially like to collect more data sources and make them available here! Please suggest data sources that are relevant to understanding covid-19. If you want to add a new data source yourself, see the section [Adding a new data source](#adding-a-new-data-source).

## Data Sources to Add

See the [data sources issue](https://renkulab.io/projects/covid-19/covid-19-public-data/collaboration/issues/1/).

## Adding a new data source

Adding a new data source is easy! In your fork or branch of the project, do the following:

* Create a renku dataset using `renku dataset create [dataset name]`
* Add any files or folders using `renku dataset add`. [Looking in the commit history will provide some examples](https://renkulab.io/gitlab/covid-19/covid-19-public-data/commits/master).
* Create a notebook in the `notebooks/examples` folder that shows how to read and work with the dataset
  * Protip: use a unique name for the notebook to avoid merge conflicts
* Add an issue to the project for any suggestions on things to do with the data
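A first cell for such an example notebook often just locates the dataset's files, which live under `data/<dataset name>` (as in the Location column of the Dataset Summary table), and loads them. Here is a minimal sketch using only the Python standard library. The helper `load_csv_rows`, the directory name `data/my-new-dataset`, and the assumption that the dataset consists of CSV files with a header row are all hypothetical; if pandas is available in your environment, `pandas.read_csv` does the per-file reading with less code.

```python
import csv
from pathlib import Path


def load_csv_rows(dataset_dir):
    """Read every CSV file below `dataset_dir` into a list of row dicts.

    Hypothetical helper: assumes each CSV has a header row. Each row gets an
    extra "_source_file" key recording which file (relative to `dataset_dir`)
    it came from. Files are visited in sorted path order.
    """
    rows = []
    for path in sorted(Path(dataset_dir).rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                row["_source_file"] = str(path.relative_to(dataset_dir))
                rows.append(row)
    return rows


# Example: inspect a newly added dataset (hypothetical directory name).
# rows = load_csv_rows("data/my-new-dataset")
# print(len(rows), rows[0] if rows else None)
```

Keeping the loading step in a small function makes it easy to swap in a dataset-specific reader later, for instance one that parses dates or converts counts to integers.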