Florian Van Horenbeke / covid-19-public-data / Commits / 71e50792
Commit 71e50792 authored 5 years ago by Chandrasekhar Ramakrishnan
Committed by renku 0.9.1 5 years ago
renku update
parent 6021fe80
Showing 3 changed files with 381 additions and 146 deletions
.renku/workflow/928149d7fe564cd7a8860b772d1967cc.cwl: 125 additions, 0 deletions
runs/CompileGeoData.run.ipynb: 63 additions, 63 deletions
runs/Dashboard.run.ipynb: 193 additions, 83 deletions
.renku/workflow/928149d7fe564cd7a8860b772d1967cc.cwl (new file, mode 0 → 100644, +125 −0)
class: Workflow
cwlVersion: v1.0
hints: []
inputs:
  input_1:
    default: ts_folder
    streamable: false
    type: string
  input_10:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19_jhu-csse
    streamable: false
    type: Directory
  input_11:
    default: worldmap_path
    streamable: false
    type: string
  input_12:
    default:
      class: File
      path: ../../data/worldmap/country_centroids.csv
    streamable: false
    type: File
  input_13:
    default: out_folder
    streamable: false
    type: string
  input_14:
    default:
      class: Directory
      listing: []
      path: ../../data/geodata
    streamable: false
    type: Directory
  input_15:
    default:
      class: File
      path: ../../notebooks/process/CompileGeoData.ipynb
    streamable: false
    type: File
  input_16:
    default: runs/CompileGeoData.run.ipynb
    streamable: false
    type: string
  input_2:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19_jhu-csse
    streamable: false
    type: Directory
  input_3:
    default: rates_folder
    streamable: false
    type: string
  input_4:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19_rates
    streamable: false
    type: Directory
  input_5:
    default: geodata_path
    streamable: false
    type: string
  input_6:
    default:
      class: File
      path: ../../data/geodata/geo_data.csv
    streamable: false
    type: File
  input_7:
    default:
      class: File
      path: ../../notebooks/Dashboard.ipynb
    streamable: false
    type: File
  input_8:
    default: runs/Dashboard.run.ipynb
    streamable: false
    type: string
  input_9:
    default: ts_folder
    streamable: false
    type: string
outputs:
  output_0:
    outputSource: step_1/output_0
    streamable: false
    type: File
  output_1:
    outputSource: step_2/output_0
    streamable: false
    type: File
requirements: []
steps:
  step_1:
    in:
      input_1: input_1
      input_2: input_2
      input_3: input_3
      input_4: input_4
      input_5: input_5
      input_6: input_6
      input_7: input_7
      input_8: input_8
    out:
    - output_0
    run: 5ae9a9961e194e7795df04a9722452e8_papermill.cwl
  step_2:
    in:
      input_1: input_9
      input_2: input_10
      input_3: input_11
      input_4: input_12
      input_5: input_13
      input_6: input_14
      input_7: input_15
      input_8: input_16
    out:
    - output_0
    run: a8b2f47629164158a118963ae58eea3b_papermill.cwl
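To make the wiring in this workflow easier to follow, here is a minimal sketch that resolves each step input back to the workflow-level default it is bound to. The dictionaries are hand-copied excerpts of the file above (only the string-typed inputs bound to step_2 are included), and `resolve_step_inputs` is a hypothetical helper written for illustration; it is not part of renku or of any CWL tooling.

```python
# Hand-copied excerpt of the workflow above; only the string-typed
# inputs bound to step_2 are included to keep the sketch short.
workflow_inputs = {
    "input_9": "ts_folder",
    "input_11": "worldmap_path",
    "input_13": "out_folder",
    "input_16": "runs/CompileGeoData.run.ipynb",
}

# step_2's "in" mapping: step-level name -> workflow-level input.
step_2_in = {
    "input_1": "input_9",
    "input_3": "input_11",
    "input_5": "input_13",
    "input_8": "input_16",
}


def resolve_step_inputs(step_in, inputs):
    """Hypothetical helper: map each step input to the default it receives."""
    return {name: inputs[source] for name, source in step_in.items()}


resolved = resolve_step_inputs(step_2_in, workflow_inputs)
for name, value in resolved.items():
    print(f"{name} <- {value}")
```

Read this way, step_2's `input_1` receives the string `ts_folder` and its `input_8` receives the output notebook path, which matches the defaults declared under `inputs`.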
runs/CompileGeoData.run.ipynb (+63 −63)
@@ -4,10 +4,10 @@
  "cell_type": "markdown",
  "metadata": {
  "papermill": {
- "duration": 0.019915,
- "end_time": "2020-03-18T17:39:45.438621",
+ "duration": 0.023622,
+ "end_time": "2020-03-19T15:31:56.889647",
  "exception": false,
- "start_time": "2020-03-18T17:39:45.418706",
+ "start_time": "2020-03-19T15:31:56.866025",
  "status": "completed"
  },
  "tags": []
@@ -23,10 +23,10 @@
  "execution_count": 1,
  "metadata": {
  "papermill": {
- "duration": 0.356977,
- "end_time": "2020-03-18T17:39:45.806137",
+ "duration": 0.346428,
+ "end_time": "2020-03-19T15:31:57.247558",
  "exception": false,
- "start_time": "2020-03-18T17:39:45.449160",
+ "start_time": "2020-03-19T15:31:56.901130",
  "status": "completed"
  },
  "tags": []
@@ -42,10 +42,10 @@
  "execution_count": 2,
  "metadata": {
  "papermill": {
- "duration": 0.020489,
- "end_time": "2020-03-18T17:39:45.839942",
+ "duration": 0.019927,
+ "end_time": "2020-03-19T15:31:57.285179",
  "exception": false,
- "start_time": "2020-03-18T17:39:45.819453",
+ "start_time": "2020-03-19T15:31:57.265252",
  "status": "completed"
  },
  "tags": []
@@ -62,10 +62,10 @@
  "cell_type": "markdown",
  "metadata": {
  "papermill": {
- "duration": 0.010294,
- "end_time": "2020-03-18T17:39:45.862217",
+ "duration": 0.007207,
+ "end_time": "2020-03-19T15:31:57.302610",
  "exception": false,
- "start_time": "2020-03-18T17:39:45.851923",
+ "start_time": "2020-03-19T15:31:57.295403",
  "status": "completed"
  },
  "tags": [
@@ -81,10 +81,10 @@
  "execution_count": 3,
  "metadata": {
  "papermill": {
- "duration": 0.023154,
- "end_time": "2020-03-18T17:39:45.909658",
+ "duration": 0.022457,
+ "end_time": "2020-03-19T15:31:57.332752",
  "exception": false,
- "start_time": "2020-03-18T17:39:45.886504",
+ "start_time": "2020-03-19T15:31:57.310295",
  "status": "completed"
  },
  "tags": [
@@ -94,11 +94,11 @@
  "outputs": [],
  "source": [
  "# Parameters\n",
- "PAPERMILL_INPUT_PATH = \"notebooks/process/CompileGeoData.ipynb\"\n",
+ "PAPERMILL_INPUT_PATH = \"/tmp/ps76102a/notebooks/process/CompileGeoData.ipynb\"\n",
  "PAPERMILL_OUTPUT_PATH = \"runs/CompileGeoData.run.ipynb\"\n",
- "ts_folder = \"./data/covid-19_jhu-csse/\"\n",
- "worldmap_path = \"./data/worldmap/country_centroids.csv\"\n",
- "out_folder = \"./data/geodata/\"\n"
+ "ts_folder = \"/tmp/ps76102a/data/covid-19_jhu-csse\"\n",
+ "worldmap_path = \"/tmp/ps76102a/data/worldmap/country_centroids.csv\"\n",
+ "out_folder = \"/tmp/ps76102a/data/geodata\"\n"
  ]
  },
  {
@@ -106,10 +106,10 @@
  "execution_count": 4,
  "metadata": {
  "papermill": {
- "duration": 0.022631,
- "end_time": "2020-03-18T17:39:45.942649",
+ "duration": 0.028453,
+ "end_time": "2020-03-19T15:31:57.374528",
  "exception": false,
- "start_time": "2020-03-18T17:39:45.920018",
+ "start_time": "2020-03-19T15:31:57.346075",
  "status": "completed"
  },
  "tags": []
@@ -130,10 +130,10 @@
  "execution_count": 5,
  "metadata": {
  "papermill": {
- "duration": 0.05949,
- "end_time": "2020-03-18T17:39:46.014614",
+ "duration": 0.081205,
+ "end_time": "2020-03-19T15:31:57.469382",
  "exception": false,
- "start_time": "2020-03-18T17:39:45.955124",
+ "start_time": "2020-03-19T15:31:57.388177",
  "status": "completed"
  },
  "tags": []
@@ -147,10 +147,10 @@
  "cell_type": "markdown",
  "metadata": {
  "papermill": {
- "duration": 0.010077,
- "end_time": "2020-03-18T17:39:46.041211",
+ "duration": 0.009849,
+ "end_time": "2020-03-19T15:31:57.496810",
  "exception": false,
- "start_time": "2020-03-18T17:39:46.031134",
+ "start_time": "2020-03-19T15:31:57.486961",
  "status": "completed"
  },
  "tags": []
@@ -164,10 +164,10 @@
  "execution_count": 6,
  "metadata": {
  "papermill": {
- "duration": 0.043439,
- "end_time": "2020-03-18T17:39:46.094285",
+ "duration": 0.066777,
+ "end_time": "2020-03-19T15:31:57.571356",
  "exception": false,
- "start_time": "2020-03-18T17:39:46.050846",
+ "start_time": "2020-03-19T15:31:57.504579",
  "status": "completed"
  },
  "tags": []
@@ -184,10 +184,10 @@
  "execution_count": 7,
  "metadata": {
  "papermill": {
- "duration": 0.034514,
- "end_time": "2020-03-18T17:39:46.142439",
+ "duration": 0.041217,
+ "end_time": "2020-03-19T15:31:57.629637",
  "exception": false,
- "start_time": "2020-03-18T17:39:46.107925",
+ "start_time": "2020-03-19T15:31:57.588420",
  "status": "completed"
  },
  "tags": []
@@ -214,10 +214,10 @@
  "cell_type": "markdown",
  "metadata": {
  "papermill": {
- "duration": 0.009928,
- "end_time": "2020-03-18T17:39:46.166476",
+ "duration": 0.008941,
+ "end_time": "2020-03-19T15:31:57.654289",
  "exception": false,
- "start_time": "2020-03-18T17:39:46.156548",
+ "start_time": "2020-03-19T15:31:57.645348",
  "status": "completed"
  },
  "tags": []
@@ -231,10 +231,10 @@
  "execution_count": 8,
  "metadata": {
  "papermill": {
- "duration": 0.026025,
- "end_time": "2020-03-18T17:39:46.202465",
+ "duration": 0.031425,
+ "end_time": "2020-03-19T15:31:57.693675",
  "exception": false,
- "start_time": "2020-03-18T17:39:46.176440",
+ "start_time": "2020-03-19T15:31:57.662250",
  "status": "completed"
  },
  "tags": []
@@ -260,10 +260,10 @@
  "execution_count": 9,
  "metadata": {
  "papermill": {
- "duration": 0.018618,
- "end_time": "2020-03-18T17:39:46.232915",
+ "duration": 0.017224,
+ "end_time": "2020-03-19T15:31:57.725092",
  "exception": false,
- "start_time": "2020-03-18T17:39:46.214297",
+ "start_time": "2020-03-19T15:31:57.707868",
  "status": "completed"
  },
  "tags": []
@@ -278,10 +278,10 @@
  "cell_type": "markdown",
  "metadata": {
  "papermill": {
- "duration": 0.009895,
- "end_time": "2020-03-18T17:39:46.253760",
+ "duration": 0.008352,
+ "end_time": "2020-03-19T15:31:57.742659",
  "exception": false,
- "start_time": "2020-03-18T17:39:46.243865",
+ "start_time": "2020-03-19T15:31:57.734307",
  "status": "completed"
  },
  "tags": []
@@ -295,10 +295,10 @@
  "execution_count": 10,
  "metadata": {
  "papermill": {
- "duration": 0.034717,
- "end_time": "2020-03-18T17:39:46.298658",
+ "duration": 0.044489,
+ "end_time": "2020-03-19T15:31:57.795095",
  "exception": false,
- "start_time": "2020-03-18T17:39:46.263941",
+ "start_time": "2020-03-19T15:31:57.750606",
  "status": "completed"
  },
  "tags": []
@@ -413,10 +413,10 @@
  "cell_type": "markdown",
  "metadata": {
  "papermill": {
- "duration": 0.010396,
- "end_time": "2020-03-18T17:39:46.322850",
+ "duration": 0.0095,
+ "end_time": "2020-03-19T15:31:57.820088",
  "exception": false,
- "start_time": "2020-03-18T17:39:46.312454",
+ "start_time": "2020-03-19T15:31:57.810588",
  "status": "completed"
  },
  "tags": []
@@ -430,10 +430,10 @@
  "execution_count": 11,
  "metadata": {
  "papermill": {
- "duration": 0.028961,
- "end_time": "2020-03-18T17:39:46.362455",
+ "duration": 0.032448,
+ "end_time": "2020-03-19T15:31:57.860318",
  "exception": false,
- "start_time": "2020-03-18T17:39:46.333494",
+ "start_time": "2020-03-19T15:31:57.827870",
  "status": "completed"
  },
  "tags": []
@@ -465,20 +465,20 @@
  "version": "3.7.6"
  },
  "papermill": {
- "duration": 2.135836,
- "end_time": "2020-03-18T17:39:46.683682",
+ "duration": 2.208276,
+ "end_time": "2020-03-19T15:31:58.185350",
  "environment_variables": {},
  "exception": null,
- "input_path": "notebooks/process/CompileGeoData.ipynb",
+ "input_path": "/tmp/ps76102a/notebooks/process/CompileGeoData.ipynb",
  "output_path": "runs/CompileGeoData.run.ipynb",
  "parameters": {
- "PAPERMILL_INPUT_PATH": "notebooks/process/CompileGeoData.ipynb",
+ "PAPERMILL_INPUT_PATH": "/tmp/ps76102a/notebooks/process/CompileGeoData.ipynb",
  "PAPERMILL_OUTPUT_PATH": "runs/CompileGeoData.run.ipynb",
- "out_folder": "./data/geodata/",
- "ts_folder": "./data/covid-19_jhu-csse/",
- "worldmap_path": "./data/worldmap/country_centroids.csv"
+ "out_folder": "/tmp/ps76102a/data/geodata",
+ "ts_folder": "/tmp/ps76102a/data/covid-19_jhu-csse",
+ "worldmap_path": "/tmp/ps76102a/data/worldmap/country_centroids.csv"
  },
- "start_time": "2020-03-18T17:39:44.547846",
+ "start_time": "2020-03-19T15:31:55.977074",
  "version": "1.1.0"
  }
  },
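The notebook-level papermill metadata in the last hunk is internally consistent: `duration` is the difference between `end_time` and `start_time`. A minimal check using only the standard library, with the new-side timestamps copied from that hunk (the variable names are illustrative):

```python
from datetime import datetime

# New-side values from the notebook-level "papermill" metadata.
start_time = datetime.fromisoformat("2020-03-19T15:31:55.977074")
end_time = datetime.fromisoformat("2020-03-19T15:31:58.185350")

# Elapsed wall-clock time in seconds.
elapsed = (end_time - start_time).total_seconds()
print(elapsed)
```

The result agrees with the recorded `"duration": 2.208276`, so the reshuffled numbers in these hunks reflect a re-run of the notebook rather than a change to its logic.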
%% Cell type:markdown id: tags:
# Extract the Geographic Info
Use the Harvard [country_centroids.csv](https://worldmap.harvard.edu/data/geonode:country_centroids_az8) data to extract the geographic info we need for the visualizations.
%% Cell type:code id: tags:
```python
import pandas as pd
import os
```
%% Cell type:code id: tags:
```python
ts_folder = "../data/covid-19_jhu-csse/"
worldmap_path = "../data/worldmap/country_centroids.csv"
out_folder = None
PAPERMILL_OUTPUT_PATH = None
```
%% Cell type:markdown id: tags:parameters
## Read in JHU CSSE data
%% Cell type:code id: tags:injected-parameters
```diff
  # Parameters
- PAPERMILL_INPUT_PATH = "notebooks/process/CompileGeoData.ipynb"
+ PAPERMILL_INPUT_PATH = "/tmp/ps76102a/notebooks/process/CompileGeoData.ipynb"
  PAPERMILL_OUTPUT_PATH = "runs/CompileGeoData.run.ipynb"
- ts_folder = "./data/covid-19_jhu-csse/"
- worldmap_path = "./data/worldmap/country_centroids.csv"
- out_folder = "./data/geodata/"
+ ts_folder = "/tmp/ps76102a/data/covid-19_jhu-csse"
+ worldmap_path = "/tmp/ps76102a/data/worldmap/country_centroids.csv"
+ out_folder = "/tmp/ps76102a/data/geodata"
```
%% Cell type:code id: tags:
```python
def read_jhu_covid_region_df(name):
    filename = os.path.join(ts_folder, f"time_series_19-covid-{name}.csv")
    df = pd.read_csv(filename)
    df = df.set_index(['Country/Region', 'Province/State', 'Lat', 'Long'])
    df.columns = pd.to_datetime(df.columns)
    region_df = df.groupby(level='Country/Region').sum()
    return region_df
```
%% Cell type:code id: tags:
```python
confirmed_df = read_jhu_covid_region_df("Confirmed")
```
%% Cell type:markdown id: tags:
# Read in Harvard country centroids
%% Cell type:code id: tags:
```python
country_centroids_df = pd.read_csv(worldmap_path)
country_centroids_df = country_centroids_df[['name', 'name_long', 'region_un', 'subregion', 'region_wb', 'pop_est', 'gdp_md_est', 'income_grp', 'Longitude', 'Latitude']]
country_centroids_df['name_jhu'] = country_centroids_df['name_long']
```
%% Cell type:code id: tags:
```python
country_centroids_df.columns
```
%% Output
Index(['name', 'name_long', 'region_un', 'subregion', 'region_wb', 'pop_est',
'gdp_md_est', 'income_grp', 'Longitude', 'Latitude', 'name_jhu'],
dtype='object')
%% Cell type:markdown id: tags:
Fix names that differ between JHU CSSE and Harvard data
%% Cell type:code id: tags:
```python
region_hd_jhu_map = {
    'Brunei Darussalam': 'Brunei',
    "Côte d'Ivoire": "Cote d'Ivoire",
    'Czech Republic': 'Czechia',
    'Hong Kong': 'Hong Kong SAR',
    'Republic of Korea': 'Korea, South',
    'Macao': 'Macao SAR',
    'Russian Federation': 'Russia',
    'Taiwan': 'Taiwan*',
    'United States': 'US'
}
country_centroids_df['name_jhu'] = country_centroids_df['name_jhu'].replace(region_hd_jhu_map)
```
%% Cell type:code id: tags:
```python
# Use this to find the name in the series
# country_centroids_df[country_centroids_df['name'].str.contains('Macao')]
```
%% Cell type:markdown id: tags:
There are some regions that we cannot resolve, but we will just ignore these.
%% Cell type:code id: tags:
```python
confirmed_df.loc[(confirmed_df.index.isin(country_centroids_df['name_jhu']) == False)].iloc[:,-2:]
```
%% Output
                       2020-03-16  2020-03-17
Country/Region
Congo (Brazzaville)             1           1
Congo (Kinshasa)                2           3
Cruise Ship                   696         696
Eswatini                        1           1
Holy See                        1           1
Martinique                     15          16
North Macedonia                18          26
Republic of the Congo           1           1
The Bahamas                     1           1
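The unresolved-region check amounts to a set difference taken after the renaming step. Here is a pandas-free sketch of that idea; the two name lists are small illustrative samples (not the full datasets), and only three entries of `region_hd_jhu_map` are copied from the notebook.

```python
# Three entries copied from region_hd_jhu_map in the notebook.
region_hd_jhu_map = {
    "Czech Republic": "Czechia",
    "Russian Federation": "Russia",
    "United States": "US",
}

# Illustrative samples, not the real datasets.
harvard_names = ["Czech Republic", "Russian Federation", "United States", "Switzerland"]
jhu_names = ["Czechia", "Russia", "US", "Switzerland", "Holy See", "Eswatini"]

# Rename Harvard names to their JHU spelling; names without an entry pass through.
harvard_as_jhu = {region_hd_jhu_map.get(name, name) for name in harvard_names}

# JHU regions that have no matching centroid row after renaming.
unresolved = sorted(set(jhu_names) - harvard_as_jhu)
print(unresolved)  # ['Eswatini', 'Holy See']
```

In the notebook itself the same question is answered with `Index.isin`, which keeps the result aligned with the rows of `confirmed_df`.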
%% Cell type:markdown id: tags:
# Save the result
%% Cell type:code id: tags:
```python
if PAPERMILL_OUTPUT_PATH:
    out_path = os.path.join(out_folder, f"geo_data.csv")
    country_centroids_df.to_csv(out_path)
```
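One side effect of this commit is that the injected `out_folder` lost its trailing slash (`./data/geodata/` became `/tmp/ps76102a/data/geodata`). Because the save cell builds the path with `os.path.join`, both forms yield a well-formed file path. A quick check, using `posixpath` (the POSIX flavor of `os.path`) so the result does not depend on the platform running the sketch:

```python
import posixpath

# Old injected value, with trailing slash.
old_path = posixpath.join("./data/geodata/", "geo_data.csv")
# New injected value, without trailing slash.
new_path = posixpath.join("/tmp/ps76102a/data/geodata", "geo_data.csv")

print(old_path)  # ./data/geodata/geo_data.csv
print(new_path)  # /tmp/ps76102a/data/geodata/geo_data.csv
```

Plain string concatenation (`out_folder + "geo_data.csv"`) would have broken on the new value, which is one reason to prefer the join.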
runs/Dashboard.run.ipynb (+193 −83)
source diff could not be displayed: it is too large. Options to address this: view the blob.