Commit 638d6c01 authored by Chandrasekhar Ramakrishnan, committed by renku 0.9.1

renku run papermill -p ts_folder ./data/covid-19_jhu-csse/ -p worldmap_path ./data/worldmap/country_centroids.csv -p out_folder ./data/geodata/ --inject-paths notebooks/process/CompileGeoData.ipynb runs/CompileGeoData.run.ipynb
parent c987c341
arguments: []
baseCommand:
- papermill
class: CommandLineTool
cwlVersion: v1.0
hints: []
inputs:
  input_1:
    default: ts_folder
    inputBinding:
      position: 1
      prefix: -p
      separate: true
      shellQuote: true
    streamable: false
    type: string
  input_2:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19_jhu-csse
    inputBinding:
      position: 2
      separate: true
      shellQuote: true
    streamable: false
    type: Directory
  input_3:
    default: worldmap_path
    inputBinding:
      position: 3
      prefix: -p
      separate: true
      shellQuote: true
    streamable: false
    type: string
  input_4:
    default:
      class: File
      path: ../../data/worldmap/country_centroids.csv
    inputBinding:
      position: 4
      separate: true
      shellQuote: true
    streamable: false
    type: File
  input_5:
    default: out_folder
    inputBinding:
      position: 5
      prefix: -p
      separate: true
      shellQuote: true
    streamable: false
    type: string
  input_6:
    default:
      class: Directory
      listing: []
      path: ../../data/geodata
    inputBinding:
      position: 6
      separate: true
      shellQuote: true
    streamable: false
    type: Directory
  input_7:
    default:
      class: File
      path: ../../notebooks/process/CompileGeoData.ipynb
    inputBinding:
      position: 7
      prefix: --inject-paths
      separate: true
      shellQuote: true
    streamable: false
    type: File
  input_8:
    default: runs/CompileGeoData.run.ipynb
    inputBinding:
      position: 8
      separate: true
      shellQuote: true
    streamable: false
    type: string
outputs:
  output_0:
    outputBinding:
      glob: $(inputs.input_8)
    streamable: false
    type: File
permanentFailCodes: []
requirements:
- class: InlineJavascriptRequirement
- class: InitialWorkDirRequirement
  listing:
  - entry: '$({"listing": [], "class": "Directory"})'
    entryname: runs
    writable: true
  - entry: $(inputs.input_2)
    entryname: data/covid-19_jhu-csse
    writable: false
  - entry: $(inputs.input_4)
    entryname: data/worldmap/country_centroids.csv
    writable: false
  - entry: $(inputs.input_6)
    entryname: data/geodata
    writable: false
  - entry: $(inputs.input_7)
    entryname: notebooks/process/CompileGeoData.ipynb
    writable: false
successCodes: []
temporaryFailCodes: []
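The tool description above hands papermill each notebook parameter as a `-p NAME VALUE` pair (input_1/input_2, input_3/input_4, input_5/input_6), then the `--inject-paths` flag attached to the input notebook (input_7), and finally the output notebook (input_8). As a rough illustration of how that flat argument list regroups into parameters, here is a minimal sketch; `parse_papermill_args` is a hypothetical helper, not part of papermill or renku, and it only understands the two options used in this command.

```python
from typing import Dict, List, Tuple


def parse_papermill_args(argv: List[str]) -> Tuple[Dict[str, str], bool, List[str]]:
    """Split a papermill argument list into (parameters, inject_paths, positionals).

    Hypothetical helper: it only handles `-p NAME VALUE` and the boolean
    `--inject-paths` flag; every other token is treated as a notebook path.
    """
    params: Dict[str, str] = {}
    inject_paths = False
    positionals: List[str] = []
    i = 0
    while i < len(argv):
        token = argv[i]
        if token == "-p":
            # `-p` consumes the next two tokens: the parameter name and its value.
            if i + 2 >= len(argv):
                raise ValueError("-p needs a NAME and a VALUE")
            params[argv[i + 1]] = argv[i + 2]
            i += 3
        elif token == "--inject-paths":
            inject_paths = True
            i += 1
        else:
            # Remaining tokens are the input and output notebook paths, in order.
            positionals.append(token)
            i += 1
    return params, inject_paths, positionals


# The arguments from the renku command recorded in this commit.
argv = [
    "-p", "ts_folder", "./data/covid-19_jhu-csse/",
    "-p", "worldmap_path", "./data/worldmap/country_centroids.csv",
    "-p", "out_folder", "./data/geodata/",
    "--inject-paths",
    "notebooks/process/CompileGeoData.ipynb",
    "runs/CompileGeoData.run.ipynb",
]
params, inject, (input_nb, output_nb) = parse_papermill_args(argv)
print(params)
```

Note that papermill itself may convert values (for example numeric strings) before injecting them; this sketch keeps every value as a string.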
%% Cell type:markdown id: tags:
# Extract the Geographic Info
Use the Harvard [country_centroids.csv](https://worldmap.harvard.edu/data/geonode:country_centroids_az8) data to extract the geographic info we need for the visualizations.
%% Cell type:code id: tags:
``` python
import pandas as pd
import os
```
%% Cell type:code id: tags:
``` python
ts_folder = "../data/covid-19_jhu-csse/"
worldmap_path = "../data/worldmap/country_centroids.csv"
out_folder = None
PAPERMILL_OUTPUT_PATH = None
```
%% Cell type:markdown id: tags:parameters
## Read in JHU CSSE data
%% Cell type:code id: tags:injected-parameters
``` python
# Parameters
# Some names are assigned more than once; the last assignment takes effect.
PAPERMILL_INPUT_PATH = "/tmp/dixae90v/notebooks/CompileGeoData.ipynb"
PAPERMILL_INPUT_PATH = "notebooks/process/CompileGeoData.ipynb"
PAPERMILL_OUTPUT_PATH = "runs/CompileGeoData.run.ipynb"
ts_folder = "/tmp/dixae90v/data/covid-19_jhu-csse"
worldmap_path = "/tmp/dixae90v/data/worldmap/country_centroids.csv"
out_folder = "data/geodata"
ts_folder = "./data/covid-19_jhu-csse/"
worldmap_path = "./data/worldmap/country_centroids.csv"
out_folder = "./data/geodata/"
```
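Papermill places the injected cell directly after the cell tagged `parameters` (here the markdown cell above it), so it runs after the defaults cell and rebinds the same names. Within the injected cell, the `/tmp/dixae90v/...` values assigned first are overridden by the command-line values that follow. The sketch below replays the two cells with plain `exec` in one shared namespace to show which values end up in effect; it does not call papermill, and the `PAPERMILL_INPUT_PATH` lines are omitted for brevity.

```python
# Defaults from the notebook's own parameter cell.
defaults_src = """
ts_folder = "../data/covid-19_jhu-csse/"
worldmap_path = "../data/worldmap/country_centroids.csv"
out_folder = None
PAPERMILL_OUTPUT_PATH = None
"""

# The injected cell as recorded above, minus the PAPERMILL_INPUT_PATH lines.
injected_src = """
PAPERMILL_OUTPUT_PATH = "runs/CompileGeoData.run.ipynb"
ts_folder = "/tmp/dixae90v/data/covid-19_jhu-csse"
worldmap_path = "/tmp/dixae90v/data/worldmap/country_centroids.csv"
out_folder = "data/geodata"
ts_folder = "./data/covid-19_jhu-csse/"
worldmap_path = "./data/worldmap/country_centroids.csv"
out_folder = "./data/geodata/"
"""

# Run both cells in notebook order in one namespace, like a single kernel session.
namespace = {}
exec(defaults_src, namespace)
exec(injected_src, namespace)

for key in ("ts_folder", "worldmap_path", "out_folder", "PAPERMILL_OUTPUT_PATH"):
    print(key, "=", namespace[key])
```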
%% Cell type:code id: tags:
``` python
def read_jhu_covid_region_df(name):
    """Load a JHU CSSE time series and sum its rows per country/region."""
    filename = os.path.join(ts_folder, f"time_series_19-covid-{name}.csv")
    df = pd.read_csv(filename)
    df = df.set_index(['Country/Region', 'Province/State', 'Lat', 'Long'])
    df.columns = pd.to_datetime(df.columns)
    region_df = df.groupby(level='Country/Region').sum()
    return region_df
```
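To see the shape `read_jhu_covid_region_df` produces, here is the same sequence of steps applied to a few made-up rows in the JHU CSSE wide layout (one row per province or country, one column per day). `read_region_df` and `SAMPLE_CSV` are hypothetical stand-ins that read from a file-like object instead of `ts_folder`; the explicit `format="%m/%d/%y"` is an assumption about the JHU date headers, whereas the notebook lets pandas infer the format.

```python
import io

import pandas as pd

# Made-up sample rows in the JHU CSSE wide layout.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
New South Wales,Australia,-33.9,151.2,0,1
Victoria,Australia,-37.8,144.9,0,2
,Italy,43.0,12.0,0,5
"""


def read_region_df(source):
    """Same steps as read_jhu_covid_region_df, but reading from any file-like."""
    df = pd.read_csv(source)
    # Move the descriptive columns into the index so only day columns remain.
    df = df.set_index(['Country/Region', 'Province/State', 'Lat', 'Long'])
    # Turn the '1/22/20'-style headers into Timestamps (date format assumed).
    df.columns = pd.to_datetime(df.columns, format="%m/%d/%y")
    # Add up all provinces of a country, giving one row per country.
    return df.groupby(level='Country/Region').sum()


region_df = read_region_df(io.StringIO(SAMPLE_CSV))
print(region_df)
```

The two Australian provinces collapse into a single `Australia` row, and the index keeps only the country name, which is what the later matching against `name_jhu` relies on.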
%% Cell type:code id: tags:
``` python
confirmed_df = read_jhu_covid_region_df("Confirmed")
```
%% Cell type:markdown id: tags:
# Read in Harvard country centroids
%% Cell type:code id: tags:
``` python
country_centroids_df = pd.read_csv(worldmap_path)
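# Keep only the columns used downstream. .copy() makes the selection an
# independent DataFrame, so adding name_jhu below cannot raise SettingWithCopyWarning.
country_centroids_df = country_centroids_df[[
    'name', 'name_long', 'region_un', 'subregion', 'region_wb',
    'pop_est', 'gdp_md_est', 'income_grp', 'Longitude', 'Latitude',
]].copy()
country_centroids_df['name_jhu'] = country_centroids_df['name_long']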
```
%% Cell type:code id: tags:
``` python
country_centroids_df.columns
```
%% Output
Index(['name', 'name_long', 'region_un', 'subregion', 'region_wb', 'pop_est',
       'gdp_md_est', 'income_grp', 'Longitude', 'Latitude', 'name_jhu'],
      dtype='object')
%% Cell type:markdown id: tags:
Fix the country names that differ between the JHU CSSE and Harvard data.
%% Cell type:code id: tags:
``` python
region_hd_jhu_map = {
'Brunei Darussalam': 'Brunei',
"Côte d'Ivoire": "Cote d'Ivoire",
'Czech Republic': 'Czechia',
'Hong Kong': 'Hong Kong SAR',
'Republic of Korea': 'Korea, South',
'Macao': 'Macao SAR',
'Russian Federation': 'Russia',
'Taiwan': 'Taiwan*',
'United States': 'US'
}
country_centroids_df['name_jhu'] = country_centroids_df['name_jhu'].replace(region_hd_jhu_map)
```
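`Series.replace` with a dict swaps only values that match a key exactly and leaves every other name untouched, which suits a mapping that covers just the handful of mismatched countries. The small sketch below uses a subset of `region_hd_jhu_map` on made-up input and contrasts it with `Series.map`, which turns unmapped values into NaN.

```python
import pandas as pd

# Two Harvard long names that need renaming, plus one that already matches JHU.
names = pd.Series(['United States', 'Russian Federation', 'France'])

# Subset of region_hd_jhu_map from the cell above.
region_hd_jhu_map = {
    'Russian Federation': 'Russia',
    'United States': 'US',
}

# replace: exact-match substitution; 'France' passes through unchanged.
replaced = names.replace(region_hd_jhu_map)
print(replaced.tolist())  # ['US', 'Russia', 'France']

# map: any value missing from the dict becomes NaN instead.
mapped = names.map(region_hd_jhu_map)
print(mapped.isna().tolist())  # [False, False, True]
```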
%% Cell type:code id: tags:
``` python
# Use this to find the name in the series
# country_centroids_df[country_centroids_df['name'].str.contains('Macao')]
```
%% Cell type:markdown id: tags:
Some JHU regions cannot be matched to a Harvard centroid; we simply ignore them.
%% Cell type:code id: tags:
``` python
confirmed_df.loc[
    ~confirmed_df.index.isin(country_centroids_df['name_jhu'])
].iloc[:, -2:]
```
%% Output
                       2020-03-16  2020-03-17
Country/Region
Congo (Brazzaville)             1           1
Congo (Kinshasa)                2           3
Cruise Ship                   696         696
Eswatini                        1           1
Holy See                        1           1
Martinique                     15          16
North Macedonia                18          26
Republic of the Congo           1           1
The Bahamas                     1           1
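The output above lists JHU regions with no matching Harvard centroid. Below is a toy version of that check (the case counts and coordinates are made up), written with `~` instead of `== False`, alongside an alternative that is not what the notebook does: a left `DataFrame.merge` with `indicator=True`, whose `_merge` column marks rows found only on the left side.

```python
import pandas as pd

# Toy stand-ins for confirmed_df (indexed by JHU name) and the centroid table.
confirmed = pd.DataFrame(
    {'cases': [696, 100, 15]},
    index=pd.Index(['Cruise Ship', 'US', 'Martinique'], name='Country/Region'),
)
centroids = pd.DataFrame({'name_jhu': ['US', 'France'], 'Latitude': [39.5, 46.6]})

# Same test as the notebook: keep rows whose index is NOT among the centroid names.
unmatched = confirmed.loc[~confirmed.index.isin(centroids['name_jhu'])]

# Alternative: a left merge with indicator=True records where each row came from.
merged = confirmed.reset_index().merge(
    centroids, how='left', left_on='Country/Region', right_on='name_jhu',
    indicator=True,
)
left_only = merged.loc[merged['_merge'] == 'left_only', 'Country/Region']

print(sorted(unmatched.index))  # ['Cruise Ship', 'Martinique']
print(sorted(left_only))        # ['Cruise Ship', 'Martinique']
```

The merge variant is longer, but the same merged frame also carries the centroid columns for the matched rows, which can be convenient when the join itself is the next step.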
%% Cell type:markdown id: tags:
# Save the result
%% Cell type:code id: tags:
``` python
# Only write the CSV when running under papermill (PAPERMILL_OUTPUT_PATH is set).
if PAPERMILL_OUTPUT_PATH:
    out_path = os.path.join(out_folder, "geo_data.csv")
    country_centroids_df.to_csv(out_path)
```
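The save step writes only when `PAPERMILL_OUTPUT_PATH` is set, so running the notebook interactively, where the default is `None`, writes nothing. A minimal sketch of the same guard as a function follows; `save_geo_data` is a hypothetical name, and creating the output folder with `os.makedirs(..., exist_ok=True)` is an addition that the notebook does not perform.

```python
import os
import tempfile

import pandas as pd


def save_geo_data(df, out_folder, output_path):
    """Hypothetical helper mirroring the notebook's save cell.

    Writes geo_data.csv only when output_path is set, creating out_folder
    if needed. Returns the written path, or None when nothing was written.
    """
    if not output_path:
        return None
    os.makedirs(out_folder, exist_ok=True)
    out_path = os.path.join(out_folder, "geo_data.csv")
    df.to_csv(out_path)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    demo = pd.DataFrame({'name_jhu': ['US'], 'Latitude': [39.5]})  # made-up row
    target = os.path.join(tmp, "geodata")
    skipped = save_geo_data(demo, target, None)              # interactive run
    written = save_geo_data(demo, target, "runs/out.ipynb")  # papermill-style run
    written_exists = os.path.exists(written)
```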