Commit 71ee1b3f authored by Chandrasekhar Ramakrishnan, committed by renku 0.9.1

renku update --with-siblings

parent 052d4070
class: Workflow
cwlVersion: v1.0
hints: []
inputs:
  input_1:
    default: ts_folder
    streamable: false
    type: string
  input_10:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19_jhu-csse
    streamable: false
    type: Directory
  input_11:
    default: rates_folder
    streamable: false
    type: string
  input_12:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19_rates
    streamable: false
    type: Directory
  input_13:
    default: geodata_path
    streamable: false
    type: string
  input_14:
    default:
      class: File
      path: ../../data/geodata/geo_data.csv
    streamable: false
    type: File
  input_15:
    default:
      class: File
      path: ../../notebooks/Dashboard.ipynb
    streamable: false
    type: File
  input_16:
    default: runs/Dashboard.run.ipynb
    streamable: false
    type: string
  input_2:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19_jhu-csse
    streamable: false
    type: Directory
  input_3:
    default: worldmap_path
    streamable: false
    type: string
  input_4:
    default:
      class: File
      path: ../../data/worldmap/country_centroids.csv
    streamable: false
    type: File
  input_5:
    default: out_folder
    streamable: false
    type: string
  input_6:
    default: data/geodata
    streamable: false
    type: string
  input_7:
    default:
      class: File
      path: ../../notebooks/CompileGeoData.ipynb
    streamable: false
    type: File
  input_8:
    default: runs/CompileGeoData.run.ipynb
    streamable: false
    type: string
  input_9:
    default: ts_folder
    streamable: false
    type: string
outputs:
  output_0:
    outputSource: step_2/output_0
    streamable: false
    type: File
  output_1:
    outputSource: step_1/output_1
    streamable: false
    type: Directory
  output_2:
    outputSource: step_1/output_0
    streamable: false
    type: File
requirements: []
steps:
  step_1:
    in:
      input_1: input_1
      input_2: input_2
      input_3: input_3
      input_4: input_4
      input_5: input_5
      input_6: input_6
      input_7: input_7
      input_8: input_8
    out:
      - output_0
      - output_1
    run: 73781f74e51d4e54bc522007a2030ec2_papermill.cwl
  step_2:
    in:
      input_1: input_9
      input_2: input_10
      input_3: input_11
      input_4: input_12
      input_5: input_13
      input_6: input_14
      input_7: input_15
      input_8: input_16
    out:
      - output_0
    run: 4cc7ffe9d5a045efb048ef2222a40ffa_papermill.cwl
Source diff could not be displayed: it is stored in LFS.
%% Cell type:markdown id: tags:
# Extract the Geographic Info
Use the Harvard [country_centroids.csv](https://worldmap.harvard.edu/data/geonode:country_centroids_az8) data to extract the geographic info we need for the visualizations.
%% Cell type:code id: tags:
``` python
import pandas as pd
import os
```
%% Cell type:code id: tags:
``` python
ts_folder = "../data/covid-19_jhu-csse/"
worldmap_path = "../data/worldmap/country_centroids.csv"
out_folder = None
PAPERMILL_OUTPUT_PATH = None
```
%% Cell type:markdown id: tags:parameters
## Read in JHU CSSE data
%% Cell type:code id: tags:injected-parameters
``` python
# Parameters
-PAPERMILL_INPUT_PATH = "/tmp/262jiqqx/notebooks/CompileGeoData.ipynb"
+PAPERMILL_INPUT_PATH = "/tmp/q9ufv19r/notebooks/CompileGeoData.ipynb"
PAPERMILL_OUTPUT_PATH = "runs/CompileGeoData.run.ipynb"
-ts_folder = "/tmp/262jiqqx/data/covid-19_jhu-csse"
+ts_folder = "/tmp/q9ufv19r/data/covid-19_jhu-csse"
-worldmap_path = "/tmp/262jiqqx/data/worldmap/country_centroids.csv"
+worldmap_path = "/tmp/q9ufv19r/data/worldmap/country_centroids.csv"
out_folder = "data/geodata"
```
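The injected-parameters cell takes effect simply because notebook cells run top to bottom: the values that papermill injects are assigned after the defaults and therefore win. The toy sketch below illustrates only that ordering with plain `exec`; it is not how papermill is implemented, and the variable names `defaults_cell`, `injected_cell` and `namespace` are made up for the illustration.

``` python
# Toy illustration only: two "cells" executed in order in one shared namespace.
defaults_cell = (
    'out_folder = None\n'
    'PAPERMILL_OUTPUT_PATH = None\n'
)
injected_cell = (
    'out_folder = "data/geodata"\n'
    'PAPERMILL_OUTPUT_PATH = "runs/CompileGeoData.run.ipynb"\n'
)

namespace = {}
for cell_source in (defaults_cell, injected_cell):
    # Later assignments overwrite earlier ones, as in a notebook run.
    exec(cell_source, namespace)

print(namespace["out_folder"])
print(namespace["PAPERMILL_OUTPUT_PATH"])
```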
%% Cell type:code id: tags:
``` python
def read_jhu_covid_region_df(name):
    # Load one JHU CSSE time series ("Confirmed", "Deaths", ...) from ts_folder.
    filename = os.path.join(ts_folder, f"time_series_19-covid-{name}.csv")
    df = pd.read_csv(filename)
    # Everything that is not an identifier column is a date column.
    df = df.set_index(['Country/Region', 'Province/State', 'Lat', 'Long'])
    df.columns = pd.to_datetime(df.columns)
    # Collapse provinces/states into one row per country/region.
    region_df = df.groupby(level='Country/Region').sum()
    return region_df
```
%% Cell type:code id: tags:
``` python
confirmed_df = read_jhu_covid_region_df("Confirmed")
```
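To make the aggregation in `read_jhu_covid_region_df` concrete, here is a minimal, dependency-free sketch of what summing by `Country/Region` amounts to. The notebook does this with pandas; the sample rows, their values, and the helper name `sum_by_country` are invented for illustration.

``` python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the JHU time-series layout (values are invented).
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,3/13/20,3/14/20
Hubei,China,30.9,112.2,10,12
Beijing,China,40.1,116.4,3,4
,Italy,43.0,12.0,7,9
"""

def sum_by_country(text):
    """Sum every date column over all provinces of each country."""
    reader = csv.DictReader(io.StringIO(text))
    id_columns = {"Province/State", "Country/Region", "Lat", "Long"}
    date_columns = [c for c in reader.fieldnames if c not in id_columns]
    # One fresh {date: 0} dict per country seen.
    totals = defaultdict(lambda: dict.fromkeys(date_columns, 0))
    for row in reader:
        for col in date_columns:
            totals[row["Country/Region"]][col] += int(row[col])
    return dict(totals)

totals = sum_by_country(SAMPLE_CSV)
print(totals["China"])
```

The two sample rows for China collapse into a single entry, which is the shape the later comparison against the centroid names relies on.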
%% Cell type:markdown id: tags:
# Read in Harvard country centroids
%% Cell type:code id: tags:
``` python
country_centroids_df = pd.read_csv(worldmap_path)
country_centroids_df = country_centroids_df[['name', 'name_long', 'region_un', 'subregion', 'region_wb', 'pop_est', 'gdp_md_est', 'income_grp', 'Longitude', 'Latitude']]
country_centroids_df['name_jhu'] = country_centroids_df['name_long']
```
%% Cell type:code id: tags:
``` python
country_centroids_df.columns
```
%% Output
Index(['name', 'name_long', 'region_un', 'subregion', 'region_wb', 'pop_est',
       'gdp_md_est', 'income_grp', 'Longitude', 'Latitude', 'name_jhu'],
      dtype='object')
%% Cell type:markdown id: tags:
Fix names that differ between JHU CSSE and Harvard data.
%% Cell type:code id: tags:
``` python
region_hd_jhu_map = {
    'Brunei Darussalam': 'Brunei',
    "Côte d'Ivoire": "Cote d'Ivoire",
    'Czech Republic': 'Czechia',
    'Hong Kong': 'Hong Kong SAR',
    'Republic of Korea': 'Korea, South',
    'Macao': 'Macao SAR',
    'Russian Federation': 'Russia',
    'Taiwan': 'Taiwan*',
    'United States': 'US'
}
country_centroids_df['name_jhu'] = country_centroids_df['name_jhu'].replace(region_hd_jhu_map)
```
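Passing a dict to `Series.replace` swaps whole values that match a key exactly and leaves every other value untouched. A plain-Python sketch of the same lookup, using a subset of the notebook's mapping (the helper name `to_jhu_name` and the sample list are hypothetical):

``` python
# Subset of the notebook's mapping (Harvard name_long -> JHU CSSE name).
region_hd_jhu_map = {
    "Czech Republic": "Czechia",
    "Republic of Korea": "Korea, South",
    "United States": "US",
}

def to_jhu_name(name_long):
    """Return the JHU spelling if one is mapped, else the name unchanged."""
    return region_hd_jhu_map.get(name_long, name_long)

harvard_names = ["France", "Czech Republic", "United States"]
jhu_names = [to_jhu_name(n) for n in harvard_names]
print(jhu_names)
```

Names that already agree between the two sources, such as France here, need no entry in the map.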
%% Cell type:code id: tags:
``` python
# Use this to find the name in the series
# country_centroids_df[country_centroids_df['name'].str.contains('Macao')]
```
%% Cell type:markdown id: tags:
There are some regions that we cannot resolve, but we will just ignore these.
%% Cell type:code id: tags:
``` python
# JHU regions with no matching name_jhu in the centroid data (last two dates only)
confirmed_df.loc[
    ~confirmed_df.index.isin(country_centroids_df['name_jhu'])
].iloc[:, -2:]
```
%% Output %% Output
2020-03-13 2020-03-14 2020-03-14 2020-03-15
Country/Region Country/Region
Congo (Brazzaville) 0 1
Congo (Kinshasa) 2 2 Congo (Kinshasa) 2 2
Cruise Ship 696 696 Cruise Ship 696 696
Curacao 0 1 Curacao 1 1
Eswatini 0 1 Eswatini 1 1
French Guiana 5 5 Guadeloupe 1 3
Guadeloupe 1 1
Holy See 1 1 Holy See 1 1
Martinique 3 9 Martinique 9 9
North Macedonia 14 14 North Macedonia 14 14
Reunion 5 6 Reunion 6 7
occupied Palestinian territory 0 0 occupied Palestinian territory 0 0
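The check that produces this list is an anti-join: keep the JHU regions whose name does not appear among the `name_jhu` values. A small sketch with plain sets, where the helper name `unmatched_regions` and both sample lists are assumptions made up for the example rather than values from the real data:

``` python
def unmatched_regions(jhu_regions, centroid_names):
    """Return the JHU regions that have no entry in the centroid names."""
    known = set(centroid_names)
    return sorted(r for r in jhu_regions if r not in known)

# Invented samples: two regions have no counterpart in the centroid names.
jhu_regions = ["US", "Holy See", "Cruise Ship", "Czechia"]
centroid_names = ["US", "Czechia", "Vatican"]

missing = unmatched_regions(jhu_regions, centroid_names)
print(missing)
```

Regions that turn up here can either be added to the name map or, as the notebook does, be ignored.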
%% Cell type:markdown id: tags:
# Save the result
%% Cell type:code id: tags:
``` python
# Only write the file when the notebook is executed through papermill.
if PAPERMILL_OUTPUT_PATH:
    out_path = os.path.join(out_folder, "geo_data.csv")
    country_centroids_df.to_csv(out_path)
```
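The save step assumes that `out_folder` has been set and that the folder already exists; `os.path.join(None, ...)` raises a `TypeError` otherwise. A slightly more defensive variant might look like the sketch below. It uses the standard `csv` module instead of pandas so that it runs on its own, and the function name `save_geo_data` and the sample rows are hypothetical.

``` python
import csv
import os
import tempfile

def save_geo_data(rows, out_folder, filename="geo_data.csv"):
    """Write rows to out_folder/filename; do nothing when out_folder is None."""
    if out_folder is None:
        return None
    # Create the target folder if it is missing.
    os.makedirs(out_folder, exist_ok=True)
    out_path = os.path.join(out_folder, filename)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return out_path

rows = [["name", "name_jhu"], ["United States", "US"]]
with tempfile.TemporaryDirectory() as tmp:
    written = save_geo_data(rows, os.path.join(tmp, "data", "geodata"))
    exists_after_write = os.path.exists(written)
skipped = save_geo_data(rows, None)
print(exists_after_write, skipped)
```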