Commit 2281ce1f authored by CR (covid cron), committed by renku 0.9.1

renku update --with-siblings

parent e6d18d59
class: Workflow
cwlVersion: v1.0
hints: []
inputs:
  input_1:
    default: ts_folder
    streamable: false
    type: string
  input_2:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19_jhu-csse
    streamable: false
    type: Directory
  input_3:
    default: rates_folder
    streamable: false
    type: string
  input_4:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19_rates
    streamable: false
    type: Directory
  input_5:
    default: geodata_path
    streamable: false
    type: string
  input_6:
    default:
      class: File
      path: ../../data/geodata/geo_data.csv
    streamable: false
    type: File
  input_7:
    default:
      class: File
      path: ../../notebooks/Dashboard.ipynb
    streamable: false
    type: File
  input_8:
    default: runs/Dashboard.run.ipynb
    streamable: false
    type: string
outputs:
  output_0:
    outputSource: step_1/output_0
    streamable: false
    type: File
requirements: []
steps:
  step_1:
    in:
      input_1: input_1
      input_2: input_2
      input_3: input_3
      input_4: input_4
      input_5: input_5
      input_6: input_6
      input_7: input_7
      input_8: input_8
    out:
    - output_0
    run: 5ae9a9961e194e7795df04a9722452e8_papermill.cwl
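
The defaults above suggest a pattern in the inputs. They seem to alternate between a notebook parameter name (`input_1`, `input_3`, `input_5`) and the value bound to it (`input_2`, `input_4`, `input_6`). These pairs are followed by the source notebook (`input_7`) and the output path (`input_8`).

The sketch below assumes that pairing and turns the flat list into the parameter dictionary papermill would inject. The pairing is inferred from this file, not from the tool definition in `5ae9a9961e194e7795df04a9722452e8_papermill.cwl`. `pair_parameters` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def pair_parameters(values: List[str]) -> Tuple[Dict[str, str], str, str]:
    """Split a flat list of CWL input defaults into papermill parameters.

    Assumption (inferred from the workflow above, not from renku or papermill):
    the list holds name/value pairs followed by the input and output notebooks.
    """
    if len(values) < 2 or (len(values) - 2) % 2:
        raise ValueError("expected name/value pairs plus input and output paths")
    *pairs, notebook_in, notebook_out = values
    # Even positions are parameter names, odd positions are their values
    params = dict(zip(pairs[0::2], pairs[1::2]))
    return params, notebook_in, notebook_out


inputs = [
    "ts_folder", "../../data/covid-19_jhu-csse",
    "rates_folder", "../../data/covid-19_rates",
    "geodata_path", "../../data/geodata/geo_data.csv",
    "../../notebooks/Dashboard.ipynb",
    "runs/Dashboard.run.ipynb",
]
params, nb_in, nb_out = pair_parameters(inputs)
print(params)
```

If papermill is installed, the three results line up with `papermill.execute_notebook(nb_in, nb_out, parameters=params)`. Renku itself may assemble the call differently.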
%% Cell type:code id: tags:

``` python
import pandas as pd
import numpy as np
import os
from IPython.display import display, HTML, Markdown
import covid_19_dashboard as helper
```
   
%% Cell type:code id: tags:parameters

``` python
ts_folder = "../data/covid-19_jhu-csse/"
rates_folder = "../data/covid-19_rates/"
geodata_path = "../data/geodata/geo_data.csv"
out_folder = None
PAPERMILL_OUTPUT_PATH = None
```
   
%% Cell type:code id: tags:injected-parameters

``` diff
 # Parameters
-PAPERMILL_INPUT_PATH = "/tmp/t7uqe0kc/notebooks/Dashboard.ipynb"
+PAPERMILL_INPUT_PATH = "/tmp/aphx1zdn/notebooks/Dashboard.ipynb"
 PAPERMILL_OUTPUT_PATH = "runs/Dashboard.run.ipynb"
-ts_folder = "/tmp/t7uqe0kc/data/covid-19_jhu-csse"
-rates_folder = "/tmp/t7uqe0kc/data/covid-19_rates"
-geodata_path = "/tmp/t7uqe0kc/data/geodata/geo_data.csv"
+ts_folder = "/tmp/aphx1zdn/data/covid-19_jhu-csse"
+rates_folder = "/tmp/aphx1zdn/data/covid-19_rates"
+geodata_path = "/tmp/aphx1zdn/data/geodata/geo_data.csv"
```
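
papermill adds the `injected-parameters` cell directly after the cell tagged `parameters`. When the cells run in order, the injected values therefore overwrite the notebook's defaults.

The minimal sketch below shows that effect by executing two cell sources in one shared namespace. The injected cell here deliberately omits `geodata_path`, so you can see that an untouched default survives.

```python
# Defaults from the cell tagged "parameters"
defaults_cell = '''
ts_folder = "../data/covid-19_jhu-csse/"
rates_folder = "../data/covid-19_rates/"
geodata_path = "../data/geodata/geo_data.csv"
'''

# Values papermill would write into the "injected-parameters" cell
# (geodata_path left out on purpose for this illustration)
injected_cell = '''
ts_folder = "/tmp/aphx1zdn/data/covid-19_jhu-csse"
rates_folder = "/tmp/aphx1zdn/data/covid-19_rates"
'''

namespace = {}
# Cells run top to bottom in one namespace, so the later assignment wins
for source in (defaults_cell, injected_cell):
    exec(source, namespace)

print(namespace["ts_folder"])
print(namespace["geodata_path"])
```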
   
%% Cell type:code id: tags:

``` python
# Read in the data
```
   
%% Cell type:code id: tags:

``` python
jhu_frames_map = helper.read_jhu_frames_map(ts_folder)
rates_frames_map = helper.read_rates_frames_map(rates_folder)
geodata_df = helper.read_geodata(geodata_path)

# Identify countries with 100 or more cases
countries_over_thresh = helper.countries_with_number_of_cases(jhu_frames_map, 'confirmed', 100)
# Filter out some countries with very high case/population ratio
countries_over_thresh = [c for c in countries_over_thresh if c not in {'San Marino', 'Iceland'}]
```
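
`helper.countries_with_number_of_cases` is not shown in this commit. The later `groupby(level='Country/Region')` and `iloc[:,-1].name.strftime(...)` calls suggest the JHU frames carry a `Country/Region` index level and one column per date. Assuming that layout, a hypothetical equivalent could sum provinces into countries and compare each country's latest cumulative count with the threshold.

This sketch takes a single frame rather than the frame map. The numbers are made up.

```python
import pandas as pd


def countries_over_threshold(frame: pd.DataFrame, threshold: int) -> list:
    """Hypothetical stand-in for helper.countries_with_number_of_cases.

    Assumes `frame` has a 'Country/Region' index level and one column per date.
    """
    # Aggregate provinces/states into one row per country
    per_country = frame.groupby(level="Country/Region").sum()
    # Cumulative counts: the last date column holds each country's latest total
    latest = per_country.iloc[:, -1]
    return sorted(latest[latest >= threshold].index.tolist())


index = pd.MultiIndex.from_tuples(
    [("Hubei", "China"), ("Beijing", "China"), ("", "Italy"), ("", "Iceland")],
    names=["Province/State", "Country/Region"],
)
dates = pd.to_datetime(["2020-03-01", "2020-03-02"])
# Illustrative counts only
confirmed = pd.DataFrame([[60, 80], [20, 30], [90, 150], [10, 40]], index=index, columns=dates)

over_thresh = countries_over_threshold(confirmed, 100)
print(over_thresh)
```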
   
%% Cell type:markdown id: tags:

# Questions About COVID-19 and Its Spread

Understanding the spread, distribution, and deadliness of COVID-19 is difficult, despite the data available about it. Differences in testing rates, data quality, demographics, and other factors make it hard to compare data between countries.

All this needs to be considered when looking at the plots below. Despite those caveats, I found it helpful to plot the raw data, even though direct comparisons between countries may be inaccurate.
   
%% Cell type:code id: tags:

``` python
data_ts = jhu_frames_map['confirmed'].iloc[:,-1].name.strftime("%b %d %Y")
display(HTML(f"<em>Data up to {data_ts}; countries with 100 or more confirmed cases.</em>"))
```

%% Output
   
   
%% Cell type:markdown id: tags:

## How are cases per 100,000 distributed geographically?
   
%% Cell type:code id: tags:

``` python
import altair as alt
```
   
%% Cell type:code id: tags:

``` python
map_df = helper.compute_map_df(rates_frames_map, jhu_frames_map, geodata_df, countries_over_thresh)
```
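
The `Confirmed/100k` column used throughout is presumably a count scaled by population, with World Bank totals according to the data-source note. `helper.compute_map_df` is not part of this commit, so the following only sketches the arithmetic: count × 100,000 / population.

`per_100k` is a hypothetical function, and the figures are illustrative. Countries without a population value are dropped by the inner join instead of yielding NaN.

```python
import pandas as pd


def per_100k(counts: pd.Series, population: pd.Series) -> pd.Series:
    """Scale absolute counts (indexed by country) to counts per 100,000 inhabitants."""
    # Inner join keeps only countries present in both series
    joined = pd.concat({"count": counts, "population": population}, axis=1, join="inner")
    return joined["count"] * 100_000 / joined["population"]


# Illustrative numbers only, not real case or population figures
counts = pd.Series({"Country A": 5_000, "Country B": 900, "Country C": 10})
population = pd.Series({"Country A": 10_000_000, "Country B": 600_000})
rates = per_100k(counts, population)
print(rates)
```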
   
%% Cell type:code id: tags:

``` python
display(helper.map_of_variable(map_df, 'Confirmed/100k', 'Confirmed'))
display(HTML('''
<p style="font-size: smaller">Data Sources:
<a href="https://github.com/CSSEGISandData/COVID-19">JHU CSSE</a>,
<a href="https://data.worldbank.org/indicator/SP.POP.TOTL">World Bank</a>,
<a href="https://worldmap.harvard.edu/data/geonode:country_centroids_az8">Harvard Worldmap</a>
</p>'''))
```

%% Output
   
   
   
%% Cell type:code id: tags:

``` python
bars = alt.Chart(map_df).mark_bar().encode(
    x='Confirmed/100k:Q',
    y=alt.Y("Country/Region:N", sort='-x'),
    tooltip=["Country/Region:N",
             "Confirmed:Q", "Deaths:Q", "Recovered:Q",
             "Confirmed/100k:Q", "Deaths/100k:Q", "Recovered/100k:Q"]
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text=alt.Text('Confirmed/100k:Q', format=".3")
)

chart = (bars + text).properties(height=900, title="Confirmed cases per 100k inhabitants")
display(chart)
display(HTML('''
<p style="font-size: smaller">Data Sources:
<a href="https://github.com/CSSEGISandData/COVID-19">JHU CSSE</a>,
<a href="https://data.worldbank.org/indicator/SP.POP.TOTL">World Bank</a>,
<a href="https://worldmap.harvard.edu/data/geonode:country_centroids_az8">Harvard Worldmap</a>
</p>'''))
```

%% Output
   
   
   
%% Cell type:markdown id: tags:

## How have cases been growing?
   
%% Cell type:code id: tags:

``` python
confirmed_rate_df = helper.growth_df(rates_frames_map, geodata_df, 'confirmed', countries_over_thresh, 2)
latest_confirmed_ser = confirmed_rate_df.set_index(
    ['Country/Region', 'Geo Region', 'Date']).drop(
    ['Longitude', 'Latitude'], axis=1).unstack().iloc[:,-1]
sort_order = latest_confirmed_ser.groupby('Geo Region').mean().sort_values(ascending=False).index.tolist()
```
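
The cell above does three things to the long-format growth table:

1. It pivots the table so that each date becomes a column.
2. It keeps the last (most recent) column.
3. It orders the geographic regions by the mean of those latest values.

The same steps on a small made-up table are shown below. This version selects the `Confirmed/100k` column before unstacking, so the result is a plain Series.

```python
import pandas as pd

# Illustrative long-format data: one row per country and date
df = pd.DataFrame({
    "Country/Region": ["A", "A", "B", "B", "C", "C"],
    "Geo Region": ["North", "North", "North", "North", "South", "South"],
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02"] * 3),
    "Confirmed/100k": [1.0, 4.0, 2.0, 6.0, 3.0, 8.0],
})

# Pivot dates into columns, then keep only the most recent date per country
latest = (
    df.set_index(["Country/Region", "Geo Region", "Date"])["Confirmed/100k"]
    .unstack("Date")
    .iloc[:, -1]
)
# Order regions by their mean latest rate, highest first
sort_order = (
    latest.groupby(level="Geo Region").mean()
    .sort_values(ascending=False)
    .index.tolist()
)
print(sort_order)
```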
   
%% Cell type:code id: tags:

``` python
base = alt.Chart(confirmed_rate_df).properties(
    width=300, height=200, title="Countries with 2 or more cases per 100k")
line = base.mark_line().encode(
    x='Date',
    y='Confirmed/100k',
    color='Country/Region',
    facet=alt.Facet('Geo Region:N', columns=1, sort=alt.SortArray(sort_order), title='Geographic Region'),
    tooltip=["Country/Region:N", "Date:T", "Confirmed/100k:Q"]
)
display(line)
display(HTML('''
<p style="font-size: smaller">Data Sources:
<a href="https://github.com/CSSEGISandData/COVID-19">JHU CSSE</a>,
<a href="https://data.worldbank.org/indicator/SP.POP.TOTL">World Bank</a>,
<a href="https://worldmap.harvard.edu/data/geonode:country_centroids_az8">Harvard Worldmap</a>
</p>'''))
```

%% Output
   
   
   
%% Cell type:code id: tags:

``` python
def country_increase_df(c, df_nominal, growth_in_rate_df):
    # Re-index one country so day 0 is its first day with at least 100 confirmed cases
    over_100 = df_nominal[df_nominal['Confirmed'] >= 100]
    tdf = (over_100[['Date', 'Confirmed']] - over_100.iloc[0][['Date', 'Confirmed']]).reset_index()
    # Matching per-100k rows for the same country from day 0 onward
    tdfr = growth_in_rate_df[(growth_in_rate_df['Date'] >= over_100.iloc[0]['Date']) &
                             (growth_in_rate_df['Country/Region'] == c)].reset_index()
    tdf['Confirmed/100k'] = tdfr['Confirmed/100k']
    tdf['Country/Region'] = c
    # Convert the timedelta offsets into whole days
    tdf['Days'] = (tdf['Date'] / np.timedelta64(1, 'D')).astype(int)
    return tdf[['Country/Region', 'Days', 'Confirmed', 'Confirmed/100k']]


growth_in_rate_df = helper.growth_df(rates_frames_map, geodata_df, 'confirmed', countries_over_thresh, 0)
frame_map = {'confirmed': jhu_frames_map['confirmed'].groupby(level='Country/Region').sum()}
growth_in_value_df = helper.growth_df(frame_map, geodata_df, 'confirmed', countries_over_thresh, 1000)
growth_in_value_df = growth_in_value_df.rename({'Confirmed/100k':'Confirmed'}, axis=1)
increase_df = pd.concat([country_increase_df(c, df_nominal, growth_in_rate_df) for
                         c, df_nominal in growth_in_value_df.groupby('Country/Region')])
```
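
`country_increase_df` re-indexes each country so that day 0 is the first day with at least 100 confirmed cases. It then measures both elapsed days and confirmed cases relative to that day.

Stripped of pandas, the alignment amounts to the following. `align_from_threshold` is a hypothetical name, and the dates and counts are illustrative.

```python
from datetime import date
from typing import List, Tuple


def align_from_threshold(series: List[Tuple[date, int]],
                         threshold: int = 100) -> List[Tuple[int, int]]:
    """Re-index a cumulative case series to (days since case `threshold`, increase)."""
    # Drop days before the cumulative count reaches the threshold
    above = [(d, n) for d, n in series if n >= threshold]
    if not above:
        return []
    start_day, start_count = above[0]
    # Days and cases are both measured relative to day 0
    return [((d - start_day).days, n - start_count) for d, n in above]


series = [(date(2020, 3, 1), 40), (date(2020, 3, 2), 120), (date(2020, 3, 4), 300)]
aligned = align_from_threshold(series)
print(aligned)
```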
   
%% Cell type:code id: tags:

``` python
def facetted_growth_plot(df, variable, sort_order, ref_country, title, yscale='linear'):
    base = alt.Chart(df).properties(
        width=250, height=150)
    line = base.mark_line().encode(
        x='Days',
        y=alt.Y(variable, scale=alt.Scale(type=yscale)),
        color='Country/Region',
        tooltip=["Country/Region:N", "Days:Q", f"{variable}:Q"]
    )
    # Label the reference country at its second-to-last point (use df, not the global increase_df)
    label_loc = df[df['Country/Region'] == ref_country]['Days'].iloc[-2]
    ref = base.mark_line(opacity=0.3).encode(
        x='Days',
        y=alt.Y(variable, scale=alt.Scale(type=yscale)),
        color=alt.ColorValue('steelblue'),
    ).transform_filter(f"datum['Country/Region'] == '{ref_country}'")
    ref += ref.mark_text().encode(text='Country/Region:N').transform_filter(f"datum['Days'] == {label_loc}")
    charts = []
    # make our small multiples
    for country in sort_order:
        smallm = line.transform_filter(f"datum['Country/Region'] == '{country}'").properties(
            title=country)
        smallm += ref
        charts.append(smallm)

    # group the small multiples into rows of 3 horizontal charts
    groups = []
    c = None
    for i, chart in enumerate(charts):
        if not i % 3:
            if c is not None:
                groups.append(c)
            c = alt.hconcat()
        c |= chart
    # keep the last, possibly partial, row as well
    if c is not None:
        groups.append(c)
    # vertically combine the horizontal charts
    chart = alt.vconcat(title=title)
    for c in groups:
        chart &= c
    return chart
```
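
The second loop in `facetted_growth_plot` arranges the small multiples into rows of three. The same chunking can be written as a slice comprehension, which by construction keeps a final, shorter row. `chunk` is a hypothetical helper, not part of `covid_19_dashboard`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], size: int = 3) -> List[List[T]]:
    """Split items into consecutive rows of at most `size` elements.

    The last row may be shorter; it is kept rather than dropped.
    """
    if size < 1:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


rows = chunk(["Italy", "Spain", "Germany", "France", "Iran"])
print(rows)
```

With Altair, the rows could then be assembled as `alt.vconcat(*[alt.hconcat(*row) for row in chunk(charts)], title=title)`. This avoids tracking the current row by hand.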
   
%% Cell type:code id: tags:

``` python
sort_order = growth_in_value_df.groupby(
    'Country/Region').max().sort_values(
    'Confirmed', ascending=False).index.tolist()
# Exclude China in this plot because its numbers are far greater than everywhere else
sort_order = [o for o in sort_order if o != 'China']
chart = facetted_growth_plot(increase_df[increase_df['Country/Region'] != 'China'],
                             'Confirmed',
                             sort_order,
                             'Italy',
                             "Growth of cases from case 100, compared to Italy")
display(chart)
display(HTML('''
<p style="font-size: smaller">Data Sources:
<a href="https://github.com/CSSEGISandData/COVID-19">JHU CSSE</a>,
<a href="https://data.worldbank.org/indicator/SP.POP.TOTL">World Bank</a>,
<a href="https://worldmap.harvard.edu/data/geonode:country_centroids_az8">Harvard Worldmap</a>
</p>
<p style="font-size: smaller">Inspired by <a href="https://covid19dashboards.com/growth-analysis/">Thomas Wiecki</a></p>'''))
```

%% Output
   
   
   
%% Cell type:code id: tags:

``` python
sort_order = growth_in_value_df.groupby(
    'Country/Region').max().sort_values(
    'Confirmed', ascending=False).index.tolist()
chart = facetted_growth_plot(increase_df,
                             'Confirmed/100k',
                             sort_order,
                             'Italy',
                             "Growth of cases/100k from case 100, compared to Italy")
display(chart)
display(HTML('''
<p style="font-size: smaller">Data Sources:
<a href="https://github.com/CSSEGISandData/COVID-19">JHU CSSE</a>,
<a href="https://data.worldbank.org/indicator/SP.POP.TOTL">World Bank</a>,
<a href="https://worldmap.harvard.edu/data/geonode:country_centroids_az8">Harvard Worldmap</a>
</p>'''))
```

%% Output
   
   
   
%% Cell type:code id: tags:

``` python
# Same with log scale
sort_order = growth_in_value_df.groupby(
    'Country/Region').max().sort_values(
    'Confirmed', ascending=False).index.tolist()
chart = facetted_growth_plot(increase_df,
                             'Confirmed/100k',
                             sort_order,
                             'Italy',
                             "Growth of cases/100k from case 100, compared to Italy (log scale)",
                             'log')
display(chart)
display(HTML('''
<p style="font-size: smaller">Data Sources:
<a href="https://github.com/CSSEGISandData/COVID-19">JHU CSSE</a>,
<a href="https://data.worldbank.org/indicator/SP.POP.TOTL">World Bank</a>,
<a href="https://worldmap.harvard.edu/data/geonode:country_centroids_az8">Harvard Worldmap</a>
</p>'''))
```

%% Output
   
   