Commit 67cbdeaf authored by CI-bot's avatar CI-bot Committed by renku 0.10.4

renku update --with-siblings

parent 77f2290c
1 merge request!380Automatic update - auto-update_2020-11-23_08-07
Pipeline #112331 passed
class: Workflow
cwlVersion: v1.0
hints: []
inputs:
  input_1:
    default:
      class: File
      path: ../../notebooks/covidtracking.ipynb
    streamable: false
    type: File
  input_10:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19-ecdc
    streamable: false
    type: Directory
  input_11:
    default: runs/dataset_summary.run.ipynb
    streamable: false
    type: string
  input_12:
    default: atlas_path
    streamable: false
    type: string
  input_13:
    default:
      class: Directory
      listing: []
      path: ../../data/atlas
    streamable: false
    type: Directory
  input_14:
    default: us_path
    streamable: false
    type: string
  input_15:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19-us-nyt
    streamable: false
    type: Directory
  input_16:
    default: italy_path
    streamable: false
    type: string
  input_17:
    default:
      class: File
      path: ../../data/covid-19-italy/dpc-covid19-ita-regioni.csv
    streamable: false
    type: File
  input_18:
    default: spain_path
    streamable: false
    type: string
  input_19:
    default:
      class: File
      path: ../../notebooks/Dashboard.ipynb
    streamable: false
    type: File
  input_2:
    default: runs/covidtracking.run.ipynb
    streamable: false
    type: string
  input_20:
    default: runs/Dashboard.run.ipynb
    streamable: false
    type: string
  input_21:
    default: ts_folder
    streamable: false
    type: string
  input_22:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19_jhu-csse
    streamable: false
    type: Directory
  input_23:
    default: rates_folder
    streamable: false
    type: string
  input_24:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19_rates
    streamable: false
    type: Directory
  input_25:
    default: geodata_path
    streamable: false
    type: string
  input_26:
    default:
      class: File
      path: ../../data/geodata/geo_data.csv
    streamable: false
    type: File
  input_27:
    default: atlas_path
    streamable: false
    type: string
  input_28:
    default:
      class: Directory
      listing: []
      path: ../../data/atlas
    streamable: false
    type: Directory
  input_3:
    default: data_path
    streamable: false
    type: string
  input_4:
    default:
      class: Directory
      listing: []
      path: ../../data/covidtracking
    streamable: false
    type: Directory
  input_5:
    default: atlas_path
    streamable: false
    type: string
  input_6:
    default:
      class: Directory
      listing: []
      path: ../../data/atlas
    streamable: false
    type: Directory
  input_7:
    default:
      class: File
      path: ../../notebooks/datasets_summary.ipynb
    streamable: false
    type: File
  input_8:
    default:
      class: Directory
      listing: []
      path: ../../data/covid-19-spain
    streamable: false
    type: Directory
  input_9:
    default: ecdc_path
    streamable: false
    type: string
outputs:
  output_0:
    outputSource: step_1/output_0
    streamable: false
    type: File
  output_1:
    outputSource: step_3/output_0
    streamable: false
    type: File
  output_2:
    outputSource: step_2/output_0
    streamable: false
    type: File
requirements: []
steps:
  step_1:
    in:
      input_1: input_1
      input_2: input_2
      input_3: input_3
      input_4: input_4
      input_5: input_5
      input_6: input_6
    out:
    - output_0
    run: 2fba4568d8784fb99872b6c8a35f66b9_papermill.cwl
  step_2:
    in:
      input_1: input_7
      input_10: input_8
      input_11: input_9
      input_12: input_10
      input_2: input_11
      input_3: input_12
      input_4: input_13
      input_5: input_14
      input_6: input_15
      input_7: input_16
      input_8: input_17
      input_9: input_18
    out:
    - output_0
    run: e0c6511bd8234efe8a19405f1145e990_papermill.cwl
  step_3:
    in:
      input_10: input_19
      input_11: input_20
      input_2: input_21
      input_3: input_22
      input_4: input_23
      input_5: input_24
      input_6: input_25
      input_7: input_26
      input_8: input_27
      input_9: input_28
    out:
    - output_0
    run: edeae7a3f9bd41579941a5f9b0eaf2aa_papermill.cwl
%% Cell type:code id: tags:
 
``` python
from pathlib import Path
 
import pandas as pd
import altair as alt
from IPython.display import display, HTML
 
from covid_19_utils.converters import CaseConverter
```
 
%% Cell type:code id: tags:
 
``` python
html_credits=HTML('''
<p style="font-size: smaller">Data Sources:
<a href="https://covidtracking.com">The COVID Tracking Project</a>
<br>
Analysis and Visualization:
<a href="https://renkulab.io/projects/covid-19/covid-19-public-data">Covid-19 Public Data Collaboration Project</a>
</p>''')
```
 
%% Cell type:code id: tags:parameters
 
``` python
data_path = '../data/covidtracking'
atlas_path = '../data/atlas'
```
 
%% Cell type:code id: tags:injected-parameters
 
``` python
# Parameters
data_path = "/tmp/hc8z45_m/data/covidtracking"
atlas_path = "/tmp/hc8z45_m/data/atlas"
data_path = "/tmp/4vzuj7ki/data/covidtracking"
atlas_path = "/tmp/4vzuj7ki/data/atlas"
```
 
%% Cell type:code id: tags:
 
``` python
# read in the data
converter = CaseConverter(atlas_path)
data_df = converter.read_convert(data_path)
 
# referring to "state" will make more sense in this notebook
data_df = data_df.rename(columns={"region_label": "state"})
```
 
%% Cell type:code id: tags:
 
``` python
# Compute daily differences
tdf = data_df.sort_values(['state', 'date'], ascending=[True, False]).set_index(['state', 'date'])
diffs_df = tdf[['positive', 'deceased', 'positive_100k', 'deceased_100k']].groupby(level='state').diff(periods=-1).dropna(how='all')
tdf_diff=tdf.join(diffs_df, rsuffix='_diff').reset_index()
 
# Scale the total tests down by a factor of 10 (plotted later as "Total/10")
tdf_diff['total_10'] = tdf_diff['tested']/10.
 
# Daily totals
daily_totals = tdf_diff.groupby('date').sum()
daily_totals.reset_index(level=0, inplace=True)
 
# National daily totals
nation_df = data_df.groupby('date').sum()
nation_df['state']='All US'
nation_df = nation_df.reset_index()
```
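
The daily-difference step above can be illustrated on a small example. This is a minimal sketch on made-up numbers (the `toy_*` names and values are hypothetical, not project data); it assumes only that rows are sorted newest-first within each state, as in the cell above, so that `diff(periods=-1)` subtracts the previous day's cumulative value.

```python
import pandas as pd

# Hypothetical toy data: cumulative positives for two states over three days.
toy_df = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2020-11-20", "2020-11-21", "2020-11-22"] * 2),
    "positive": [100, 130, 170, 10, 10, 25],
})

# Same ordering as the notebook: newest date first within each state.
toy_tdf = toy_df.sort_values(
    ["state", "date"], ascending=[True, False]
).set_index(["state", "date"])

# With descending dates, diff(periods=-1) computes "this day minus the day before".
toy_diffs = (
    toy_tdf[["positive"]].groupby(level="state").diff(periods=-1).dropna(how="all")
)

# The earliest date of each state has no previous day, so its diff is NaN after the join.
toy_daily = toy_tdf.join(toy_diffs, rsuffix="_diff").reset_index()
print(toy_daily)
```

Grouping by state before differencing keeps the last row of one state from being subtracted from the first row of the next.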
 
%% Cell type:markdown id: tags:
 
# Covid-19 Cases in U.S.
 
The case data from the U.S. is obtained from https://covidtracking.com, a public crowd-sourced covid-19 dataset.
 
%% Cell type:markdown id: tags:
 
### Growth trends
 
%% Cell type:code id: tags:
 
``` python
# make dataframe for text labels on chart - hand edit these label locations
textLabels_df = pd.DataFrame(
[[10,6000,'doubles every day'],
[36,50000,'doubles every 3 days'],
[34,100, 'doubles every week']],
columns =['labelX', 'labelY','labelText']
)
 
startCase = 2000
 
# keep only rows where a state has at least startCase deaths
deceased10_df = data_df.loc[data_df['deceased']>=startCase]

# sort by state and then by increasing date
deceased10_df = deceased10_df.sort_values(by=['state','date'])

# add the national ("All US") rows to that dataframe
nationdeceased10_df = nation_df.loc[nation_df['deceased']>=startCase]
deceased10_df= pd.concat ([deceased10_df,nationdeceased10_df])

deceased10_df = deceased10_df.reset_index()

# make a list of the states with at least startCase deaths
state_list = list(set(deceased10_df['state']))

# add a column for the number of days since each state reached startCase deaths
for state, df in deceased10_df.groupby('state'):
    deceased10_df.loc[df.index,'sinceDay0'] = range(0, len(df))
deceased10_df = deceased10_df.astype({'sinceDay0': 'int32'})

# Now create plot lines for each state since it reached startCase deaths
lineChart = alt.Chart(deceased10_df,title=f'US States: Cumulative Deaths Since {startCase}th Death').mark_line(interpolate='basis').encode(
alt.X('sinceDay0:Q', axis=alt.Axis(title=f'Days Since {startCase}th Death')),
alt.Y('deceased:Q',
axis = alt.Axis(title='Cumulative Deaths'),
scale=alt.Scale(type='log')),
tooltip=['state', 'sinceDay0', 'deceased', 'positive'],
color = 'state'
).properties(width=800,height=400)
 
## Create a layer with the lines for doubling every day and doubling every week
 
# Compute theoretical trends of doubling every day, 3 days, week
days = {'day':[1,2,3,4,5,10,15,20, max(deceased10_df.sinceDay0)+5]}
logRuleDay_df = pd.DataFrame(days, columns=['day'])
logRuleDay_df['case']= startCase * pow(2,logRuleDay_df['day'])
logRuleDay_df['doubling period']='every day'
 
logRule3Days_df = pd.DataFrame(days, columns=['day'])
logRule3Days_df['case']= startCase * pow(2,(logRule3Days_df['day'])/3)
logRule3Days_df['doubling period']='three days'
 
logRuleWeek_df = pd.DataFrame(days, columns=['day'])
logRuleWeek_df['case']= startCase * pow(2,(logRuleWeek_df['day'])/7)
logRuleWeek_df['doubling period']='every week'
 
logRules_df = pd.concat([logRuleDay_df, logRule3Days_df, logRuleWeek_df])
logRules_df = logRules_df.reset_index()
 
 
ruleChart = alt.Chart(logRules_df).mark_line(opacity=0.2,clip=True).encode(
alt.X('day:Q',
scale=alt.Scale(domain=[1,max(deceased10_df.sinceDay0)+5])),
alt.Y('case', scale=alt.Scale(type='log',domain=[startCase,150000]),
),
color = 'doubling period',
tooltip = ['doubling period'])
 
# create a layer for the state labels
# 1) make dataframe with each state's max days
# 2) make a chart layer with text of state name to right of each state's rightmost point
stateLabels_df = deceased10_df[deceased10_df['sinceDay0'] == deceased10_df.groupby(['state'])['sinceDay0'].transform(max)]
labelChart = alt.Chart(stateLabels_df).mark_text(align='left', baseline='middle', dx=10).encode(
x='sinceDay0',
y='deceased',
text='state',
color='state')
 
#now put the text labels layer on top of state labels Chart
labelChart = labelChart + alt.Chart(textLabels_df).mark_text(align='right', baseline='bottom', dx=0, size=18,opacity=0.5).encode(
x='labelX',
y='labelY',
text='labelText')
 
 
## Create some tooltip behavior - show Y values on mouseover
# Step 1: Selection that chooses nearest point based on value on x-axis
nearest = alt.selection(type='single', nearest=True, on='mouseover',
fields=['sinceDay0'])
 
# Step 2: Transparent selectors across the chart. This is what tells us
# the x-value of the cursor
selectors = alt.Chart().mark_point().encode(
x="sinceDay0:Q",
opacity=alt.value(0),
).add_selection(
nearest
)
 
# Step 3: Add text, show values in column when it's the nearest point to
# mouseover, else show blank
text = lineChart.mark_text(align='center', dx=3, dy=-20).encode(
text=alt.condition(nearest, 'deceased', alt.value(' '))
)
 
 
# Finally, let's show the chart!
 
chart = alt.layer(lineChart, selectors, text, data=deceased10_df)
 
display(chart)
display(html_credits)
```
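
The reference lines in the cell above follow the rule `case = startCase * 2 ** (day / period)`. The sketch below works through that formula in plain Python; the helper names `doubling_curve` and `doubling_time` are illustrative and do not exist in the notebook or in `covid_19_utils`.

```python
import math


def doubling_curve(start, days, period):
    """Cumulative count on each day if the count doubles every `period` days."""
    return [start * 2 ** (day / period) for day in days]


def doubling_time(count_then, count_now, days_elapsed):
    """Doubling period implied by growth between two cumulative counts."""
    return days_elapsed * math.log(2) / math.log(count_now / count_then)


start_case = 2000  # same threshold as startCase in the cell above
days = [0, 7, 14, 21]

# Doubling every week: 2000 -> 4000 -> 8000 -> 16000.
weekly = doubling_curve(start_case, days, period=7)
print(weekly)

# Going from 2000 to 8000 in 14 days is two doublings, i.e. one every 7 days.
print(doubling_time(2000, 8000, days_elapsed=14))
```

On a logarithmic y-axis these curves are straight lines, which is why they work as visual guides next to the per-state curves.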
 
%% Output
 
 
 
%% Cell type:code id: tags:
 
``` python
# make dataframe for text labels on chart - hand edit these label locations
textLabels_df = pd.DataFrame(
[[9,30000,'doubles every day'],
[28,31000,'doubles every 3 days'],
[32,1000, 'doubles every week']],
columns =['labelX', 'labelY','labelText']
)
 
startCase = 100000
 
# keep only rows with at least startCase positives
positive100_df = data_df.loc[data_df['positive']>=startCase]

## add the national ("All US") rows to that dataframe
nationpos100_df = nation_df.loc[nation_df['positive']>=startCase]
positive100_df= pd.concat ([positive100_df,nationpos100_df])

# sort by state and then by increasing date
positive100_df = positive100_df.sort_values(by=['state','date'])
positive100_df = positive100_df.reset_index()

# make a list of the states with at least startCase positives (don't really need this)
# state_list = list(set(positive100_df['state']))

# add a column for the number of days since each state reached startCase positives
for state, df in positive100_df.groupby('state'):
    positive100_df.loc[df.index,'sinceDay0'] = range(0, len(df))
positive100_df = positive100_df.astype({'sinceDay0': 'int32'})


# Now create plot lines for each state since it reached startCase positives
lineChart = alt.Chart(positive100_df, title=f"US States: total cases since {startCase}th case").mark_line(interpolate='basis').encode(
alt.X('sinceDay0:Q', axis=alt.Axis(title=f'Days since {startCase}th case')),
alt.Y('positive:Q',
axis = alt.Axis(title='Cumulative positive cases'),
scale=alt.Scale(type='log')),
tooltip=['state', 'sinceDay0', 'deceased', 'positive'],
color = 'state'
).properties(width=800,height=400)
 
## Create a layer with the lines for doubling every day and doubling every week
# make dataframe with lines to indicate doubling every day, 3 days, week
 
days = {'day':[1,2,3,4,5,10,15,20, max(positive100_df.sinceDay0)+5]}
 
logRuleDay_df = pd.DataFrame (days, columns=['day'])
logRuleDay_df['case']= startCase * pow(2,logRuleDay_df['day'])
logRuleDay_df['doubling period']='every day'
 
logRule3Days_df = pd.DataFrame (days, columns=['day'])
logRule3Days_df['case']= startCase * pow(2,(logRule3Days_df['day'])/3)
logRule3Days_df['doubling period']='three days'
 
logRuleWeek_df = pd.DataFrame (days, columns=['day'])
logRuleWeek_df['case']= startCase * pow(2,(logRuleWeek_df['day'])/7)
logRuleWeek_df['doubling period']='every week'
 
logRules_df = pd.concat([logRuleDay_df, logRule3Days_df, logRuleWeek_df])
logRules_df = logRules_df.reset_index()
 
ruleChart = alt.Chart(logRules_df).mark_line(opacity=0.2,clip=True).encode(
alt.X('day:Q',
scale=alt.Scale(domain=[1, max(positive100_df.sinceDay0)+5])),
alt.Y('case', scale=alt.Scale(domain=[startCase,2000000], type='log'),
),
color = 'doubling period')
 
# create a layer for the state labels
# 1) make dataframe with each state's max days
# 2) make a chart layer with text of state name to right of each state's rightmost point
stateLabels_df = positive100_df[positive100_df['sinceDay0'] == positive100_df.groupby(['state'])['sinceDay0'].transform(max)]
labelChart = alt.Chart(stateLabels_df).mark_text(align='left', baseline='middle', dx=10).encode(
x='sinceDay0',
y='positive',
text='state',
color='state')
 
#now put the text labels layer on top of state labels Chart
labelChart = labelChart + alt.Chart(textLabels_df).mark_text(align='right', baseline='bottom', dx=0, size=18,opacity=0.5).encode(
x='labelX',
y='labelY',
text='labelText')
 
#Create some tooltip behavior
# Step 1: Selection that chooses nearest point based on value on x-axis
nearest = alt.selection(type='single', nearest=True, on='mouseover',
fields=['sinceDay0'])
 
# Step 2: Transparent selectors across the chart. This is what tells us
# the x-value of the cursor
selectors = alt.Chart().mark_point().encode(
x="sinceDay0:Q",
opacity=alt.value(0),
).add_selection(
nearest
)
 
# Step 3: Add text, show values in the 'positive' column when it's the nearest point to
# mouseover, else show blank
text = lineChart.mark_text(align='center', dx=3, dy=-20).encode(
text=alt.condition(nearest, 'positive', alt.value(' '))
)
 
 
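# Finally, let's show the chart!

chart = alt.layer(lineChart, selectors, text, data=positive100_df)
#chart = alt.layer(lineChart, ruleChart, labelChart)
# NOTE: properties() returns a new chart and the result is discarded here,
# so the size set on lineChart above (width=800, height=400) still applies.
chart.properties (width=400,height=800)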
display(chart)
display(html_credits)
```
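
The `sinceDay0` column in the two cells above is filled with a loop over groups. As an alternative sketch (not what the notebook does), pandas `groupby(...).cumcount()` yields the same per-state day counter in a single step. The toy data and the `threshold` name below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: cumulative positives for two states.
toy_df = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2020-11-20", "2020-11-21", "2020-11-22", "2020-11-21", "2020-11-22"]
    ),
    "positive": [90, 120, 150, 100, 140],
})

threshold = 100  # plays the role of startCase

# Keep rows at or above the threshold, ordered by state and increasing date.
above = (
    toy_df.loc[toy_df["positive"] >= threshold]
    .sort_values(["state", "date"])
    .reset_index(drop=True)
)

# cumcount() numbers the rows within each state starting from 0,
# i.e. the number of days since that state first crossed the threshold.
above["sinceDay0"] = above.groupby("state").cumcount()
print(above)
```

Like the loop, this counts rows rather than calendar days, so it assumes one row per state per date with no missing days.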
 
%% Output
 
 
 
%% Cell type:markdown id: tags:
 
### Daily Cumulative Totals
 
Cumulative reported totals of positive cases and deaths.
 
%% Cell type:code id: tags:
 
``` python
base = alt.Chart(
daily_totals
).mark_bar(size=2).encode(
alt.X('date', axis=alt.Axis(title='')
)
).properties(
height=200,
width=400
)
 
cumulative = base.encode(alt.Y('positive', title = 'Cumulative cases'))
cumulative_deaths = base.encode(alt.Y('deceased', title = 'Cumulative deaths'))
rates = base.encode(alt.Y('positive_diff', title='Daily cases'))
rates_deaths = base.encode(alt.Y('deceased_diff', title='Daily deaths'))
chart = alt.vconcat(
cumulative | rates, cumulative_deaths | rates_deaths,
title='Cumulative Covid-19 cases and deaths in the U.S.'
).configure_title(
anchor='middle'
)
display(chart)
display(html_credits)
```
 
%% Output
 
 
 
%% Cell type:markdown id: tags:
 
### Total tests and positives per 100k population
 
%% Cell type:code id: tags:
 
``` python
most_recent_test_date = data_df['date'].max()
most_recent_df = data_df[data_df['date'] == most_recent_test_date]
print("Most recent test date", most_recent_test_date)
print(len(most_recent_df), "states/territories have data on this date.")
```
 
%% Output
 
Most recent test date 2020-11-22 00:00:00
50 states/territories have data on this date.
 
%% Cell type:code id: tags:
 
``` python
viz_df = most_recent_df.sort_values('tested_100k', ascending=False)
chart = alt.Chart(viz_df, title="Cases (orange points) and tests (blue bars) per 100k").encode(alt.X('state', sort=None))
tests = chart.mark_bar().encode(alt.Y('tested_100k', axis=alt.Axis(title='COVID-19 Tests/100k, Positive Cases/100k')))
positives = chart.mark_point(color='orange', filled=True, size=100, opacity=1).encode(alt.Y('positive_100k'))
display(alt.layer(tests, positives))
display(html_credits)
```
 
%% Output
 
 
 
%% Cell type:markdown id: tags:
 
## Counts and rates by state
 
A look at the three states with the most tests per 100k population. In each panel the bars show daily new positives, while the red and orange curves show total tests (divided by 10) and total positive tests, respectively.
 
%% Cell type:code id: tags:
 
``` python
# produce the charts for the three states with the most tests per 100k

charts=[]
for state in most_recent_df.sort_values('tested_100k', ascending=False)['state'].to_list()[:3]:
    state_df = tdf_diff[tdf_diff['state'] == state].copy()

    base = alt.Chart(state_df, title=state).encode(alt.X('date', axis=alt.Axis(title='Date'))).properties(width=250, height=150)
    dailies = base.mark_bar(size=6).encode(alt.Y('positive_diff', axis=alt.Axis(title='Daily positive')))

    totals = base.mark_line(color='red').encode(alt.Y('total_10', axis=alt.Axis(title='Total/10')))
    positives = totals.mark_line(color='orange').encode(alt.Y('positive', axis=alt.Axis(title='Positive')))
    cumulative = totals + positives

    # positive/total ratio line; defined but not added to the layered chart below
    ratio = base.mark_line(color='red').encode(alt.Y('ratio', axis=alt.Axis(title='Positive/Total'), scale=alt.Scale(domain=(0,1))))

    charts.append(alt.layer(dailies, cumulative).resolve_scale(y='independent'))
 
display(alt.hconcat(*charts))
display(html_credits)
```
 
%% Output
 
 
 
%% Cell type:code id: tags:
 
``` python
```