Commit 8f1b44e8 authored by Chandrasekhar Ramakrishnan

feat: display number of tests per 100k pop in US states

parent 003624d1
1 merge request: !25 us_per_100k
Pipeline #17678 passed
%% Cell type:code id: tags:
``` python
import pandas as pd
import altair as alt
```
%% Cell type:markdown id: tags:
# Look at the metadata
%% Cell type:code id: tags:
``` python
metadata_df = pd.read_json('../../data/covidtracking/states-metadata.json')
```
%% Cell type:code id: tags:
``` python
metadata_df.head()
```
%% Output
state covid19SiteOld \
0 AK http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-...
1 AL http://www.alabamapublichealth.gov/infectiousd...
2 AR https://www.healthy.arkansas.gov/programs-serv...
3 AZ https://www.azdhs.gov/preparedness/epidemiolog...
4 CA https://www.cdph.ca.gov/Programs/CID/DCDC/Page...
covid19Site covid19SiteSecondary \
0 http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-... NaN
1 https://alpublichealth.maps.arcgis.com/apps/op... NaN
2 https://www.healthy.arkansas.gov/programs-serv... NaN
3 https://www.azdhs.gov/preparedness/epidemiolog... NaN
4 https://www.latimes.com/projects/california-co... NaN
twitter pui pum \
0 @Alaska_DHSS All data False
1 @alpublichealth No data False
2 @adhpio All data True
3 @azdhs All data False
4 @CAPublicHealth Only positives False
notes name
0 Unclear if their reported number means "person... Alaska
1 Last negative count from 3/16. Alabama
2 Pending = "PUIs" Arkansas
3 Negative = “Ruled Out”. Our total is slightly ... Arizona
4 Only positives reported regularly. Add deaths ... California
%% Cell type:markdown id: tags:
# Look at the data
%% Cell type:code id: tags:
``` python
data_df = pd.read_json('../../data/covidtracking/states-daily.json')
data_df['date'] = pd.to_datetime(data_df['date'], format="%Y%m%d")
```
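%% Cell type:markdown id: tags:
The `format="%Y%m%d"` argument above implies that the source stores each date as a compact value such as `20200319`. The following is a minimal sketch of that parsing step on made-up values; that the raw column holds integers is an assumption, and `raw_dates` and `parsed` are hypothetical names. Casting to `str` first keeps the sketch independent of how a given pandas version treats integer input.
%% Cell type:code id: tags:
```python
import pandas as pd

# Made-up raw values in the compact YYYYMMDD layout (assumed to be integers)
raw_dates = pd.Series([20200319, 20200318])

# Cast to str so the format string is applied to text, then parse
parsed = pd.to_datetime(raw_dates.astype(str), format="%Y%m%d")
print(parsed)
```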
%% Cell type:code id: tags:
``` python
data_df.head()
```
%% Output
date state positive negative pending death total \
0 2020-03-19 AK 6 400.0 NaN NaN 406
1 2020-03-19 AL 68 28.0 NaN 0.0 96
2 2020-03-19 AR 46 310.0 113.0 NaN 469
3 2020-03-19 AS 0 NaN NaN 0.0 0
4 2020-03-19 AZ 44 175.0 130.0 0.0 349
dateChecked
0 2020-03-19T20:00:00Z
1 2020-03-19T20:00:00Z
2 2020-03-19T20:00:00Z
3 2020-03-19T20:00:00Z
4 2020-03-19T20:00:00Z
%% Cell type:markdown id: tags:
### Daily counts and totals
%% Cell type:code id: tags:
``` python
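# compute daily differences
# rows are sorted newest first within each state, so diff(periods=-1)
# subtracts the following row (the previous day) from each row
tdf = data_df.sort_values(['state', 'date'], ascending=[True, False]).set_index(['state', 'date'])
diffs_df = tdf[['positive', 'negative', 'death']].groupby(level='state').diff(periods=-1).dropna(how='all')
# the joined difference columns get a '_diff' suffix, e.g. 'positive_diff'
tdf_diff = tdf.join(diffs_df, rsuffix='_diff').reset_index()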
```
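%% Cell type:markdown id: tags:
To make the direction of the difference concrete, here is a small sketch of the same sort-then-`diff(periods=-1)` pattern on made-up cumulative counts. The states `AA` and `BB`, the numbers, and the `toy_*` names are all hypothetical. The oldest row of each state has no previous day, so its difference is `NaN`.
%% Cell type:code id: tags:
```python
import pandas as pd

# Made-up cumulative positive counts for two hypothetical states
toy_df = pd.DataFrame({
    'state': ['AA', 'AA', 'AA', 'BB', 'BB'],
    'date': pd.to_datetime(['2020-03-19', '2020-03-18', '2020-03-17',
                            '2020-03-19', '2020-03-18']),
    'positive': [10, 6, 1, 7, 4],
})

# Same pattern as above: newest row first within each state
toy_tdf = toy_df.sort_values(['state', 'date'], ascending=[True, False]).set_index(['state', 'date'])

# periods=-1 subtracts the next row (the previous day) from each row, per state
toy_diffs = toy_tdf[['positive']].groupby(level='state').diff(periods=-1)
daily_increase = toy_diffs['positive'].tolist()
print(toy_diffs)
```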
%% Cell type:code id: tags:
``` python
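# "Normalizing" the totals: scale the total test count down by a factor of 10
# so it can be drawn on the same axis as the positive counts
tdf_diff['total_10'] = tdf_diff['total'] / 10.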
```
%% Cell type:code id: tags:
``` python
# produce the charts for a few states
charts = []
for state in ['WA', 'CA', 'NY']:
    # work on a copy so the assignments below do not raise SettingWithCopyWarning
    state_df = tdf_diff[tdf_diff['state'] == state].copy()
    # rows are sorted newest first; reverse before diff() to get day-over-day increases
    state_df['daily_positive'] = state_df['positive'][::-1].diff()
    state_df['total_10'] = state_df['total'] / 10.
    base = alt.Chart(state_df, title=state).encode(alt.X('date', axis=alt.Axis(title='Date'))).properties(width=250, height=150)
    dailies = base.mark_bar(size=10).encode(alt.Y('daily_positive', axis=alt.Axis(title='Daily positive')))
    totals = base.mark_line(color='red').encode(alt.Y('total_10', axis=alt.Axis(title='Total/10')))
    positives = totals.mark_line(color='orange').encode(alt.Y('positive', axis=alt.Axis(title='Positive')))
    cumulative = totals + positives
    charts.append(alt.layer(dailies, cumulative).resolve_scale(y='independent'))
alt.hconcat(*charts)
```
%% Output
alt.HConcatChart(...)
%% Cell type:markdown id: tags:
### Counts per 100k
%% Cell type:code id: tags:
``` python
pop_df = pd.read_csv('../../data/geodata/us_pop_fung_2019.csv').set_index('ST')
```
%% Cell type:code id: tags:
``` python
most_recent_test_date = data_df['date'].max()
most_recent_df = data_df[data_df['date'] == most_recent_test_date].set_index('state')
print("Most recent test date", most_recent_test_date)
print(len(most_recent_df), "states/territories have data on this date.")
```
%% Output
Most recent test date 2020-03-19 00:00:00
56 states/territories have data on this date.
%% Cell type:code id: tags:
``` python
most_recent_df['total/100k'] = (most_recent_df['total'] / pop_df['Population']) * 100000
most_recent_df = most_recent_df.reset_index()
```
%% Cell type:code id: tags:
``` python
most_recent_df.head()
```
%% Output
state date positive negative pending death total \
0 AK 2020-03-19 6 400.0 NaN NaN 406
1 AL 2020-03-19 68 28.0 NaN 0.0 96
2 AR 2020-03-19 46 310.0 113.0 NaN 469
3 AS 2020-03-19 0 NaN NaN 0.0 0
4 AZ 2020-03-19 44 175.0 130.0 0.0 349
dateChecked total/100k
0 2020-03-19T20:00:00Z 55.498978
1 2020-03-19T20:00:00Z 1.957911
2 2020-03-19T20:00:00Z 15.540994
3 2020-03-19T20:00:00Z NaN
4 2020-03-19T20:00:00Z 4.794801
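%% Cell type:markdown id: tags:
The `NaN` for `AS` above is what index-aligned division produces when a state code has no matching row in the population table. Whether that is the actual cause for `AS` is an assumption, since the contents of the population file are not shown here. The sketch below reproduces the effect on made-up numbers; the codes `AA`, `BB`, `ZZ` and the names `tests` and `population` are hypothetical.
%% Cell type:code id: tags:
```python
import pandas as pd

# Made-up test totals, indexed by a hypothetical state code
tests = pd.DataFrame({'total': [400, 90, 0]},
                     index=pd.Index(['AA', 'BB', 'ZZ'], name='state'))

# Made-up populations; 'ZZ' is deliberately missing
population = pd.DataFrame({'Population': [800000, 4500000]},
                          index=pd.Index(['AA', 'BB'], name='ST'))

# Series division aligns on index labels; labels absent on one side yield NaN
tests['total/100k'] = (tests['total'] / population['Population']) * 100000

# List the codes for which no rate could be computed
missing = tests[tests['total/100k'].isna()].index.tolist()
print(tests)
print("No population entry for:", missing)
```

Printing `missing` before plotting is one way to notice territories that silently drop out of the per-100k chart.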
%% Cell type:code id: tags:
``` python
chart = alt.Chart(most_recent_df.sort_values('total/100k'), title="Tests per 100k")
chart.mark_bar().encode(alt.X('state', sort='y'), alt.Y('total/100k'))
```
%% Output
alt.Chart(...)