Commit ccc37edf authored by Chandrasekhar Ramakrishnan, committed by renku 0.9.1

renku run papermill -p out_folder ./data/covidtracking/ --inject-paths notebooks/process/download-covidtracking-data.ipynb runs/download-covidtracking-data.runs.ipynb
parent 56912200
arguments: []
baseCommand:
- papermill
class: CommandLineTool
cwlVersion: v1.0
hints: []
inputs:
  input_1:
    default: out_folder
    inputBinding:
      position: 1
      prefix: -p
      separate: true
      shellQuote: true
    streamable: false
    type: string
  input_2:
    default: data/covidtracking
    inputBinding:
      position: 2
      separate: true
      shellQuote: true
    streamable: false
    type: string
  input_3:
    default:
      class: File
      path: ../../notebooks/process/download-covidtracking-data.ipynb
    inputBinding:
      position: 3
      prefix: --inject-paths
      separate: true
      shellQuote: true
    streamable: false
    type: File
  input_4:
    default: runs/download-covidtracking-data.runs.ipynb
    inputBinding:
      position: 4
      separate: true
      shellQuote: true
    streamable: false
    type: string
outputs:
  output_0:
    outputBinding:
      glob: $(inputs.input_4)
    streamable: false
    type: File
  output_1:
    outputBinding:
      glob: $(inputs.input_2)
    streamable: false
    type: Directory
permanentFailCodes: []
requirements:
- class: InlineJavascriptRequirement
- class: InitialWorkDirRequirement
  listing:
  - entry: '$({"listing": [], "class": "Directory"})'
    entryname: runs
    writable: true
  - entry: '$({"listing": [], "class": "Directory"})'
    entryname: data/covidtracking
    writable: true
  - entry: $(inputs.input_3)
    entryname: notebooks/process/download-covidtracking-data.ipynb
    writable: false
successCodes: []
temporaryFailCodes: []
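The tool definition above stores the recorded command as inputs with a `position`, an optional `prefix` and a `default` value, plus output globs that point back at inputs through `$(inputs.input_N)`. As a rough illustration of how these pieces recombine into a command line, here is a simplified Python sketch. It is not how a CWL runner is implemented: it assumes every input takes its `default`, that `separate: true` holds everywhere, ignores `shellQuote` and any expression other than a plain `$(inputs.X)` reference, and uses the path where `InitialWorkDirRequirement` stages the notebook in place of the `File` default. `build_command` and `resolve_glob` are hypothetical names.

```python
import re

BASE_COMMAND = ["papermill"]

# Simplified view of the recorded inputs. For the File input (input_3) this
# sketch assumes the staged path from InitialWorkDirRequirement.
INPUTS = {
    "input_1": {"position": 1, "prefix": "-p", "default": "out_folder"},
    "input_2": {"position": 2, "prefix": None, "default": "data/covidtracking"},
    "input_3": {"position": 3, "prefix": "--inject-paths",
                "default": "notebooks/process/download-covidtracking-data.ipynb"},
    "input_4": {"position": 4, "prefix": None,
                "default": "runs/download-covidtracking-data.runs.ipynb"},
}


def build_command(base_command, inputs):
    """Order inputs by position; emit each prefix as its own argument
    (separate: true), followed by the input's value."""
    args = list(base_command)
    for _name, spec in sorted(inputs.items(), key=lambda item: item[1]["position"]):
        if spec["prefix"]:
            args.append(spec["prefix"])
        args.append(spec["default"])
    return args


def resolve_glob(expression, inputs):
    """Replace each $(inputs.input_N) reference with that input's value."""
    return re.sub(
        r"\$\(inputs\.(\w+)\)",
        lambda match: inputs[match.group(1)]["default"],
        expression,
    )


if __name__ == "__main__":
    print(" ".join(build_command(BASE_COMMAND, INPUTS)))
    print(resolve_glob("$(inputs.input_4)", INPUTS))  # output_0 (File)
    print(resolve_glob("$(inputs.input_2)", INPUTS))  # output_1 (Directory)
```

Apart from renku storing `./data/covidtracking/` as `data/covidtracking`, the assembled arguments line up with the `renku run` command in the commit message, and the two globs name the output notebook and the data folder.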
Source diffs of two further files could not be displayed: they are stored in LFS.
%% Cell type:code id: tags:
``` python
import requests
import os
import pandas as pd
```
%% Cell type:code id: tags:parameters
``` python
out_folder = "../data/covidtracking/"
PAPERMILL_OUTPUT_PATH = None
```
%% Cell type:code id: tags:injected-parameters
Before this commit:
``` python
# Parameters
PAPERMILL_INPUT_PATH = "/tmp/e18fw9c9/notebooks/process/download-covidtracking-data.ipynb"
PAPERMILL_OUTPUT_PATH = "runs/download-covidtracking-data.runs.ipynb"
out_folder = "data/covidtracking"
```
After this commit:
``` python
# Parameters
PAPERMILL_INPUT_PATH = "notebooks/process/download-covidtracking-data.ipynb"
PAPERMILL_OUTPUT_PATH = "runs/download-covidtracking-data.runs.ipynb"
out_folder = "./data/covidtracking/"
```
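papermill inserts the injected-parameters cell directly after the cell tagged `parameters`, so when the notebook runs top to bottom the injected assignments overwrite the defaults; `--inject-paths` is what adds `PAPERMILL_INPUT_PATH` and `PAPERMILL_OUTPUT_PATH` to that cell. The sketch below imitates the effect by rendering a dict of string parameters as Python assignments and executing it after the defaults. `render_injected_cell` and `effective_parameters` are hypothetical helpers for illustration, not part of papermill's API, and the rendered text only approximates what papermill writes.

```python
import json

# Default values, copied from the cell tagged "parameters".
DEFAULTS_CELL = 'out_folder = "../data/covidtracking/"\nPAPERMILL_OUTPUT_PATH = None\n'


def render_injected_cell(params):
    """Render string parameters as Python assignments (hypothetical helper).

    json.dumps yields double-quoted literals that are valid Python for plain
    strings; other value types are rejected to keep the sketch honest.
    """
    lines = ["# Parameters"]
    for name, value in params.items():
        if not isinstance(value, str):
            raise TypeError("this sketch only renders string parameters")
        lines.append(f"{name} = {json.dumps(value)}")
    return "\n".join(lines) + "\n"


def effective_parameters(defaults_cell, injected_cell):
    """Run the defaults cell, then the injected cell, in one namespace.

    Later assignments win, which is why the injected values take effect.
    """
    namespace = {}
    exec(defaults_cell, namespace)
    exec(injected_cell, namespace)
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


injected = render_injected_cell({
    "PAPERMILL_INPUT_PATH": "notebooks/process/download-covidtracking-data.ipynb",
    "PAPERMILL_OUTPUT_PATH": "runs/download-covidtracking-data.runs.ipynb",
    "out_folder": "./data/covidtracking/",
})
params = effective_parameters(DEFAULTS_CELL, injected)
print(params["out_folder"])             # ./data/covidtracking/
print(params["PAPERMILL_OUTPUT_PATH"])  # runs/download-covidtracking-data.runs.ipynb
```

Without injected paths, `PAPERMILL_OUTPUT_PATH` keeps its default `None`, which is the condition the notebook's save cells check.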
%% Cell type:markdown id: tags:
# Download state metadata
Download a dataset of URLs for data for each US state and several territories. See [Google Doc](https://docs.google.com/spreadsheets/d/18oVRrHj3c183mHmq3m89_163yuYltLNlOmPerQ18E8w/htmlview?sle=true).
%% Cell type:code id: tags:
``` python
url = 'http://covidtracking.com/api/states/info'
r = requests.get(url, allow_redirects=True)
states_metadata_json = r.content
```
%% Cell type:code id: tags:
``` python
# save the result
if PAPERMILL_OUTPUT_PATH:
    out_path = os.path.join(out_folder, 'states-metadata.json')
    with open(out_path, 'wb') as f:
        f.write(states_metadata_json)
```
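The save cell writes only when `PAPERMILL_OUTPUT_PATH` is truthy, which is the case when papermill runs the notebook with `--inject-paths`; opened interactively, the default `None` skips the write. The same guard, factored into a reusable function, could look like the sketch below. The name `save_if_papermill` is hypothetical, and the `os.makedirs` call is an addition not present in the notebook (under `renku run`, the `InitialWorkDirRequirement` listing appears to stage `data/covidtracking` already).

```python
import os


def save_if_papermill(content, out_folder, filename, papermill_output_path):
    """Write raw bytes to out_folder/filename only when running under papermill.

    Returns the path that was written, or None when the write was skipped.
    """
    if not papermill_output_path:
        return None  # interactive run: nothing is written
    os.makedirs(out_folder, exist_ok=True)  # addition: tolerate a missing folder
    out_path = os.path.join(out_folder, filename)
    with open(out_path, "wb") as f:
        f.write(content)
    return out_path


# Hypothetical call mirroring the cell above:
# save_if_papermill(states_metadata_json, out_folder, "states-metadata.json", PAPERMILL_OUTPUT_PATH)
```

The same function would also cover the `states-daily.json` save cell.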
%% Cell type:code id: tags:
``` python
metadata_df = pd.read_json(states_metadata_json)
print(len(metadata_df), "states and territories have metadata")
metadata_df.head(2)
```
%% Output
56 states and territories have metadata
   state                                     covid19SiteOld  \
0     AK  http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-...
1     AL  http://www.alabamapublichealth.gov/infectiousd...

                                         covid19Site  \
0  http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-...
1  https://alpublichealth.maps.arcgis.com/apps/op...

                                covid19SiteSecondary          twitter  \
0  http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-...     @Alaska_DHSS
1                                               None  @alpublichealth

Before this commit, the final block was:
        pui    pum                                              notes     name
0  All data  False  We count the reported number as "persons teste...   Alaska
1   No data  False  Last update time taken from [main page](http:/...  Alabama

After this commit, it is:
        pui    pum                                              notes  fips  \
0  All data  False  We count the reported number as "persons teste...     2
1   No data  False  Last update time taken from [main page](http:/...     1

      name
0   Alaska
1  Alabama
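`pd.read_json` turns the JSON array returned by the `states/info` endpoint into one row per record and one column per key. Since the notebook code is unchanged between the two runs, the `fips` column in the new output presumably reflects a key that the API started returning. The sketch below shows that record/column relationship with the standard library only; the payloads are abbreviated, hypothetical stand-ins holding a few values from the output above, not the real responses, and `summarize_records` is a hypothetical helper.

```python
import json

# Abbreviated, hypothetical stand-ins for the states/info response
# (only a few keys; values taken from the notebook output above).
OLD_PAYLOAD = b'[{"state": "AK", "name": "Alaska"}, {"state": "AL", "name": "Alabama"}]'
NEW_PAYLOAD = (
    b'[{"state": "AK", "fips": 2, "name": "Alaska"},'
    b' {"state": "AL", "fips": 1, "name": "Alabama"}]'
)


def summarize_records(payload):
    """Return (number of records, sorted union of keys) for a JSON array of
    flat objects: the row count and column set a DataFrame of it would have."""
    records = json.loads(payload)  # json.loads accepts bytes, like r.content
    keys = set()
    for record in records:
        keys.update(record)
    return len(records), sorted(keys)


old_count, old_keys = summarize_records(OLD_PAYLOAD)
new_count, new_keys = summarize_records(NEW_PAYLOAD)
print(sorted(set(new_keys) - set(old_keys)))  # ['fips']
```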
%% Cell type:markdown id: tags:
# Download daily state data
%% Cell type:code id: tags:
``` python
url = 'https://covidtracking.com/api/states/daily'
r = requests.get(url, allow_redirects=True)
states_daily_json = r.content
```
%% Cell type:code id: tags:
``` python
# save the result
if PAPERMILL_OUTPUT_PATH:
    out_path = os.path.join(out_folder, 'states-daily.json')
    with open(out_path, 'wb') as f:
        f.write(states_daily_json)
```
%% Cell type:code id: tags:
``` python
data_df = pd.read_json(states_daily_json)
print(len(data_df), "data points")
data_df.head(2)
```
%% Output
Before this commit:
1037 data points
       date  state  positive  negative  pending  hospitalized  death  total  \
0  20200324     AK      36.0     986.0      NaN           0.0    NaN   1022
1  20200324     AL     215.0    2106.0      NaN           NaN    0.0   2321

            dateChecked
0  2020-03-24T20:00:00Z
1  2020-03-24T20:00:00Z

After this commit:
1093 data points
       date  state  positive  negative  pending  hospitalized  death  total  \
0  20200325     AK      42.0    1649.0      NaN           1.0    1.0   1691
1  20200325     AL     283.0    2529.0      NaN           NaN    0.0   2812

            dateChecked  totalTestResults  deathIncrease  \
0  2020-03-25T20:00:00Z              1691            1.0
1  2020-03-25T20:00:00Z              2812            0.0

   hospitalizedIncrease  negativeIncrease  positiveIncrease  \
0                   1.0             663.0               6.0
1                   0.0             423.0              68.0

   totalTestResultsIncrease
0                     669.0
1                     491.0
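The new daily output adds `*Increase` columns. Comparing the two runs, the values shown are consistent with day-over-day differences of the cumulative columns between 20200324 and 20200325: for AK, `positive` goes from 36.0 to 42.0 and `positiveIncrease` is 6.0, and `totalTestResultsIncrease` is 669.0 = 1691 - 1022. The sketch below re-derives all five increase columns for AK and AL from the values shown. Treating `NaN` as 0 is an assumption that happens to reproduce the reported `deathIncrease` and `hospitalizedIncrease` here; the source does not document how the API computes these columns, and `daily_increases` is a hypothetical helper.

```python
import math

NAN = float("nan")  # stands in for the NaN cells in the outputs

# Cumulative values for 20200324 (old output) and 20200325 (new output).
DAILY = [
    {"date": 20200324, "state": "AK", "positive": 36.0, "negative": 986.0,
     "hospitalized": 0.0, "death": NAN, "total": 1022},
    {"date": 20200325, "state": "AK", "positive": 42.0, "negative": 1649.0,
     "hospitalized": 1.0, "death": 1.0, "total": 1691},
    {"date": 20200324, "state": "AL", "positive": 215.0, "negative": 2106.0,
     "hospitalized": NAN, "death": 0.0, "total": 2321},
    {"date": 20200325, "state": "AL", "positive": 283.0, "negative": 2529.0,
     "hospitalized": NAN, "death": 0.0, "total": 2812},
]

# Cumulative column -> increase column it is compared against. "total" is
# paired with totalTestResultsIncrease because total equals totalTestResults
# in the new output.
INCREASE_COLUMNS = {
    "positive": "positiveIncrease",
    "negative": "negativeIncrease",
    "hospitalized": "hospitalizedIncrease",
    "death": "deathIncrease",
    "total": "totalTestResultsIncrease",
}


def as_number(value):
    """Assumption for this sketch: a missing (NaN) value counts as 0."""
    return 0.0 if math.isnan(value) else float(value)


def daily_increases(records):
    """Diff consecutive cumulative values per state.

    YYYYMMDD integers sort chronologically, so no date parsing is needed.
    """
    by_state = {}
    for rec in records:
        by_state.setdefault(rec["state"], []).append(rec)
    result = {}
    for state, recs in by_state.items():
        recs = sorted(recs, key=lambda r: r["date"])
        for prev, cur in zip(recs, recs[1:]):
            result[(state, cur["date"])] = {
                inc: as_number(cur[col]) - as_number(prev[col])
                for col, inc in INCREASE_COLUMNS.items()
            }
    return result


for key, values in sorted(daily_increases(DAILY).items()):
    print(key, values)
```

For both states this reproduces the increase values listed in the new output.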