Commit bda5230b authored by CR (covid cron), committed by renku 0.10.0

renku rerun data/covidtracking/states-metadata.json data/covidtracking/states-daily.json

parent 3b8d2763
class: Workflow
cwlVersion: v1.0
hints: []
inputs:
  input_1:
    default: states-daily.json
    streamable: false
    type: string
  input_2:
    default: states-metadata.json
    streamable: false
    type: string
  input_3:
    default: out_folder
    streamable: false
    type: string
  input_4:
    default: data/covidtracking
    streamable: false
    type: string
  input_5:
    default:
      class: File
      path: ../../notebooks/process/download-covidtracking-data.ipynb
    streamable: false
    type: File
  input_6:
    default: runs/download-covidtracking-data.runs.ipynb
    streamable: false
    type: string
outputs:
  output_0:
    outputSource: step_3/output_0
    streamable: false
    type: File
  output_2:
    outputSource: step_3/output_1
    streamable: false
    type: Directory
requirements: []
steps:
  step_1:
    in:
      filename: input_1
      input_directory: step_3/output_1
    out:
    - output_file
    run:
      arguments: []
      baseCommand:
      - 'true'
      class: CommandLineTool
      cwlVersion: v1.0
      hints: []
      inputs:
        filename:
          default: states-daily.json
          streamable: false
          type: string
        input_directory:
          streamable: false
          type: Directory
      outputs:
        output_file:
          outputBinding:
            glob: $(inputs.filename)
          streamable: false
          type: File
      permanentFailCodes: []
      requirements:
      - &id001
        class: InlineJavascriptRequirement
      - &id002
        class: InitialWorkDirRequirement
        listing: $(inputs.input_directory.listing)
      successCodes: []
      temporaryFailCodes: []
  step_2:
    in:
      filename: input_2
      input_directory: step_3/output_1
    out:
    - output_file
    run:
      arguments: []
      baseCommand:
      - 'true'
      class: CommandLineTool
      cwlVersion: v1.0
      hints: []
      inputs:
        filename:
          default: states-metadata.json
          streamable: false
          type: string
        input_directory:
          streamable: false
          type: Directory
      outputs:
        output_file:
          outputBinding:
            glob: $(inputs.filename)
          streamable: false
          type: File
      permanentFailCodes: []
      requirements:
      - *id001
      - *id002
      successCodes: []
      temporaryFailCodes: []
  step_3:
    in:
      input_1: input_3
      input_2: input_4
      input_3: input_5
      input_4: input_6
    out:
    - output_1
    - output_0
    run: a17d560c41a54f5aa307ce5f3c5effe5_papermill.cwl
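In this workflow, `step_1` and `step_2` both take `step_3/output_1` (the output directory of the papermill step) as their `input_directory`, so `step_3` has to run first; the two `'true'` steps then only pick `states-daily.json` and `states-metadata.json` out of that directory via `glob: $(inputs.filename)`. A minimal sketch of how that run order follows from the `in` mappings, using the standard library's `graphlib`. The dictionary is transcribed by hand from the YAML above rather than parsed, and `execution_order` is a hypothetical helper, not part of renku or CWL.

```python
from graphlib import TopologicalSorter

# `in` mappings transcribed from the workflow above. A value is either a
# workflow input (e.g. "input_1") or "<step>/<output>" from another step.
STEP_INPUTS = {
    "step_1": {"filename": "input_1", "input_directory": "step_3/output_1"},
    "step_2": {"filename": "input_2", "input_directory": "step_3/output_1"},
    "step_3": {"input_1": "input_3", "input_2": "input_4",
               "input_3": "input_5", "input_4": "input_6"},
}


def execution_order(step_inputs):
    """Return one valid run order for the steps (hypothetical helper)."""
    graph = {}
    for step, sources in step_inputs.items():
        # A source containing "/" refers to another step's output,
        # so that step is a predecessor of this one.
        graph[step] = {src.split("/", 1)[0] for src in sources.values() if "/" in src}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(STEP_INPUTS)
print(order)
```

`step_1` and `step_2` do not depend on each other, so any order between them is valid once `step_3` has finished.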
Source diffs for two files could not be displayed: they are stored in LFS.
%% Cell type:code id: tags:
``` python
import requests
import os
import pandas as pd
```
%% Cell type:code id: tags:parameters
``` python
out_folder = "../data/covidtracking/"
PAPERMILL_OUTPUT_PATH = None
```
%% Cell type:code id: tags:injected-parameters
``` python
# Parameters
PAPERMILL_INPUT_PATH = "/tmp/anascddl/notebooks/process/download-covidtracking-data.ipynb"
PAPERMILL_INPUT_PATH = "/tmp/v_x6sgie/notebooks/process/download-covidtracking-data.ipynb"
PAPERMILL_OUTPUT_PATH = "runs/download-covidtracking-data.runs.ipynb"
out_folder = "data/covidtracking"
```
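Papermill inserts the cell tagged `injected-parameters` directly after the cell tagged `parameters`, so its assignments override the defaults. This is why `out_folder` ends up as `data/covidtracking` on this run and why the `if PAPERMILL_OUTPUT_PATH:` guards in the save cells only write files when the notebook is executed through papermill. A minimal sketch of that precedence, assuming nothing beyond plain `exec`: it runs the two cells' source (abridged from the notebook) in notebook order in one namespace, and does not call papermill itself.

```python
# Source of the two cells, abridged from the notebook above.
parameters_cell = '''
out_folder = "../data/covidtracking/"
PAPERMILL_OUTPUT_PATH = None
'''

injected_cell = '''
PAPERMILL_OUTPUT_PATH = "runs/download-covidtracking-data.runs.ipynb"
out_folder = "data/covidtracking"
'''


def run_cells(*cells):
    """Execute cell sources in order in one shared namespace, like a kernel."""
    namespace = {}
    for source in cells:
        exec(source, namespace)
    return namespace


interactive = run_cells(parameters_cell)                   # plain Jupyter run
papermill_run = run_cells(parameters_cell, injected_cell)  # papermill run

print(interactive["PAPERMILL_OUTPUT_PATH"], interactive["out_folder"])
print(papermill_run["PAPERMILL_OUTPUT_PATH"], papermill_run["out_folder"])
```

In an interactive session `PAPERMILL_OUTPUT_PATH` stays `None`, so the save cells are skipped and nothing under `../data/covidtracking/` is overwritten.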
%% Cell type:markdown id: tags:
# Download state metadata
Download metadata for each US state and several territories, including the URLs of their COVID-19 data sources. See the [Google Doc](https://docs.google.com/spreadsheets/d/18oVRrHj3c183mHmq3m89_163yuYltLNlOmPerQ18E8w/htmlview?sle=true).
%% Cell type:code id: tags:
``` python
url = 'https://covidtracking.com/api/states/info'
r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # stop on an HTTP error instead of saving an error page
states_metadata_json = r.content
```
%% Cell type:code id: tags:
``` python
# save the result
if PAPERMILL_OUTPUT_PATH:
    out_path = os.path.join(out_folder, 'states-metadata.json')
    with open(out_path, 'wb') as f:
        f.write(states_metadata_json)
```
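The save cell writes into `out_folder` without creating it, and writes the file in place. A sketch of a hypothetical `save_bytes` helper (not part of the notebook) that keeps the same guard but also creates the folder and writes to a temporary file that is then moved into place with `os.replace`, so an interrupted run is less likely to leave a half-written JSON file behind.

```python
import os
import tempfile


def save_bytes(content, out_folder, filename, enabled=True):
    """Write `content` to out_folder/filename via a temp file; return the path or None."""
    if not enabled:  # mirrors the notebook's `if PAPERMILL_OUTPUT_PATH:` guard
        return None
    os.makedirs(out_folder, exist_ok=True)
    # Write to a temporary file in the same folder, then move it into place.
    fd, tmp_path = tempfile.mkstemp(dir=out_folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        out_path = os.path.join(out_folder, filename)
        # os.replace is atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, out_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return out_path


# Usage: the target folder does not need to exist beforehand.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "data", "covidtracking")
    path = save_bytes(b'[{"state": "AK"}]', folder, "states-metadata.json")
    with open(path, "rb") as f:
        print(f.read())
```

In the notebook this would be called as `save_bytes(states_metadata_json, out_folder, 'states-metadata.json', enabled=bool(PAPERMILL_OUTPUT_PATH))`.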
%% Cell type:code id: tags:
``` python
metadata_df = pd.read_json(states_metadata_json)
print(len(metadata_df), "states and territories have metadata")
metadata_df.head(2)
```
%% Output
56 states and territories have metadata
   state                                     covid19SiteOld  \
0     AK  http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-...
1     AL  http://www.alabamapublichealth.gov/infectiousd...

                                         covid19Site  \
0  http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-...
1  https://alpublichealth.maps.arcgis.com/apps/op...

                                covid19SiteSecondary          twitter  \
0  http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-...     @Alaska_DHSS
1                                               None  @alpublichealth

        pui    pum                                              notes  fips  \
0  All data  False  Total tests are taken from the annotations on ...     2
1   No data  False  Negatives = (Totals - Positives) \nPositives o...     1

      name
0   Alaska
1  Alabama
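In the output above the `fips` values appear as `2` (Alaska) and `1` (Alabama), while state FIPS codes are conventionally written as two-digit strings (`02`, `01`). When joining this metadata with other datasets keyed by FIPS strings, the codes need zero-padding. A minimal sketch using a hypothetical `fips_code` helper and the two rows shown above; with pandas, `metadata_df["fips"].astype(str).str.zfill(2)` is an equivalent column-wise approach.

```python
def fips_code(value):
    """Format a state FIPS code (int or numeric string) as a two-digit string."""
    return f"{int(value):02d}"


# Values as shown in the metadata output above.
metadata_rows = [("AK", 2, "Alaska"), ("AL", 1, "Alabama")]
for state, fips, name in metadata_rows:
    print(state, fips_code(fips), name)
```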
%% Cell type:markdown id: tags:
# Download daily state data
%% Cell type:code id: tags:
``` python
url = 'https://covidtracking.com/api/states/daily'
r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # stop on an HTTP error instead of saving an error page
states_daily_json = r.content
```
%% Cell type:code id: tags:
``` python
# save the result
if PAPERMILL_OUTPUT_PATH:
    out_path = os.path.join(out_folder, 'states-daily.json')
    with open(out_path, 'wb') as f:
        f.write(states_daily_json)
```
%% Cell type:code id: tags:
``` python
data_df = pd.read_json(states_daily_json)
print(len(data_df), "data points")
data_df.head(2)
```
%% Output
Previous output (removed by this commit):
1373 data points
       date  state  positive  negative  pending  hospitalized  death  total  \
0  20200330     AK     114.0    3540.0      NaN           7.0    3.0   3654
1  20200330     AL     859.0    5694.0      NaN           NaN    6.0   6553

                                       hash           dateChecked  \
0  01a1c96fd2ed214d8747ab778c2fec7203c8cd2f  2020-03-30T20:00:00Z
1  1ced1dbd9879f8bbc4b1f7b7876b82611895d58e  2020-03-30T20:00:00Z

   totalTestResults  fips  deathIncrease  hospitalizedIncrease  \
0              3654     2            1.0                   1.0
1              6553     1            2.0                   0.0

   negativeIncrease  positiveIncrease  totalTestResultsIncrease
0             308.0              12.0                     320.0
1            1510.0              53.0                    1563.0

New output (added by this commit):
1429 data points
       date  state  positive  negative  pending  hospitalized  death  total  \
0  20200331     AK     119.0    3594.0      NaN           7.0    3.0   3713
1  20200331     AL     981.0    6298.0      NaN           NaN   13.0   7279

                                       hash           dateChecked  \
0  5339d73f78797a2174fe32109c081c80915f8378  2020-03-31T20:00:00Z
1  06fc31f03af09c0f09f4c5bc18fac9daa63e94f8  2020-03-31T20:00:00Z

   totalTestResults  fips  deathIncrease  hospitalizedIncrease  \
0              3713     2            0.0                   0.0
1              7279     1            7.0                   0.0

   negativeIncrease  positiveIncrease  totalTestResultsIncrease
0              54.0               5.0                      59.0
1             604.0             122.0                     726.0
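The previous and new rows above are one day apart (2020-03-30 and 2020-03-31), and their values are consistent with reading `totalTestResults` as `positive + negative` and each `*Increase` column as the day-over-day change for that state (the Alabama metadata note above likewise says `Negatives = (Totals - Positives)`). These relationships are inferred from these rows, not taken from the API's documentation. A minimal sketch that checks them on the values copied from the output and parses the integer `date` column (YYYYMMDD) into a calendar date.

```python
from datetime import datetime

# Rows copied from the outputs above (previous run, then this run).
previous = {
    "AK": {"date": 20200330, "positive": 114, "negative": 3540, "death": 3,
           "totalTestResults": 3654},
    "AL": {"date": 20200330, "positive": 859, "negative": 5694, "death": 6,
           "totalTestResults": 6553},
}
current = {
    "AK": {"date": 20200331, "positive": 119, "negative": 3594, "death": 3,
           "totalTestResults": 3713, "positiveIncrease": 5, "negativeIncrease": 54,
           "deathIncrease": 0, "totalTestResultsIncrease": 59},
    "AL": {"date": 20200331, "positive": 981, "negative": 6298, "death": 13,
           "totalTestResults": 7279, "positiveIncrease": 122, "negativeIncrease": 604,
           "deathIncrease": 7, "totalTestResultsIncrease": 726},
}


def parse_date(yyyymmdd):
    """Convert an integer such as 20200331 to a datetime.date."""
    return datetime.strptime(str(yyyymmdd), "%Y%m%d").date()


for state, row in current.items():
    prev = previous[state]
    # totalTestResults matches positive + negative in these rows.
    assert row["totalTestResults"] == row["positive"] + row["negative"]
    # Each *Increase column matches the change since the previous day's row.
    for col in ("positive", "negative", "death", "totalTestResults"):
        assert row[col + "Increase"] == row[col] - prev[col]
    print(state, parse_date(row["date"]), "vs", parse_date(prev["date"]))
```

For the full DataFrame, `pd.to_datetime(data_df["date"].astype(str), format="%Y%m%d")` performs the same date conversion column-wise.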
......