%% Cell type:markdown id: tags:
# Create biweekly renku datasets from `climetlab-s2s-ai-challenge`
%% Cell type:markdown id: tags:
Goal:
- Create biweekly renku datasets from [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge).
- These renku datasets are then used in the notebooks:
    - `ML_train_and_predict.ipynb` to train the ML model and make ML-based predictions
    - `RPSS_verification.ipynb` to calculate the RPSS of the ML model
    - `mean_bias_reduction.ipynb` to remove the mean bias

Requirements:
- [`climetlab`](https://github.com/ecmwf/climetlab)
- [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge)
- S2S and CPC observations uploaded to the [European Weather Cloud (EWC)](https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/0.3.0/netcdf/index.html)
Output: [renku dataset](https://renku-python.readthedocs.io/en/latest/commands.html#module-renku.cli.dataset) `s2s-ai-challenge`
- observations
    - deterministic:
        - `hindcast-like-observations_2000-2019_biweekly_deterministic.zarr`
        - `forecast-like-observations_2020_biweekly_deterministic.zarr`
    - edges:
        - `hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc`
    - probabilistic:
        - `hindcast-like-observations_2000-2019_biweekly_terciled.zarr`
        - `forecast-like-observations_2020_biweekly_terciled.nc`
- forecasts/hindcasts
    - deterministic:
        - `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
        - `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
        - more models could be added
- benchmark:
    - probabilistic:
        - `ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc`
%% Cell type:code id: tags:
``` python
#!renku dataset create s2s-ai-challenge
```
%% Cell type:code id: tags:
``` python
import climetlab as cml
import climetlab_s2s_ai_challenge
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr
import xskillscore as xs

print(f'Climetlab version : {cml.__version__}')
print(f'Climetlab-s2s-ai-challenge plugin version : {climetlab_s2s_ai_challenge.__version__}')
xr.set_options(keep_attrs=True)
xr.set_options(display_style='text')
```
%% Output
WARNING: ecmwflibs universal: cannot find a library called MagPlus
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab/plotting/drivers/magics/actions.py:36: UserWarning: Magics library could not be found
warnings.warn(str(e))
Magics library could not be found
Climetlab version : 0.8.6
Climetlab-s2s-ai-challenge plugin version : 0.8.0
<xarray.core.options.set_options at 0x2b51c148f590>
%% Cell type:code id: tags:
``` python
# caching path for climetlab
cache_path = "/work/mh0727/m300524/S2S_AI/cache" # set your own path
cache_path = "/work/mh0727/m300524/S2S_AI/cache4" # set your own path
cml.settings.set("cache-directory", cache_path)
```
%% Cell type:code id: tags:
``` python
cache_path = "../data"
```
%% Cell type:markdown id: tags:
# Download and cache
Download all files for the observations, forecast and hindcast.
%% Cell type:code id: tags:
``` python
dates = xr.cftime_range(start='20200102',freq='7D', periods=53).strftime('%Y%m%d').to_list()
dates[:5]
# shortcut
from scripts import download
#download()
```
%% Output
['20200102', '20200109', '20200116', '20200123', '20200130']
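%% Cell type:markdown id: tags:
A minimal usage sketch for the `scripts.download` shortcut (its full signature is listed in `scripts.py` at the bottom of this page). Like the renku commands, it stays commented out here because it triggers large downloads:
%% Cell type:code id: tags:
``` python
# sketch: fetch only the ECMWF t2m forecasts/hindcasts plus the matching observations
# cml.settings.set("cache-directory", cache_path)
# download(varlist_forecast='t2m', center_list='ecmwf', benchmark=False)
```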
%% Cell type:markdown id: tags:
## observations `output-reference`
%% Cell type:code id: tags:
``` python
obs_dataset_labels = ['training-output-reference','test-output-reference'] # ML community
# equiv to
obs_dataset_labels = ['hindcast-like-observations','forecast-like-observations'] # NWP community
varlist_obs = ['tp','t2m']
```
%% Cell type:code id: tags:
``` python
%%time
# takes ~10 min to download
for ds in obs_dataset_labels:
    print(ds)
    # obs only available as netcdf, no format choice
    cml.load_dataset(f"s2s-ai-challenge-{ds}", date=dates, parameter=varlist_obs).to_xarray()
```
%% Cell type:markdown id: tags:
## hindcast and forecast `input`
%% Cell type:code id: tags:
``` python
forecast_dataset_labels = ['training-input','test-input'] # ML community
# equiv to
forecast_dataset_labels = ['hindcast-input','forecast-input'] # NWP community
varlist_forecast = ['tp','t2m'] # can add more
center_list = ['ecmwf'] # 'ncep', 'eccc'
```
%% Cell type:code id: tags:
``` python
%%time
# takes 4-5h for one model; one variable takes ~10-30 min to download, depending on the number of
# model realizations and download settings https://climetlab.readthedocs.io/en/latest/guide/settings.html
for center in center_list:
    for v in varlist_forecast:
        for d in dates:
            for ds in forecast_dataset_labels:
                print(ds, v, d)
                cml.load_dataset(f"s2s-ai-challenge-{ds}", origin=center, date=d, parameter=v, format='netcdf').to_xarray()

# download raw observations with a time dimension to create output-reference/observations
# for models other than ecmwf and eccc, i.e. ncep or any other S2S or SubX model
obs_time = cml.load_dataset(f"s2s-ai-challenge-observations", parameter=['t2m', 'pr']).to_xarray()
```
%% Cell type:markdown id: tags:
# create bi-weekly aggregates
%% Cell type:code id: tags:
``` python
from scripts import add_valid_time_from_forecast_reference_time_and_lead_time
from scripts import aggregate_biweekly, ensure_attributes
#aggregate_biweekly??
```
%% Cell type:code id: tags:
``` python
w34 = [pd.Timedelta(f'{i} d') for i in range(14,28)]
w34 = xr.DataArray(w34,dims='lead_time', coords={'lead_time':w34})
w34
```
%% Output
<xarray.DataArray (lead_time: 14)>
array([1209600000000000, 1296000000000000, 1382400000000000,
       1468800000000000, 1555200000000000, 1641600000000000,
       1728000000000000, 1814400000000000, 1900800000000000,
       1987200000000000, 2073600000000000, 2160000000000000,
       2246400000000000, 2332800000000000], dtype='timedelta64[ns]')
Coordinates:
  * lead_time  (lead_time) timedelta64[ns] 14 days 15 days ... 26 days 27 days
%% Cell type:code id: tags:
``` python
w56 = [pd.Timedelta(f'{i} d') for i in range(28,42)]
w56 = xr.DataArray(w56, dims='lead_time', coords={'lead_time': w56})
w56
```
%% Output
<xarray.DataArray (lead_time: 14)>
array([2419200000000000, 2505600000000000, 2592000000000000,
       2678400000000000, 2764800000000000, 2851200000000000,
       2937600000000000, 3024000000000000, 3110400000000000,
       3196800000000000, 3283200000000000, 3369600000000000,
       3456000000000000, 3542400000000000], dtype='timedelta64[ns]')
Coordinates:
  * lead_time  (lead_time) timedelta64[ns] 28 days 29 days ... 40 days 41 days
%% Cell type:code id: tags:
``` python
biweekly_lead = [pd.Timedelta(f"{i} d") for i in [14, 28]] # take first day of biweekly average as new coordinate
# alternative: use 21 and 35, the mid-points of week 3-4 and week 5-6
```
%% Cell type:code id: tags:
``` python
%%time
# takes 4-5h for one model
for c, center in enumerate(center_list): # forecast centers (could also take more models)
    for dsl in obs_dataset_labels + forecast_dataset_labels: # climetlab dataset labels
        for p, parameter in enumerate(varlist_forecast): # variables
            if c != 0 and 'observation' in dsl: # only do once for observations
                continue
            print(f"datasetlabel: {dsl}, center: {center}, parameter: {parameter}")
            if 'input' in dsl:
                ds = cml.load_dataset(f"s2s-ai-challenge-{dsl}", origin=center, parameter=parameter, format='netcdf').to_xarray()
            elif 'observation' in dsl: # obs only netcdf, no format choice
                if parameter not in ['t2m', 'tp']:
                    continue
                ds = cml.load_dataset(f"s2s-ai-challenge-{dsl}", parameter=parameter, date=dates).to_xarray()
            if p == 0:
                ds_biweekly = ds.map(aggregate_biweekly)
            else:
                ds_biweekly[parameter] = ds.map(aggregate_biweekly)[parameter]
        ds_biweekly = ds_biweekly.map(ensure_attributes, biweekly=True)
        ds_biweekly = ds_biweekly.sortby('forecast_time')
        if 'test' in dsl:
            ds_biweekly = ds_biweekly.chunk('auto')
        else:
            ds_biweekly = ds_biweekly.chunk({'forecast_time':'auto','lead_time':-1,'longitude':-1,'latitude':-1})
        if 'hindcast' in dsl:
            time = f'{int(ds_biweekly.forecast_time.dt.year.min())}-{int(ds_biweekly.forecast_time.dt.year.max())}'
        elif 'forecast' in dsl:
            time = '2020'
        if 'input' in dsl:
            name = f'{center}_{dsl}'
        elif 'observations' in dsl:
            name = dsl
        else:
            assert False
        # pattern: {model_if_not_observations_}{observations/forecast/hindcast}_{time}_biweekly_deterministic.zarr
        zp = f'{cache_path}/{name}_{time}_biweekly_deterministic.zarr'
        ds_biweekly.attrs.update({'postprocessed_by':'https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/renku_datasets_biweekly.ipynb'})
        print(f'save to: {zp}')
        ds_biweekly.astype('float32').to_zarr(zp, consolidated=True, mode='w')
```
%% Output
datasetlabel: hindcast-like-observations, center: ecmwf, parameter: tp
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data. This dataset has been dowloaded from IRIDL. By downloading this data you also agree to the terms and conditions defined at https://iridl.ldeo.columbia.edu.
datasetlabel: hindcast-like-observations, center: ecmwf, parameter: t2m
save to: ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
datasetlabel: forecast-like-observations, center: ecmwf, parameter: tp
datasetlabel: forecast-like-observations, center: ecmwf, parameter: t2m
save to: ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
datasetlabel: hindcast-input, center: ecmwf, parameter: tp
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
datasetlabel: hindcast-input, center: ecmwf, parameter: t2m
save to: ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
datasetlabel: forecast-input, center: ecmwf, parameter: tp
datasetlabel: forecast-input, center: ecmwf, parameter: t2m
save to: ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
WARNING: ecmwflibs universal: found eccodes at /work/mh0727/m300524/conda-envs/s2s-ai/lib/libeccodes.so
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.18.0
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing with an out-of-order index is generating 20 times more chunks
  return self.array[key]
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
%% Cell type:markdown id: tags:
## add to `renku` dataset `s2s-ai-challenge`
%% Cell type:code id: tags:
``` python
# observations as hindcast
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f"{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)
print(obs_2000_2019.sizes,'\n',obs_2000_2019.coords,'\n', obs_2000_2019.nbytes/1e6,'MB')
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 1060, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
492.546744 MB
%% Cell type:code id: tags:
``` python
# observations as forecast
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr(f"{cache_path}/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)
print(obs_2020.sizes,'\n',obs_2020.coords,'\n', obs_2020.nbytes/1e6,'MB')
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 53, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 53), meta=np.ndarray>
24.630096 MB
%% Cell type:code id: tags:
``` python
# ecmwf hindcast-input
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
hind_2000_2019 = xr.open_zarr(f"{cache_path}/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
print(hind_2000_2019.sizes,'\n',hind_2000_2019.coords,'\n', hind_2000_2019.nbytes/1e6,'MB')
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 1060, 'latitude': 121, 'lead_time': 2, 'longitude': 240, 'realization': 11}))
Coordinates:
  * forecast_time  (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
  * latitude       (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
  * lead_time      (lead_time) timedelta64[ns] 14 days 28 days
  * longitude      (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
  * realization    (realization) int64 0 1 2 3 4 5 6 7 8 9 10
    valid_time     (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
5417.730832 MB
%% Cell type:code id: tags:
``` python
# ecmwf forecast-input
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
fct_2020 = xr.open_zarr(f"{cache_path}/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
print(fct_2020.sizes,'\n',fct_2020.coords,'\n', fct_2020.nbytes/1e6,'MB')
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 53, 'latitude': 121, 'lead_time': 2, 'longitude': 240, 'realization': 51}))
Coordinates:
  * forecast_time  (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
  * latitude       (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
  * lead_time      (lead_time) timedelta64[ns] 14 days 28 days
  * longitude      (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
  * realization    (realization) int64 0 1 2 3 4 5 6 7 ... 44 45 46 47 48 49 50
    valid_time     (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 53), meta=np.ndarray>
1255.926504 MB
%% Cell type:markdown id: tags:
# tercile edges
Create the two tercile edges at the 1/3 and 2/3 quantiles of the 2000-2019 biweekly distribution for each week of the year.
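%% Cell type:markdown id: tags:
A minimal toy sketch of this computation (synthetic data, not the challenge grid): group by `week` so that each group holds the 20 values from the 20 years, then take the 1/3 and 2/3 quantiles over `forecast_time`:
%% Cell type:code id: tags:
``` python
# toy example: 20 "years" x 53 "weeks" of random data
import numpy as np
toy = xr.DataArray(np.random.rand(20 * 53), dims='forecast_time',
                   coords={'week': ('forecast_time', list(range(1, 54)) * 20)})
toy_edges = (toy.groupby('week')
                .quantile(q=[1./3., 2./3.], dim='forecast_time')
                .rename({'quantile': 'category_edge'}))
toy_edges.sizes  # (week: 53, category_edge: 2)
```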
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr', consolidated=True)
```
%% Cell type:code id: tags:
``` python
from scripts import add_year_week_coords
```
%% Cell type:code id: tags:
``` python
# add week for groupby, see https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge/-/issues/29
obs_2000_2019 = add_year_week_coords(obs_2000_2019)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2000_2019
```
%% Output
<xarray.Dataset>
Dimensions: (forecast_time: 1060, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
week (forecast_time) int64 1 2 3 4 5 6 7 ... 47 48 49 50 51 52 53
year (forecast_time) int64 2000 2000 2000 2000 ... 2019 2019 2019
Data variables:
t2m (lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(2, 530, 121, 240), meta=np.ndarray>
tp (lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(2, 530, 121, 240), meta=np.ndarray>
Attributes:
created_by_script: tools/observations/makefile
created_by_software: climetlab-s2s-ai-challenge
function: climetlab_s2s_ai_challenge.extra.forecast_like_obse...
postprocessed_by: https://renkulab.io/gitlab/aaron.spring/s2s-ai-chal...
regrid_method: conservative
source_dataset_name: NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
%% Cell type:code id: tags:
``` python
tercile_file = f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
```
%% Cell type:code id: tags:
``` python
%%time
obs_2000_2019.chunk({'forecast_time':-1,'longitude':'auto'}).groupby('week').quantile(q=[1./3.,2./3.], dim='forecast_time').rename({'quantile':'category_edge'}).astype('float32').to_netcdf(tercile_file)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:381: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/numpy/lib/nanfunctions.py:1390: RuntimeWarning: All-NaN slice encountered
overwrite_input, interpolation)
CPU times: user 19min 35s, sys: 8min 33s, total: 28min 9s
Wall time: 16min 44s
%% Cell type:code id: tags:
``` python
tercile_edges = xr.open_dataset(tercile_file)
tercile_edges
```
%% Output
<xarray.Dataset>
Dimensions:        (category_edge: 2, latitude: 121, lead_time: 2, longitude: 240, week: 53)
Coordinates:
  * latitude       (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
  * lead_time      (lead_time) timedelta64[ns] 14 days 28 days
  * longitude      (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
  * category_edge  (category_edge) float64 0.3333 0.6667
  * week           (week) int64 1 2 3 4 5 6 7 8 9 ... 45 46 47 48 49 50 51 52 53
Data variables:
    t2m            (week, category_edge, lead_time, latitude, longitude) float32 ...
    tp             (week, category_edge, lead_time, latitude, longitude) float32 ...
Attributes:
created_by_script: tools/observations/makefile
created_by_software: climetlab-s2s-ai-challenge
function: climetlab_s2s_ai_challenge.extra.forecast_like_obse...
postprocessed_by: https://renkulab.io/gitlab/aaron.spring/s2s-ai-chal...
regrid_method: conservative
source_dataset_name: NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
%% Cell type:code id: tags:
``` python
tercile_edges.nbytes*1e-6,'MB'
```
%% Output
(49.255184, 'MB')
%% Cell type:code id: tags:
``` python
# run renku commands from the project's root directory only
# tercile edges
#!renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
#xr.open_dataset("../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc")
```
%% Cell type:markdown id: tags:
# observations in categories
- count how many deterministic forecast realizations fall into each tercile category (as done when computing the RPS)
- categorize the 2020 forecast-like observations into tercile categories
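%% Cell type:markdown id: tags:
A toy sketch of the categorization step (synthetic numbers, mirroring the comparisons in `scripts.make_probabilistic`): a value is compared against the two edges and becomes a one-hot vector over the three categories, i.e. an observed "probability" of 0 or 1:
%% Cell type:code id: tags:
``` python
# toy example: edges at 10 and 20, observed value 17 -> 'near normal'
toy_obs = xr.DataArray(17.)
toy_edge = xr.DataArray([10., 20.], dims='category_edge')
bn = toy_obs < toy_edge.isel(category_edge=0)                                                 # below normal
n = (toy_obs >= toy_edge.isel(category_edge=0)) & (toy_obs < toy_edge.isel(category_edge=1))  # near normal
an = toy_obs >= toy_edge.isel(category_edge=1)                                                # above normal
xr.concat([bn, n, an], 'category').astype('int')  # -> [0, 1, 0]
```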
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr(f'{cache_path}/forecast-like-observations_2020_biweekly_deterministic.zarr', consolidated=True)
obs_2020.sizes
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 53, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
%% Cell type:code id: tags:
``` python
# create a mask for land grid
mask = obs_2020.std(['lead_time','forecast_time']).notnull()
```
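%% Cell type:markdown id: tags:
The mask is derived from the observations themselves: the standard deviation over all times is `NaN` exactly where CPC provides no data (ocean), so `notnull()` yields a land mask per variable. A quick, unweighted sanity check (sketch):
%% Cell type:code id: tags:
``` python
# fraction of grid cells that are land, per variable (unweighted in latitude)
mask.to_array().astype('float32').mean(['latitude','longitude']).compute()
```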
%% Cell type:code id: tags:
``` python
# mask.to_array().plot(col='variable')
```
%% Cell type:code id: tags:
``` python
tercile_edges = xr.open_dataset(tercile_file)
```
%% Cell type:code id: tags:
``` python
tercile_edges.tp
```
%% Output
<xarray.DataArray 'tp' (week: 53, category_edge: 2, lead_time: 2, latitude: 121, longitude: 240)>
[6156480 values with dtype=float32]
Coordinates:
  * latitude       (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
  * lead_time      (lead_time) timedelta64[ns] 14 days 28 days
  * longitude      (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
  * category_edge  (category_edge) float64 0.3333 0.6667
  * week           (week) int64 1 2 3 4 5 6 7 8 ... 47 48 49 50 51 52 53
Attributes:
comment: precipitation accumulated since lead_time including 0 days
long_name: total precipitation
pointwidth: 0
standard_name: precipitation_amount
units: kg m-2
%% Cell type:code id: tags:
``` python
# total precipitation in arid regions is masked
# Frederic Vitart suggested by email: "Based on your map we could mask all the areas where the lower tercile boundary is lower than 0.1 mm"
# we are using a dry mask as in https://doi.org/10.1175/MWR-D-17-0092.1
th = 0.01
tp_arid_mask = tercile_edges.tp.isel(category_edge=0, lead_time=0, drop=True) > th
#tp_arid_mask.where(mask.tp).plot(col='forecast_time', col_wrap=4)
#plt.suptitle(f'dry mask: week 3-4 tp 1/3 category_edge > {th} kg m-2',y=1., x=.4)
#plt.savefig('dry_mask.png')
```
%% Cell type:code id: tags:
``` python
# look into tercile edges
# tercile_edges.isel(week=0)['tp'].plot(col='lead_time', row='category_edge', robust=True)
# tercile_edges.isel(week=[0,20], category_edge=1)['tp'].plot(col='lead_time', row='week', robust=True)
# tercile_edges.tp.mean(['week']).plot(col='lead_time', row='category_edge', vmax=.5)
# tp_arid_mask.isel(week=[0,10,20,30,40]).plot(col='week')
```
%% Cell type:markdown id: tags:
## categorize observations
### forecast 2020
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
obs_2020_p = make_probabilistic(obs_2020, tercile_edges, mask=mask)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:381: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2020_p.nbytes/1e6, 'MB'
```
%% Output
(147.75984, 'MB')
%% Cell type:code id: tags:
``` python
obs_2020_p
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 53), meta=np.ndarray>
* category (category) <U12 'below normal' 'near normal' 'above normal'
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float64 dask.array<chunksize=(1, 2, 53, 121, 240), meta=np.ndarray>
tp (category, lead_time, forecast_time, latitude, longitude) float64 dask.array<chunksize=(1, 2, 53, 121, 240), meta=np.ndarray>
%% Cell type:code id: tags:
``` python
obs_2020_p.astype('float32').to_netcdf(f'{cache_path}/forecast-like-observations_2020_biweekly_terciled.nc')
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
# forecast-like-observations terciled
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
xr.open_dataset("../data/forecast-like-observations_2020_biweekly_terciled.nc")
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] ...
weekofyear (forecast_time) int64 ...
* category (category) object 'below normal' 'near normal' 'above normal'
Data variables:
    t2m            (category, lead_time, forecast_time, latitude, longitude) float32 ...
    tp             (category, lead_time, forecast_time, latitude, longitude) float32 ...
%% Cell type:markdown id: tags:
### hindcast 2000-2019
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr', consolidated=True)
```
%% Cell type:code id: tags:
``` python
obs_2000_2019_p = make_probabilistic(obs_2000_2019, tercile_edges, mask=mask)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:381: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2000_2019_p.nbytes/1e6, 'MB'
```
%% Output
(2955.138888, 'MB')
%% Cell type:code id: tags:
``` python
obs_2000_2019_p.astype('float32').chunk('auto').to_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_terciled.zarr', consolidated=True, mode='w')
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
<xarray.backends.zarr.ZarrStore at 0x2b51c1233360>
%% Cell type:code id: tags:
``` python
# hindcast-like-observations terciled
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr")
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 1060, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* category (category) <U12 'below normal' 'near normal' 'above normal'
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
weekofyear (forecast_time) int64 dask.array<chunksize=(1060,), meta=np.ndarray>
Data variables:
    t2m            (category, lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(1, 2, 530, 121, 240), meta=np.ndarray>
    tp             (category, lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(1, 2, 530, 121, 240), meta=np.ndarray>
%% Cell type:code id: tags:
``` python
# checking category frequencies
# o = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr")
# w=0
# v='tp'
# o.sel(forecast_time=o.forecast_time.dt.dayofyear==2+7*w).sum('forecast_time', skipna=False)[v].plot(row='lead_time',col='category', levels=[5.5,6.5,7.5])
# o.sel(forecast_time=o.forecast_time.dt.dayofyear==2+7*w).sum('forecast_time', skipna=False).sum('category', skipna=False)[v].plot(row='lead_time', levels=[16.5,17.5,18.5,19.5,20.5])
```
%% Cell type:markdown id: tags:
# Benchmark
center: ECMWF
The calibration has been performed using the tercile boundaries from the model climatology rather than from observations. Script by Frederic Vitart.
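%% Cell type:markdown id: tags:
The calibration script itself is not part of this repository; the cells below only postprocess its output. A rough, hypothetical sketch of the idea (an assumption for illustration, not Frederic Vitart's actual script):
%% Cell type:code id: tags:
``` python
# hypothetical sketch: `model_edges` would be tercile edges computed from the hindcast
# (model) climatology, analogous to `tercile_edges` from observations above
# p_near_normal = ((fct >= model_edges.isel(category_edge=0)) &
#                  (fct < model_edges.isel(category_edge=1))).mean('realization')
```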
%% Cell type:code id: tags:
``` python
bench_p = cml.load_dataset("s2s-ai-challenge-test-output-benchmark",
parameter='t2m',
weeks=['34','56']
).to_xarray()
bench_p = cml.load_dataset("s2s-ai-challenge-test-output-benchmark", parameter=['tp','t2m']).to_xarray()
```
%% Output
50%|█████ | 1/2 [00:00<00:00, 6.11it/s]
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
100%|██████████| 2/2 [00:00<00:00, 6.89it/s]
WARNING: ecmwflibs universal: found eccodes at /work/mh0727/m300524/conda-envs/s2s-ai/lib/libeccodes.so
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.12.3
%% Cell type:code id: tags:
``` python
bench_p['category'] = ['below normal','near normal','above normal']
bench_p['category'].attrs = {'long_name': 'tercile category probabilities', 'units': '1',
                             'description': 'Probabilities for three tercile categories. All three tercile category probabilities must add up to 1.'}
bench_p['lead_time'] = biweekly_lead # take first day of biweekly average as new coordinate
bench_p = bench_p / 100 # convert percent to [0-1] probability
bench_p.tp.attrs['units'] = ''
bench_p.t2m.attrs['units'] = ''
bench_p = bench_p.map(ensure_attributes, biweekly=True)
```
%% Output
0%| | 0/1 [00:00<?, ?it/s]
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
100%|██████████| 1/1 [00:00<00:00, 4.34it/s]
100%|██████████| 1/1 [00:00<00:00, 4.22it/s]
%% Cell type:code id: tags:
``` python
# bench_p.isel(forecast_time=2).t2m.plot(row='lead_time', col='category')
```
%% Cell type:code id: tags:
``` python
bench_p
```
%% Output
<xarray.Dataset>
Dimensions:        (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
  * category       (category) <U12 'below normal' 'near normal' 'above normal'
  * forecast_time  (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
  * lead_time      (lead_time) timedelta64[ns] 14 days 28 days
  * latitude       (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
  * longitude      (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
    valid_time     (forecast_time, lead_time) datetime64[ns] 2020-01-16 ... 2...
Data variables:
    t2m            (category, forecast_time, lead_time, latitude, longitude) float32 ...
    tp             (category, forecast_time, lead_time, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2021-05-25T12:55 GRIB to CDM+CF via cfgrib-0.9.9...
%% Cell type:code id: tags:
``` python
bench_p.astype('float32').to_netcdf('../data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc')
```
%% Cell type:code id: tags:
``` python
#!renku dataset add s2s-ai-challenge data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc
```
# scripts.py
import xarray as xr
import pandas as pd
import numpy as np
import xskillscore as xs
import climetlab_s2s_ai_challenge
import climetlab as cml
cache_path = '../data'
def download(varlist_forecast=['tp','t2m'],
             center_list=['ecmwf'],
             forecast_dataset_labels=['hindcast-input','forecast-input'],
             obs_dataset_labels=['hindcast-like-observations','forecast-like-observations'],
             varlist_observations=['t2m','tp'],
             benchmark=True,
             format='netcdf'
             ):
    """Download files with climetlab to cache_path. Set cache_path first:
    cml.settings.set("cache-directory", cache_path)
    """
    if isinstance(center_list, str):
        center_list = [center_list]
    if isinstance(varlist_forecast, str):
        varlist_forecast = [varlist_forecast]
    dates = xr.cftime_range(start='20200102', freq='7D', periods=53).strftime('%Y%m%d').to_list()
    if forecast_dataset_labels:
        print(f'Downloads variables {varlist_forecast} from datasets {forecast_dataset_labels} from center {center_list} in {format} format.')
        for center in center_list:
            for ds in forecast_dataset_labels:
                for parameter in varlist_forecast:
                    try:
                        cml.load_dataset(f"s2s-ai-challenge-{ds}", origin=center, parameter=parameter, format=format).to_xarray()
                    except:
                        pass
    if obs_dataset_labels:
        print(f'Downloads variables {varlist_observations} from datasets {obs_dataset_labels} in netcdf format. Additionally downloads raw t2m and pr observations with a time dimension.')
        try:
            for ds in obs_dataset_labels:
                for parameter in varlist_observations:
                    cml.load_dataset(f"s2s-ai-challenge-{ds}", date=dates, parameter=parameter).to_xarray()
        except:
            pass
        # raw observations with a time dimension
        cml.load_dataset(f"s2s-ai-challenge-observations", parameter=varlist_observations).to_xarray()
    if benchmark:
        cml.load_dataset("s2s-ai-challenge-test-output-benchmark", parameter=['tp','t2m']).to_xarray()
    print('finished')
    return
def add_valid_time_from_forecast_reference_time_and_lead_time(forecast, init_dim='forecast_time'):
    """Creates valid_time(forecast_time, lead_time).
    ...
    """
    # ... (body elided in this view)
    return forecast
def aggregate_biweekly(da):
    """
    Aggregate initialized S2S forecasts biweekly for xr.DataArrays.
    Use ds.map(aggregate_biweekly) for xr.Datasets.
    Applies to the ECMWF S2S data model: https://confluence.ecmwf.int/display/S2S/Parameters
    """
    # biweekly lead_time windows
    w34 = [pd.Timedelta(f'{i} d') for i in range(14,28)]
    w34 = xr.DataArray(w34, dims='lead_time', coords={'lead_time': w34})
    w56 = [pd.Timedelta(f'{i} d') for i in range(28,42)]
    w56 = xr.DataArray(w56, dims='lead_time', coords={'lead_time': w56})
    biweekly_lead = [pd.Timedelta(f"{i} d") for i in [14, 28]]  # take first day of biweekly average as new coordinate

    v = da.name
    if climetlab_s2s_ai_challenge.CF_CELL_METHODS[v] == 'sum':  # biweekly difference for sum variables: tp and ttr
        d34 = da.sel(lead_time=pd.Timedelta("28 d")) - da.sel(lead_time=pd.Timedelta("14 d"))  # tp from day 14 to day 27
        d56 = da.sel(lead_time=pd.Timedelta("42 d")) - da.sel(lead_time=pd.Timedelta("28 d"))  # tp from day 28 to day 41
        da_biweekly = xr.concat([d34, d56], 'lead_time').assign_coords(lead_time=biweekly_lead)
    else:  # t2m, see climetlab_s2s_ai_challenge.CF_CELL_METHODS; biweekly mean [day 14, day 27]
        d34 = da.sel(lead_time=w34).mean('lead_time')
        d56 = da.sel(lead_time=w56).mean('lead_time')
        da_biweekly = xr.concat([d34, d56], 'lead_time').assign_coords(lead_time=biweekly_lead)

    da_biweekly = add_valid_time_from_forecast_reference_time_and_lead_time(da_biweekly)
    da_biweekly['lead_time'].attrs = {'long_name': 'forecast_period',
                                      'description': 'Forecast period is the time interval between the forecast reference time and the validity time.',
                                      'aggregate': 'The pd.Timedelta corresponds to the first day of a biweekly aggregate.',
                                      'week34_t2m': 'mean[day 14, 27]',
                                      'week56_t2m': 'mean[day 28, 41]',
                                      'week34_tp': 'day 28 minus day 14',
                                      'week56_tp': 'day 42 minus day 28'}
    return da_biweekly
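# Usage sketch (illustration, not part of the original scripts.py): for a dataset `ds`
# with daily lead_time steps covering days 0..45,
#     ds_biweekly = ds.map(aggregate_biweekly)
# returns lead_time = [14 days, 28 days]; accumulated fields (CF cell method 'sum',
# e.g. tp) are differenced between the window edges, all others (e.g. t2m) are averaged.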
def ensure_attributes(da, biweekly=False):
    """Ensure that coordinates and variables have proper attributes. Set biweekly=True to set special comments for the biweekly aggregates."""
    template = cml.load_dataset('s2s-ai-challenge-test-input', parameter='t2m', origin='ecmwf', format='netcdf', date='20200102').to_xarray()
    for c in da.coords:
        if c in template.coords:
            da.coords[c].attrs.update(template.coords[c].attrs)
    if 'valid_time' in da.coords:
        da['valid_time'].attrs.update({'long_name': 'validity time',
                                       'standard_name': 'time',
                                       'description': 'time for which the forecast is valid',
                                       'calculate': 'forecast_time + lead_time'})
    if 'forecast_time' in da.coords:
        da['forecast_time'].attrs.update({'long_name': 'initial time of forecast',
                                          'standard_name': 'forecast_reference_time',
                                          'description': 'The forecast reference time in NWP is the "data time", the time of the analysis from which the forecast was made. It is not the time for which the forecast is valid.'})
    # fix tp units
    if da.name == 'tp':
        da.attrs['units'] = 'kg m-2'
    if biweekly:
        da['lead_time'].attrs.update({'standard_name': 'forecast_period', 'long_name': 'lead time',
                                      'description': 'Forecast period is the time interval between the forecast reference time and the validity time.',
                                      'aggregate': 'The pd.Timedelta corresponds to the first day of a biweekly aggregate.',
                                      'week34_t2m': 'mean[14 days, 27 days]',
                                      'week56_t2m': 'mean[28 days, 41 days]',
                                      'week34_tp': '28 days minus 14 days',
                                      'week56_tp': '42 days minus 28 days'})
        if da.name == 'tp':
            da.attrs.update({'aggregate_week34': '28 days minus 14 days',
                             'aggregate_week56': '42 days minus 28 days',
                             'description': 'https://confluence.ecmwf.int/display/S2S/S2S+Total+Precipitation'})
        if da.name == 't2m':
            da.attrs.update({'aggregate_week34': 'mean[14 days, 27 days]',
                             'aggregate_week56': 'mean[28 days, 41 days]',
                             'variable_before_categorization': 'https://confluence.ecmwf.int/display/S2S/S2S+Surface+Air+Temperature'})
    return da
def add_year_week_coords(ds):
    import numpy as np
    if 'week' not in ds.coords and 'year' not in ds.coords:
        year = ds.forecast_time.dt.year.to_index().unique()
        week = list(np.arange(1, 54))
        weeks = week * len(year)
        years = np.repeat(year, len(week))
        ds.coords["week"] = ("forecast_time", weeks)
        ds.coords['week'].attrs['description'] = "This week represents the number of forecast_time starting from 1 to 53. Note: This week is different from the ISO week from groupby('forecast_time.weekofyear'), see https://en.wikipedia.org/wiki/ISO_week_date and https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge/-/issues/29"
        ds.coords["year"] = ("forecast_time", years)
        ds.coords['year'].attrs['long_name'] = "calendar year"
    return ds
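# Note (illustration): with 20 years x 53 starts, groupby('week') pools exactly one value
# per year and week, i.e. 20 values per group, because 'week' simply counts the starts
# within each year. groupby('forecast_time.weekofyear') follows the ISO calendar instead
# and can assign the same start date to different weeks in different years (issue 29).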
def make_probabilistic(ds, tercile_edges, member_dim='realization', mask=None, groupby_coord='week'):
"""Compute probabilities from ds (observations or forecasts) based on tercile_edges."""
# broadcast
if 'forecast_time' not in tercile_edges.dims and 'weekofyear' in tercile_edges.dims:
tercile_edges = tercile_edges.sel(weekofyear=ds.forecast_time.dt.weekofyear)
ds = add_year_week_coords(ds)
tercile_edges = tercile_edges.sel({groupby_coord: ds.coords[groupby_coord]})
bn = ds < tercile_edges.isel(category_edge=0, drop=True) # below normal
n = (ds >= tercile_edges.isel(category_edge=0, drop=True)) & (ds < tercile_edges.isel(category_edge=1, drop=True)) # normal
an = ds >= tercile_edges.isel(category_edge=1, drop=True) # above normal
......@@ -48,11 +180,24 @@ def make_probabilistic(ds, tercile_edges, member_dim='realization', mask=None):
# we are using a dry mask as in https://doi.org/10.1175/MWR-D-17-0092.1
tp_arid_mask = tercile_edges.tp.isel(category_edge=0, lead_time=0, drop=True) > 0.01
ds_p['tp'] = ds_p['tp'].where(tp_arid_mask)
# ds_p = ds_p * 100 # in percent %
ds_p['category'].attrs = {'long_name': 'tercile category probabilities', 'units': '1',
'description': 'Probabilities for three tercile categories. All three tercile category probabilities must add up to 1.'}
ds_p['tp'].attrs = {'long_name': 'Probability of total precipitation in tercile categories', 'units': '1',
'comment': 'All three tercile category probabilities must add up to 1.',
'variable_before_categorization': 'https://confluence.ecmwf.int/display/S2S/S2S+Total+Precipitation'
}
ds_p['t2m'].attrs = {'long_name': 'Probability of 2m temperature in tercile categories', 'units': '1',
'comment': 'All three tercile category probabilities must add up to 1.',
'variable_before_categorization': 'https://confluence.ecmwf.int/display/S2S/S2S+Surface+Air+Temperature'
}
if 'year' in ds_p.coords:
del ds_p.coords['year']
if groupby_coord in ds_p.coords:
ds_p = ds_p.drop(groupby_coord)
return ds_p
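# A minimal usage sketch, assuming the tercile-edges file listed in the dataset
# commands at the end of this page:
#     obs = xr.open_zarr('../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr')
#     edges = xr.open_dataset('../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc')
#     obs_p = make_probabilistic(obs, edges)  # tercile probabilities summing to 1 over 'category'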
def skill_by_year(preds, adapt=False):
"""Returns pd.Dataframe of RPSS per year."""
    # similar to verification_RPSS.ipynb and the scorer bot,
    # but returns a score for each year
@@ -65,98 +210,114 @@ def skill_by_year(preds):
# from root
    # renku storage pull data/forecast-like-observations_2020_biweekly_terciled.nc
    # renku storage pull data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
cache_path = '../data'
if 2020 in preds.forecast_time.dt.year:
obs_p = xr.open_dataset(f'{cache_path}/forecast-like-observations_2020_biweekly_terciled.nc').sel(forecast_time=preds.forecast_time)
else:
obs_p = xr.open_dataset(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_terciled.zarr', engine='zarr').sel(forecast_time=preds.forecast_time)
# ML probabilities
fct_p = preds
# climatology
clim_p = xr.DataArray([1/3, 1/3, 1/3], dims='category', coords={'category':['below normal', 'near normal', 'above normal']}).to_dataset(name='tp')
clim_p['t2m'] = clim_p['tp']
if adapt:
# select only obs_p where fct_p forecasts provided
for c in ['longitude', 'latitude', 'forecast_time', 'lead_time']:
obs_p = obs_p.sel({c:fct_p[c]})
obs_p = obs_p[list(fct_p.data_vars)]
clim_p = clim_p[list(fct_p.data_vars)]
else:
# check inputs
assert_predictions_2020(obs_p)
assert_predictions_2020(fct_p)
# rps_ML
rps_ML = xs.rps(obs_p, fct_p, category_edges=None, dim=[], input_distributions='p').compute()
    # rps_clim
    rps_clim = xs.rps(obs_p, clim_p, category_edges=None, dim=[], input_distributions='p').compute()
## RPSS
# penalize # https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/7
expect = obs_p.sum('category')
    expect = expect.where(expect > 0.98).where(expect < 1.02)  # ~1 where observed category probabilities exist, NaN elsewhere
    # https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/50
    rps_ML = rps_ML.where(expect, other=2)  # assign RPS=2 where a value was expected but NaN was submitted
# following Weigel 2007: https://doi.org/10.1175/MWR3280.1
rpss = 1 - (rps_ML.groupby('forecast_time.year').mean() / rps_clim.groupby('forecast_time.year').mean())
# clip
rpss = rpss.clip(-10, 1)
# weighted area mean
    weights = np.cos(np.deg2rad(np.abs(rpss.latitude)))
# spatially weighted score averaged over lead_times and variables to one single value
    scores = rpss.sel(latitude=slice(None, -60)).weighted(weights).mean('latitude').mean('longitude')  # latitudes south of 60S are excluded
scores = scores.to_array().mean(['lead_time', 'variable'])
return scores.to_dataframe('RPSS')
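# A minimal usage sketch; the submission path and `hind_p` are hypothetical:
#     preds = xr.open_dataset('../submissions/ML_prediction_2020.nc')  # hypothetical submission file
#     skill_by_year(preds)               # one RPSS row for year 2020
#     skill_by_year(hind_p, adapt=True)  # hindcast years 2000-2019, obs subset to the provided coords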
def assert_predictions_2020(preds_test, exclude='weekofyear'):
    """Check the variables, coordinates and dimensions of 2020 predictions.

    Checks whose name appears in `exclude` are skipped."""
    from xarray.testing import assert_equal  # doesn't care about attrs but checks coords
# is dataset
assert isinstance(preds_test, xr.Dataset)
# has both vars: tp and t2m
    if 'data_vars' not in exclude:
        assert 'tp' in preds_test.data_vars
        assert 't2m' in preds_test.data_vars
## coords
# ignore weekofyear coord if not dim
if 'weekofyear' in exclude and 'weekofyear' in preds_test.coords and 'weekofyear' not in preds_test.dims:
preds_test = preds_test.drop('weekofyear')
# forecast_time
    if 'forecast_time' not in exclude:
        d = pd.date_range(start='2020-01-02', freq='7D', periods=53)
        forecast_time = xr.DataArray(d, dims='forecast_time', coords={'forecast_time': d}, name='forecast_time')
        assert_equal(forecast_time, preds_test['forecast_time'])
# longitude
    if 'longitude' not in exclude:
        lon = np.arange(0., 360., 1.5)
        longitude = xr.DataArray(lon, dims='longitude', coords={'longitude': lon}, name='longitude')
        assert_equal(longitude, preds_test['longitude'])
# latitude
    if 'latitude' not in exclude:
        lat = np.arange(-90., 90.1, 1.5)[::-1]  # 90 ... -90, matching the data orientation
        latitude = xr.DataArray(lat, dims='latitude', coords={'latitude': lat}, name='latitude')
        assert_equal(latitude, preds_test['latitude'])
# lead_time
    if 'lead_time' not in exclude:
        lead = [pd.Timedelta(f'{i} d') for i in [14, 28]]
        lead_time = xr.DataArray(lead, dims='lead_time', coords={'lead_time': lead}, name='lead_time')
        assert_equal(lead_time, preds_test['lead_time'])
# category
    if 'category' not in exclude:
        cat = np.array(['below normal', 'near normal', 'above normal'], dtype='<U12')
        category = xr.DataArray(cat, dims='category', coords={'category': cat}, name='category')
        assert_equal(category, preds_test['category'])
# size
    if 'size' not in exclude:
        from dask.utils import format_bytes
        size_in_MB = float(format_bytes(preds_test.nbytes).split(' ')[0])
        # todo: refine for dtypes
        assert size_in_MB > 50
        assert size_in_MB < 250
# no other dims
    if 'dims' not in exclude:
        assert set(preds_test.dims) - {'category', 'forecast_time', 'latitude', 'lead_time', 'longitude'} == set()
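# A minimal usage sketch, validating a submission before upload:
#     assert_predictions_2020(preds_test)                               # run all checks (default)
#     assert_predictions_2020(preds_test, exclude=['weekofyear', 'size'])  # additionally skip the size check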
echo remove previous zarrs from renku dataset
renku dataset unlink s2s-ai-challenge --include "data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr/**/**" -y
renku dataset unlink s2s-ai-challenge --include "data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr/**/**" -y
renku dataset unlink s2s-ai-challenge --include "data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr/**/**" -y
renku dataset unlink s2s-ai-challenge --include "data/forecast-like-observations_2020_biweekly_deterministic.zarr/**/**" -y
renku dataset unlink s2s-ai-challenge --include "data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr/**/**" -y
renku dataset unlink s2s-ai-challenge --include "data/**/**" -y
# observations and derived
renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_deterministic.zarr
renku dataset add s2s-ai-challenge -o data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
renku dataset add s2s-ai-challenge -o data/forecast-like-observations_2020_biweekly_terciled.nc
renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
# benchmark
renku dataset add s2s-ai-challenge -o data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc
# forecast / hindcast
for center in ecmwf; do
renku dataset add s2s-ai-challenge data/${center}_hindcast-input_2000-2019_biweekly_deterministic.zarr
renku dataset add s2s-ai-challenge data/${center}_forecast-input_2020_biweekly_deterministic.zarr
done
renku dataset ls-files s2s-ai-challenge
renku dataset ls-tags s2s-ai-challenge
echo consider new tag: renku dataset tag -d description s2s-ai-challenge 0.x