%% Cell type:markdown id: tags:
# Create biweekly renku datasets from `climetlab-s2s-ai-challenge`
%% Cell type:markdown id: tags:
Goal:
- Create biweekly renku datasets from [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge).
- These renku datasets are then used in the notebooks:
  - `ML_train_and_predict.ipynb` to train the ML model and make ML-based predictions
  - `RPSS_verification.ipynb` to calculate the RPSS of the ML model
  - `mean_bias_reduction.ipynb` to remove the mean bias
Requirements:
- [`climetlab`](https://github.com/ecmwf/climetlab)
- [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge)
- S2S and CPC observations uploaded on [European Weather Cloud (EWC)](https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/0.3.0/netcdf/index.html)
Output: [renku dataset](https://renku-python.readthedocs.io/en/latest/commands.html#module-renku.cli.dataset) `s2s-ai-challenge`
- observations
  - deterministic:
    - `hindcast-like-observations_2000-2019_biweekly_deterministic.zarr`
    - `forecast-like-observations_2020_biweekly_deterministic.zarr`
  - edges:
    - `hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc`
  - probabilistic:
    - `hindcast-like-observations_2000-2019_biweekly_terciled.zarr`
    - `forecast-like-observations_2020_biweekly_terciled.nc`
- forecasts/hindcasts
  - deterministic:
    - `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
    - `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
    - more models could be added
- benchmark:
  - probabilistic:
    - `ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc`
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
import xarray as xr
import xskillscore as xs
import pandas as pd
import climetlab_s2s_ai_challenge
import climetlab as cml
print(f'Climetlab version : {cml.__version__}')
print(f'Climetlab-s2s-ai-challenge plugin version : {climetlab_s2s_ai_challenge.__version__}')
xr.set_options(keep_attrs=True)
xr.set_options(display_style='text')
```
%% Output
Climetlab version : 0.8.6
Climetlab-s2s-ai-challenge plugin version : 0.8.0
<xarray.core.options.set_options at 0x2b51c148f590>
%% Cell type:code id: tags:
``` python
# caching path for climetlab
cache_path = "/work/mh0727/m300524/S2S_AI/cache4" # set your own path
cml.settings.set("cache-directory", cache_path)
```
%% Cell type:code id: tags:
``` python
cache_path = "../data"
```
%% Cell type:markdown id: tags:
# Download and cache
Download all files for the observations, forecasts and hindcasts.
%% Cell type:code id: tags:
``` python
# shortcut
from scripts import download
#download()
```
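%% Cell type:markdown id: tags:
`scripts.download` wraps the download loops shown below; its source appears in the `scripts.py` diff at the end of this page. A typical call might look like this (argument names taken from that diff; defaults are assumptions):
%% Cell type:code id: tags:
``` python
# hedged sketch: download tp and t2m for the ecmwf hindcast and forecast inputs
# center_list also accepts a single string (it is wrapped into a list internally)
download(varlist_forecast=['tp', 't2m'], center_list='ecmwf', format='netcdf')
```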
%% Cell type:markdown id: tags:
## hindcast and forecast `input`
%% Cell type:code id: tags:
``` python
# starting dates forecast_time in 2020
dates = xr.cftime_range(start='20200102',freq='7D', periods=53).strftime('%Y%m%d').to_list()
forecast_dataset_labels = ['training-input','test-input'] # ML community
# equiv to
forecast_dataset_labels = ['hindcast-input','forecast-input'] # NWP community
varlist_forecast = ['tp','t2m'] # can add more
center_list = ['ecmwf'] # 'ncep', 'eccc'
```
%% Cell type:code id: tags:
``` python
%%time
# takes ~ 10-30 min to download for one model one variable depending on number of model realizations
# and download settings https://climetlab.readthedocs.io/en/latest/guide/settings.html
for center in center_list:
    for ds in forecast_dataset_labels:
        cml.load_dataset(f"s2s-ai-challenge-{ds}", origin=center, parameter=varlist_forecast, format='netcdf').to_xarray()
```
%% Cell type:markdown id: tags:
## observations `output-reference`
%% Cell type:code id: tags:
``` python
obs_dataset_labels = ['training-output-reference','test-output-reference'] # ML community
# equiv to
obs_dataset_labels = ['hindcast-like-observations','forecast-like-observations'] # NWP community
varlist_obs = ['tp', 't2m']
```
%% Cell type:code id: tags:
``` python
%%time
# takes 10min to download
for ds in obs_dataset_labels:
    print(ds)
    # only netcdf, no format choice
    cml.load_dataset(f"s2s-ai-challenge-{ds}", date=dates, parameter=varlist_obs).to_xarray()
```
%% Cell type:code id: tags:
``` python
# download obs_time to create output-reference/observations for models other than ecmwf and eccc,
# i.e. ncep or any other S2S or SubX model
obs_time = cml.load_dataset(f"s2s-ai-challenge-observations", parameter=['t2m', 'pr']).to_xarray()
```
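%% Cell type:markdown id: tags:
With `obs_time`, observations can be reshaped to match any model's `forecast_time`/`lead_time` structure. A minimal sketch (the function name appears, truncated, in the dataset attributes below; its exact signature is an assumption):
%% Cell type:code id: tags:
``` python
# hedged sketch: build forecast-like observations for another center, e.g. ncep,
# by aligning the raw observations with that model's valid_time
ncep_forecast = cml.load_dataset("s2s-ai-challenge-forecast-input", origin="ncep",
                                 parameter=varlist_forecast, format="netcdf").to_xarray()
ncep_obs = climetlab_s2s_ai_challenge.extra.forecast_like_observations(ncep_forecast, obs_time)
```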
%% Cell type:markdown id: tags:
# create bi-weekly aggregates
%% Cell type:code id: tags:
``` python
from scripts import aggregate_biweekly, ensure_attributes
#aggregate_biweekly??
```
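%% Cell type:markdown id: tags:
The lead_time attributes set later in this notebook (`week34_t2m: mean[day 14, 27]`, `week34_tp: day 28 minus day 14`, ...) imply the following aggregation logic. This is a hypothetical stand-in for `scripts.aggregate_biweekly`, not its actual source:
%% Cell type:code id: tags:
``` python
import pandas as pd
import xarray as xr

def aggregate_biweekly_sketch(da):
    """Hypothetical stand-in for scripts.aggregate_biweekly.

    Returns the two biweekly aggregates, labeled by their first lead day:
    - t2m: mean over lead days [14, 27] and [28, 41]
    - tp (accumulated since forecast_time): day 28 minus day 14, day 42 minus day 28
    """
    d = lambda n: pd.Timedelta(f"{n} d")
    if da.name == "tp":  # accumulated variable
        w34 = da.sel(lead_time=d(28)) - da.sel(lead_time=d(14))
        w56 = da.sel(lead_time=d(42)) - da.sel(lead_time=d(28))
    else:  # averaged variable like t2m
        w34 = da.sel(lead_time=slice(d(14), d(27))).mean("lead_time")
        w56 = da.sel(lead_time=slice(d(28), d(41))).mean("lead_time")
    return xr.concat([w34, w56], dim="lead_time").assign_coords(lead_time=[d(14), d(28)])
```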
%% Cell type:code id: tags:
``` python
for c, center in enumerate(center_list):  # forecast centers (could also take models)
    for dsl in obs_dataset_labels:  # (+ forecast_dataset_labels) climetlab dataset labels
        for p, parameter in enumerate(varlist_forecast):  # variables
            if c != 0 and 'observation' in dsl:  # only do once for observations
                continue
            print(f"datasetlabel: {dsl}, center: {center}, parameter: {parameter}")
            if 'input' in dsl:
                ds = cml.load_dataset(f"s2s-ai-challenge-{dsl}", origin=center, parameter=parameter, format='netcdf').to_xarray()
            elif 'observation' in dsl:  # obs only netcdf, no format choice
                if parameter not in ['t2m', 'tp']:
                    continue
                ds = cml.load_dataset(f"s2s-ai-challenge-{dsl}", parameter=parameter, date=dates).to_xarray()
            if p == 0:
                ds_biweekly = ds.map(aggregate_biweekly)
            else:
                ds_biweekly[parameter] = ds.map(aggregate_biweekly)[parameter]

        ds_biweekly = ds_biweekly.map(ensure_attributes, biweekly=True)
        ds_biweekly = ds_biweekly.sortby('forecast_time')

        if 'test' in dsl:
            ds_biweekly = ds_biweekly.chunk('auto')
        else:
            ds_biweekly = ds_biweekly.chunk({'forecast_time': 'auto', 'lead_time': -1, 'longitude': -1, 'latitude': -1})

        if 'hindcast' in dsl:
            time = f'{int(ds_biweekly.forecast_time.dt.year.min())}-{int(ds_biweekly.forecast_time.dt.year.max())}'
            if 'input' in dsl:
                name = f'{center}_{dsl}'
            elif 'observations' in dsl:
                name = dsl
        elif 'forecast' in dsl:
            time = '2020'
            if 'input' in dsl:
                name = f'{center}_{dsl}'
            elif 'observations' in dsl:
                name = dsl
        else:
            assert False

        # pattern: {model_if_not_observations}{observations/forecast/hindcast}_{time}_biweekly_deterministic.zarr
        zp = f'{cache_path}/{name}_{time}_biweekly_deterministic.zarr'
        ds_biweekly.attrs.update({'postprocessed_by': 'https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/renku_datasets_biweekly.ipynb'})
        print(f'save to: {zp}')
        ds_biweekly.astype('float32').to_zarr(zp, consolidated=True, mode='w')
```
%% Output
datasetlabel: hindcast-like-observations, center: ecmwf, parameter: tp
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data. This dataset has been dowloaded from IRIDL. By downloading this data you also agree to the terms and conditions defined at https://iridl.ldeo.columbia.edu.
WARNING: ecmwflibs universal: found eccodes at /work/mh0727/m300524/conda-envs/s2s-ai/lib/libeccodes.so
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.18.0
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing with an out-of-order index is generating 20 times more chunks
return self.array[key]
datasetlabel: hindcast-like-observations, center: ecmwf, parameter: t2m
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
... array[indexer]
To avoid creating the large chunks, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
... array[indexer]
return self.array[key]
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing with an out-of-order index is generating 20 times more chunks
return self.array[key]
save to: ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
datasetlabel: forecast-like-observations, center: ecmwf, parameter: tp
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data. This dataset has been dowloaded from IRIDL. By downloading this data you also agree to the terms and conditions defined at https://iridl.ldeo.columbia.edu.
datasetlabel: forecast-like-observations, center: ecmwf, parameter: t2m
save to: ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
%% Cell type:markdown id: tags:
## add to `renku` dataset `s2s-ai-challenge`
%% Cell type:code id: tags:
``` python
# observations as hindcast
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f"{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)
print(obs_2000_2019.sizes,'\n',obs_2000_2019.coords,'\n', obs_2000_2019.nbytes/1e6,'MB')
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 1060, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
492.546744 MB
%% Cell type:code id: tags:
``` python
# observations as forecast
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr(f"{cache_path}/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)
print(obs_2020.sizes,'\n',obs_2020.coords,'\n', obs_2020.nbytes/1e6,'MB')
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 53, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 53), meta=np.ndarray>
24.630096 MB
%% Cell type:code id: tags:
``` python
# ecmwf hindcast-input
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
hind_2000_2019 = xr.open_zarr(f"{cache_path}/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
print(hind_2000_2019.sizes,'\n',hind_2000_2019.coords,'\n', hind_2000_2019.nbytes/1e6,'MB')
# for further use retrieve from git lfs
# !renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 1060, 'latitude': 121, 'lead_time': 2, 'longitude': 240, 'realization': 11}))
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* realization (realization) int64 0 1 2 3 4 5 6 7 8 9 10
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
5417.730832 MB
%% Cell type:code id: tags:
``` python
# ecmwf forecast-input
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
fct_2020 = xr.open_zarr(f"{cache_path}/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
print(fct_2020.sizes,'\n',fct_2020.coords,'\n', fct_2020.nbytes/1e6,'MB')
# for further use retrieve from git lfs
# !renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 53, 'latitude': 121, 'lead_time': 2, 'longitude': 240, 'realization': 51}))
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* realization (realization) int64 0 1 2 3 4 5 6 7 ... 44 45 46 47 48 49 50
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 53), meta=np.ndarray>
1255.926504 MB
%% Cell type:markdown id: tags:
# tercile edges
Create the two tercile edges at the 1/3 and 2/3 quantiles of the 2000-2019 biweekly distribution for each week of the year.
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr', consolidated=True)
```
%% Cell type:code id: tags:
``` python
from scripts import add_year_week_coords
```
%% Cell type:code id: tags:
``` python
# add week for groupby, see https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge/-/issues/29
obs_2000_2019 = add_year_week_coords(obs_2000_2019)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
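%% Cell type:markdown id: tags:
This custom `week` coordinate simply numbers the 53 weekly starting dates within each year (1 to 53); it is not the ISO calendar week (see the issue linked above). A quick check (sketch):
%% Cell type:code id: tags:
``` python
# hedged check: the custom 'week' coord vs. the ISO calendar week
print(obs_2000_2019.week.values[:5])                                  # [1 2 3 4 5]
print(obs_2000_2019.forecast_time.dt.isocalendar().week.values[:5])   # ISO weeks can differ near year boundaries
```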
%% Cell type:code id: tags:
``` python
obs_2000_2019
```
%% Output
<xarray.Dataset>
Dimensions: (forecast_time: 1060, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
week (forecast_time) int64 1 2 3 4 5 6 7 ... 47 48 49 50 51 52 53
year (forecast_time) int64 2000 2000 2000 2000 ... 2019 2019 2019
Data variables:
t2m (lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(2, 530, 121, 240), meta=np.ndarray>
tp (lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(2, 530, 121, 240), meta=np.ndarray>
Attributes:
created_by_script: tools/observations/makefile
created_by_software: climetlab-s2s-ai-challenge
function: climetlab_s2s_ai_challenge.extra.forecast_like_obse...
postprocessed_by: https://renkulab.io/gitlab/aaron.spring/s2s-ai-chal...
regrid_method: conservative
source_dataset_name: NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
%% Cell type:code id: tags:
``` python
tercile_file = f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
```
%% Cell type:code id: tags:
``` python
%%time
# note: quantile over a dask-backed dimension requires that dimension in a single chunk, hence forecast_time=-1
obs_2000_2019.chunk({'forecast_time': -1, 'longitude': 'auto'}).groupby('week').quantile(q=[1./3., 2./3.], dim='forecast_time').rename({'quantile': 'category_edge'}).astype('float32').to_netcdf(tercile_file)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/numpy/lib/nanfunctions.py:1390: RuntimeWarning: All-NaN slice encountered
overwrite_input, interpolation)
CPU times: user 19min 35s, sys: 8min 33s, total: 28min 9s
Wall time: 16min 44s
%% Cell type:code id: tags:
``` python
tercile_edges = xr.open_dataset(tercile_file)
tercile_edges
```
%% Output
<xarray.Dataset>
Dimensions: (category_edge: 2, latitude: 121, lead_time: 2, longitude: 240, week: 53)
Coordinates:
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* category_edge (category_edge) float64 0.3333 0.6667
* week (week) int64 1 2 3 4 5 6 7 8 9 ... 45 46 47 48 49 50 51 52 53
Data variables:
t2m (week, category_edge, lead_time, latitude, longitude) float32 ...
tp (week, category_edge, lead_time, latitude, longitude) float32 ...
Attributes:
created_by_script: tools/observations/makefile
created_by_software: climetlab-s2s-ai-challenge
function: climetlab_s2s_ai_challenge.extra.forecast_like_obse...
postprocessed_by: https://renkulab.io/gitlab/aaron.spring/s2s-ai-chal...
regrid_method: conservative
source_dataset_name: NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
%% Cell type:code id: tags:
``` python
tercile_edges.nbytes*1e-6,'MB'
```
%% Output
(49.255184, 'MB')
%% Cell type:code id: tags:
``` python
# run renku commands from the project's root directory only
# tercile edges
#!renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
#xr.open_dataset("../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc")
```
%% Cell type:markdown id: tags:
# observations in categories
- count how many deterministic forecast realizations fall into each tercile category, as done when computing the RPS (see the sketch below)
- categorize the 2020 forecast-like observations into tercile categories
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr(f'{cache_path}/forecast-like-observations_2020_biweekly_deterministic.zarr', consolidated=True)
obs_2020.sizes
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 53, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
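%% Cell type:markdown id: tags:
For deterministic observations these "probabilities" are one-hot: each value falls below, between, or above the two weekly tercile edges. A minimal sketch of the comparison inside `scripts.make_probabilistic` (full source in the `scripts.py` diff at the end of this page):
%% Cell type:code id: tags:
``` python
# hedged sketch: one-hot tercile categories for deterministic observations
from scripts import add_year_week_coords
edges = tercile_edges.sel(week=add_year_week_coords(obs_2020).week)  # align weekly edges with forecast_time
below  = obs_2020 < edges.isel(category_edge=0, drop=True)
normal = (obs_2020 >= edges.isel(category_edge=0, drop=True)) & (obs_2020 < edges.isel(category_edge=1, drop=True))
above  = obs_2020 >= edges.isel(category_edge=1, drop=True)
# each grid point is True in exactly one category (NaNs aside)
```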
%% Cell type:code id: tags:
``` python
# create a land mask: grid cells without any observations (ocean) are NaN
mask = obs_2020.std(['lead_time','forecast_time']).notnull()
```
%% Cell type:code id: tags:
``` python
# mask.to_array().plot(col='variable')
```
%% Cell type:code id: tags:
``` python
# total precipitation in arid regions is masked
# Frederic Vitart suggested by email: "Based on your map we could mask all the areas where the lower tercile boundary is lower than 0.1 mm"
# we use a dry mask as in https://doi.org/10.1175/MWR-D-17-0092.1
th = 0.01
tp_arid_mask = tercile_edges.tp.isel(category_edge=0, lead_time=0, drop=True) > th
#tp_arid_mask.where(mask.tp).plot(col='forecast_time', col_wrap=4)
#plt.suptitle(f'dry mask: week 3-4 tp 1/3 category_edge > {th} kg m-2',y=1., x=.4)
#plt.savefig('dry_mask.png')
```
%% Cell type:code id: tags:
``` python
# look into tercile edges
```
%% Cell type:code id: tags:
``` python
# tercile_edges.isel(forecast_time=0)['tp'].plot(col='lead_time',row='category_edge', robust=True)
# tercile_edges.isel(forecast_time=[0,20],category_edge=1)['tp'].plot(col='lead_time', row='forecast_time', robust=True)
# tercile_edges.tp.mean(['forecast_time']).plot(col='lead_time',row='category_edge',vmax=.5)
```
%% Cell type:markdown id: tags:
## categorize observations
%% Cell type:markdown id: tags:
### forecast 2020
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
# tp_arid_mask.isel(week=[0,10,20,30,40]).plot(col='week')
```
%% Cell type:code id: tags:
``` python
obs_2020_p = make_probabilistic(obs_2020, tercile_edges, mask=mask)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2020_p.nbytes/1e6, 'MB'
```
%% Output
(147.75984, 'MB')
%% Cell type:code id: tags:
``` python
obs_2020_p
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 53), meta=np.ndarray>
* category (category) <U12 'below normal' 'near normal' 'above normal'
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float64 dask.array<chunksize=(1, 2, 53, 121, 240), meta=np.ndarray>
tp (category, lead_time, forecast_time, latitude, longitude) float64 dask.array<chunksize=(1, 2, 53, 121, 240), meta=np.ndarray>
%% Cell type:code id: tags:
``` python
obs_2020_p.astype('float32').to_netcdf(f'{cache_path}/forecast-like-observations_2020_biweekly_terciled.nc')
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
# forecast-like-observations terciled
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
xr.open_dataset("../data/forecast-like-observations_2020_biweekly_terciled.nc")
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] ...
* category (category) object 'below normal' 'near normal' 'above normal'
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float32 ...
tp (category, lead_time, forecast_time, latitude, longitude) float32 ...
%% Cell type:markdown id: tags:
### hindcast 2000-2019
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr', consolidated=True)
```
%% Cell type:code id: tags:
``` python
obs_2000_2019_p = make_probabilistic(obs_2000_2019, tercile_edges, mask=mask)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2000_2019_p.nbytes/1e6, 'MB'
```
%% Output
(2955.138888, 'MB')
%% Cell type:code id: tags:
``` python
obs_2000_2019_p.astype('float32').chunk('auto').to_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_terciled.zarr', consolidated=True, mode='w')
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
<xarray.backends.zarr.ZarrStore at 0x2b51c1233360>
%% Cell type:code id: tags:
``` python
# hindcast-like-observations terciled
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr")
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 1060, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* category (category) <U12 'below normal' 'near normal' 'above normal'
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(1, 2, 530, 121, 240), meta=np.ndarray>
tp (category, lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(1, 2, 530, 121, 240), meta=np.ndarray>
%% Cell type:code id: tags:
``` python
# checking category frequencies
# o = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr")
# w=0
# v='tp'
# o.sel(forecast_time=o.forecast_time.dt.dayofyear==2+7*w).sum('forecast_time', skipna=False)[v].plot(row='lead_time',col='category', levels=[5.5,6.5,7.5])
# o.sel(forecast_time=o.forecast_time.dt.dayofyear==2+7*w).sum('forecast_time', skipna=False).sum('category', skipna=False)[v].plot(row='lead_time', levels=[16.5,17.5,18.5,19.5,20.5])
```
%% Cell type:markdown id: tags:
# Benchmark
center: ECMWF
The calibration was performed using tercile boundaries from the model climatology rather than from observations. Script by Frederic Vitart.
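%% Cell type:markdown id: tags:
In such a recalibration, tercile probabilities come from counting ensemble members per category against the climatological edges. A minimal sketch of that counting (mirroring `scripts.make_probabilistic` with `member_dim='realization'`; not Frederic Vitart's actual script):
%% Cell type:code id: tags:
``` python
# hedged sketch: ensemble members -> tercile category probabilities
def ensemble_to_tercile_probs(fct, edges, member_dim='realization'):
    bn = (fct < edges.isel(category_edge=0, drop=True)).mean(member_dim)   # below normal
    an = (fct >= edges.isel(category_edge=1, drop=True)).mean(member_dim)  # above normal
    n = 1 - bn - an                                                        # near normal
    return xr.concat([bn, n, an], 'category').assign_coords(category=['below normal', 'near normal', 'above normal'])
```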
%% Cell type:code id: tags:
``` python
bench_p = cml.load_dataset("s2s-ai-challenge-test-output-benchmark", parameter=['tp','t2m']).to_xarray()
```
%% Output
50%|█████ | 1/2 [00:00<00:00, 6.11it/s]
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
100%|██████████| 2/2 [00:00<00:00, 6.89it/s]
WARNING: ecmwflibs universal: found eccodes at /work/mh0727/m300524/conda-envs/s2s-ai/lib/libeccodes.so
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.12.3
%% Cell type:code id: tags:
``` python
bench_p['category'].attrs = {'long_name': 'tercile category probabilities', 'units': '1',
                             'description': 'Probabilities for three tercile categories. All three tercile category probabilities must add up to 1.'}
```
%% Cell type:code id: tags:
``` python
bench_p['lead_time'] = [pd.Timedelta(f"{i} d") for i in [14, 28]]  # take first day of biweekly average as new coordinate
bench_p['lead_time'].attrs = {'long_name': 'forecast_period',
                              'description': 'Forecast period is the time interval between the forecast reference time and the validity time.',
                              'aggregate': 'The pd.Timedelta corresponds to the first day of a biweekly aggregate.',
                              'week34_t2m': 'mean[day 14, 27]',
                              'week56_t2m': 'mean[day 28, 41]',
                              'week34_tp': 'day 28 minus day 14',
                              'week56_tp': 'day 42 minus day 28'}
```
%% Cell type:code id: tags:
``` python
bench_p = bench_p / 100 # convert percent to [0-1] probability
```
%% Cell type:code id: tags:
``` python
bench_p = bench_p.map(ensure_attributes, biweekly=True)
```
%% Output
0%| | 0/1 [00:00<?, ?it/s]
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
100%|██████████| 1/1 [00:00<00:00, 4.34it/s]
%% Cell type:code id: tags:
``` python
# bench_p.isel(forecast_time=2).t2m.plot(row='lead_time', col='category')
```
%% Cell type:code id: tags:
``` python
bench_p
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* category (category) object 'below normal' 'near normal' 'above normal'
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (forecast_time, lead_time) datetime64[ns] 2020-01-16 ... 2...
Data variables:
tp (category, forecast_time, lead_time, latitude, longitude) float32 ...
t2m (category, forecast_time, lead_time, latitude, longitude) float32 ...
%% Cell type:code id: tags:
``` python
bench_p.astype('float32').to_netcdf('../data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc')
```
%% Cell type:code id: tags:
``` python
#!renku dataset add s2s-ai-challenge data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc
```
......
......@@ -19,7 +19,7 @@ def download(varlist_forecast=['tp','t2m'],
    cml.settings.set("cache-directory", cache_path)
    """
    if isinstance(center_list, str):
        center_list = [center_list]
    if isinstance(varlist_forecast, str):
        varlist_forecast = [varlist_forecast]
......@@ -27,11 +27,11 @@ def download(varlist_forecast=['tp','t2m'],
    if forecast_dataset_labels:
        print(f'Downloads variables {varlist_forecast} from datasets {forecast_dataset_labels} from center {center_list} in {format} format.')
        for center in center_list:
            for ds in forecast_dataset_labels:
                for parameter in varlist_forecast:
                    try:
                        cml.load_dataset(f"s2s-ai-challenge-{ds}", origin=center, parameter=varlist_forecast, format=format).to_xarray()
                    except:
                        pass
    if obs_dataset_labels:
......@@ -146,11 +146,25 @@ def ensure_attributes(da, biweekly=False):
    return da


def add_year_week_coords(ds):
    import numpy as np
    if 'week' not in ds.coords and 'year' not in ds.coords:
        year = ds.forecast_time.dt.year.to_index().unique()
        week = list(np.arange(1, 54))
        weeks = week * len(year)
        years = np.repeat(year, len(week))
        ds.coords["week"] = ("forecast_time", weeks)
        ds.coords['week'].attrs['description'] = "This week represents the number of forecast_time starting from 1 to 53. Note: This week is different from the ISO week from groupby('forecast_time.weekofyear'), see https://en.wikipedia.org/wiki/ISO_week_date and https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge/-/issues/29"
        ds.coords["year"] = ("forecast_time", years)
        ds.coords['year'].attrs['long_name'] = "calendar year"
    return ds


def make_probabilistic(ds, tercile_edges, member_dim='realization', mask=None, groupby_coord='week'):
    """Compute probabilities from ds (observations or forecasts) based on tercile_edges."""
    # broadcast
    ds = add_year_week_coords(ds)
    tercile_edges = tercile_edges.sel({groupby_coord: ds.coords[groupby_coord]})
    bn = ds < tercile_edges.isel(category_edge=0, drop=True)  # below normal
    n = (ds >= tercile_edges.isel(category_edge=0, drop=True)) & (ds < tercile_edges.isel(category_edge=1, drop=True))  # normal
    an = ds >= tercile_edges.isel(category_edge=1, drop=True)  # above normal
......@@ -176,12 +190,14 @@ def make_probabilistic(ds, tercile_edges, member_dim='realization', mask=None):
        'comment': 'All three tercile category probabilities must add up to 1.',
        'variable_before_categorization': 'https://confluence.ecmwf.int/display/S2S/S2S+Surface+Air+Temperature'
    }
    if 'year' in ds_p.coords:
        del ds_p.coords['year']
    if groupby_coord in ds_p.coords:
        ds_p = ds_p.drop(groupby_coord)
    return ds_p


def skill_by_year(preds, adapt=False):
    """Returns pd.Dataframe of RPSS per year."""
    # similar to verification_RPSS.ipynb
    # as the scorer bot, but returns a score for each year
......@@ -194,44 +210,49 @@ def skill_by_year(preds):
    # from root
    # renku storage pull data/forecast-like-observations_2020_biweekly_terciled.nc
    # renku storage pull data/hindcast-like-observations_2000-2019_biweekly_terciled.nc
    cache_path = '../data'
    if 2020 in preds.forecast_time.dt.year:
        obs_p = xr.open_dataset(f'{cache_path}/forecast-like-observations_2020_biweekly_terciled.nc').sel(forecast_time=preds.forecast_time)
    else:
        obs_p = xr.open_dataset(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_terciled.zarr', engine='zarr').sel(forecast_time=preds.forecast_time)

    # ML probabilities
    fct_p = preds

    # climatology
    clim_p = xr.DataArray([1/3, 1/3, 1/3], dims='category', coords={'category': ['below normal', 'near normal', 'above normal']}).to_dataset(name='tp')
    clim_p['t2m'] = clim_p['tp']

    if adapt:
        # select only obs_p where fct_p forecasts are provided
        for c in ['longitude', 'latitude', 'forecast_time', 'lead_time']:
            obs_p = obs_p.sel({c: fct_p[c]})
        obs_p = obs_p[list(fct_p.data_vars)]
        clim_p = clim_p[list(fct_p.data_vars)]
    else:
        # check inputs
        assert_predictions_2020(obs_p)
        assert_predictions_2020(fct_p)

    # rps_ML
    rps_ML = xs.rps(obs_p, fct_p, category_edges=None, dim=[], input_distributions='p').compute()
    # rps_clim
    rps_clim = xs.rps(obs_p, clim_p, category_edges=None, dim=[], input_distributions='p').compute()

    ## RPSS
    # penalize # https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/7
    expect = obs_p.sum('category')
    expect = expect.where(expect > 0.98).where(expect < 1.02)  # should be True if not all NaN
    # https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/50
    rps_ML = rps_ML.where(expect, other=2)  # assign RPS=2 where a value was expected but NaN found
    # following Weigel 2007: https://doi.org/10.1175/MWR3280.1
    rpss = 1 - (rps_ML.groupby('forecast_time.year').mean() / rps_clim.groupby('forecast_time.year').mean())
    # clip
    rpss = rpss.clip(-10, 1)

    # weighted area mean
    weights = np.cos(np.deg2rad(np.abs(rpss.latitude)))
......
echo remove previous zarrs from renku dataset
renku dataset unlink s2s-ai-challenge --include "data/**/**" -y

# observations and derived
renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_deterministic.zarr
renku dataset add s2s-ai-challenge -o data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
renku dataset add s2s-ai-challenge -o data/forecast-like-observations_2020_biweekly_terciled.nc
renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr

# benchmark
renku dataset add s2s-ai-challenge -o data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc

# forecast / hindcast
for center in ecmwf; do
    renku dataset add s2s-ai-challenge data/${center}_hindcast-input_2000-2019_biweekly_deterministic.zarr
    renku dataset add s2s-ai-challenge data/${center}_forecast-input_2020_biweekly_deterministic.zarr
done
......@@ -25,4 +40,4 @@ renku dataset ls-files s2s-ai-challenge
renku dataset ls-tags s2s-ai-challenge
echo consider new tag: renku dataset tag -d description s2s-ai-challenge 0.x