name: "base"
channels:
- defaults
# dependencies:
# - add packages here
# - one per line
prefix: "/opt/conda"
\ No newline at end of file
dependencies:
- xarray
# ML
- tensorflow
- pytorch
# viz
- matplotlib-base
# - cartopy
# scoring
- xskillscore>=0.0.20 # includes sklearn
# data access
- intake
- fsspec
- zarr
- s3fs
- intake-xarray
- cfgrib
- eccodes
- nc-time-axis
- pydap
- h5netcdf
- netcdf4
- pip
- pip:
- climetlab >= 0.8.0
- climetlab_s2s_ai_challenge >= 0.7.1
- configargparse # for weatherbench
- netcdf4==1.5.4
prefix: "/opt/conda"
%% Cell type:markdown id: tags:
# Train ML model for predictions of week 3-4 & 5-6
This notebook creates a Machine Learning model (`ML_model`) to predict weeks 3-4 & 5-6 based on `S2S` weeks 3-4 & 5-6 forecasts; the predictions are compared to `CPC` observations for the [`s2s-ai-challenge`](https://s2s-ai-challenge.github.io/).
%% Cell type:markdown id: tags:
# Synopsis
%% Cell type:markdown id: tags:
## Method: `name`
- description
- a few details
%% Cell type:markdown id: tags:
## Data used
Training-input for Machine Learning model:
- renku datasets, climetlab, IRIDL
Forecast-input for Machine Learning model:
- renku datasets, climetlab, IRIDL
Compare Machine Learning model forecast against ground truth:
- renku datasets, climetlab, IRIDL
%% Cell type:markdown id: tags:
## Resources used
for training; details in the Reproducibility section
- platform: renku
- memory: 8 GB
- processors: 2 CPU
- storage required: 10 GB
%% Cell type:markdown id: tags:
## Safeguards
All points have to be [x] checked. If not, your submission is invalid.
Changes to the code after submission are not possible, as the `commit` before the `tag` will be reviewed.
(Only in exceptional cases, and where prior effort toward reproducibility is evident, may improvements to readability and reproducibility be allowed after November 1st 2021.)
%% Cell type:markdown id: tags:
### Safeguards to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting?wprov=sfti1)
If the organizers suspect overfitting, your contribution can be disqualified.
- [ ] We did not use 2020 observations in training (explicit overfitting and cheating)
- [ ] We did not repeatedly verify our model on 2020 observations and incrementally improve our RPSS (implicit overfitting)
- [ ] We provide RPSS scores for the training period with the script `skill_by_year`; see section 6.3 `predict`.
- [ ] We tried our best to prevent [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)?wprov=sfti1).
- [ ] We honor the `train-validate-test` [split principle](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets). This means that the hindcast data is split into `train` and `validate`, whereas `test` is withheld.
- [ ] We did not use `test` explicitly in training or implicitly in incrementally adjusting parameters.
- [ ] We considered [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).
%% Cell type:markdown id: tags:
### Safeguards for Reproducibility
Notebook/code must be independently reproducible from scratch by the organizers (after the competition); if this is not possible: no prize
- [ ] All training data is publicly available (no pre-trained private neural networks, as they are not reproducible for us)
- [ ] Code is well documented, readable and reproducible.
- [ ] Code to reproduce training and predictions should preferably run within a day on the described architecture. If training takes longer than a day, please justify why this is needed. Please do not submit training pipelines that take weeks to train.
%% Cell type:markdown id: tags:
# Todos to improve template
This is just a demo.
- [ ] for both variables
- [ ] for both `lead_time`s
- [ ] ensure probabilistic prediction outcome with `category` dim
%% Cell type:markdown id: tags:
# Imports
%% Cell type:code id: tags:
``` python
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Sequential
import matplotlib.pyplot as plt
import xarray as xr
xr.set_options(display_style='text')
from dask.utils import format_bytes
import xskillscore as xs
```
%% Cell type:markdown id: tags:
# Get training data
preprocessing of the input data may be done in a separate notebook/script
%% Cell type:markdown id: tags:
## Hindcast
get weekly initialized hindcasts
%% Cell type:code id: tags:
``` python
# consider renku datasets
#! renku storage pull path
```
%% Cell type:code id: tags:
``` python
```
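%% Cell type:markdown id: tags:
Alternatively, the hindcast input can be fetched with `climetlab`. A minimal sketch, assuming the `climetlab_s2s_ai_challenge` plugin is installed (see the environment file above); the argument values follow the catalog at the end of this document and may need adjusting:
%% Cell type:code id: tags:
``` python
import climetlab as cml

# sketch: download the ECMWF t2m hindcast training input and open it as xarray
ds = cml.load_dataset('s2s-ai-challenge-training-input',
                      origin='ecmwf', parameter='t2m', format='netcdf').to_xarray()
```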
%% Cell type:markdown id: tags:
## Observations
corresponding to hindcasts
%% Cell type:code id: tags:
``` python
# consider renku datasets
#! renku storage pull path
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# ML model
%% Cell type:code id: tags:
``` python
bs = 32
import numpy as np
import tensorflow.keras as keras  # needed for keras.utils.Sequence

class DataGenerator(keras.utils.Sequence):
    def __init__(self, data=None, verif_data=None, batch_size=bs, shuffle=True, load=True):
        """
        Data generator
        Template from https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly

        Args:
            data: input fields (xr.DataArray with a `time` dimension); fill in
            verif_data: verifying fields aligned with `data`; fill in
            batch_size: batch size
            shuffle: bool. If True, data is shuffled.
            load: bool. If True, data is loaded into RAM.
        """
        self.data = data
        self.verif_data = verif_data
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.n_samples = data.sizes['time'] if data is not None else 0
        self.on_epoch_end()
        # For some weird reason calling .load() earlier messes up the mean and std computations
        if load and data is not None:
            print('Loading data into RAM')
            self.data.load()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.ceil(self.n_samples / self.batch_size))

    def __getitem__(self, i):
        'Generate one batch of data'
        idxs = self.idxs[i * self.batch_size:(i + 1) * self.batch_size]
        # got all NaN if NaNs not masked
        X = self.data.isel(time=idxs).fillna(0.).values
        y = self.verif_data.isel(time=idxs).fillna(0.).values
        return X, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.idxs = np.arange(self.n_samples)
        if self.shuffle:
            np.random.shuffle(self.idxs)
```
%% Cell type:markdown id: tags:
## data prep: train, valid, test
%% Cell type:code id: tags:
``` python
# time is the forecast_reference_time
time_train_start,time_train_end='2000','2017'
time_valid_start,time_valid_end='2018','2019'
time_test = '2020'
```
%% Cell type:code id: tags:
``` python
dg_train = DataGenerator()
```
%% Cell type:code id: tags:
``` python
dg_valid = DataGenerator()
```
%% Cell type:code id: tags:
``` python
dg_test = DataGenerator()
```
%% Cell type:markdown id: tags:
## `fit`
%% Cell type:code id: tags:
``` python
cnn = keras.models.Sequential([])
```
%% Cell type:code id: tags:
``` python
cnn.summary()
```
%% Cell type:code id: tags:
``` python
cnn.compile(keras.optimizers.Adam(1e-4), 'mse')
```
%% Cell type:code id: tags:
``` python
import warnings
warnings.simplefilter("ignore")
```
%% Cell type:code id: tags:
``` python
cnn.fit(dg_train, epochs=1, validation_data=dg_valid)
```
%% Cell type:markdown id: tags:
## `predict`
Create predictions and print the RPSS, averaged over `variable`, `lead_time`, `longitude` and cosine-weighted `latitude`, for all years as calculated by `skill_by_year`.
%% Cell type:code id: tags:
``` python
from scripts import skill_by_year
```
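%% Cell type:markdown id: tags:
Conceptually, `skill_by_year` computes a per-year RPSS of the submitted probabilities against the climatological 1/3-per-tercile forecast. A minimal sketch of the idea, assuming terciled observations `obs_p` and climatology `clim_p` as in the verification notebook at the end of this document, and ignoring the cosine-latitude weighting; `scripts.py` holds the authoritative implementation:
``` python
# sketch only, not scripts.skill_by_year: per-year RPSS vs. climatology
def skill_by_year_sketch(preds_p, obs_p, clim_p):
    rps_ml = xs.rps(obs_p, preds_p, category_edges=None, dim=[], input_distributions='p')
    rps_clim = xs.rps(obs_p, clim_p, category_edges=None, dim=[], input_distributions='p')
    return 1 - rps_ml.groupby('forecast_time.year').mean() / rps_clim.groupby('forecast_time.year').mean()
```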
%% Cell type:code id: tags:
``` python
def create_predictions(model, dg):
    """Create non-iterative predictions"""
    preds = model.predict(dg).squeeze()
    # transform
    return preds
```
%% Cell type:markdown id: tags:
### `predict` training period in-sample
%% Cell type:code id: tags:
``` python
preds_is = create_predictions(cnn, dg_train)
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_is)
```
%% Cell type:markdown id: tags:
### `predict` valid out-of-sample
%% Cell type:code id: tags:
``` python
preds_os = create_predictions(cnn, dg_valid)
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_os)
```
%% Cell type:markdown id: tags:
### `predict` test
%% Cell type:code id: tags:
``` python
preds_test = create_predictions(cnn, dg_test)
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_test)
```
%% Cell type:markdown id: tags:
# Submission
%% Cell type:code id: tags:
``` python
preds_test.sizes # expect: category(3), longitude, latitude, lead_time(2), forecast_time (53)
```
%% Cell type:code id: tags:
``` python
from scripts import assert_predictions_2020
assert_predictions_2020(preds_test)
```
%% Cell type:code id: tags:
``` python
preds_test.to_netcdf('../submissions/ML_prediction_2020.nc')
```
%% Cell type:code id: tags:
``` python
#!git add ../submissions/ML_prediction_2020.nc
#!git add ML_forecast_template.ipynb
```
%% Cell type:code id: tags:
``` python
#!git commit -m "commit submission for my_method_name" # whatever message you want
```
%% Cell type:code id: tags:
``` python
#!git tag "submission-my_method_name-0.0.1" # if this is to be checked by scorer, only the last submitted==tagged version will be considered
```
%% Cell type:code id: tags:
``` python
#!git push --tags
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Reproducibility
%% Cell type:markdown id: tags:
## memory
%% Cell type:code id: tags:
``` python
# https://phoenixnap.com/kb/linux-commands-check-memory-usage
!free -g
```
%% Cell type:markdown id: tags:
## CPU
%% Cell type:code id: tags:
``` python
!lscpu
```
%% Cell type:markdown id: tags:
## software
%% Cell type:code id: tags:
``` python
!conda list
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Train ML model to correct predictions of week 3-4 & 5-6
This notebook creates a Machine Learning model (`ML_model`) to predict weeks 3-4 & 5-6 based on `S2S` weeks 3-4 & 5-6 forecasts; the predictions are compared to `CPC` observations for the [`s2s-ai-challenge`](https://s2s-ai-challenge.github.io/).
%% Cell type:markdown id: tags:
# Synopsis
%% Cell type:markdown id: tags:
## Method: `ML-based mean bias reduction`
- calculate the ML-based bias from the 2000-2019 deterministic ensemble-mean hindcast
- remove that ML-based bias from the 2020 deterministic ensemble-mean forecast
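In plain `xarray`, a non-ML mean bias reduction would look like the following sketch (using the datasets opened below and ignoring seasonality, cf. the Todos); this notebook instead learns the correction with a CNN:
``` python
# sketch of plain mean bias reduction, not the CNN approach used below:
# climatological bias of the ensemble-mean hindcast vs. observations
bias = (hind_2000_2019.mean('realization') - obs_2000_2019).mean('forecast_time')
# remove that bias from the 2020 ensemble-mean forecast
fct_2020_debiased = fct_2020.mean('realization') - bias
```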
%% Cell type:markdown id: tags:
## Data used
type: renku datasets
Training-input for Machine Learning model:
- hindcasts of models:
  - ECMWF: `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`

Forecast-input for Machine Learning model:
- real-time 2020 forecasts of models:
  - ECMWF: `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`

Compare Machine Learning model forecast against ground truth:
- `CPC` observations:
  - `hindcast-like-observations_biweekly_deterministic.zarr`
  - `forecast-like-observations_2020_biweekly_deterministic.zarr`
%% Cell type:markdown id: tags:
## Resources used
for training; details in the Reproducibility section
- platform: renku
- memory: 8 GB
- processors: 2 CPU
- storage required: 10 GB
%% Cell type:markdown id: tags:
## Safeguards
All points have to be [x] checked. If not, your submission is invalid.
Changes to the code after submission are not possible, as the `commit` before the `tag` will be reviewed.
(Only in exceptional cases, and where prior effort toward reproducibility is evident, may improvements to readability and reproducibility be allowed after November 1st 2021.)
%% Cell type:markdown id: tags:
### Safeguards to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting?wprov=sfti1)
If the organizers suspect overfitting, your contribution can be disqualified.
- [x] We did not use 2020 observations in training (explicit overfitting and cheating)
- [x] We did not repeatedly verify our model on 2020 observations and incrementally improve our RPSS (implicit overfitting)
- [x] We provide RPSS scores for the training period with the script `print_RPS_per_year`; see section 6.3 `predict`.
- [x] We tried our best to prevent [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)?wprov=sfti1).
- [x] We honor the `train-validate-test` [split principle](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets). This means that the hindcast data is split into `train` and `validate`, whereas `test` is withheld.
- [x] We did not use `test` explicitly in training or implicitly in incrementally adjusting parameters.
- [x] We considered [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).
%% Cell type:markdown id: tags:
### Safeguards for Reproducibility
Notebook/code must be independently reproducible from scratch by the organizers (after the competition); if this is not possible: no prize
- [x] All training data is publicly available (no pre-trained private neural networks, as they are not reproducible for us)
- [x] Code is well documented, readable and reproducible.
- [x] Code to reproduce training and predictions should preferably run within a day on the described architecture. If training takes longer than a day, please justify why this is needed. Please do not submit training pipelines that take weeks to train.
%% Cell type:markdown id: tags:
# Todos to improve template
This is just a demo.
- [ ] use multiple predictor variables and two predicted variables
- [ ] for both `lead_time`s in one go
- [ ] consider seasonality, for now all `forecast_time` months are mixed
- [ ] make probabilistic predictions with a `category` dim; for now the model is deterministic
%% Cell type:markdown id: tags:
# Imports
%% Cell type:code id: tags:
``` python
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Sequential
import matplotlib.pyplot as plt
import xarray as xr
xr.set_options(display_style='text')
import numpy as np
from dask.utils import format_bytes
import xskillscore as xs
```
%% Cell type:markdown id: tags:
# Get training data
preprocessing of the input data may be done in a separate notebook/script
%% Cell type:markdown id: tags:
## Hindcast
get weekly initialized hindcasts
%% Cell type:code id: tags:
``` python
v='t2m'
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
hind_2000_2019 = xr.open_zarr("../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
fct_2020 = xr.open_zarr("../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:markdown id: tags:
## Observations
corresponding to hindcasts
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)#[v]
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr("../data/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)#[v]
```
%% Cell type:markdown id: tags:
# ML model
%% Cell type:markdown id: tags:
based on [Weatherbench](https://github.com/pangeo-data/WeatherBench/blob/master/quickstart.ipynb)
%% Cell type:code id: tags:
``` python
# run once only and don't commit the clone
!git clone https://github.com/pangeo-data/WeatherBench/
```
%% Output
fatal: destination path 'WeatherBench' already exists and is not an empty directory.
%% Cell type:code id: tags:
``` python
import sys
sys.path.insert(1, 'WeatherBench')
from WeatherBench.src.train_nn import DataGenerator, PeriodicConv2D, create_predictions
import tensorflow.keras as keras
```
%% Cell type:code id: tags:
``` python
bs = 32
import numpy as np

class DataGenerator(keras.utils.Sequence):
    def __init__(self, fct, verif, lead_time, batch_size=bs, shuffle=True, load=True,
                 mean=None, std=None):
        """
        Data generator for WeatherBench data.
        Template from https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly

        Args:
            fct: forecasts from S2S models: xr.DataArray (xr.Dataset doesn't work properly)
            verif: observations with same dimensionality (xr.Dataset doesn't work properly)
            lead_time: lead_time as in model
            batch_size: batch size
            shuffle: bool. If True, data is shuffled.
            load: bool. If True, dataset is loaded into RAM.
            mean: If None, compute mean from data.
            std: If None, compute standard deviation from data.

        Todo:
            - use `number` (ensemble members) in a better way; for now only the ensemble-mean forecast is used
            - don't use .sel(lead_time=lead_time), to train over all lead_time at once
            - be sensitive with forecast_time, pool a few around the given weekofyear
            - use more variables as predictors
            - predict more variables
        """
        if isinstance(fct, xr.Dataset):
            print('convert fct to array')
            fct = fct.to_array().transpose(..., 'variable')
            self.fct_dataset = True
        else:
            self.fct_dataset = False
        if isinstance(verif, xr.Dataset):
            print('convert verif to array')
            verif = verif.to_array().transpose(..., 'variable')
            self.verif_dataset = True
        else:
            self.verif_dataset = False

        # self.fct = fct
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.lead_time = lead_time

        self.fct_data = fct.transpose('forecast_time', ...).sel(lead_time=lead_time)
        self.fct_mean = self.fct_data.mean('forecast_time').compute() if mean is None else mean
        self.fct_std = self.fct_data.std('forecast_time').compute() if std is None else std

        self.verif_data = verif.transpose('forecast_time', ...).sel(lead_time=lead_time)
        self.verif_mean = self.verif_data.mean('forecast_time').compute() if mean is None else mean
        self.verif_std = self.verif_data.std('forecast_time').compute() if std is None else std

        # Normalize
        self.fct_data = (self.fct_data - self.fct_mean) / self.fct_std
        self.verif_data = (self.verif_data - self.verif_mean) / self.verif_std

        self.n_samples = self.fct_data.forecast_time.size
        self.forecast_time = self.fct_data.forecast_time

        self.on_epoch_end()
        # For some weird reason calling .load() earlier messes up the mean and std computations
        if load:
            # print('Loading data into RAM')
            self.fct_data.load()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.ceil(self.n_samples / self.batch_size))

    def __getitem__(self, i):
        'Generate one batch of data'
        idxs = self.idxs[i * self.batch_size:(i + 1) * self.batch_size]
        # got all NaN if NaNs not masked
        X = self.fct_data.isel(forecast_time=idxs).fillna(0.).values
        y = self.verif_data.isel(forecast_time=idxs).fillna(0.).values
        return X, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.idxs = np.arange(self.n_samples)
        if self.shuffle:
            np.random.shuffle(self.idxs)
```
%% Cell type:code id: tags:
``` python
# first of the two bi-weekly `lead_time`s: week 3-4
lead = hind_2000_2019.isel(lead_time=0).lead_time
lead
```
%% Output
<xarray.DataArray 'lead_time' ()>
array(1209600000000000, dtype='timedelta64[ns]')
Coordinates:
lead_time timedelta64[ns] 14 days
Attributes:
aggregate: The pd.Timedelta corresponds to the first day of a biweek...
description: Forecast period is the time interval between the forecast...
long_name: lead time
standard_name: forecast_period
week34_t2m: mean[14 days, 27 days]
week34_tp: 28 days minus 14 days
week56_t2m: mean[28 days, 41 days]
week56_tp: 42 days minus 28 days
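%% Cell type:markdown id: tags:
In other words, biweekly `t2m` is the mean over lead days 14-27 (week 3-4), while biweekly `tp`, being accumulated since initialization, is differenced. A sketch of that aggregation, assuming a hypothetical daily-resolution dataset `daily` with a timedelta `lead_time` dimension:
``` python
import pandas as pd

# sketch of the week 3-4 aggregation encoded in the attributes above;
# `daily` is a hypothetical daily-resolution dataset
week34_t2m = daily['t2m'].sel(lead_time=slice(pd.Timedelta('14 d'), pd.Timedelta('27 d'))).mean('lead_time')
week34_tp = daily['tp'].sel(lead_time=pd.Timedelta('28 d')) - daily['tp'].sel(lead_time=pd.Timedelta('14 d'))
```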
%% Cell type:code id: tags:
``` python
# mask, needed?
hind_2000_2019 = hind_2000_2019.where(obs_2000_2019.isel(forecast_time=0, lead_time=0,drop=True).notnull())
```
%% Cell type:markdown id: tags:
## data prep: train, valid, test
[Use the hindcast period to split train and valid.](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) Do not use the 2020 data for testing!
%% Cell type:code id: tags:
``` python
# time is the forecast_time
time_train_start,time_train_end='2000','2017' # train
time_valid_start,time_valid_end='2018','2019' # valid
time_test = '2020' # test
```
%% Cell type:code id: tags:
``` python
dg_train = DataGenerator(
    hind_2000_2019.mean('realization').sel(forecast_time=slice(time_train_start, time_train_end))[v],
    obs_2000_2019.sel(forecast_time=slice(time_train_start, time_train_end))[v],
    lead_time=lead, batch_size=bs, load=True)
```
%% Output
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
dg_valid = DataGenerator(
    hind_2000_2019.mean('realization').sel(forecast_time=slice(time_valid_start, time_valid_end))[v],
    obs_2000_2019.sel(forecast_time=slice(time_valid_start, time_valid_end))[v],
    lead_time=lead, batch_size=bs, shuffle=False, load=True)
```
%% Output
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
# do not use, delete?
dg_test = DataGenerator(
    fct_2020.mean('realization').sel(forecast_time=time_test)[v],
    obs_2020.sel(forecast_time=time_test)[v],
    lead_time=lead, batch_size=bs, load=True, mean=dg_train.fct_mean, std=dg_train.fct_std, shuffle=False)
```
%% Cell type:code id: tags:
``` python
X, y = dg_valid[0]
X.shape, y.shape
```
%% Output
((32, 121, 240), (32, 121, 240))
%% Cell type:code id: tags:
``` python
# short look into training data: large biases
# any problem from normalizing?
# i=4
# xr.DataArray(np.vstack([X[i],y[i]])).plot(yincrease=False, robust=True)
```
%% Cell type:markdown id: tags:
## `fit`
%% Cell type:code id: tags:
``` python
cnn = keras.models.Sequential([
    PeriodicConv2D(filters=32, kernel_size=5, conv_kwargs={'activation': 'relu'}, input_shape=(32, 64, 1)),
    PeriodicConv2D(filters=1, kernel_size=5)
])
```
%% Output
WARNING:tensorflow:AutoGraph could not transform <bound method PeriodicPadding2D.call of <WeatherBench.src.train_nn.PeriodicPadding2D object at 0x7f86042986a0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <bound method PeriodicPadding2D.call of <WeatherBench.src.train_nn.PeriodicPadding2D object at 0x7f86042986a0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
%% Cell type:code id: tags:
``` python
cnn.summary()
```
%% Output
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
periodic_conv2d (PeriodicCon (None, 32, 64, 32) 832
_________________________________________________________________
periodic_conv2d_1 (PeriodicC (None, 32, 64, 1) 801
=================================================================
Total params: 1,633
Trainable params: 1,633
Non-trainable params: 0
_________________________________________________________________
%% Cell type:code id: tags:
``` python
cnn.compile(keras.optimizers.Adam(1e-4), 'mse')
```
%% Cell type:code id: tags:
``` python
import warnings
warnings.simplefilter("ignore")
```
%% Cell type:code id: tags:
``` python
cnn.fit(dg_train, epochs=2, validation_data=dg_valid)
```
%% Output
Epoch 1/2
30/30 [==============================] - 58s 2s/step - loss: 0.1472 - val_loss: 0.0742
Epoch 2/2
30/30 [==============================] - 45s 1s/step - loss: 0.0712 - val_loss: 0.0545
<tensorflow.python.keras.callbacks.History at 0x7f865c2103d0>
%% Cell type:markdown id: tags:
## `predict`
Create predictions and print the RPSS, averaged over `variable`, `lead_time`, `longitude` and cosine-weighted `latitude`, for all years as calculated by `skill_by_year`.
%% Cell type:code id: tags:
``` python
from scripts import add_valid_time_from_forecast_reference_time_and_lead_time

def _create_predictions(model, dg, lead):
    """Create non-iterative predictions"""
    preds = model.predict(dg).squeeze()
    # Unnormalize
    preds = preds * dg.fct_std.values + dg.fct_mean.values
    if dg.verif_dataset:
        da = xr.DataArray(
            preds,
            dims=['forecast_time', 'latitude', 'longitude', 'variable'],
            coords={'forecast_time': dg.fct_data.forecast_time, 'latitude': dg.fct_data.latitude,
                    'longitude': dg.fct_data.longitude},
        ).to_dataset()  # doesn't work yet
    else:
        da = xr.DataArray(
            preds,
            dims=['forecast_time', 'latitude', 'longitude'],
            coords={'forecast_time': dg.fct_data.forecast_time, 'latitude': dg.fct_data.latitude,
                    'longitude': dg.fct_data.longitude},
        )
    da = da.assign_coords(lead_time=lead)
    # da = add_valid_time_from_forecast_reference_time_and_lead_time(da)
    return da
```
%% Cell type:code id: tags:
``` python
# optionally masking the ocean when making probabilistic
mask = obs_2020.std(['lead_time','forecast_time']).notnull()
```
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
cache_path='../data'
tercile_file = f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
tercile_edges = xr.open_dataset(tercile_file)
```
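%% Cell type:markdown id: tags:
`make_probabilistic` turns the deterministic prediction into tercile-category probabilities using these edges. A minimal sketch of the idea, counting the fraction of ensemble members per category (the `category_edge` dim name and edge ordering are assumptions; `scripts.py` holds the authoritative implementation):
``` python
# sketch only, not scripts.make_probabilistic
def probabilities_from_ensemble(fct, edges):
    below = (fct < edges.isel(category_edge=0)).mean('realization')   # assumed dim name
    above = (fct >= edges.isel(category_edge=1)).mean('realization')
    near = 1. - below - above
    return xr.concat([below, near, above], dim='category').assign_coords(
        category=['below normal', 'near normal', 'above normal'])
```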
%% Cell type:code id: tags:
``` python
# this prediction is not skillful yet, but the results have the expected dimensions
# todo: actually train for each lead_time
def create_predictions(cnn, fct, obs, time):
    preds_test = []
    for lead in fct.lead_time:
        dg = DataGenerator(fct.mean('realization').sel(forecast_time=time)[v],
                           obs.sel(forecast_time=time)[v],
                           lead_time=lead, batch_size=bs, mean=dg_train.fct_mean, std=dg_train.fct_std, shuffle=False)
        preds_test.append(_create_predictions(cnn, dg, lead))
    preds_test = xr.concat(preds_test, 'lead_time')
    preds_test['lead_time'] = fct.lead_time
    # add valid_time coord
    preds_test = add_valid_time_from_forecast_reference_time_and_lead_time(preds_test)
    preds_test = preds_test.to_dataset(name=v)
    # add fake var
    preds_test['tp'] = preds_test['t2m']
    # make probabilistic
    preds_test = make_probabilistic(preds_test.expand_dims('realization'), tercile_edges, mask=mask)
    return preds_test
```
%% Cell type:markdown id: tags:
### `predict` training period in-sample
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
from scripts import skill_by_year
import os
if os.environ['HOME'] == '/home/jovyan':
    import pandas as pd
    # assume on renku with small memory
    step = 2
    skill_list = []
    for year in np.arange(int(time_train_start), int(time_train_end) - 1, step):  # loop over years to consume less memory on renku
        preds_is = create_predictions(cnn, hind_2000_2019, obs_2000_2019,
                                      time=slice(str(year), str(year + step - 1))).compute()
        skill_list.append(skill_by_year(preds_is))
    skill = pd.concat(skill_list)
else:  # with larger memory, simply do
    preds_is = create_predictions(cnn, hind_2000_2019, obs_2000_2019,
                                  time=slice(time_train_start, time_train_end))
    skill = skill_by_year(preds_is)
skill
```
%% Output
RPSS
year
2000 -0.862483
2001 -1.015485
2002 -1.101022
2003 -1.032647
2004 -1.056348
2005 -1.165675
2006 -1.057217
2007 -1.170849
2008 -1.049785
2009 -1.169108
2010 -1.130845
2011 -1.052670
2012 -1.126449
2013 -1.126930
2014 -1.095896
2015 -1.117486
%% Cell type:markdown id: tags:
### `predict` validation period out-of-sample
%% Cell type:code id: tags:
``` python
preds_os = create_predictions(cnn, hind_2000_2019, obs_2000_2019, time=slice(time_valid_start, time_valid_end))
skill_by_year(preds_os)
```
%% Output
RPSS
year
2018 -1.099744
2019 -1.172401
%% Cell type:markdown id: tags:
### `predict` test
%% Cell type:code id: tags:
``` python
preds_test = create_predictions(cnn, fct_2020, obs_2020, time=time_test)
skill_by_year(preds_test)
```
%% Output
RPSS
year
2020 -1.076834
%% Cell type:markdown id: tags:
# Submission
%% Cell type:code id: tags:
``` python
from scripts import assert_predictions_2020
assert_predictions_2020(preds_test)
```
%% Cell type:code id: tags:
``` python
preds_test.to_netcdf('../submissions/ML_prediction_2020.nc')
```
%% Cell type:code id: tags:
``` python
# !git add ../submissions/ML_prediction_2020.nc
# !git add ML_train_and_prediction.ipynb
```
%% Cell type:code id: tags:
``` python
# !git commit -m "template_test commit message" # whatever message you want
```
%% Cell type:code id: tags:
``` python
# !git tag "submission-template_test-0.0.1" # if this is to be checked by scorer, only the last submitted==tagged version will be considered
```
%% Cell type:code id: tags:
``` python
# !git push --tags
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Reproducibility
%% Cell type:markdown id: tags:
## memory
%% Cell type:code id: tags:
``` python
# https://phoenixnap.com/kb/linux-commands-check-memory-usage
!free -g
```
%% Output
total used free shared buff/cache available
Mem: 31 7 11 0 12 24
Swap: 0 0 0
%% Cell type:markdown id: tags:
## CPU
%% Cell type:code id: tags:
``` python
!lscpu
```
%% Output
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 40 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel Xeon Processor (Skylake, IBRS)
Stepping: 4
CPU MHz: 2095.078
BogoMIPS: 4190.15
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 256 KiB
L1i cache: 256 KiB
L2 cache: 32 MiB
L3 cache: 128 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cach
e flushes, SMT disabled
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no mic
rocode; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user
pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, IBPB condit
ional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtr
r pge mca cmov pat pse36 clflush mmx fxsr sse s
se2 syscall nx pdpe1gb rdtscp lm constant_tsc r
ep_good nopl xtopology cpuid tsc_known_freq pni
pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_
2 x2apic movbe popcnt tsc_deadline_timer aes xs
ave avx f16c rdrand hypervisor lahf_lm abm 3dno
wprefetch cpuid_fault invpcid_single pti ibrs i
bpb tpr_shadow vnmi flexpriority ept vpid ept_a
d fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx
512f avx512dq rdseed adx smap clwb avx512cd avx
512bw avx512vl xsaveopt xsavec xgetbv1 arat pku
ospke
%% Cell type:markdown id: tags:
## software
%% Cell type:code id: tags:
``` python
!conda list
```
%% Output
# packages in environment at /opt/conda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
_pytorch_select 0.1 cpu_0 defaults
_tflow_select 2.3.0 mkl defaults
absl-py 0.13.0 py38h06a4308_0 defaults
aiobotocore 1.4.1 pyhd3eb1b0_0 defaults
aiohttp 3.7.4.post0 py38h7f8727e_2 defaults
aioitertools 0.7.1 pyhd3eb1b0_0 defaults
alembic 1.4.3 pyh9f0ad1d_0 conda-forge
ansiwrap 0.8.4 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
argcomplete 1.12.3 pypi_0 pypi
argon2-cffi 20.1.0 py38h497a2fe_2 conda-forge
argparse 1.4.0 pypi_0 pypi
asciitree 0.3.3 py_2 defaults
astor 0.8.1 py38h06a4308_0 defaults
astunparse 1.6.3 py_0 defaults
async-timeout 3.0.1 pypi_0 pypi
async_generator 1.10 py_0 conda-forge
attrs 21.2.0 pypi_0 pypi
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.1 py_0 conda-forge
bagit 1.8.1 pypi_0 pypi
beautifulsoup4 4.10.0 pyh06a4308_0 defaults
binutils_impl_linux-64 2.35.1 h193b22a_1 conda-forge
binutils_linux-64 2.35 h67ddf6f_30 conda-forge
black 20.8b1 pypi_0 pypi
blas 1.0 mkl defaults
bleach 3.2.1 pyh9f0ad1d_0 conda-forge
blinker 1.4 py_1 conda-forge
bokeh 2.3.3 py38h06a4308_0 defaults
botocore 1.20.106 pyhd3eb1b0_0 defaults
bottleneck 1.3.2 py38heb32a55_1 defaults
bracex 2.1.1 pypi_0 pypi
branca 0.3.1 pypi_0 pypi
brotli 1.0.9 he6710b0_2 defaults
brotlipy 0.7.0 py38h497a2fe_1001 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.1 h36c2ea0_0 conda-forge
ca-certificates 2021.7.5 h06a4308_1 defaults
cachecontrol 0.12.6 pypi_0 pypi
cachetools 4.2.4 pypi_0 pypi
calamus 0.3.12 pypi_0 pypi
cdsapi 0.5.1 pypi_0 pypi
certifi 2021.5.30 pypi_0 pypi
certipy 0.1.3 py_0 conda-forge
cffi 1.14.6 pypi_0 pypi
cfgrib 0.9.9.0 pyhd8ed1ab_1 conda-forge
cftime 1.5.0 py38h6323ea4_0 defaults
chardet 3.0.4 pypi_0 pypi
click 7.1.2 pypi_0 pypi
click-completion 0.5.2 pypi_0 pypi
click-option-group 0.5.3 pypi_0 pypi
click-plugins 1.1.1 pypi_0 pypi
climetlab 0.8.31 pypi_0 pypi
climetlab-s2s-ai-challenge 0.8.0 pypi_0 pypi
cloudpickle 2.0.0 pyhd3eb1b0_0 defaults
colorama 0.4.4 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
commonmark 0.9.1 pypi_0 pypi
conda 4.9.2 py38h578d9bd_0 conda-forge
conda-package-handling 1.7.2 py38h8df0ef7_0 conda-forge
configargparse 1.5.2 pypi_0 pypi
configurable-http-proxy 1.3.0 0 conda-forge
coverage 5.5 py38h27cfd23_2 defaults
cryptography 3.4.8 pypi_0 pypi
curl 7.71.1 he644dc0_8 conda-forge
cwlgen 0.4.2 pypi_0 pypi
cwltool 3.1.20211004060744 pypi_0 pypi
cycler 0.10.0 py38_0 defaults
cython 0.29.24 py38h295c915_0 defaults
cytoolz 0.11.0 py38h7b6447c_0 defaults
dask 2021.8.1 pyhd3eb1b0_0 defaults
dask-core 2021.8.1 pyhd3eb1b0_0 defaults
dataclasses 0.8 pyh6d0b6a4_7 defaults
decorator 4.4.2 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
distributed 2021.8.1 py38h06a4308_0 defaults
distro 1.5.0 pypi_0 pypi
docopt 0.6.2 py38h06a4308_0 defaults
eccodes 2.21.0 ha0e6eb6_0 conda-forge
ecmwf-api-client 1.6.1 pypi_0 pypi
ecmwflibs 0.3.14 pypi_0 pypi
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
environ-config 21.2.0 pypi_0 pypi
fasteners 0.16.3 pyhd3eb1b0_0 defaults
filelock 3.0.12 pypi_0 pypi
findlibs 0.0.2 pypi_0 pypi
fonttools 4.25.0 pyhd3eb1b0_0 defaults
freetype 2.10.4 h5ab3b9f_0 defaults
frozendict 2.0.6 pypi_0 pypi
fsspec 2021.7.0 pyhd3eb1b0_0 defaults
gast 0.4.0 pyhd3eb1b0_0 defaults
gcc_impl_linux-64 9.3.0 h70c0ae5_18 conda-forge
gcc_linux-64 9.3.0 hf25ea35_30 conda-forge
gitdb 4.0.7 pypi_0 pypi
gitpython 3.1.14 pypi_0 pypi
google-auth 1.33.0 pyhd3eb1b0_0 defaults
google-auth-oauthlib 0.4.4 pyhd3eb1b0_0 defaults
google-pasta 0.2.0 pyhd3eb1b0_0 defaults
grpcio 1.36.1 py38h2157cd5_1 defaults
gxx_impl_linux-64 9.3.0 hd87eabc_18 conda-forge
gxx_linux-64 9.3.0 h3fbe746_30 conda-forge
h5netcdf 0.11.0 pyhd8ed1ab_0 conda-forge
h5py 2.10.0 py38hd6299e0_1 defaults
hdf4 4.2.13 h3ca952b_2 defaults
hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge
heapdict 1.0.1 pyhd3eb1b0_0 defaults
humanfriendly 10.0 pypi_0 pypi
humanize 3.7.1 pypi_0 pypi
icu 68.1 h58526e2_0 conda-forge
idna 2.10 pyh9f0ad1d_0 conda-forge
importlib-metadata 3.4.0 py38h578d9bd_0 conda-forge
importlib_metadata 3.4.0 hd8ed1ab_0 conda-forge
intake 0.6.3 pyhd3eb1b0_0 defaults
intake-xarray 0.5.0 pyhd3eb1b0_0 defaults
intel-openmp 2019.4 243 defaults
ipykernel 5.4.2 py38h81c977d_0 conda-forge
ipython 7.19.0 py38h81c977d_2 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
isodate 0.6.0 pypi_0 pypi
jasper 1.900.1 hd497a04_4 defaults
jedi 0.17.2 py38h578d9bd_1 conda-forge
jellyfish 0.8.8 pypi_0 pypi
jinja2 3.0.1 pypi_0 pypi
jmespath 0.10.0 pyhd3eb1b0_0 defaults
joblib 1.0.1 pyhd3eb1b0_0 defaults
jpeg 9d h7f8727e_0 defaults
json5 0.9.5 pyh9f0ad1d_0 conda-forge
jsonschema 3.2.0 py_2 conda-forge
jupyter-server-proxy 1.6.0 pypi_0 pypi
jupyter_client 6.1.11 pyhd8ed1ab_1 conda-forge
jupyter_core 4.7.0 py38h578d9bd_0 conda-forge
jupyter_telemetry 0.1.0 pyhd8ed1ab_1 conda-forge
jupyterhub 1.2.2 pypi_0 pypi
jupyterlab 2.2.9 py_0 conda-forge
jupyterlab-git 0.23.3 pypi_0 pypi
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_server 1.2.0 py_0 conda-forge
keras-preprocessing 1.1.2 pyhd3eb1b0_0 defaults
kernel-headers_linux-64 2.6.32 h77966d4_13 conda-forge
kiwisolver 1.3.1 py38h2531618_0 defaults
krb5 1.17.2 h926e7f8_0 conda-forge
lazy-object-proxy 1.6.0 pypi_0 pypi
lcms2 2.12 h3be6417_0 defaults
ld_impl_linux-64 2.35.1 hea4e1c9_1 conda-forge
libaec 1.0.4 he6710b0_1 defaults
libblas 3.9.0 1_h86c2bf4_netlib conda-forge
libcblas 3.9.0 5_h92ddd45_netlib conda-forge
libcurl 7.71.1 hcdd3856_8 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-devel_linux-64 9.3.0 h7864c58_18 conda-forge
libgcc-ng 9.3.0 h2828fa1_18 conda-forge
libgfortran-ng 9.3.0 ha5ec8a7_17 defaults
libgfortran5 9.3.0 ha5ec8a7_17 defaults
libgomp 9.3.0 h2828fa1_18 conda-forge
liblapack 3.9.0 5_h92ddd45_netlib conda-forge
libllvm10 10.0.1 hbcb73fb_5 defaults
libmklml 2019.0.5 0 defaults
libnetcdf 4.7.4 nompi_h56d31a8_107 conda-forge
libnghttp2 1.41.0 h8cfc5f6_2 conda-forge
libpng 1.6.37 hbc83047_0 defaults
libprotobuf 3.17.2 h4ff587b_1 defaults
libsodium 1.0.18 h36c2ea0_1 conda-forge
libssh2 1.9.0 hab1572f_5 conda-forge
libstdcxx-devel_linux-64 9.3.0 hb016644_18 conda-forge
libstdcxx-ng 9.3.0 h6de172a_18 conda-forge
libtiff 4.2.0 h85742a9_0 defaults
libuv 1.40.0 h7f98852_0 conda-forge
libwebp-base 1.2.0 h27cfd23_0 defaults
llvmlite 0.36.0 py38h612dafd_4 defaults
locket 0.2.1 py38h06a4308_1 defaults
lockfile 0.12.2 pypi_0 pypi
lxml 4.6.3 pypi_0 pypi
lz4-c 1.9.3 h295c915_1 defaults
magics 1.5.6 pypi_0 pypi
mako 1.1.4 pyh44b312d_0 conda-forge
markdown 3.3.4 py38h06a4308_0 defaults
markupsafe 2.0.1 pypi_0 pypi
marshmallow 3.13.0 pypi_0 pypi
matplotlib-base 3.4.2 py38hab158f2_0 defaults
mistune 0.8.4 py38h497a2fe_1003 conda-forge
mkl 2020.2 256 defaults
mkl-service 2.3.0 py38he904b0f_0 defaults
mkl_fft 1.3.0 py38h54f3939_0 defaults
mkl_random 1.1.1 py38h0573a6f_0 defaults
msgpack-python 1.0.2 py38hff7bd54_1 defaults
multidict 5.1.0 py38h27cfd23_2 defaults
munkres 1.1.4 py_0 defaults
mypy-extensions 0.4.3 pypi_0 pypi
nbclient 0.5.0 pypi_0 pypi
nbconvert 6.0.7 py38h578d9bd_3 conda-forge
nbdime 2.1.0 pypi_0 pypi
nbformat 5.1.2 pyhd8ed1ab_1 conda-forge
nbresuse 0.4.0 pypi_0 pypi
nc-time-axis 1.3.1 pyhd8ed1ab_2 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
ndg-httpsclient 0.5.1 pypi_0 pypi
nest-asyncio 1.4.3 pyhd8ed1ab_0 conda-forge
netcdf4 1.5.4 pypi_0 pypi
networkx 2.6.3 pypi_0 pypi
ninja 1.10.2 hff7bd54_1 defaults
nodejs 15.3.0 h25f6087_0 conda-forge
notebook 6.2.0 py38h578d9bd_0 conda-forge
numba 0.53.1 py38ha9443f7_0 defaults
numcodecs 0.8.0 py38h2531618_0 defaults
numexpr 2.7.3 py38hb2eb853_0 defaults
numpy 1.19.2 py38h54aff64_0 defaults
numpy-base 1.19.2 py38hfa32c7d_0 defaults
oauthlib 3.0.1 py_0 conda-forge
olefile 0.46 pyhd3eb1b0_0 defaults
openjpeg 2.4.0 h3ad879b_0 defaults
openssl 1.1.1l h7f8727e_0 defaults
opt_einsum 3.3.0 pyhd3eb1b0_1 defaults
owlrl 5.2.3 pypi_0 pypi
packaging 20.8 pyhd3deb0d_0 conda-forge
pamela 1.0.0 py_0 conda-forge
pandas 1.3.2 py38h8c16a72_0 defaults
pandoc 2.11.3.2 h7f98852_0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
papermill 2.3.1 pypi_0 pypi
parso 0.7.1 pyh9f0ad1d_0 conda-forge
partd 1.2.0 pyhd3eb1b0_0 defaults
pathspec 0.9.0 pypi_0 pypi
patool 1.12 pypi_0 pypi
pdbufr 0.9.0 pypi_0 pypi
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.3.1 py38h2c7a002_0 defaults
pip 21.0.1 pypi_0 pypi
pipx 0.16.1.0 pypi_0 pypi
pluggy 0.13.1 pypi_0 pypi
portalocker 2.3.2 pypi_0 pypi
powerline-shell 0.7.0 pypi_0 pypi
prometheus_client 0.9.0 pyhd3deb0d_0 conda-forge
prompt-toolkit 3.0.10 pyha770c72_0 conda-forge
properscoring 0.1 py_0 conda-forge
protobuf 3.17.2 py38h295c915_0 defaults
prov 1.5.1 pypi_0 pypi
psutil 5.8.0 py38h27cfd23_1 defaults
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pyasn1 0.4.8 pyhd3eb1b0_0 defaults
pyasn1-modules 0.2.8 py_0 defaults
pycosat 0.6.3 py38h497a2fe_1006 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pycurl 7.43.0.6 py38h996a351_1 conda-forge
pydap 3.2.2 pyh9f0ad1d_1001 conda-forge
pydot 1.4.2 pypi_0 pypi
pygments 2.10.0 pypi_0 pypi
pyjwt 2.1.0 pypi_0 pypi
pyld 2.0.3 pypi_0 pypi
pyodc 1.1.1 pypi_0 pypi
pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyrsistent 0.17.3 py38h497a2fe_2 conda-forge
pyshacl 0.17.0.post1 pypi_0 pypi
pysocks 1.7.1 py38h578d9bd_3 conda-forge
python 3.8.6 hffdb5ce_4_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python-eccodes 2021.03.0 py38hb5d20a5_1 conda-forge
python-editor 1.0.4 pypi_0 pypi
python-flatbuffers 1.12 pyhd3eb1b0_0 defaults
python-json-logger 2.0.1 pyh9f0ad1d_0 conda-forge
python-snappy 0.6.0 py38h2531618_3 defaults
python_abi 3.8 1_cp38 conda-forge
pytorch 1.8.1 cpu_py38h60491be_0 defaults
pytz 2021.1 pyhd3eb1b0_0 defaults
pyyaml 5.4.1 pypi_0 pypi
pyzmq 21.0.1 py38h3d7ac18_0 conda-forge
rdflib 6.0.1 pypi_0 pypi
rdflib-jsonld 0.5.0 pypi_0 pypi
readline 8.0 he28a2e2_2 conda-forge
regex 2021.4.4 pypi_0 pypi
renku 0.16.2 pypi_0 pypi
requests 2.24.0 pypi_0 pypi
requests-oauthlib 1.3.0 py_0 defaults
rich 10.3.0 pypi_0 pypi
rsa 4.7.2 pyhd3eb1b0_1 defaults
ruamel-yaml 0.16.5 pypi_0 pypi
ruamel.yaml.clib 0.2.2 py38h497a2fe_2 conda-forge
ruamel_yaml 0.15.80 py38h497a2fe_1003 conda-forge
s3fs 2021.7.0 pyhd3eb1b0_0 defaults
schema-salad 8.2.20210918131710 pypi_0 pypi
scikit-learn 0.24.2 py38ha9443f7_0 defaults
scipy 1.7.0 py38h7b17777_1 conda-forge
send2trash 1.5.0 py_0 conda-forge
setuptools 58.2.0 pypi_0 pypi
setuptools-scm 6.0.1 pypi_0 pypi
shellescape 3.8.1 pypi_0 pypi
shellingham 1.4.0 pypi_0 pypi
simpervisor 0.4 pypi_0 pypi
six 1.16.0 pypi_0 pypi
smmap 4.0.0 pypi_0 pypi
snappy 1.1.8 he6710b0_0 defaults
sortedcontainers 2.4.0 pyhd3eb1b0_0 defaults
soupsieve 2.2.1 pyhd3eb1b0_0 defaults
sqlalchemy 1.3.22 py38h497a2fe_1 conda-forge
sqlite 3.34.0 h74cdb3f_0 conda-forge
sysroot_linux-64 2.12 h77966d4_13 conda-forge
tabulate 0.8.9 pypi_0 pypi
tbb 2020.3 hfd86e86_0 defaults
tblib 1.7.0 pyhd3eb1b0_0 defaults
tenacity 7.0.0 pypi_0 pypi
tensorboard 2.4.0 pyhc547734_0 defaults
tensorboard-plugin-wit 1.6.0 py_0 defaults
tensorflow 2.4.1 mkl_py38hb2083e0_0 defaults
tensorflow-base 2.4.1 mkl_py38h43e0292_0 defaults
tensorflow-estimator 2.6.0 pyh7b7c402_0 defaults
termcolor 1.1.0 py38h06a4308_1 defaults
terminado 0.9.2 py38h578d9bd_0 conda-forge
testpath 0.4.4 py_0 conda-forge
textwrap3 0.9.2 pypi_0 pypi
threadpoolctl 2.2.0 pyh0d69192_0 defaults
tini 0.18.0 h14c3975_1001 conda-forge
tk 8.6.10 h21135ba_1 conda-forge
toml 0.10.2 pypi_0 pypi
toolz 0.11.1 pyhd3eb1b0_0 defaults
tornado 6.1 py38h497a2fe_1 conda-forge
tqdm 4.60.0 pypi_0 pypi
traitlets 5.0.5 py_0 conda-forge
typed-ast 1.4.2 pypi_0 pypi
typing-extensions 3.7.4.3 pypi_0 pypi
typing_extensions 3.10.0.2 pyh06a4308_0 defaults
urllib3 1.25.11 pypi_0 pypi
userpath 1.4.2 pypi_0 pypi
wcmatch 8.2 pypi_0 pypi
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
webob 1.8.7 pyhd3eb1b0_0 defaults
werkzeug 2.0.1 pyhd3eb1b0_0 defaults
wheel 0.36.2 pyhd3deb0d_0 conda-forge
wrapt 1.12.1 py38h7b6447c_1 defaults
xarray 0.19.0 pyhd3eb1b0_1 defaults
xhistogram 0.3.0 pyhd8ed1ab_0 conda-forge
xskillscore 0.0.23 pyhd8ed1ab_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yagup 0.1.1 pypi_0 pypi
yaml 0.2.5 h516909a_0 conda-forge
yarl 1.6.3 py38h27cfd23_0 defaults
zarr 2.8.1 pyhd3eb1b0_0 defaults
zeromq 4.3.3 h58526e2_3 conda-forge
zict 2.0.0 pyhd3eb1b0_0 defaults
zipp 3.4.0 py_0 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
zstd 1.4.9 haebb681_0 defaults
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Score biweekly ML forecasts against re-calibrated ECMWF forecast with RPSS
%% Cell type:markdown id: tags:
Goal:
- Score biweekly 2020 ML-based forecasts against climatology with RPSS
- identical to scorer bot script: https://renkulab.io/gitlab/tasko.olevski/s2s-ai-competition-scoring-image/-/blob/master/scoring/scoring_script.py
- (for reference) RPSS of recalibrated real-time 2020 ECMWF forecasts
Requirements:
- [`xskillscore`](https://github.com/xarray-contrib/xskillscore)
- [renku datasets](https://renku-python.readthedocs.io/en/latest/commands.html#module-renku.cli.dataset) / file
- observations
  - probabilistic: renku dataset `forecast-like-observations_2020_biweekly_terciled.nc`
- ML forecasts
  - probabilistic: file `../submissions/ML_prediction_2020.nc`
- benchmark to check whether the ML model beats the recalibrated ECMWF forecast:
  - probabilistic: renku dataset `ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc`
Output:
- RPSS spatially averaged from 90°N to 60°S with cosine `latitude` weights, then averaged over both `lead_time`s and both variables
%% Cell type:code id: tags:
``` python
import xarray as xr
import xskillscore as xs
import numpy as np
xr.set_options(keep_attrs=True)
xr.set_options(display_style='text')
cache_path = "../data"
plot = False  # plots take much more space in git; consider not committing figures regularly because the repository size will grow
```
%% Cell type:markdown id: tags:
# Get categorized data
%% Cell type:markdown id: tags:
## 2020 observations
%% Cell type:code id: tags:
``` python
# retrieve from git lfs
!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
obs_p = xr.open_dataset(f'{cache_path}/forecast-like-observations_2020_biweekly_terciled.nc')
obs_p.sizes
```
%% Output
Frozen({'category': 3, 'lead_time': 2, 'forecast_time': 53, 'latitude': 121, 'longitude': 240})
%% Cell type:markdown id: tags:
## 2020 ML-based forecasts
%% Cell type:code id: tags:
``` python
!renku storage pull ../submissions/ML_prediction_2020.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
# submission for your ML model
fct_p = xr.open_dataset(f"../submissions/ML_prediction_2020.nc")
fct_p.sizes
```
%% Output
Frozen({'category': 3, 'lead_time': 2, 'forecast_time': 53, 'latitude': 121, 'longitude': 240})
%% Cell type:markdown id: tags:
## climatology
%% Cell type:code id: tags:
``` python
clim_p = xr.DataArray([1/3, 1/3, 1/3], dims='category', coords={'category':['below normal', 'near normal', 'above normal']}).to_dataset(name='tp')
clim_p['t2m'] = clim_p['tp']
```
%% Cell type:markdown id: tags:
### checks
%% Cell type:code id: tags:
``` python
from scripts import assert_predictions_2020
assert_predictions_2020(fct_p)
assert_predictions_2020(obs_p)
```
%% Cell type:markdown id: tags:
# RPS
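For $K=3$ tercile categories with forecast probabilities $p_j$ and observed category indicator $o_j$, the ranked probability score compares cumulative distributions:

$$\mathrm{RPS} = \sum_{k=1}^{K} \left( \sum_{j=1}^{k} p_j - \sum_{j=1}^{k} o_j \right)^2$$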
%% Cell type:markdown id: tags:
## RPS ML model
%% Cell type:code id: tags:
``` python
rps_ML = xs.rps(obs_p, fct_p, category_edges=None, dim=[], input_distributions='p').compute()
```
%% Cell type:code id: tags:
``` python
if plot:
    for v in rps_ML.data_vars:
        rps_ML[v].mean('forecast_time').plot(robust=True, col='lead_time')
```
%% Cell type:markdown id: tags:
## RPS climatology
%% Cell type:code id: tags:
``` python
rps_clim = xs.rps(obs_p, clim_p, category_edges=None, dim=[], input_distributions='p').compute()
```
%% Cell type:code id: tags:
``` python
if plot:
    for v in rps_clim.data_vars:
        rps_clim[v].plot(robust=True, col='lead_time')
```
%% Cell type:markdown id: tags:
## RPSS
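Following Weigel et al. (2007), the skill score relates the time-mean RPS of the ML forecast to that of climatology,

$$\mathrm{RPSS} = 1 - \frac{\overline{\mathrm{RPS}}_{\mathrm{ML}}}{\overline{\mathrm{RPS}}_{\mathrm{clim}}}$$

where the overbar denotes the mean over `forecast_time`; cells where a valid probability was expected but NaN was submitted are penalized with $\mathrm{RPS}=2$ before averaging.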
%% Cell type:code id: tags:
``` python
# penalize # https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/7
expect = obs_p.sum('category')
expect = expect.where(expect > 0.98).where(expect < 1.02) # should be True if not all NaN
# https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/50
rps_ML = rps_ML.where(expect, other=2) # assign RPS=2 where value was expected but NaN found
# following Weigel 2007: https://doi.org/10.1175/MWR3280.1
rpss = 1 - (rps_ML.mean('forecast_time') / rps_clim.mean('forecast_time'))
# clip
rpss = rpss.clip(-10, 1)
```
%% Cell type:code id: tags:
``` python
if plot:
    for v in rpss.data_vars:
        rpss[v].plot(robust=True, col='lead_time', vmax=1)
```
%% Cell type:markdown id: tags:
# Scoring for RPSS leaderboard
%% Cell type:code id: tags:
``` python
# what to do with requested grid cells where NaN is submitted? also penalize, todo: https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/7
```
%% Cell type:code id: tags:
``` python
# for the subleaderboards
mask = xr.ones_like(rpss.isel(lead_time=0, drop=True)).reset_coords(drop=True).t2m
boundary_tropics = 30
mask = xr.concat([mask.where(mask.latitude > boundary_tropics),
                  mask.where(np.abs(mask.latitude) <= boundary_tropics),
                  mask.where((mask.latitude < -boundary_tropics) & (mask.latitude > -60))], 'area')
mask = mask.assign_coords(area=['northern_extratropics', 'tropics', 'southern_extratropics'])
mask.name = 'area'
mask = mask.where(rpss.t2m.isel(lead_time=0, drop=True).notnull())
```
%% Cell type:code id: tags:
``` python
# weighted area mean
weights = np.cos(np.deg2rad(np.abs(mask.latitude)))
# spatially weighted score, averaged over lead_times
scores = (rpss * mask).weighted(weights).mean('latitude').mean('longitude')
pd_scores = scores.reset_coords(drop=True).to_dataframe().unstack(0).T.round(2)
pd_scores # for subleaderboards
```
%% Output
area           northern_extratropics  tropics  southern_extratropics
lead_time
t2m 14 days                     0.06     0.08                   0.03
    28 days                     0.01     0.05                  -0.01
tp  14 days                    -0.08    -0.11                  -0.06
    28 days                    -0.09    -0.18                  -0.07
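%% Cell type:markdown id: tags:
Why the cosine weights: grid cells shrink toward the poles on a regular latitude-longitude grid, so an unweighted mean would over-count high latitudes. A toy example of `DataArray.weighted` (illustrative values):
%% Cell type:code id: tags:
``` python
import numpy as np
import xarray as xr
da = xr.DataArray([1., 2.], dims='latitude', coords={'latitude': [0., 60.]})
w = np.cos(np.deg2rad(np.abs(da.latitude)))
print(da.weighted(w).mean('latitude').item())  # (1*1.0 + 2*0.5) / 1.5 = 1.333...
```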
%% Cell type:code id: tags:
``` python
# average over variables and lead_times to a single value
scores = rpss.sel(latitude=slice(None, -60)).weighted(weights).mean('latitude').mean('longitude').to_array().mean(['lead_time','variable']).reset_coords(drop=True)
# final score transferred to the leaderboard
print(scores)
```
%% Output
<xarray.DataArray ()>
array(-0.03815502)
%% Cell type:code id: tags:
``` python
# check your submission status here: https://renkulab.io/gitlab/tasko.olevski/s2s-ai-competition-scoring-image/-/blob/master/README.md
```
%% Cell type:markdown id: tags:
# RPS re-calibrated ECMWF benchmark
for reference: the benchmark the ML model should beat
%% Cell type:code id: tags:
``` python
plot = False # show ECMWF RPSS for reference
```
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
bench_p = xr.open_dataset('../data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc')
```
%% Cell type:code id: tags:
``` python
assert_predictions_2020(bench_p)
```
%% Cell type:code id: tags:
``` python
rps_bench = xs.rps(obs_p, bench_p, category_edges=None, dim=[], input_distributions='p').compute()
```
%% Cell type:code id: tags:
``` python
if plot:
for v in rps_bench.data_vars:
rps_bench[v].mean('forecast_time').plot(robust=True, col='lead_time')
```
%% Cell type:code id: tags:
``` python
# https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/50
rps_bench = rps_bench.where(expect, other=2) # assign RPS=2 where value was expected but NaN found
# following Weigel 2007: https://doi.org/10.1175/MWR3280.1
rpss_bench = 1 - (rps_bench.mean('forecast_time') / rps_clim.mean('forecast_time'))
# clip
rpss_bench = rpss_bench.clip(-10, 1)
```
%% Cell type:code id: tags:
``` python
if plot:
for v in rpss_bench.data_vars:
rpss_bench[v].plot(robust=True, col='lead_time')
```
%% Cell type:code id: tags:
``` python
scores_bench = (rpss_bench * mask).weighted(weights).mean('latitude').mean('longitude')
pd_scores_bench = scores_bench.reset_coords(drop=True).to_dataframe().unstack(0).T.round(2)
pd_scores_bench # for subleaderboards
```
%% Output
area           northern_extratropics  tropics  southern_extratropics
lead_time
tp  14 days                    -0.01    -0.00                  -0.01
    28 days                    -0.03    -0.06                  -0.01
t2m 14 days                     0.05     0.07                   0.00
    28 days                    -0.00     0.04                  -0.04
%% Cell type:code id: tags:
``` python
# average over variables and lead_times to a single value
scores_bench = rpss_bench.sel(latitude=slice(None, -60)).weighted(weights).mean('latitude').mean('longitude').to_array().mean(['lead_time','variable']).reset_coords(drop=True)
# final score transferred to the leaderboard
print(scores_bench)
```
%% Output
<xarray.DataArray ()>
array(-0.00158017)
%% Cell type:code id: tags:
``` python
```
plugins:
source:
- module: intake_xarray
sources:
training-input:
description: climetlab name in AI/ML community naming for hindcasts as input to the ML-model in training period
driver: netcdf
parameters:
model:
description: name of the S2S model
type: str
default: ecmwf
allowed: [ecmwf, eccc, ncep]
param:
description: variable name
type: str
default: tp
allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
date:
description: initialization weekly thursdays
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
version:
description: versioning of the data
type: str
default: 0.3.0
format:
description: data type
type: str
default: netcdf
allowed: [netcdf, grib]
ending:
description: data format compatible with format; netcdf -> nc, grib -> grib
type: str
default: nc
allowed: [nc, grib]
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/{{version}}/{{format}}/{{model}}-hindcast-{{param}}-{{date.strftime("%Y%m%d")}}.{{ending}}
test-input:
description: climetlab name in AI/ML community naming for 2020 forecasts as input to ML model in test period 2020
driver: netcdf
parameters:
model:
description: name of the S2S model
type: str
default: ecmwf
allowed: [ecmwf, eccc, ncep]
param:
description: variable name
type: str
default: tp
allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
date:
description: initialization weekly thursdays
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
version:
description: versioning of the data
type: str
default: 0.3.0
format:
description: data type
type: str
default: netcdf
allowed: [netcdf, grib]
ending:
description: data format compatible with format; netcdf -> nc, grib -> grib
type: str
default: nc
allowed: [nc, grib]
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/{{version}}/{{format}}/{{model}}-forecast-{{param}}-{{date.strftime("%Y%m%d")}}.{{ending}}
training-output-reference:
description: climetlab name in AI/ML community naming for observations used as output reference to compare ML model output to in the training period
driver: netcdf
parameters:
param:
description: variable name
type: str
default: tp
allowed: [t2m, tp]
date:
description: initialization weekly thursdays
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-output-reference/{{param}}-{{date.strftime("%Y%m%d")}}.nc
test-output-reference:
description: climetlab name in AI/ML community naming for 2020 observations used as output reference to compare ML model output to in the test period 2020
driver: netcdf
parameters:
param:
description: variable name
type: str
default: tp
allowed: [t2m, tp]
date:
description: initialization weekly thursdays
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-output-reference/{{param}}-{{date.strftime("%Y%m%d")}}.nc
# Data Access
- European Weather Cloud:
- [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge)
- `wget`: wget_curl.ipynb
- `curl`: wget_curl.ipynb
- `mouse`: wget_curl.ipynb
- `intake`: intake.ipynb
- [IRI Data Library](http://iridl.ldeo.columbia.edu/): IRIDL.ipynb
- S2S: http://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/ (restricted access explained in IRIDL.ipynb)
- SubX: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/
- NMME: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/
- s2sprediction.net
plugins:
source:
- module: intake_xarray
sources:
training-input:
description: S2S hindcasts from IRIDL regridded to 1.5 deg grid and aggregated by mean over lead, https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/overview.html
driver: opendap
parameters:
center:
description: name of the center issuing the hindcast
type: str
default: ECMF
allowed: [BOM, CNRM, ECCC, ECMF, HMCR, ISAC, JMA, KMA, NCEP, UKMO]
grid:
description: regrid to this global resolution
type: float
default: 1.5
lead_name:
description: name of the lead_time dimension
type: str
default: LA
allowed: [LA, L]
lead_start:
description: aggregation start lead passed to RANGEEDGES
type: int
default: 14
lead_end:
description: aggregation end lead passed to RANGEEDGES
type: int
default: 27
experiment_type:
description: type of experiment
type: str
default: perturbed
allowed: [control, perturbed, RMMS]
group:
description: group of variables
type: str
default: 2m_above_ground
#allowed: [2m_above_ground, ...] see https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ECMF/.reforecast/.perturbed/
param:
description: variable name
type: str
default: 2t
#allowed: [2t] see https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ECMF/.reforecast/.perturbed/
xarray_kwargs:
engine: netcdf4
args:
urlpath: http://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.{{center}}/.reforecast/.{{experiment_type}}/.{{group}}/{{param}}/{{lead_name}}/({{lead_start}})/({{lead_end}})/RANGEEDGES/[{{lead_name}}]average/X/0/{{grid}}/358.5/GRID/Y/90/{{grid}}/-90/GRID/dods
test-input:
description: S2S forecasts from IRIDL regridded to 1.5 deg grid and aggregated by mean over lead, https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/overview.html
driver: opendap
parameters:
center:
description: name of the center issuing the hindcast
type: str
default: ECMF
allowed: [BOM, CNRM, ECCC, ECMF, HMCR, ISAC, JMA, KMA, NCEP, UKMO]
grid:
description: regrid to this global resolution
type: float
default: 1.5
lead_name:
description: name of the lead_time dimension
type: str
default: LA
allowed: [LA, L, L1]
lead_start:
description: aggregation start lead passed to RANGEEDGES
type: int
default: 14
lead_end:
description: aggregation end lead passed to RANGEEDGES
type: int
default: 27
experiment_type:
description: type of experiment
type: str
default: perturbed
allowed: [control, perturbed, RMMS]
group:
description: group of variables
type: str
default: 2m_above_ground
#allowed: see https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ECMF/.reforecast/.perturbed/
param:
description: variable name
type: str
default: 2t
#allowed: [2t] see https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ECMF/.reforecast/.perturbed/
xarray_kwargs:
engine: netcdf4
args:
urlpath: http://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.{{center}}/.forecast/.{{experiment_type}}/.{{group}}/{{param}}/S/(0000%201%20Jan%202020)/(0000%2031%20Dec%202020)/RANGEEDGES/{{lead_name}}/({{lead_start}})/({{lead_end}})/RANGEEDGES/[{{lead_name}}]average/X/0/{{grid}}/358.5/GRID/Y/90/{{grid}}/-90/GRID/dods
plugins:
source:
- module: intake_xarray
sources:
training-input:
description: SubX hindcasts from IRIDL regridded to 1.5 deg grid and aggregated by mean over lead, http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/outline.html
driver: opendap
parameters:
center:
description: name of the center issuing the hindcast
type: str
default: EMC
allowed: [CESM, ECCC, EMC, ESRL, GMAO, NCEP, NRL, RSMAS]
model:
description: name of the model
type: str
default: GEFS
allowed: [30LCESM1, 46LCESM1, GEM, GEPS6, GEPS5, GEFS, GEFSv12, FIMr1p1, GEOS_V2p1, CFSv2, NESM, CCSM4]
grid:
description: regrid to this global resolution
type: float
default: 1.5
lead_start:
description: aggregation start lead passed to RANGEEDGES
type: int
default: 14
lead_end:
description: aggregation end lead passed to RANGEEDGES
type: int
default: 27
param:
description: variable name
type: str
default: pr
#allowed: [pr]
xarray_kwargs:
engine: netcdf4
args:
urlpath: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.{{center}}/.{{model}}/.hindcast/.{{param}}/L/({{lead_start}})/({{lead_end}})/RANGEEDGES/[L]average/X/0/{{grid}}/358.5/GRID/Y/90/{{grid}}/-90/GRID/dods
test-input:
description: SubX forecasts from IRIDL regridded to 1.5 deg grid and aggregated by mean over lead, http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/outline.html
driver: opendap
parameters:
center:
description: name of the center issuing the forecast
type: str
default: EMC
allowed: [CESM, ECCC, EMC, ESRL, GMAO, NCEP, NRL, RSMAS]
model:
description: name of the model
type: str
default: GEFS
allowed: [30LCESM1, 46LCESM1, GEM, GEPS6, GEPS5, GEFS, GEFSv12, FIMr1p1, GEOS_V2p1, CFSv2, NESM, CCSM4]
grid:
description: regrid to this global resolution
type: float
default: 1.5
lead_start:
description: aggregation start lead passed to RANGEEDGES
type: int
default: 14
lead_end:
description: aggregation end lead passed to RANGEEDGES
type: int
default: 27
param:
description: variable name, see http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/outline.html
type: str
default: pr
#allowed: [pr]
xarray_kwargs:
engine: netcdf4
args:
urlpath: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.{{center}}/.{{model}}/.forecast/.{{param}}/S/(0000%201%20Jan%202020)/(0000%2031%20Dec%202020)/RANGEEDGES/L/({{lead_start}})/({{lead_end}})/RANGEEDGES/[L]average/X/0/{{grid}}/358.5/GRID/Y/90/{{grid}}/-90/GRID/dods
%% Cell type:markdown id: tags:
# Data Access from EWC via `intake`
Data is easily available via `climetlab`: https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge
Data holdings are listed: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/0.3.0/netcdf/index.html
Therefore, the S3 data is also accessible with `intake-xarray` and cacheable with `fsspec`.
%% Cell type:code id: tags:
``` python
import intake
import fsspec
import xarray as xr
import os, glob
import pandas as pd
xr.set_options(display_style='text')
```
%% Output
/opt/conda/lib/python3.8/site-packages/xarray/backends/cfgrib_.py:27: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message
warnings.warn(
<xarray.core.options.set_options at 0x7fa0100dcdc0>
%% Cell type:code id: tags:
``` python
# prevent aiohttp timeout errors
from aiohttp import ClientSession, ClientTimeout
timeout = ClientTimeout(total=600)
fsspec.config.conf['https'] = dict(client_kwargs={'timeout': timeout})
```
%% Cell type:markdown id: tags:
# intake
https://github.com/intake/intake-xarray can read and cache `grib` and `netcdf` from catalogs.
Caching via `fsspec`: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
%% Cell type:code id: tags:
``` python
import intake_xarray
cache_path = '/work/s2s-ai-challenge-template/data/cache'
fsspec.config.conf['simplecache'] = {'cache_storage': cache_path, 'same_names':True}
```
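%% Cell type:markdown id: tags:
As the catalog comments below note, prepending `simplecache::` to a urlpath makes `fsspec` download a file once into `cache_path` and reuse the local copy afterwards. A sketch (the URL follows the holdings pattern above; adjust model, variable and date as needed):
%% Cell type:code id: tags:
``` python
import fsspec
import xarray as xr
# sketch: open one file through fsspec's simplecache (downloads only on first access)
url = ('simplecache::https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/'
       'data/test-input/0.3.0/netcdf/ecmwf-forecast-t2m-20200102.nc')
ds = xr.open_dataset(fsspec.open_local(url), engine='h5netcdf')
```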
%% Cell type:code id: tags:
``` python
%%writefile EWC_catalog.yml
plugins:
source:
- module: intake_xarray
sources:
training-input:
description: climetlab name in AI/ML community naming for hindcasts as input to the ML-model in training period
driver: netcdf
parameters:
model:
description: name of the S2S model
type: str
default: ecmwf
allowed: [ecmwf, eccc, ncep]
param:
description: variable name
type: str
default: tp
allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
date:
description: initialization weekly thursdays
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
version:
description: versioning of the data
type: str
default: 0.3.0
format:
description: data type
type: str
default: netcdf
allowed: [netcdf, grib]
ending:
description: data format compatible with format; netcdf -> nc, grib -> grib
type: str
default: nc
allowed: [nc, grib]
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/{{version}}/{{format}}/{{model}}-hindcast-{{param}}-{{date.strftime("%Y%m%d")}}.{{ending}}
test-input:
description: climetlab name in AI/ML community naming for 2020 forecasts as input to ML model in test period 2020
driver: netcdf
parameters:
model:
description: name of the S2S model
type: str
default: ecmwf
allowed: [ecmwf, eccc, ncep]
param:
description: variable name
type: str
default: tp
allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
date:
description: initialization weekly thursdays
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
version:
description: versioning of the data
type: str
default: 0.3.0
format:
description: data type
type: str
default: netcdf
allowed: [netcdf, grib]
ending:
description: data format compatible with format; netcdf -> nc, grib -> grib
type: str
default: nc
allowed: [nc, grib]
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/{{version}}/{{format}}/{{model}}-forecast-{{param}}-{{date.strftime("%Y%m%d")}}.{{ending}}
training-output-reference:
description: climetlab name in AI/ML community naming for observations used as output reference to compare ML model output to in the training period
driver: netcdf
parameters:
param:
description: variable name
type: str
default: tp
allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
date:
description: initialization weekly thursdays
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-output-reference/{{param}}-{{date.strftime("%Y%m%d")}}.nc
test-output-reference:
description: climetlab name in AI/ML community naming for 2020 observations used as output reference to compare ML model output to in the test period 2020
driver: netcdf
parameters:
param:
description: variable name
type: str
default: tp
allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
date:
description: initialization weekly thursdays
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-output-reference/{{param}}-{{date.strftime("%Y%m%d")}}.nc
```
%% Output
Writing EWC_catalog.yml
%% Cell type:code id: tags:
``` python
cat = intake.open_catalog('EWC_catalog.yml')
```
%% Cell type:code id: tags:
``` python
# dates for 2020 forecasts and their on-the-fly reforecasts
dates=pd.date_range(start='2020-01-02',freq='7D',end='2020-12-31')
dates
```
%% Output
DatetimeIndex(['2020-01-02', '2020-01-09', '2020-01-16', '2020-01-23',
'2020-01-30', '2020-02-06', '2020-02-13', '2020-02-20',
'2020-02-27', '2020-03-05', '2020-03-12', '2020-03-19',
'2020-03-26', '2020-04-02', '2020-04-09', '2020-04-16',
'2020-04-23', '2020-04-30', '2020-05-07', '2020-05-14',
'2020-05-21', '2020-05-28', '2020-06-04', '2020-06-11',
'2020-06-18', '2020-06-25', '2020-07-02', '2020-07-09',
'2020-07-16', '2020-07-23', '2020-07-30', '2020-08-06',
'2020-08-13', '2020-08-20', '2020-08-27', '2020-09-03',
'2020-09-10', '2020-09-17', '2020-09-24', '2020-10-01',
'2020-10-08', '2020-10-15', '2020-10-22', '2020-10-29',
'2020-11-05', '2020-11-12', '2020-11-19', '2020-11-26',
'2020-12-03', '2020-12-10', '2020-12-17', '2020-12-24',
'2020-12-31'],
dtype='datetime64[ns]', freq='7D')
%% Cell type:markdown id: tags:
# `hindcast-input`
on-the-fly hindcasts corresponding to the 2020 forecasts
%% Cell type:code id: tags:
``` python
cat['training-input'](date=dates[10], param='tp', model='eccc').to_dask()
```
%% Output
/opt/conda/lib/python3.8/site-packages/xarray/backends/plugins.py:61: RuntimeWarning: Engine 'cfgrib' loading failed:
/opt/conda/lib/python3.8/site-packages/gribapi/_bindings.cpython-38-x86_64-linux-gnu.so: undefined symbol: codes_bufr_key_is_header
warnings.warn(f"Engine {name!r} loading failed:\n{ex}", RuntimeWarning)
<xarray.Dataset>
Dimensions: (forecast_time: 20, latitude: 121, lead_time: 32, longitude: 240, realization: 4)
Coordinates:
* realization (realization) int64 0 1 2 3
* forecast_time (forecast_time) datetime64[ns] 1998-03-12 ... 2017-03-12
* lead_time (lead_time) timedelta64[ns] 1 days 2 days ... 31 days 32 days
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (forecast_time, lead_time) datetime64[ns] ...
Data variables:
tp (realization, forecast_time, lead_time, latitude, longitude) float32 ...
Attributes:
GRIB_edition: [2]
GRIB_centre: cwao
GRIB_centreDescription: Canadian Meteorological Service - Montreal
GRIB_subCentre: [0]
Conventions: CF-1.7
institution: Canadian Meteorological Service - Montreal
history: 2021-05-11T10:03 GRIB to CDM+CF via cfgrib-0.9.9...
%% Cell type:markdown id: tags:
# `forecast-input`
2020
%% Cell type:code id: tags:
``` python
cat['test-input'](date=dates[10], param='t2m', model='ecmwf').to_dask()
```
%% Output
<xarray.Dataset>
Dimensions: (forecast_time: 1, latitude: 121, lead_time: 46, longitude: 240, realization: 51)
Coordinates:
* realization (realization) int64 0 1 2 3 4 5 6 7 ... 44 45 46 47 48 49 50
* forecast_time (forecast_time) datetime64[ns] 2020-03-12
* lead_time (lead_time) timedelta64[ns] 1 days 2 days ... 45 days 46 days
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (forecast_time, lead_time) datetime64[ns] ...
Data variables:
t2m (realization, forecast_time, lead_time, latitude, longitude) float32 ...
Attributes:
GRIB_edition: [2]
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: [0]
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: 2021-05-10T16:14:36 GRIB to CDM+CF via cfgrib-0....
%% Cell type:markdown id: tags:
# `hindcast-like-observations`
observations matching hindcasts
%% Cell type:code id: tags:
``` python
cat['training-output-reference'](date=dates[10], param='t2m').to_dask()
```
%% Output
<xarray.Dataset>
Dimensions: (forecast_time: 1, latitude: 121, lead_time: 47, longitude: 240)
Coordinates:
valid_time (lead_time, forecast_time) datetime64[ns] ...
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* forecast_time (forecast_time) datetime64[ns] 2020-03-12
* lead_time (lead_time) timedelta64[ns] 0 days 1 days ... 45 days 46 days
Data variables:
t2m (lead_time, forecast_time, latitude, longitude) float32 ...
Attributes:
source_dataset_name: temperature daily from NOAA NCEP CPC: Climate Predi...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
created_by_software: climetlab-s2s-ai-challenge
created_by_script: tools/observations/makefile
%% Cell type:markdown id: tags:
# `forecast-like-observations`
observations matching 2020 forecasts
%% Cell type:code id: tags:
``` python
cat['test-output-reference'](date=dates[10], param='t2m').to_dask()
```
%% Output
<xarray.Dataset>
Dimensions: (forecast_time: 1, latitude: 121, lead_time: 47, longitude: 240)
Coordinates:
valid_time (lead_time, forecast_time) datetime64[ns] ...
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* forecast_time (forecast_time) datetime64[ns] 2020-03-12
* lead_time (lead_time) timedelta64[ns] 0 days 1 days ... 45 days 46 days
Data variables:
t2m (lead_time, forecast_time, latitude, longitude) float32 ...
Attributes:
source_dataset_name: temperature daily from NOAA NCEP CPC: Climate Predi...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
created_by_software: climetlab-s2s-ai-challenge
created_by_script: tools/observations/makefile
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Data Access via `curl` or `wget`
Data is easily available via `climetlab`: https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge
Data holdings are listed:
- https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/0.3.0/netcdf/index.html
- https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/0.3.0/netcdf/index.html
- https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-output-reference/index.html
- https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-output-reference/index.html
Therefore, the S3 data is also accessible with `curl` or `wget`. Alternatively, you can click the HTML links above and download the files by mouse click.
%% Cell type:code id: tags:
``` python
import xarray as xr
import os
from subprocess import call
xr.set_options(display_style='text')
```
%% Output
/opt/conda/lib/python3.8/site-packages/xarray/backends/cfgrib_.py:27: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message
warnings.warn(
<xarray.core.options.set_options at 0x7f5170570520>
%% Cell type:code id: tags:
``` python
# version of the EWC data
version = '0.3.0'
```
%% Cell type:markdown id: tags:
# `hindcast-input`
on-the-fly hindcasts corresponding to the 2020 forecasts
%% Cell type:code id: tags:
``` python
parameter = 't2m'
date = '20200102'
model = 'ecmwf'
```
%% Cell type:code id: tags:
``` python
url = f'https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/{version}/netcdf/{model}-hindcast-{parameter}-{date}.nc'
os.system(f'wget {url}')
assert os.path.exists(f'{model}-hindcast-{parameter}-{date}.nc')
```
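%% Cell type:markdown id: tags:
A `curl` equivalent of the `wget` call above (`-O` keeps the remote file name); commented out to avoid downloading twice:
%% Cell type:code id: tags:
``` python
# os.system(f'curl -O {url}')
```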
%% Cell type:markdown id: tags:
# `forecast-input`
2020
%% Cell type:code id: tags:
``` python
url = f'https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/{version}/netcdf/{model}-forecast-{parameter}-{date}.nc'
os.system(f'wget {url}')
assert os.path.exists(f'{model}-forecast-{parameter}-{date}.nc')
```
%% Cell type:markdown id: tags:
# `hindcast-like-observations`
CPC observations formatted like training period hindcasts
%% Cell type:code id: tags:
``` python
url = f'https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-output-reference/{parameter}-{date}.nc'
os.system(f'wget {url}')
assert os.path.exists(f'{parameter}-{date}.nc')
```
%% Cell type:markdown id: tags:
# `forecast-like-observations`
CPC observations formatted like test period 2020 forecasts
%% Cell type:code id: tags:
``` python
url = f'https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-output-reference/{parameter}-{date}.nc'
os.system(f'wget {url}')
assert os.path.exists(f'{parameter}-{date}.nc')
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Train ML model to correct predictions of week 3-4 & 5-6
This notebook creates a Machine Learning `ML_model` to predict weeks 3-4 & 5-6 based on `S2S` weeks 3-4 & 5-6 forecasts and compares it to `CPC` observations for the [`s2s-ai-challenge`](https://s2s-ai-challenge.github.io/).
%% Cell type:markdown id: tags:
# Synopsis
%% Cell type:markdown id: tags:
## Method: `mean bias reduction`
- calculate the mean bias from the 2000-2019 deterministic ensemble-mean hindcasts
- remove that mean bias from the 2020 deterministic ensemble-mean forecast
- no Machine Learning used here (see the formula sketch below)
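%% Cell type:markdown id: tags:
In formula form (a sketch of the bullets above): for each week of the year $w$, the bias is $b_w = \frac{1}{20}\sum_{y=2000}^{2019}\left(\bar f_{y,w} - o_{y,w}\right)$, where $\bar f$ is the ensemble-mean hindcast and $o$ the observation; the corrected 2020 forecast is then $f'_{2020,w} = \bar f_{2020,w} - b_w$.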
%% Cell type:markdown id: tags:
## Data used
type: renku datasets
Training-input for Machine Learning model:
- hindcasts of models:
- ECMWF: `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
Forecast-input for Machine Learning model:
- real-time 2020 forecasts of models:
- ECMWF: `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
Compare the Machine Learning model forecast against the ground truth:
- `CPC` observations:
- `hindcast-like-observations_biweekly_deterministic.zarr`
- `forecast-like-observations_2020_biweekly_deterministic.zarr`
%% Cell type:markdown id: tags:
## Resources used
for training, details in reproducibility
- platform: MPI-M supercomputer, 1 node
- memory: 64 GB
- processors: 36 CPU
- storage required: 10 GB
%% Cell type:markdown id: tags:
## Safeguards
All points have to be [x] checked. If not, your submission is invalid.
Changes to the code after submissions are not possible, as the `commit` before the `tag` will be reviewed.
(Only in exceptional cases, and where prior effort toward reproducibility is evident, may improvements to readability and reproducibility be allowed after November 1st 2021.)
%% Cell type:markdown id: tags:
### Safeguards to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting?wprov=sfti1)
If the organizers suspect overfitting, your contribution can be disqualified.
- [x] We didn't use 2020 observations in training (explicit overfitting and cheating)
- [x] We didn't repeatedly verify our model on 2020 observations and incrementally improve our RPSS (implicit overfitting)
- [x] We provide RPSS scores for the training period with the script `skill_by_year`, see section 6.3 `predict`.
- [x] We tried our best to prevent [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)?wprov=sfti1).
- [x] We honor the `train-validate-test` [split principle](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets). This means that the hindcast data is split into `train` and `validate`, whereas `test` is withheld.
- [x] We did not use `test` explicitly in training or implicitly in incrementally adjusting parameters.
- [x] We considered [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).
%% Cell type:markdown id: tags:
### Safeguards for Reproducibility
Notebook/code must be independently reproducible from scratch by the organizers (after the competition), if not possible: no prize
- [x] All training data is publicly available (no pre-trained private neural networks, as they are not reproducible for us)
- [x] Code is well documented, readable and reproducible.
- [x] Code to reproduce training and predictions should preferably run within a day on the described architecture. If training takes longer than a day, please justify why this is needed. Please do not submit training pipelines that take weeks to train.
%% Cell type:markdown id: tags:
# Imports
%% Cell type:code id: tags:
``` python
import xarray as xr
xr.set_options(display_style='text')
```
%% Output
<xarray.core.options.set_options at 0x7f05cc486340>
%% Cell type:markdown id: tags:
# Get training data
preprocessing of input data may be done in a separate notebook/script
%% Cell type:markdown id: tags:
## Hindcast
get weekly initialized hindcasts
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
hind_2000_2019 = xr.open_zarr("../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
fct_2020 = xr.open_zarr("../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:markdown id: tags:
## Observations
corresponding to hindcasts
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr("../data/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:markdown id: tags:
# no ML model
%% Cell type:markdown id: tags:
Here, we just remove the mean bias from the ensemble mean forecast.
%% Cell type:code id: tags:
``` python
from scripts import add_year_week_coords
obs_2000_2019 = add_year_week_coords(obs_2000_2019)
hind_2000_2019 = add_year_week_coords(hind_2000_2019)
```
%% Cell type:code id: tags:
``` python
bias_2000_2019 = (hind_2000_2019.mean('realization') - obs_2000_2019).groupby('week').mean().compute()
```
%% Output
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
%% Cell type:markdown id: tags:
## `predict`
Create predictions and print `mean(variable, lead_time, longitude, weighted latitude)` RPSS for all years as calculated by `skill_by_year`.
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
tercile_file = f'../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
tercile_edges = xr.open_dataset(tercile_file)
```
%% Cell type:code id: tags:
``` python
def create_predictions(fct, bias):
if 'week' not in fct.coords:
fct = add_year_week_coords(fct)
preds = fct - bias.sel(week=fct.week)
preds = make_probabilistic(preds, tercile_edges)
return preds.astype('float32')
```
%% Cell type:markdown id: tags:
### `predict` training period in-sample
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
preds_is = create_predictions(hind_2000_2019, bias_2000_2019).compute()
```
%% Cell type:code id: tags:
``` python
from scripts import skill_by_year
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_is)
```
%% Cell type:markdown id: tags:
### `predict` test
%% Cell type:code id: tags:
``` python
preds_test = create_predictions(fct_2020, bias_2000_2019)
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_test)
```
%% Cell type:markdown id: tags:
# Submission
%% Cell type:code id: tags:
``` python
from scripts import assert_predictions_2020
assert_predictions_2020(preds_test)
```
%% Cell type:code id: tags:
``` python
preds_test.attrs = {'author': 'Aaron Spring', 'author_email': 'aaron.spring@mpimet.mpg.de',
'comment': 'created for the s2s-ai-challenge as a template for the website',
'notebook': 'mean_bias_reduction.ipynb',
'website': 'https://s2s-ai-challenge.github.io/#evaluation'}
html_repr = xr.core.formatting_html.dataset_repr(preds_test)
with open('submission_template_repr.html', 'w') as myFile:
myFile.write(html_repr)
```
%% Cell type:code id: tags:
``` python
preds_test.to_netcdf('../submissions/ML_prediction_2020.nc')
```
%% Cell type:code id: tags:
``` python
# !git add ../submissions/ML_prediction_2020.nc
# !git add mean_bias_reduction.ipynb
```
%% Cell type:code id: tags:
``` python
#!git commit -m "template_test no ML mean bias reduction" # whatever message you want
```
%% Cell type:code id: tags:
``` python
#!git tag "submission-no_ML_mean_bias_reduction-0.0.2" # if this is to be checked by scorer, only the last submitted==tagged version will be considered
```
%% Cell type:code id: tags:
``` python
#!git push --tags
```
%% Cell type:markdown id: tags:
# Reproducibility
%% Cell type:markdown id: tags:
## memory
%% Cell type:code id: tags:
``` python
# https://phoenixnap.com/kb/linux-commands-check-memory-usage
!free -g
```
%% Cell type:markdown id: tags:
## CPU
%% Cell type:code id: tags:
``` python
!lscpu
```
%% Cell type:markdown id: tags:
## software
%% Cell type:code id: tags:
``` python
!conda list
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Create biweekly renku datasets from `climetlab-s2s-ai-challenge`
%% Cell type:markdown id: tags:
Goal:
- Create biweekly renku datasets from [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge).
- These renku datasets are then used in notebooks:
- `ML_train_and_predict.ipynb` to train the ML model and do ML-based predictions
- `RPSS_verification.ipynb` to calculate RPSS of the ML model
- `mean_bias_reduction.ipynb` to remove the mean bias and create predictions without ML
Requirements:
- [`climetlab`](https://github.com/ecmwf/climetlab)
- [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge)
- S2S and CPC observations uploaded on [European Weather Cloud (EWC)](https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/0.3.0/netcdf/index.html)
Output: [renku dataset](https://renku-python.readthedocs.io/en/latest/commands.html#module-renku.cli.dataset) `s2s-ai-challenge`
- observations
- deterministic:
- `hindcast-like-observations_2000-2019_biweekly_deterministic.zarr`
- `forecast-like-observations_2020_biweekly_deterministic.zarr`
- edges:
- `hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc`
- probabilistic:
- `hindcast-like-observations_2000-2019_biweekly_terciled.zarr`
- `forecast-like-observations_2020_biweekly_terciled.nc`
- forecasts/hindcasts
- deterministic:
- `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
- `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
- more models could be added
- benchmark:
- probabilistic:
- `ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc`
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
import xarray as xr
import xskillscore as xs
import pandas as pd
import climetlab_s2s_ai_challenge
import climetlab as cml
print(f'Climetlab version : {cml.__version__}')
print(f'Climetlab-s2s-ai-challenge plugin version : {climetlab_s2s_ai_challenge.__version__}')
xr.set_options(keep_attrs=True)
xr.set_options(display_style='text')
```
%% Output
WARNING: ecmwflibs universal: cannot find a library called MagPlus
Magics library could not be found
Climetlab version : 0.8.6
Climetlab-s2s-ai-challenge plugin version : 0.8.0
<xarray.core.options.set_options at 0x2b51c148f590>
%% Cell type:code id: tags:
``` python
# caching path for climetlab
cache_path = "/work/mh0727/m300524/S2S_AI/cache4" # set your own path
cml.settings.set("cache-directory", cache_path)
```
%% Cell type:code id: tags:
``` python
cache_path = "../data"
```
%% Cell type:markdown id: tags:
# Download and cache
Download all files for the observations, forecast and hindcast.
%% Cell type:code id: tags:
``` python
# shortcut
from scripts import download
#download()
```
%% Cell type:markdown id: tags:
## hindcast and forecast `input`
%% Cell type:code id: tags:
``` python
# starting dates forecast_time in 2020
dates = xr.cftime_range(start='20200102',freq='7D', periods=53).strftime('%Y%m%d').to_list()
forecast_dataset_labels = ['training-input','test-input'] # ML community
# equiv to
forecast_dataset_labels = ['hindcast-input','forecast-input'] # NWP community
varlist_forecast = ['tp','t2m'] # can add more
center_list = ['ecmwf'] # 'ncep', 'eccc'
```
%% Cell type:code id: tags:
``` python
%%time
# takes ~10-30 min to download for one model and one variable, depending on the number of model realizations
# and download settings https://climetlab.readthedocs.io/en/latest/guide/settings.html
for center in center_list:
for ds in forecast_dataset_labels:
cml.load_dataset(f"s2s-ai-challenge-{ds}", origin=center, parameter=varlist_forecast, format='netcdf').to_xarray()
```
%% Cell type:markdown id: tags:
## observations `output-reference`
%% Cell type:code id: tags:
``` python
obs_dataset_labels = ['training-output-reference','test-output-reference'] # ML community
# equiv to
obs_dataset_labels = ['hindcast-like-observations','forecast-like-observations'] # NWP community
varlist_obs = ['tp', 't2m']
```
%% Cell type:code id: tags:
``` python
%%time
# takes 10min to download
for ds in obs_dataset_labels:
print(ds)
# only netcdf, no format choice
cml.load_dataset(f"s2s-ai-challenge-{ds}", date=dates, parameter=varlist_obs).to_xarray()
```
%% Cell type:code id: tags:
``` python
# download obs_time to create output-reference/observations for models other than ecmwf and eccc,
# i.e. ncep or any other S2S or SubX model
obs_time = cml.load_dataset(f"s2s-ai-challenge-observations", parameter=['t2m', 'pr']).to_xarray()
```
%% Cell type:markdown id: tags:
# create bi-weekly aggregates
%% Cell type:code id: tags:
``` python
from scripts import aggregate_biweekly, ensure_attributes
#aggregate_biweekly??
```
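%% Cell type:markdown id: tags:
A conceptual sketch of the aggregation (assumption: `aggregate_biweekly` averages daily leads into the two biweekly windows, days 14-27 for weeks 3-4 and days 28-41 for weeks 5-6, labelled `14 days` and `28 days` as in the datasets below; the real function in `scripts` additionally handles attributes and the accumulated `tp`):
%% Cell type:code id: tags:
``` python
import pandas as pd
import xarray as xr

def biweekly_mean_sketch(da):
    """Toy biweekly aggregation for an instantaneous variable with daily lead_time."""
    w34 = da.sel(lead_time=slice(pd.Timedelta('14 D'), pd.Timedelta('27 D'))).mean('lead_time')
    w56 = da.sel(lead_time=slice(pd.Timedelta('28 D'), pd.Timedelta('41 D'))).mean('lead_time')
    return xr.concat([w34, w56], dim='lead_time').assign_coords(
        lead_time=[pd.Timedelta('14 D'), pd.Timedelta('28 D')])
```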
%% Cell type:code id: tags:
``` python
for c, center in enumerate(center_list): # forecast centers (could also take models)
for dsl in obs_dataset_labels:  # + forecast_dataset_labels: # climetlab dataset labels
for p, parameter in enumerate(varlist_forecast): # variables
if c != 0 and 'observation' in dsl: # only do once for observations
continue
print(f"datasetlabel: {dsl}, center: {center}, parameter: {parameter}")
if 'input' in dsl:
ds = cml.load_dataset(f"s2s-ai-challenge-{dsl}", origin=center, parameter=parameter, format='netcdf').to_xarray()
elif 'observation' in dsl: # obs only netcdf, no choice
if parameter not in ['t2m', 'tp']:
continue
ds = cml.load_dataset(f"s2s-ai-challenge-{dsl}", parameter=parameter, date=dates).to_xarray()
if p == 0:
ds_biweekly = ds.map(aggregate_biweekly)
else:
ds_biweekly[parameter] = ds.map(aggregate_biweekly)[parameter]
ds_biweekly = ds_biweekly.map(ensure_attributes, biweekly=True)
ds_biweekly = ds_biweekly.sortby('forecast_time')
if 'test' in dsl:
ds_biweekly = ds_biweekly.chunk('auto')
else:
ds_biweekly = ds_biweekly.chunk({'forecast_time':'auto','lead_time':-1,'longitude':-1,'latitude':-1})
if 'hindcast' in dsl:
time = f'{int(ds_biweekly.forecast_time.dt.year.min())}-{int(ds_biweekly.forecast_time.dt.year.max())}'
if 'input' in dsl:
name = f'{center}_{dsl}'
elif 'observations' in dsl:
name = dsl
elif 'forecast' in dsl:
time = '2020'
if 'input' in dsl:
name = f'{center}_{dsl}'
elif 'observations' in dsl:
name = dsl
else:
assert False
# pattern: {model_if_not_observations}{observations/forecast/hindcast}_{time}_biweekly_deterministic.zarr
zp = f'{cache_path}/{name}_{time}_biweekly_deterministic.zarr'
ds_biweekly.attrs.update({'postprocessed_by':'https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/renku_datasets_biweekly.ipynb'})
print(f'save to: {zp}')
ds_biweekly.astype('float32').to_zarr(zp, consolidated=True, mode='w')
```
%% Output
datasetlabel: hindcast-like-observations, center: ecmwf, parameter: tp
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data. This dataset has been dowloaded from IRIDL. By downloading this data you also agree to the terms and conditions defined at https://iridl.ldeo.columbia.edu.
WARNING: ecmwflibs universal: found eccodes at /work/mh0727/m300524/conda-envs/s2s-ai/lib/libeccodes.so
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.18.0
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing with an out-of-order index is generating 20 times more chunks
return self.array[key]
datasetlabel: hindcast-like-observations, center: ecmwf, parameter: t2m
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
... array[indexer]
To avoid creating the large chunks, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
... array[indexer]
return self.array[key]
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
... array[indexer]
To avoid creating the large chunks, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
... array[indexer]
return self.array[key]
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing with an out-of-order index is generating 20 times more chunks
return self.array[key]
save to: ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
datasetlabel: forecast-like-observations, center: ecmwf, parameter: tp
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data. This dataset has been dowloaded from IRIDL. By downloading this data you also agree to the terms and conditions defined at https://iridl.ldeo.columbia.edu.
datasetlabel: forecast-like-observations, center: ecmwf, parameter: t2m
save to: ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
%% Cell type:markdown id: tags:
## add to `renku` dataset `s2s-ai-challenge`
%% Cell type:code id: tags:
``` python
# observations as hindcast
# run renku commands from projects root directory only
# !renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f"{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)
print(obs_2000_2019.sizes,'\n',obs_2000_2019.coords,'\n', obs_2000_2019.nbytes/1e6,'MB')
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 1060, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
492.546744 MB
%% Cell type:code id: tags:
``` python
# observations as forecast
# run renku commands from projects root directory only
# !renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr(f"{cache_path}/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)
print(obs_2020.sizes,'\n',obs_2020.coords,'\n', obs_2020.nbytes/1e6,'MB')
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 53, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 53), meta=np.ndarray>
24.630096 MB
%% Cell type:code id: tags:
``` python
# ecmwf hindcast-input
# run renku commands from projects root directory only
# !renku dataset add s2s-ai-challenge data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
hind_2000_2019 = xr.open_zarr(f"{cache_path}/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
print(hind_2000_2019.sizes,'\n',hind_2000_2019.coords,'\n', hind_2000_2019.nbytes/1e6,'MB')
```
%% Cell type:code id: tags:
``` python
# ecmwf forecast-input
# run renku commands from projects root directory only
# !renku dataset add s2s-ai-challenge data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
fct_2020 = xr.open_zarr(f"{cache_path}/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
print(fct_2020.sizes,'\n',fct_2020.coords,'\n', fct_2020.nbytes/1e6,'MB')
```
%% Cell type:markdown id: tags:
# tercile edges
Create the two tercile edges at the 1/3 and 2/3 quantiles of the 2000-2019 biweekly distribution for each week of the year.
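%% Cell type:markdown id: tags:
A toy illustration (values are illustrative, not from the data): the 1/3 and 2/3 quantiles split a sample into three equally likely categories.
%% Cell type:code id: tags:
``` python
import numpy as np
sample = np.arange(1., 10.)             # 9 toy values
print(np.quantile(sample, [1/3, 2/3]))  # ~[3.67 6.33] -> below / near / above normal
```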
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr', consolidated=True)
```
%% Cell type:code id: tags:
``` python
from scripts import add_year_week_coords
```
%% Cell type:code id: tags:
``` python
# add week for groupby, see https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge/-/issues/29
obs_2000_2019 = add_year_week_coords(obs_2000_2019)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2000_2019
```
%% Output
<xarray.Dataset>
Dimensions: (forecast_time: 1060, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
week (forecast_time) int64 1 2 3 4 5 6 7 ... 47 48 49 50 51 52 53
year (forecast_time) int64 2000 2000 2000 2000 ... 2019 2019 2019
Data variables:
t2m (lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(2, 530, 121, 240), meta=np.ndarray>
tp (lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(2, 530, 121, 240), meta=np.ndarray>
Attributes:
created_by_script: tools/observations/makefile
created_by_software: climetlab-s2s-ai-challenge
function: climetlab_s2s_ai_challenge.extra.forecast_like_obse...
postprocessed_by: https://renkulab.io/gitlab/aaron.spring/s2s-ai-chal...
regrid_method: conservative
source_dataset_name: NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
%% Cell type:code id: tags:
``` python
tercile_file = f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
```
%% Cell type:code id: tags:
``` python
%%time
obs_2000_2019.chunk({'forecast_time':-1,'longitude':'auto'}).groupby('week').quantile(q=[1./3.,2./3.], dim='forecast_time').rename({'quantile':'category_edge'}).astype('float32').to_netcdf(tercile_file)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/numpy/lib/nanfunctions.py:1390: RuntimeWarning: All-NaN slice encountered
overwrite_input, interpolation)
CPU times: user 19min 35s, sys: 8min 33s, total: 28min 9s
Wall time: 16min 44s
%% Cell type:code id: tags:
``` python
tercile_edges = xr.open_dataset(tercile_file)
tercile_edges
```
%% Output
<xarray.Dataset>
Dimensions: (category_edge: 2, latitude: 121, lead_time: 2, longitude: 240, week: 53)
Coordinates:
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* category_edge (category_edge) float64 0.3333 0.6667
* week (week) int64 1 2 3 4 5 6 7 8 9 ... 45 46 47 48 49 50 51 52 53
Data variables:
t2m (week, category_edge, lead_time, latitude, longitude) float32 ...
tp (week, category_edge, lead_time, latitude, longitude) float32 ...
Attributes:
created_by_script: tools/observations/makefile
created_by_software: climetlab-s2s-ai-challenge
function: climetlab_s2s_ai_challenge.extra.forecast_like_obse...
postprocessed_by: https://renkulab.io/gitlab/aaron.spring/s2s-ai-chal...
regrid_method: conservative
source_dataset_name: NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
%% Cell type:code id: tags:
``` python
tercile_edges.nbytes*1e-6,'MB'
```
%% Output
(49.255184, 'MB')
%% Cell type:code id: tags:
``` python
# run renku commands from projects root directory only
# tercile edges
#!renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
#xr.open_dataset("../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc")
```
%% Cell type:markdown id: tags:
# observations in categories
- counting how many deterministic forecast realizations fall into each category, analogous to computing the RPS
- categorizing the 2020 forecast-like observations into the three categories (a conceptual sketch follows below)
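%% Cell type:markdown id: tags:
A conceptual sketch of that categorization (assumption: `make_probabilistic`, used below, compares each value to its week's tercile edges and yields one-hot category probabilities; this toy function shows the idea for a single value and edges $e_1 < e_2$):
%% Cell type:code id: tags:
``` python
import numpy as np

def onehot_tercile(value, e1, e2):
    """Toy one-hot categorization of a single value against tercile edges e1 < e2."""
    return np.array([value < e1,                     # below normal
                     (value >= e1) & (value <= e2),  # near normal
                     value > e2], dtype=float)       # above normal

print(onehot_tercile(5.0, 3.67, 6.33))  # [0. 1. 0.]
```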
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr(f'{cache_path}/forecast-like-observations_2020_biweekly_deterministic.zarr', consolidated=True)
obs_2020.sizes
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 53, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
%% Cell type:code id: tags:
``` python
# create a land mask: grid cells with valid (non-NaN) observations
mask = obs_2020.std(['lead_time','forecast_time']).notnull()
```
%% Cell type:code id: tags:
``` python
# mask.to_array().plot(col='variable')
```
%% Cell type:code id: tags:
``` python
# total precipitation in arid regions is masked
# Frederic Vitart suggested by email: "Based on your map we could mask all the areas where the lower tercile boundary is lower than 0.1 mm"
# we are using a dry mask as in https://doi.org/10.1175/MWR-D-17-0092.1
th = 0.01
tp_arid_mask = tercile_edges.tp.isel(category_edge=0, lead_time=0, drop=True) > th
#tp_arid_mask.where(mask.tp).plot(col='forecast_time', col_wrap=4)
#plt.suptitle(f'dry mask: week 3-4 tp 1/3 category_edge > {th} kg m-2',y=1., x=.4)
#plt.savefig('dry_mask.png')
```
%% Cell type:code id: tags:
``` python
# look into tercile edges
# tercile_edges.isel(forecast_time=0)['tp'].plot(col='lead_time',row='category_edge', robust=True)
# tercile_edges.isel(forecast_time=[0,20],category_edge=1)['tp'].plot(col='lead_time', row='forecast_time', robust=True)
# tercile_edges.tp.mean(['forecast_time']).plot(col='lead_time',row='category_edge',vmax=.5)
```
%% Cell type:markdown id: tags:
## categorize observations
%% Cell type:markdown id: tags:
### forecast 2020
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
# tp_arid_mask.isel(week=[0,10,20,30,40]).plot(col='week')
```
%% Cell type:code id: tags:
``` python
obs_2020_p = make_probabilistic(obs_2020, tercile_edges, mask=mask)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2020_p.nbytes/1e6, 'MB'
```
%% Output
(147.75984, 'MB')
%% Cell type:code id: tags:
``` python
obs_2020_p
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 53), meta=np.ndarray>
* category (category) <U12 'below normal' 'near normal' 'above normal'
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float64 dask.array<chunksize=(1, 2, 53, 121, 240), meta=np.ndarray>
tp (category, lead_time, forecast_time, latitude, longitude) float64 dask.array<chunksize=(1, 2, 53, 121, 240), meta=np.ndarray>
%% Cell type:code id: tags:
``` python
obs_2020_p.astype('float32').to_netcdf(f'{cache_path}/forecast-like-observations_2020_biweekly_terciled.nc')
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
# forecast-like-observations terciled
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Cell type:code id: tags:
``` python
# to use, first retrieve from git lfs
#!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
xr.open_dataset("../data/forecast-like-observations_2020_biweekly_terciled.nc")
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] ...
* category (category) object 'below normal' 'near normal' 'above normal'
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float32 ...
tp (category, lead_time, forecast_time, latitude, longitude) float32 ...
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
### hindcast 2000_2019
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr', consolidated=True)
```
%% Cell type:code id: tags:
``` python
obs_2000_2019_p = make_probabilistic(obs_2000_2019, tercile_edges, mask=mask)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2000_2019_p.nbytes/1e6, 'MB'
```
%% Output
(2955.138888, 'MB')
%% Cell type:code id: tags:
``` python
obs_2000_2019_p.astype('float32').chunk('auto').to_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_terciled.zarr', consolidated=True, mode='w')
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
<xarray.backends.zarr.ZarrStore at 0x2b51c1233360>
%% Cell type:code id: tags:
``` python
# hindcast-like-observations terciled
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Cell type:code id: tags:
``` python
# to use, first retrieve from git lfs
#!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr")
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 1060, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* category (category) <U12 'below normal' 'near normal' 'above normal'
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(1, 2, 530, 121, 240), meta=np.ndarray>
tp (category, lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(1, 2, 530, 121, 240), meta=np.ndarray>
%% Cell type:code id: tags:
``` python
# checking category frequencies
# o = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr")
# w=0
# v='tp'
# o.sel(forecast_time=o.forecast_time.dt.dayofyear==2+7*w).sum('forecast_time', skipna=False)[v].plot(row='lead_time',col='category', levels=[5.5,6.5,7.5])
# o.sel(forecast_time=o.forecast_time.dt.dayofyear==2+7*w).sum('forecast_time', skipna=False).sum('category', skipna=False)[v].plot(row='lead_time', levels=[16.5,17.5,18.5,19.5,20.5])
```
%% Cell type:markdown id: tags:
# Benchmark
center: ECMWF
The recalibration uses the tercile boundaries from the model climatology rather than from observations. Script by Frederic Vitart. A minimal sketch of the idea follows.
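%% Cell type:markdown id: tags:
A minimal sketch of such a recalibration (toy random data; the actual script by Frederic Vitart is not shown here): tercile edges are computed from the hindcast model climatology, and 2020 ensemble members are then counted against those edges.
%% Cell type:code id: tags:
``` python
# sketch: tercile edges from the model climatology instead of observations
import numpy as np
import xarray as xr

hindcast = xr.DataArray(np.random.rand(11, 20), dims=['realization', 'year'])  # hypothetical model climatology
fct_2020 = xr.DataArray(np.random.rand(51), dims='realization')                # hypothetical 2020 ensemble
edges = hindcast.quantile([1 / 3, 2 / 3], dim=['realization', 'year'])         # model-based tercile edges
p_below = (fct_2020 < edges.isel(quantile=0, drop=True)).mean('realization')
p_above = (fct_2020 >= edges.isel(quantile=1, drop=True)).mean('realization')
p_near = 1 - p_below - p_above
```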
%% Cell type:code id: tags:
``` python
bench_p = cml.load_dataset("s2s-ai-challenge-test-output-benchmark", parameter=['tp','t2m']).to_xarray()
```
%% Output
50%|█████ | 1/2 [00:00<00:00, 6.11it/s]
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
100%|██████████| 2/2 [00:00<00:00, 6.89it/s]
WARNING: ecmwflibs universal: found eccodes at /work/mh0727/m300524/conda-envs/s2s-ai/lib/libeccodes.so
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.12.3
%% Cell type:code id: tags:
``` python
bench_p['category'].attrs = {'long_name': 'tercile category probabilities', 'units': '1',
'description': 'Probabilities for three tercile categories. All three tercile category probabilities must add up to 1.'}
```
%% Cell type:code id: tags:
``` python
bench_p['lead_time'] = [pd.Timedelta(f"{i} d") for i in [14, 28]] # take first day of biweekly average as new coordinate
bench_p['lead_time'].attrs = {'long_name':'forecast_period', 'description': 'Forecast period is the time interval between the forecast reference time and the validity time.',
'aggregate': 'The pd.Timedelta corresponds to the first day of a biweekly aggregate.',
'week34_t2m': 'mean[day 14, 27]',
'week56_t2m': 'mean[day 28, 41]',
'week34_tp': 'day 28 minus day 14',
'week56_tp': 'day 42 minus day 28'}
```
%% Cell type:code id: tags:
``` python
bench_p = bench_p / 100 # convert percent to [0-1] probability
```
%% Cell type:code id: tags:
``` python
bench_p = bench_p.map(ensure_attributes, biweekly=True)
```
%% Output
0%| | 0/1 [00:00<?, ?it/s]
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
100%|██████████| 1/1 [00:00<00:00, 4.34it/s]
100%|██████████| 1/1 [00:00<00:00, 4.22it/s]
%% Cell type:code id: tags:
``` python
# bench_p.isel(forecast_time=2).t2m.plot(row='lead_time', col='category')
```
%% Cell type:code id: tags:
``` python
bench_p
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* category (category) object 'below normal' 'near normal' 'above normal'
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (forecast_time, lead_time) datetime64[ns] 2020-01-16 ... 2...
Data variables:
tp (category, forecast_time, lead_time, latitude, longitude) float32 ...
t2m (category, forecast_time, lead_time, latitude, longitude) float32 ...
%% Cell type:code id: tags:
``` python
bench_p.astype('float32').to_netcdf('../data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc')
```
%% Cell type:code id: tags:
``` python
#!renku dataset add s2s-ai-challenge data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc
```
import xarray as xr
import pandas as pd
import numpy as np
import climetlab_s2s_ai_challenge
import climetlab as cml
cache_path = '../data'
def download(varlist_forecast=['tp','t2m'],
             center_list=['ecmwf'],
forecast_dataset_labels=['hindcast-input','forecast-input'],
obs_dataset_labels=['hindcast-like-observations','forecast-like-observations'],
varlist_observations=['t2m','tp'],
benchmark=True,
format='netcdf'
):
"""Download files with climetlab to cache_path. Set cache_path:
cml.settings.set("cache-directory", cache_path)
"""
if isinstance(center_list, str):
center_list = [center_list]
if isinstance(varlist_forecast, str):
varlist_forecast = [varlist_forecast]
dates = xr.cftime_range(start='20200102',freq='7D', periods=53).strftime('%Y%m%d').to_list()
if forecast_dataset_labels:
print(f'Downloads variables {varlist_forecast} from datasets {forecast_dataset_labels} from center {center_list} in {format} format.')
for center in center_list:
for ds in forecast_dataset_labels:
                for parameter in varlist_forecast:
                    try:
                        cml.load_dataset(f"s2s-ai-challenge-{ds}", origin=center, parameter=parameter, format=format).to_xarray()
                    except Exception:
                        # some center/parameter combinations may be unavailable; skip them
                        pass
if obs_dataset_labels:
        print(f'Downloads variables {varlist_observations} from datasets {obs_dataset_labels} in netcdf format. Additionally downloads the raw t2m and pr observations with a time dimension.')
try:
for ds in obs_dataset_labels:
for parameter in varlist_observations:
cml.load_dataset(f"s2s-ai-challenge-{ds}", date=dates, parameter=parameter).to_xarray()
        except Exception:
            # skip if an observation download fails
            pass
# raw
cml.load_dataset(f"s2s-ai-challenge-observations", parameter=varlist_observations).to_xarray()
if benchmark:
cml.load_dataset("s2s-ai-challenge-test-output-benchmark", parameter=['tp','t2m']).to_xarray()
print('finished')
return
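# Example usage (hypothetical arguments; set the climetlab cache first):
#     cml.settings.set("cache-directory", cache_path)
#     download(varlist_forecast=['t2m'], center_list=['ecmwf'], benchmark=False)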
def add_valid_time_from_forecast_reference_time_and_lead_time(forecast, init_dim='forecast_time'):
"""Creates valid_time(forecast_time, lead_time).
lead_time: pd.Timedelta
forecast_time: datetime
"""
times = xr.concat(
[
xr.DataArray(
forecast[init_dim] + lead,
dims=init_dim,
coords={init_dim: forecast[init_dim]},
)
for lead in forecast.lead_time
],
dim="lead_time",
join="inner",
compat="broadcast_equals",
)
forecast = forecast.assign_coords(valid_time=times)
return forecast
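# Example: forecast = add_valid_time_from_forecast_reference_time_and_lead_time(forecast)
# adds the coordinate valid_time(lead_time, forecast_time) = forecast_time + lead_time.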
def aggregate_biweekly(da):
"""
Aggregate initialized S2S forecasts biweekly for xr.DataArrays.
Use ds.map(aggregate_biweekly) for xr.Datasets.
Applies to the ECMWF S2S data model: https://confluence.ecmwf.int/display/S2S/Parameters
"""
# biweekly averaging
w34 = [pd.Timedelta(f'{i} d') for i in range(14,28)]
w34 = xr.DataArray(w34,dims='lead_time', coords={'lead_time':w34})
w56 = [pd.Timedelta(f'{i} d') for i in range(28,42)]
w56 = xr.DataArray(w56,dims='lead_time', coords={'lead_time':w56})
biweekly_lead = [pd.Timedelta(f"{i} d") for i in [14, 28]] # take first day of biweekly average as new coordinate
v = da.name
if climetlab_s2s_ai_challenge.CF_CELL_METHODS[v] == 'sum': # biweekly difference for sum variables: tp and ttr
d34 = da.sel(lead_time=pd.Timedelta("28 d")) - da.sel(lead_time=pd.Timedelta("14 d")) # tp from day 14 to day 27
        d56 = da.sel(lead_time=pd.Timedelta("42 d")) - da.sel(lead_time=pd.Timedelta("28 d")) # tp from day 28 to day 41
da_biweekly = xr.concat([d34,d56],'lead_time').assign_coords(lead_time=biweekly_lead)
else: # t2m, see climetlab_s2s_ai_challenge.CF_CELL_METHODS # biweekly: mean [day 14, day 27]
d34 = da.sel(lead_time=w34).mean('lead_time')
d56 = da.sel(lead_time=w56).mean('lead_time')
da_biweekly = xr.concat([d34,d56],'lead_time').assign_coords(lead_time=biweekly_lead)
da_biweekly = add_valid_time_from_forecast_reference_time_and_lead_time(da_biweekly)
da_biweekly['lead_time'].attrs = {'long_name':'forecast_period', 'description': 'Forecast period is the time interval between the forecast reference time and the validity time.',
'aggregate': 'The pd.Timedelta corresponds to the first day of a biweekly aggregate.',
'week34_t2m': 'mean[day 14, 27]',
'week56_t2m': 'mean[day 28, 41]',
'week34_tp': 'day 28 minus day 14',
'week56_tp': 'day 42 minus day 28'}
return da_biweekly
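# Example usage (assuming ds carries daily lead_time steps as in the raw S2S data):
#     ds_biweekly = ds.map(aggregate_biweekly)  # lead_time becomes [14 days, 28 days]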
def ensure_attributes(da, biweekly=False):
"""Ensure that coordinates and variables have proper attributes. Set biweekly==True to set special comments for the biweely aggregates."""
template = cml.load_dataset('s2s-ai-challenge-test-input',parameter='t2m', origin='ecmwf', format='netcdf', date='20200102').to_xarray()
for c in da.coords:
if c in template.coords:
da.coords[c].attrs.update(template.coords[c].attrs)
if 'valid_time' in da.coords:
da['valid_time'].attrs.update({'long_name': 'validity time',
'standard_name': 'time',
'description': 'time for which the forecast is valid',
'calculate':'forecast_time + lead_time'})
if 'forecast_time' in da.coords:
da['forecast_time'].attrs.update({'long_name' : 'initial time of forecast', 'standard_name': 'forecast_reference_time',
'description':'The forecast reference time in NWP is the "data time", the time of the analysis from which the forecast was made. It is not the time for which the forecast is valid.'})
# fix tp
if da.name == 'tp':
da.attrs['units'] = 'kg m-2'
if biweekly:
da['lead_time'].attrs.update({'standard_name':'forecast_period', 'long_name': 'lead time',
'description': 'Forecast period is the time interval between the forecast reference time and the validity time.',
'aggregate': 'The pd.Timedelta corresponds to the first day of a biweekly aggregate.',
'week34_t2m': 'mean[14 days, 27 days]',
'week56_t2m': 'mean[28 days, 41 days]',
'week34_tp': '28 days minus 14 days',
'week56_tp': '42 days minus 28 days'})
if da.name == 'tp':
da.attrs.update({'aggregate_week34': '28 days minus 14 days',
'aggregate_week56': '42 days minus 28 days',
'description': 'https://confluence.ecmwf.int/display/S2S/S2S+Total+Precipitation'})
if da.name == 't2m':
da.attrs.update({'aggregate_week34': 'mean[14 days, 27 days]',
'aggregate_week56': 'mean[28 days, 41 days]',
'variable_before_categorization': 'https://confluence.ecmwf.int/display/S2S/S2S+Surface+Air+Temperature'})
return da
def add_year_week_coords(ds):
import numpy as np
if 'week' not in ds.coords and 'year' not in ds.coords:
year = ds.forecast_time.dt.year.to_index().unique()
week = (list(np.arange(1,54)))
weeks = week * len(year)
years = np.repeat(year,len(week))
ds.coords["week"] = ("forecast_time", weeks)
ds.coords['week'].attrs['description'] = "This week represents the number of forecast_time starting from 1 to 53. Note: This week is different from the ISO week from groupby('forecast_time.weekofyear'), see https://en.wikipedia.org/wiki/ISO_week_date and https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge/-/issues/29"
ds.coords["year"] = ("forecast_time", years)
ds.coords['year'].attrs['long_name'] = "calendar year"
return ds
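# Example: ds = add_year_week_coords(ds)
# assumes forecast_time holds 53 weekly initializations per calendar year, as in this challenge.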
def make_probabilistic(ds, tercile_edges, member_dim='realization', mask=None, groupby_coord='week'):
"""Compute probabilities from ds (observations or forecasts) based on tercile_edges."""
# broadcast
ds = add_year_week_coords(ds)
tercile_edges = tercile_edges.sel({groupby_coord: ds.coords[groupby_coord]})
bn = ds < tercile_edges.isel(category_edge=0, drop=True) # below normal
n = (ds >= tercile_edges.isel(category_edge=0, drop=True)) & (ds < tercile_edges.isel(category_edge=1, drop=True)) # normal
an = ds >= tercile_edges.isel(category_edge=1, drop=True) # above normal
if member_dim in ds.dims:
bn = bn.mean(member_dim)
an = an.mean(member_dim)
n = n.mean(member_dim)
ds_p = xr.concat([bn, n, an],'category').assign_coords(category=['below normal', 'near normal', 'above normal'])
if mask is not None:
ds_p = ds_p.where(mask)
if 'tp' in ds_p.data_vars:
# mask arid grid cells where category_edge are too close to 0
# we are using a dry mask as in https://doi.org/10.1175/MWR-D-17-0092.1
tp_arid_mask = tercile_edges.tp.isel(category_edge=0, lead_time=0, drop=True) > 0.01
ds_p['tp'] = ds_p['tp'].where(tp_arid_mask)
ds_p['category'].attrs = {'long_name': 'tercile category probabilities', 'units': '1',
'description': 'Probabilities for three tercile categories. All three tercile category probabilities must add up to 1.'}
ds_p['tp'].attrs = {'long_name': 'Probability of total precipitation in tercile categories', 'units': '1',
'comment': 'All three tercile category probabilities must add up to 1.',
'variable_before_categorization': 'https://confluence.ecmwf.int/display/S2S/S2S+Total+Precipitation'
}
ds_p['t2m'].attrs = {'long_name': 'Probability of 2m temperature in tercile categories', 'units': '1',
'comment': 'All three tercile category probabilities must add up to 1.',
'variable_before_categorization': 'https://confluence.ecmwf.int/display/S2S/S2S+Surface+Air+Temperature'
}
if 'year' in ds_p.coords:
del ds_p.coords['year']
if groupby_coord in ds_p.coords:
ds_p = ds_p.drop(groupby_coord)
return ds_p
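# Example usage (as in the notebook cells above):
#     obs_2020_p = make_probabilistic(obs_2020, tercile_edges, mask=mask)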
def skill_by_year(preds, adapt=False):
"""Returns pd.Dataframe of RPSS per year."""
# similar verification_RPSS.ipynb
# as scorer bot but returns a score for each year
import xarray as xr
import xskillscore as xs
import pandas as pd
import numpy as np
xr.set_options(keep_attrs=True)
# from root
#renku storage pull data/forecast-like-observations_2020_biweekly_terciled.nc
#renku storage pull data/hindcast-like-observations_2000-2019_biweekly_terciled.nc
cache_path = '../data'
if 2020 in preds.forecast_time.dt.year:
obs_p = xr.open_dataset(f'{cache_path}/forecast-like-observations_2020_biweekly_terciled.nc').sel(forecast_time=preds.forecast_time)
else:
obs_p = xr.open_dataset(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_terciled.zarr', engine='zarr').sel(forecast_time=preds.forecast_time)
# ML probabilities
fct_p = preds
# climatology
clim_p = xr.DataArray([1/3, 1/3, 1/3], dims='category', coords={'category':['below normal', 'near normal', 'above normal']}).to_dataset(name='tp')
clim_p['t2m'] = clim_p['tp']
if adapt:
# select only obs_p where fct_p forecasts provided
for c in ['longitude', 'latitude', 'forecast_time', 'lead_time']:
obs_p = obs_p.sel({c:fct_p[c]})
obs_p = obs_p[list(fct_p.data_vars)]
clim_p = clim_p[list(fct_p.data_vars)]
else:
# check inputs
assert_predictions_2020(obs_p)
assert_predictions_2020(fct_p)
# rps_ML
rps_ML = xs.rps(obs_p, fct_p, category_edges=None, dim=[], input_distributions='p').compute()
# rps_clim
rps_clim = xs.rps(obs_p, clim_p, category_edges=None, dim=[], input_distributions='p').compute()
## RPSS
# penalize # https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/7
expect = obs_p.sum('category')
expect = expect.where(expect > 0.98).where(expect < 1.02) # should be True if not all NaN
# https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/50
rps_ML = rps_ML.where(expect, other=2) # assign RPS=2 where value was expected but NaN found
# following Weigel 2007: https://doi.org/10.1175/MWR3280.1
rpss = 1 - (rps_ML.groupby('forecast_time.year').mean() / rps_clim.groupby('forecast_time.year').mean())
# clip
rpss = rpss.clip(-10, 1)
# weighted area mean
weights = np.cos(np.deg2rad(np.abs(rpss.latitude)))
# spatially weighted score averaged over lead_times and variables to one single value
scores = rpss.sel(latitude=slice(None, -60)).weighted(weights).mean('latitude').mean('longitude')
scores = scores.to_array().mean(['lead_time', 'variable'])
return scores.to_dataframe('RPSS')
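# Example usage (hypothetical predictions dataset covering 2020 forecast_time):
#     skill_by_year(preds)  # returns a pd.DataFrame with one RPSS value per year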
def assert_predictions_2020(preds_test, exclude='weekofyear'):
"""Check the variables, coordinates and dimensions of 2020 predictions."""
    from xarray.testing import assert_equal # doesn't care about attrs but checks coords
# is dataset
assert isinstance(preds_test, xr.Dataset)
# has both vars: tp and t2m
    if 'data_vars' not in exclude:
assert 'tp' in preds_test.data_vars
assert 't2m' in preds_test.data_vars
## coords
# ignore weekofyear coord if not dim
if 'weekofyear' in exclude and 'weekofyear' in preds_test.coords and 'weekofyear' not in preds_test.dims:
preds_test = preds_test.drop('weekofyear')
# forecast_time
    if 'forecast_time' not in exclude:
d = pd.date_range(start='2020-01-02', freq='7D', periods=53)
forecast_time = xr.DataArray(d, dims='forecast_time', coords={'forecast_time':d}, name='forecast_time')
assert_equal(forecast_time, preds_test['forecast_time'])
# longitude
    if 'longitude' not in exclude:
lon = np.arange(0., 360., 1.5)
longitude = xr.DataArray(lon, dims='longitude', coords={'longitude': lon}, name='longitude')
assert_equal(longitude, preds_test['longitude'])
# latitude
    if 'latitude' not in exclude:
lat = np.arange(-90., 90.1, 1.5)[::-1]
latitude = xr.DataArray(lat, dims='latitude', coords={'latitude': lat}, name='latitude')
assert_equal(latitude, preds_test['latitude'])
# lead_time
    if 'lead_time' not in exclude:
lead = [pd.Timedelta(f'{i} d') for i in [14, 28]]
lead_time = xr.DataArray(lead, dims='lead_time', coords={'lead_time': lead}, name='lead_time')
assert_equal(lead_time, preds_test['lead_time'])
# category
    if 'category' not in exclude:
cat = np.array(['below normal', 'near normal', 'above normal'], dtype='<U12')
category = xr.DataArray(cat, dims='category', coords={'category': cat}, name='category')
assert_equal(category, preds_test['category'])
# size
    if 'size' not in exclude:
from dask.utils import format_bytes
size_in_MB = float(format_bytes(preds_test.nbytes).split(' ')[0])
# todo: refine for dtypes
assert size_in_MB > 50
assert size_in_MB < 250
# no other dims
    if 'dims' not in exclude:
assert set(preds_test.dims) - {'category', 'forecast_time', 'latitude', 'lead_time', 'longitude'} == set()
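# Example: assert_predictions_2020(preds_test)
# raises an AssertionError if the submission deviates from the expected 2020 format.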
*.nc filter=lfs diff=lfs merge=lfs -text
echo remove previous zarrs from renku dataset
renku dataset unlink s2s-ai-challenge --include "data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr/**/**" -y
renku dataset unlink s2s-ai-challenge --include "data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr/**/**" -y
renku dataset unlink s2s-ai-challenge --include "data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr/**/**" -y
renku dataset unlink s2s-ai-challenge --include "data/forecast-like-observations_2020_biweekly_deterministic.zarr/**/**" -y
renku dataset unlink s2s-ai-challenge --include "data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr/**/**" -y
renku dataset unlink s2s-ai-challenge --include "data/**/**" -y
# observations and derived
renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_deterministic.zarr
renku dataset add s2s-ai-challenge -o data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
renku dataset add s2s-ai-challenge -o data/forecast-like-observations_2020_biweekly_terciled.nc
renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
# benchmark
renku dataset add s2s-ai-challenge -o data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc
# forecast / hindcast
for center in ecmwf; do
renku dataset add s2s-ai-challenge data/${center}_hindcast-input_2000-2019_biweekly_deterministic.zarr
renku dataset add s2s-ai-challenge data/${center}_forecast-input_2020_biweekly_deterministic.zarr
done
renku dataset ls-files s2s-ai-challenge
renku dataset ls-tags s2s-ai-challenge
echo consider new tag: renku dataset tag -d description s2s-ai-challenge 0.x