Compare revisions
Showing with 3563 additions and 737 deletions
source diff could not be displayed: it is stored in LFS.
File added
@@ -5,27 +5,28 @@ dependencies:
- xarray
# ML
- tensorflow
- pytorch
# viz
- matplotlib-base
# - cartopy
# scoring
- xskillscore>=0.0.20 # includes sklearn
# data access
- intake
- fsspec
- zarr
- s3fs
- intake-xarray
- cfgrib
- eccodes
- nc-time-axis
- pydap
- h5netcdf
- netcdf4
- pip
- pip:
- climetlab >= 0.8.0
- climetlab_s2s_ai_challenge >= 0.7.1
- configargparse # for weatherbench
- git+https://github.com/phausamann/sklearn-xarray.git@develop
- netcdf4==1.5.4
prefix: "/opt/conda"
%% Cell type:markdown id: tags:
# Train ML model for predictions of week 3-4 & 5-6
This notebook creates a Machine Learning `ML_model` to predict weeks 3-4 & 5-6 based on `S2S` weeks 3-4 & 5-6 forecasts; the predictions are compared to `CPC` observations for the [`s2s-ai-challenge`](https://s2s-ai-challenge.github.io/).
%% Cell type:markdown id: tags:
# Synopsis
%% Cell type:markdown id: tags:
## Method: `name`
- description
- a few details
%% Cell type:markdown id: tags:
## Data used
Training-input for Machine Learning model:
- renku datasets, climetlab, IRIDL
Forecast-input for Machine Learning model:
- renku datasets, climetlab, IRIDL
Compare Machine Learning model forecast against ground truth:
- renku datasets, climetlab, IRIDL
%% Cell type:markdown id: tags:
## Resources used
for training, details in reproducibility
- platform: renku
- memory: 8 GB
- processors: 2 CPU
- storage required: 10 GB
%% Cell type:markdown id: tags:
## Safeguards
All points have to be [x] checked. If not, your submission is invalid.
Changes to the code after submissions are not possible, as the `commit` before the `tag` will be reviewed.
(Only in exceptional cases, and if prior effort toward reproducibility is evident, may improvements to readability and reproducibility be allowed after November 1st 2021.)
%% Cell type:markdown id: tags:
### Safeguards to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting?wprov=sfti1)
If the organizers suspect overfitting, your contribution can be disqualified.
- [ ] We did not use 2020 observations in training (explicit overfitting and cheating)
- [ ] We did not repeatedly verify our model on 2020 observations and incrementally improve our RPSS (implicit overfitting)
- [ ] We provide RPSS scores for the training period with script `skill_by_year`, see in section 6.3 `predict`.
- [ ] We tried our best to prevent [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)?wprov=sfti1).
- [ ] We honor the `train-validate-test` [split principle](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets). This means that the hindcast data is split into `train` and `validate`, whereas `test` is withheld.
- [ ] We did not use `test` explicitly in training or implicitly in incrementally adjusting parameters.
- [ ] We considered [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).
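%% Cell type:markdown id: tags:
The `train-validate-test` split above can be illustrated as a partition of forecast reference years. A minimal sketch (the year boundaries follow this template's defaults; the helper name `split_years` is ours, not part of the challenge scripts):
%% Cell type:code id: tags:
``` python
# Hypothetical helper: partition years to honor the train-validate-test split
# (the 2020 test year stays untouched until the final scoring).
def split_years(years, valid_years=('2018', '2019'), test_years=('2020',)):
    train = [y for y in years if y not in valid_years and y not in test_years]
    valid = [y for y in years if y in valid_years]
    test = [y for y in years if y in test_years]
    # the three sets must be disjoint
    assert not set(train) & set(valid) and not set(train) & set(test)
    return train, valid, test

years = [str(y) for y in range(2000, 2021)]  # hindcast years plus the 2020 test year
train, valid, test = split_years(years)
```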
%% Cell type:markdown id: tags:
### Safeguards for Reproducibility
Notebook/code must be independently reproducible from scratch by the organizers (after the competition); if this is not possible, no prize will be awarded.
- [ ] All training data is publicly available (no pre-trained private neural networks, as they are not reproducible for us)
- [ ] Code is well documented, readable and reproducible.
- [ ] Code to reproduce training and predictions is preferred to run within a day on the described architecture. If the training takes longer than a day, please justify why this is needed. Please do not submit training pipelines which take weeks to train.
%% Cell type:markdown id: tags:
# Todos to improve template
This is just a demo.
- [ ] for both variables
- [ ] for both `lead_time`s
- [ ] ensure probabilistic prediction outcome with `category` dim
%% Cell type:markdown id: tags:
# Imports
%% Cell type:code id: tags:
``` python
import tensorflow.keras as keras
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Sequential
import matplotlib.pyplot as plt
import xarray as xr
xr.set_options(display_style='text')
from dask.utils import format_bytes
import xskillscore as xs
```
%% Cell type:markdown id: tags:
# Get training data
preprocessing of input data may be done in a separate notebook/script
%% Cell type:markdown id: tags:
## Hindcast
get weekly initialized hindcasts
%% Cell type:code id: tags:
``` python
# consider renku datasets
#! renku storage pull path
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
## Observations
corresponding to hindcasts
%% Cell type:code id: tags:
``` python
# consider renku datasets
#! renku storage pull path
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# ML model
%% Cell type:code id: tags:
``` python
bs=32
import numpy as np
class DataGenerator(keras.utils.Sequence):
    def __init__(self, data=None, verif_data=None, batch_size=bs, shuffle=True, load=True):
        """
        Data generator
        Template from https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
        Args:
            data: forecast input (xr.DataArray with a `time` dimension)
            verif_data: verifying observations with the same dimensionality
            batch_size: batch size
            shuffle: bool. If True, data is shuffled.
            load: bool. If True, data is loaded into RAM.
        """
        self.data = data
        self.verif_data = verif_data
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.n_samples = data.time.size if data is not None else 0
        self.on_epoch_end()
        # For some weird reason calling .load() earlier messes up the mean and std computations
        if load and data is not None:
            print('Loading data into RAM')
            self.data.load()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.ceil(self.n_samples / self.batch_size))

    def __getitem__(self, i):
        'Generate one batch of data'
        idxs = self.idxs[i * self.batch_size:(i + 1) * self.batch_size]
        # got all nan if nans not masked
        X = self.data.isel(time=idxs).fillna(0.).values
        y = self.verif_data.isel(time=idxs).fillna(0.).values
        return X, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.idxs = np.arange(self.n_samples)
        if self.shuffle:
            np.random.shuffle(self.idxs)
```
%% Cell type:markdown id: tags:
## data prep: train, valid, test
%% Cell type:code id: tags:
``` python
# time is the forecast_reference_time
time_train_start,time_train_end='2000','2017'
time_valid_start,time_valid_end='2018','2019'
time_test = '2020'
```
%% Cell type:code id: tags:
``` python
dg_train = DataGenerator()
```
%% Cell type:code id: tags:
``` python
dg_valid = DataGenerator()
```
%% Cell type:code id: tags:
``` python
dg_test = DataGenerator()
```
%% Cell type:markdown id: tags:
## `fit`
%% Cell type:code id: tags:
``` python
cnn = keras.models.Sequential([])
```
%% Cell type:code id: tags:
``` python
cnn.summary()
```
%% Cell type:code id: tags:
``` python
cnn.compile(keras.optimizers.Adam(1e-4), 'mse')
```
%% Cell type:code id: tags:
``` python
import warnings
warnings.simplefilter("ignore")
```
%% Cell type:code id: tags:
``` python
cnn.fit(dg_train, epochs=1, validation_data=dg_valid)
```
%% Cell type:markdown id: tags:
## `predict`
Create predictions and print `mean(variable, lead_time, longitude, weighted latitude)` RPSS for all years as calculated by `skill_by_year`.
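%% Cell type:markdown id: tags:
For orientation, the ranked probability score underlying the RPSS can be computed by hand. A numpy sketch for the usual tercile setup, with climatology assigning 1/3 per category (the exact area weighting of `skill_by_year` lives in `scripts` and is not reproduced here):
%% Cell type:code id: tags:
``` python
import numpy as np

def rps(probs, obs_category):
    """Ranked probability score for tercile forecasts.
    probs: (n_samples, 3) forecast probabilities; obs_category: (n_samples,) in {0, 1, 2}."""
    cum_fc = np.cumsum(probs, axis=1)                      # cumulative forecast probabilities
    cum_obs = np.cumsum(np.eye(3)[obs_category], axis=1)   # cumulative one-hot observations
    return ((cum_fc - cum_obs) ** 2).sum(axis=1).mean()

probs = np.array([[0.5, 0.3, 0.2]])        # one forecast favoring the lower tercile
obs = np.array([0])                        # observed lower tercile
rps_fc = rps(probs, obs)
rps_clim = rps(np.full((1, 3), 1 / 3), obs)
rpss = 1 - rps_fc / rps_clim               # > 0 means better than climatology
```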
%% Cell type:code id: tags:
``` python
from scripts import skill_by_year
```
%% Cell type:code id: tags:
``` python
def create_predictions(model, dg):
    """Create non-iterative predictions"""
    preds = model.predict(dg).squeeze()
    # transform
    return preds
```
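%% Cell type:markdown id: tags:
The `# transform` step above is left open. One hedged possibility is to turn the deterministic output into tercile "probabilities" by thresholding against climatological tercile edges; the helper name and the edge values below are illustrative, not part of the template's API:
%% Cell type:code id: tags:
``` python
import numpy as np

def to_tercile_categories(pred, lower, upper):
    """Crude one-hot tercile assignment from a deterministic field.
    pred, lower, upper: broadcastable arrays; returns (3, ...) summing to 1 per point."""
    below = (pred < lower)
    above = (pred > upper)
    normal = ~(below | above)
    return np.stack([below, normal, above]).astype(float)

pred = np.array([-1.2, 0.1, 0.9])                       # toy standardized anomalies
probs = to_tercile_categories(pred, lower=-0.43, upper=0.43)
```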
%% Cell type:markdown id: tags:
### `predict` training period in-sample
%% Cell type:code id: tags:
``` python
preds_is = create_predictions(cnn, dg_train)
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_is)
```
%% Cell type:markdown id: tags:
### `predict` valid out-of-sample
%% Cell type:code id: tags:
``` python
preds_os = create_predictions(cnn, dg_valid)
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_os)
```
%% Cell type:markdown id: tags:
### `predict` test
%% Cell type:code id: tags:
``` python
preds_test = create_predictions(cnn, dg_test)
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_test)
```
%% Cell type:markdown id: tags:
# Submission
%% Cell type:code id: tags:
``` python
preds_test.sizes # expect: category(3), longitude, latitude, lead_time(2), forecast_time (53)
```
%% Cell type:code id: tags:
``` python
from scripts import assert_predictions_2020
assert_predictions_2020(preds_test)
```
%% Cell type:code id: tags:
``` python
preds_test.to_netcdf('../submissions/ML_prediction_2020.nc')
```
%% Cell type:code id: tags:
``` python
#!git add ../submissions/ML_prediction_2020.nc
#!git add ML_forecast_template.ipynb
```
%% Cell type:code id: tags:
``` python
#!git commit -m "commit submission for my_method_name" # whatever message you want
```
%% Cell type:code id: tags:
``` python
#!git tag "submission-my_method_name-0.0.1" # if this is to be checked by scorer, only the last submitted==tagged version will be considered
```
%% Cell type:code id: tags:
``` python
#!git push --tags
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Reproducibility
%% Cell type:markdown id: tags:
## memory
%% Cell type:code id: tags:
``` python
# https://phoenixnap.com/kb/linux-commands-check-memory-usage
!free -g
```
%% Cell type:markdown id: tags:
## CPU
%% Cell type:code id: tags:
``` python
!lscpu
```
%% Cell type:markdown id: tags:
## software
%% Cell type:code id: tags:
``` python
!conda list
```
%% Cell type:code id: tags:
``` python
```
......
%% Cell type:markdown id: tags:
# Train ML model to correct predictions of week 3-4 & 5-6
This notebook creates a Machine Learning `ML_model` to predict weeks 3-4 & 5-6 based on `S2S` weeks 3-4 & 5-6 forecasts; the predictions are compared to `CPC` observations for the [`s2s-ai-challenge`](https://s2s-ai-challenge.github.io/).
%% Cell type:markdown id: tags:
# Synopsis
%% Cell type:markdown id: tags:
## Method: `ML-based mean bias reduction`
- calculate the ML-based bias from the 2000-2019 deterministic ensemble-mean hindcasts
- remove that ML-based bias from the 2020 deterministic ensemble-mean forecast
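%% Cell type:markdown id: tags:
The two steps can be sketched with plain arrays (synthetic shapes and values; the real notebook works on `(forecast_time, latitude, longitude)` xarray objects, grouped by week of year):
%% Cell type:code id: tags:
``` python
import numpy as np

rng = np.random.default_rng(0)
obs_clim = rng.normal(size=(53, 4, 8))     # weekly climatology on a toy 4x8 grid
# 20 hindcast years with a constant +2.0 warm bias plus noise
hind = obs_clim[None] + 2.0 + 0.1 * rng.normal(size=(20, 53, 4, 8))
obs = obs_clim[None] + 0.1 * rng.normal(size=(20, 53, 4, 8))

bias = (hind - obs).mean(axis=0)           # step 1: mean bias per week and grid point
fct_2020 = obs_clim + 2.0                  # toy 2020 forecast carrying the same bias
corrected = fct_2020 - bias                # step 2: remove it from the 2020 forecast
```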
%% Cell type:markdown id: tags:
## Data used
type: renku datasets
Training-input for Machine Learning model:
- hindcasts of models:
- ECMWF: `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
Forecast-input for Machine Learning model:
- real-time 2020 forecasts of models:
- ECMWF: `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
Compare Machine Learning model forecast against ground truth:
- `CPC` observations:
- `hindcast-like-observations_biweekly_deterministic.zarr`
- `forecast-like-observations_2020_biweekly_deterministic.zarr`
%% Cell type:markdown id: tags:
## Resources used
for training, details in reproducibility
- platform: renku
- memory: 8 GB
- processors: 2 CPU
- storage required: 10 GB
%% Cell type:markdown id: tags:
## Safeguards
All points have to be [x] checked. If not, your submission is invalid.
Changes to the code after submissions are not possible, as the `commit` before the `tag` will be reviewed.
(Only in exceptional cases, and if prior effort toward reproducibility is evident, may improvements to readability and reproducibility be allowed after November 1st 2021.)
%% Cell type:markdown id: tags:
### Safeguards to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting?wprov=sfti1)
If the organizers suspect overfitting, your contribution can be disqualified.
- [x] We did not use 2020 observations in training (explicit overfitting and cheating)
- [x] We did not repeatedly verify our model on 2020 observations and incrementally improve our RPSS (implicit overfitting)
- [x] We provide RPSS scores for the training period with script `print_RPS_per_year`, see in section 6.3 `predict`.
- [x] We tried our best to prevent [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)?wprov=sfti1).
- [x] We honor the `train-validate-test` [split principle](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets). This means that the hindcast data is split into `train` and `validate`, whereas `test` is withheld.
- [x] We did not use `test` explicitly in training or implicitly in incrementally adjusting parameters.
- [x] We considered [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).
%% Cell type:markdown id: tags:
### Safeguards for Reproducibility
Notebook/code must be independently reproducible from scratch by the organizers (after the competition); if this is not possible, no prize will be awarded.
- [x] All training data is publicly available (no pre-trained private neural networks, as they are not reproducible for us)
- [x] Code is well documented, readable and reproducible.
- [x] Code to reproduce training and predictions is preferred to run within a day on the described architecture. If the training takes longer than a day, please justify why this is needed. Please do not submit training pipelines which take weeks to train.
%% Cell type:markdown id: tags:
# Todos to improve template
This is just a demo.
- [ ] use multiple predictor variables and two predicted variables
- [ ] for both `lead_time`s in one go
- [ ] consider seasonality, for now all `forecast_time` months are mixed
- [ ] make probabilistic predictions with `category` dim, for now works deterministic
%% Cell type:markdown id: tags:
# Imports
%% Cell type:code id: tags:
``` python
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Sequential
import matplotlib.pyplot as plt
import xarray as xr
xr.set_options(display_style='text')
import numpy as np
from dask.utils import format_bytes
import xskillscore as xs
```
%% Output
/opt/conda/lib/python3.8/site-packages/xarray/backends/cfgrib_.py:27: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message
warnings.warn(
%% Cell type:markdown id: tags:
# Get training data
preprocessing of input data may be done in a separate notebook/script
%% Cell type:markdown id: tags:
## Hindcast
get weekly initialized hindcasts
%% Cell type:code id: tags:
``` python
v='t2m'
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
hind_2000_2019 = xr.open_zarr("../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
```
%% Output
/opt/conda/lib/python3.8/site-packages/xarray/backends/plugins.py:61: RuntimeWarning: Engine 'cfgrib' loading failed:
/opt/conda/lib/python3.8/site-packages/gribapi/_bindings.cpython-38-x86_64-linux-gnu.so: undefined symbol: codes_bufr_key_is_header
warnings.warn(f"Engine {name!r} loading failed:\n{ex}", RuntimeWarning)
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
fct_2020 = xr.open_zarr("../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:markdown id: tags:
## Observations
corresponding to hindcasts
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)#[v]
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr("../data/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)#[v]
```
%% Cell type:markdown id: tags:
# ML model
%% Cell type:markdown id: tags:
based on [Weatherbench](https://github.com/pangeo-data/WeatherBench/blob/master/quickstart.ipynb)
%% Cell type:code id: tags:
``` python
# run once only and don't commit
!git clone https://github.com/pangeo-data/WeatherBench/
```
%% Output
fatal: destination path 'WeatherBench' already exists and is not an empty directory.
%% Cell type:code id: tags:
``` python
import sys
sys.path.insert(1, 'WeatherBench')
from WeatherBench.src.train_nn import DataGenerator, PeriodicConv2D, create_predictions
import tensorflow.keras as keras
```
%% Cell type:code id: tags:
``` python
bs=32
import numpy as np
class DataGenerator(keras.utils.Sequence):
    def __init__(self, fct, verif, lead_time, batch_size=bs, shuffle=True, load=True,
                 mean=None, std=None):
        """
        Data generator for WeatherBench data.
        Template from https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
        Args:
            fct: forecasts from S2S models: xr.DataArray (xr.Dataset doesn't work properly)
            verif: observations with same dimensionality (xr.Dataset doesn't work properly)
            lead_time: lead_time as in model
            batch_size: batch size
            shuffle: bool. If True, data is shuffled.
            load: bool. If True, dataset is loaded into RAM.
            mean: If None, compute mean from data.
            std: If None, compute standard deviation from data.
        Todo:
            - use `number` in a better way, for now only the ensemble mean forecast is used
            - don't use .sel(lead_time=lead_time), to train over all lead_time at once
            - be sensitive with forecast_time, pool a few around the weekofyear given
            - use more variables as predictors
            - predict more variables
        """
        if isinstance(fct, xr.Dataset):
            print('convert fct to array')
            fct = fct.to_array().transpose(..., 'variable')
            self.fct_dataset = True
        else:
            self.fct_dataset = False
        if isinstance(verif, xr.Dataset):
            print('convert verif to array')
            verif = verif.to_array().transpose(..., 'variable')
            self.verif_dataset = True
        else:
            self.verif_dataset = False

        self.batch_size = batch_size
        self.shuffle = shuffle
        self.lead_time = lead_time

        self.fct_data = fct.transpose('forecast_time', ...).sel(lead_time=lead_time)
        self.fct_mean = self.fct_data.mean('forecast_time').compute() if mean is None else mean
        self.fct_std = self.fct_data.std('forecast_time').compute() if std is None else std

        self.verif_data = verif.transpose('forecast_time', ...).sel(lead_time=lead_time)
        self.verif_mean = self.verif_data.mean('forecast_time').compute() if mean is None else mean
        self.verif_std = self.verif_data.std('forecast_time').compute() if std is None else std

        # Normalize
        self.fct_data = (self.fct_data - self.fct_mean) / self.fct_std
        self.verif_data = (self.verif_data - self.verif_mean) / self.verif_std

        self.n_samples = self.fct_data.forecast_time.size
        self.forecast_time = self.fct_data.forecast_time

        self.on_epoch_end()
        # For some weird reason calling .load() earlier messes up the mean and std computations
        if load:
            # print('Loading data into RAM')
            self.fct_data.load()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.ceil(self.n_samples / self.batch_size))

    def __getitem__(self, i):
        'Generate one batch of data'
        idxs = self.idxs[i * self.batch_size:(i + 1) * self.batch_size]
        # got all nan if nans not masked
        X = self.fct_data.isel(forecast_time=idxs).fillna(0.).values
        y = self.verif_data.isel(forecast_time=idxs).fillna(0.).values
        return X, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.idxs = np.arange(self.n_samples)
        if self.shuffle:
            np.random.shuffle(self.idxs)
```
%% Cell type:code id: tags:
``` python
# first of the 2 bi-weekly `lead_time`s: week 3-4
lead = hind_2000_2019.isel(lead_time=0).lead_time
lead
```
%% Output
<xarray.DataArray 'lead_time' ()>
array(1209600000000000, dtype='timedelta64[ns]')
Coordinates:
lead_time timedelta64[ns] 14 days
Attributes:
comment: lead_time describes bi-weekly aggregates. The pd.Timedelta corr...
aggregate: The pd.Timedelta corresponds to the first day of a biweek...
description: Forecast period is the time interval between the forecast...
long_name: lead time
standard_name: forecast_period
week34_t2m: mean[14 days, 27 days]
week34_tp: 28 days minus 14 days
week56_t2m: mean[28 days, 41 days]
week56_tp: 42 days minus 28 days
%% Cell type:code id: tags:
``` python
# mask, needed?
hind_2000_2019 = hind_2000_2019.where(obs_2000_2019.isel(forecast_time=0, lead_time=0,drop=True).notnull())
```
%% Cell type:markdown id: tags:
## data prep: train, valid, test
[Use the hindcast period to split train and valid.](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) Do not use the 2020 data for testing!
%% Cell type:code id: tags:
``` python
# time is the forecast_time
time_train_start,time_train_end='2000','2017' # train
time_valid_start,time_valid_end='2018','2019' # valid
time_test = '2020' # test
```
%% Cell type:code id: tags:
``` python
dg_train = DataGenerator(
    hind_2000_2019.mean('realization').sel(forecast_time=slice(time_train_start,time_train_end))[v],
    obs_2000_2019.sel(forecast_time=slice(time_train_start,time_train_end))[v],
    lead_time=lead, batch_size=bs, load=True)
```
%% Output
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
dg_valid = DataGenerator(
    hind_2000_2019.mean('realization').sel(forecast_time=slice(time_valid_start,time_valid_end))[v],
    obs_2000_2019.sel(forecast_time=slice(time_valid_start,time_valid_end))[v],
    lead_time=lead, batch_size=bs, shuffle=False, load=True)
```
%% Output
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
# do not use, delete?
dg_test = DataGenerator(
    fct_2020.mean('realization').sel(forecast_time=time_test)[v],
    obs_2020.sel(forecast_time=time_test)[v],
    lead_time=lead, batch_size=bs, load=True, mean=dg_train.fct_mean, std=dg_train.fct_std, shuffle=False)
```
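%% Cell type:markdown id: tags:
Note that `dg_test` reuses `dg_train.fct_mean` and `dg_train.fct_std`: test inputs must be normalized with statistics computed on the training period only, otherwise information leaks from the test period. A toy illustration with plain arrays:
%% Cell type:code id: tags:
``` python
import numpy as np

train = np.array([1.0, 2.0, 3.0, 4.0])
test = np.array([10.0, 11.0])

mu, sigma = train.mean(), train.std()   # statistics from training data only
test_norm = (test - mu) / sigma         # never recompute mean/std on the test period
```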
%% Cell type:code id: tags:
``` python
X, y = dg_valid[0]
X.shape, y.shape
```
%% Output
((32, 121, 240), (32, 121, 240))
%% Cell type:code id: tags:
``` python
# short look into training data: large biases
# any problem from normalizing?
i=4
xr.DataArray(np.vstack([X[i],y[i]])).plot(yincrease=False, robust=True)
```
%% Output
<matplotlib.collections.QuadMesh at 0x7f3a7e44b730>
%% Cell type:markdown id: tags:
## `fit`
%% Cell type:code id: tags:
``` python
cnn = keras.models.Sequential([
    PeriodicConv2D(filters=32, kernel_size=5, conv_kwargs={'activation':'relu'}, input_shape=(32, 64, 1)),
    PeriodicConv2D(filters=1, kernel_size=5)
])
```
%% Output
WARNING:tensorflow:AutoGraph could not transform <bound method PeriodicPadding2D.call of <WeatherBench.src.train_nn.PeriodicPadding2D object at 0x7f86042986a0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
%% Cell type:code id: tags:
``` python
cnn.summary()
```
%% Output
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
periodic_conv2d (PeriodicCon (None, 32, 64, 32) 832
_________________________________________________________________
periodic_conv2d_1 (PeriodicC (None, 32, 64, 1) 801
=================================================================
Total params: 1,633
Trainable params: 1,633
Non-trainable params: 0
_________________________________________________________________
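The parameter counts in the summary can be checked by hand: each Conv2D-style layer has `kernel_h * kernel_w * in_channels * filters` weights plus one bias per filter, and the periodic padding itself adds no parameters. A quick sanity check:

``` python
# Conv2D parameters = kernel_h * kernel_w * in_channels * filters + filters (biases)
conv1 = 5 * 5 * 1 * 32 + 32   # first PeriodicConv2D: 1 input channel, 32 filters
conv2 = 5 * 5 * 32 * 1 + 1    # second PeriodicConv2D: 32 input channels, 1 filter
print(conv1, conv2, conv1 + conv2)  # 832 801 1633, matching the summary
```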
%% Cell type:code id: tags:
``` python
cnn.compile(keras.optimizers.Adam(1e-4), 'mse')
```
%% Cell type:code id: tags:
``` python
import warnings
warnings.simplefilter("ignore")
```
%% Cell type:code id: tags:
``` python
cnn.fit(dg_train, epochs=2, validation_data=dg_valid)
```
%% Output
Epoch 1/2
30/30 [==============================] - 58s 2s/step - loss: 0.1472 - val_loss: 0.0742
Epoch 2/2
30/30 [==============================] - 45s 1s/step - loss: 0.0712 - val_loss: 0.0545
<tensorflow.python.keras.callbacks.History at 0x7f865c2103d0>
%% Cell type:markdown id: tags:
## `predict`
Create predictions and print `mean(variable, lead_time, longitude, weighted latitude)` RPSS for all years as calculated by `skill_by_year`.
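`skill_by_year` (imported from `scripts` below) reports RPSS, which compares the ranked probability score (RPS) of the forecast against climatology. As a reminder of what is being scored, here is a minimal numpy sketch of the categorical RPS for tercile forecasts; `rps`, `prob_fct` and `obs_onehot` are illustrative names, not the `scripts` API:

``` python
import numpy as np

def rps(prob_fct, obs_onehot):
    """Ranked probability score over the last (category) axis."""
    cdf_fct = np.cumsum(prob_fct, axis=-1)    # cumulative forecast probabilities
    cdf_obs = np.cumsum(obs_onehot, axis=-1)  # cumulative observed indicator
    return np.sum((cdf_fct - cdf_obs) ** 2, axis=-1)

perfect = rps(np.array([0., 1., 0.]), np.array([0., 1., 0.]))  # 0.0
clim = rps(np.array([1/3, 1/3, 1/3]), np.array([0., 1., 0.]))  # 2/9
# RPSS = 1 - RPS_forecast / RPS_climatology, so RPSS > 0 beats climatology
```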
%% Cell type:code id: tags:
``` python
from scripts import add_valid_time_from_forecast_reference_time_and_lead_time

def _create_predictions(model, dg, lead):
    """Create non-iterative predictions and undo the input normalization."""
    preds = model.predict(dg).squeeze()
    # un-normalize back to physical units
    preds = preds * dg.fct_std.values + dg.fct_mean.values
    if dg.verif_dataset:
        da = xr.DataArray(
            preds,
            dims=['forecast_time', 'latitude', 'longitude', 'variable'],
            coords={'forecast_time': dg.fct_data.forecast_time,
                    'latitude': dg.fct_data.latitude,
                    'longitude': dg.fct_data.longitude},
        ).to_dataset()  # doesn't work yet
    else:
        da = xr.DataArray(
            preds,
            dims=['forecast_time', 'latitude', 'longitude'],
            coords={'forecast_time': dg.fct_data.forecast_time,
                    'latitude': dg.fct_data.latitude,
                    'longitude': dg.fct_data.longitude},
        )
    da = da.assign_coords(lead_time=lead)
    # da = add_valid_time_from_forecast_reference_time_and_lead_time(da)
    return da
```
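`_create_predictions` undoes the normalization that the `DataGenerator` applied to its inputs. A minimal round-trip check of that logic, with made-up `mean`/`std` values standing in for `dg_train.fct_mean` / `dg_train.fct_std`:

``` python
import numpy as np

# hypothetical mean/std standing in for dg_train.fct_mean / dg_train.fct_std
mean, std = 280.0, 15.0
x = np.array([250.0, 280.0, 310.0])

x_norm = (x - mean) / std       # normalization applied by the DataGenerator
x_back = x_norm * std + mean    # inverse applied in _create_predictions
assert np.allclose(x_back, x)
```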
%% Cell type:code id: tags:
``` python
# optionally masking the ocean when making probabilistic
mask = obs_2020.std(['lead_time','forecast_time']).notnull()
```
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
cache_path='../data'
tercile_file = f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
tercile_edges = xr.open_dataset(tercile_file)
```
%% Cell type:code id: tags:
``` python
# note: the CNN was trained on a single lead_time; reusing it for every lead
# is crude but yields outputs with the expected dimensions.
# ideally, train one model per lead_time.
def create_predictions(cnn, fct, obs, time):
    preds_test = []
    for lead in fct.lead_time:
        dg = DataGenerator(fct.mean('realization').sel(forecast_time=time)[v],
                           obs.sel(forecast_time=time)[v],
                           lead_time=lead, batch_size=bs,
                           mean=dg_train.fct_mean, std=dg_train.fct_std,
                           shuffle=False)
        preds_test.append(_create_predictions(cnn, dg, lead))
    preds_test = xr.concat(preds_test, 'lead_time')
    preds_test['lead_time'] = fct.lead_time
    # add valid_time coord
    preds_test = add_valid_time_from_forecast_reference_time_and_lead_time(preds_test)
    preds_test = preds_test.to_dataset(name=v)
    # placeholder: duplicate t2m as tp so the submission contains both required variables
    preds_test['tp'] = preds_test['t2m']
    # make probabilistic
    preds_test = make_probabilistic(preds_test.expand_dims('realization'), tercile_edges, mask=mask)
    return preds_test
```
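`make_probabilistic` (imported from `scripts` above) bins the deterministic field into tercile-category probabilities using the tercile edges. A rough standalone sketch of the idea; `tercile_probs` is a hypothetical helper, not the `scripts` implementation, which additionally handles ensembles, masking, etc.:

``` python
import numpy as np

def tercile_probs(value, edges):
    """One-hot tercile 'probabilities' for a deterministic forecast.

    value: forecast value(s); edges: (2,) lower/upper tercile edges.
    A single deterministic member yields probabilities of 0 or 1; with an
    ensemble you would average these one-hot vectors over members.
    """
    below = value <= edges[0]
    above = value > edges[1]
    normal = ~(below | above)
    return np.stack([below, normal, above], axis=-1).astype(float)

tercile_probs(0.2, np.array([0.33, 0.66]))  # falls in the lower tercile
```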
%% Cell type:markdown id: tags:
### `predict` training period in-sample
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.

%% Cell type:code id: tags:
``` python
from scripts import skill_by_year
```
%% Cell type:code id: tags:
``` python
import os
if os.environ['HOME'] == '/home/jovyan':
    # assume on renku with small memory: loop over year chunks
    import pandas as pd
    step = 2
    skill_list = []
    for year in np.arange(int(time_train_start), int(time_train_end) - 1, step):
        preds_is = create_predictions(cnn, hind_2000_2019, obs_2000_2019,
                                      time=slice(str(year), str(year + step - 1))).compute()
        skill_list.append(skill_by_year(preds_is))
    skill = pd.concat(skill_list)
else:  # with larger memory, simply do
    preds_is = create_predictions(cnn, hind_2000_2019, obs_2000_2019,
                                  time=slice(time_train_start, time_train_end))
    skill = skill_by_year(preds_is)
skill
```
%% Output
RPSS
year
2000 -0.862483
2001 -1.015485
2002 -1.101022
2003 -1.032647
2004 -1.056348
2005 -1.165675
2006 -1.057217
2007 -1.170849
2008 -1.049785
2009 -1.169108
2010 -1.130845
2011 -1.052670
2012 -1.126449
2013 -1.126930
2014 -1.095896
2015 -1.117486
%% Cell type:code id: tags:
``` python
# not on renkulab, simply do
# preds_is = create_predictions(cnn, hind_2000_2019, obs_2000_2019, time=slice(time_train_start, time_train_end))
# skill_by_year(preds_is)
```
%% Cell type:markdown id: tags:
### `predict` validation period out-of-sample
%% Cell type:code id: tags:
``` python
preds_os = create_predictions(cnn, hind_2000_2019, obs_2000_2019, time=slice(time_valid_start, time_valid_end))
skill_by_year(preds_os)
```
%% Output
RPSS
year
2018 -1.099744
2019 -1.172401
%% Cell type:markdown id: tags:
### `predict` test
%% Cell type:code id: tags:
``` python
preds_test = create_predictions(cnn, fct_2020, obs_2020, time=time_test)
skill_by_year(preds_test)
```
%% Output
RPSS
year
2020 -1.076834
%% Cell type:markdown id: tags:
# Submission
%% Cell type:code id: tags:
``` python
from scripts import assert_predictions_2020
assert_predictions_2020(preds_test)
```
%% Cell type:code id: tags:
``` python
preds_test.to_netcdf('../submissions/ML_prediction_2020.nc')
```
%% Cell type:code id: tags:
``` python
# !git add ../submissions/ML_prediction_2020.nc
# !git add ML_train_and_prediction.ipynb
```
%% Cell type:code id: tags:
``` python
# !git commit -m "template_test commit message" # whatever message you want
```
%% Cell type:code id: tags:
``` python
# !git tag "submission-template_test-0.0.1" # if this is to be checked by scorer, only the last submitted==tagged version will be considered
```
%% Cell type:code id: tags:
``` python
# !git push --tags
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Reproducibility
%% Cell type:markdown id: tags:
## memory
%% Cell type:code id: tags:
``` python
# https://phoenixnap.com/kb/linux-commands-check-memory-usage
!free -g
```
%% Output
total used free shared buff/cache available
Mem: 31 7 11 0 12 24
Swap: 0 0 0
%% Cell type:markdown id: tags:
## CPU
%% Cell type:code id: tags:
``` python
!lscpu
```
%% Output
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 40 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel Xeon Processor (Skylake, IBRS)
Stepping: 4
CPU MHz: 2095.078
BogoMIPS: 4190.15
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 256 KiB
L1i cache: 256 KiB
L2 cache: 32 MiB
L3 cache: 128 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cach
e flushes, SMT disabled
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no mic
rocode; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user
pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, IBPB condit
ional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtr
r pge mca cmov pat pse36 clflush mmx fxsr sse s
se2 syscall nx pdpe1gb rdtscp lm constant_tsc r
ep_good nopl xtopology cpuid tsc_known_freq pni
pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_
2 x2apic movbe popcnt tsc_deadline_timer aes xs
ave avx f16c rdrand hypervisor lahf_lm abm 3dno
wprefetch cpuid_fault invpcid_single pti ibrs i
bpb tpr_shadow vnmi flexpriority ept vpid ept_a
d fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx
512f avx512dq rdseed adx smap clwb avx512cd avx
512bw avx512vl xsaveopt xsavec xgetbv1 arat pku
ospke
%% Cell type:markdown id: tags:
## software
%% Cell type:code id: tags:
``` python
!conda list
```
%% Output
# packages in environment at /opt/conda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
_pytorch_select 0.1 cpu_0 defaults
_tflow_select 2.3.0 mkl defaults
absl-py 0.13.0 py38h06a4308_0 defaults
aiobotocore 1.4.1 pyhd3eb1b0_0 defaults
aiohttp 3.7.4.post0 py38h7f8727e_2 defaults
aioitertools 0.7.1 pyhd3eb1b0_0 defaults
alembic 1.4.3 pyh9f0ad1d_0 conda-forge
ansiwrap 0.8.4 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
argcomplete 1.12.3 pypi_0 pypi
argon2-cffi 20.1.0 py38h497a2fe_2 conda-forge
argparse 1.4.0 pypi_0 pypi
asciitree 0.3.3 py_2 defaults
astor 0.8.1 py38h06a4308_0 defaults
astunparse 1.6.3 py_0 defaults
async-timeout 3.0.1 pypi_0 pypi
async_generator 1.10 py_0 conda-forge
attrs 21.2.0 pypi_0 pypi
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.1 py_0 conda-forge
bagit 1.8.1 pypi_0 pypi
beautifulsoup4 4.10.0 pyh06a4308_0 defaults
binutils_impl_linux-64 2.35.1 h193b22a_1 conda-forge
binutils_linux-64 2.35 h67ddf6f_30 conda-forge
black 20.8b1 pypi_0 pypi
blas 1.0 mkl defaults
bleach 3.2.1 pyh9f0ad1d_0 conda-forge
blinker 1.4 py_1 conda-forge
bokeh 2.3.3 py38h06a4308_0 defaults
botocore 1.20.106 pyhd3eb1b0_0 defaults
bottleneck 1.3.2 py38heb32a55_1 defaults
bracex 2.1.1 pypi_0 pypi
branca 0.3.1 pypi_0 pypi
brotli 1.0.9 he6710b0_2 defaults
brotlipy 0.7.0 py38h497a2fe_1001 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.1 h36c2ea0_0 conda-forge
ca-certificates 2021.7.5 h06a4308_1 defaults
cachecontrol 0.12.6 pypi_0 pypi
cachetools 4.2.4 pypi_0 pypi
calamus 0.3.12 pypi_0 pypi
cdsapi 0.5.1 pypi_0 pypi
certifi 2021.5.30 pypi_0 pypi
certipy 0.1.3 py_0 conda-forge
cffi 1.14.6 pypi_0 pypi
cfgrib 0.9.9.0 pyhd8ed1ab_1 conda-forge
cftime 1.5.0 py38h6323ea4_0 defaults
chardet 3.0.4 pypi_0 pypi
click 7.1.2 pypi_0 pypi
click-completion 0.5.2 pypi_0 pypi
click-option-group 0.5.3 pypi_0 pypi
click-plugins 1.1.1 pypi_0 pypi
climetlab 0.8.31 pypi_0 pypi
climetlab-s2s-ai-challenge 0.8.0 pypi_0 pypi
cloudpickle 2.0.0 pyhd3eb1b0_0 defaults
colorama 0.4.4 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
commonmark 0.9.1 pypi_0 pypi
conda 4.9.2 py38h578d9bd_0 conda-forge
conda-package-handling 1.7.2 py38h8df0ef7_0 conda-forge
configargparse 1.5.2 pypi_0 pypi
configurable-http-proxy 1.3.0 0 conda-forge
coverage 5.5 py38h27cfd23_2 defaults
cryptography 3.4.8 pypi_0 pypi
curl 7.71.1 he644dc0_8 conda-forge
cwlgen 0.4.2 pypi_0 pypi
cwltool 3.1.20211004060744 pypi_0 pypi
cycler 0.10.0 py38_0 defaults
cython 0.29.24 py38h295c915_0 defaults
cytoolz 0.11.0 py38h7b6447c_0 defaults
dask 2021.8.1 pyhd3eb1b0_0 defaults
dask-core 2021.8.1 pyhd3eb1b0_0 defaults
dataclasses 0.8 pyh6d0b6a4_7 defaults
decorator 4.4.2 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
distributed 2021.8.1 py38h06a4308_0 defaults
distro 1.5.0 pypi_0 pypi
docopt 0.6.2 py38h06a4308_0 defaults
eccodes 2.21.0 ha0e6eb6_0 conda-forge
ecmwf-api-client 1.6.1 pypi_0 pypi
ecmwflibs 0.3.14 pypi_0 pypi
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
environ-config 21.2.0 pypi_0 pypi
fasteners 0.16.3 pyhd3eb1b0_0 defaults
filelock 3.0.12 pypi_0 pypi
findlibs 0.0.2 pypi_0 pypi
fonttools 4.25.0 pyhd3eb1b0_0 defaults
freetype 2.10.4 h5ab3b9f_0 defaults
frozendict 2.0.6 pypi_0 pypi
fsspec 2021.7.0 pyhd3eb1b0_0 defaults
gast 0.4.0 pyhd3eb1b0_0 defaults
gcc_impl_linux-64 9.3.0 h70c0ae5_18 conda-forge
gcc_linux-64 9.3.0 hf25ea35_30 conda-forge
gitdb 4.0.7 pypi_0 pypi
gitpython 3.1.14 pypi_0 pypi
google-auth 1.33.0 pyhd3eb1b0_0 defaults
google-auth-oauthlib 0.4.4 pyhd3eb1b0_0 defaults
google-pasta 0.2.0 pyhd3eb1b0_0 defaults
grpcio 1.36.1 py38h2157cd5_1 defaults
gxx_impl_linux-64 9.3.0 hd87eabc_18 conda-forge
gxx_linux-64 9.3.0 h3fbe746_30 conda-forge
h5netcdf 0.11.0 pyhd8ed1ab_0 conda-forge
h5py 2.10.0 py38hd6299e0_1 defaults
hdf4 4.2.13 h3ca952b_2 defaults
hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge
heapdict 1.0.1 pyhd3eb1b0_0 defaults
humanfriendly 10.0 pypi_0 pypi
humanize 3.7.1 pypi_0 pypi
icu 68.1 h58526e2_0 conda-forge
idna 2.10 pyh9f0ad1d_0 conda-forge
importlib-metadata 3.4.0 py38h578d9bd_0 conda-forge
importlib_metadata 3.4.0 hd8ed1ab_0 conda-forge
intake 0.6.3 pyhd3eb1b0_0 defaults
intake-xarray 0.5.0 pyhd3eb1b0_0 defaults
intel-openmp 2019.4 243 defaults
ipykernel 5.4.2 py38h81c977d_0 conda-forge
ipython 7.19.0 py38h81c977d_2 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
isodate 0.6.0 pypi_0 pypi
jasper 1.900.1 hd497a04_4 defaults
jedi 0.17.2 py38h578d9bd_1 conda-forge
jellyfish 0.8.8 pypi_0 pypi
jinja2 3.0.1 pypi_0 pypi
jmespath 0.10.0 pyhd3eb1b0_0 defaults
joblib 1.0.1 pyhd3eb1b0_0 defaults
jpeg 9d h7f8727e_0 defaults
json5 0.9.5 pyh9f0ad1d_0 conda-forge
jsonschema 3.2.0 py_2 conda-forge
jupyter-server-proxy 1.6.0 pypi_0 pypi
jupyter_client 6.1.11 pyhd8ed1ab_1 conda-forge
jupyter_core 4.7.0 py38h578d9bd_0 conda-forge
jupyter_telemetry 0.1.0 pyhd8ed1ab_1 conda-forge
jupyterhub 1.2.2 pypi_0 pypi
jupyterlab 2.2.9 py_0 conda-forge
jupyterlab-git 0.23.3 pypi_0 pypi
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_server 1.2.0 py_0 conda-forge
keras-preprocessing 1.1.2 pyhd3eb1b0_0 defaults
kernel-headers_linux-64 2.6.32 h77966d4_13 conda-forge
kiwisolver 1.3.1 py38h2531618_0 defaults
krb5 1.17.2 h926e7f8_0 conda-forge
lazy-object-proxy 1.6.0 pypi_0 pypi
lcms2 2.12 h3be6417_0 defaults
ld_impl_linux-64 2.35.1 hea4e1c9_1 conda-forge
libaec 1.0.4 he6710b0_1 defaults
libblas 3.9.0 1_h86c2bf4_netlib conda-forge
libcblas 3.9.0 5_h92ddd45_netlib conda-forge
libcurl 7.71.1 hcdd3856_8 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-devel_linux-64 9.3.0 h7864c58_18 conda-forge
libgcc-ng 9.3.0 h2828fa1_18 conda-forge
libgfortran-ng 9.3.0 ha5ec8a7_17 defaults
libgfortran5 9.3.0 ha5ec8a7_17 defaults
libgomp 9.3.0 h2828fa1_18 conda-forge
liblapack 3.9.0 5_h92ddd45_netlib conda-forge
libllvm10 10.0.1 hbcb73fb_5 defaults
libmklml 2019.0.5 0 defaults
libnetcdf 4.7.4 nompi_h56d31a8_107 conda-forge
libnghttp2 1.41.0 h8cfc5f6_2 conda-forge
libpng 1.6.37 hbc83047_0 defaults
libprotobuf 3.17.2 h4ff587b_1 defaults
libsodium 1.0.18 h36c2ea0_1 conda-forge
libssh2 1.9.0 hab1572f_5 conda-forge
libstdcxx-devel_linux-64 9.3.0 hb016644_18 conda-forge
libstdcxx-ng 9.3.0 h6de172a_18 conda-forge
libtiff 4.2.0 h85742a9_0 defaults
libuv 1.40.0 h7f98852_0 conda-forge
libwebp-base 1.2.0 h27cfd23_0 defaults
llvmlite 0.36.0 py38h612dafd_4 defaults
locket 0.2.1 py38h06a4308_1 defaults
lockfile 0.12.2 pypi_0 pypi
lxml 4.6.3 pypi_0 pypi
lz4-c 1.9.3 h295c915_1 defaults
magics 1.5.6 pypi_0 pypi
mako 1.1.4 pyh44b312d_0 conda-forge
markdown 3.3.4 py38h06a4308_0 defaults
markupsafe 2.0.1 pypi_0 pypi
marshmallow 3.13.0 pypi_0 pypi
matplotlib-base 3.4.2 py38hab158f2_0 defaults
mistune 0.8.4 py38h497a2fe_1003 conda-forge
mkl 2020.2 256 defaults
mkl-service 2.3.0 py38he904b0f_0 defaults
mkl_fft 1.3.0 py38h54f3939_0 defaults
mkl_random 1.1.1 py38h0573a6f_0 defaults
msgpack-python 1.0.2 py38hff7bd54_1 defaults
multidict 5.1.0 py38h27cfd23_2 defaults
munkres 1.1.4 py_0 defaults
mypy-extensions 0.4.3 pypi_0 pypi
nbclient 0.5.0 pypi_0 pypi
nbconvert 6.0.7 py38h578d9bd_3 conda-forge
nbdime 2.1.0 pypi_0 pypi
nbformat 5.1.2 pyhd8ed1ab_1 conda-forge
nbresuse 0.4.0 pypi_0 pypi
nc-time-axis 1.3.1 pyhd8ed1ab_2 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
ndg-httpsclient 0.5.1 pypi_0 pypi
nest-asyncio 1.4.3 pyhd8ed1ab_0 conda-forge
netcdf4 1.5.4 pypi_0 pypi
networkx 2.6.3 pypi_0 pypi
ninja 1.10.2 hff7bd54_1 defaults
nodejs 15.3.0 h25f6087_0 conda-forge
notebook 6.2.0 py38h578d9bd_0 conda-forge
numba 0.53.1 py38ha9443f7_0 defaults
numcodecs 0.8.0 py38h2531618_0 defaults
numexpr 2.7.3 py38hb2eb853_0 defaults
numpy 1.19.2 py38h54aff64_0 defaults
numpy-base 1.19.2 py38hfa32c7d_0 defaults
oauthlib 3.0.1 py_0 conda-forge
olefile 0.46 pyhd3eb1b0_0 defaults
openjpeg 2.4.0 h3ad879b_0 defaults
openssl 1.1.1l h7f8727e_0 defaults
opt_einsum 3.3.0 pyhd3eb1b0_1 defaults
owlrl 5.2.3 pypi_0 pypi
packaging 20.8 pyhd3deb0d_0 conda-forge
pamela 1.0.0 py_0 conda-forge
pandas 1.3.2 py38h8c16a72_0 defaults
pandoc 2.11.3.2 h7f98852_0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
papermill 2.3.1 pypi_0 pypi
parso 0.7.1 pyh9f0ad1d_0 conda-forge
partd 1.2.0 pyhd3eb1b0_0 defaults
pathspec 0.9.0 pypi_0 pypi
patool 1.12 pypi_0 pypi
pdbufr 0.9.0 pypi_0 pypi
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.3.1 py38h2c7a002_0 defaults
pip 21.0.1 pypi_0 pypi
pipx 0.16.1.0 pypi_0 pypi
pluggy 0.13.1 pypi_0 pypi
portalocker 2.3.2 pypi_0 pypi
powerline-shell 0.7.0 pypi_0 pypi
prometheus_client 0.9.0 pyhd3deb0d_0 conda-forge
prompt-toolkit 3.0.10 pyha770c72_0 conda-forge
properscoring 0.1 py_0 conda-forge
protobuf 3.17.2 py38h295c915_0 defaults
prov 1.5.1 pypi_0 pypi
psutil 5.8.0 py38h27cfd23_1 defaults
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pyasn1 0.4.8 pyhd3eb1b0_0 defaults
pyasn1-modules 0.2.8 py_0 defaults
pycosat 0.6.3 py38h497a2fe_1006 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pycurl 7.43.0.6 py38h996a351_1 conda-forge
pydap 3.2.2 pyh9f0ad1d_1001 conda-forge
pydot 1.4.2 pypi_0 pypi
pygments 2.10.0 pypi_0 pypi
pyjwt 2.1.0 pypi_0 pypi
pyld 2.0.3 pypi_0 pypi
pyodc 1.1.1 pypi_0 pypi
pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyrsistent 0.17.3 py38h497a2fe_2 conda-forge
pyshacl 0.17.0.post1 pypi_0 pypi
pysocks 1.7.1 py38h578d9bd_3 conda-forge
python 3.8.6 hffdb5ce_4_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python-eccodes 2021.03.0 py38hb5d20a5_1 conda-forge
python-editor 1.0.4 pypi_0 pypi
python-flatbuffers 1.12 pyhd3eb1b0_0 defaults
python-json-logger 2.0.1 pyh9f0ad1d_0 conda-forge
python-snappy 0.6.0 py38h2531618_3 defaults
python_abi 3.8 1_cp38 conda-forge
pytorch 1.8.1 cpu_py38h60491be_0 defaults
pytz 2021.1 pyhd3eb1b0_0 defaults
pyyaml 5.4.1 pypi_0 pypi
pyzmq 21.0.1 py38h3d7ac18_0 conda-forge
rdflib 6.0.1 pypi_0 pypi
rdflib-jsonld 0.5.0 pypi_0 pypi
readline 8.0 he28a2e2_2 conda-forge
regex 2021.4.4 pypi_0 pypi
renku 0.16.2 pypi_0 pypi
requests 2.24.0 pypi_0 pypi
requests-oauthlib 1.3.0 py_0 defaults
rich 10.3.0 pypi_0 pypi
rsa 4.7.2 pyhd3eb1b0_1 defaults
ruamel-yaml 0.16.5 pypi_0 pypi
ruamel.yaml.clib 0.2.2 py38h497a2fe_2 conda-forge
ruamel_yaml 0.15.80 py38h497a2fe_1003 conda-forge
s3fs 2021.7.0 pyhd3eb1b0_0 defaults
schema-salad 8.2.20210918131710 pypi_0 pypi
scikit-learn 0.24.2 py38ha9443f7_0 defaults
scipy 1.7.0 py38h7b17777_1 conda-forge
send2trash 1.5.0 py_0 conda-forge
setuptools 58.2.0 pypi_0 pypi
setuptools-scm 6.0.1 pypi_0 pypi
shellescape 3.8.1 pypi_0 pypi
shellingham 1.4.0 pypi_0 pypi
simpervisor 0.4 pypi_0 pypi
six 1.16.0 pypi_0 pypi
smmap 4.0.0 pypi_0 pypi
snappy 1.1.8 he6710b0_0 defaults
sortedcontainers 2.4.0 pyhd3eb1b0_0 defaults
soupsieve 2.2.1 pyhd3eb1b0_0 defaults
sqlalchemy 1.3.22 py38h497a2fe_1 conda-forge
sqlite 3.34.0 h74cdb3f_0 conda-forge
sysroot_linux-64 2.12 h77966d4_13 conda-forge
tabulate 0.8.9 pypi_0 pypi
tbb 2020.3 hfd86e86_0 defaults
tblib 1.7.0 pyhd3eb1b0_0 defaults
tenacity 7.0.0 pypi_0 pypi
tensorboard 2.4.0 pyhc547734_0 defaults
tensorboard-plugin-wit 1.6.0 py_0 defaults
tensorflow 2.4.1 mkl_py38hb2083e0_0 defaults
tensorflow-base 2.4.1 mkl_py38h43e0292_0 defaults
tensorflow-estimator 2.6.0 pyh7b7c402_0 defaults
termcolor 1.1.0 py38h06a4308_1 defaults
terminado 0.9.2 py38h578d9bd_0 conda-forge
testpath 0.4.4 py_0 conda-forge
textwrap3 0.9.2 pypi_0 pypi
threadpoolctl 2.2.0 pyh0d69192_0 defaults
tini 0.18.0 h14c3975_1001 conda-forge
tk 8.6.10 h21135ba_1 conda-forge
toml 0.10.2 pypi_0 pypi
toolz 0.11.1 pyhd3eb1b0_0 defaults
tornado 6.1 py38h497a2fe_1 conda-forge
tqdm 4.60.0 pypi_0 pypi
traitlets 5.0.5 py_0 conda-forge
typed-ast 1.4.2 pypi_0 pypi
typing-extensions 3.7.4.3 pypi_0 pypi
typing_extensions 3.10.0.2 pyh06a4308_0 defaults
urllib3 1.25.11 pypi_0 pypi
userpath 1.4.2 pypi_0 pypi
wcmatch 8.2 pypi_0 pypi
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
webob 1.8.7 pyhd3eb1b0_0 defaults
werkzeug 2.0.1 pyhd3eb1b0_0 defaults
wheel 0.36.2 pyhd3deb0d_0 conda-forge
wrapt 1.12.1 py38h7b6447c_1 defaults
xarray 0.19.0 pyhd3eb1b0_1 defaults
xhistogram 0.3.0 pyhd8ed1ab_0 conda-forge
xskillscore 0.0.23 pyhd8ed1ab_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yagup 0.1.1 pypi_0 pypi
yaml 0.2.5 h516909a_0 conda-forge
yarl 1.6.3 py38h27cfd23_0 defaults
zarr 2.8.1 pyhd3eb1b0_0 defaults
zeromq 4.3.3 h58526e2_3 conda-forge
zict 2.0.0 pyhd3eb1b0_0 defaults
zipp 3.4.0 py_0 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
zstd 1.4.9 haebb681_0 defaults
%% Cell type:code id: tags:
``` python
```
plugins:
source:
- module: intake_xarray
sources:
training-input:
description: climetlab name in AI/ML community naming for hindcasts used as input to the ML model in the training period
driver: netcdf
parameters:
model:
description: name of the S2S model
type: str
default: ecmwf
allowed: [ecmwf, eccc, ncep]
param:
description: variable name
type: str
default: tp
allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
date:
description: initialization dates, weekly on Thursdays
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
version:
description: versioning of the data
type: str
default: 0.3.0
format:
description: data type
type: str
default: netcdf
allowed: [netcdf, grib]
ending:
description: data format compatible with format; netcdf -> nc, grib -> grib
type: str
default: nc
allowed: [nc, grib]
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/{{version}}/{{format}}/{{model}}-hindcast-{{param}}-{{date.strftime("%Y%m%d")}}.{{ending}}
test-input:
description: climetlab name in AI/ML community naming for 2020 forecasts used as input to the ML model in the test period 2020
driver: netcdf
parameters:
model:
description: name of the S2S model
type: str
default: ecmwf
allowed: [ecmwf, eccc, ncep]
param:
description: variable name
type: str
default: tp
allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
date:
description: initialization dates, weekly on Thursdays
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
version:
description: versioning of the data
type: str
default: 0.3.0
format:
description: data type
type: str
default: netcdf
allowed: [netcdf, grib]
ending:
description: data format compatible with format; netcdf -> nc, grib -> grib
type: str
default: nc
allowed: [nc, grib]
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/{{version}}/{{format}}/{{model}}-forecast-{{param}}-{{date.strftime("%Y%m%d")}}.{{ending}}
training-output-reference:
description: climetlab name in AI/ML community naming for 2020 forecasts as output reference to compare ML model output to in the training period
driver: netcdf
parameters:
param:
description: variable name
type: str
default: tp
allowed: [t2m, tp]
date:
description: initialization dates, weekly on Thursdays
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-output-reference/{{param}}-{{date.strftime("%Y%m%d")}}.nc
test-output-reference:
description: climetlab name in AI/ML community naming for 2020 forecasts as output reference to compare ML model output to in the test period 2020
driver: netcdf
parameters:
param:
description: variable name
type: str
default: tp
allowed: [t2m, tp]
date:
description: initialization dates, weekly on Thursdays
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-output-reference/{{param}}-{{date.strftime("%Y%m%d")}}.nc
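The `urlpath` entries above are Jinja-style templates that intake fills in from the listed parameters. A pure-python re-creation of the expansion for the `training-input` source, using `str.format` as a stand-in for intake's actual templating:

``` python
from datetime import datetime

# str.format stand-in for the {{...}} placeholders in the catalog's urlpath
template = ("https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/"
            "data/training-input/{version}/{format}/"
            "{model}-hindcast-{param}-{date:%Y%m%d}.{ending}")

url = template.format(version="0.3.0", format="netcdf", model="ecmwf",
                      param="t2m", date=datetime(2020, 1, 2), ending="nc")
print(url)  # ends with .../ecmwf-hindcast-t2m-20200102.nc
```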
# Data Access
- European Weather Cloud:
- [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge)
- `wget`: wget_curl.ipynb
- `curl`: wget_curl.ipynb
- `mouse`: wget_curl.ipynb
- `intake`: intake.ipynb
- [IRI Data Library](https://iridl.ldeo.columbia.edu/): IRIDL.ipynb
- S2S: http://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/ (restricted access explained in IRIDL.ipynb)
- SubX: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/
- NMME: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/
- s2sprediction.net
plugins:
source:
- module: intake_xarray
sources:
training-input:
description: S2S hindcasts from IRIDL regridded to 1.5 deg grid and aggregated by mean over lead, https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/overview.html
driver: opendap
parameters:
center:
description: name of the center issuing the hindcast
type: str
default: ECMF
allowed: [BOM, CNRM, ECCC, ECMF, HMCR, ISAC, JMA, KMA, NCEP, UKMO]
grid:
description: regrid to this global resolution
type: float
default: 1.5
lead_name:
description: name of the lead_time dimension
type: str
default: LA
allowed: [LA, L]
lead_start:
description: aggregation start lead passed to RANGEEDGES
type: int
default: 14
lead_end:
description: aggregation end lead passed to RANGEEDGES
type: int
default: 27
experiment_type:
description: type of experiment
type: str
default: perturbed
allowed: [control, perturbed, RMMS]
group:
description: group of variables
type: str
default: 2m_above_ground
#allowed: [2m_above_ground, ...] see https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ECMF/.reforecast/.perturbed/
param:
description: variable name
type: str
default: 2t
#allowed: [2t] see https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ECMF/.reforecast/.perturbed/
xarray_kwargs:
engine: netcdf4
args:
urlpath: http://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.{{center}}/.reforecast/.{{experiment_type}}/.{{group}}/{{param}}/{{lead_name}}/({{lead_start}})/({{lead_end}})/RANGEEDGES/[{{lead_name}}]average/X/0/{{grid}}/358.5/GRID/Y/90/{{grid}}/-90/GRID/dods
test-input:
description: S2S forecasts from IRIDL regridded to 1.5 deg grid and aggregated by mean over lead, https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/overview.html
driver: opendap
parameters:
center:
description: name of the center issuing the hindcast
type: str
default: ECMF
allowed: [BOM, CNRM, ECCC, ECMF, HMCR, ISAC, JMA, KMA, NCEP, UKMO]
grid:
description: regrid to this global resolution
type: float
default: 1.5
lead_name:
description: name of the lead_time dimension
type: str
default: LA
allowed: [LA, L, L1]
lead_start:
description: aggregation start lead passed to RANGEEDGES
type: int
default: 14
lead_end:
description: aggregation end lead passed to RANGEEDGES
type: int
default: 27
experiment_type:
description: type of experiment
type: str
default: perturbed
allowed: [control, perturbed, RMMS]
group:
description: group of variables
type: str
default: 2m_above_ground
#allowed: see https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ECMF/.reforecast/.perturbed/
param:
description: variable name
type: str
default: 2t
#allowed: [2t] see https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ECMF/.reforecast/.perturbed/
xarray_kwargs:
engine: netcdf4
args:
urlpath: http://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.{{center}}/.forecast/.{{experiment_type}}/.{{group}}/{{param}}/S/(0000%201%20Jan%202020)/(0000%2031%20Dec%202020)/RANGEEDGES/{{lead_name}}/({{lead_start}})/({{lead_end}})/RANGEEDGES/[{{lead_name}}]average/X/0/{{grid}}/358.5/GRID/Y/90/{{grid}}/-90/GRID/dods
plugins:
source:
- module: intake_xarray
sources:
training-input:
description: SubX hindcasts from IRIDL regridded to 1.5 deg grid and aggregated by mean over lead, http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/outline.html
driver: opendap
parameters:
center:
description: name of the center issuing the hindcast
type: str
default: EMC
allowed: [CESM, ECCC, EMC, ESRL, GMAO, NCEP, NRL, RSMAS]
model:
description: name of the model
type: str
default: GEFS
allowed: [30LCESM1, 46LCESM1, GEM, GEPS6, GEPS5, GEFS, GEFSv12, FIMr1p1, GEOS_V2p1, CFSv2, NESM, CCSM4]
grid:
description: regrid to this global resolution
type: float
default: 1.5
lead_start:
description: aggregation start lead passed to RANGEEDGES
type: int
default: 14
lead_end:
description: aggregation end lead passed to RANGEEDGES
type: int
default: 27
param:
description: variable name
type: str
default: pr
#allowed: [pr]
xarray_kwargs:
engine: netcdf4
args:
urlpath: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.{{center}}/.{{model}}/.hindcast/.{{param}}/L/({{lead_start}})/({{lead_end}})/RANGEEDGES/[L]average/X/0/{{grid}}/358.5/GRID/Y/90/{{grid}}/-90/GRID/dods
test-input:
description: SubX forecasts from IRIDL regridded to 1.5 deg grid and aggregated by mean over lead, http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/outline.html
driver: opendap
parameters:
center:
description: name of the center issuing the forecast
type: str
default: EMC
allowed: [CESM, ECCC, EMC, ESRL, GMAO, NCEP, NRL, RSMAS]
model:
description: name of the model
type: str
default: GEFS
allowed: [30LCESM1, 46LCESM1, GEM, GEPS6, GEPS5, GEFS, GEFSv12, FIMr1p1, GEOS_V2p1, CFSv2, NESM, CCSM4]
grid:
description: regrid to this global resolution
type: float
default: 1.5
lead_start:
description: aggregation start lead passed to RANGEEDGES
type: int
default: 14
lead_end:
description: aggregation end lead passed to RANGEEDGES
type: int
default: 27
param:
description: variable name, see http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/outline.html
type: str
default: pr
#allowed: [pr]
xarray_kwargs:
engine: netcdf4
args:
urlpath: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.{{center}}/.{{model}}/.forecast/.{{param}}/S/(0000%201%20Jan%202020)/(0000%2031%20Dec%202020)/RANGEEDGES/L/({{lead_start}})/({{lead_end}})/RANGEEDGES/[L]average/X/0/{{grid}}/358.5/GRID/Y/90/{{grid}}/-90/GRID/dods
%% Cell type:markdown id: tags:
# Data Access from EWC via `intake`
The data is easily available via `climetlab`: https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge
The data holdings are listed here: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/0.3.0/netcdf/index.html
Since the files live on S3, they are also accessible with `intake-xarray` and cachable with `fsspec`.
%% Cell type:code id: tags:
``` python
import intake
import fsspec
import xarray as xr
import os, glob
import pandas as pd
xr.set_options(display_style='text')
```
%% Output
/opt/conda/lib/python3.8/site-packages/xarray/backends/cfgrib_.py:27: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message
warnings.warn(
<xarray.core.options.set_options at 0x7fa0100dcdc0>
%% Cell type:code id: tags:
``` python
# prevent aiohttp timeout errors
from aiohttp import ClientSession, ClientTimeout
timeout = ClientTimeout(total=600)
fsspec.config.conf['https'] = dict(client_kwargs={'timeout': timeout})
```
%% Cell type:markdown id: tags:
# intake
https://github.com/intake/intake-xarray can read and cache `grib` and `netcdf` from catalogs.
Caching via `fsspec`: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
%% Cell type:code id: tags:
``` python
import intake_xarray
cache_path = '/work/s2s-ai-challenge-template/data/cache'
fsspec.config.conf['simplecache'] = {'cache_storage': cache_path, 'same_names':True}
```
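%% Cell type:markdown id: tags:
To see what `simplecache` does without touching the network, here is a minimal sketch that caches a local `file://` URL; the temporary paths below are made up for the example, and a remote `https://` URL behaves the same way once prefixed with `simplecache::`.
%% Cell type:code id: tags:
``` python
import os
import tempfile

import fsspec

# stand-in "remote" file, local so the sketch runs offline
src_dir = tempfile.mkdtemp()
demo_cache = tempfile.mkdtemp()
src = os.path.join(src_dir, 'demo.nc.txt')
with open(src, 'w') as f:
    f.write('hello s2s')

# simplecache:: copies the file on first access and reuses the local copy afterwards
with fsspec.open(f'simplecache::file://{src}', 'r',
                 simplecache={'cache_storage': demo_cache, 'same_names': True}) as f:
    content = f.read()

print(content)                 # the file content, read through the cache
print(os.listdir(demo_cache))  # the cached copy, kept under its original name
```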
%% Cell type:code id: tags:
``` python
%%writefile EWC_catalog.yml
plugins:
source:
- module: intake_xarray
sources:
training-input:
description: climetlab name in AI/ML community naming for hindcasts as input to the ML model in the training period
driver: netcdf
parameters:
model:
description: name of the S2S model
type: str
default: ecmwf
allowed: [ecmwf, eccc, ncep]
param:
description: variable name
type: str
default: tp
allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
date:
description: weekly initialization dates (Thursdays)
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
version:
description: versioning of the data
type: str
default: 0.3.0
format:
description: file format
type: str
default: netcdf
allowed: [netcdf, grib]
ending:
description: file ending matching format; netcdf -> nc, grib -> grib
type: str
default: nc
allowed: [nc, grib]
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/{{version}}/{{format}}/{{model}}-hindcast-{{param}}-{{date.strftime("%Y%m%d")}}.{{ending}}
test-input:
description: climetlab name in AI/ML community naming for 2020 forecasts as input to the ML model in the test period 2020
driver: netcdf
parameters:
model:
description: name of the S2S model
type: str
default: ecmwf
allowed: [ecmwf, eccc, ncep]
param:
description: variable name
type: str
default: tp
allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
date:
description: weekly initialization dates (Thursdays)
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
version:
description: versioning of the data
type: str
default: 0.3.0
format:
description: file format
type: str
default: netcdf
allowed: [netcdf, grib]
ending:
description: file ending matching format; netcdf -> nc, grib -> grib
type: str
default: nc
allowed: [nc, grib]
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/{{version}}/{{format}}/{{model}}-forecast-{{param}}-{{date.strftime("%Y%m%d")}}.{{ending}}
training-output-reference:
description: climetlab name in AI/ML community naming for observations as output reference to compare the ML model output to in the training period
driver: netcdf
parameters:
param:
description: variable name
type: str
default: tp
allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
date:
description: weekly initialization dates (Thursdays)
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-output-reference/{{param}}-{{date.strftime("%Y%m%d")}}.nc
test-output-reference:
description: climetlab name in AI/ML community naming for observations as output reference to compare the ML model output to in the test period 2020
driver: netcdf
parameters:
param:
description: variable name
type: str
default: tp
allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
date:
description: weekly initialization dates (Thursdays)
type: datetime
default: 2020.01.02
min: 2020.01.02
max: 2020.12.31
xarray_kwargs:
engine: h5netcdf
args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-output-reference/{{param}}-{{date.strftime("%Y%m%d")}}.nc
```
%% Output
Writing EWC_catalog.yml
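%% Cell type:markdown id: tags:
The `urlpath` entries in the catalog are Jinja-style templates that `intake` fills with the parameter values (the defaults unless overridden). As a sketch, the `training-input` path expands like this plain-Python substitution; this mimics the resulting URL only, not intake's actual Jinja2 machinery:
%% Cell type:code id: tags:
``` python
from datetime import datetime

# sketch of the catalog's training-input urlpath template (values are the catalog defaults)
template = ('https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/'
            'training-input/{version}/{format}/{model}-hindcast-{param}-{date}.{ending}')

url = template.format(version='0.3.0', format='netcdf', model='ecmwf', param='tp',
                      date=datetime(2020, 1, 2).strftime('%Y%m%d'), ending='nc')
print(url)
```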
%% Cell type:code id: tags:
``` python
cat = intake.open_catalog('EWC_catalog.yml')
```
%% Cell type:code id: tags:
``` python
# dates for 2020 forecasts and their on-the-fly reforecasts
dates=pd.date_range(start='2020-01-02',freq='7D',end='2020-12-31')
dates
```
%% Output
DatetimeIndex(['2020-01-02', '2020-01-09', '2020-01-16', '2020-01-23',
'2020-01-30', '2020-02-06', '2020-02-13', '2020-02-20',
'2020-02-27', '2020-03-05', '2020-03-12', '2020-03-19',
'2020-03-26', '2020-04-02', '2020-04-09', '2020-04-16',
'2020-04-23', '2020-04-30', '2020-05-07', '2020-05-14',
'2020-05-21', '2020-05-28', '2020-06-04', '2020-06-11',
'2020-06-18', '2020-06-25', '2020-07-02', '2020-07-09',
'2020-07-16', '2020-07-23', '2020-07-30', '2020-08-06',
'2020-08-13', '2020-08-20', '2020-08-27', '2020-09-03',
'2020-09-10', '2020-09-17', '2020-09-24', '2020-10-01',
'2020-10-08', '2020-10-15', '2020-10-22', '2020-10-29',
'2020-11-05', '2020-11-12', '2020-11-19', '2020-11-26',
'2020-12-03', '2020-12-10', '2020-12-17', '2020-12-24',
'2020-12-31'],
dtype='datetime64[ns]', freq='7D')
%% Cell type:markdown id: tags:
# `hindcast-input`
on-the-fly hindcasts corresponding to the 2020 forecasts
%% Cell type:code id: tags:
``` python
cat['training-input'](date=dates[10], param='tp', model='eccc').to_dask()
```
%% Output
/opt/conda/lib/python3.8/site-packages/xarray/backends/plugins.py:61: RuntimeWarning: Engine 'cfgrib' loading failed:
/opt/conda/lib/python3.8/site-packages/gribapi/_bindings.cpython-38-x86_64-linux-gnu.so: undefined symbol: codes_bufr_key_is_header
warnings.warn(f"Engine {name!r} loading failed:\n{ex}", RuntimeWarning)
<xarray.Dataset>
Dimensions: (forecast_time: 20, latitude: 121, lead_time: 32, longitude: 240, realization: 4)
Coordinates:
* realization (realization) int64 0 1 2 3
* forecast_time (forecast_time) datetime64[ns] 1998-03-12 ... 2017-03-12
* lead_time (lead_time) timedelta64[ns] 1 days 2 days ... 31 days 32 days
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (forecast_time, lead_time) datetime64[ns] ...
Data variables:
tp (realization, forecast_time, lead_time, latitude, longitude) float32 ...
Attributes:
GRIB_edition: [2]
GRIB_centre: cwao
GRIB_centreDescription: Canadian Meteorological Service - Montreal
GRIB_subCentre: [0]
Conventions: CF-1.7
institution: Canadian Meteorological Service - Montreal
history: 2021-05-11T10:03 GRIB to CDM+CF via cfgrib-0.9.9...
%% Cell type:markdown id: tags:
# `forecast-input`
real-time forecasts for 2020
%% Cell type:code id: tags:
``` python
cat['test-input'](date=dates[10], param='t2m', model='ecmwf').to_dask()
```
%% Output
<xarray.Dataset>
Dimensions: (forecast_time: 1, latitude: 121, lead_time: 46, longitude: 240, realization: 51)
Coordinates:
* realization (realization) int64 0 1 2 3 4 5 6 7 ... 44 45 46 47 48 49 50
* forecast_time (forecast_time) datetime64[ns] 2020-03-12
* lead_time (lead_time) timedelta64[ns] 1 days 2 days ... 45 days 46 days
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (forecast_time, lead_time) datetime64[ns] ...
Data variables:
t2m (realization, forecast_time, lead_time, latitude, longitude) float32 ...
Attributes:
GRIB_edition: [2]
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: [0]
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: 2021-05-10T16:14:36 GRIB to CDM+CF via cfgrib-0....
%% Cell type:markdown id: tags:
# `hindcast-like-observations`
observations matching hindcasts
%% Cell type:code id: tags:
``` python
cat['training-output-reference'](date=dates[10], param='t2m').to_dask()
```
%% Output
<xarray.Dataset>
Dimensions: (forecast_time: 1, latitude: 121, lead_time: 47, longitude: 240)
Coordinates:
valid_time (lead_time, forecast_time) datetime64[ns] ...
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* forecast_time (forecast_time) datetime64[ns] 2020-03-12
* lead_time (lead_time) timedelta64[ns] 0 days 1 days ... 45 days 46 days
Data variables:
t2m (lead_time, forecast_time, latitude, longitude) float32 ...
Attributes:
source_dataset_name: temperature daily from NOAA NCEP CPC: Climate Predi...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
created_by_software: climetlab-s2s-ai-challenge
created_by_script: tools/observations/makefile
%% Cell type:markdown id: tags:
# `forecast-like-observations`
observations matching 2020 forecasts
%% Cell type:code id: tags:
``` python
cat['test-output-reference'](date=dates[10], param='t2m').to_dask()
```
%% Output
<xarray.Dataset>
Dimensions: (forecast_time: 1, latitude: 121, lead_time: 47, longitude: 240)
Coordinates:
valid_time (lead_time, forecast_time) datetime64[ns] ...
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* forecast_time (forecast_time) datetime64[ns] 2020-03-12
* lead_time (lead_time) timedelta64[ns] 0 days 1 days ... 45 days 46 days
Data variables:
t2m (lead_time, forecast_time, latitude, longitude) float32 ...
Attributes:
source_dataset_name: temperature daily from NOAA NCEP CPC: Climate Predi...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
created_by_software: climetlab-s2s-ai-challenge
created_by_script: tools/observations/makefile
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Data Access via `curl` or `wget`
The data is easily available via `climetlab`: https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge
The data holdings are listed here:
- https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/0.3.0/netcdf/index.html
- https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/0.3.0/netcdf/index.html
- https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-output-reference/index.html
- https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-output-reference/index.html
Since the files live on S3, they are also accessible with `curl` or `wget`. Alternatively, open the HTML index pages above and download files by mouse click.
%% Cell type:code id: tags:
``` python
import xarray as xr
import os
from subprocess import call
xr.set_options(display_style='text')
```
%% Output
/opt/conda/lib/python3.8/site-packages/xarray/backends/cfgrib_.py:27: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message
warnings.warn(
<xarray.core.options.set_options at 0x7f5170570520>
%% Cell type:code id: tags:
``` python
# version of the EWC data
version = '0.3.0'
```
%% Cell type:markdown id: tags:
# `hindcast-input`
on-the-fly hindcasts corresponding to the 2020 forecasts
%% Cell type:code id: tags:
``` python
parameter = 't2m'
date = '20200102'
model = 'ecmwf'
```
%% Cell type:code id: tags:
``` python
url = f'https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/{version}/netcdf/{model}-hindcast-{parameter}-{date}.nc'
os.system(f'wget {url}')
assert os.path.exists(f'{model}-hindcast-{parameter}-{date}.nc')
```
%% Cell type:markdown id: tags:
# `forecast-input`
real-time forecasts for 2020
%% Cell type:code id: tags:
``` python
url = f'https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/{version}/netcdf/{model}-forecast-{parameter}-{date}.nc'
os.system(f'wget {url}')
assert os.path.exists(f'{model}-forecast-{parameter}-{date}.nc')
```
%% Cell type:markdown id: tags:
# `hindcast-like-observations`
CPC observations formatted like training period hindcasts
%% Cell type:code id: tags:
``` python
url = f'https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-output-reference/{parameter}-{date}.nc'
os.system(f'wget {url}')
assert os.path.exists(f'{parameter}-{date}.nc')
```
%% Cell type:markdown id: tags:
# `forecast-like-observations`
CPC observations formatted like test period 2020 forecasts
%% Cell type:code id: tags:
``` python
url = f'https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-output-reference/{parameter}-{date}.nc'
os.system(f'wget {url}')
assert os.path.exists(f'{parameter}-{date}.nc')
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Train ML model to correct predictions of week 3-4 & 5-6
This notebook creates a Machine Learning model (`ML_model`) to predict weeks 3-4 & 5-6 based on `S2S` week 3-4 & 5-6 forecasts; the predictions are compared to `CPC` observations for the [`s2s-ai-challenge`](https://s2s-ai-challenge.github.io/).
%% Cell type:markdown id: tags:
# Synopsis
%% Cell type:markdown id: tags:
## Method: `mean bias reduction`
- calculate the mean bias over 2000-2019 from the deterministic ensemble-mean hindcast
- remove that mean bias from the 2020 deterministic ensemble-mean forecast
- no Machine Learning used here
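%% Cell type:markdown id: tags:
The arithmetic of the method with made-up numbers (in the notebook the bias is additionally computed per grid point and grouped by week of year):
%% Cell type:code id: tags:
``` python
import numpy as np

# toy hindcast: 3 ensemble members x 4 forecast dates; observations are a constant 10
hind = np.full((3, 4), 12.0)  # the toy model is consistently 2 units too warm
obs = np.full(4, 10.0)

# 1) mean bias of the deterministic ensemble-mean hindcast over the training period
bias = (hind.mean(axis=0) - obs).mean()

# 2) remove that bias from a (toy) ensemble-mean forecast
fct = np.array([11.0, 13.0])
debiased = fct - bias

print(bias)      # 2.0
print(debiased)  # [ 9. 11.]
```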
%% Cell type:markdown id: tags:
## Data used
type: renku datasets
Training-input for Machine Learning model:
- hindcasts of models:
- ECMWF: `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
Forecast-input for Machine Learning model:
- real-time 2020 forecasts of models:
- ECMWF: `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
Compare the Machine Learning model forecast against ground truth:
- `CPC` observations:
- `hindcast-like-observations_biweekly_deterministic.zarr`
- `forecast-like-observations_2020_biweekly_deterministic.zarr`
%% Cell type:markdown id: tags:
## Resources used
for training, details in reproducibility
- platform: MPI-M supercomputer, 1 node
- memory: 64 GB
- processors: 36 CPU
- storage required: 10 GB
%% Cell type:markdown id: tags:
## Safeguards
All points have to be [x] checked. If not, your submission is invalid.
Changes to the code after submissions are not possible, as the `commit` before the `tag` will be reviewed.
(Only in exceptional cases, and where previous effort towards reproducibility is evident, may improvements to readability and reproducibility be allowed after November 1st 2021.)
%% Cell type:markdown id: tags:
### Safeguards to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting?wprov=sfti1)
If the organizers suspect overfitting, your contribution can be disqualified.
- [x] We didn't use 2020 observations in training (explicit overfitting and cheating)
- [x] We didn't repeatedly verify our model on 2020 observations and incrementally improve our RPSS (implicit overfitting)
- [x] We provide RPSS scores for the training period with the script `skill_by_year`, see the `predict` section.
- [x] We tried our best to prevent [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)?wprov=sfti1).
- [x] We honor the `train-validate-test` [split principle](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets). This means that the hindcast data is split into `train` and `validate`, whereas `test` is withheld.
- [x] We did not use `test` explicitly in training or implicitly in incrementally adjusting parameters.
- [x] We considered [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).
%% Cell type:markdown id: tags:
### Safeguards for Reproducibility
The notebook/code must be independently reproducible from scratch by the organizers (after the competition); if that is not possible, no prize will be awarded.
- [x] All training data is publicly available (no pre-trained private neural networks, as they are not reproducible for us)
- [x] Code is well documented, readable and reproducible.
- [x] Code to reproduce training and predictions is preferred to run within a day on the described architecture. If the training takes longer than a day, please justify why this is needed. Please do not submit training pipelines which take weeks to train.
%% Cell type:markdown id: tags:
# Imports
%% Cell type:code id: tags:
``` python
import xarray as xr
xr.set_options(display_style='text')
import numpy as np
from dask.utils import format_bytes
import xskillscore as xs
```
%% Output
<xarray.core.options.set_options at 0x7f05cc486340>
%% Cell type:markdown id: tags:
# Get training data
Preprocessing of the input data may be done in a separate notebook/script.
%% Cell type:markdown id: tags:
## Hindcast
get weekly initialized hindcasts
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
hind_2000_2019 = xr.open_zarr("../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
fct_2020 = xr.open_zarr("../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:markdown id: tags:
## Observations
corresponding to hindcasts
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr("../data/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:markdown id: tags:
# no ML model
%% Cell type:markdown id: tags:
Here, we just remove the mean bias from the ensemble mean forecast.
%% Cell type:code id: tags:
``` python
from scripts import add_year_week_coords
obs_2000_2019 = add_year_week_coords(obs_2000_2019)
hind_2000_2019 = add_year_week_coords(hind_2000_2019)
```
%% Cell type:code id: tags:
``` python
bias_2000_2019 = (hind_2000_2019.mean('realization') - obs_2000_2019).groupby('week').mean().compute()
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:381: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
  FutureWarning,
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
%% Cell type:markdown id: tags:
## `predict`
Create predictions and print `mean(variable, lead_time, longitude, weighted latitude)` RPS for all years as calculated by `skill_by_year` (for now RPS; todo: change to RPSS).
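%% Cell type:markdown id: tags:
For intuition, the ranked probability score of a single forecast is the sum over ordered categories of the squared differences between the cumulative forecast probabilities and the cumulative observed indicator. `skill_by_year` computes this over whole fields; the made-up scalar sketch below only illustrates the formula.
%% Cell type:code id: tags:
``` python
import numpy as np

def rps(prob_forecast, obs_category):
    """Ranked probability score of one forecast over ordered categories.

    prob_forecast: probability per category (sums to 1)
    obs_category: index of the category that verified
    """
    prob_forecast = np.asarray(prob_forecast, dtype=float)
    obs = np.zeros_like(prob_forecast)
    obs[obs_category] = 1.0
    return float(np.sum((np.cumsum(prob_forecast) - np.cumsum(obs)) ** 2))

# three tercile categories: below / near / above normal
print(rps([1 / 3, 1 / 3, 1 / 3], 0))  # climatological forecast when 'below' verifies
print(rps([1.0, 0.0, 0.0], 0))        # perfect forecast scores 0.0
```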
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
cache_path='../data'
tercile_file = f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
tercile_edges = xr.open_dataset(tercile_file)
```
%% Cell type:code id: tags:
``` python
# not a real ML model, but the results have the expected dimensions
# ideally, train for each lead_time separately
def create_predictions(fct, bias):
    if 'week' not in fct.coords:
        fct = add_year_week_coords(fct)
    # remove the weekly mean bias, then convert to tercile probabilities
    preds = fct - bias.sel(week=fct.week)
    preds = make_probabilistic(preds, tercile_edges)
    return preds.astype('float32')
```
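%% Cell type:markdown id: tags:
`make_probabilistic` (from `scripts`) turns the debiased ensemble into tercile-category probabilities using the tercile-edge file. Its internals are not shown here; conceptually it counts the fraction of ensemble members per category, as in this made-up sketch:
%% Cell type:code id: tags:
``` python
import numpy as np

def tercile_probs(members, edges):
    """Fraction of ensemble members below, between, and above two tercile edges."""
    members = np.asarray(members, dtype=float)
    below = np.mean(members < edges[0])
    above = np.mean(members > edges[1])
    return np.array([below, 1.0 - below - above, above])

# 4 made-up members with tercile edges at 0 and 1
print(tercile_probs([-0.5, 0.2, 0.4, 1.5], edges=(0.0, 1.0)))  # [0.25 0.5  0.25]
```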
%% Cell type:markdown id: tags:
### `predict` training period in-sample
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
preds_is = create_predictions(hind_2000_2019, bias_2000_2019).compute()
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:381: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
  FutureWarning,
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:381: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
  FutureWarning,
%% Cell type:code id: tags:
``` python
from scripts import skill_by_year
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_is)
```
%% Output
RPS
year
2000 0.463290
2001 0.501615
2002 0.498100
2003 0.499914
2004 0.533146
2005 0.486682
2006 0.492787
2007 0.555934
2008 0.507756
2009 0.515228
2010 0.498032
2011 0.548217
2012 0.556501
2013 0.519008
2014 0.521487
2015 0.507068
2016 0.520476
2017 0.590591
2018 0.604847
2019 0.546725
%% Cell type:markdown id: tags:
### `predict` test
%% Cell type:code id: tags:
``` python
preds_test = create_predictions(fct_2020, bias_2000_2019)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:381: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:381: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
%% Cell type:code id: tags:
``` python
skill_by_year(preds_test)
```
%% Output
RPS
year
2020 0.520714
%% Cell type:markdown id: tags:
# Submission
%% Cell type:code id: tags:
``` python
from scripts import assert_predictions_2020
assert_predictions_2020(preds_test)
```
%% Cell type:code id: tags:
``` python
del preds_test['weekofyear']
preds_test.attrs = {'author': 'Aaron Spring', 'author_email': 'aaron.spring@mpimet.mpg.de',
'comment': 'created for the s2s-ai-challenge as a template for the website',
'notebook': 'mean_bias_reduction.ipynb',
'website': 'https://s2s-ai-challenge.github.io/#evaluation'}
html_repr = xr.core.formatting_html.dataset_repr(preds_test)
with open('submission_template_repr.html', 'w') as myFile:
myFile.write(html_repr)
```
%% Cell type:code id: tags:
``` python
preds_test.to_netcdf('../submissions/ML_prediction_2020.nc')
```
%% Cell type:code id: tags:
``` python
# !git add ../submissions/ML_prediction_2020.nc
# !git add mean_bias_reduction.ipynb
```
%% Cell type:code id: tags:
``` python
#!git commit -m "template_test no ML mean bias reduction" # whatever message you want
```
%% Cell type:code id: tags:
``` python
#!git tag "submission-no_ML_mean_bias_reduction-0.0.2" # if this is to be checked by scorer, only the last submitted==tagged version will be considered
```
%% Cell type:code id: tags:
``` python
#!git push --tags
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Reproducibility
%% Cell type:markdown id: tags:
## memory
%% Cell type:code id: tags:
``` python
# https://phoenixnap.com/kb/linux-commands-check-memory-usage
!free -g
```
%% Output
total used free shared buffers cached
Mem: 62 21 41 0 0 5
-/+ buffers/cache: 15 47
Swap: 0 0 0
%% Cell type:markdown id: tags:
## CPU
%% Cell type:code id: tags:
``` python
!lscpu
```
%% Output
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
Stepping: 1
CPU MHz: 1200.000
BogoMIPS: 4190.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-17,36-53
NUMA node1 CPU(s): 18-35,54-71
%% Cell type:markdown id: tags:
## software
%% Cell type:code id: tags:
``` python
!conda list
```
%% Output
# packages in environment at /opt/conda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
_tflow_select 2.3.0 mkl defaults
absl-py 0.12.0 py38h06a4308_0 defaults
aiobotocore 1.2.2 pyhd3eb1b0_0 defaults
aiohttp 3.7.4.post0 pypi_0 pypi
aioitertools 0.7.1 pyhd3eb1b0_0 defaults
alembic 1.4.3 pyh9f0ad1d_0 conda-forge
ansiwrap 0.8.4 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
argcomplete 1.12.2 pypi_0 pypi
argon2-cffi 20.1.0 py38h497a2fe_2 conda-forge
argparse 1.4.0 pypi_0 pypi
asciitree 0.3.3 py_2 defaults
astunparse 1.6.3 py_0 defaults
async-timeout 3.0.1 pypi_0 pypi
async_generator 1.10 py_0 conda-forge
attrs 20.3.0 pyhd3deb0d_0 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.1 py_0 conda-forge
binutils_impl_linux-64 2.35.1 h193b22a_1 conda-forge
binutils_linux-64 2.35 h67ddf6f_30 conda-forge
black 20.8b1 pypi_0 pypi
blas 1.0 mkl defaults
bleach 3.2.1 pyh9f0ad1d_0 conda-forge
blinker 1.4 py_1 conda-forge
bokeh 2.3.2 py38h06a4308_0 defaults
botocore 1.20.78 pyhd3eb1b0_1 defaults
bottleneck 1.3.2 py38heb32a55_1 defaults
branca 0.3.1 pypi_0 pypi
brotlipy 0.7.0 py38h497a2fe_1001 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.1 h36c2ea0_0 conda-forge
ca-certificates 2021.4.13 h06a4308_1 defaults
cachetools 4.2.2 pyhd3eb1b0_0 defaults
cdsapi 0.5.1 pypi_0 pypi
certifi 2020.12.5 py38h06a4308_0 defaults
certipy 0.1.3 py_0 conda-forge
cffi 1.14.4 py38ha65f79e_1 conda-forge
cfgrib 0.9.9.0 pyhd8ed1ab_1 conda-forge
cftime 1.5.0 py38h6323ea4_0 defaults
chardet 4.0.0 py38h578d9bd_1 conda-forge
click 7.1.2 pypi_0 pypi
climetlab 0.7.0 pypi_0 pypi
climetlab-s2s-ai-challenge 0.6.2 pypi_0 pypi
cloudpickle 1.6.0 py_0 defaults
colorama 0.4.4 pypi_0 pypi
conda 4.9.2 py38h578d9bd_0 conda-forge
conda-package-handling 1.7.2 py38h8df0ef7_0 conda-forge
configargparse 1.4.1 pypi_0 pypi
configurable-http-proxy 1.3.0 0 conda-forge
coverage 5.5 py38h27cfd23_2 defaults
cryptography 3.3.1 py38h2b97feb_1 conda-forge
curl 7.71.1 he644dc0_8 conda-forge
cycler 0.10.0 py38_0 defaults
cython 0.29.23 py38h2531618_0 defaults
cytoolz 0.11.0 py38h7b6447c_0 defaults
dask 2021.4.0 pyhd3eb1b0_0 defaults
dask-core 2021.4.0 pyhd3eb1b0_0 defaults
decorator 4.4.2 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
distributed 2021.5.0 py38h06a4308_0 defaults
distro 1.5.0 pypi_0 pypi
eccodes 2.18.0 hf05d9b7_0 conda-forge
ecmwf-api-client 1.6.1 pypi_0 pypi
ecmwflibs 0.3.7 pypi_0 pypi
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
fasteners 0.16 pyhd3eb1b0_0 defaults
findlibs 0.0.2 pypi_0 pypi
folium 0.12.1 pypi_0 pypi
freetype 2.10.4 h5ab3b9f_0 defaults
fsspec 0.9.0 pyhd3eb1b0_0 defaults
gast 0.4.0 py_0 defaults
gcc_impl_linux-64 9.3.0 h70c0ae5_18 conda-forge
gcc_linux-64 9.3.0 hf25ea35_30 conda-forge
gitdb 4.0.7 pypi_0 pypi
gitpython 3.1.14 pypi_0 pypi
google-auth 1.30.1 pyhd3eb1b0_0 defaults
google-auth-oauthlib 0.4.4 pyhd3eb1b0_0 defaults
google-pasta 0.2.0 py_0 defaults
grpcio 1.36.1 py38h2157cd5_1 defaults
gxx_impl_linux-64 9.3.0 hd87eabc_18 conda-forge
gxx_linux-64 9.3.0 h3fbe746_30 conda-forge
h5py 2.10.0 py38hd6299e0_1 defaults
hdf4 4.2.13 h3ca952b_2 defaults
hdf5 1.10.6 nompi_h3c11f04_101 conda-forge
heapdict 1.0.1 py_0 defaults
icu 68.1 h58526e2_0 conda-forge
idna 2.10 pyh9f0ad1d_0 conda-forge
importlib-metadata 3.4.0 py38h578d9bd_0 conda-forge
importlib_metadata 3.4.0 hd8ed1ab_0 conda-forge
intel-openmp 2021.2.0 h06a4308_610 defaults
ipykernel 5.4.2 py38h81c977d_0 conda-forge
ipython 7.19.0 py38h81c977d_2 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jasper 1.900.1 hd497a04_4 defaults
jedi 0.17.2 py38h578d9bd_1 conda-forge
jinja2 2.11.2 pyh9f0ad1d_0 conda-forge
jmespath 0.10.0 py_0 defaults
joblib 1.0.1 pyhd3eb1b0_0 defaults
jpeg 9d h36c2ea0_0 conda-forge
json5 0.9.5 pyh9f0ad1d_0 conda-forge
jsonschema 3.2.0 py_2 conda-forge
jupyter-server-proxy 1.6.0 pypi_0 pypi
jupyter_client 6.1.11 pyhd8ed1ab_1 conda-forge
jupyter_core 4.7.0 py38h578d9bd_0 conda-forge
jupyter_telemetry 0.1.0 pyhd8ed1ab_1 conda-forge
jupyterhub 1.2.2 pypi_0 pypi
jupyterlab 2.2.9 py_0 conda-forge
jupyterlab-git 0.23.3 pypi_0 pypi
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_server 1.2.0 py_0 conda-forge
keras-preprocessing 1.1.2 pyhd3eb1b0_0 defaults
kernel-headers_linux-64 2.6.32 h77966d4_13 conda-forge
kiwisolver 1.3.1 py38h2531618_0 defaults
krb5 1.17.2 h926e7f8_0 conda-forge
lcms2 2.12 h3be6417_0 defaults
ld_impl_linux-64 2.35.1 hea4e1c9_1 conda-forge
libaec 1.0.4 he6710b0_1 defaults
libcurl 7.71.1 hcdd3856_8 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-devel_linux-64 9.3.0 h7864c58_18 conda-forge
libgcc-ng 9.3.0 h2828fa1_18 conda-forge
libgfortran-ng 7.3.0 hdf63c60_0 defaults
libgomp 9.3.0 h2828fa1_18 conda-forge
libllvm10 10.0.1 hbcb73fb_5 defaults
libnetcdf 4.7.4 nompi_h56d31a8_107 conda-forge
libnghttp2 1.41.0 h8cfc5f6_2 conda-forge
libpng 1.6.37 hbc83047_0 defaults
libprotobuf 3.14.0 h8c45485_0 defaults
libsodium 1.0.18 h36c2ea0_1 conda-forge
libssh2 1.9.0 hab1572f_5 conda-forge
libstdcxx-devel_linux-64 9.3.0 hb016644_18 conda-forge
libstdcxx-ng 9.3.0 h6de172a_18 conda-forge
libtiff 4.1.0 h2733197_1 defaults
libuv 1.40.0 h7f98852_0 conda-forge
llvmlite 0.36.0 py38h612dafd_4 defaults
locket 0.2.1 py38h06a4308_1 defaults
lz4-c 1.9.3 h2531618_0 defaults
magics 1.5.6 pypi_0 pypi
mako 1.1.4 pyh44b312d_0 conda-forge
markdown 3.3.4 py38h06a4308_0 defaults
markupsafe 1.1.1 py38h497a2fe_3 conda-forge
matplotlib-base 3.3.4 py38h62a2d02_0 defaults
mistune 0.8.4 py38h497a2fe_1003 conda-forge
mkl 2021.2.0 h06a4308_296 defaults
mkl-service 2.3.0 py38h27cfd23_1 defaults
mkl_fft 1.3.0 py38h42c9631_2 defaults
mkl_random 1.2.1 py38ha9443f7_2 defaults
monotonic 1.5 py_0 defaults
msgpack-python 1.0.2 py38hff7bd54_1 defaults
multidict 5.1.0 py38h27cfd23_2 defaults
mypy-extensions 0.4.3 pypi_0 pypi
nbclient 0.5.0 pypi_0 pypi
nbconvert 6.0.7 py38h578d9bd_3 conda-forge
nbdime 2.1.0 pypi_0 pypi
nbformat 5.1.2 pyhd8ed1ab_1 conda-forge
nbresuse 0.4.0 pypi_0 pypi
ncurses 6.2 h58526e2_4 conda-forge
nest-asyncio 1.4.3 pyhd8ed1ab_0 conda-forge
netcdf4 1.5.6 pypi_0 pypi
nodejs 15.3.0 h25f6087_0 conda-forge
notebook 6.2.0 py38h578d9bd_0 conda-forge
numba 0.53.1 py38ha9443f7_0 defaults
numcodecs 0.7.3 py38h2531618_0 defaults
numpy 1.20.2 py38h2d18471_0 defaults
numpy-base 1.20.2 py38hfae3a4d_0 defaults
oauthlib 3.0.1 py_0 conda-forge
olefile 0.46 py_0 defaults
openssl 1.1.1k h27cfd23_0 defaults
opt_einsum 3.3.0 pyhd3eb1b0_1 defaults
packaging 20.8 pyhd3deb0d_0 conda-forge
pamela 1.0.0 py_0 conda-forge
pandas 1.2.4 py38h2531618_0 defaults
pandoc 2.11.3.2 h7f98852_0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
papermill 2.3.1 pypi_0 pypi
parso 0.7.1 pyh9f0ad1d_0 conda-forge
partd 1.2.0 pyhd3eb1b0_0 defaults
pathspec 0.8.1 pypi_0 pypi
pdbufr 0.8.2 pypi_0 pypi
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.2.0 py38he98fc37_0 defaults
pip 21.0.1 pypi_0 pypi
pipx 0.16.1.0 pypi_0 pypi
powerline-shell 0.7.0 pypi_0 pypi
prometheus_client 0.9.0 pyhd3deb0d_0 conda-forge
prompt-toolkit 3.0.10 pyha770c72_0 conda-forge
properscoring 0.1 py_0 conda-forge
protobuf 3.14.0 py38h2531618_1 defaults
psutil 5.8.0 py38h27cfd23_1 defaults
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pyasn1 0.4.8 py_0 defaults
pyasn1-modules 0.2.8 py_0 defaults
pycosat 0.6.3 py38h497a2fe_1006 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pycurl 7.43.0.6 py38h996a351_1 conda-forge
pygments 2.7.4 pyhd8ed1ab_0 conda-forge
pyjwt 2.0.1 pyhd8ed1ab_0 conda-forge
pyodc 1.0.3 pypi_0 pypi
pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyrsistent 0.17.3 py38h497a2fe_2 conda-forge
pysocks 1.7.1 py38h578d9bd_3 conda-forge
python 3.8.6 hffdb5ce_4_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python-eccodes 2021.03.0 py38hb5d20a5_0 conda-forge
python-editor 1.0.4 py_0 conda-forge
python-flatbuffers 1.12 pyhd3eb1b0_0 defaults
python-json-logger 2.0.1 pyh9f0ad1d_0 conda-forge
python_abi 3.8 1_cp38 conda-forge
pytz 2021.1 pyhd3eb1b0_0 defaults
pyyaml 5.4.1 pypi_0 pypi
pyzmq 21.0.1 py38h3d7ac18_0 conda-forge
readline 8.0 he28a2e2_2 conda-forge
regex 2021.4.4 pypi_0 pypi
requests 2.25.1 pyhd3deb0d_0 conda-forge
requests-oauthlib 1.3.0 py_0 defaults
rsa 4.7.2 pyhd3eb1b0_1 defaults
ruamel.yaml 0.16.12 py38h497a2fe_2 conda-forge
ruamel.yaml.clib 0.2.2 py38h497a2fe_2 conda-forge
ruamel_yaml 0.15.80 py38h497a2fe_1003 conda-forge
s3fs 0.6.0 pyhd3eb1b0_0 defaults
scikit-learn 0.24.2 py38ha9443f7_0 defaults
scipy 1.6.2 py38had2a1c9_1 defaults
send2trash 1.5.0 py_0 conda-forge
setuptools 49.6.0 py38h578d9bd_3 conda-forge
simpervisor 0.4 pypi_0 pypi
six 1.15.0 pyh9f0ad1d_0 conda-forge
sklearn-xarray 0.4.0 pypi_0 pypi
smmap 4.0.0 pypi_0 pypi
sortedcontainers 2.3.0 pyhd3eb1b0_0 defaults
sqlalchemy 1.3.22 py38h497a2fe_1 conda-forge
sqlite 3.34.0 h74cdb3f_0 conda-forge
sysroot_linux-64 2.12 h77966d4_13 conda-forge
tbb 2020.3 hfd86e86_0 defaults
tblib 1.7.0 py_0 defaults
tenacity 7.0.0 pypi_0 pypi
tensorboard 2.4.0 pyhc547734_0 defaults
tensorboard-plugin-wit 1.6.0 py_0 defaults
tensorflow 2.4.1 mkl_py38hb2083e0_0 defaults
tensorflow-base 2.4.1 mkl_py38h43e0292_0 defaults
tensorflow-estimator 2.4.1 pyheb71bc4_0 defaults
termcolor 1.1.0 py38h06a4308_1 defaults
terminado 0.9.2 py38h578d9bd_0 conda-forge
testpath 0.4.4 py_0 conda-forge
textwrap3 0.9.2 pypi_0 pypi
threadpoolctl 2.1.0 pyh5ca1d4c_0 defaults
tini 0.18.0 h14c3975_1001 conda-forge
tk 8.6.10 h21135ba_1 conda-forge
toml 0.10.2 pypi_0 pypi
toolz 0.11.1 pyhd3eb1b0_0 defaults
tornado 6.1 py38h497a2fe_1 conda-forge
tqdm 4.56.0 pyhd8ed1ab_0 conda-forge
traitlets 5.0.5 py_0 conda-forge
typed-ast 1.4.2 pypi_0 pypi
typing-extensions 3.7.4.3 hd3eb1b0_0 defaults
typing_extensions 3.7.4.3 pyh06a4308_0 defaults
urllib3 1.26.2 pyhd8ed1ab_0 conda-forge
userpath 1.4.2 pypi_0 pypi
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
werkzeug 1.0.1 pyhd3eb1b0_0 defaults
wheel 0.36.2 pyhd3deb0d_0 conda-forge
wrapt 1.12.1 py38h7b6447c_1 defaults
xarray 0.18.0 pyhd3eb1b0_1 defaults
xhistogram 0.1.2 pyhd8ed1ab_0 conda-forge
xskillscore 0.0.20 pyhd8ed1ab_1 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yaml 0.2.5 h516909a_0 conda-forge
yarl 1.6.3 py38h27cfd23_0 defaults
zarr 2.8.1 pyhd3eb1b0_0 defaults
zeromq 4.3.3 h58526e2_3 conda-forge
zict 2.0.0 pyhd3eb1b0_0 defaults
zipp 3.4.0 py_0 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
zstd 1.4.9 haebb681_0 defaults
%% Cell type:code id: tags:
``` python
```