@@ -18,6 +18,7 @@ dependencies:
   - s3fs
   - intake-xarray
   - cfgrib
+  - eccodes
   - nc-time-axis
   - pydap
   - h5netcdf
...
%% Cell type:markdown id: tags:
# Train ML model to correct predictions of week 3-4 & 5-6
This notebook creates a Machine Learning `ML_model` that predicts weeks 3-4 & 5-6 based on `S2S` weeks 3-4 & 5-6 forecasts; the predictions are compared to `CPC` observations for the [`s2s-ai-challenge`](https://s2s-ai-challenge.github.io/).
%% Cell type:markdown id: tags:
# Synopsis
%% Cell type:markdown id: tags:
## Method: `ML-based mean bias reduction`
- calculate the ML-based bias from the 2000-2019 deterministic ensemble-mean hindcasts
- remove that ML-based bias from the 2020 deterministic ensemble-mean forecast (see the sketch below)
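%% Cell type:markdown id: tags:
For intuition, the non-ML version of mean bias reduction is a one-liner in xarray; the CNN trained below learns a spatially varying flavor of the same correction. A minimal sketch with hypothetical toy arrays, not the challenge datasets:
%% Cell type:code id: tags:
``` python
# Sketch: climatological mean bias correction, the idea the ML model generalizes.
import numpy as np
import xarray as xr

dims = ['forecast_time', 'latitude', 'longitude']
hind = xr.DataArray(np.random.rand(20, 4, 6), dims=dims)  # toy 2000-2019 hindcasts
obs = xr.DataArray(np.random.rand(20, 4, 6), dims=dims)   # matching toy observations
fct = xr.DataArray(np.random.rand(4, 6), dims=dims[1:])   # one toy 2020 forecast

bias = (hind - obs).mean('forecast_time')  # mean bias over the hindcast period
fct_corrected = fct - bias                 # remove it from the 2020 forecast
```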
%% Cell type:markdown id: tags:
## Data used
type: renku datasets
Training-input for Machine Learning model:
- hindcasts of models:
    - ECMWF: `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
Forecast-input for Machine Learning model:
- real-time 2020 forecasts of models:
    - ECMWF: `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
Compare Machine Learning model forecast against ground truth:
- `CPC` observations:
    - `hindcast-like-observations_biweekly_deterministic.zarr`
    - `forecast-like-observations_2020_biweekly_deterministic.zarr`
%% Cell type:markdown id: tags:
## Resources used
for training; details in the reproducibility section below
- platform: renku
- memory: 8 GB
- processors: 2 CPU
- storage required: 10 GB
%% Cell type:markdown id: tags:
## Safeguards
All points have to be [x] checked. If not, your submission is invalid.
Changes to the code after submission are not possible, as the `commit` before the `tag` will be reviewed.
(Only in exceptional cases, and if previous effort in reproducibility can be demonstrated, may improvements to readability and reproducibility be allowed after November 1st 2021.)
%% Cell type:markdown id: tags:
### Safeguards to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting?wprov=sfti1)
If the organizers suspect overfitting, your contribution can be disqualified.
- [x] We did not use 2020 observations in training (explicit overfitting and cheating)
- [x] We did not repeatedly verify our model on 2020 observations and incrementally improve our RPSS (implicit overfitting)
- [x] We provide RPSS scores for the training period with script `print_RPS_per_year`, see section 6.3 `predict`.
- [x] We tried our best to prevent [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)?wprov=sfti1).
- [x] We honor the `train-validate-test` [split principle](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets). This means that the hindcast data is split into `train` and `validate`, whereas `test` is withheld.
- [x] We did not use `test` explicitly in training or implicitly by incrementally adjusting parameters.
- [x] We considered [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).
%% Cell type:markdown id: tags:
### Safeguards for Reproducibility
Notebook/code must be independently reproducible from scratch by the organizers (after the competition); if not possible: no prize.
- [x] All training data is publicly available (no pre-trained private neural networks, as they are not reproducible for us)
- [x] Code is well documented, readable and reproducible.
- [x] Code to reproduce training and predictions should preferably run within a day on the described architecture. If the training takes longer than a day, please justify why this is needed. Please do not submit training pipelines which take weeks to train.
%% Cell type:markdown id: tags:
# Todos to improve template
This is just a demo.
- [ ] use multiple predictor variables and two predicted variables
- [ ] handle both `lead_time`s in one go
- [ ] consider seasonality; for now all `forecast_time` months are mixed
- [ ] make probabilistic predictions with a `category` dim; for now the model is deterministic
%% Cell type:markdown id: tags:
# Imports
%% Cell type:code id: tags:
``` python
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Sequential
import matplotlib.pyplot as plt
import xarray as xr
xr.set_options(display_style='text')
import numpy as np
from dask.utils import format_bytes
import xskillscore as xs
```
%% Output
/opt/conda/lib/python3.8/site-packages/xarray/backends/cfgrib_.py:27: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message
warnings.warn(
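%% Cell type:markdown id: tags:
This `cfgrib`/ecCodes import warning is presumably what the added `eccodes` dependency in the environment diff above addresses; it is harmless for this notebook, which reads `zarr` stores rather than GRIB files.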
%% Cell type:markdown id: tags:
# Get training data
preprocessing of input data may be done in a separate notebook/script
%% Cell type:markdown id: tags:
## Hindcast
get weekly initialized hindcasts
%% Cell type:code id: tags:
``` python
v='t2m'  # variable: 2-metre temperature
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
hind_2000_2019 = xr.open_zarr("../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
```
%% Output
/opt/conda/lib/python3.8/site-packages/xarray/backends/plugins.py:61: RuntimeWarning: Engine 'cfgrib' loading failed:
/opt/conda/lib/python3.8/site-packages/gribapi/_bindings.cpython-38-x86_64-linux-gnu.so: undefined symbol: codes_bufr_key_is_header
warnings.warn(f"Engine {name!r} loading failed:\n{ex}", RuntimeWarning)
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
fct_2020 = xr.open_zarr("../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:markdown id: tags:
## Observations
corresponding to hindcasts
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)  # [v]
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr("../data/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)  # [v]
```
%% Cell type:markdown id: tags:
# ML model
%% Cell type:markdown id: tags:
based on [Weatherbench](https://github.com/pangeo-data/WeatherBench/blob/master/quickstart.ipynb)
%% Cell type:code id: tags:
``` python
# run once only and don't commit
!git clone https://github.com/pangeo-data/WeatherBench/
```
%% Output
fatal: destination path 'WeatherBench' already exists and is not an empty directory.
%% Cell type:code id: tags:
``` python
import sys
sys.path.insert(1, 'WeatherBench')
from WeatherBench.src.train_nn import DataGenerator, PeriodicConv2D, create_predictions
import tensorflow.keras as keras
```
%% Cell type:code id: tags:
``` python
bs = 32
import numpy as np

class DataGenerator(keras.utils.Sequence):
    def __init__(self, fct, verif, lead_time, batch_size=bs, shuffle=True, load=True,
                 mean=None, std=None):
        """
        Data generator for WeatherBench data.
        Template from https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
        Args:
            fct: forecasts from S2S models: xr.DataArray (xr.Dataset doesn't work properly)
            verif: observations with same dimensionality (xr.Dataset doesn't work properly)
            lead_time: lead_time as in model
            batch_size: batch size
            shuffle: bool. If True, data is shuffled.
            load: bool. If True, dataset is loaded into RAM.
            mean: If None, compute mean from data.
            std: If None, compute standard deviation from data.
        Todo:
            - use `number` in a better way, for now only the ensemble mean forecast is used
            - don't use .sel(lead_time=lead_time), to train over all lead_time at once
            - be sensitive with forecast_time, pool a few around the given weekofyear
            - use more variables as predictors
            - predict more variables
        """
        if isinstance(fct, xr.Dataset):
            print('convert fct to array')
            fct = fct.to_array().transpose(..., 'variable')
            self.fct_dataset = True
        else:
            self.fct_dataset = False
        if isinstance(verif, xr.Dataset):
            print('convert verif to array')
            verif = verif.to_array().transpose(..., 'variable')
            self.verif_dataset = True
        else:
            self.verif_dataset = False

        # self.fct = fct
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.lead_time = lead_time

        self.fct_data = fct.transpose('forecast_time', ...).sel(lead_time=lead_time)
        self.fct_mean = self.fct_data.mean('forecast_time').compute() if mean is None else mean
        self.fct_std = self.fct_data.std('forecast_time').compute() if std is None else std

        self.verif_data = verif.transpose('forecast_time', ...).sel(lead_time=lead_time)
        self.verif_mean = self.verif_data.mean('forecast_time').compute() if mean is None else mean
        self.verif_std = self.verif_data.std('forecast_time').compute() if std is None else std

        # Normalize
        self.fct_data = (self.fct_data - self.fct_mean) / self.fct_std
        self.verif_data = (self.verif_data - self.verif_mean) / self.verif_std

        self.n_samples = self.fct_data.forecast_time.size
        self.forecast_time = self.fct_data.forecast_time

        self.on_epoch_end()
        # For some weird reason calling .load() earlier messes up the mean and std computations
        if load:
            self.fct_data.load()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.ceil(self.n_samples / self.batch_size))

    def __getitem__(self, i):
        'Generate one batch of data'
        idxs = self.idxs[i * self.batch_size:(i + 1) * self.batch_size]
        # got all NaN if NaNs not masked
        X = self.fct_data.isel(forecast_time=idxs).fillna(0.).values
        y = self.verif_data.isel(forecast_time=idxs).fillna(0.).values
        return X, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.idxs = np.arange(self.n_samples)
        if self.shuffle:
            np.random.shuffle(self.idxs)
```
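%% Cell type:markdown id: tags:
A quick smoke test of the generator contract may help: batches come back as plain numpy arrays of shape `(batch, latitude, longitude)`, already normalized and with NaNs filled. This is a minimal sketch with hypothetical toy sizes, not part of the original template:
%% Cell type:code id: tags:
``` python
# Sketch: verify DataGenerator yields normalized (batch, lat, lon) numpy arrays.
import numpy as np
import pandas as pd
import xarray as xr

toy = xr.DataArray(
    np.random.rand(8, 1, 4, 6),
    dims=['forecast_time', 'lead_time', 'latitude', 'longitude'],
    coords={'forecast_time': pd.date_range('2000-01-02', periods=8, freq='7D'),
            'lead_time': [np.timedelta64(14, 'D')]},
)
dg_toy = DataGenerator(toy, toy, lead_time=toy.lead_time[0], batch_size=4)
X, y = dg_toy[0]
assert X.shape == (4, 4, 6)  # (batch, latitude, longitude)
assert len(dg_toy) == 2      # ceil(8 samples / batch_size 4)
```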
%% Cell type:code id: tags:
``` python
# first of the two bi-weekly `lead_time`s: week 3-4
lead = hind_2000_2019.isel(lead_time=0).lead_time
lead
```
%% Output
<xarray.DataArray 'lead_time' ()>
array(1209600000000000, dtype='timedelta64[ns]')
Coordinates:
    lead_time  timedelta64[ns] 14 days
Attributes:
    aggregate:      The pd.Timedelta corresponds to the first day of a biweek...
    description:    Forecast period is the time interval between the forecast...
    long_name:      lead time
    standard_name:  forecast_period
    week34_t2m:     mean[14 days, 27 days]
    week34_tp:      28 days minus 14 days
    week56_t2m:     mean[28 days, 41 days]
    week56_tp:      42 days minus 28 days
%% Cell type:code id: tags:
``` python
# mask, needed? (sets hindcasts to NaN wherever the observations are NaN, e.g. over the ocean)
hind_2000_2019 = hind_2000_2019.where(obs_2000_2019.isel(forecast_time=0, lead_time=0, drop=True).notnull())
```
%% Cell type:markdown id: tags:
## data prep: train, valid, test
[Use the hindcast period to split train and valid.](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) Do not use the 2020 data for testing!
%% Cell type:code id: tags:
``` python
# time refers to forecast_time
time_train_start, time_train_end = '2000', '2017'  # train
time_valid_start, time_valid_end = '2018', '2019'  # valid
time_test = '2020'                                 # test
```
%% Cell type:code id: tags:
``` python
dg_train = DataGenerator(
    hind_2000_2019.mean('realization').sel(forecast_time=slice(time_train_start, time_train_end))[v],
    obs_2000_2019.sel(forecast_time=slice(time_train_start, time_train_end))[v],
    lead_time=lead, batch_size=bs, load=True)
```
%% Output
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
dg_valid = DataGenerator(
    hind_2000_2019.mean('realization').sel(forecast_time=slice(time_valid_start, time_valid_end))[v],
    obs_2000_2019.sel(forecast_time=slice(time_valid_start, time_valid_end))[v],
    lead_time=lead, batch_size=bs, shuffle=False, load=True)
```
%% Output
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
# do not use, delete?
dg_test = DataGenerator(
    fct_2020.mean('realization').sel(forecast_time=time_test)[v],
    obs_2020.sel(forecast_time=time_test)[v],
    lead_time=lead, batch_size=bs, load=True, mean=dg_train.fct_mean, std=dg_train.fct_std, shuffle=False)
```
%% Cell type:code id: tags:
``` python
X, y = dg_valid[0]
X.shape, y.shape
```
%% Output
((32, 121, 240), (32, 121, 240))
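%% Cell type:markdown id: tags:
Each batch holds 32 samples on the challenge's global 1.5° grid: 121 latitudes by 240 longitudes.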
%% Cell type:code id: tags:
``` python
# short look into training data: large biases
# any problem from normalizing?
# i = 4
# xr.DataArray(np.vstack([X[i], y[i]])).plot(yincrease=False, robust=True)
```
%% Cell type:markdown id: tags:
## `fit`
%% Cell type:code id: tags:
``` python
cnn = keras.models.Sequential([
    PeriodicConv2D(filters=32, kernel_size=5, conv_kwargs={'activation': 'relu'}, input_shape=(32, 64, 1)),
    PeriodicConv2D(filters=1, kernel_size=5)
])
```
%% Output
WARNING:tensorflow:AutoGraph could not transform <bound method PeriodicPadding2D.call of <WeatherBench.src.train_nn.PeriodicPadding2D object at 0x7f86042986a0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <bound method PeriodicPadding2D.call of <WeatherBench.src.train_nn.PeriodicPadding2D object at 0x7f86042986a0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
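%% Cell type:markdown id: tags:
`PeriodicConv2D` (from WeatherBench) pads the longitude axis periodically before a regular convolution, so the stencil wraps around the dateline instead of seeing an artificial edge. A minimal sketch of that padding idea, as hypothetical standalone code rather than WeatherBench's actual implementation:
%% Cell type:code id: tags:
``` python
# Sketch: wrap-around (periodic) padding in longitude, zero padding in latitude,
# then a 'valid' convolution keeps the original spatial size.
import tensorflow as tf

def periodic_pad_lon(x, pad):
    # x: (batch, lat, lon, channels)
    x = tf.concat([x[:, :, -pad:, :], x, x[:, :, :pad, :]], axis=2)  # wrap longitude
    return tf.pad(x, [[0, 0], [pad, pad], [0, 0], [0, 0]])           # zero-pad latitude

x = tf.random.normal((1, 32, 64, 1))
y = tf.keras.layers.Conv2D(8, 5, padding='valid')(periodic_pad_lon(x, pad=2))
print(y.shape)  # (1, 32, 64, 8)
```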
%% Cell type:code id: tags:
``` python
cnn.summary()
```
%% Output
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
periodic_conv2d (PeriodicCon (None, 32, 64, 32)        832       
_________________________________________________________________
periodic_conv2d_1 (PeriodicC (None, 32, 64, 1)         801       
=================================================================
Total params: 1,633
Trainable params: 1,633
Non-trainable params: 0
_________________________________________________________________
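%% Cell type:markdown id: tags:
The parameter counts check out: the first layer has 5 × 5 × 1 × 32 = 800 weights plus 32 biases (832), the second 5 × 5 × 32 × 1 = 800 weights plus 1 bias (801).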
%% Cell type:code id: tags:
``` python
cnn.compile(keras.optimizers.Adam(1e-4), 'mse')
```
%% Cell type:code id: tags:
``` python
import warnings
warnings.simplefilter("ignore")
```
%% Cell type:code id: tags:
``` python
cnn.fit(dg_train, epochs=2, validation_data=dg_valid)
```
%% Output
Epoch 1/2
30/30 [==============================] - 58s 2s/step - loss: 0.1472 - val_loss: 0.0742
Epoch 2/2
30/30 [==============================] - 45s 1s/step - loss: 0.0712 - val_loss: 0.0545
<tensorflow.python.keras.callbacks.History at 0x7f865c2103d0>
%% Cell type:markdown id: tags:
## `predict`
Create predictions and print the `mean(variable, lead_time, longitude, weighted latitude)` RPSS for all years, as calculated by `skill_by_year`.
%% Cell type:code id: tags:
``` python
from scripts import add_valid_time_from_forecast_reference_time_and_lead_time

def _create_predictions(model, dg, lead):
    """Create non-iterative predictions"""
    preds = model.predict(dg).squeeze()
    # Unnormalize: invert the (x - mean) / std scaling applied in the DataGenerator
    preds = preds * dg.fct_std.values + dg.fct_mean.values
    if dg.verif_dataset:
        da = xr.DataArray(
            preds,
            dims=['forecast_time', 'latitude', 'longitude', 'variable'],
            coords={'forecast_time': dg.fct_data.forecast_time, 'latitude': dg.fct_data.latitude,
                    'longitude': dg.fct_data.longitude},
        ).to_dataset()  # doesn't work yet
    else:
        da = xr.DataArray(
            preds,
            dims=['forecast_time', 'latitude', 'longitude'],
            coords={'forecast_time': dg.fct_data.forecast_time, 'latitude': dg.fct_data.latitude,
                    'longitude': dg.fct_data.longitude},
        )
    da = da.assign_coords(lead_time=lead)
    # da = add_valid_time_from_forecast_reference_time_and_lead_time(da)
    return da
```
%% Cell type:code id: tags:
``` python
# optionally mask the ocean when making probabilistic predictions;
# std is NaN where observations are always missing, so notnull() yields a land mask
mask = obs_2020.std(['lead_time', 'forecast_time']).notnull()
```
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
cache_path = '../data'
tercile_file = f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
tercile_edges = xr.open_dataset(tercile_file)
```
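%% Cell type:markdown id: tags:
The tercile edges are the 1/3 and 2/3 quantiles of the 2000-2019 observations, so the three categories (below/near/above normal) are equally likely by construction. The deterministic prediction is later turned into category "probabilities" by `scripts.make_probabilistic`; with an ensemble (`realization` dim), averaging the category indicators over members yields real probabilities. The core comparison looks roughly like this minimal sketch (hypothetical toy values, not the challenge implementation):
%% Cell type:code id: tags:
``` python
# Sketch: map a deterministic forecast onto tercile categories.
import xarray as xr

pred = xr.DataArray([-1.2, 0.1, 0.9], dims='forecast_time')  # toy forecasts
edges = xr.DataArray([-0.5, 0.5], dims='quantile')           # toy lower/upper tercile edges
below = pred < edges.isel(quantile=0)
above = pred >= edges.isel(quantile=1)
normal = ~below & ~above
probs = xr.concat([below, normal, above], dim='category').astype(float)
print(probs.values)  # rows = categories (below, normal, above), columns = forecasts
```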
%% Cell type:code id: tags:
``` python
# demo only: this reuses the week 3-4 model for every lead_time, which is not useful
# skill-wise, but the results have the expected dimensions;
# one should actually train a separate model per lead_time
def create_predictions(cnn, fct, obs, time):
    preds_test = []
    for lead in fct.lead_time:
        dg = DataGenerator(fct.mean('realization').sel(forecast_time=time)[v],
                           obs.sel(forecast_time=time)[v],
                           lead_time=lead, batch_size=bs, mean=dg_train.fct_mean, std=dg_train.fct_std, shuffle=False)
        preds_test.append(_create_predictions(cnn, dg, lead))
    preds_test = xr.concat(preds_test, 'lead_time')
    preds_test['lead_time'] = fct.lead_time
    # add valid_time coord
    preds_test = add_valid_time_from_forecast_reference_time_and_lead_time(preds_test)
    preds_test = preds_test.to_dataset(name=v)
    # add fake var: the submission requires both t2m and tp, so reuse t2m as a placeholder
    preds_test['tp'] = preds_test['t2m']
    # make probabilistic
    preds_test = make_probabilistic(preds_test.expand_dims('realization'), tercile_edges, mask=mask)
    return preds_test
```
%% Cell type:markdown id: tags:
### `predict` training period in-sample
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
from scripts import skill_by_year
import os

if os.environ['HOME'] == '/home/jovyan':
    import pandas as pd
    # assume on renku with small memory
    step = 2
    skill_list = []
    # loop over years to consume less memory on renku
    for year in np.arange(int(time_train_start), int(time_train_end) - 1, step):
        preds_is = create_predictions(cnn, hind_2000_2019, obs_2000_2019, time=slice(str(year), str(year + step - 1))).compute()
        skill_list.append(skill_by_year(preds_is))
    skill = pd.concat(skill_list)
else:  # with larger memory, simply do
    preds_is = create_predictions(cnn, hind_2000_2019, obs_2000_2019, time=slice(time_train_start, time_train_end))
    skill = skill_by_year(preds_is)
skill
```
%% Output %% Output
RPSS RPSS
year year
2000 -1.293103 2000 -0.862483
2001 -1.446606 2001 -1.015485
2002 -1.494487 2002 -1.101022
2003 -1.484899 2003 -1.032647
2004 -1.421862 2004 -1.056348
2005 -1.549783 2005 -1.165675
2006 -1.508035 2006 -1.057217
2007 -1.502208 2007 -1.170849
2008 -1.493371 2008 -1.049785
2009 -1.568156 2009 -1.169108
2010 -1.519528 2010 -1.130845
2011 -1.389702 2011 -1.052670
2012 -1.499871 2012 -1.126449
2013 -1.549204 2013 -1.126930
2014 -1.500869 2014 -1.095896
2015 -1.506727 2015 -1.117486
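%% Cell type:markdown id: tags:
For orientation when reading these tables: RPSS = 1 - RPS_model / RPS_climatology, so 0 equals the climatological reference and positive values beat it. The strongly negative values here simply reflect that this template trains a tiny CNN for two epochs on a single variable; it is a demo, not a competitive submission.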
%% Cell type:markdown id: tags:
### `predict` validation period out-of-sample
%% Cell type:code id: tags:
``` python
preds_os = create_predictions(cnn, hind_2000_2019, obs_2000_2019, time=slice(time_valid_start, time_valid_end))
skill_by_year(preds_os)
```
%% Output
          RPSS
year          
2018 -1.099744
2019 -1.172401
%% Cell type:markdown id: tags:
### `predict` test
%% Cell type:code id: tags:
``` python
preds_test = create_predictions(cnn, fct_2020, obs_2020, time=time_test)
skill_by_year(preds_test)
```
%% Output
          RPSS
year          
2020 -1.076834
%% Cell type:markdown id: tags:
# Submission
%% Cell type:code id: tags:
``` python
from scripts import assert_predictions_2020
assert_predictions_2020(preds_test)
```
%% Cell type:code id: tags:
``` python
preds_test.to_netcdf('../submissions/ML_prediction_2020.nc')
```
%% Cell type:code id: tags:
``` python
# !git add ../submissions/ML_prediction_2020.nc
# !git add ML_train_and_prediction.ipynb
```
%% Cell type:code id: tags:
``` python
# !git commit -m "template_test commit message"  # whatever message you want
```
%% Cell type:code id: tags:
``` python
# !git tag "submission-template_test-0.0.1"  # if this is to be checked by scorer, only the last submitted==tagged version will be considered
```
%% Cell type:code id: tags:
``` python
# !git push --tags
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Reproducibility
%% Cell type:markdown id: tags:
## memory
%% Cell type:code id: tags:
``` python
# https://phoenixnap.com/kb/linux-commands-check-memory-usage
!free -g
```
%% Output
              total        used        free      shared  buff/cache   available
Mem:             31           7          11           0          12          24
Swap:             0           0           0
%% Cell type:markdown id: tags:
## CPU
%% Cell type:code id: tags:
``` python
!lscpu
```
%% Output
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   40 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       8
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel Xeon Processor (Skylake, IBRS)
Stepping:                        4
CPU MHz:                         2095.078
BogoMIPS:                        4190.15
Virtualization:                  VT-x
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       256 KiB
L1i cache:                       256 KiB
L2 cache:                        32 MiB
L3 cache:                        128 MiB
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ibrs ibpb tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke
%% Cell type:markdown id: tags:
## software
%% Cell type:code id: tags:
``` python
!conda list
```
%% Output
# packages in environment at /opt/conda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
_pytorch_select 0.1 cpu_0 defaults
_tflow_select 2.3.0 mkl defaults
absl-py 0.13.0 py38h06a4308_0 defaults
aiobotocore 1.4.1 pyhd3eb1b0_0 defaults
aiohttp 3.7.4.post0 py38h7f8727e_2 defaults
aioitertools 0.7.1 pyhd3eb1b0_0 defaults
alembic 1.4.3 pyh9f0ad1d_0 conda-forge
ansiwrap 0.8.4 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
argcomplete 1.12.3 pypi_0 pypi
argon2-cffi 20.1.0 py38h497a2fe_2 conda-forge
argparse 1.4.0 pypi_0 pypi
asciitree 0.3.3 py_2 defaults
astor 0.8.1 py38h06a4308_0 defaults
astunparse 1.6.3 py_0 defaults
async-timeout 3.0.1 pypi_0 pypi
async_generator 1.10 py_0 conda-forge
attrs 21.2.0 pypi_0 pypi
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.1 py_0 conda-forge
bagit 1.8.1 pypi_0 pypi
beautifulsoup4 4.10.0 pyh06a4308_0 defaults
binutils_impl_linux-64 2.35.1 h193b22a_1 conda-forge
binutils_linux-64 2.35 h67ddf6f_30 conda-forge
black 20.8b1 pypi_0 pypi
blas 1.0 mkl defaults
bleach 3.2.1 pyh9f0ad1d_0 conda-forge
blinker 1.4 py_1 conda-forge
bokeh 2.3.3 py38h06a4308_0 defaults
botocore 1.20.106 pyhd3eb1b0_0 defaults
bottleneck 1.3.2 py38heb32a55_1 defaults
bracex 2.1.1 pypi_0 pypi
branca 0.3.1 pypi_0 pypi
brotli 1.0.9 he6710b0_2 defaults
brotlipy 0.7.0 py38h497a2fe_1001 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.1 h36c2ea0_0 conda-forge
ca-certificates 2021.7.5 h06a4308_1 defaults
cachecontrol 0.12.6 pypi_0 pypi
cachetools 4.2.4 pypi_0 pypi
calamus 0.3.12 pypi_0 pypi
cdsapi 0.5.1 pypi_0 pypi
certifi 2021.5.30 pypi_0 pypi
certipy 0.1.3 py_0 conda-forge
cffi 1.14.6 pypi_0 pypi
cfgrib 0.9.9.0 pyhd8ed1ab_1 conda-forge
cftime 1.5.0 py38h6323ea4_0 defaults
chardet 3.0.4 pypi_0 pypi
click 7.1.2 pypi_0 pypi
click-completion 0.5.2 pypi_0 pypi
click-option-group 0.5.3 pypi_0 pypi
click-plugins 1.1.1 pypi_0 pypi
climetlab 0.8.31 pypi_0 pypi
climetlab-s2s-ai-challenge 0.8.0 pypi_0 pypi
cloudpickle 2.0.0 pyhd3eb1b0_0 defaults
colorama 0.4.4 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
commonmark 0.9.1 pypi_0 pypi
conda 4.9.2 py38h578d9bd_0 conda-forge
conda-package-handling 1.7.2 py38h8df0ef7_0 conda-forge
configargparse 1.5.2 pypi_0 pypi
configurable-http-proxy 1.3.0 0 conda-forge
coverage 5.5 py38h27cfd23_2 defaults
cryptography 3.4.8 pypi_0 pypi
curl 7.71.1 he644dc0_8 conda-forge
cwlgen 0.4.2 pypi_0 pypi
cwltool 3.1.20211004060744 pypi_0 pypi
cycler 0.10.0 py38_0 defaults
cython 0.29.24 py38h295c915_0 defaults
cytoolz 0.11.0 py38h7b6447c_0 defaults
dask 2021.8.1 pyhd3eb1b0_0 defaults
dask-core 2021.8.1 pyhd3eb1b0_0 defaults
dataclasses 0.8 pyh6d0b6a4_7 defaults
decorator 4.4.2 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
distributed 2021.8.1 py38h06a4308_0 defaults
distro 1.5.0 pypi_0 pypi
docopt 0.6.2 py38h06a4308_0 defaults
eccodes 2.21.0 ha0e6eb6_0 conda-forge
ecmwf-api-client 1.6.1 pypi_0 pypi
ecmwflibs 0.3.14 pypi_0 pypi
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
environ-config 21.2.0 pypi_0 pypi
fasteners 0.16.3 pyhd3eb1b0_0 defaults
filelock 3.0.12 pypi_0 pypi
findlibs 0.0.2 pypi_0 pypi
fonttools 4.25.0 pyhd3eb1b0_0 defaults
freetype 2.10.4 h5ab3b9f_0 defaults
frozendict 2.0.6 pypi_0 pypi
fsspec 2021.7.0 pyhd3eb1b0_0 defaults
gast 0.4.0 pyhd3eb1b0_0 defaults
gcc_impl_linux-64 9.3.0 h70c0ae5_18 conda-forge
gcc_linux-64 9.3.0 hf25ea35_30 conda-forge
gitdb 4.0.7 pypi_0 pypi
gitpython 3.1.14 pypi_0 pypi
google-auth 1.33.0 pyhd3eb1b0_0 defaults
google-auth-oauthlib 0.4.4 pyhd3eb1b0_0 defaults
google-pasta 0.2.0 pyhd3eb1b0_0 defaults
grpcio 1.36.1 py38h2157cd5_1 defaults
gxx_impl_linux-64 9.3.0 hd87eabc_18 conda-forge
gxx_linux-64 9.3.0 h3fbe746_30 conda-forge
h5netcdf 0.11.0 pyhd8ed1ab_0 conda-forge
h5py 2.10.0 py38hd6299e0_1 defaults
hdf4 4.2.13 h3ca952b_2 defaults
hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge
heapdict 1.0.1 pyhd3eb1b0_0 defaults
humanfriendly 10.0 pypi_0 pypi
humanize 3.7.1 pypi_0 pypi
icu 68.1 h58526e2_0 conda-forge
idna 2.10 pyh9f0ad1d_0 conda-forge
importlib-metadata 3.4.0 py38h578d9bd_0 conda-forge
importlib_metadata 3.4.0 hd8ed1ab_0 conda-forge
intake 0.6.3 pyhd3eb1b0_0 defaults
intake-xarray 0.5.0 pyhd3eb1b0_0 defaults
intel-openmp 2019.4 243 defaults
ipykernel 5.4.2 py38h81c977d_0 conda-forge
ipython 7.19.0 py38h81c977d_2 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
isodate 0.6.0 pypi_0 pypi
jasper 1.900.1 hd497a04_4 defaults
jedi 0.17.2 py38h578d9bd_1 conda-forge
jellyfish 0.8.8 pypi_0 pypi
jinja2 3.0.1 pypi_0 pypi
jmespath 0.10.0 pyhd3eb1b0_0 defaults
joblib 1.0.1 pyhd3eb1b0_0 defaults
jpeg 9d h7f8727e_0 defaults
json5 0.9.5 pyh9f0ad1d_0 conda-forge
jsonschema 3.2.0 py_2 conda-forge
jupyter-server-proxy 1.6.0 pypi_0 pypi
jupyter_client 6.1.11 pyhd8ed1ab_1 conda-forge
jupyter_core 4.7.0 py38h578d9bd_0 conda-forge
jupyter_telemetry 0.1.0 pyhd8ed1ab_1 conda-forge
jupyterhub 1.2.2 pypi_0 pypi
jupyterlab 2.2.9 py_0 conda-forge
jupyterlab-git 0.23.3 pypi_0 pypi
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_server 1.2.0 py_0 conda-forge
keras-preprocessing 1.1.2 pyhd3eb1b0_0 defaults
kernel-headers_linux-64 2.6.32 h77966d4_13 conda-forge
kiwisolver 1.3.1 py38h2531618_0 defaults
krb5 1.17.2 h926e7f8_0 conda-forge
lazy-object-proxy 1.6.0 pypi_0 pypi
lcms2 2.12 h3be6417_0 defaults
ld_impl_linux-64 2.35.1 hea4e1c9_1 conda-forge
libaec 1.0.4 he6710b0_1 defaults
libblas 3.9.0 1_h86c2bf4_netlib conda-forge
libcblas 3.9.0 5_h92ddd45_netlib conda-forge
libcurl 7.71.1 hcdd3856_8 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-devel_linux-64 9.3.0 h7864c58_18 conda-forge
libgcc-ng 9.3.0 h2828fa1_18 conda-forge
libgfortran-ng 9.3.0 ha5ec8a7_17 defaults
libgfortran5 9.3.0 ha5ec8a7_17 defaults
libgomp 9.3.0 h2828fa1_18 conda-forge
liblapack 3.9.0 5_h92ddd45_netlib conda-forge
libllvm10 10.0.1 hbcb73fb_5 defaults
libmklml 2019.0.5 0 defaults
libnetcdf 4.7.4 nompi_h56d31a8_107 conda-forge
libnghttp2 1.41.0 h8cfc5f6_2 conda-forge
libpng 1.6.37 hbc83047_0 defaults
libprotobuf 3.17.2 h4ff587b_1 defaults
libsodium 1.0.18 h36c2ea0_1 conda-forge
libssh2 1.9.0 hab1572f_5 conda-forge
libstdcxx-devel_linux-64 9.3.0 hb016644_18 conda-forge
libstdcxx-ng 9.3.0 h6de172a_18 conda-forge
libtiff 4.2.0 h85742a9_0 defaults
libuv 1.40.0 h7f98852_0 conda-forge
libwebp-base 1.2.0 h27cfd23_0 defaults
llvmlite 0.36.0 py38h612dafd_4 defaults
locket 0.2.1 py38h06a4308_1 defaults
lockfile 0.12.2 pypi_0 pypi
lxml 4.6.3 pypi_0 pypi
lz4-c 1.9.3 h295c915_1 defaults
magics 1.5.6 pypi_0 pypi
mako 1.1.4 pyh44b312d_0 conda-forge
markdown 3.3.4 py38h06a4308_0 defaults
markupsafe 2.0.1 pypi_0 pypi
marshmallow 3.13.0 pypi_0 pypi
matplotlib-base 3.4.2 py38hab158f2_0 defaults
mistune 0.8.4 py38h497a2fe_1003 conda-forge
mkl 2020.2 256 defaults
mkl-service 2.3.0 py38he904b0f_0 defaults
mkl_fft 1.3.0 py38h54f3939_0 defaults
mkl_random 1.1.1 py38h0573a6f_0 defaults
msgpack-python 1.0.2 py38hff7bd54_1 defaults
multidict 5.1.0 py38h27cfd23_2 defaults
munkres 1.1.4 py_0 defaults
mypy-extensions 0.4.3 pypi_0 pypi
nbclient 0.5.0 pypi_0 pypi
nbconvert 6.0.7 py38h578d9bd_3 conda-forge
nbdime 2.1.0 pypi_0 pypi
nbformat 5.1.2 pyhd8ed1ab_1 conda-forge
nbresuse 0.4.0 pypi_0 pypi
nc-time-axis 1.3.1 pyhd8ed1ab_2 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
ndg-httpsclient 0.5.1 pypi_0 pypi
nest-asyncio 1.4.3 pyhd8ed1ab_0 conda-forge
netcdf4 1.5.4 pypi_0 pypi
networkx 2.6.3 pypi_0 pypi
ninja 1.10.2 hff7bd54_1 defaults
nodejs 15.3.0 h25f6087_0 conda-forge
notebook 6.2.0 py38h578d9bd_0 conda-forge
numba 0.53.1 py38ha9443f7_0 defaults
numcodecs 0.8.0 py38h2531618_0 defaults
numexpr 2.7.3 py38hb2eb853_0 defaults
numpy 1.19.2 py38h54aff64_0 defaults
numpy-base 1.19.2 py38hfa32c7d_0 defaults
oauthlib 3.0.1 py_0 conda-forge
olefile 0.46 pyhd3eb1b0_0 defaults
openjpeg 2.4.0 h3ad879b_0 defaults
openssl 1.1.1l h7f8727e_0 defaults
opt_einsum 3.3.0 pyhd3eb1b0_1 defaults
owlrl 5.2.3 pypi_0 pypi
packaging 20.8 pyhd3deb0d_0 conda-forge
pamela 1.0.0 py_0 conda-forge
pandas 1.3.2 py38h8c16a72_0 defaults
pandoc 2.11.3.2 h7f98852_0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
papermill 2.3.1 pypi_0 pypi
parso 0.7.1 pyh9f0ad1d_0 conda-forge
partd 1.2.0 pyhd3eb1b0_0 defaults
pathspec 0.9.0 pypi_0 pypi
patool 1.12 pypi_0 pypi
pdbufr 0.9.0 pypi_0 pypi
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.3.1 py38h2c7a002_0 defaults
pip 21.0.1 pypi_0 pypi
pipx 0.16.1.0 pypi_0 pypi
pluggy 0.13.1 pypi_0 pypi
portalocker 2.3.2 pypi_0 pypi
powerline-shell 0.7.0 pypi_0 pypi
prometheus_client 0.9.0 pyhd3deb0d_0 conda-forge
prompt-toolkit 3.0.10 pyha770c72_0 conda-forge
properscoring 0.1 py_0 conda-forge
protobuf 3.17.2 py38h295c915_0 defaults
prov 1.5.1 pypi_0 pypi
psutil 5.8.0 py38h27cfd23_1 defaults
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pyasn1 0.4.8 pyhd3eb1b0_0 defaults
pyasn1-modules 0.2.8 py_0 defaults
pycosat 0.6.3 py38h497a2fe_1006 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pycurl 7.43.0.6 py38h996a351_1 conda-forge
pydap 3.2.2 pyh9f0ad1d_1001 conda-forge
pydot 1.4.2 pypi_0 pypi
pygments 2.10.0 pypi_0 pypi
pyjwt 2.1.0 pypi_0 pypi
pyld 2.0.3 pypi_0 pypi
pyodc 1.1.1 pypi_0 pypi
pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyrsistent 0.17.3 py38h497a2fe_2 conda-forge
pyshacl 0.17.0.post1 pypi_0 pypi
pysocks 1.7.1 py38h578d9bd_3 conda-forge
python 3.8.6 hffdb5ce_4_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python-eccodes 2021.03.0 py38hb5d20a5_1 conda-forge
python-editor 1.0.4 pypi_0 pypi
python-flatbuffers 1.12 pyhd3eb1b0_0 defaults
python-json-logger 2.0.1 pyh9f0ad1d_0 conda-forge
python-snappy 0.6.0 py38h2531618_3 defaults
python_abi 3.8 1_cp38 conda-forge
pytorch 1.8.1 cpu_py38h60491be_0 defaults
pytz 2021.1 pyhd3eb1b0_0 defaults
pyyaml 5.4.1 pypi_0 pypi
pyzmq 21.0.1 py38h3d7ac18_0 conda-forge
rdflib 6.0.1 pypi_0 pypi
rdflib-jsonld 0.5.0 pypi_0 pypi
readline 8.0 he28a2e2_2 conda-forge
regex 2021.4.4 pypi_0 pypi
renku 0.16.2 pypi_0 pypi
requests 2.24.0 pypi_0 pypi
requests-oauthlib 1.3.0 py_0 defaults
rich 10.3.0 pypi_0 pypi
rsa 4.7.2 pyhd3eb1b0_1 defaults
ruamel-yaml 0.16.5 pypi_0 pypi
ruamel.yaml.clib 0.2.2 py38h497a2fe_2 conda-forge
ruamel_yaml 0.15.80 py38h497a2fe_1003 conda-forge
s3fs 2021.7.0 pyhd3eb1b0_0 defaults
schema-salad 8.2.20210918131710 pypi_0 pypi
scikit-learn 0.24.2 py38ha9443f7_0 defaults
scipy 1.7.0 py38h7b17777_1 conda-forge
send2trash 1.5.0 py_0 conda-forge
setuptools 58.2.0 pypi_0 pypi
setuptools-scm 6.0.1 pypi_0 pypi
shellescape 3.8.1 pypi_0 pypi
shellingham 1.4.0 pypi_0 pypi
simpervisor 0.4 pypi_0 pypi
six 1.16.0 pypi_0 pypi
smmap 4.0.0 pypi_0 pypi
snappy 1.1.8 he6710b0_0 defaults
sortedcontainers 2.4.0 pyhd3eb1b0_0 defaults
soupsieve 2.2.1 pyhd3eb1b0_0 defaults
sqlalchemy 1.3.22 py38h497a2fe_1 conda-forge
sqlite 3.34.0 h74cdb3f_0 conda-forge
sysroot_linux-64 2.12 h77966d4_13 conda-forge
tabulate 0.8.9 pypi_0 pypi
tbb 2020.3 hfd86e86_0 defaults
tblib 1.7.0 pyhd3eb1b0_0 defaults
tenacity 7.0.0 pypi_0 pypi
tensorboard 2.4.0 pyhc547734_0 defaults
tensorboard-plugin-wit 1.6.0 py_0 defaults
tensorflow 2.4.1 mkl_py38hb2083e0_0 defaults
tensorflow-base 2.4.1 mkl_py38h43e0292_0 defaults
tensorflow-estimator 2.6.0 pyh7b7c402_0 defaults
termcolor 1.1.0 py38h06a4308_1 defaults
terminado 0.9.2 py38h578d9bd_0 conda-forge
testpath 0.4.4 py_0 conda-forge
textwrap3 0.9.2 pypi_0 pypi
threadpoolctl 2.2.0 pyh0d69192_0 defaults
tini 0.18.0 h14c3975_1001 conda-forge
tk 8.6.10 h21135ba_1 conda-forge
toml 0.10.2 pypi_0 pypi
toolz 0.11.1 pyhd3eb1b0_0 defaults
tornado 6.1 py38h497a2fe_1 conda-forge
tqdm 4.60.0 pypi_0 pypi
traitlets 5.0.5 py_0 conda-forge
typed-ast 1.4.2 pypi_0 pypi
typing-extensions 3.7.4.3 pypi_0 pypi
typing_extensions 3.10.0.2 pyh06a4308_0 defaults
urllib3 1.25.11 pypi_0 pypi
userpath 1.4.2 pypi_0 pypi
wcmatch 8.2 pypi_0 pypi
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
webob 1.8.7 pyhd3eb1b0_0 defaults
werkzeug 2.0.1 pyhd3eb1b0_0 defaults
wheel 0.36.2 pyhd3deb0d_0 conda-forge
wrapt 1.12.1 py38h7b6447c_1 defaults
xarray 0.19.0 pyhd3eb1b0_1 defaults
xhistogram 0.3.0 pyhd8ed1ab_0 conda-forge
xskillscore 0.0.23 pyhd8ed1ab_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yagup 0.1.1 pypi_0 pypi
yaml 0.2.5 h516909a_0 conda-forge
yarl 1.6.3 py38h27cfd23_0 defaults
zarr 2.8.1 pyhd3eb1b0_0 defaults
zeromq 4.3.3 h58526e2_3 conda-forge
zict 2.0.0 pyhd3eb1b0_0 defaults
zipp 3.4.0 py_0 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
zstd 1.4.9 haebb681_0 defaults
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Train ML model to correct predictions of week 3-4 & 5-6
This notebook creates a Machine Learning `ML_model` to predict weeks 3-4 & 5-6 based on `S2S` weeks 3-4 & 5-6 forecasts and compares it to `CPC` observations for the [`s2s-ai-challenge`](https://s2s-ai-challenge.github.io/).
%% Cell type:markdown id: tags:
# Synopsis
%% Cell type:markdown id: tags:
## Method: `mean bias reduction`
- calculate the mean bias from the 2000-2019 deterministic ensemble-mean hindcasts
- remove that mean bias from the 2020 deterministic ensemble-mean forecasts (see the sketch below)
- no Machine Learning used here
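A minimal sketch of this method (variable names here are illustrative; the actual cells follow below in this notebook):
``` python
import xarray as xr

hind = xr.open_zarr("../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
obs = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)
fct = xr.open_zarr("../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)

# climatological mean bias of the ensemble-mean hindcast, per week of the year
# (newer xarray prefers dt.isocalendar().week over the deprecated dt.weekofyear)
bias = (hind.mean("realization") - obs).groupby("forecast_time.weekofyear").mean()
# subtract the bias of the matching week from the 2020 ensemble-mean forecast
debiased = fct.mean("realization") - bias.sel(weekofyear=fct.forecast_time.dt.weekofyear)
```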
%% Cell type:markdown id: tags:
## Data used
type: renku datasets
Training-input for Machine Learning model:
- hindcasts of models:
  - ECMWF: `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
Forecast-input for Machine Learning model:
- real-time 2020 forecasts of models:
  - ECMWF: `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
Compare Machine Learning model forecast against ground truth:
- `CPC` observations:
  - `hindcast-like-observations_biweekly_deterministic.zarr`
  - `forecast-like-observations_2020_biweekly_deterministic.zarr`
%% Cell type:markdown id: tags:
## Resources used
for training, details in reproducibility
- platform: MPI-M supercomputer, 1 node
- memory: 64 GB
- processors: 36 CPU
- storage required: 10 GB
%% Cell type:markdown id: tags:
## Safeguards
All points have to be [x] checked. If not, your submission is invalid.
Changes to the code after submissions are not possible, as the `commit` before the `tag` will be reviewed.
(Only in exceptional cases, and if previous effort in reproducibility can be shown, may improvements to readability and reproducibility be allowed after November 1st 2021.)
%% Cell type:markdown id: tags:
### Safeguards to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting?wprov=sfti1)
If the organizers suspect overfitting, your contribution can be disqualified.
- [x] We didn't use 2020 observations in training (explicit overfitting and cheating)
- [x] We didn't repeatedly verify our model on 2020 observations and incrementally improve its RPSS (implicit overfitting)
- [x] We provide RPSS scores for the training period with script `skill_by_year`, see section 6.3 `predict`.
- [x] We tried our best to prevent [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)?wprov=sfti1).
- [x] We honor the `train-validate-test` [split principle](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets). This means that the hindcast data is split into `train` and `validate`, whereas `test` is withheld.
- [x] We did not use `test` explicitly in training or implicitly in incrementally adjusting parameters.
- [x] We considered [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).
%% Cell type:markdown id: tags:
### Safeguards for Reproducibility
Notebook/code must be independently reproducible from scratch by the organizers (after the competition), if not possible: no prize
- [x] All training data is publicly available (no pre-trained private neural networks, as they are not reproducible for us)
- [x] Code is well documented, readable and reproducible.
- [x] Code to reproduce training and predictions should preferably run within a day on the described architecture. If the training takes longer than a day, please justify why this is needed. Please do not submit training pipelines which take weeks to train.
%% Cell type:markdown id: tags:
# Imports
%% Cell type:code id: tags:
``` python
import xarray as xr
xr.set_options(display_style='text')
```
%% Output
<xarray.core.options.set_options at 0x7f05cc486340>
%% Cell type:markdown id: tags:
# Get training data
preprocessing of input data may be done in a separate notebook/script
%% Cell type:markdown id: tags:
## Hindcast
get weekly initialized hindcasts
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
hind_2000_2019 = xr.open_zarr("../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
fct_2020 = xr.open_zarr("../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:markdown id: tags:
## Observations
corresponding to hindcasts
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr("../data/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:markdown id: tags:
# no ML model
%% Cell type:markdown id: tags:
Here, we just remove the mean bias from the ensemble-mean forecast.
%% Cell type:code id: tags:
``` python
# add integer year/week coordinates, so the bias can be grouped by week of the year
from scripts import add_year_week_coords
obs_2000_2019 = add_year_week_coords(obs_2000_2019)
hind_2000_2019 = add_year_week_coords(hind_2000_2019)
```
%% Cell type:code id: tags:
``` python
# mean bias of the ensemble-mean hindcast relative to observations, per week of the year
bias_2000_2019 = (hind_2000_2019.mean('realization') - obs_2000_2019).groupby('week').mean().compute()
```
%% Output
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
%% Cell type:markdown id: tags:
## `predict`
Create predictions and print `mean(variable, lead_time, longitude, weighted latitude)` RPSS for all years as calculated by `skill_by_year`.
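As a rough guide to the metric (assuming `skill_by_year` follows the standard definition used in this challenge): RPSS = 1 - RPS(forecast) / RPS(climatology), where the climatological reference assigns probability 1/3 to each tercile category, so positive RPSS beats climatology and negative RPSS is worse than climatology.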
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
tercile_file = '../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
tercile_edges = xr.open_dataset(tercile_file)
```
%% Cell type:code id: tags:
``` python
def create_predictions(fct, bias):
    # remove the weekly mean bias, then convert to tercile probabilities
    if 'week' not in fct.coords:
        fct = add_year_week_coords(fct)
    preds = fct - bias.sel(week=fct.week)
    preds = make_probabilistic(preds, tercile_edges)
    return preds.astype('float32')
```
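%% Cell type:markdown id: tags:
`make_probabilistic` comes from `scripts`; a minimal sketch of what such a conversion typically does (assumed behavior, including the `quantile` dimension holding the 1/3 and 2/3 edges) is:
%% Cell type:code id: tags:
``` python
# Sketch only: tercile probabilities as the fraction of ensemble members
# below, between and above the climatological tercile edges.
def make_probabilistic_sketch(fct, edges):
    below = (fct < edges.isel(quantile=0)).mean('realization')
    above = (fct >= edges.isel(quantile=1)).mean('realization')
    normal = 1 - below - above  # remaining probability mass
    return xr.concat([below, normal, above], 'category').assign_coords(
        category=['below normal', 'near normal', 'above normal'])
```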
%% Cell type:markdown id: tags:
### `predict` training period in-sample
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
preds_is = create_predictions(hind_2000_2019, bias_2000_2019).compute()
```
%% Cell type:code id: tags:
``` python
from scripts import skill_by_year
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_is)
```
%% Output
RPSS
year
2000 -0.141857
2001 -0.203405
2002 -0.202549
2003 -0.206234
2004 -0.549463
2005 -0.168421
2006 -0.184515
2007 -0.616939
2008 -0.195251
2009 -0.202809
2010 -0.189126
2011 -0.678302
2012 -0.620137
2013 -0.202285
2014 -0.206982
2015 -0.172498
2016 -0.136464
2017 -0.638293
2018 -0.667205
2019 -0.180896
%% Cell type:markdown id: tags:
### `predict` test
%% Cell type:code id: tags:
``` python
preds_test = create_predictions(fct_2020, bias_2000_2019)
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_test)
```
%% Output
RPSS
year
2020 -0.093422
%% Cell type:markdown id: tags:
# Submission
%% Cell type:code id: tags:
``` python
from scripts import assert_predictions_2020
assert_predictions_2020(preds_test)
```
%% Cell type:code id: tags:
``` python
preds_test.attrs = {'author': 'Aaron Spring', 'author_email': 'aaron.spring@mpimet.mpg.de',
                    'comment': 'created for the s2s-ai-challenge as a template for the website',
                    'notebook': 'mean_bias_reduction.ipynb',
                    'website': 'https://s2s-ai-challenge.github.io/#evaluation'}
html_repr = xr.core.formatting_html.dataset_repr(preds_test)
with open('submission_template_repr.html', 'w') as myFile:
    myFile.write(html_repr)
```
%% Cell type:code id: tags:
``` python
preds_test.to_netcdf('../submissions/ML_prediction_2020.nc')
```
%% Cell type:code id: tags:
``` python
# !git add ../submissions/ML_prediction_2020.nc
# !git add mean_bias_reduction.ipynb
```
%% Cell type:code id: tags:
``` python
#!git commit -m "template_test no ML mean bias reduction" # whatever message you want
```
%% Cell type:code id: tags:
``` python
#!git tag "submission-no_ML_mean_bias_reduction-0.0.2" # if this is to be checked by scorer, only the last submitted==tagged version will be considered
```
%% Cell type:code id: tags:
``` python
#!git push --tags
```
%% Cell type:markdown id: tags:
# Reproducibility
%% Cell type:markdown id: tags:
## memory
%% Cell type:code id: tags:
``` python
# https://phoenixnap.com/kb/linux-commands-check-memory-usage
!free -g
```
%% Output
total used free shared buffers cached
Mem: 62 15 46 0 0 5
-/+ buffers/cache: 10 52
Swap: 0 0 0
%% Cell type:markdown id: tags:
## CPU
%% Cell type:code id: tags:
``` python
!lscpu
```
%% Output
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
Stepping: 1
CPU MHz: 2100.000
BogoMIPS: 4190.01
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-17,36-53
NUMA node1 CPU(s): 18-35,54-71
%% Cell type:markdown id: tags:
## software
%% Cell type:code id: tags:
``` python
!conda list
```
%% Output
# packages in environment at /work/mh0727/m300524/conda-envs/s2s-ai:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_tflow_select 2.3.0 mkl
absl-py 0.12.0 py37h06a4308_0
aiobotocore 1.2.2 pyhd3eb1b0_0
aiohttp 3.7.4 py37h27cfd23_1
aioitertools 0.7.1 pyhd3eb1b0_0
anyio 2.2.0 pypi_0 pypi
appdirs 1.4.4 py_0
argcomplete 1.12.2 pypi_0 pypi
argon2-cffi 20.1.0 py37h27cfd23_1
asciitree 0.3.3 py_2
astunparse 1.6.3 py_0
async-timeout 3.0.1 py37h06a4308_0
async_generator 1.10 py37h28b3542_0
attrs 20.2.0 pypi_0 pypi
babel 2.9.0 pypi_0 pypi
backcall 0.2.0 pyhd3eb1b0_0
backrefs 5.0.1 pypi_0 pypi
bagit 1.8.1 pypi_0 pypi
beautifulsoup4 4.9.3 pyha847dfd_0
black 20.8b1 pypi_0 pypi
blas 1.0 mkl
bleach 3.3.0 pyhd3eb1b0_0
blinker 1.4 py37h06a4308_0
bokeh 2.3.0 py37h06a4308_0
botocore 1.20.33 pyhd3eb1b0_1
bottleneck 1.3.2 py37heb32a55_1
bracex 2.1.1 pypi_0 pypi
branca 0.3.1 pypi_0 pypi
brotlipy 0.7.0 py37h27cfd23_1003
bzip2 1.0.8 h7b6447c_0
c-ares 1.17.1 h27cfd23_0
ca-certificates 2021.1.19 h06a4308_1
cachecontrol 0.11.7 pypi_0 pypi
cachetools 4.2.1 pyhd3eb1b0_0
calamus 0.3.7 pypi_0 pypi
cdsapi 0.5.1 pypi_0 pypi
certifi 2020.12.5 py37h06a4308_0
cffi 1.14.5 py37h261ae71_0
cfgrib 0.9.8.5 pyhd8ed1ab_0 conda-forge
cftime 1.4.1 py37h6323ea4_0
chardet 3.0.4 py37h06a4308_1003
click 7.1.2 pyhd3eb1b0_0
click-completion 0.5.2 pypi_0 pypi
click-plugins 1.1.1 pypi_0 pypi
climetlab 0.8.0 pypi_0 pypi
climetlab-s2s-ai-challenge 0.6.7 pypi_0 pypi
climetlab-s2s-ai-competition 0.3.7 pypi_0 pypi
cloudpickle 1.6.0 py_0
colorama 0.4.4 pypi_0 pypi
coloredlogs 15.0 pypi_0 pypi
commonmark 0.9.1 pypi_0 pypi
configargparse 1.4 pypi_0 pypi
coverage 5.5 py37h27cfd23_2
cryptography 3.4.6 py37hd23ed53_0
curl 7.71.1 hbc83047_1
cwlgen 0.4.2 pypi_0 pypi
cwltool 3.0.20210319143721 pypi_0 pypi
cycler 0.10.0 py37_0
cython 0.29.22 py37h2531618_0
cytoolz 0.11.0 py37h7b6447c_0
dask 2021.3.0 pypi_0 pypi
dask-labextension 5.0.1 pypi_0 pypi
dbus 1.13.18 hb2f20db_0
decorator 4.4.2 pyhd3eb1b0_0
defusedxml 0.7.1 pyhd3eb1b0_0
distributed 2021.3.0 py37h06a4308_0
docopt 0.6.2 py37h06a4308_0
eccodes 1.2.0 pypi_0 pypi
ecmwf-api-client 1.6.1 pypi_0 pypi
ecmwflibs 0.2.3 pypi_0 pypi
entrypoints 0.3 py37_0
environ-config 20.1.0 pypi_0 pypi
expat 2.2.10 he6710b0_2
fasteners 0.16 pyhd3eb1b0_0
fastprogress 1.0.0 py_0 conda-forge
filelock 3.0.12 pypi_0 pypi
folium 0.12.1 pypi_0 pypi
fontconfig 2.13.1 h6c09931_0
freetype 2.10.4 h5ab3b9f_0
frozendict 1.2 pypi_0 pypi
fsspec 0.8.7 pyhd3eb1b0_0
gast 0.4.0 py_0
gitdb 4.0.6 pypi_0 pypi
gitpython 3.1.12 pypi_0 pypi
glib 2.67.4 h36276a3_1
google-auth 1.28.0 pyhd3eb1b0_0
google-auth-oauthlib 0.4.3 pyhd3eb1b0_0
google-pasta 0.2.0 py_0
grpcio 1.36.1 py37h2157cd5_1
gst-plugins-base 1.14.0 h8213a91_2
gstreamer 1.14.0 h28cd5cc_2
h5netcdf 0.10.0 pyhd8ed1ab_0 conda-forge
h5py 2.10.0 py37h7918eee_0
hdf4 4.2.13 h3ca952b_2
hdf5 1.10.4 hb1b8bf9_0
heapdict 1.0.1 py_0
humanfriendly 9.1 pypi_0 pypi
humanize 2.6.0 pypi_0 pypi
icu 58.2 he6710b0_3
idna 2.10 pyhd3eb1b0_0
importlib-metadata 3.7.3 py37h06a4308_1
importlib_metadata 3.7.3 hd3eb1b0_1
intake 0.6.2 pyhd3eb1b0_0
intake-esm 2020.8.15 py_0 conda-forge
intake-xarray 0.5.0 pyhd3eb1b0_0
intel-openmp 2020.2 254
ipykernel 5.3.4 py37h5ca1d4c_0
ipython 7.21.0 py37hb070fc8_0
ipython_genutils 0.2.0 py_1 conda-forge
isodate 0.6.0 pypi_0 pypi
jasper 1.900.1 hd497a04_4
jedi 0.17.2 py37h06a4308_1
jinja2 2.11.3 pyhd3eb1b0_0
jmespath 0.10.0 py_0
joblib 1.0.1 pyhd3eb1b0_0
jpeg 9d h36c2ea0_0 conda-forge
json5 0.9.5 pypi_0 pypi
jsonschema 3.2.0 py_2
jupyter-packaging 0.7.12 pypi_0 pypi
jupyter-server 1.5.1 pypi_0 pypi
jupyter-server-proxy 3.0.2 pypi_0 pypi
jupyter_client 6.1.12 pyhd8ed1ab_0 conda-forge
jupyter_core 4.7.1 py37h89c1867_0 conda-forge
jupyterlab 3.0.12 pypi_0 pypi
jupyterlab-server 2.3.0 pypi_0 pypi
jupyterlab_pygments 0.1.2 py_0
keras-preprocessing 1.1.2 pyhd3eb1b0_0
kiwisolver 1.3.1 py37h2531618_0
krb5 1.18.2 h173b8e3_0
lazy-object-proxy 1.6.0 pypi_0 pypi
lcms2 2.11 h396b838_0
ld_impl_linux-64 2.33.1 h53a641e_7
libaec 1.0.4 he6710b0_1
libcurl 7.71.1 h20c2e04_1
libedit 3.1.20210216 h27cfd23_1
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libllvm10 10.0.1 hbcb73fb_5
libnetcdf 4.6.2 hbdf4f91_1001 conda-forge
libpng 1.6.37 hbc83047_0
libprotobuf 3.14.0 h8c45485_0
libsodium 1.0.18 h36c2ea0_1 conda-forge
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.2.0 h85742a9_0
libuuid 1.0.3 h1bed415_2
libwebp-base 1.2.0 h27cfd23_0
libxcb 1.14 h7b6447c_0
libxml2 2.9.10 hb55368b_3
llvmlite 0.36.0 py37h612dafd_4
locket 0.2.1 py37h06a4308_1
lockfile 0.12.2 pypi_0 pypi
lxml 4.6.3 pypi_0 pypi
lz4-c 1.9.3 h2531618_0
magics 1.5.6 pypi_0 pypi
markdown 3.3.4 py37h06a4308_0
markupsafe 1.1.1 py37h14c3975_1
marshmallow 3.10.0 pypi_0 pypi
matplotlib 3.3.4 py37h06a4308_0
matplotlib-base 3.3.4 py37h62a2d02_0
mistune 0.8.4 py37h14c3975_1001
mkl 2020.2 256
mkl-service 2.3.0 py37he8ac12f_0
mkl_fft 1.3.0 py37h54f3939_0
mkl_random 1.1.1 py37h0573a6f_0
monotonic 1.5 py_0
msgpack-python 1.0.2 py37hff7bd54_1
multidict 5.1.0 py37h27cfd23_2
mypy-extensions 0.4.3 pypi_0 pypi
nb-black 1.0.7 pypi_0 pypi
nb_conda_kernels 2.3.1 py37h06a4308_0
nbclassic 0.2.6 pypi_0 pypi
nbclient 0.5.3 pyhd3eb1b0_0
nbconvert 6.0.7 py37_0
nbformat 5.1.2 pyhd3eb1b0_1
ncurses 6.2 he6710b0_1
ndg-httpsclient 0.5.1 pypi_0 pypi
nest-asyncio 1.5.1 pyhd3eb1b0_0
netcdf4 1.5.1 py37had58050_0 conda-forge
networkx 2.5 pypi_0 pypi
notebook 6.3.0 py37h06a4308_0
numba 0.53.0 py37ha9443f7_0
numcodecs 0.7.3 py37h2531618_0
numpy 1.19.2 py37h54aff64_0
numpy-base 1.19.2 py37hfa32c7d_0
oauthlib 3.1.0 py_0
olefile 0.46 py37_0
openssl 1.1.1k h27cfd23_0
opt_einsum 3.1.0 py_0
owlrl 5.2.1 pypi_0 pypi
packaging 20.9 pyhd3eb1b0_0
pandas 1.2.3 py37ha9443f7_0
pandoc 2.12 h06a4308_0
pandocfilters 1.4.3 py37h06a4308_1
parso 0.7.0 py_0
partd 1.1.0 py_0
pathspec 0.8.0 pypi_0 pypi
patool 1.12 pypi_0 pypi
pcre 8.44 he6710b0_0
pdbufr 0.8.2 pypi_0 pypi
pexpect 4.8.0 pyhd3eb1b0_3
pickleshare 0.7.5 pyhd3eb1b0_1003
pillow 8.1.2 py37he98fc37_0
pip 21.0.1 py37h06a4308_0
pluggy 0.13.1 pypi_0 pypi
portalocker 2.2.1 pypi_0 pypi
prometheus_client 0.9.0 pyhd3eb1b0_0
prompt-toolkit 3.0.17 pyh06a4308_0
properscoring 0.1 py_0 conda-forge
protobuf 3.14.0 py37h2531618_1
prov 1.5.1 pypi_0 pypi
psutil 5.7.2 pypi_0 pypi
ptyprocess 0.7.0 pyhd3eb1b0_2
pyasn1 0.4.8 py_0
pyasn1-modules 0.2.8 py_0
pycparser 2.20 py_2
pydap 3.2.2 pyh9f0ad1d_1001 conda-forge
pydot 1.4.2 pypi_0 pypi
pygments 2.8.1 pyhd3eb1b0_0
pyjwt 2.0.0 pypi_0 pypi
pyld 2.0.3 pypi_0 pypi
pyodc 1.0.3 pypi_0 pypi
pyopenssl 19.1.0 pypi_0 pypi
pyparsing 2.4.7 pyhd3eb1b0_0
pyqt 5.9.2 py37h05f1152_2
pyrsistent 0.17.3 py37h7b6447c_0
pyshacl 0.11.3.post1 pypi_0 pypi
pysocks 1.7.1 py37_1
python 3.7.10 hdb3f193_0
python-dateutil 2.8.1 pyhd3eb1b0_0
python-editor 1.0.4 pypi_0 pypi
python-flatbuffers 1.12 pyhd3eb1b0_0
python_abi 3.7 1_cp37m conda-forge
pytz 2021.1 pyhd3eb1b0_0
pyyaml 5.3.1 pypi_0 pypi
pyzmq 19.0.2 py37hac76be4_2 conda-forge
qt 5.9.7 h5867ecd_1
rdflib 5.0.0 pypi_0 pypi
rdflib-jsonld 0.5.0 pypi_0 pypi
readline 8.1 h27cfd23_0
rechunker 0.3.3 pypi_0 pypi
regex 2021.3.17 pypi_0 pypi
renku 0.14.1 pypi_0 pypi
requests 2.24.0 pypi_0 pypi
requests-oauthlib 1.3.0 py_0
rich 9.3.0 pypi_0 pypi
rsa 4.7.2 pyhd3eb1b0_1
ruamel-yaml 0.16.5 pypi_0 pypi
ruamel-yaml-clib 0.2.2 pypi_0 pypi
s3fs 0.5.2 pyhd3eb1b0_0
schema-salad 7.1.20210316164414 pypi_0 pypi
scikit-learn 0.24.1 py37ha9443f7_0
scipy 1.6.1 py37h91f5cce_0
send2trash 1.5.0 pyhd3eb1b0_1
setuptools 52.0.0 py37h06a4308_0
setuptools-scm 4.1.2 pypi_0 pypi
shellescape 3.4.1 pypi_0 pypi
shellingham 1.4.0 pypi_0 pypi
simpervisor 0.4 pypi_0 pypi
sip 4.19.8 py37hf484d3e_0
six 1.15.0 py37h06a4308_0
smmap 3.0.5 pypi_0 pypi
sniffio 1.2.0 pypi_0 pypi
sortedcontainers 2.3.0 pyhd3eb1b0_0
soupsieve 2.2.1 pyhd3eb1b0_0
sqlite 3.35.2 hdfb4753_0
tabulate 0.8.7 pypi_0 pypi
tbb 2020.3 hfd86e86_0
tblib 1.7.0 py_0
tensorboard 2.4.0 pyhc547734_0
tensorboard-plugin-wit 1.6.0 py_0
tensorflow 2.4.1 mkl_py37h2d14ff2_0
tensorflow-base 2.4.1 mkl_py37h43e0292_0
tensorflow-estimator 2.4.1 pyheb71bc4_0
termcolor 1.1.0 py37h06a4308_1
terminado 0.9.3 py37h06a4308_0
testpath 0.4.4 pyhd3eb1b0_0
threadpoolctl 2.1.0 pyh5ca1d4c_0
tk 8.6.10 hbc83047_0
toml 0.10.2 pypi_0 pypi
toolz 0.11.1 pyhd3eb1b0_0
tornado 6.1 py37h27cfd23_0
tqdm 4.48.2 pypi_0 pypi
traitlets 5.0.5 py_0 conda-forge
typed-ast 1.4.2 pypi_0 pypi
typing-extensions 3.7.4.3 hd3eb1b0_0
typing_extensions 3.7.4.3 pyh06a4308_0
urllib3 1.25.11 pypi_0 pypi
wcmatch 6.1 pypi_0 pypi
wcwidth 0.2.5 py_0
webencodings 0.5.1 py37_1
webob 1.8.7 pyhd3eb1b0_0
werkzeug 1.0.1 pyhd3eb1b0_0
wheel 0.36.2 pyhd3eb1b0_0
wrapt 1.12.1 py37h7b6447c_1
xarray 0.17.0 pyhd3eb1b0_0
xhistogram 0.1.2 pyhd8ed1ab_0 conda-forge
xskillscore 0.0.20 pypi_0 pypi
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
yarl 1.6.3 py37h27cfd23_0
zarr 2.6.1 pyhd3eb1b0_0
zeromq 4.3.4 h2531618_0
zict 2.0.0 pyhd3eb1b0_0
zipp 3.4.1 pyhd3eb1b0_0
zlib 1.2.11 h7b6447c_3
zstd 1.4.5 h9ceee32_0
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Create biweekly renku datasets from `climetlab-s2s-ai-challenge`
%% Cell type:markdown id: tags:
Goal:
- Create biweekly renku datasets from [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge).
- These renku datasets are then used in notebooks:
  - `ML_train_and_predict.ipynb` to train the ML model and do ML-based predictions
  - `RPSS_verification.ipynb` to calculate RPSS of the ML model
  - `mean_bias_reduction.ipynb` to remove the mean bias
Requirements:
- [`climetlab`](https://github.com/ecmwf/climetlab)
- [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge)
- S2S and CPC observations uploaded on [European Weather Cloud (EWC)](https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/0.3.0/netcdf/index.html)
Output: [renku dataset](https://renku-python.readthedocs.io/en/latest/commands.html#module-renku.cli.dataset) `s2s-ai-challenge`
- observations
  - deterministic:
    - `hindcast-like-observations_2000-2019_biweekly_deterministic.zarr`
    - `forecast-like-observations_2020_biweekly_deterministic.zarr`
  - edges:
    - `hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc`
  - probabilistic:
    - `hindcast-like-observations_2000-2019_biweekly_terciled.zarr`
    - `forecast-like-observations_2020_biweekly_terciled.nc`
- forecasts/hindcasts
  - deterministic:
    - `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
    - `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
    - more models could be added
- benchmark:
  - probabilistic:
    - `ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc`
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
import xarray as xr
import xskillscore as xs
import pandas as pd
import climetlab_s2s_ai_challenge
import climetlab as cml
print(f'Climetlab version : {cml.__version__}')
print(f'Climetlab-s2s-ai-challenge plugin version : {climetlab_s2s_ai_challenge.__version__}')
xr.set_options(keep_attrs=True)
xr.set_options(display_style='text')
```
%% Output
WARNING: ecmwflibs universal: cannot find a library called MagPlus
Magics library could not be found
Climetlab version : 0.8.6
Climetlab-s2s-ai-challenge plugin version : 0.8.0
<xarray.core.options.set_options at 0x2b51c148f590>
%% Cell type:code id: tags:
``` python
# caching path for climetlab
cache_path = "/work/mh0727/m300524/S2S_AI/cache4" # set your own path
cml.settings.set("cache-directory", cache_path)
```
%% Cell type:code id: tags:
``` python
cache_path = "../data"
```
%% Cell type:markdown id: tags:
# Download and cache
Download all files for the observations, forecast and hindcast.
%% Cell type:code id: tags:
``` python
# shortcut
from scripts import download
#download()
```
%% Cell type:markdown id: tags:
## hindcast and forecast `input`
%% Cell type:code id: tags:
``` python
# starting dates forecast_time in 2020
dates = xr.cftime_range(start='20200102',freq='7D', periods=53).strftime('%Y%m%d').to_list()
forecast_dataset_labels = ['training-input','test-input'] # ML community
# equiv to
forecast_dataset_labels = ['hindcast-input','forecast-input'] # NWP community
varlist_forecast = ['tp','t2m'] # can add more
center_list = ['ecmwf'] # 'ncep', 'eccc'
```
%% Cell type:code id: tags:
``` python
%%time
# takes ~10-30 min to download for one model and one variable, depending on the number of model realizations
# and download settings https://climetlab.readthedocs.io/en/latest/guide/settings.html
for center in center_list:
    for ds in forecast_dataset_labels:
        cml.load_dataset(f"s2s-ai-challenge-{ds}", origin=center, parameter=varlist_forecast, format='netcdf').to_xarray()
```
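%% Cell type:markdown id: tags:
Download speed depends on the climetlab settings linked above. A small sketch of tuning them before the loop, assuming the `number-of-download-threads` setting documented for climetlab (verify against your installed version):
%% Cell type:code id: tags:
``` python
# sketch only: the settings key is an assumption, check the climetlab settings docs
import climetlab as cml

cml.settings.set("number-of-download-threads", 5)  # parallelize HTTP downloads
print(cml.settings.get("number-of-download-threads"))
```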
%% Cell type:markdown id: tags:
## observations `output-reference`
%% Cell type:code id: tags:
``` python
obs_dataset_labels = ['training-output-reference','test-output-reference'] # ML community
# equiv to
obs_dataset_labels = ['hindcast-like-observations','forecast-like-observations'] # NWP community
varlist_obs = ['tp', 't2m']
```
%% Cell type:code id: tags:
``` python
%%time
# takes ~10 min to download
for ds in obs_dataset_labels:
    print(ds)
    # only netcdf, no format choice
    cml.load_dataset(f"s2s-ai-challenge-{ds}", date=dates, parameter=varlist_obs).to_xarray()
```
%% Cell type:code id: tags:
``` python
# download obs_time to create output-reference/observations for models other than ecmwf and eccc,
# i.e. ncep or any S2S or SubX model
obs_time = cml.load_dataset(f"s2s-ai-challenge-observations", parameter=['t2m', 'pr']).to_xarray()
```
%% Cell type:markdown id: tags:
# create bi-weekly aggregates
%% Cell type:code id: tags:
``` python
from scripts import aggregate_biweekly, ensure_attributes
#aggregate_biweekly??
```
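%% Cell type:markdown id: tags:
`aggregate_biweekly` is imported from `scripts` and not shown here. As a rough sketch of the semantics documented in the `lead_time` attributes further below (not the actual implementation): the week 3-4 aggregate starts at day 14, t2m is averaged over days 14-27, and tp (accumulated since initialization) is day 28 minus day 14; all names here are illustrative only.
%% Cell type:code id: tags:
``` python
# sketch only: week 3-4 biweekly aggregation semantics, not scripts.aggregate_biweekly
import pandas as pd
import xarray as xr

def biweekly_week34_sketch(ds_daily):
    """ds_daily: daily fields with a 'lead_time' timedelta dimension."""
    days = [pd.Timedelta(f'{d} d') for d in range(14, 28)]
    out = xr.Dataset()
    if 't2m' in ds_daily:
        # temperature: mean over days 14..27
        out['t2m'] = ds_daily['t2m'].sel(lead_time=days).mean('lead_time')
    if 'tp' in ds_daily:
        # precipitation is accumulated since initialization: day 28 minus day 14
        out['tp'] = ds_daily['tp'].sel(lead_time=pd.Timedelta('28 d')) - ds_daily['tp'].sel(lead_time=pd.Timedelta('14 d'))
    # label the aggregate with the first day of its window
    return out.assign_coords(lead_time=pd.Timedelta('14 d'))
```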
%% Cell type:code id: tags:
``` python
for c, center in enumerate(center_list): # forecast centers (could also take models)
    for dsl in obs_dataset_labels: # + forecast_dataset_labels: # climetlab dataset labels
        for p, parameter in enumerate(varlist_forecast): # variables
            if c != 0 and 'observation' in dsl: # only do once for observations
                continue
            print(f"datasetlabel: {dsl}, center: {center}, parameter: {parameter}")
            if 'input' in dsl:
                ds = cml.load_dataset(f"s2s-ai-challenge-{dsl}", origin=center, parameter=parameter, format='netcdf').to_xarray()
            elif 'observation' in dsl: # obs only netcdf, no choice
                if parameter not in ['t2m', 'tp']:
                    continue
                ds = cml.load_dataset(f"s2s-ai-challenge-{dsl}", parameter=parameter, date=dates).to_xarray()
            if p == 0:
                ds_biweekly = ds.map(aggregate_biweekly)
            else:
                ds_biweekly[parameter] = ds.map(aggregate_biweekly)[parameter]
        ds_biweekly = ds_biweekly.map(ensure_attributes, biweekly=True)
        ds_biweekly = ds_biweekly.sortby('forecast_time')
        if 'test' in dsl:
            ds_biweekly = ds_biweekly.chunk('auto')
        else:
            ds_biweekly = ds_biweekly.chunk({'forecast_time':'auto','lead_time':-1,'longitude':-1,'latitude':-1})
        if 'hindcast' in dsl:
            time = f'{int(ds_biweekly.forecast_time.dt.year.min())}-{int(ds_biweekly.forecast_time.dt.year.max())}'
            if 'input' in dsl:
                name = f'{center}_{dsl}'
            elif 'observations' in dsl:
                name = dsl
        elif 'forecast' in dsl:
            time = '2020'
            if 'input' in dsl:
                name = f'{center}_{dsl}'
            elif 'observations' in dsl:
                name = dsl
        else:
            assert False
        # pattern: {model_if_not_observations}{observations/forecast/hindcast}_{time}_biweekly_deterministic.zarr
        zp = f'{cache_path}/{name}_{time}_biweekly_deterministic.zarr'
        ds_biweekly.attrs.update({'postprocessed_by':'https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/renku_datasets_biweekly.ipynb'})
        print(f'save to: {zp}')
        ds_biweekly.astype('float32').to_zarr(zp, consolidated=True, mode='w')
```
%% Output
datasetlabel: hindcast-like-observations, center: ecmwf, parameter: tp
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data. This dataset has been dowloaded from IRIDL. By downloading this data you also agree to the terms and conditions defined at https://iridl.ldeo.columbia.edu.
WARNING: ecmwflibs universal: found eccodes at /work/mh0727/m300524/conda-envs/s2s-ai/lib/libeccodes.so
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.18.0
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing with an out-of-order index is generating 20 times more chunks
return self.array[key]
datasetlabel: hindcast-like-observations, center: ecmwf, parameter: t2m
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
... array[indexer]
To avoid creating the large chunks, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
... array[indexer]
return self.array[key]
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
... array[indexer]
To avoid creating the large chunks, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
... array[indexer]
return self.array[key]
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing with an out-of-order index is generating 20 times more chunks
return self.array[key]
save to: ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
datasetlabel: forecast-like-observations, center: ecmwf, parameter: tp
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data. This dataset has been dowloaded from IRIDL. By downloading this data you also agree to the terms and conditions defined at https://iridl.ldeo.columbia.edu.
datasetlabel: forecast-like-observations, center: ecmwf, parameter: t2m
save to: ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
%% Cell type:markdown id: tags:
## add to `renku` dataset `s2s-ai-challenge`
%% Cell type:code id: tags:
``` python
# observations as hindcast
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f"{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)
print(obs_2000_2019.sizes,'\n',obs_2000_2019.coords,'\n', obs_2000_2019.nbytes/1e6,'MB')
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 1060, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
492.546744 MB
%% Cell type:code id: tags:
``` python
# observations as forecast
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr(f"{cache_path}/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)
print(obs_2020.sizes,'\n',obs_2020.coords,'\n', obs_2020.nbytes/1e6,'MB')
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 53, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 53), meta=np.ndarray>
24.630096 MB
%% Cell type:code id: tags:
``` python
# ecmwf hindcast-input
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
hind_2000_2019 = xr.open_zarr(f"{cache_path}/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
print(hind_2000_2019.sizes,'\n',hind_2000_2019.coords,'\n', hind_2000_2019.nbytes/1e6,'MB')
```
%% Cell type:code id: tags:
``` python
# ecmwf forecast-input
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
fct_2020 = xr.open_zarr(f"{cache_path}/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
print(fct_2020.sizes,'\n',fct_2020.coords,'\n', fct_2020.nbytes/1e6,'MB')
```
%% Cell type:markdown id: tags:
# tercile edges
Create 2 tercile edges at the 1/3 and 2/3 quantiles of the 2000-2019 biweekly distribution for each week of the year.
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr', consolidated=True)
```
%% Cell type:code id: tags:
``` python
from scripts import add_year_week_coords
```
%% Cell type:code id: tags:
``` python
# add week for groupby, see https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge/-/issues/29
obs_2000_2019 = add_year_week_coords(obs_2000_2019)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2000_2019
```
%% Output
<xarray.Dataset>
Dimensions: (forecast_time: 1060, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
week (forecast_time) int64 1 2 3 4 5 6 7 ... 47 48 49 50 51 52 53
year (forecast_time) int64 2000 2000 2000 2000 ... 2019 2019 2019
Data variables:
t2m (lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(2, 530, 121, 240), meta=np.ndarray>
tp (lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(2, 530, 121, 240), meta=np.ndarray>
Attributes:
created_by_script: tools/observations/makefile
created_by_software: climetlab-s2s-ai-challenge
function: climetlab_s2s_ai_challenge.extra.forecast_like_obse...
postprocessed_by: https://renkulab.io/gitlab/aaron.spring/s2s-ai-chal...
regrid_method: conservative
source_dataset_name: NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
%% Cell type:code id: tags:
``` python
tercile_file = f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
```
%% Cell type:code id: tags:
``` python
%%time
obs_2000_2019.chunk({'forecast_time':-1,'longitude':'auto'}).groupby('week').quantile(q=[1./3.,2./3.], dim='forecast_time').rename({'quantile':'category_edge'}).astype('float32').to_netcdf(tercile_file)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/numpy/lib/nanfunctions.py:1390: RuntimeWarning: All-NaN slice encountered
  overwrite_input, interpolation)
CPU times: user 19min 35s, sys: 8min 33s, total: 28min 9s
Wall time: 16min 44s
%% Cell type:code id: tags:
``` python
tercile_edges = xr.open_dataset(tercile_file)
tercile_edges
```
%% Output
<xarray.Dataset>
Dimensions: (category_edge: 2, latitude: 121, lead_time: 2, longitude: 240, week: 53)
Coordinates:
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* category_edge (category_edge) float64 0.3333 0.6667
* week (week) int64 1 2 3 4 5 6 7 8 9 ... 45 46 47 48 49 50 51 52 53
Data variables:
t2m (week, category_edge, lead_time, latitude, longitude) float32 ...
tp (week, category_edge, lead_time, latitude, longitude) float32 ...
Attributes:
created_by_script: tools/observations/makefile
created_by_software: climetlab-s2s-ai-challenge
function: climetlab_s2s_ai_challenge.extra.forecast_like_obse...
postprocessed_by: https://renkulab.io/gitlab/aaron.spring/s2s-ai-chal...
regrid_method: conservative
source_dataset_name: NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
%% Cell type:code id: tags:
``` python
tercile_edges.nbytes*1e-6,'MB'
```
%% Output
(49.255184, 'MB')
%% Cell type:code id: tags:
``` python
# run renku commands from the project's root directory only
# tercile edges
#!renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
#xr.open_dataset("../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc")
```
%% Cell type:markdown id: tags:
# observations in categories
- counting how many deterministic forecast realizations fall into each category, like counting the RPS
- categorize forecast-like-observations 2020 into categories (see the sketch below)
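%% Cell type:markdown id: tags:
The categorization itself is done by `scripts.make_probabilistic` (shown in the `scripts` diff at the end of this page). A minimal sketch of the idea for a single deterministic field, with illustrative names only:
%% Cell type:code id: tags:
``` python
# sketch only: one-hot tercile categories from two edges, not scripts.make_probabilistic
import xarray as xr

def categorize_sketch(ds, edges):
    """ds: deterministic field; edges: tercile edges with a 'category_edge' dim of size 2."""
    lower = edges.isel(category_edge=0, drop=True)
    upper = edges.isel(category_edge=1, drop=True)
    bn = ds < lower                    # below normal
    n = (ds >= lower) & (ds < upper)   # near normal
    an = ds >= upper                   # above normal
    # stack into probabilities that add up to 1; for observations these are one-hot
    return xr.concat([bn, n, an], 'category').assign_coords(
        category=['below normal', 'near normal', 'above normal']).astype('float')
```
For ensemble forecasts, the same comparison is applied per `realization` and averaged over members, so each category holds the fraction of members falling into it.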
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr(f'{cache_path}/forecast-like-observations_2020_biweekly_deterministic.zarr', consolidated=True)
obs_2020.sizes
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 53, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
%% Cell type:code id: tags:
``` python
# create a mask for land grid cells
mask = obs_2020.std(['lead_time','forecast_time']).notnull()
```
%% Cell type:code id: tags:
``` python
# mask.to_array().plot(col='variable')
```
%% Cell type:code id: tags:
``` python
# total precipitation in arid regions is masked
# Frederic Vitart suggested by email: "Based on your map we could mask all the areas where the lower tercile boundary is lower than 0.1 mm"
# we are using a dry mask as in https://doi.org/10.1175/MWR-D-17-0092.1
th = 0.01
tp_arid_mask = tercile_edges.tp.isel(category_edge=0, lead_time=0, drop=True) > th
#tp_arid_mask.where(mask.tp).plot(col='forecast_time', col_wrap=4)
#plt.suptitle(f'dry mask: week 3-4 tp 1/3 category_edge > {th} kg m-2',y=1., x=.4)
#plt.savefig('dry_mask.png')
```
%% Cell type:code id: tags:
``` python
# look into tercile edges
# tercile_edges.isel(forecast_time=0)['tp'].plot(col='lead_time',row='category_edge', robust=True)
# tercile_edges.isel(forecast_time=[0,20],category_edge=1)['tp'].plot(col='lead_time', row='forecast_time', robust=True)
```
%% Cell type:code id: tags:
``` python
# tercile_edges.tp.mean(['forecast_time']).plot(col='lead_time',row='category_edge',vmax=.5)
```
%% Cell type:markdown id: tags:
## categorize observations
%% Cell type:markdown id: tags:
### forecast 2020
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
# tp_arid_mask.isel(week=[0,10,20,30,40]).plot(col='week')
```
%% Cell type:code id: tags:
``` python
obs_2020_p = make_probabilistic(obs_2020, tercile_edges, mask=mask)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
  FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2020_p.nbytes/1e6, 'MB'
```
%% Output
(147.75984, 'MB')
%% Cell type:code id: tags:
``` python
obs_2020_p
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 53), meta=np.ndarray>
* category (category) <U12 'below normal' 'near normal' 'above normal'
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float64 dask.array<chunksize=(1, 2, 53, 121, 240), meta=np.ndarray>
tp (category, lead_time, forecast_time, latitude, longitude) float64 dask.array<chunksize=(1, 2, 53, 121, 240), meta=np.ndarray>
%% Cell type:code id: tags:
``` python
obs_2020_p.astype('float32').to_netcdf(f'{cache_path}/forecast-like-observations_2020_biweekly_terciled.nc')
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
# forecast-like-observations terciled
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
xr.open_dataset("../data/forecast-like-observations_2020_biweekly_terciled.nc")
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] ...
* category (category) object 'below normal' 'near normal' 'above normal'
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float32 ...
tp (category, lead_time, forecast_time, latitude, longitude) float32 ...
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
### hindcast 2000_2019
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr', consolidated=True)
```
%% Cell type:code id: tags:
``` python
obs_2000_2019_p = make_probabilistic(obs_2000_2019, tercile_edges, mask=mask)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
  FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2000_2019_p.nbytes/1e6, 'MB'
```
%% Output
(2955.138888, 'MB')
%% Cell type:code id: tags:
``` python
obs_2000_2019_p.astype('float32').chunk('auto').to_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_terciled.zarr', consolidated=True, mode='w')
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
<xarray.backends.zarr.ZarrStore at 0x2b51c1233360>
%% Cell type:code id: tags:
``` python
# hindcast-like-observations terciled
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr")
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 1060, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* category (category) <U12 'below normal' 'near normal' 'above normal'
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(1, 2, 530, 121, 240), meta=np.ndarray>
tp (category, lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(1, 2, 530, 121, 240), meta=np.ndarray>
%% Cell type:code id: tags:
``` python
# checking category frequencies
# o = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr")
# w=0
# v='tp'
# o.sel(forecast_time=o.forecast_time.dt.dayofyear==2+7*w).sum('forecast_time', skipna=False)[v].plot(row='lead_time',col='category', levels=[5.5,6.5,7.5])
# o.sel(forecast_time=o.forecast_time.dt.dayofyear==2+7*w).sum('forecast_time', skipna=False).sum('category', skipna=False)[v].plot(row='lead_time', levels=[16.5,17.5,18.5,19.5,20.5])
```
%% Cell type:markdown id: tags:
# Benchmark
center: ECMWF
The calibration was performed using tercile boundaries from the model climatology rather than from observations. Script by Frederic Vitart.
%% Cell type:code id: tags:
``` python
bench_p = cml.load_dataset("s2s-ai-challenge-test-output-benchmark", parameter=['tp','t2m']).to_xarray()
```
%% Output
50%|█████     | 1/2 [00:00<00:00, 6.11it/s]
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
100%|██████████| 2/2 [00:00<00:00, 6.89it/s]
WARNING: ecmwflibs universal: found eccodes at /work/mh0727/m300524/conda-envs/s2s-ai/lib/libeccodes.so
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.12.3
%% Cell type:code id: tags:
``` python
bench_p['category'].attrs = {'long_name': 'tercile category probabilities', 'units': '1',
                             'description': 'Probabilities for three tercile categories. All three tercile category probabilities must add up to 1.'}
```
%% Cell type:code id: tags:
``` python
bench_p['lead_time'] = [pd.Timedelta(f"{i} d") for i in [14, 28]] # take first day of biweekly average as new coordinate
bench_p['lead_time'].attrs = {'long_name':'forecast_period', 'description': 'Forecast period is the time interval between the forecast reference time and the validity time.',
                              'aggregate': 'The pd.Timedelta corresponds to the first day of a biweekly aggregate.',
                              'week34_t2m': 'mean[day 14, 27]',
                              'week56_t2m': 'mean[day 28, 41]',
                              'week34_tp': 'day 28 minus day 14',
                              'week56_tp': 'day 42 minus day 28'}
```
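%% Cell type:markdown id: tags:
These `lead_time` values pin each biweekly aggregate to the first day of its window, so `valid_time = forecast_time + lead_time`. A quick check, assuming only pandas:
%% Cell type:code id: tags:
``` python
import pandas as pd

# first 2020 initialization plus the week 3-4 lead
print(pd.Timestamp('2020-01-02') + pd.Timedelta('14 d'))  # 2020-01-16, matching valid_time below
```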
%% Cell type:code id: tags:
``` python
bench_p = bench_p / 100 # convert percent to [0-1] probability
```
%% Cell type:code id: tags:
``` python
bench_p = bench_p.map(ensure_attributes, biweekly=True)
```
%% Output
0%|          | 0/1 [00:00<?, ?it/s]
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
100%|██████████| 1/1 [00:00<00:00, 4.34it/s]
100%|██████████| 1/1 [00:00<00:00, 4.22it/s]
%% Cell type:code id: tags:
``` python
# bench_p.isel(forecast_time=2).t2m.plot(row='lead_time', col='category')
```
%% Cell type:code id: tags:
``` python
bench_p
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* category (category) object 'below normal' 'near normal' 'above normal'
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (forecast_time, lead_time) datetime64[ns] 2020-01-16 ... 2...
Data variables:
tp (category, forecast_time, lead_time, latitude, longitude) float32 ...
t2m (category, forecast_time, lead_time, latitude, longitude) float32 ...
%% Cell type:code id: tags:
``` python
bench_p.astype('float32').to_netcdf('../data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc')
```
%% Cell type:code id: tags:
``` python
#!renku dataset add s2s-ai-challenge data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc
```
@@ -146,11 +146,25 @@ def ensure_attributes(da, biweekly=False):
    return da


def add_year_week_coords(ds):
    import numpy as np
    if 'week' not in ds.coords and 'year' not in ds.coords:
        year = ds.forecast_time.dt.year.to_index().unique()
        week = (list(np.arange(1, 54)))
        weeks = week * len(year)
        years = np.repeat(year, len(week))
        ds.coords["week"] = ("forecast_time", weeks)
        ds.coords['week'].attrs['description'] = "This week represents the number of forecast_time starting from 1 to 53. Note: This week is different from the ISO week from groupby('forecast_time.weekofyear'), see https://en.wikipedia.org/wiki/ISO_week_date and https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge/-/issues/29"
        ds.coords["year"] = ("forecast_time", years)
        ds.coords['year'].attrs['long_name'] = "calendar year"
    return ds


def make_probabilistic(ds, tercile_edges, member_dim='realization', mask=None, groupby_coord='week'):
    """Compute probabilities from ds (observations or forecasts) based on tercile_edges."""
    # broadcast
    ds = add_year_week_coords(ds)
    tercile_edges = tercile_edges.sel({groupby_coord: ds.coords[groupby_coord]})
    bn = ds < tercile_edges.isel(category_edge=0, drop=True)  # below normal
    n = (ds >= tercile_edges.isel(category_edge=0, drop=True)) & (ds < tercile_edges.isel(category_edge=1, drop=True))  # normal
    an = ds >= tercile_edges.isel(category_edge=1, drop=True)  # above normal
@@ -176,12 +190,14 @@ def make_probabilistic(ds, tercile_edges, member_dim='realization', mask=None):
        'comment': 'All three tercile category probabilities must add up to 1.',
        'variable_before_categorization': 'https://confluence.ecmwf.int/display/S2S/S2S+Surface+Air+Temperature'
    }
    if 'year' in ds_p.coords:
        del ds_p.coords['year']
    if groupby_coord in ds_p.coords:
        ds_p = ds_p.drop(groupby_coord)
    return ds_p


def skill_by_year(preds, adapt=False):
    """Returns pd.Dataframe of RPSS per year."""
    # similar verification_RPSS.ipynb
    # as scorer bot but returns a score for each year
@@ -194,44 +210,49 @@ def skill_by_year(preds):
    # from root
    #renku storage pull data/forecast-like-observations_2020_biweekly_terciled.nc
    #renku storage pull data/hindcast-like-observations_2000-2019_biweekly_terciled.nc
    cache_path = '../data'
    if 2020 in preds.forecast_time.dt.year:
        obs_p = xr.open_dataset(f'{cache_path}/forecast-like-observations_2020_biweekly_terciled.nc').sel(forecast_time=preds.forecast_time)
    else:
        obs_p = xr.open_dataset(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_terciled.zarr', engine='zarr').sel(forecast_time=preds.forecast_time)

    # ML probabilities
    fct_p = preds

    # climatology
    clim_p = xr.DataArray([1/3, 1/3, 1/3], dims='category', coords={'category': ['below normal', 'near normal', 'above normal']}).to_dataset(name='tp')
    clim_p['t2m'] = clim_p['tp']

    if adapt:
        # select only obs_p where fct_p forecasts provided
        for c in ['longitude', 'latitude', 'forecast_time', 'lead_time']:
            obs_p = obs_p.sel({c: fct_p[c]})
        obs_p = obs_p[list(fct_p.data_vars)]
        clim_p = clim_p[list(fct_p.data_vars)]
    else:
        # check inputs
        assert_predictions_2020(obs_p)
        assert_predictions_2020(fct_p)

    # rps_ML
    rps_ML = xs.rps(obs_p, fct_p, category_edges=None, dim=[], input_distributions='p').compute()
    # rps_clim
    rps_clim = xs.rps(obs_p, clim_p, category_edges=None, dim=[], input_distributions='p').compute()

    ## RPSS
    # penalize # https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/7
    expect = obs_p.sum('category')
    expect = expect.where(expect > 0.98).where(expect < 1.02)  # should be True if not all NaN
    # https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/50
    rps_ML = rps_ML.where(expect, other=2)  # assign RPS=2 where value was expected but NaN found
    # following Weigel 2007: https://doi.org/10.1175/MWR3280.1
    rpss = 1 - (rps_ML.groupby('forecast_time.year').mean() / rps_clim.groupby('forecast_time.year').mean())
    # clip
    rpss = rpss.clip(-10, 1)

    # weighted area mean
    weights = np.cos(np.deg2rad(np.abs(rpss.latitude)))