@@ -18,6 +18,7 @@ dependencies:
   - s3fs
   - intake-xarray
   - cfgrib
+  - eccodes
   - nc-time-axis
   - pydap
   - h5netcdf
...
%% Cell type:markdown id: tags:
# Train ML model to correct predictions of week 3-4 & 5-6
This notebook creates a Machine Learning `ML_model` that predicts weeks 3-4 & 5-6 based on `S2S` weeks 3-4 & 5-6 forecasts; the predictions are compared to `CPC` observations for the [`s2s-ai-challenge`](https://s2s-ai-challenge.github.io/).
%% Cell type:markdown id: tags:
# Synopsis
%% Cell type:markdown id: tags:
## Method: `ML-based mean bias reduction`
- calculate the ML-based bias from the 2000-2019 deterministic ensemble-mean hindcasts
- remove that ML-based bias from the 2020 deterministic ensemble-mean forecast (see the sketch below)
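%% Cell type:markdown id: tags:
For intuition, the non-ML version of mean bias reduction is a one-liner in xarray; the CNN trained below learns a spatially varying flavor of the same correction. A minimal sketch with hypothetical toy arrays, not the challenge datasets:
%% Cell type:code id: tags:
``` python
# Sketch: climatological mean bias correction, the idea the ML model generalizes.
import numpy as np
import xarray as xr

dims = ['forecast_time', 'latitude', 'longitude']
hind = xr.DataArray(np.random.rand(20, 4, 6), dims=dims)  # toy 2000-2019 hindcasts
obs = xr.DataArray(np.random.rand(20, 4, 6), dims=dims)   # matching toy observations
fct = xr.DataArray(np.random.rand(4, 6), dims=dims[1:])   # one toy 2020 forecast

bias = (hind - obs).mean('forecast_time')  # mean bias over the hindcast period
fct_corrected = fct - bias                 # remove it from the 2020 forecast
```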
%% Cell type:markdown id: tags:
## Data used
type: renku datasets
Training-input for Machine Learning model:
- hindcasts of models:
    - ECMWF: `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
Forecast-input for Machine Learning model:
- real-time 2020 forecasts of models:
    - ECMWF: `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
Compare Machine Learning model forecast against ground truth:
- `CPC` observations:
    - `hindcast-like-observations_biweekly_deterministic.zarr`
    - `forecast-like-observations_2020_biweekly_deterministic.zarr`
%% Cell type:markdown id: tags:
## Resources used
for training; details in the reproducibility section below
- platform: renku
- memory: 8 GB
- processors: 2 CPU
- storage required: 10 GB
%% Cell type:markdown id: tags:
## Safeguards
All points have to be [x] checked. If not, your submission is invalid.
Changes to the code after submission are not possible, as the `commit` before the `tag` will be reviewed.
(Only in exceptional cases, and if previous effort in reproducibility can be demonstrated, may improvements to readability and reproducibility be allowed after November 1st 2021.)
%% Cell type:markdown id: tags:
### Safeguards to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting?wprov=sfti1)
If the organizers suspect overfitting, your contribution can be disqualified.
- [x] We did not use 2020 observations in training (explicit overfitting and cheating)
- [x] We did not repeatedly verify our model on 2020 observations and incrementally improve our RPSS (implicit overfitting)
- [x] We provide RPSS scores for the training period with script `print_RPS_per_year`, see section 6.3 `predict`.
- [x] We tried our best to prevent [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)?wprov=sfti1).
- [x] We honor the `train-validate-test` [split principle](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets). This means that the hindcast data is split into `train` and `validate`, whereas `test` is withheld.
- [x] We did not use `test` explicitly in training or implicitly by incrementally adjusting parameters.
- [x] We considered [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).
%% Cell type:markdown id: tags:
### Safeguards for Reproducibility
Notebook/code must be independently reproducible from scratch by the organizers (after the competition); if not possible: no prize.
- [x] All training data is publicly available (no pre-trained private neural networks, as they are not reproducible for us)
- [x] Code is well documented, readable and reproducible.
- [x] Code to reproduce training and predictions should preferably run within a day on the described architecture. If the training takes longer than a day, please justify why this is needed. Please do not submit training pipelines which take weeks to train.
%% Cell type:markdown id: tags:
# Todos to improve template
This is just a demo.
- [ ] use multiple predictor variables and two predicted variables
- [ ] handle both `lead_time`s in one go
- [ ] consider seasonality; for now all `forecast_time` months are mixed
- [ ] make probabilistic predictions with a `category` dim; for now the model is deterministic
%% Cell type:markdown id: tags:
# Imports
%% Cell type:code id: tags:
``` python
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Sequential
import matplotlib.pyplot as plt
import xarray as xr
xr.set_options(display_style='text')
import numpy as np
from dask.utils import format_bytes
import xskillscore as xs
```
%% Output
/opt/conda/lib/python3.8/site-packages/xarray/backends/cfgrib_.py:27: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message
warnings.warn(
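%% Cell type:markdown id: tags:
This `cfgrib`/ecCodes import warning is presumably what the added `eccodes` dependency in the environment diff above addresses; it is harmless for this notebook, which reads `zarr` stores rather than GRIB files.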
%% Cell type:markdown id: tags:
# Get training data
preprocessing of input data may be done in a separate notebook/script
%% Cell type:markdown id: tags:
## Hindcast
get weekly initialized hindcasts
%% Cell type:code id: tags:
``` python
v='t2m'  # variable: 2-metre temperature
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
hind_2000_2019 = xr.open_zarr("../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
```
%% Output
/opt/conda/lib/python3.8/site-packages/xarray/backends/plugins.py:61: RuntimeWarning: Engine 'cfgrib' loading failed:
/opt/conda/lib/python3.8/site-packages/gribapi/_bindings.cpython-38-x86_64-linux-gnu.so: undefined symbol: codes_bufr_key_is_header
warnings.warn(f"Engine {name!r} loading failed:\n{ex}", RuntimeWarning)
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
fct_2020 = xr.open_zarr("../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:markdown id: tags:
## Observations
corresponding to hindcasts
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)  # [v]
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr("../data/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)  # [v]
```
%% Cell type:markdown id: tags:
# ML model
%% Cell type:markdown id: tags:
based on [Weatherbench](https://github.com/pangeo-data/WeatherBench/blob/master/quickstart.ipynb)
%% Cell type:code id: tags:
``` python
# run once only and don't commit
!git clone https://github.com/pangeo-data/WeatherBench/
```
%% Output
fatal: destination path 'WeatherBench' already exists and is not an empty directory.
%% Cell type:code id: tags:
``` python
import sys
sys.path.insert(1, 'WeatherBench')
from WeatherBench.src.train_nn import DataGenerator, PeriodicConv2D, create_predictions
import tensorflow.keras as keras
```
%% Cell type:code id: tags:
``` python
bs = 32
import numpy as np

class DataGenerator(keras.utils.Sequence):
    def __init__(self, fct, verif, lead_time, batch_size=bs, shuffle=True, load=True,
                 mean=None, std=None):
        """
        Data generator for WeatherBench data.
        Template from https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
        Args:
            fct: forecasts from S2S models: xr.DataArray (xr.Dataset doesn't work properly)
            verif: observations with same dimensionality (xr.Dataset doesn't work properly)
            lead_time: lead_time as in model
            batch_size: batch size
            shuffle: bool. If True, data is shuffled.
            load: bool. If True, dataset is loaded into RAM.
            mean: If None, compute mean from data.
            std: If None, compute standard deviation from data.
        Todo:
            - use `number` in a better way, for now only the ensemble mean forecast is used
            - don't use .sel(lead_time=lead_time), to train over all lead_time at once
            - be sensitive with forecast_time, pool a few around the given weekofyear
            - use more variables as predictors
            - predict more variables
        """
        if isinstance(fct, xr.Dataset):
            print('convert fct to array')
            fct = fct.to_array().transpose(..., 'variable')
            self.fct_dataset = True
        else:
            self.fct_dataset = False
        if isinstance(verif, xr.Dataset):
            print('convert verif to array')
            verif = verif.to_array().transpose(..., 'variable')
            self.verif_dataset = True
        else:
            self.verif_dataset = False

        # self.fct = fct
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.lead_time = lead_time

        self.fct_data = fct.transpose('forecast_time', ...).sel(lead_time=lead_time)
        self.fct_mean = self.fct_data.mean('forecast_time').compute() if mean is None else mean
        self.fct_std = self.fct_data.std('forecast_time').compute() if std is None else std

        self.verif_data = verif.transpose('forecast_time', ...).sel(lead_time=lead_time)
        self.verif_mean = self.verif_data.mean('forecast_time').compute() if mean is None else mean
        self.verif_std = self.verif_data.std('forecast_time').compute() if std is None else std

        # Normalize
        self.fct_data = (self.fct_data - self.fct_mean) / self.fct_std
        self.verif_data = (self.verif_data - self.verif_mean) / self.verif_std

        self.n_samples = self.fct_data.forecast_time.size
        self.forecast_time = self.fct_data.forecast_time

        self.on_epoch_end()
        # For some weird reason calling .load() earlier messes up the mean and std computations
        if load:
            self.fct_data.load()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.ceil(self.n_samples / self.batch_size))

    def __getitem__(self, i):
        'Generate one batch of data'
        idxs = self.idxs[i * self.batch_size:(i + 1) * self.batch_size]
        # got all NaN if NaNs not masked
        X = self.fct_data.isel(forecast_time=idxs).fillna(0.).values
        y = self.verif_data.isel(forecast_time=idxs).fillna(0.).values
        return X, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.idxs = np.arange(self.n_samples)
        if self.shuffle:
            np.random.shuffle(self.idxs)
```
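%% Cell type:markdown id: tags:
A quick smoke test of the generator contract may help: batches come back as plain numpy arrays of shape `(batch, latitude, longitude)`, already normalized and with NaNs filled. This is a minimal sketch with hypothetical toy sizes, not part of the original template:
%% Cell type:code id: tags:
``` python
# Sketch: verify DataGenerator yields normalized (batch, lat, lon) numpy arrays.
import numpy as np
import pandas as pd
import xarray as xr

toy = xr.DataArray(
    np.random.rand(8, 1, 4, 6),
    dims=['forecast_time', 'lead_time', 'latitude', 'longitude'],
    coords={'forecast_time': pd.date_range('2000-01-02', periods=8, freq='7D'),
            'lead_time': [np.timedelta64(14, 'D')]},
)
dg_toy = DataGenerator(toy, toy, lead_time=toy.lead_time[0], batch_size=4)
X, y = dg_toy[0]
assert X.shape == (4, 4, 6)  # (batch, latitude, longitude)
assert len(dg_toy) == 2      # ceil(8 samples / batch_size 4)
```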
%% Cell type:code id: tags:
``` python
# first of the two bi-weekly `lead_time`s: week 3-4
lead = hind_2000_2019.isel(lead_time=0).lead_time
lead
```
%% Output
<xarray.DataArray 'lead_time' ()>
array(1209600000000000, dtype='timedelta64[ns]')
Coordinates:
    lead_time  timedelta64[ns] 14 days
Attributes:
    aggregate:      The pd.Timedelta corresponds to the first day of a biweek...
    description:    Forecast period is the time interval between the forecast...
    long_name:      lead time
    standard_name:  forecast_period
    week34_t2m:     mean[14 days, 27 days]
    week34_tp:      28 days minus 14 days
    week56_t2m:     mean[28 days, 41 days]
    week56_tp:      42 days minus 28 days
%% Cell type:code id: tags:
``` python
# mask, needed? (sets hindcasts to NaN wherever the observations are NaN, e.g. over the ocean)
hind_2000_2019 = hind_2000_2019.where(obs_2000_2019.isel(forecast_time=0, lead_time=0, drop=True).notnull())
```
%% Cell type:markdown id: tags:
## data prep: train, valid, test
[Use the hindcast period to split train and valid.](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) Do not use the 2020 data for testing!
%% Cell type:code id: tags:
``` python
# time refers to forecast_time
time_train_start, time_train_end = '2000', '2017'  # train
time_valid_start, time_valid_end = '2018', '2019'  # valid
time_test = '2020'                                 # test
```
%% Cell type:code id: tags:
``` python
dg_train = DataGenerator(
    hind_2000_2019.mean('realization').sel(forecast_time=slice(time_train_start, time_train_end))[v],
    obs_2000_2019.sel(forecast_time=slice(time_train_start, time_train_end))[v],
    lead_time=lead, batch_size=bs, load=True)
```
%% Output
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
dg_valid = DataGenerator(
    hind_2000_2019.mean('realization').sel(forecast_time=slice(time_valid_start, time_valid_end))[v],
    obs_2000_2019.sel(forecast_time=slice(time_valid_start, time_valid_end))[v],
    lead_time=lead, batch_size=bs, shuffle=False, load=True)
```
%% Output
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
# do not use, delete?
dg_test = DataGenerator(
    fct_2020.mean('realization').sel(forecast_time=time_test)[v],
    obs_2020.sel(forecast_time=time_test)[v],
    lead_time=lead, batch_size=bs, load=True, mean=dg_train.fct_mean, std=dg_train.fct_std, shuffle=False)
```
%% Cell type:code id: tags:
``` python
X, y = dg_valid[0]
X.shape, y.shape
```
%% Output
((32, 121, 240), (32, 121, 240))
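%% Cell type:markdown id: tags:
Each batch holds 32 samples on the challenge's global 1.5° grid: 121 latitudes by 240 longitudes.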
%% Cell type:code id: tags:
``` python
# short look into training data: large biases
# any problem from normalizing?
# i = 4
# xr.DataArray(np.vstack([X[i], y[i]])).plot(yincrease=False, robust=True)
```
%% Cell type:markdown id: tags:
## `fit`
%% Cell type:code id: tags:
``` python
cnn = keras.models.Sequential([
    PeriodicConv2D(filters=32, kernel_size=5, conv_kwargs={'activation': 'relu'}, input_shape=(32, 64, 1)),
    PeriodicConv2D(filters=1, kernel_size=5)
])
```
%% Output
WARNING:tensorflow:AutoGraph could not transform <bound method PeriodicPadding2D.call of <WeatherBench.src.train_nn.PeriodicPadding2D object at 0x7f86042986a0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <bound method PeriodicPadding2D.call of <WeatherBench.src.train_nn.PeriodicPadding2D object at 0x7f86042986a0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
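%% Cell type:markdown id: tags:
`PeriodicConv2D` (from WeatherBench) pads the longitude axis periodically before a regular convolution, so the stencil wraps around the dateline instead of seeing an artificial edge. A minimal sketch of that padding idea, as hypothetical standalone code rather than WeatherBench's actual implementation:
%% Cell type:code id: tags:
``` python
# Sketch: wrap-around (periodic) padding in longitude, zero padding in latitude,
# then a 'valid' convolution keeps the original spatial size.
import tensorflow as tf

def periodic_pad_lon(x, pad):
    # x: (batch, lat, lon, channels)
    x = tf.concat([x[:, :, -pad:, :], x, x[:, :, :pad, :]], axis=2)  # wrap longitude
    return tf.pad(x, [[0, 0], [pad, pad], [0, 0], [0, 0]])           # zero-pad latitude

x = tf.random.normal((1, 32, 64, 1))
y = tf.keras.layers.Conv2D(8, 5, padding='valid')(periodic_pad_lon(x, pad=2))
print(y.shape)  # (1, 32, 64, 8)
```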
%% Cell type:code id: tags:
``` python
cnn.summary()
```
%% Output
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
periodic_conv2d (PeriodicCon (None, 32, 64, 32)        832       
_________________________________________________________________
periodic_conv2d_1 (PeriodicC (None, 32, 64, 1)         801       
=================================================================
Total params: 1,633
Trainable params: 1,633
Non-trainable params: 0
_________________________________________________________________
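%% Cell type:markdown id: tags:
The parameter counts check out: the first layer has 5 × 5 × 1 × 32 = 800 weights plus 32 biases (832), the second 5 × 5 × 32 × 1 = 800 weights plus 1 bias (801).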
%% Cell type:code id: tags:
``` python
cnn.compile(keras.optimizers.Adam(1e-4), 'mse')
```
%% Cell type:code id: tags:
``` python
import warnings
warnings.simplefilter("ignore")
```
%% Cell type:code id: tags:
``` python
cnn.fit(dg_train, epochs=2, validation_data=dg_valid)
```
%% Output
Epoch 1/2
30/30 [==============================] - 58s 2s/step - loss: 0.1472 - val_loss: 0.0742
Epoch 2/2
30/30 [==============================] - 45s 1s/step - loss: 0.0712 - val_loss: 0.0545
<tensorflow.python.keras.callbacks.History at 0x7f865c2103d0>
%% Cell type:markdown id: tags:
## `predict`
Create predictions and print the `mean(variable, lead_time, longitude, weighted latitude)` RPSS for all years, as calculated by `skill_by_year`.
%% Cell type:code id: tags:
``` python
from scripts import add_valid_time_from_forecast_reference_time_and_lead_time

def _create_predictions(model, dg, lead):
    """Create non-iterative predictions"""
    preds = model.predict(dg).squeeze()
    # Unnormalize: invert the (x - mean) / std scaling applied in the DataGenerator
    preds = preds * dg.fct_std.values + dg.fct_mean.values
    if dg.verif_dataset:
        da = xr.DataArray(
            preds,
            dims=['forecast_time', 'latitude', 'longitude', 'variable'],
            coords={'forecast_time': dg.fct_data.forecast_time, 'latitude': dg.fct_data.latitude,
                    'longitude': dg.fct_data.longitude},
        ).to_dataset()  # doesn't work yet
    else:
        da = xr.DataArray(
            preds,
            dims=['forecast_time', 'latitude', 'longitude'],
            coords={'forecast_time': dg.fct_data.forecast_time, 'latitude': dg.fct_data.latitude,
                    'longitude': dg.fct_data.longitude},
        )
    da = da.assign_coords(lead_time=lead)
    # da = add_valid_time_from_forecast_reference_time_and_lead_time(da)
    return da
```
%% Cell type:code id: tags:
``` python
# optionally mask the ocean when making probabilistic predictions;
# std is NaN where observations are always missing, so notnull() yields a land mask
mask = obs_2020.std(['lead_time', 'forecast_time']).notnull()
```
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
cache_path = '../data'
tercile_file = f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
tercile_edges = xr.open_dataset(tercile_file)
```
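%% Cell type:markdown id: tags:
The tercile edges are the 1/3 and 2/3 quantiles of the 2000-2019 observations, so the three categories (below/near/above normal) are equally likely by construction. The deterministic prediction is later turned into category "probabilities" by `scripts.make_probabilistic`; with an ensemble (`realization` dim), averaging the category indicators over members yields real probabilities. The core comparison looks roughly like this minimal sketch (hypothetical toy values, not the challenge implementation):
%% Cell type:code id: tags:
``` python
# Sketch: map a deterministic forecast onto tercile categories.
import xarray as xr

pred = xr.DataArray([-1.2, 0.1, 0.9], dims='forecast_time')  # toy forecasts
edges = xr.DataArray([-0.5, 0.5], dims='quantile')           # toy lower/upper tercile edges
below = pred < edges.isel(quantile=0)
above = pred >= edges.isel(quantile=1)
normal = ~below & ~above
probs = xr.concat([below, normal, above], dim='category').astype(float)
print(probs.values)  # rows = categories (below, normal, above), columns = forecasts
```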
%% Cell type:code id: tags:
``` python
# demo only: this reuses the week 3-4 model for every lead_time, which is not useful
# skill-wise, but the results have the expected dimensions;
# one should actually train a separate model per lead_time
def create_predictions(cnn, fct, obs, time):
    preds_test = []
    for lead in fct.lead_time:
        dg = DataGenerator(fct.mean('realization').sel(forecast_time=time)[v],
                           obs.sel(forecast_time=time)[v],
                           lead_time=lead, batch_size=bs, mean=dg_train.fct_mean, std=dg_train.fct_std, shuffle=False)
        preds_test.append(_create_predictions(cnn, dg, lead))
    preds_test = xr.concat(preds_test, 'lead_time')
    preds_test['lead_time'] = fct.lead_time
    # add valid_time coord
    preds_test = add_valid_time_from_forecast_reference_time_and_lead_time(preds_test)
    preds_test = preds_test.to_dataset(name=v)
    # add fake var: the submission requires both t2m and tp, so reuse t2m as a placeholder
    preds_test['tp'] = preds_test['t2m']
    # make probabilistic
    preds_test = make_probabilistic(preds_test.expand_dims('realization'), tercile_edges, mask=mask)
    return preds_test
```
%% Cell type:markdown id: tags:
### `predict` training period in-sample
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
from scripts import skill_by_year
import os

if os.environ['HOME'] == '/home/jovyan':
    import pandas as pd
    # assume on renku with small memory
    step = 2
    skill_list = []
    # loop over years to consume less memory on renku
    for year in np.arange(int(time_train_start), int(time_train_end) - 1, step):
        preds_is = create_predictions(cnn, hind_2000_2019, obs_2000_2019, time=slice(str(year), str(year + step - 1))).compute()
        skill_list.append(skill_by_year(preds_is))
    skill = pd.concat(skill_list)
else:  # with larger memory, simply do
    preds_is = create_predictions(cnn, hind_2000_2019, obs_2000_2019, time=slice(time_train_start, time_train_end))
    skill = skill_by_year(preds_is)
skill
```
%% Output %% Output
RPSS RPSS
year year
2000 -1.293103 2000 -0.862483
2001 -1.446606 2001 -1.015485
2002 -1.494487 2002 -1.101022
2003 -1.484899 2003 -1.032647
2004 -1.421862 2004 -1.056348
2005 -1.549783 2005 -1.165675
2006 -1.508035 2006 -1.057217
2007 -1.502208 2007 -1.170849
2008 -1.493371 2008 -1.049785
2009 -1.568156 2009 -1.169108
2010 -1.519528 2010 -1.130845
2011 -1.389702 2011 -1.052670
2012 -1.499871 2012 -1.126449
2013 -1.549204 2013 -1.126930
2014 -1.500869 2014 -1.095896
2015 -1.506727 2015 -1.117486
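%% Cell type:markdown id: tags:
For orientation when reading these tables: RPSS = 1 - RPS_model / RPS_climatology, so 0 equals the climatological reference and positive values beat it. The strongly negative values here simply reflect that this template trains a tiny CNN for two epochs on a single variable; it is a demo, not a competitive submission.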
%% Cell type:markdown id: tags:
### `predict` validation period out-of-sample
%% Cell type:code id: tags:
``` python
preds_os = create_predictions(cnn, hind_2000_2019, obs_2000_2019, time=slice(time_valid_start, time_valid_end))
skill_by_year(preds_os)
```
%% Output
          RPSS
year          
2018 -1.099744
2019 -1.172401
%% Cell type:markdown id: tags:
### `predict` test
%% Cell type:code id: tags:
``` python
preds_test = create_predictions(cnn, fct_2020, obs_2020, time=time_test)
skill_by_year(preds_test)
```
%% Output
          RPSS
year          
2020 -1.076834
%% Cell type:markdown id: tags:
# Submission
%% Cell type:code id: tags:
``` python
from scripts import assert_predictions_2020
assert_predictions_2020(preds_test)
```
%% Cell type:code id: tags:
``` python
preds_test.to_netcdf('../submissions/ML_prediction_2020.nc')
```
%% Cell type:code id: tags:
``` python
# !git add ../submissions/ML_prediction_2020.nc
# !git add ML_train_and_prediction.ipynb
```
%% Cell type:code id: tags:
``` python
# !git commit -m "template_test commit message"  # whatever message you want
```
%% Cell type:code id: tags:
``` python
# !git tag "submission-template_test-0.0.1"  # if this is to be checked by scorer, only the last submitted==tagged version will be considered
```
%% Cell type:code id: tags:
``` python
# !git push --tags
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Reproducibility
%% Cell type:markdown id: tags:
## memory
%% Cell type:code id: tags:
``` python
# https://phoenixnap.com/kb/linux-commands-check-memory-usage
!free -g
```
%% Output
              total        used        free      shared  buff/cache   available
Mem:             31           7          11           0          12          24
Swap:             0           0           0
%% Cell type:markdown id: tags:
## CPU
%% Cell type:code id: tags:
``` python
!lscpu
```
%% Output
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   40 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       8
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel Xeon Processor (Skylake, IBRS)
Stepping:                        4
CPU MHz:                         2095.078
BogoMIPS:                        4190.15
Virtualization:                  VT-x
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       256 KiB
L1i cache:                       256 KiB
L2 cache:                        32 MiB
L3 cache:                        128 MiB
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ibrs ibpb tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke
%% Cell type:markdown id: tags:
## software
%% Cell type:code id: tags:
``` python
!conda list
```
%% Output
# packages in environment at /opt/conda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
_pytorch_select 0.1 cpu_0 defaults
_tflow_select 2.3.0 mkl defaults
absl-py 0.13.0 py38h06a4308_0 defaults
aiobotocore 1.4.1 pyhd3eb1b0_0 defaults
aiohttp 3.7.4.post0 py38h7f8727e_2 defaults
aioitertools 0.7.1 pyhd3eb1b0_0 defaults
alembic 1.4.3 pyh9f0ad1d_0 conda-forge
ansiwrap 0.8.4 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
argcomplete 1.12.3 pypi_0 pypi
argon2-cffi 20.1.0 py38h497a2fe_2 conda-forge
argparse 1.4.0 pypi_0 pypi
asciitree 0.3.3 py_2 defaults
astor 0.8.1 py38h06a4308_0 defaults
astunparse 1.6.3 py_0 defaults
async-timeout 3.0.1 pypi_0 pypi
async_generator 1.10 py_0 conda-forge
attrs 21.2.0 pypi_0 pypi
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.1 py_0 conda-forge
bagit 1.8.1 pypi_0 pypi
beautifulsoup4 4.10.0 pyh06a4308_0 defaults
binutils_impl_linux-64 2.35.1 h193b22a_1 conda-forge
binutils_linux-64 2.35 h67ddf6f_30 conda-forge
black 20.8b1 pypi_0 pypi
blas 1.0 mkl defaults
bleach 3.2.1 pyh9f0ad1d_0 conda-forge
blinker 1.4 py_1 conda-forge
bokeh 2.3.3 py38h06a4308_0 defaults
botocore 1.20.106 pyhd3eb1b0_0 defaults
bottleneck 1.3.2 py38heb32a55_1 defaults
bracex 2.1.1 pypi_0 pypi
branca 0.3.1 pypi_0 pypi
brotli 1.0.9 he6710b0_2 defaults
brotlipy 0.7.0 py38h497a2fe_1001 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.1 h36c2ea0_0 conda-forge
ca-certificates 2021.7.5 h06a4308_1 defaults
cachecontrol 0.12.6 pypi_0 pypi
cachetools 4.2.4 pypi_0 pypi
calamus 0.3.12 pypi_0 pypi
cdsapi 0.5.1 pypi_0 pypi
certifi 2021.5.30 pypi_0 pypi
certipy 0.1.3 py_0 conda-forge
cffi 1.14.6 pypi_0 pypi
cfgrib 0.9.9.0 pyhd8ed1ab_1 conda-forge
cftime 1.5.0 py38h6323ea4_0 defaults
chardet 3.0.4 pypi_0 pypi
click 7.1.2 pypi_0 pypi
click-completion 0.5.2 pypi_0 pypi
click-option-group 0.5.3 pypi_0 pypi
click-plugins 1.1.1 pypi_0 pypi
climetlab 0.8.31 pypi_0 pypi
climetlab-s2s-ai-challenge 0.8.0 pypi_0 pypi
cloudpickle 2.0.0 pyhd3eb1b0_0 defaults
colorama 0.4.4 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
commonmark 0.9.1 pypi_0 pypi
conda 4.9.2 py38h578d9bd_0 conda-forge
conda-package-handling 1.7.2 py38h8df0ef7_0 conda-forge
configargparse 1.5.2 pypi_0 pypi
configurable-http-proxy 1.3.0 0 conda-forge
coverage 5.5 py38h27cfd23_2 defaults
cryptography 3.4.8 pypi_0 pypi
curl 7.71.1 he644dc0_8 conda-forge
cwlgen 0.4.2 pypi_0 pypi
cwltool 3.1.20211004060744 pypi_0 pypi
cycler 0.10.0 py38_0 defaults
cython 0.29.24 py38h295c915_0 defaults
cytoolz 0.11.0 py38h7b6447c_0 defaults
dask 2021.8.1 pyhd3eb1b0_0 defaults
dask-core 2021.8.1 pyhd3eb1b0_0 defaults
dataclasses 0.8 pyh6d0b6a4_7 defaults
decorator 4.4.2 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
distributed 2021.8.1 py38h06a4308_0 defaults
distro 1.5.0 pypi_0 pypi
docopt 0.6.2 py38h06a4308_0 defaults
eccodes 2.21.0 ha0e6eb6_0 conda-forge
ecmwf-api-client 1.6.1 pypi_0 pypi
ecmwflibs 0.3.14 pypi_0 pypi
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
environ-config 21.2.0 pypi_0 pypi
fasteners 0.16.3 pyhd3eb1b0_0 defaults
filelock 3.0.12 pypi_0 pypi
findlibs 0.0.2 pypi_0 pypi
fonttools 4.25.0 pyhd3eb1b0_0 defaults
freetype 2.10.4 h5ab3b9f_0 defaults
frozendict 2.0.6 pypi_0 pypi
fsspec 2021.7.0 pyhd3eb1b0_0 defaults
gast 0.4.0 pyhd3eb1b0_0 defaults
gcc_impl_linux-64 9.3.0 h70c0ae5_18 conda-forge
gcc_linux-64 9.3.0 hf25ea35_30 conda-forge
gitdb 4.0.7 pypi_0 pypi
gitpython 3.1.14 pypi_0 pypi
google-auth 1.33.0 pyhd3eb1b0_0 defaults
google-auth-oauthlib 0.4.4 pyhd3eb1b0_0 defaults
google-pasta 0.2.0 pyhd3eb1b0_0 defaults
grpcio 1.36.1 py38h2157cd5_1 defaults
gxx_impl_linux-64 9.3.0 hd87eabc_18 conda-forge
gxx_linux-64 9.3.0 h3fbe746_30 conda-forge
h5netcdf 0.11.0 pyhd8ed1ab_0 conda-forge
h5py 2.10.0 py38hd6299e0_1 defaults
hdf4 4.2.13 h3ca952b_2 defaults
hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge
heapdict 1.0.1 pyhd3eb1b0_0 defaults
humanfriendly 10.0 pypi_0 pypi
humanize 3.7.1 pypi_0 pypi
icu 68.1 h58526e2_0 conda-forge
idna 2.10 pyh9f0ad1d_0 conda-forge
importlib-metadata 3.4.0 py38h578d9bd_0 conda-forge
importlib_metadata 3.4.0 hd8ed1ab_0 conda-forge
intake 0.6.3 pyhd3eb1b0_0 defaults
intake-xarray 0.5.0 pyhd3eb1b0_0 defaults
intel-openmp 2019.4 243 defaults
ipykernel 5.4.2 py38h81c977d_0 conda-forge
ipython 7.19.0 py38h81c977d_2 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
isodate 0.6.0 pypi_0 pypi
jasper 1.900.1 hd497a04_4 defaults
jedi 0.17.2 py38h578d9bd_1 conda-forge
jellyfish 0.8.8 pypi_0 pypi
jinja2 3.0.1 pypi_0 pypi
jmespath 0.10.0 pyhd3eb1b0_0 defaults
joblib 1.0.1 pyhd3eb1b0_0 defaults
jpeg 9d h7f8727e_0 defaults
json5 0.9.5 pyh9f0ad1d_0 conda-forge
jsonschema 3.2.0 py_2 conda-forge
jupyter-server-proxy 1.6.0 pypi_0 pypi
jupyter_client 6.1.11 pyhd8ed1ab_1 conda-forge
jupyter_core 4.7.0 py38h578d9bd_0 conda-forge
jupyter_telemetry 0.1.0 pyhd8ed1ab_1 conda-forge
jupyterhub 1.2.2 pypi_0 pypi
jupyterlab 2.2.9 py_0 conda-forge
jupyterlab-git 0.23.3 pypi_0 pypi
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_server 1.2.0 py_0 conda-forge
keras-preprocessing 1.1.2 pyhd3eb1b0_0 defaults
kernel-headers_linux-64 2.6.32 h77966d4_13 conda-forge
kiwisolver 1.3.1 py38h2531618_0 defaults
krb5 1.17.2 h926e7f8_0 conda-forge
lazy-object-proxy 1.6.0 pypi_0 pypi
lcms2 2.12 h3be6417_0 defaults
ld_impl_linux-64 2.35.1 hea4e1c9_1 conda-forge
libaec 1.0.4 he6710b0_1 defaults
libblas 3.9.0 1_h86c2bf4_netlib conda-forge
libcblas 3.9.0 5_h92ddd45_netlib conda-forge
libcurl 7.71.1 hcdd3856_8 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-devel_linux-64 9.3.0 h7864c58_18 conda-forge
libgcc-ng 9.3.0 h2828fa1_18 conda-forge
libgfortran-ng 9.3.0 ha5ec8a7_17 defaults
libgfortran5 9.3.0 ha5ec8a7_17 defaults
libgomp 9.3.0 h2828fa1_18 conda-forge
liblapack 3.9.0 5_h92ddd45_netlib conda-forge
libllvm10 10.0.1 hbcb73fb_5 defaults
libmklml 2019.0.5 0 defaults
libnetcdf 4.7.4 nompi_h56d31a8_107 conda-forge
libnghttp2 1.41.0 h8cfc5f6_2 conda-forge
libpng 1.6.37 hbc83047_0 defaults
libprotobuf 3.17.2 h4ff587b_1 defaults
libsodium 1.0.18 h36c2ea0_1 conda-forge
libssh2 1.9.0 hab1572f_5 conda-forge
libstdcxx-devel_linux-64 9.3.0 hb016644_18 conda-forge
libstdcxx-ng 9.3.0 h6de172a_18 conda-forge
libtiff 4.2.0 h85742a9_0 defaults
libuv 1.40.0 h7f98852_0 conda-forge
libwebp-base 1.2.0 h27cfd23_0 defaults
llvmlite 0.36.0 py38h612dafd_4 defaults
locket 0.2.1 py38h06a4308_1 defaults
lockfile 0.12.2 pypi_0 pypi
lxml 4.6.3 pypi_0 pypi
lz4-c 1.9.3 h295c915_1 defaults
magics 1.5.6 pypi_0 pypi
mako 1.1.4 pyh44b312d_0 conda-forge
markdown 3.3.4 py38h06a4308_0 defaults
markupsafe 2.0.1 pypi_0 pypi
marshmallow 3.13.0 pypi_0 pypi
matplotlib-base 3.4.2 py38hab158f2_0 defaults
mistune 0.8.4 py38h497a2fe_1003 conda-forge
mkl 2020.2 256 defaults
mkl-service 2.3.0 py38he904b0f_0 defaults
mkl_fft 1.3.0 py38h54f3939_0 defaults
mkl_random 1.1.1 py38h0573a6f_0 defaults
msgpack-python 1.0.2 py38hff7bd54_1 defaults
multidict 5.1.0 py38h27cfd23_2 defaults
munkres 1.1.4 py_0 defaults
mypy-extensions 0.4.3 pypi_0 pypi
nbclient 0.5.0 pypi_0 pypi
nbconvert 6.0.7 py38h578d9bd_3 conda-forge
nbdime 2.1.0 pypi_0 pypi
nbformat 5.1.2 pyhd8ed1ab_1 conda-forge
nbresuse 0.4.0 pypi_0 pypi
nc-time-axis 1.3.1 pyhd8ed1ab_2 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
ndg-httpsclient 0.5.1 pypi_0 pypi
nest-asyncio 1.4.3 pyhd8ed1ab_0 conda-forge
netcdf4 1.5.4 pypi_0 pypi
networkx 2.6.3 pypi_0 pypi
ninja 1.10.2 hff7bd54_1 defaults
nodejs 15.3.0 h25f6087_0 conda-forge
notebook 6.2.0 py38h578d9bd_0 conda-forge
numba 0.53.1 py38ha9443f7_0 defaults
numcodecs 0.8.0 py38h2531618_0 defaults
numexpr 2.7.3 py38hb2eb853_0 defaults
numpy 1.19.2 py38h54aff64_0 defaults
numpy-base 1.19.2 py38hfa32c7d_0 defaults
oauthlib 3.0.1 py_0 conda-forge
olefile 0.46 pyhd3eb1b0_0 defaults
openjpeg 2.4.0 h3ad879b_0 defaults
openssl 1.1.1l h7f8727e_0 defaults
opt_einsum 3.3.0 pyhd3eb1b0_1 defaults
owlrl 5.2.3 pypi_0 pypi
packaging 20.8 pyhd3deb0d_0 conda-forge
pamela 1.0.0 py_0 conda-forge
pandas 1.3.2 py38h8c16a72_0 defaults
pandoc 2.11.3.2 h7f98852_0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
papermill 2.3.1 pypi_0 pypi
parso 0.7.1 pyh9f0ad1d_0 conda-forge
partd 1.2.0 pyhd3eb1b0_0 defaults
pathspec 0.9.0 pypi_0 pypi
patool 1.12 pypi_0 pypi
pdbufr 0.9.0 pypi_0 pypi
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.3.1 py38h2c7a002_0 defaults
pip 21.0.1 pypi_0 pypi
pipx 0.16.1.0 pypi_0 pypi
pluggy 0.13.1 pypi_0 pypi
portalocker 2.3.2 pypi_0 pypi
powerline-shell 0.7.0 pypi_0 pypi
prometheus_client 0.9.0 pyhd3deb0d_0 conda-forge
prompt-toolkit 3.0.10 pyha770c72_0 conda-forge
properscoring 0.1 py_0 conda-forge
protobuf 3.17.2 py38h295c915_0 defaults
prov 1.5.1 pypi_0 pypi
psutil 5.8.0 py38h27cfd23_1 defaults
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pyasn1 0.4.8 pyhd3eb1b0_0 defaults
pyasn1-modules 0.2.8 py_0 defaults
pycosat 0.6.3 py38h497a2fe_1006 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pycurl 7.43.0.6 py38h996a351_1 conda-forge
pydap 3.2.2 pyh9f0ad1d_1001 conda-forge
pydot 1.4.2 pypi_0 pypi
pygments 2.10.0 pypi_0 pypi
pyjwt 2.1.0 pypi_0 pypi
pyld 2.0.3 pypi_0 pypi
pyodc 1.1.1 pypi_0 pypi
pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyrsistent 0.17.3 py38h497a2fe_2 conda-forge
pyshacl 0.17.0.post1 pypi_0 pypi
pysocks 1.7.1 py38h578d9bd_3 conda-forge
python 3.8.6 hffdb5ce_4_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python-eccodes 2021.03.0 py38hb5d20a5_1 conda-forge
python-editor 1.0.4 pypi_0 pypi
python-flatbuffers 1.12 pyhd3eb1b0_0 defaults
python-json-logger 2.0.1 pyh9f0ad1d_0 conda-forge
python-snappy 0.6.0 py38h2531618_3 defaults
python_abi 3.8 1_cp38 conda-forge
pytorch 1.8.1 cpu_py38h60491be_0 defaults
pytz 2021.1 pyhd3eb1b0_0 defaults
pyyaml 5.4.1 pypi_0 pypi
pyzmq 21.0.1 py38h3d7ac18_0 conda-forge
rdflib 6.0.1 pypi_0 pypi
rdflib-jsonld 0.5.0 pypi_0 pypi
readline 8.0 he28a2e2_2 conda-forge
regex 2021.4.4 pypi_0 pypi
renku 0.16.2 pypi_0 pypi
requests 2.24.0 pypi_0 pypi
requests-oauthlib 1.3.0 py_0 defaults
rich 10.3.0 pypi_0 pypi
rsa 4.7.2 pyhd3eb1b0_1 defaults
ruamel-yaml 0.16.5 pypi_0 pypi
ruamel.yaml.clib 0.2.2 py38h497a2fe_2 conda-forge
ruamel_yaml 0.15.80 py38h497a2fe_1003 conda-forge
s3fs 2021.7.0 pyhd3eb1b0_0 defaults
schema-salad 8.2.20210918131710 pypi_0 pypi
scikit-learn 0.24.2 py38ha9443f7_0 defaults
scipy 1.7.0 py38h7b17777_1 conda-forge
send2trash 1.5.0 py_0 conda-forge
setuptools 58.2.0 pypi_0 pypi
setuptools-scm 6.0.1 pypi_0 pypi
shellescape 3.8.1 pypi_0 pypi
shellingham 1.4.0 pypi_0 pypi
simpervisor 0.4 pypi_0 pypi
six 1.16.0 pypi_0 pypi
smmap 4.0.0 pypi_0 pypi
snappy 1.1.8 he6710b0_0 defaults
sortedcontainers 2.4.0 pyhd3eb1b0_0 defaults
soupsieve 2.2.1 pyhd3eb1b0_0 defaults
sqlalchemy 1.3.22 py38h497a2fe_1 conda-forge
sqlite 3.34.0 h74cdb3f_0 conda-forge
sysroot_linux-64 2.12 h77966d4_13 conda-forge
tabulate 0.8.9 pypi_0 pypi
tbb 2020.3 hfd86e86_0 defaults
tblib 1.7.0 pyhd3eb1b0_0 defaults
tenacity 7.0.0 pypi_0 pypi
tensorboard 2.4.0 pyhc547734_0 defaults
tensorboard-plugin-wit 1.6.0 py_0 defaults
tensorflow 2.4.1 mkl_py38hb2083e0_0 defaults
tensorflow-base 2.4.1 mkl_py38h43e0292_0 defaults
tensorflow-estimator 2.6.0 pyh7b7c402_0 defaults
termcolor 1.1.0 py38h06a4308_1 defaults
terminado 0.9.2 py38h578d9bd_0 conda-forge
testpath 0.4.4 py_0 conda-forge
textwrap3 0.9.2 pypi_0 pypi
threadpoolctl 2.2.0 pyh0d69192_0 defaults
tini 0.18.0 h14c3975_1001 conda-forge
tk 8.6.10 h21135ba_1 conda-forge
toml 0.10.2 pypi_0 pypi
toolz 0.11.1 pyhd3eb1b0_0 defaults
tornado 6.1 py38h497a2fe_1 conda-forge
tqdm 4.60.0 pypi_0 pypi
traitlets 5.0.5 py_0 conda-forge
typed-ast 1.4.2 pypi_0 pypi
typing-extensions 3.7.4.3 pypi_0 pypi
typing_extensions 3.10.0.2 pyh06a4308_0 defaults
urllib3 1.25.11 pypi_0 pypi
userpath 1.4.2 pypi_0 pypi
wcmatch 8.2 pypi_0 pypi
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
webob 1.8.7 pyhd3eb1b0_0 defaults
werkzeug 2.0.1 pyhd3eb1b0_0 defaults
wheel 0.36.2 pyhd3deb0d_0 conda-forge
wrapt 1.12.1 py38h7b6447c_1 defaults
xarray 0.19.0 pyhd3eb1b0_1 defaults
xhistogram 0.3.0 pyhd8ed1ab_0 conda-forge
xskillscore 0.0.23 pyhd8ed1ab_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yagup 0.1.1 pypi_0 pypi
yaml 0.2.5 h516909a_0 conda-forge
yarl 1.6.3 py38h27cfd23_0 defaults
zarr 2.8.1 pyhd3eb1b0_0 defaults
zeromq 4.3.3 h58526e2_3 conda-forge
zict 2.0.0 pyhd3eb1b0_0 defaults
zipp 3.4.0 py_0 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
zstd 1.4.9 haebb681_0 defaults
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Train ML model to correct predictions of week 3-4 & 5-6
This notebook creates a Machine Learning `ML_model` to predict weeks 3-4 & 5-6 based on `S2S` weeks 3-4 & 5-6 forecasts and compares it to `CPC` observations for the [`s2s-ai-challenge`](https://s2s-ai-challenge.github.io/).
%% Cell type:markdown id: tags:
# Synopsis
%% Cell type:markdown id: tags:
## Method: `mean bias reduction`
- calculate the mean bias from the 2000-2019 deterministic ensemble-mean hindcasts
- remove that mean bias from the 2020 deterministic ensemble-mean forecasts (see the sketch below)
- no Machine Learning used here
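A minimal sketch of this method (variable names here are illustrative; the actual cells follow below in this notebook):
``` python
import xarray as xr

hind = xr.open_zarr("../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
obs = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)
fct = xr.open_zarr("../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)

# climatological mean bias of the ensemble-mean hindcast, per week of the year
# (newer xarray prefers dt.isocalendar().week over the deprecated dt.weekofyear)
bias = (hind.mean("realization") - obs).groupby("forecast_time.weekofyear").mean()
# subtract the bias of the matching week from the 2020 ensemble-mean forecast
debiased = fct.mean("realization") - bias.sel(weekofyear=fct.forecast_time.dt.weekofyear)
```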
%% Cell type:markdown id: tags:
## Data used
type: renku datasets
Training-input for Machine Learning model:
- hindcasts of models:
  - ECMWF: `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
Forecast-input for Machine Learning model:
- real-time 2020 forecasts of models:
  - ECMWF: `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
Compare Machine Learning model forecast against ground truth:
- `CPC` observations:
  - `hindcast-like-observations_biweekly_deterministic.zarr`
  - `forecast-like-observations_2020_biweekly_deterministic.zarr`
%% Cell type:markdown id: tags:
## Resources used
for training, details in reproducibility
- platform: MPI-M supercomputer, 1 node
- memory: 64 GB
- processors: 36 CPU
- storage required: 10 GB
%% Cell type:markdown id: tags:
## Safeguards
All points have to be [x] checked. If not, your submission is invalid.
Changes to the code after submissions are not possible, as the `commit` before the `tag` will be reviewed.
(Only in exceptional cases, and if previous effort in reproducibility can be shown, may improvements to readability and reproducibility be allowed after November 1st 2021.)
%% Cell type:markdown id: tags:
### Safeguards to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting?wprov=sfti1)
If the organizers suspect overfitting, your contribution can be disqualified.
- [x] We didn't use 2020 observations in training (explicit overfitting and cheating)
- [x] We didn't repeatedly verify our model on 2020 observations and incrementally improve its RPSS (implicit overfitting)
- [x] We provide RPSS scores for the training period with script `skill_by_year`, see section 6.3 `predict`.
- [x] We tried our best to prevent [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)?wprov=sfti1).
- [x] We honor the `train-validate-test` [split principle](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets). This means that the hindcast data is split into `train` and `validate`, whereas `test` is withheld.
- [x] We did not use `test` explicitly in training or implicitly in incrementally adjusting parameters.
- [x] We considered [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).
%% Cell type:markdown id: tags:
### Safeguards for Reproducibility
Notebook/code must be independently reproducible from scratch by the organizers (after the competition), if not possible: no prize
- [x] All training data is publicly available (no pre-trained private neural networks, as they are not reproducible for us)
- [x] Code is well documented, readable and reproducible.
- [x] Code to reproduce training and predictions should preferably run within a day on the described architecture. If the training takes longer than a day, please justify why this is needed. Please do not submit training pipelines which take weeks to train.
%% Cell type:markdown id: tags:
# Imports
%% Cell type:code id: tags:
``` python
import xarray as xr
xr.set_options(display_style='text')
```
%% Output
<xarray.core.options.set_options at 0x7f05cc486340>
%% Cell type:markdown id: tags:
# Get training data
preprocessing of input data may be done in a separate notebook/script
%% Cell type:markdown id: tags:
## Hindcast
get weekly initialized hindcasts
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
hind_2000_2019 = xr.open_zarr("../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
fct_2020 = xr.open_zarr("../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:markdown id: tags:
## Observations
corresponding to hindcasts
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:code id: tags:
``` python
# preprocessed as renku dataset
!renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr("../data/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)
```
%% Cell type:markdown id: tags:
# no ML model
%% Cell type:markdown id: tags:
Here, we just remove the mean bias from the ensemble-mean forecast.
%% Cell type:code id: tags:
``` python
# add integer year/week coordinates, so the bias can be grouped by week of the year
from scripts import add_year_week_coords
obs_2000_2019 = add_year_week_coords(obs_2000_2019)
hind_2000_2019 = add_year_week_coords(hind_2000_2019)
```
%% Cell type:code id: tags:
``` python
# mean bias of the ensemble-mean hindcast relative to observations, per week of the year
bias_2000_2019 = (hind_2000_2019.mean('realization') - obs_2000_2019).groupby('week').mean().compute()
```
%% Output
/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
%% Cell type:markdown id: tags:
## `predict`
Create predictions and print `mean(variable, lead_time, longitude, weighted latitude)` RPSS for all years as calculated by `skill_by_year`.
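As a rough guide to the metric (assuming `skill_by_year` follows the standard definition used in this challenge): RPSS = 1 - RPS(forecast) / RPS(climatology), where the climatological reference assigns probability 1/3 to each tercile category, so positive RPSS beats climatology and negative RPSS is worse than climatology.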
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
tercile_file = '../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
tercile_edges = xr.open_dataset(tercile_file)
```
%% Cell type:code id: tags:
``` python
def create_predictions(fct, bias):
    # remove the weekly mean bias, then convert to tercile probabilities
    if 'week' not in fct.coords:
        fct = add_year_week_coords(fct)
    preds = fct - bias.sel(week=fct.week)
    preds = make_probabilistic(preds, tercile_edges)
    return preds.astype('float32')
```
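%% Cell type:markdown id: tags:
`make_probabilistic` comes from `scripts`; a minimal sketch of what such a conversion typically does (assumed behavior, including the `quantile` dimension holding the 1/3 and 2/3 edges) is:
%% Cell type:code id: tags:
``` python
# Sketch only: tercile probabilities as the fraction of ensemble members
# below, between and above the climatological tercile edges.
def make_probabilistic_sketch(fct, edges):
    below = (fct < edges.isel(quantile=0)).mean('realization')
    above = (fct >= edges.isel(quantile=1)).mean('realization')
    normal = 1 - below - above  # remaining probability mass
    return xr.concat([below, normal, above], 'category').assign_coords(
        category=['below normal', 'near normal', 'above normal'])
```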
%% Cell type:markdown id: tags:
### `predict` training period in-sample
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Output
Warning: Run CLI commands only from project's root directory.
%% Cell type:code id: tags:
``` python
preds_is = create_predictions(hind_2000_2019, bias_2000_2019).compute()
```
%% Cell type:code id: tags:
``` python
from scripts import skill_by_year
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_is)
```
%% Output
RPSS
year
2000 -0.141857
2001 -0.203405
2002 -0.202549
2003 -0.206234
2004 -0.549463
2005 -0.168421
2006 -0.184515
2007 -0.616939
2008 -0.195251
2009 -0.202809
2010 -0.189126
2011 -0.678302
2012 -0.620137
2013 -0.202285
2014 -0.206982
2015 -0.172498
2016 -0.136464
2017 -0.638293
2018 -0.667205
2019 -0.180896
%% Cell type:markdown id: tags:
### `predict` test
%% Cell type:code id: tags:
``` python
preds_test = create_predictions(fct_2020, bias_2000_2019)
```
%% Cell type:code id: tags:
``` python
skill_by_year(preds_test)
```
%% Output
RPSS
year
2020 -0.093422
%% Cell type:markdown id: tags:
# Submission
%% Cell type:code id: tags:
``` python
from scripts import assert_predictions_2020
assert_predictions_2020(preds_test)
```
%% Cell type:code id: tags:
``` python
preds_test.attrs = {'author': 'Aaron Spring', 'author_email': 'aaron.spring@mpimet.mpg.de',
                    'comment': 'created for the s2s-ai-challenge as a template for the website',
                    'notebook': 'mean_bias_reduction.ipynb',
                    'website': 'https://s2s-ai-challenge.github.io/#evaluation'}
html_repr = xr.core.formatting_html.dataset_repr(preds_test)
with open('submission_template_repr.html', 'w') as myFile:
    myFile.write(html_repr)
```
%% Cell type:code id: tags:
``` python
preds_test.to_netcdf('../submissions/ML_prediction_2020.nc')
```
%% Cell type:code id: tags:
``` python
# !git add ../submissions/ML_prediction_2020.nc
# !git add mean_bias_reduction.ipynb
```
%% Cell type:code id: tags:
``` python
#!git commit -m "template_test no ML mean bias reduction" # whatever message you want
```
%% Cell type:code id: tags:
``` python
#!git tag "submission-no_ML_mean_bias_reduction-0.0.2" # if this is to be checked by scorer, only the last submitted==tagged version will be considered
```
%% Cell type:code id: tags:
``` python
#!git push --tags
```
%% Cell type:markdown id: tags:
# Reproducibility
%% Cell type:markdown id: tags:
## memory
%% Cell type:code id: tags:
``` python
# https://phoenixnap.com/kb/linux-commands-check-memory-usage
!free -g
```
%% Output
total used free shared buffers cached
Mem: 62 15 46 0 0 5
-/+ buffers/cache: 10 52
Swap: 0 0 0
%% Cell type:markdown id: tags:
## CPU
%% Cell type:code id: tags:
``` python
!lscpu
```
%% Output
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
Stepping: 1
CPU MHz: 2100.000
BogoMIPS: 4190.01
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-17,36-53
NUMA node1 CPU(s): 18-35,54-71
%% Cell type:markdown id: tags:
## software
%% Cell type:code id: tags:
``` python
!conda list
```
%% Output
# packages in environment at /work/mh0727/m300524/conda-envs/s2s-ai:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_tflow_select 2.3.0 mkl
absl-py 0.12.0 py37h06a4308_0
aiobotocore 1.2.2 pyhd3eb1b0_0
aiohttp 3.7.4 py37h27cfd23_1
aioitertools 0.7.1 pyhd3eb1b0_0
anyio 2.2.0 pypi_0 pypi
appdirs 1.4.4 py_0
argcomplete 1.12.2 pypi_0 pypi
argon2-cffi 20.1.0 py37h27cfd23_1
asciitree 0.3.3 py_2
astunparse 1.6.3 py_0
async-timeout 3.0.1 py37h06a4308_0
async_generator 1.10 py37h28b3542_0
attrs 20.2.0 pypi_0 pypi
babel 2.9.0 pypi_0 pypi
backcall 0.2.0 pyhd3eb1b0_0
backrefs 5.0.1 pypi_0 pypi
bagit 1.8.1 pypi_0 pypi
beautifulsoup4 4.9.3 pyha847dfd_0
black 20.8b1 pypi_0 pypi
blas 1.0 mkl
bleach 3.3.0 pyhd3eb1b0_0
blinker 1.4 py37h06a4308_0
bokeh 2.3.0 py37h06a4308_0
botocore 1.20.33 pyhd3eb1b0_1
bottleneck 1.3.2 py37heb32a55_1
bracex 2.1.1 pypi_0 pypi
branca 0.3.1 pypi_0 pypi
brotlipy 0.7.0 py37h27cfd23_1003
bzip2 1.0.8 h7b6447c_0
c-ares 1.17.1 h27cfd23_0
ca-certificates 2021.1.19 h06a4308_1
cachecontrol 0.11.7 pypi_0 pypi
cachetools 4.2.1 pyhd3eb1b0_0
calamus 0.3.7 pypi_0 pypi
cdsapi 0.5.1 pypi_0 pypi
certifi 2020.12.5 py37h06a4308_0
cffi 1.14.5 py37h261ae71_0
cfgrib 0.9.8.5 pyhd8ed1ab_0 conda-forge
cftime 1.4.1 py37h6323ea4_0
chardet 3.0.4 py37h06a4308_1003
click 7.1.2 pyhd3eb1b0_0
click-completion 0.5.2 pypi_0 pypi
click-plugins 1.1.1 pypi_0 pypi
climetlab 0.8.0 pypi_0 pypi
climetlab-s2s-ai-challenge 0.6.7 pypi_0 pypi
climetlab-s2s-ai-competition 0.3.7 pypi_0 pypi
cloudpickle 1.6.0 py_0
colorama 0.4.4 pypi_0 pypi
coloredlogs 15.0 pypi_0 pypi
commonmark 0.9.1 pypi_0 pypi
configargparse 1.4 pypi_0 pypi
coverage 5.5 py37h27cfd23_2
cryptography 3.4.6 py37hd23ed53_0
curl 7.71.1 hbc83047_1
cwlgen 0.4.2 pypi_0 pypi
cwltool 3.0.20210319143721 pypi_0 pypi
cycler 0.10.0 py37_0
cython 0.29.22 py37h2531618_0
cytoolz 0.11.0 py37h7b6447c_0
dask 2021.3.0 pypi_0 pypi
dask-labextension 5.0.1 pypi_0 pypi
dbus 1.13.18 hb2f20db_0
decorator 4.4.2 pyhd3eb1b0_0
defusedxml 0.7.1 pyhd3eb1b0_0
distributed 2021.3.0 py37h06a4308_0
docopt 0.6.2 py37h06a4308_0
eccodes 1.2.0 pypi_0 pypi
ecmwf-api-client 1.6.1 pypi_0 pypi
ecmwflibs 0.2.3 pypi_0 pypi
entrypoints 0.3 py37_0
environ-config 20.1.0 pypi_0 pypi
expat 2.2.10 he6710b0_2
fasteners 0.16 pyhd3eb1b0_0
fastprogress 1.0.0 py_0 conda-forge
filelock 3.0.12 pypi_0 pypi
folium 0.12.1 pypi_0 pypi
fontconfig 2.13.1 h6c09931_0
freetype 2.10.4 h5ab3b9f_0
frozendict 1.2 pypi_0 pypi
fsspec 0.8.7 pyhd3eb1b0_0
gast 0.4.0 py_0
gitdb 4.0.6 pypi_0 pypi
gitpython 3.1.12 pypi_0 pypi
glib 2.67.4 h36276a3_1
google-auth 1.28.0 pyhd3eb1b0_0
google-auth-oauthlib 0.4.3 pyhd3eb1b0_0
google-pasta 0.2.0 py_0
grpcio 1.36.1 py37h2157cd5_1
gst-plugins-base 1.14.0 h8213a91_2
gstreamer 1.14.0 h28cd5cc_2
h5netcdf 0.10.0 pyhd8ed1ab_0 conda-forge
h5py 2.10.0 py37h7918eee_0
hdf4 4.2.13 h3ca952b_2
hdf5 1.10.4 hb1b8bf9_0
heapdict 1.0.1 py_0
humanfriendly 9.1 pypi_0 pypi
humanize 2.6.0 pypi_0 pypi
icu 58.2 he6710b0_3
idna 2.10 pyhd3eb1b0_0
importlib-metadata 3.7.3 py37h06a4308_1
importlib_metadata 3.7.3 hd3eb1b0_1
intake 0.6.2 pyhd3eb1b0_0
intake-esm 2020.8.15 py_0 conda-forge
intake-xarray 0.5.0 pyhd3eb1b0_0
intel-openmp 2020.2 254
ipykernel 5.3.4 py37h5ca1d4c_0
ipython 7.21.0 py37hb070fc8_0
ipython_genutils 0.2.0 py_1 conda-forge
isodate 0.6.0 pypi_0 pypi
jasper 1.900.1 hd497a04_4
jedi 0.17.2 py37h06a4308_1
jinja2 2.11.3 pyhd3eb1b0_0
jmespath 0.10.0 py_0
joblib 1.0.1 pyhd3eb1b0_0
jpeg 9d h36c2ea0_0 conda-forge
json5 0.9.5 pypi_0 pypi
jsonschema 3.2.0 py_2
jupyter-packaging 0.7.12 pypi_0 pypi
jupyter-server 1.5.1 pypi_0 pypi
jupyter-server-proxy 3.0.2 pypi_0 pypi
jupyter_client 6.1.12 pyhd8ed1ab_0 conda-forge
jupyter_core 4.7.1 py37h89c1867_0 conda-forge
jupyterlab 3.0.12 pypi_0 pypi
jupyterlab-server 2.3.0 pypi_0 pypi
jupyterlab_pygments 0.1.2 py_0
keras-preprocessing 1.1.2 pyhd3eb1b0_0
kiwisolver 1.3.1 py37h2531618_0
krb5 1.18.2 h173b8e3_0
lazy-object-proxy 1.6.0 pypi_0 pypi
lcms2 2.11 h396b838_0
ld_impl_linux-64 2.33.1 h53a641e_7
libaec 1.0.4 he6710b0_1
libcurl 7.71.1 h20c2e04_1
libedit 3.1.20210216 h27cfd23_1
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libllvm10 10.0.1 hbcb73fb_5
libnetcdf 4.6.2 hbdf4f91_1001 conda-forge
libpng 1.6.37 hbc83047_0
libprotobuf 3.14.0 h8c45485_0
libsodium 1.0.18 h36c2ea0_1 conda-forge
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.2.0 h85742a9_0
libuuid 1.0.3 h1bed415_2
libwebp-base 1.2.0 h27cfd23_0
libxcb 1.14 h7b6447c_0
libxml2 2.9.10 hb55368b_3
llvmlite 0.36.0 py37h612dafd_4
locket 0.2.1 py37h06a4308_1
lockfile 0.12.2 pypi_0 pypi
lxml 4.6.3 pypi_0 pypi
lz4-c 1.9.3 h2531618_0
magics 1.5.6 pypi_0 pypi
markdown 3.3.4 py37h06a4308_0
markupsafe 1.1.1 py37h14c3975_1
marshmallow 3.10.0 pypi_0 pypi
matplotlib 3.3.4 py37h06a4308_0
matplotlib-base 3.3.4 py37h62a2d02_0
mistune 0.8.4 py37h14c3975_1001
mkl 2020.2 256
mkl-service 2.3.0 py37he8ac12f_0
mkl_fft 1.3.0 py37h54f3939_0
mkl_random 1.1.1 py37h0573a6f_0
monotonic 1.5 py_0
msgpack-python 1.0.2 py37hff7bd54_1
multidict 5.1.0 py37h27cfd23_2
mypy-extensions 0.4.3 pypi_0 pypi
nb-black 1.0.7 pypi_0 pypi
nb_conda_kernels 2.3.1 py37h06a4308_0
nbclassic 0.2.6 pypi_0 pypi
nbclient 0.5.3 pyhd3eb1b0_0
nbconvert 6.0.7 py37_0
nbformat 5.1.2 pyhd3eb1b0_1
ncurses 6.2 he6710b0_1
ndg-httpsclient 0.5.1 pypi_0 pypi
nest-asyncio 1.5.1 pyhd3eb1b0_0
netcdf4 1.5.1 py37had58050_0 conda-forge
networkx 2.5 pypi_0 pypi
notebook 6.3.0 py37h06a4308_0
numba 0.53.0 py37ha9443f7_0
numcodecs 0.7.3 py37h2531618_0
numpy 1.19.2 py37h54aff64_0
numpy-base 1.19.2 py37hfa32c7d_0
oauthlib 3.1.0 py_0
olefile 0.46 py37_0
openssl 1.1.1k h27cfd23_0
opt_einsum 3.1.0 py_0
owlrl 5.2.1 pypi_0 pypi
packaging 20.9 pyhd3eb1b0_0
pandas 1.2.3 py37ha9443f7_0
pandoc 2.12 h06a4308_0
pandocfilters 1.4.3 py37h06a4308_1
parso 0.7.0 py_0
partd 1.1.0 py_0
pathspec 0.8.0 pypi_0 pypi
patool 1.12 pypi_0 pypi
pcre 8.44 he6710b0_0
pdbufr 0.8.2 pypi_0 pypi
pexpect 4.8.0 pyhd3eb1b0_3
pickleshare 0.7.5 pyhd3eb1b0_1003
pillow 8.1.2 py37he98fc37_0
pip 21.0.1 py37h06a4308_0
pluggy 0.13.1 pypi_0 pypi
portalocker 2.2.1 pypi_0 pypi
prometheus_client 0.9.0 pyhd3eb1b0_0
prompt-toolkit 3.0.17 pyh06a4308_0
properscoring 0.1 py_0 conda-forge
protobuf 3.14.0 py37h2531618_1
prov 1.5.1 pypi_0 pypi
psutil 5.7.2 pypi_0 pypi
ptyprocess 0.7.0 pyhd3eb1b0_2
pyasn1 0.4.8 py_0
pyasn1-modules 0.2.8 py_0
pycparser 2.20 py_2
pydap 3.2.2 pyh9f0ad1d_1001 conda-forge
pydot 1.4.2 pypi_0 pypi
pygments 2.8.1 pyhd3eb1b0_0
pyjwt 2.0.0 pypi_0 pypi
pyld 2.0.3 pypi_0 pypi
pyodc 1.0.3 pypi_0 pypi
pyopenssl 19.1.0 pypi_0 pypi
pyparsing 2.4.7 pyhd3eb1b0_0
pyqt 5.9.2 py37h05f1152_2
pyrsistent 0.17.3 py37h7b6447c_0
pyshacl 0.11.3.post1 pypi_0 pypi
pysocks 1.7.1 py37_1
python 3.7.10 hdb3f193_0
python-dateutil 2.8.1 pyhd3eb1b0_0
python-editor 1.0.4 pypi_0 pypi
python-flatbuffers 1.12 pyhd3eb1b0_0
python_abi 3.7 1_cp37m conda-forge
pytz 2021.1 pyhd3eb1b0_0
pyyaml 5.3.1 pypi_0 pypi
pyzmq 19.0.2 py37hac76be4_2 conda-forge
qt 5.9.7 h5867ecd_1
rdflib 5.0.0 pypi_0 pypi
rdflib-jsonld 0.5.0 pypi_0 pypi
readline 8.1 h27cfd23_0
rechunker 0.3.3 pypi_0 pypi
regex 2021.3.17 pypi_0 pypi
renku 0.14.1 pypi_0 pypi
requests 2.24.0 pypi_0 pypi
requests-oauthlib 1.3.0 py_0
rich 9.3.0 pypi_0 pypi
rsa 4.7.2 pyhd3eb1b0_1
ruamel-yaml 0.16.5 pypi_0 pypi
ruamel-yaml-clib 0.2.2 pypi_0 pypi
s3fs 0.5.2 pyhd3eb1b0_0
schema-salad 7.1.20210316164414 pypi_0 pypi
scikit-learn 0.24.1 py37ha9443f7_0
scipy 1.6.1 py37h91f5cce_0
send2trash 1.5.0 pyhd3eb1b0_1
setuptools 52.0.0 py37h06a4308_0
setuptools-scm 4.1.2 pypi_0 pypi
shellescape 3.4.1 pypi_0 pypi
shellingham 1.4.0 pypi_0 pypi
simpervisor 0.4 pypi_0 pypi
sip 4.19.8 py37hf484d3e_0
six 1.15.0 py37h06a4308_0
smmap 3.0.5 pypi_0 pypi
sniffio 1.2.0 pypi_0 pypi
sortedcontainers 2.3.0 pyhd3eb1b0_0
soupsieve 2.2.1 pyhd3eb1b0_0
sqlite 3.35.2 hdfb4753_0
tabulate 0.8.7 pypi_0 pypi
tbb 2020.3 hfd86e86_0
tblib 1.7.0 py_0
tensorboard 2.4.0 pyhc547734_0
tensorboard-plugin-wit 1.6.0 py_0
tensorflow 2.4.1 mkl_py37h2d14ff2_0
tensorflow-base 2.4.1 mkl_py37h43e0292_0
tensorflow-estimator 2.4.1 pyheb71bc4_0
termcolor 1.1.0 py37h06a4308_1
terminado 0.9.3 py37h06a4308_0
testpath 0.4.4 pyhd3eb1b0_0
threadpoolctl 2.1.0 pyh5ca1d4c_0
tk 8.6.10 hbc83047_0
toml 0.10.2 pypi_0 pypi
toolz 0.11.1 pyhd3eb1b0_0
tornado 6.1 py37h27cfd23_0
tqdm 4.48.2 pypi_0 pypi
traitlets 5.0.5 py_0 conda-forge
typed-ast 1.4.2 pypi_0 pypi
typing-extensions 3.7.4.3 hd3eb1b0_0
typing_extensions 3.7.4.3 pyh06a4308_0
urllib3 1.25.11 pypi_0 pypi
wcmatch 6.1 pypi_0 pypi
wcwidth 0.2.5 py_0
webencodings 0.5.1 py37_1
webob 1.8.7 pyhd3eb1b0_0
werkzeug 1.0.1 pyhd3eb1b0_0
wheel 0.36.2 pyhd3eb1b0_0
wrapt 1.12.1 py37h7b6447c_1
xarray 0.17.0 pyhd3eb1b0_0
xhistogram 0.1.2 pyhd8ed1ab_0 conda-forge
xskillscore 0.0.20 pypi_0 pypi
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
yarl 1.6.3 py37h27cfd23_0
zarr 2.6.1 pyhd3eb1b0_0
zeromq 4.3.4 h2531618_0
zict 2.0.0 pyhd3eb1b0_0
zipp 3.4.1 pyhd3eb1b0_0
zlib 1.2.11 h7b6447c_3
zstd 1.4.5 h9ceee32_0
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Create biweekly renku datasets from `climetlab-s2s-ai-challenge`
%% Cell type:markdown id: tags:
Goal:
- Create biweekly renku datasets from [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge).
- These renku datasets are then used in notebooks:
  - `ML_train_and_predict.ipynb` to train the ML model and do ML-based predictions
  - `RPSS_verification.ipynb` to calculate RPSS of the ML model
  - `mean_bias_reduction.ipynb` to remove the mean bias
Requirements:
- [`climetlab`](https://github.com/ecmwf/climetlab)
- [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge)
- S2S and CPC observations uploaded on [European Weather Cloud (EWC)](https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/0.3.0/netcdf/index.html)
Output: [renku dataset](https://renku-python.readthedocs.io/en/latest/commands.html#module-renku.cli.dataset) `s2s-ai-challenge`
- observations
  - deterministic:
    - `hindcast-like-observations_2000-2019_biweekly_deterministic.zarr`
    - `forecast-like-observations_2020_biweekly_deterministic.zarr`
  - edges:
    - `hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc`
  - probabilistic:
    - `hindcast-like-observations_2000-2019_biweekly_terciled.zarr`
    - `forecast-like-observations_2020_biweekly_terciled.nc`
- forecasts/hindcasts
  - deterministic:
    - `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
    - `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
    - more models could be added
- benchmark:
  - probabilistic:
    - `ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc`
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
import xarray as xr
import xskillscore as xs
import pandas as pd
import climetlab_s2s_ai_challenge
import climetlab as cml
print(f'Climetlab version : {cml.__version__}')
print(f'Climetlab-s2s-ai-challenge plugin version : {climetlab_s2s_ai_challenge.__version__}')
xr.set_options(keep_attrs=True)
xr.set_options(display_style='text')
```
%% Output
WARNING: ecmwflibs universal: cannot find a library called MagPlus
Magics library could not be found
Climetlab version : 0.8.6
Climetlab-s2s-ai-challenge plugin version : 0.8.0
<xarray.core.options.set_options at 0x2b51c148f590>
%% Cell type:code id: tags:
``` python
# caching path for climetlab
cache_path = "/work/mh0727/m300524/S2S_AI/cache4" # set your own path
cml.settings.set("cache-directory", cache_path)
```
%% Cell type:code id: tags:
``` python
cache_path = "../data"
```
%% Cell type:markdown id: tags:
# Download and cache
Download all files for the observations, forecast and hindcast.
%% Cell type:code id: tags:
``` python
# shortcut
from scripts import download
#download()
```
%% Cell type:markdown id: tags:
## hindcast and forecast `input`
%% Cell type:code id: tags:
``` python
# starting dates forecast_time in 2020
dates = xr.cftime_range(start='20200102',freq='7D', periods=53).strftime('%Y%m%d').to_list()
forecast_dataset_labels = ['training-input','test-input'] # ML community
# equiv to
forecast_dataset_labels = ['hindcast-input','forecast-input'] # NWP community
varlist_forecast = ['tp','t2m'] # can add more
center_list = ['ecmwf'] # 'ncep', 'eccc'
```
%% Cell type:code id: tags:
``` python
%%time
# takes ~10-30 min to download for one model and one variable, depending on the number of model realizations
# and download settings https://climetlab.readthedocs.io/en/latest/guide/settings.html
for center in center_list:
    for ds in forecast_dataset_labels:
        cml.load_dataset(f"s2s-ai-challenge-{ds}", origin=center, parameter=varlist_forecast, format='netcdf').to_xarray()
```
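%% Cell type:markdown id: tags:
Download speed depends on the climetlab settings linked above. A small sketch of tuning them before the loop, assuming the `number-of-download-threads` setting documented for climetlab (verify against your installed version):
%% Cell type:code id: tags:
``` python
# sketch only: the settings key is an assumption, check the climetlab settings docs
import climetlab as cml

cml.settings.set("number-of-download-threads", 5)  # parallelize HTTP downloads
print(cml.settings.get("number-of-download-threads"))
```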
%% Cell type:markdown id: tags:
## observations `output-reference`
%% Cell type:code id: tags:
``` python
obs_dataset_labels = ['training-output-reference','test-output-reference'] # ML community
# equiv to
obs_dataset_labels = ['hindcast-like-observations','forecast-like-observations'] # NWP community
varlist_obs = ['tp', 't2m']
```
%% Cell type:code id: tags:
``` python
%%time
# takes ~10 min to download
for ds in obs_dataset_labels:
    print(ds)
    # only netcdf, no format choice
    cml.load_dataset(f"s2s-ai-challenge-{ds}", date=dates, parameter=varlist_obs).to_xarray()
```
%% Cell type:code id: tags:
``` python
# download obs_time to create output-reference/observations for models other than ecmwf and eccc,
# i.e. ncep or any S2S or SubX model
obs_time = cml.load_dataset(f"s2s-ai-challenge-observations", parameter=['t2m', 'pr']).to_xarray()
```
%% Cell type:markdown id: tags:
# create bi-weekly aggregates
%% Cell type:code id: tags:
``` python
from scripts import aggregate_biweekly, ensure_attributes
#aggregate_biweekly??
```
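%% Cell type:markdown id: tags:
`aggregate_biweekly` is imported from `scripts` and not shown here. As a rough sketch of the semantics documented in the `lead_time` attributes further below (not the actual implementation): the week 3-4 aggregate starts at day 14, t2m is averaged over days 14-27, and tp (accumulated since initialization) is day 28 minus day 14; all names here are illustrative only.
%% Cell type:code id: tags:
``` python
# sketch only: week 3-4 biweekly aggregation semantics, not scripts.aggregate_biweekly
import pandas as pd
import xarray as xr

def biweekly_week34_sketch(ds_daily):
    """ds_daily: daily fields with a 'lead_time' timedelta dimension."""
    days = [pd.Timedelta(f'{d} d') for d in range(14, 28)]
    out = xr.Dataset()
    if 't2m' in ds_daily:
        # temperature: mean over days 14..27
        out['t2m'] = ds_daily['t2m'].sel(lead_time=days).mean('lead_time')
    if 'tp' in ds_daily:
        # precipitation is accumulated since initialization: day 28 minus day 14
        out['tp'] = ds_daily['tp'].sel(lead_time=pd.Timedelta('28 d')) - ds_daily['tp'].sel(lead_time=pd.Timedelta('14 d'))
    # label the aggregate with the first day of its window
    return out.assign_coords(lead_time=pd.Timedelta('14 d'))
```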
%% Cell type:code id: tags:
``` python
for c, center in enumerate(center_list): # forecast centers (could also take models)
    for dsl in obs_dataset_labels: # + forecast_dataset_labels: # climetlab dataset labels
        for p, parameter in enumerate(varlist_forecast): # variables
            if c != 0 and 'observation' in dsl: # only do once for observations
                continue
            print(f"datasetlabel: {dsl}, center: {center}, parameter: {parameter}")
            if 'input' in dsl:
                ds = cml.load_dataset(f"s2s-ai-challenge-{dsl}", origin=center, parameter=parameter, format='netcdf').to_xarray()
            elif 'observation' in dsl: # obs only netcdf, no choice
                if parameter not in ['t2m', 'tp']:
                    continue
                ds = cml.load_dataset(f"s2s-ai-challenge-{dsl}", parameter=parameter, date=dates).to_xarray()
            if p == 0:
                ds_biweekly = ds.map(aggregate_biweekly)
            else:
                ds_biweekly[parameter] = ds.map(aggregate_biweekly)[parameter]
        ds_biweekly = ds_biweekly.map(ensure_attributes, biweekly=True)
        ds_biweekly = ds_biweekly.sortby('forecast_time')
        if 'test' in dsl:
            ds_biweekly = ds_biweekly.chunk('auto')
        else:
            ds_biweekly = ds_biweekly.chunk({'forecast_time':'auto','lead_time':-1,'longitude':-1,'latitude':-1})
        if 'hindcast' in dsl:
            time = f'{int(ds_biweekly.forecast_time.dt.year.min())}-{int(ds_biweekly.forecast_time.dt.year.max())}'
            if 'input' in dsl:
                name = f'{center}_{dsl}'
            elif 'observations' in dsl:
                name = dsl
        elif 'forecast' in dsl:
            time = '2020'
            if 'input' in dsl:
                name = f'{center}_{dsl}'
            elif 'observations' in dsl:
                name = dsl
        else:
            assert False
        # pattern: {model_if_not_observations}{observations/forecast/hindcast}_{time}_biweekly_deterministic.zarr
        zp = f'{cache_path}/{name}_{time}_biweekly_deterministic.zarr'
        ds_biweekly.attrs.update({'postprocessed_by':'https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/renku_datasets_biweekly.ipynb'})
        print(f'save to: {zp}')
        ds_biweekly.astype('float32').to_zarr(zp, consolidated=True, mode='w')
```
%% Output
datasetlabel: hindcast-like-observations, center: ecmwf, parameter: tp
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data. This dataset has been dowloaded from IRIDL. By downloading this data you also agree to the terms and conditions defined at https://iridl.ldeo.columbia.edu.
WARNING: ecmwflibs universal: found eccodes at /work/mh0727/m300524/conda-envs/s2s-ai/lib/libeccodes.so
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.18.0
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing with an out-of-order index is generating 20 times more chunks
return self.array[key]
datasetlabel: hindcast-like-observations, center: ecmwf, parameter: t2m
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
... array[indexer]
To avoid creating the large chunks, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
... array[indexer]
return self.array[key]
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
... array[indexer]
To avoid creating the large chunks, set the option
>>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
... array[indexer]
return self.array[key]
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/indexing.py:1379: PerformanceWarning: Slicing with an out-of-order index is generating 20 times more chunks
return self.array[key]
save to: ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
datasetlabel: forecast-like-observations, center: ecmwf, parameter: tp
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data. This dataset has been dowloaded from IRIDL. By downloading this data you also agree to the terms and conditions defined at https://iridl.ldeo.columbia.edu.
datasetlabel: forecast-like-observations, center: ecmwf, parameter: t2m
save to: ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
%% Cell type:markdown id: tags:
## add to `renku` dataset `s2s-ai-challenge`
%% Cell type:code id: tags:
``` python
# observations as hindcast
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f"{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr", consolidated=True)
print(obs_2000_2019.sizes,'\n',obs_2000_2019.coords,'\n', obs_2000_2019.nbytes/1e6,'MB')
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 1060, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
492.546744 MB
%% Cell type:code id: tags:
``` python
# observations as forecast
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr(f"{cache_path}/forecast-like-observations_2020_biweekly_deterministic.zarr", consolidated=True)
print(obs_2020.sizes,'\n',obs_2020.coords,'\n', obs_2020.nbytes/1e6,'MB')
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 53, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 53), meta=np.ndarray>
24.630096 MB
%% Cell type:code id: tags:
``` python
# ecmwf hindcast-input
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
hind_2000_2019 = xr.open_zarr(f"{cache_path}/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr", consolidated=True)
print(hind_2000_2019.sizes,'\n',hind_2000_2019.coords,'\n', hind_2000_2019.nbytes/1e6,'MB')
```
%% Cell type:code id: tags:
``` python
# ecmwf forecast-input
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
# for further use retrieve from git lfs
# !renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr
```
%% Cell type:code id: tags:
``` python
fct_2020 = xr.open_zarr(f"{cache_path}/ecmwf_forecast-input_2020_biweekly_deterministic.zarr", consolidated=True)
print(fct_2020.sizes,'\n',fct_2020.coords,'\n', fct_2020.nbytes/1e6,'MB')
```
%% Cell type:markdown id: tags:
# tercile edges
Create 2 tercile edges at the 1/3 and 2/3 quantiles of the 2000-2019 biweekly distribution for each week of the year.
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr', consolidated=True)
```
%% Cell type:code id: tags:
``` python
from scripts import add_year_week_coords
```
%% Cell type:code id: tags:
``` python
# add week for groupby, see https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge/-/issues/29
obs_2000_2019 = add_year_week_coords(obs_2000_2019)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2000_2019
```
%% Output
<xarray.Dataset>
Dimensions: (forecast_time: 1060, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
week (forecast_time) int64 1 2 3 4 5 6 7 ... 47 48 49 50 51 52 53
year (forecast_time) int64 2000 2000 2000 2000 ... 2019 2019 2019
Data variables:
t2m (lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(2, 530, 121, 240), meta=np.ndarray>
tp (lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(2, 530, 121, 240), meta=np.ndarray>
Attributes:
created_by_script: tools/observations/makefile
created_by_software: climetlab-s2s-ai-challenge
function: climetlab_s2s_ai_challenge.extra.forecast_like_obse...
postprocessed_by: https://renkulab.io/gitlab/aaron.spring/s2s-ai-chal...
regrid_method: conservative
source_dataset_name: NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
%% Cell type:code id: tags:
``` python
tercile_file = f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'
```
%% Cell type:code id: tags:
``` python
%%time
obs_2000_2019.chunk({'forecast_time':-1,'longitude':'auto'}).groupby('week').quantile(q=[1./3.,2./3.], dim='forecast_time').rename({'quantile':'category_edge'}).astype('float32').to_netcdf(tercile_file)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/numpy/lib/nanfunctions.py:1390: RuntimeWarning: All-NaN slice encountered
  overwrite_input, interpolation)
CPU times: user 19min 35s, sys: 8min 33s, total: 28min 9s
Wall time: 16min 44s
%% Cell type:code id: tags:
``` python
tercile_edges = xr.open_dataset(tercile_file)
tercile_edges
```
%% Output
<xarray.Dataset>
Dimensions: (category_edge: 2, latitude: 121, lead_time: 2, longitude: 240, week: 53)
Coordinates:
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* category_edge (category_edge) float64 0.3333 0.6667
* week (week) int64 1 2 3 4 5 6 7 8 9 ... 45 46 47 48 49 50 51 52 53
Data variables:
t2m (week, category_edge, lead_time, latitude, longitude) float32 ...
tp (week, category_edge, lead_time, latitude, longitude) float32 ...
Attributes:
created_by_script: tools/observations/makefile
created_by_software: climetlab-s2s-ai-challenge
function: climetlab_s2s_ai_challenge.extra.forecast_like_obse...
postprocessed_by: https://renkulab.io/gitlab/aaron.spring/s2s-ai-chal...
regrid_method: conservative
source_dataset_name: NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...
source_hosting: IRIDL
source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...
%% Cell type:code id: tags:
``` python
tercile_edges.nbytes*1e-6,'MB'
```
%% Output
(49.255184, 'MB')
%% Cell type:code id: tags:
``` python
# run renku commands from the project's root directory only
# tercile edges
#!renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
#xr.open_dataset("../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc")
```
%% Cell type:markdown id: tags:
# observations in categories
- counting how many deterministic forecast realizations fall into each category, like counting the RPS
- categorize forecast-like-observations 2020 into categories (see the sketch below)
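%% Cell type:markdown id: tags:
The categorization itself is done by `scripts.make_probabilistic` (shown in the `scripts` diff at the end of this page). A minimal sketch of the idea for a single deterministic field, with illustrative names only:
%% Cell type:code id: tags:
``` python
# sketch only: one-hot tercile categories from two edges, not scripts.make_probabilistic
import xarray as xr

def categorize_sketch(ds, edges):
    """ds: deterministic field; edges: tercile edges with a 'category_edge' dim of size 2."""
    lower = edges.isel(category_edge=0, drop=True)
    upper = edges.isel(category_edge=1, drop=True)
    bn = ds < lower                    # below normal
    n = (ds >= lower) & (ds < upper)   # near normal
    an = ds >= upper                   # above normal
    # stack into probabilities that add up to 1; for observations these are one-hot
    return xr.concat([bn, n, an], 'category').assign_coords(
        category=['below normal', 'near normal', 'above normal']).astype('float')
```
For ensemble forecasts, the same comparison is applied per `realization` and averaged over members, so each category holds the fraction of members falling into it.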
%% Cell type:code id: tags:
``` python
obs_2020 = xr.open_zarr(f'{cache_path}/forecast-like-observations_2020_biweekly_deterministic.zarr', consolidated=True)
obs_2020.sizes
```
%% Output
Frozen(SortedKeysDict({'forecast_time': 53, 'latitude': 121, 'lead_time': 2, 'longitude': 240}))
%% Cell type:code id: tags:
``` python
# create a mask for land grid cells
mask = obs_2020.std(['lead_time','forecast_time']).notnull()
```
%% Cell type:code id: tags:
``` python
# mask.to_array().plot(col='variable')
```
%% Cell type:code id: tags:
``` python
# total precipitation in arid regions is masked
# Frederic Vitart suggested by email: "Based on your map we could mask all the areas where the lower tercile boundary is lower than 0.1 mm"
# we are using a dry mask as in https://doi.org/10.1175/MWR-D-17-0092.1
th = 0.01
tp_arid_mask = tercile_edges.tp.isel(category_edge=0, lead_time=0, drop=True) > th
#tp_arid_mask.where(mask.tp).plot(col='forecast_time', col_wrap=4)
#plt.suptitle(f'dry mask: week 3-4 tp 1/3 category_edge > {th} kg m-2',y=1., x=.4)
#plt.savefig('dry_mask.png')
```
%% Cell type:code id: tags:
``` python
# look into tercile edges
# tercile_edges.isel(forecast_time=0)['tp'].plot(col='lead_time',row='category_edge', robust=True)
# tercile_edges.isel(forecast_time=[0,20],category_edge=1)['tp'].plot(col='lead_time', row='forecast_time', robust=True)
```
%% Cell type:code id: tags:
``` python
# tercile_edges.tp.mean(['forecast_time']).plot(col='lead_time',row='category_edge',vmax=.5)
```
%% Cell type:markdown id: tags:
## categorize observations
%% Cell type:markdown id: tags:
### forecast 2020
%% Cell type:code id: tags:
``` python
from scripts import make_probabilistic
```
%% Cell type:code id: tags:
``` python
# tp_arid_mask.isel(week=[0,10,20,30,40]).plot(col='week')
```
%% Cell type:code id: tags:
``` python
obs_2020_p = make_probabilistic(obs_2020, tercile_edges, mask=mask)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
  FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2020_p.nbytes/1e6, 'MB'
```
%% Output
(147.75984, 'MB')
%% Cell type:code id: tags:
``` python
obs_2020_p
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 53), meta=np.ndarray>
* category (category) <U12 'below normal' 'near normal' 'above normal'
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float64 dask.array<chunksize=(1, 2, 53, 121, 240), meta=np.ndarray>
tp (category, lead_time, forecast_time, latitude, longitude) float64 dask.array<chunksize=(1, 2, 53, 121, 240), meta=np.ndarray>
%% Cell type:code id: tags:
``` python
obs_2020_p.astype('float32').to_netcdf(f'{cache_path}/forecast-like-observations_2020_biweekly_terciled.nc')
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
%% Cell type:code id: tags:
``` python
# forecast-like-observations terciled
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/forecast-like-observations_2020_biweekly_terciled.nc
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc
xr.open_dataset("../data/forecast-like-observations_2020_biweekly_terciled.nc")
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] ...
* category (category) object 'below normal' 'near normal' 'above normal'
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float32 ...
tp (category, lead_time, forecast_time, latitude, longitude) float32 ...
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
### hindcast 2000_2019
%% Cell type:code id: tags:
``` python
obs_2000_2019 = xr.open_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr', consolidated=True)
```
%% Cell type:code id: tags:
``` python
obs_2000_2019_p = make_probabilistic(obs_2000_2019, tercile_edges, mask=mask)
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/xarray/core/accessor_dt.py:383: FutureWarning: dt.weekofyear and dt.week have been deprecated. Please use dt.isocalendar().week instead.
  FutureWarning,
%% Cell type:code id: tags:
``` python
obs_2000_2019_p.nbytes/1e6, 'MB'
```
%% Output
(2955.138888, 'MB')
%% Cell type:code id: tags:
``` python
obs_2000_2019_p.astype('float32').chunk('auto').to_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_terciled.zarr', consolidated=True, mode='w')
```
%% Output
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
<xarray.backends.zarr.ZarrStore at 0x2b51c1233360>
%% Cell type:code id: tags:
``` python
# hindcast-like-observations terciled
# run renku commands from the project's root directory only
# !renku dataset add s2s-ai-challenge data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
```
%% Cell type:code id: tags:
``` python
# to use retrieve from git lfs
#!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr
xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr")
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 1060, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* category (category) <U12 'below normal' 'near normal' 'above normal'
* forecast_time (forecast_time) datetime64[ns] 2000-01-02 ... 2019-12-31
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (lead_time, forecast_time) datetime64[ns] dask.array<chunksize=(2, 1060), meta=np.ndarray>
Data variables:
t2m (category, lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(1, 2, 530, 121, 240), meta=np.ndarray>
tp (category, lead_time, forecast_time, latitude, longitude) float32 dask.array<chunksize=(1, 2, 530, 121, 240), meta=np.ndarray>
%% Cell type:code id: tags:
``` python
# checking category frequencies
# o = xr.open_zarr("../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr")
# w=0
# v='tp'
# o.sel(forecast_time=o.forecast_time.dt.dayofyear==2+7*w).sum('forecast_time', skipna=False)[v].plot(row='lead_time',col='category', levels=[5.5,6.5,7.5])
# o.sel(forecast_time=o.forecast_time.dt.dayofyear==2+7*w).sum('forecast_time', skipna=False).sum('category', skipna=False)[v].plot(row='lead_time', levels=[16.5,17.5,18.5,19.5,20.5])
```
%% Cell type:markdown id: tags:
# Benchmark
center: ECMWF
The calibration was performed using tercile boundaries from the model climatology rather than from observations. Script by Frederic Vitart.
%% Cell type:code id: tags:
``` python
bench_p = cml.load_dataset("s2s-ai-challenge-test-output-benchmark", parameter=['tp','t2m']).to_xarray()
```
%% Output
50%|█████     | 1/2 [00:00<00:00, 6.11it/s]
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
100%|██████████| 2/2 [00:00<00:00, 6.89it/s]
WARNING: ecmwflibs universal: found eccodes at /work/mh0727/m300524/conda-envs/s2s-ai/lib/libeccodes.so
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.12.3
%% Cell type:code id: tags:
``` python
bench_p['category'].attrs = {'long_name': 'tercile category probabilities', 'units': '1',
                             'description': 'Probabilities for three tercile categories. All three tercile category probabilities must add up to 1.'}
```
%% Cell type:code id: tags:
``` python
bench_p['lead_time'] = [pd.Timedelta(f"{i} d") for i in [14, 28]] # take first day of biweekly average as new coordinate
bench_p['lead_time'].attrs = {'long_name':'forecast_period', 'description': 'Forecast period is the time interval between the forecast reference time and the validity time.',
                              'aggregate': 'The pd.Timedelta corresponds to the first day of a biweekly aggregate.',
                              'week34_t2m': 'mean[day 14, 27]',
                              'week56_t2m': 'mean[day 28, 41]',
                              'week34_tp': 'day 28 minus day 14',
                              'week56_tp': 'day 42 minus day 28'}
```
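%% Cell type:markdown id: tags:
These `lead_time` values pin each biweekly aggregate to the first day of its window, so `valid_time = forecast_time + lead_time`. A quick check, assuming only pandas:
%% Cell type:code id: tags:
``` python
import pandas as pd

# first 2020 initialization plus the week 3-4 lead
print(pd.Timestamp('2020-01-02') + pd.Timedelta('14 d'))  # 2020-01-16, matching valid_time below
```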
%% Cell type:code id: tags:
``` python
bench_p = bench_p / 100 # convert percent to [0-1] probability
```
%% Cell type:code id: tags:
``` python
bench_p = bench_p.map(ensure_attributes, biweekly=True)
```
%% Output
0%|          | 0/1 [00:00<?, ?it/s]
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
100%|██████████| 1/1 [00:00<00:00, 4.34it/s]
100%|██████████| 1/1 [00:00<00:00, 4.22it/s]
%% Cell type:code id: tags:
``` python
# bench_p.isel(forecast_time=2).t2m.plot(row='lead_time', col='category')
```
%% Cell type:code id: tags:
``` python
bench_p
```
%% Output
<xarray.Dataset>
Dimensions: (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
* category (category) object 'below normal' 'near normal' 'above normal'
* forecast_time (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
* lead_time (lead_time) timedelta64[ns] 14 days 28 days
* latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (forecast_time, lead_time) datetime64[ns] 2020-01-16 ... 2...
Data variables:
tp (category, forecast_time, lead_time, latitude, longitude) float32 ...
t2m (category, forecast_time, lead_time, latitude, longitude) float32 ...
%% Cell type:code id: tags:
``` python
bench_p.astype('float32').to_netcdf('../data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc')
```
%% Cell type:code id: tags:
``` python
#!renku dataset add s2s-ai-challenge data/ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc
```
@@ -146,11 +146,25 @@ def ensure_attributes(da, biweekly=False):
    return da


def add_year_week_coords(ds):
    import numpy as np
    if 'week' not in ds.coords and 'year' not in ds.coords:
        year = ds.forecast_time.dt.year.to_index().unique()
        week = (list(np.arange(1, 54)))
        weeks = week * len(year)
        years = np.repeat(year, len(week))
        ds.coords["week"] = ("forecast_time", weeks)
        ds.coords['week'].attrs['description'] = "This week represents the number of forecast_time starting from 1 to 53. Note: This week is different from the ISO week from groupby('forecast_time.weekofyear'), see https://en.wikipedia.org/wiki/ISO_week_date and https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge/-/issues/29"
        ds.coords["year"] = ("forecast_time", years)
        ds.coords['year'].attrs['long_name'] = "calendar year"
    return ds


def make_probabilistic(ds, tercile_edges, member_dim='realization', mask=None, groupby_coord='week'):
    """Compute probabilities from ds (observations or forecasts) based on tercile_edges."""
    # broadcast
    ds = add_year_week_coords(ds)
    tercile_edges = tercile_edges.sel({groupby_coord: ds.coords[groupby_coord]})
    bn = ds < tercile_edges.isel(category_edge=0, drop=True)  # below normal
    n = (ds >= tercile_edges.isel(category_edge=0, drop=True)) & (ds < tercile_edges.isel(category_edge=1, drop=True))  # normal
    an = ds >= tercile_edges.isel(category_edge=1, drop=True)  # above normal
@@ -176,12 +190,14 @@ def make_probabilistic(ds, tercile_edges, member_dim='realization', mask=None):
        'comment': 'All three tercile category probabilities must add up to 1.',
        'variable_before_categorization': 'https://confluence.ecmwf.int/display/S2S/S2S+Surface+Air+Temperature'
    }
    if 'year' in ds_p.coords:
        del ds_p.coords['year']
    if groupby_coord in ds_p.coords:
        ds_p = ds_p.drop(groupby_coord)
    return ds_p


def skill_by_year(preds, adapt=False):
    """Returns pd.Dataframe of RPSS per year."""
    # similar verification_RPSS.ipynb
    # as scorer bot but returns a score for each year
@@ -194,44 +210,49 @@ def skill_by_year(preds):
    # from root
    #renku storage pull data/forecast-like-observations_2020_biweekly_terciled.nc
    #renku storage pull data/hindcast-like-observations_2000-2019_biweekly_terciled.nc
    cache_path = '../data'
    if 2020 in preds.forecast_time.dt.year:
        obs_p = xr.open_dataset(f'{cache_path}/forecast-like-observations_2020_biweekly_terciled.nc').sel(forecast_time=preds.forecast_time)
    else:
        obs_p = xr.open_dataset(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_terciled.zarr', engine='zarr').sel(forecast_time=preds.forecast_time)

    # ML probabilities
    fct_p = preds

    # climatology
    clim_p = xr.DataArray([1/3, 1/3, 1/3], dims='category', coords={'category': ['below normal', 'near normal', 'above normal']}).to_dataset(name='tp')
    clim_p['t2m'] = clim_p['tp']

    if adapt:
        # select only obs_p where fct_p forecasts provided
        for c in ['longitude', 'latitude', 'forecast_time', 'lead_time']:
            obs_p = obs_p.sel({c: fct_p[c]})
        obs_p = obs_p[list(fct_p.data_vars)]
        clim_p = clim_p[list(fct_p.data_vars)]
    else:
        # check inputs
        assert_predictions_2020(obs_p)
        assert_predictions_2020(fct_p)

    # rps_ML
    rps_ML = xs.rps(obs_p, fct_p, category_edges=None, dim=[], input_distributions='p').compute()
    # rps_clim
    rps_clim = xs.rps(obs_p, clim_p, category_edges=None, dim=[], input_distributions='p').compute()

    ## RPSS
    # penalize # https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/7
    expect = obs_p.sum('category')
    expect = expect.where(expect > 0.98).where(expect < 1.02)  # should be True if not all NaN
    # https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/50
    rps_ML = rps_ML.where(expect, other=2)  # assign RPS=2 where value was expected but NaN found
    # following Weigel 2007: https://doi.org/10.1175/MWR3280.1
    rpss = 1 - (rps_ML.groupby('forecast_time.year').mean() / rps_clim.groupby('forecast_time.year').mean())
    # clip
    rpss = rpss.clip(-10, 1)

    # weighted area mean
    weights = np.cos(np.deg2rad(np.abs(rpss.latitude)))