# Documentation overview

All code relevant to the submission is in the folder notebooks:

- main_train_predict.ipynb: summarizes the training and prediction pipeline and contains the ticked safeguards and details on reproducibility
- CNN_train_predict.py: creates separate predictions for each lead time and variable for 5 different seeds
- helper_ml_data.py: contains functions used for optimization and training

We trained one model for each variable and lead time, i.e., four models in total (CNN_train_predict.py#L225-228, CNN_train_predict.py#train_predict_onevar_onelead). Each model is trained on a limited domain of approximately 50° x 50° over Eastern Europe (CNN_train_predict.py#L111-112, CNN_train_predict.py#domain). The model jointly predicts tercile probabilities for the 64 grid cells (8 x 8) at the center of the input domain. To obtain a global prediction, we slide this local convolutional neural network over the whole globe, using a stride of 8 grid cells in both the latitude and longitude directions (CNN_train_predict.py#L146, CNN_train_predict.py#L173, CNN_train_predict.py#slide_predict).
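The sliding step described above can be sketched as follows. This is a minimal illustration, not the code in CNN_train_predict.py: the function signature, the input patch size of 32 cells, the requirement that the grid dimensions are multiples of 8, and the boundary handling (wrap-around in longitude, zero padding in latitude) are all assumptions made for the example.

```python
import numpy as np

def slide_predict(field, local_model, in_size=32, out_size=8):
    """Tile a global grid with out_size x out_size predictions of a local model.

    field: array of shape (n_lat, n_lon, n_channels).
    local_model: hypothetical callable mapping an (in_size, in_size, n_channels)
        patch to (out_size, out_size, 3) tercile probabilities for the patch center.
    """
    n_lat, n_lon, _ = field.shape
    # Simplification for this sketch: the grid must tile exactly.
    assert n_lat % out_size == 0 and n_lon % out_size == 0
    pad = (in_size - out_size) // 2
    # Assumption: periodic padding in longitude, zero padding in latitude.
    padded = np.pad(field, ((0, 0), (pad, pad), (0, 0)), mode="wrap")
    padded = np.pad(padded, ((pad, pad), (0, 0), (0, 0)), mode="constant")
    out = np.empty((n_lat, n_lon, 3))
    # A stride equal to out_size means the output tiles do not overlap.
    for i in range(0, n_lat, out_size):
        for j in range(0, n_lon, out_size):
            patch = padded[i:i + in_size, j:j + in_size, :]
            out[i:i + out_size, j:j + out_size, :] = local_model(patch)
    return out
```

Because the stride equals the output size, every grid cell receives exactly one prediction, taken from the window in which it lies in the central 8 x 8 block.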

Our local convolutional neural network is a simple post-processing / calibration model in the sense that it uses only S2S forecasts of the corresponding target variable as input (CNN_train_predict.py#L213, CNN_train_predict.py#get_data). The two-channel input is derived by subtracting the two weekly tercile edges from the ensemble mean of the S2S target-variable forecast. These two-dimensional fields are standardized by dividing by the temporally averaged standard deviation over the input domain (CNN_train_predict.py#L105, CNN_train_predict.py#L169, helper_ml_data.py#preprocess_input). All missing values are filled with zeros (CNN_train_predict.py#L60-62, helper_ml_data.py#DataGenerator1). We used the terciled observations as training labels.
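As a rough illustration of this preprocessing, the sketch below builds the two-channel input with NumPy. The function name, the array layouts, and the use of a single scalar `std` are assumptions for the example; the actual implementation is in helper_ml_data.py#preprocess_input and may differ.

```python
import numpy as np

def make_two_channel_input(ens_mean, tercile_edges, std):
    """Build the two-channel model input (illustrative sketch).

    ens_mean: ensemble-mean forecast, shape (time, lat, lon).
    tercile_edges: lower and upper tercile edges, shape (2, time, lat, lon).
    std: scalar standard deviation used for scaling (assumed precomputed).
    Returns an array of shape (time, lat, lon, 2).
    """
    # One channel per tercile edge: forecast minus edge.
    diff = ens_mean[None, ...] - tercile_edges
    scaled = diff / std
    # Missing values (e.g. masked grid cells) are filled with zeros.
    filled = np.where(np.isnan(scaled), 0.0, scaled)
    # Move the channel axis to the end, as Keras conv layers expect by default.
    return np.moveaxis(filled, 0, -1)
```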

The model architecture is adapted from Scheuerer et al. (2020) (CNN_train_predict.py#L66-86). The model consists of a convolutional layer with 4 filters of size 3 x 3 and ELU activation, followed by a 2 x 2 max pooling layer. The output of the max pooling layer is flattened and dropout (dropout rate = 0.4) is applied. The resulting vector is passed through a dense layer with 10 nodes and ELU activation and a dense layer with 27 nodes (27 = 3 terciles * 9 basis functions) and ELU activation. The resulting vector is reshaped into a 3 x 9 matrix. The nine entries per tercile are then mapped to the 8 x 8 output domain by multiplying with a set of 9 basis functions (64 x 9 matrix). The basis functions are smooth functions with circular bounded support that are uniformly distributed over the output domain (CNN_train_predict.py#L116, helper_ml_data.py#get_basis). As a last step, we apply a softmax activation function.
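The final mapping from the 27-node dense output to tercile probabilities on the 8 x 8 grid can be sketched in NumPy as below. The `toy_basis` function is a hypothetical stand-in for helper_ml_data.py#get_basis: the bump shape, the 3 x 3 arrangement of centers, and the radius are assumptions, as is applying the softmax across the three terciles.

```python
import numpy as np

def toy_basis(out_size=8, n_side=3, radius=4.0):
    """Hypothetical compactly supported basis, shape (out_size**2, n_side**2)."""
    cells = np.arange(out_size) + 0.5
    yy, xx = np.meshgrid(cells, cells, indexing="ij")
    centers = (np.arange(n_side) + 0.5) * out_size / n_side
    columns = []
    for cy in centers:
        for cx in centers:
            d2 = ((yy - cy) ** 2 + (xx - cx) ** 2) / radius ** 2
            # Smooth bump that is exactly zero outside a circle of the given radius.
            columns.append(np.where(d2 < 1.0, (1.0 - d2) ** 2, 0.0).ravel())
    return np.stack(columns, axis=1)

def basis_head(dense_out, basis):
    """Map a (27,) dense output to (8, 8, 3) tercile probabilities."""
    coeffs = dense_out.reshape(3, -1)            # (3 terciles, 9 coefficients)
    logits = coeffs @ basis.T                    # (3, 64)
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    expo = np.exp(logits)
    probs = expo / expo.sum(axis=0, keepdims=True)       # softmax over terciles
    side = int(np.sqrt(basis.shape[0]))
    return probs.T.reshape(side, side, 3)
```

Expressing the output through a few smooth basis functions keeps the number of output parameters small (27 instead of 3 * 64) and makes the predicted probability fields spatially smooth within each tile.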

Our implementation builds on Keras. We minimize the categorical cross-entropy with label smoothing equal to 0.6 using the Adam optimizer with a learning rate of 0.0001, and stop the training after 20 epochs.
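To make the effect of label smoothing concrete, here is a small plain-Python sketch of the loss for a single grid cell. It follows the usual definition, which Keras also uses for its `label_smoothing` argument: the one-hot label is shrunk toward the uniform distribution before the cross-entropy is computed. The function names and the clipping constant are illustrative choices.

```python
import math

def smooth_labels(y_true, label_smoothing=0.6):
    """Shrink a one-hot label toward the uniform distribution."""
    k = len(y_true)
    return [y * (1.0 - label_smoothing) + label_smoothing / k for y in y_true]

def smoothed_cross_entropy(y_true, y_pred, label_smoothing=0.6):
    """Categorical cross-entropy against the smoothed label."""
    target = smooth_labels(y_true, label_smoothing)
    # Clip predictions to avoid log(0).
    return -sum(t * math.log(max(p, 1e-7)) for t, p in zip(target, y_pred))
```

With three terciles and a smoothing value of 0.6, an observed category `[0, 1, 0]` becomes the target `[0.2, 0.6, 0.2]`, so the model is penalized for overconfident predictions.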

For the hyperparameter optimization, we focused on parameters related to the size of the input and output domains, namely input domain size, output domain size, radius of the basis functions, and label smoothing. The optimization was done only for the t2m model with a lead time of 3-4 weeks on the training domain over Eastern Europe. We performed a 10-fold cross-validation, using 18 years as the training set and two consecutive years as the validation set in each fold. The optimization experiments are documented in the folder param_optimization.
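One way to construct such folds is sketched below. The exact pairing of validation years is an assumption (non-overlapping pairs starting in 2000); the function name is hypothetical and the experiments in param_optimization may split the years differently.

```python
def consecutive_year_folds(first_year=2000, last_year=2019, val_len=2):
    """Return (train_years, val_years) pairs with consecutive validation years."""
    years = list(range(first_year, last_year + 1))
    folds = []
    for start in range(0, len(years), val_len):
        val = years[start:start + val_len]
        train = [y for y in years if y not in val]
        folds.append((train, val))
    return folds

folds = consecutive_year_folds()
```

For the 20 years from 2000 to 2019 this yields 10 folds, each with 18 training years and 2 validation years.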

The final models were trained on the same domain but using all 20 years from 2000 to 2019. The final submission is the average of an ensemble of 5 predictions per variable and lead time (main_train_predict_submit.ipynb#C6, main_train_predict_submit.ipynb#C15), obtained by repeating the model estimation with different seeds (CNN_train_predict.py#L217). Each prediction was smoothed using Gaussian filtering (e.g., CNN_train_predict.py#L179).
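The seed-averaging step can be sketched as follows. This is an illustration only: the function name and array layout are assumptions, the Gaussian smoothing of the individual predictions is not shown, and the final renormalization is an added safeguard rather than something stated above.

```python
import numpy as np

def average_seed_predictions(predictions):
    """Average tercile probabilities over seeds.

    predictions: list of arrays of shape (lat, lon, 3), one per seed.
    """
    stacked = np.stack(predictions, axis=0)      # (n_seeds, lat, lon, 3)
    mean = stacked.mean(axis=0)
    # Safeguard (assumption): make sure the probabilities still sum to one.
    return mean / mean.sum(axis=-1, keepdims=True)
```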

More information on the general idea can be found in the README.