Review Florian Pinault
This contribution uses a multi-model approach. It uses several models from scikit-learn (Random Forest, Logistic Regression) and additionally considers the climatology and the ECMWF data directly as two further models. The best model is chosen by RPSS score, evaluated with cross-validation.
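To make the selection criterion concrete, here is a minimal sketch of RPSS-based model selection. It is not the authors' code: the function names, the toy data, and the use of a uniform tercile climatology as the reference forecast are assumptions made for illustration, and the candidate probabilities are assumed to be predictions on held-out cross-validation folds.

```python
import numpy as np

def rps(probs, obs_onehot):
    """Mean ranked probability score; both inputs have shape (n_samples, n_categories)."""
    cum_forecast = np.cumsum(probs, axis=1)
    cum_observed = np.cumsum(obs_onehot, axis=1)
    return float(np.mean(np.sum((cum_forecast - cum_observed) ** 2, axis=1)))

def rpss(probs, obs_onehot, ref_probs):
    """Skill relative to a reference forecast: 1 is perfect, 0 matches the reference."""
    return 1.0 - rps(probs, obs_onehot) / rps(ref_probs, obs_onehot)

def select_best(candidates, obs_onehot, ref_probs):
    """Return the name of the candidate with the highest RPSS, plus all scores."""
    scores = {name: rpss(p, obs_onehot, ref_probs) for name, p in candidates.items()}
    return max(scores, key=scores.get), scores

# Toy held-out data (hypothetical): four samples, three tercile categories.
obs = np.eye(3)[[0, 1, 2, 0]]
clim = np.full((4, 3), 1.0 / 3.0)          # assumed uniform climatology
candidates = {
    "climatology": clim,
    "sharp_model": 0.8 * obs + 0.2 * clim,  # hypothetical skilful model
}
best_name, all_scores = select_best(candidates, obs, clim)
```

In this toy case the climatology candidate scores an RPSS of zero by construction, so any candidate with positive skill is preferred over it.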
Overfitting has been clearly avoided here as the methodology is clear and well implemented.
I did not reproduce the results, but I feel confident that it would be possible to do so, provided that the required resources are available (60 GB of RAM, 120 CPUs). It is unclear whether using 120 CPUs is a hard requirement, though.
An additional comment regarding reproducibility: "We hold a copy of the training/obs datasets in our data archive. Please adjust paths in sections 2.1, 2.2, 2.3 as needed to rerun." (in https://renkulab.io/gitlab/lluis.palma/s2s-ai-challenge-bsc/-/blob/submission-ML_models/notebooks/S2S_ML_models.ipynb cell 1). To make the notebook standalone and fully reproducible, it would be nice to have more details or pointers on how to "adjust paths to the data". While this may be obvious within the context of the challenge, it may not be as clear to another researcher wishing to reproduce the results.
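One possible way to address this, given only as a sketch, is to resolve all data paths from a single configurable location at the top of the notebook. The environment variable name `S2S_DATA_DIR`, the default directory, and the helper name below are hypothetical and do not come from the submission.

```python
import os
from pathlib import Path

def resolve_data_dir(env_var="S2S_DATA_DIR", default="./data"):
    """Return the data directory from an environment variable, or a default.

    Both the variable name and the default are illustrative assumptions.
    """
    return Path(os.environ.get(env_var, default)).expanduser().resolve()

# Every later section would build its paths from this single root.
data_dir = resolve_data_dir()
```

With such a helper, a researcher outside the challenge would only need to set one variable to point sections 2.1, 2.2 and 2.3 at their own copy of the data.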
Overall, I give a positive review and I want to thank the authors for their contribution.