Review by Florian Pinault
This contribution splits the full domain (the whole globe) into 24 predefined sub-domains, then relies on scikit-learn to try several classifiers with various parameters:
- multilayer perceptron (hidden_layer_sizes=(64, 128, 64)),
- logistic regression,
- k-nearest neighbors,
- random forest.
Eventually, the random forest was chosen, and its hyperparameters were extensively tuned. However, neither the decision to prefer a random forest model nor the way its parameters were tuned is described. It would be nice to know whether comparable tuning of the other statistical models would lead to similar results. The split into different regions would also be worth elaborating on: we trust the authors that the definition of the regions is based on prior knowledge of the field or other sources, with an appropriate reference. Looking at the code alone, it appears that the region splitting could have been optimized to maximize the challenge's metrics on the test dataset.
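To illustrate the kind of documentation that would address the point above, here is a minimal sketch of a model comparison and random-forest search in scikit-learn. The data, the parameter grid, and the cross-validation settings are placeholders of my own, not the authors' actual configuration; only the four model families and the MLP layer sizes come from the contribution.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for one sub-domain's training data.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Baseline comparison of the four model families the contribution tried,
# each with mostly-default parameters.
candidates = {
    "mlp": MLPClassifier(hidden_layer_sizes=(64, 128, 64), max_iter=500),
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "rf": RandomForestClassifier(random_state=0),
}
scores = {name: cross_val_score(model, X, y, cv=3).mean()
          for name, model in candidates.items()}

# Reporting a search like this (grid, CV scheme, best parameters found)
# would make the random-forest tuning reproducible.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=3,
)
grid.fit(X, y)
print(scores)
print(grid.best_params_)
```

Running the same search budget on the other three models would directly answer whether the random forest's advantage survives equal tuning.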
The contribution uses additional data regarding El Niño. These data are freely available and clearly identified, with the data committed together with their URLs (https://psl.noaa.gov/gcos_wgsp/Timeseries/Nino34, etc.). The data are also made available on a Google Drive, which is convenient. Nevertheless, a personal Google Drive may not last as long as the NOAA links or the data hosted on Renku.
The code is understandable but could be clearer, avoiding duplicated and deeply nested "if" and "for" blocks, and wildcard imports such as "from scripts import *".
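To make the suggestion concrete, here is a small before/after sketch; the names `regions`, `name`, and `data` are hypothetical stand-ins for the contribution's own variables.

```python
# Instead of `from scripts import *`, import names explicitly so the
# origin of each symbol is visible, e.g.:
#     from scripts import load_region, REGIONS   # hypothetical names

# Before: nested `if` inside a `for`, harder to scan.
def select_valid_nested(regions):
    out = []
    for r in regions:
        if r.get("name"):
            if r.get("data"):
                out.append(r)
    return out

# After: the same filter as a single comprehension.
def select_valid(regions):
    """Keep only regions that have both a name and data."""
    return [r for r in regions if r.get("name") and r.get("data")]
```

Both functions are equivalent; the flat version states the filtering condition in one place.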
I did not reproduce the results, but I feel confident that it would be easy to do so.