Checking the verification notebook again, we noticed that you first compute the RPS, then compute the RPSS for each forecast, and only afterwards take the mean over forecast_time.
Shouldn't we first take the mean of the RPS over forecast_time and then compute the RPSS?
```python
# penalize
penalize = obs_p.where(fct_p != 1, other=-10).mean('category')
rpss = rpss.where(penalize != 0, other=-10)
# clip
rpss = rpss.clip(-10, 1)
# average over all forecasts
rpss = rpss.mean('forecast_time')
```
Hi @aaron.spring, sorry for raising this issue so late, but we only noticed it today.
From Weigel et al. 2017 you can see that the RPS has to be averaged over a number of samples before computing the RPSS:
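In our notation (a paraphrase of the standard definition, not a verbatim quote from the paper), the skill score is formed only after averaging the RPS over the available samples:

```latex
\mathrm{RPSS}
  = 1 - \frac{\langle \mathrm{RPS}_{\mathrm{fc}} \rangle}{\langle \mathrm{RPS}_{\mathrm{clim}} \rangle},
\qquad
\langle \mathrm{RPS} \rangle = \frac{1}{N} \sum_{t=1}^{N} \mathrm{RPS}_t
```

where the angle brackets denote the average over the N available forecast dates (here, the 53 weeks of 2020). The notebook currently does the opposite: it forms 1 - RPS_fc(t)/RPS_clim(t) for every single forecast_time t and averages those ratios afterwards.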
We believe that the current RPSS computation makes no sense, and we will try to explain why.
A probabilistic score for a single forecast is useless; or, put differently, "a probabilistic forecast is never wrong". For instance, a high RPS for a forecast issued 01/01/2020 for weeks 3&4 may simply indicate that something the forecast considered unlikely actually happened, which does not necessarily mean the forecast is wrong. Even with good forecasts, if you collect a large number of samples (RPS values), you should expect some large values now and then, because unlikely events do occur occasionally. Taken to the limit (a very large number of samples), this means that the "single-sample RPSS" for a very unlikely event will be close to -infinity, so the average over all the scores will be biased by these few very negative values.
We think this should be fixed in the verification notebook.
To add a bit more on this: when verifying an S2S hindcast that spans many years, one can compute an RPSS for each week of the year by averaging the RPS over the years. This is useful to evaluate how skill evolves through the year (e.g. extratropical forecasts in winter tend to be more skillful). However, in this case we only have the 53 weeks of 2020 for the verification, so a single RPSS value summarizing the performance over those 53 weeks has to be computed. A "single-sample RPSS" value for a single week of a single year is meaningless.
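To illustrate the bias with a toy example (entirely synthetic numbers, not code or data from the notebook):

```python
import numpy as np

rng = np.random.default_rng(42)

n = 53                               # one RPS value per 2020 week
rps_clim = rng.uniform(0.2, 0.6, n)  # hypothetical climatology RPS values
rps_fc = 0.8 * rps_clim              # forecast slightly better on average ...
rps_fc[:3] = 1.0                     # ... except a few "surprised" forecasts
rps_clim[:3] = 0.02                  # ... on dates where climatology scored a tiny RPS

# current notebook approach: one skill score per forecast, then average
rpss_per_forecast = 1 - rps_fc / rps_clim
print(rpss_per_forecast.mean())                   # strongly negative (about -2.6)
print(np.clip(rpss_per_forecast, -10, 1).mean())  # still negative with clip(-10, 1) (about -0.4)

# proposed approach: average the RPS first, then a single skill score
print(1 - rps_fc.mean() / rps_clim.mean())        # slightly positive
```

The three outlier weeks dominate the average of the per-forecast skill scores, while the score computed from the averaged RPS reflects the forecast's overall (slight) advantage.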
Dear @sergi.bechsala,
thank you for reporting this.
Your explanations sound reasonable and you are referring to well-cited references.
Have you checked how much of a difference it makes (in terms of the numerical score) whether the mean is taken before or after applying the RPSS formula? EDIT: I did below.
If we change this, we need to decide how to penalise missing values. So far we have penalised them with -10 in rpss; for rps we need to find a positive penalty value.
Ah, but the RPS can be missing for certain 2020 weeks in the submissions, right? I see. I guess the idea is to penalize NAs? Maybe 5*<RPS_clim> would work?
Yes, the maximum RPS one can get for a single forecast occurs when you forecast 100% probability of above normal and below normal is observed, or vice versa, which produces an RPS of 2:
```python
import xarray as xr
import xskillscore as xs

# cumulative tercile probabilities; time 0: below normal observed, time 1: above normal observed
a = xr.DataArray([[1, 1, 1], [0, 0, 1]], dims=["time", "category_edge"])
# the forecasts put 100% probability in the opposite tercile at both times
b = xr.DataArray([[[0, 0, 1], [0, 0, 1]], [[1, 1, 1], [1, 1, 1]]],
                 dims=["time", "member", "category_edge"])
xs.rps(observations=a, forecasts=b, category_edges=None, dim=[])
```

```
Out[51]:
<xarray.DataArray (time: 2)>
array([2., 2.])
Dimensions without coordinates: time
```
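For completeness, the 2 follows directly from the RPS definition over the cumulative category probabilities (F_k for the forecast, O_k for the observation, with k running over the below/near/above-normal terciles); for a 100% above-normal forecast verified against a below-normal observation:

```latex
\mathrm{RPS}
  = \sum_{k=1}^{3} \left( F_k - O_k \right)^2
  = (0 - 1)^2 + (0 - 1)^2 + (1 - 1)^2
  = 2
```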
I agree with the proposal above: compute the RPSS from the mean of the RPS instead of averaging RPSS values, and penalize NaN values with an RPS of 2 (100% forecast in the wrong category).
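For what it's worth, a minimal sketch of what this change could look like, assuming rps_fc and rps_clim are the per-forecast RPS DataArrays with a forecast_time dimension (the variable names and the fillna step are our own illustration, not the notebook's final code):

```python
# penalize missing submissions with the worst possible RPS (2)
# instead of penalizing the skill score itself
rps_fc = rps_fc.fillna(2.0)

# average the RPS over all 2020 forecasts first ...
rps_fc_mean = rps_fc.mean('forecast_time')
rps_clim_mean = rps_clim.mean('forecast_time')

# ... and only then form a single skill score
rpss = 1 - rps_fc_mean / rps_clim_mean
```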