
Average locations, then score the result #52

Open
thodson-usgs opened this issue Jan 27, 2022 · 2 comments
@thodson-usgs

Reading Collier et al. (2018), it seems that the procedure for computing a score is as follows (using bias as an example):

  1. calculate the relative bias error at a given location (equation 13)
  2. score the relative error for that location (equation 14)
  3. compute the scalar score as the average score across all locations (equation 15)

But I believe a better procedure is:

  1. calculate the relative bias error
  2. average across all locations
  3. score the result
    (Or, when scoring a single location, just steps 1 and 3.)

Have I misread the paper? When you use the first method, tweaking \alpha in the scoring function can alter how models rank relative to one another, which isn't ideal. (A small numerical sketch of this is below.)
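
To make the concern concrete, here is a minimal numerical sketch. It assumes a generic exponential scoring function S = exp(-alpha * |eps|); the function form, the two hypothetical models, and their per-location errors are illustrative only, not taken from Collier et al. (2018) or the ILAMB code.

```python
import numpy as np

# Hypothetical per-location relative bias errors for two models
# (values chosen only to illustrate the point).
eps_a = np.array([0.2, 0.2, 1.5])   # two good sites, one poor site
eps_b = np.array([0.5, 0.5, 0.5])   # uniformly mediocre

def score(eps, alpha):
    """Assumed scoring function: S = exp(-alpha * |eps|)."""
    return np.exp(-alpha * np.abs(eps))

def score_then_average(eps, alpha):
    """Ordering as I read the paper: score each location, then average."""
    return score(eps, alpha).mean()

def average_then_score(eps, alpha):
    """Proposed ordering: average the relative error, then score it."""
    return score(np.abs(eps).mean(), alpha)

for alpha in (0.2, 1.0):
    a_wins_1 = score_then_average(eps_a, alpha) > score_then_average(eps_b, alpha)
    a_wins_2 = average_then_score(eps_a, alpha) > average_then_score(eps_b, alpha)
    print(f"alpha={alpha}: score-then-average prefers {'A' if a_wins_1 else 'B'}, "
          f"average-then-score prefers {'A' if a_wins_2 else 'B'}")
```

With these numbers, score-then-average prefers model B at alpha=0.2 but model A at alpha=1.0, whereas average-then-score prefers model B for every positive alpha, because exp is monotone in the mean error.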

@nocollier
Collaborator

nocollier commented Jan 27, 2022 via email

@thodson-usgs
Author

thodson-usgs commented Jan 27, 2022

Full disclosure: I wrote a paper on this topic: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002681
My coauthors and I were really impressed by the ILAMB system, which is why we borrowed several aspects of it in the paper, but I also noted this point about alpha there. I regret making that critique without reaching out to you first.

In that paper, we took a different tack and instead based the 'overall' score on a single objective metric. We chose MSE as a demonstration, though admittedly it falls short in many applications. We then decomposed the MSE into components like bias and variance to show how these different 'concepts' contribute to the model's overall performance. In that way, the overall score is meaningful, in the sense that the model that best approximates reality scores highest. That meaning is retained if you average across locations and then score. However, if you score and then average, the relative rankings of models may change, and the score loses some of its meaning. (A sketch of one such decomposition follows below.)
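
For concreteness, here is a minimal sketch of one exact MSE decomposition of the general kind described above; the particular split (bias, variance-difference, and correlation/"phase" terms) and the names are mine, and the breakdown in the paper may differ in detail.

```python
import numpy as np

def mse_decomposition(model, obs):
    """Split MSE into bias^2, a variance-difference term, and a phase
    (correlation) term. This split is exact:
    MSE = bias^2 + (s_m - s_o)^2 + 2 * s_m * s_o * (1 - r)."""
    err = model - obs
    s_m, s_o = model.std(), obs.std()      # population std (ddof=0)
    r = np.corrcoef(model, obs)[0, 1]      # Pearson correlation
    return {
        "mse": (err**2).mean(),
        "bias^2": err.mean()**2,
        "variance": (s_m - s_o)**2,
        "phase": 2.0 * s_m * s_o * (1.0 - r),
    }

# Synthetic demo data: the components sum back to the MSE exactly,
# so the overall metric stays objective while still being interpretable.
rng = np.random.default_rng(0)
obs = rng.normal(size=100)
model = 0.8 * obs + 0.3 + rng.normal(scale=0.2, size=100)
parts = mse_decomposition(model, obs)
assert np.isclose(parts["mse"], parts["bias^2"] + parts["variance"] + parts["phase"])
```

Because the components sum exactly to the MSE, the overall score and its interpretation stay consistent no matter how you group the terms.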

Anyway, we respect all that your team has done. I'd agree the system is useful, but I thought this detail about alpha and rankings was something to be aware of.
