First lag day is same as first forecast day #256
Hi @ampersandmcd, thank you so much for such a detailed, perfectly described issue. There are two issues here.
The key point, as you say, is that the first element of the output should not be the same as the input. I'll validate against my own dataset to confirm this is the case, but your demo looks good; it's the amendment we need to be careful with, so that the semantics of all these terms become clearer than they have been. Really interested to hear your thoughts on this, thank you so much! This is why we need power users such as yourself! 😉
Thanks for providing a detailed breakdown, @ampersandmcd! As @JimCircadian mentioned, we'll look into the first aspect. The second point @JimCircadian raised is something I've also found confusing, so some updates to the documentation, along with visuals, would be a great addition.
I've confirmed this behaviour, but I want to be sure we put in the correct fix. The terminology needs to be clearly defined for what we're doing: there are plenty of clear definitions of lag and lead, and our use of those terms and their indexing is fine, but we have the wrong data going in when considering the "forecast date", which is entirely a mess-up on my part. My understanding is that the forecast date is the forecast initialisation date: the last date we have data to provide for, so we have a lag of zero on that date, then forecast ahead from 1 to n days.

Therefore, generating a sample for 2020-12-01 should result in the targets starting from 2020-12-02, which is where the bug is? @scotthosking, I'm interested to hear whether I'm munging the terminology here, but let's put the right fix in; as @bnubald has mentioned, this needs documenting clearly, since it's always been a headache and is now causing an actual problem (though I hope we would've caught it before doing any serious work on forecasting! 😉 )
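To pin those semantics down, here is a minimal sketch of the target dates implied by the description above. This is purely illustrative, not icenet API; the names `forecast_init_date` and `n_forecast_days` are hypothetical:

```python
import datetime as dt

# Under the semantics described above, the forecast date is the
# initialisation date (the last day we have input data for, i.e. lag zero),
# and targets run from 1 to n days ahead of it.
forecast_init_date = dt.date(2020, 12, 1)
n_forecast_days = 3  # hypothetical setting, for illustration only

target_dates = [forecast_init_date + dt.timedelta(days=lead)
                for lead in range(1, n_forecast_days + 1)]
print([d.isoformat() for d in target_dates])
# ['2020-12-02', '2020-12-03', '2020-12-04']
```

In other words, the initialisation date itself never appears among the targets, which is exactly the property the bug violates.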
On the terminology: I remember being caught out by 'forecast date' before. Could it be worth switching to something less ambiguous? Is 'forecast time' standard terminology in this community? I know 'data time' and 'valid time' are used in weather forecasting to refer to these times, but 'forecast time' seems to be used less consistently from a quick glance around (appreciating that this community might use different terms).
Excellent, it's probably worth us doing that, @ots22, no doubt. This was a topic of confusion for me too. I have come across 'data time' and 'valid time' in operational forecasting scenarios, so I'll have a review and we can settle the terms once and for all, and document them appropriately.
Part of this confusion, I think, comes from differences between forecasting organisations, so we just need to settle on which approach to take. 😉
The WeatherBench documentation has a nice explainer on the difference between init time and valid time.
I think we'll stick with the init-time convention, which is what @tom-andersson, I believe, originally intended. That also means fewer changes, but it results in the notion that the forecast initialisation date we specify is in fact part of the forecast, at step zero: "Each forecast starts with the atmospheric conditions at a specific 'initialization time'." We should definitely support both, so I'll raise that as a separate issue. Just to be clear, I'll accept your change @ampersandmcd as the "quick fix" to the bug, but we'll need to add a feature to allow specifying dates according to the valid-time approach, which is the convention I think I had in mind. So this is where we are now, which is probably what was meant to be happening all along. Oops:

icenet/icenet/data/loaders/dask.py, lines 463 to 464 in f128536
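For concreteness, a toy sketch of how the two conventions relate. This is not icenet code, and the names `init_time`, `leads`, and `valid_times` are mine:

```python
import datetime as dt

# Init-time convention: a forecast is indexed by its initialisation date.
# Valid-time convention: each forecast field is indexed by the date it is
# valid for.
init_time = dt.date(2020, 12, 1)
leads = [1, 2, 3]  # forecast lead times in days

# Mapping from the init-time view to the valid-time view:
valid_times = [init_time + dt.timedelta(days=lead) for lead in leads]
print([v.isoformat() for v in valid_times])
# ['2020-12-02', '2020-12-03', '2020-12-04']

# And back: given a valid time and a lead, recover the initialisation date.
lead = 2
valid = dt.date(2020, 12, 3)
print((valid - dt.timedelta(days=lead)).isoformat())  # '2020-12-01'
```

Supporting both would essentially mean letting the user say which of these two date axes their requested dates live on.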
Howdy there! I've been running a toy experiment with one-day lag, one-day forecasts to debug my diffusion model code, and I seem to be getting predictors that are from the same day as targets. I think the issue originates from this line of code (L463 of `icenet/data/loaders/dask.py`, referenced above): when `num_channels` is 1, `n` only ever takes the value 0, and hence we are left with `forecast_date - dt.timedelta(0)`, meaning we get predictors for the same day as the target forecast date. This allows the model to "cheat" in my case, as the target `siconca` is the same as the predictor `siconca`.

Fixing this would just require L463 to be `pd.Timestamp(forecast_date - dt.timedelta(days=n+1))`, I think.

Below is a script I wrote to confirm this behaviour, and a few screenshots from my debugging pane, if they are helpful! Note in the screenshots below that `forecast_date == channel_dates`.
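A minimal, self-contained sketch of the off-by-one and the proposed fix, assuming the loop structure described above (illustrative only, not the actual loader code; the names mirror the issue text):

```python
import datetime as dt
import pandas as pd

# The real logic lives in icenet/data/loaders/dask.py; this only reproduces
# the date arithmetic at issue.
forecast_date = dt.date(2020, 12, 1)
num_channels = 1  # one-day lag, as in the toy experiment

# Buggy behaviour: n starts at 0, so the first lag channel IS the forecast
# date itself.
buggy = [pd.Timestamp(forecast_date - dt.timedelta(n))
         for n in range(num_channels)]
print(buggy)  # [Timestamp('2020-12-01 00:00:00')], same day as the target

# Proposed fix: shift each lag channel back one day so every input strictly
# precedes the forecast date.
fixed = [pd.Timestamp(forecast_date - dt.timedelta(days=n + 1))
         for n in range(num_channels)]
print(fixed)  # [Timestamp('2020-11-30 00:00:00')], the day before
```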