
As a user, I want to mitigate against bad data for particular issued datetimes #78

Open · epag opened this issue Aug 20, 2024 · 77 comments

@epag commented Aug 20, 2024


Author Name: James (James)
Original Redmine Issue: 103868, https://vlab.noaa.gov/redmine/issues/103868
Original Date: 2022-04-19


Given an evaluation of the nwm
When time-series data are requested from a source (e.g., wrds, dstore) that contain several issued times with bad data
Then I want to be able to mitigate that in wres

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T12:27:35Z


From e-mail, redacted:

Hank,

Right, they won't work with WRDS. In that case, there may not be a good mitigation. It doesn't sound as though a value constraint would work in this case, because the values are not missing or invalid values of streamflow. We do not allow constraints on issue times or valid times that are more complex than a single interval. There is probably a ticket in that.

Regarding what WRDS serves, I think we prefer dumb data services on the whole, so anything that asks them to go in there and mess with data is probably not ideal. We should be able to mitigate on our end; in general, any client that works with data from the real world needs to present options to users that can mitigate against data issues, so I think this one is on us (as well as the publisher).

Cheers,

James
On 19/04/2022 13:18, Hank Herr - NOAA Federal wrote:
> James:
>
> Thanks for the feedback.  If you know of a reasonable mitigation, let me know.  Alex runs using WRDS and I don't believe glob patterns will work with WRDS, but I could be wrong.
>
> Fernando:
>
> So the message should read 2z through 16z.  Okay.  Would this impact all NWM data, including AnA, and SRF and MRF generated with basis/reference times between 2z and 16z?  
>
> Lastly, I think we should ask WRDS to not serve the data that is impacted (i.e., remove it from its database).  It's better to serve no data than bad data, in my opinion.  Thoughts?
>
> Hank
>
>
>
> On Mon, Apr 18, 2022 at 6:04 PM Fernando Salas - NOAA Federal <SNIP> wrote:
>
>     The data issue started with the 2z reference times and ran through 16z. 17z reference times and beyond would be good. APD can speak best to the why if you'd like to include more information. It's my understanding that the bad data won't be fixed.
>
>     On Mon, Apr 18, 2022 at 3:24 PM James Brown <SNIP> wrote:
>
>         Hank,
>
>         It would be nice to have a better description than "bad" or "corrupt", but I assume you don't have one yet. Regardless, I think it gets the message across. You may want to clarify that the times are issue times and are inclusive (I assume) and that streamflow from the entire domain is potentially incorrect (I assume). We probably want to contact Alex separately and suggest any mitigations that might be able to eliminate the time-series from these issue times (e.g., perhaps we can suggest a glob pattern).
>
>         Cheers,
>
>         James
>         On 18/04/2022 21:06, Hank Herr - NOAA Federal wrote:
>>         All:
>>
>>         Below is a draft of a News item I plan to post tomorrow morning as soon as I can.  I've cc'd Fernando since he let me know about the corrupt data so that he can correct the message or provide additional information.  As an example, Fernando said that there were instances of streamflow for the Mississippi being on the order of 100 CFS.
>>
>>         Please review.  Again, I need to draft a message first thing tomorrow, since it will impact the overnight evaluations tonight. 
>>
>>         Thanks,
>>
>>         Hank
>>
>>         =====================================
>>
>>         Bad NWM Data for April 18, 2022, 3Z to 17Z 
>>
>>         The NWM output data from April 18, 2022, 3Z to 17Z, is corrupt.  This includes data acquired from the NWC dStore archive and served from WRDS.  Evaluations spanning that time period can be expected to be incorrect due to that bad data.  It's unclear at this time if and when (and how) the sources of the NWM data WRES users employ, WRDS and NWC dStore, will be "fixed".  If you have any questions, please contact Hank Herr (Hank[removed]).     

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T12:30:57Z


I suppose the most direct/obvious solution is to allow a constraint on @issuedDates@ (and probably on -@validDates@- @dates@ and @leadHours@ too) that is more complex than a single interval. In terms of declaration, perhaps the most obvious thing is to allow zero, one or more @issuedDates@ and zero, one or more @validDates@ and zero, one or more @leadHours@ (in some cases, zero is not allowed, depending on other declaration).
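
For illustration only, and assuming the existing @earliest@/@latest@ attributes of @issuedDates@, the relaxed declaration might simply repeat the element, one instance per interval; the repetition is the proposal, not current behavior, and the outer bounds here are hypothetical:

    <issuedDates earliest="2022-04-10T00:00:00Z" latest="2022-04-18T01:00:00Z" />
    <issuedDates earliest="2022-04-18T17:00:00Z" latest="2022-04-25T00:00:00Z" />

Together, these would include everything either side of the bad issued times from 2022-04-18T02:00:00Z through 2022-04-18T16:00:00Z.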

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T12:32:36Z


But, obviously, allowing something in declaration and actually supporting that declaration are two completely different things.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T12:37:54Z


As an aside, we're a bit inconsistent in how we declare the bounds, sometimes explicitly, sometimes implicitly. Default @minOccurs@ and @maxOccurs@ are 1, btw.

            <xs:element name="unit" type="xs:string" minOccurs="0" maxOccurs="1" />
            <xs:element name="unitAlias" type="unitAlias" minOccurs="0" maxOccurs="unbounded" />
            <xs:element name="featureService" type="featureService" minOccurs="0" maxOccurs="1" />
            <xs:element name="feature" type="feature" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="featureGroup" type="featurePool" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="gridSelection" type="unnamedFeature" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="leadHours" type="intBoundsType" minOccurs="0" />

            <!-- analysisDurations chooses which analysis durations to evaluate #65216, #61593 -->
            <xs:element name="analysisDurations" type="durationBoundsType" minOccurs="0" />
            <xs:element name="dates" type="dateCondition" minOccurs="0" />
            <xs:element name="issuedDates" type="dateCondition" minOccurs="0" />
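
Concretely, the relaxation discussed here would amount to something like the following in the schema, mirroring the @maxOccurs="unbounded"@ style already used for @feature@ and friends above (a sketch, not a committed change):

            <xs:element name="dates" type="dateCondition" minOccurs="0" maxOccurs="unbounded" />
            <xs:element name="issuedDates" type="dateCondition" minOccurs="0" maxOccurs="unbounded" />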

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T12:39:41Z


Not sure about the priority of this ticket (edit: or the target release). Perhaps it is much higher if there is no workaround, i.e., this period of invalid time-series data could continue to compromise nwm evaluations for weeks to come, I suppose.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T12:40:42Z


Added Alex as a watcher to this one.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-04-19T12:43:34Z


For AnA, the data is obtained based on its valid datetime, not issued datetime, so the solution will need to be implemented for @dates@, not "probably" (you mention @validDates@ as a "probably", though I think you were referring to @dates@). That is, unless I'm misunderstanding something about how the WRES obtains AnA data. The AnA data is served as a long time series from WRDS, though I think (?) WRDS is stitching together multiple sources to obtain that time series.

Agreed on the solutions in #103868-2.

Thanks for creating this ticket,

Hank

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T12:44:01Z


The @season@ constraint isn't a workaround because we only allow one of those too. I think there was another ticket on relaxing that.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T12:45:48Z


Hank wrote:

so the solution will need to be implemented for @dates@, not "probably" (you mention @validDates@ as a "probably", though I think you were referring to @dates@).

Yup, @dates@, I forgot that we don't clarify the flavor (it's implicit).

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T12:46:50Z


Yes, #51030 would probably be another solution.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T12:49:52Z


( Although a closer look at the use case for #51030 suggests that @validDatesPoolingWindow@ might be the better approach for that one (based on datetime intervals), aka #86646. )

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T12:51:29Z


-Actually, I think I like #86646 as the better solution, the most flexible and probably the easiest to implement and declare too, although our declaration language needs some work more generally.-

edit: Ah, but the goal here is to combine/filter the two periods either side of the bad data into one pool, not to separately pool the time-series either side of the bad data so, yes, a more flexible filter on the @dates@/@issuedDates@/@leadHours@ is probably the best approach.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-04-19T12:59:00Z


I'll see if I can work out a glob pattern that can help with users who use dStore. I'm not that good with glob, so it might take some time. I need a pattern that will exclude files for 4/18/2022 from 2Z to 16Z. Tricky.

Hank
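
For what it's worth, assuming NOMADS-style blob names such as @nwm.20220418/nwm.t02z.short_range.channel_rt.f001.conus.nc@ (an assumption about the dStore layout, not something confirmed here), character classes can express the good reference times (00-01Z and 17-23Z) without naming every file:

    nwm.20220418/nwm.t0[01]z.*
    nwm.20220418/nwm.t1[7-9]z.*
    nwm.20220418/nwm.t2[0-3]z.*

Other dates would match an ordinary @nwm.202204*@ style pattern, so the complication is confined to April 18 (though, as noted below, the NWM reader may no longer be file based).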

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T13:04:30Z


It may be easier to list the individual blobs, I don't know whether that is possible.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: alexander.maestre (alexander.maestre)
Original Date: 2022-04-19T13:24:20Z


James wrote:

Added Alex as a watcher to this one.

Thank you for adding me.

Checking the data...

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T13:56:20Z


James wrote:

I suppose the most direct/obvious solution is to allow a constraint on @issuedDates@ (and probably on -@validDates@- @dates@ and @leadHours@ too) that is more complex than a single interval. In terms of declaration, perhaps the most obvious thing is to allow zero, one or more @issuedDates@ and zero, one or more @validDates@ and zero, one or more @leadHours@ (in some cases, zero is not allowed, depending on other declaration).

I think the simplest thing is to not complicate the declaration any more than it already is. In this (hopefully) rare case, the workaround I favor is to curate the dataset by hand and run the evaluation using that dataset.

Hank wrote:

I'll see if I can work out a glob pattern that can help with users who use dStore. I'm not that good with glob, so it might take some time. I need a pattern that will exclude files for 4/18/2022 from 2Z to 16Z. Tricky.

The NWM reader is not file based anymore, so I don't think that would work. I think the workaround here is to curate the dataset by hand and run the evaluation against that.

A less laborious workaround is to ignore the results of any evaluation that included the bad data.

Another option (in between) is to use existing tools or update our tools that make it easy to convert NWM data into usable form (in other words, assist with the job of curating the dataset).

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T13:59:06Z


I don't think this is a rare case. It happens quite often. By total coincidence, I was just discussing exactly the same problem with the HEFS folks. So I think we need to provide appropriate tools. Messing with data is not the answer. Simple things should be simple. More complex things should be allowed to be more complex. Relaxing the @maxOccurs@ does not complicate the declaration of simple things.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T14:04:02Z


I think #86646 would be good too.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T14:05:53Z


If you have data corruption, I'm sorry, but I think that is a far bigger issue outside WRES.

If data corruption happens regularly, again, that's outside WRES.

But perhaps we can build an AI data corruption flagging tool, wouldn't that be nice?

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T14:07:08Z


Jesse wrote:

I think #86646 would be good too.

Yeah, I thought so at first, but it isn't actually the problem here, which is a filtering problem, not a pooling problem. Issue #86646 doesn't propose to add more complex intervals for pools, rather to allow an explicitly declared pool sequence. The goal in this use case is to omit some data from a pool, not to create N pools.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T14:07:59Z


Jesse wrote:

If you have data corruption, I'm sorry, but I think that is a far bigger issue outside WRES.

If data corruption happens regularly, again, that's outside WRES.

But perhaps we can build an AI data corruption flagging tool, wouldn't that be nice?

Sorry but real data = bad data sometimes, it just does.

By all means, ask the publisher to correct, as I suggested we should do.

We also need to provide tools.

Our answer cannot be "sorry, you cannot use wres".

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T14:08:31Z


If the data are corrupt, how can we even count on the issued dates to be correct? What is this so-called corruption that is so easily worked around by a machine?

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T14:09:47Z


James wrote:

Jesse wrote:

I think #86646 would be good too.

Yeah, I thought so at first, but it isn't actually the problem here, which is a filtering problem, not a pooling problem. Issue #86646 doesn't propose to add more complex intervals for pools, rather to allow an explicitly declared pool sequence. The goal in this use case is to omit some data from a pool, not to create N pools.

You say tomato I say tomato. To say "I want N pools" is necessarily to say "I do not want any pools except the N pools" meaning to exclude or omit. So it does achieve the goal.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T14:09:59Z


There is no need for AI. The simple use case here is a constraint on @issuedDates@. Incidentally, this is exactly the same situation that HEFS described today too. This is a common situation w/r to bad data; it is usually for a specific set of datetimes.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T14:10:18Z


Also if the data are corrupt, how are they readable at all?

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T14:11:09Z


Jesse wrote:

If the data are corrupt, how can we even count on the issued dates to be correct? What is this so-called corruption that is so easily worked around by a machine?

This is the exact scenario presented. The issue datetimes are fine, the data is not. You can dislike the scenario, but you don't get to redefine it.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T14:13:28Z


Perhaps we have different definitions of "corrupt", I need to go see what these data look like before commenting further. Usually when data are corrupt, you can't even read them successfully.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T14:13:56Z


Jesse wrote:

You say tomato I say tomato. To say "I want N pools" is necessarily to say "I do not want any pools except the N pools" meaning to exclude or omit. So it does achieve the goal.

It doesn't achieve the goal. The use case is a set of time-series, @{A,B,C}@, where @{A,C}@ should be placed in a pool. You cannot do that with explicit pools because we only allow one interval per time dimension in our pool descriptions, not N. The use case is not @{A}@ and @{C}@ in two separate pools.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T14:15:04Z


Jesse wrote:

Perhaps we have different definitions of "corrupt", I need to go see what these data look like before commenting further. Usually when data are corrupt, you can't even read them successfully.

As I said in the e-mail exchange, "corrupt" is not likely to be the correct term here. Regardless, readable data with wrong time-series values = reality with time-series data, it happens quite often (e.g., instrument saturation etc. etc.).

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T14:16:57Z


Ice effects in nwis data are a similar sort of problem (generally speaking, that is - by all means we should seek to understand the cause of this specific use case and what type and cause of "badness" we are dealing with), but nwis provide a nice flag for it and that is another thing we should respect (and has a separate ticket, iirc).

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T15:38:38Z


I don't know how much more time we want to waste talking about this but I need to take a break.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T15:44:43Z


I don't really object to the capability of excluding some date ranges optionally, btw. It was the sudden emergency that had me worried, and if it is an emergency capability needed, carry on.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T15:52:08Z


I will leave urgency to others, but it probably isn't very high unless Russ and Alex and Hank have a collectively strong opinion about it and it seems more important than the bajillion other items in the backlog rated high or urgent. I'm unclear about whether WPOD actually looks at these evaluations, for example.

In terms of declaring dates to include versus dates to exclude, I guess I prefer the positive/inclusive declaration (of up to N instances of @issuedDates@ without any toggle like @exclude=true@ or a new @excludeIssuedDates@), but perhaps you are not indicating a strong preference in that regard.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T15:54:55Z


James wrote:

I will leave urgency to others, but it probably isn't very high unless Russ and Alex and Hank have a collectively strong opinion about it and it seems more important than the bajillion other items in the backlog rated high or urgent. I'm unclear about whether WPOD actually looks at these evaluations, for example.

Yes, agree.

James wrote:

In terms of declaring dates to include versus dates to exclude, I guess I prefer the positive/inclusive declaration (of up to N instances of @issuedDates@ without any toggle like @exclude=true@ or a new @excludeIssuedDates@), but perhaps you are not indicating a strong preference in that regard.

Right. As long as the simple case stays simple and the complex case is optional, that is good.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-04-19T15:56:12Z


I need to process this. I do believe this may cost us users, insofar as their NWM evaluations will be invalid for quite a while, so they may be tempted to do their own thing and drop WRES on the floor. This will impact RFC users, specifically NWRFC and MARFC, at the least.

I'm tempted to go through Russ to "demand" (if we can) that the publishers do something. Remove the data from their end or add a quality flag of sorts. I don't know. Something. That would then require WRDS to react, as well, by allowing folks to ask for data based on quality, for example.

I need to eat lunch first.

Hank

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T16:03:59Z


Fwiw, I think we could express surprise, but it is probably p******g in the wind w/r to wcoss, since that is a "production" machine and there are probably very tight rules about not re-running processes ad-hoc. Based on limited experience, wcoss is already overbooked. And since we cannot presently run nwm anywhere other than wcoss, I believe, that is probably the end of that. To some extent, wres and its users have suffered and will continue to suffer weaknesses in the upstream things on which our users rely for good evaluations. We can continue to push that point, but it is what it is, and we'll probably need to mitigate on our end with speed that reflects the urgency (probably not that high, but you seem to imply differently).

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T16:07:59Z


As an aside, I think it's way more common for this sort of issue to arise with observations/measurements or related processes, rather than models/forecasting systems or related processes. It is worth finding out what happened for our own curiosity, I think. Perhaps a parameter file or initial conditions file failed to transfer or something and the model ran (edit: unexceptionally) with bad data. So there probably is a good fix in principle (unlike the problems that typically arise with measurements/obs), just not in practice.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T16:08:23Z


Hank wrote:

I need to process this. I do believe this may cost us users, insofar as their NWM evaluations will be invalid for quite a while, so they may be tempted to do their own thing and drop WRES on the floor. This will impact RFC users, specifically NWRFC and MARFC, at the least.

I am trying to understand what manual intervention the users would do outside of WRES that could not also be achieved with manual intervention using WRES because the problem is outside of both WRES and the users' tools, right?

Maybe I am still missing something. What is it that we are expected to do about an upstream data issue that is not detectable by any machine? I see the suggestion in this ticket but it is still painful for the users either way, right?

Hank wrote:

I'm tempted to go through Russ to "demand" (if we can) that the publishers do something. Remove the data from their end or add a quality flag of sorts. I don't know. Something. That would then require WRDS to react, as well, by allowing folks to ask for data based on quality, for example.

My expectation would be to reduce the frequency of this occurrence in future, not so much to go back and fix the existing data. That could mean a quality flag, that could help, yes.

What am I missing here?

Alex, help me out!

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T16:13:19Z


I guess a ten minute break was enough. I guess since it sounds urgent I am back to discussing.

If it is urgent that WRES take some action let's do it.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T16:19:47Z


James wrote:

The use case is a set of time-series, @{A,B,C}@, where @{A,C}@ should be placed in a pool. You cannot do that with explicit pools because we only allow one interval per time dimension in our pool descriptions, not N. The use case is not @{A}@ and @{C}@ in two separate pools.

Right, right. I am trying to imagine the consequences of opening the potentially-overlapping multiple @dates@ and @issuedDates@ can of worms. It doesn't seem pretty but I agree that it is the way it would probably have to be done.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T16:23:13Z


Then again, we already have a precedent for "excludes" with graphics. Also, the simple overall @issuedDates@ and simple overall @dates@ filters afford the simple overall rolling-window start/stop bookends. If you have multiples of those, that complicates where the windows start and end. In other words, the software would need a rule of "earliest" or "first declared" or whatever in order to resolve that ambiguity. Right now it is relatively simple and clear, given the complicated goal of rolling windows. Also, how to deal with overlaps: are those invalid or valid? How are they resolved? Etc.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T16:25:17Z


I suppose what the software would do is take the earliest of the earliest and latest of the latest and use that as the overall window boundaries. How that pans out for reading, ingest, retrieval, etc., there are a variety of ways to handle it.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T16:25:52Z


It's still ugly and frustrating: something bad happened, therefore everything has to get more complex and worse.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T16:27:08Z


Right, but that @excludeGraphics@ (or whatever it is called) is not a very good precedent.

If you allow both include and exclude filters then you have a consistency problem, I agree, so another reason to not allow for disagreement. If we have N filters that declare datetimes to include, then they are order invariant, they can overlap etc. Each filter simply adds more dates to include or it does not because they are already included.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-04-19T16:29:00Z


If the data were truly corrupt, meaning truly malformed, such that an error appeared when reading the data, wouldn't we want WRDS to correct that by either removing the data or repairing the data?

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T16:29:28Z


The schema gets more complex, I agree. Simple evaluations do not get more complex.

Also, I agree with your point about not rushing this through. There should be a relatively high bar to complicating the schema, I agree.

I am all for not rushing this. But if we were going to do it, then I would lean to relaxing the @maxOccurs@ and then dealing with that at pool creation time (post-ingest, post-retrieval) where it's super easy to apply a bunch of order-invariant filters.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T16:32:59Z


I suppose it means that the function that generates pool boundaries needs to be called N times too, once for every possible combination of boundaries, so that becomes more complex or, rather, iterative too.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T16:41:33Z


Those changes would need to happen in @TimeWindowGenerator@, but that is probably not too hard because those helpers already build complex pools from simple pools, taking one dimension at a time and then forming the complex things from the simpler things. So more simple building blocks would be added (respecting the N overall boundaries, instead of 1) and then the combination would happen as usual.
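
As a rough sketch of that shape (hypothetical types, not the actual @TimeWindowGenerator@ API): each time dimension contributes a list of simple intervals, and the complex windows are formed as a cross-product, one dimension at a time.

    // Sketch only: hypothetical types, not the actual TimeWindowGenerator API.
    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.List;

    class TimeWindowSketch
    {
        record Interval( Instant minimum, Instant maximum ) {}
        record TimeWindow( Interval issuedDates, Interval dates ) {}

        // One simple building block per declared interval, per dimension; the
        // complex windows are the cross-product, respecting N boundaries, not 1.
        static List<TimeWindow> combine( List<Interval> issuedDates, List<Interval> dates )
        {
            List<TimeWindow> windows = new ArrayList<>();
            for ( Interval issued : issuedDates )
            {
                for ( Interval valid : dates )
                {
                    windows.add( new TimeWindow( issued, valid ) );
                }
            }
            return windows;
        }
    }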

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T16:49:42Z


Conceptually, the hardest part is the "one big pool" situation because the basic assumption until now has been one time dimension per one time window and one time window per one pool. Now we are talking about taking that to N for the situation with "one big pool", i.e., where no explicit pools are declared, just the outer boundaries. We don't want that.

But then, we don't need to address that until post-retrieval, at filtering time, not pool boundary creation time. In other words, "one big pool" would be the union of the various intervals and then they (edit: the separate intervals) would be applied (edit: separately) as filters to the pool data, just like any other filters are applied (e.g., @values@).
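
A sketch of that filtering step, reusing the hypothetical @Interval@ record above: the "one big pool" spans the union of the declared intervals, and each interval is then applied post-retrieval as an order-invariant filter.

    // Sketch only: an event is retained if its issued time falls inside
    // ANY declared interval (bounds assumed inclusive here).
    static boolean include( Instant issuedTime, List<Interval> issuedDates )
    {
        return issuedDates.stream()
                          .anyMatch( i -> !issuedTime.isBefore( i.minimum() )
                                          && !issuedTime.isAfter( i.maximum() ) );
    }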

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: alexander.maestre (alexander.maestre)
Original Date: 2022-04-19T18:14:09Z


Hello WRES -
Here is the ROE input:
I do not see any problem at all (I am getting worried now... uuuhhh)
All the prototypes ran; I got the geojsons and checked the plots.
The prototypes just deal with the Gages II reference headwater stations (located in the upper part of the watershed not affected by nudging). I am not checking the above action anymore.

Some plots:

!Capture_20220419_01.PNG!

Extended AnA Hydrograph for a location showing a response on April 18 during the period of time we have the problem. I do not see any problems here. Model is responding to the precipitation in a magnitude similar to previous responses.

!Capture_20220419_02.PNG!

The peak error map for the extended AnA shows responses very similar to what I had seen before. Something seems to be going on in CA, but in general peaks are undersimulated. This is for sure because all the events identified in the map occurred before April 18th.

!Capture_20220419_03.PNG!

One of the CA locations shows that the elevated values in the Extended AnA signal have been going on for more than a week. Nothing different yesterday.

I also checked the continuous prototypes. In a few locations I found that the lead times smaller than 24 hours were showing something going on compared with the remaining lead times.
However, when I plot the Bias Fraction for all the locations, I do not see too many changes between the 6 hour and 48 hour lead times.

!Capture_20220419_04.PNG!

!Capture_20220419_05.PNG!

Finally, I did a map of the hourly in-event test for the Extended AnA (prototype 1) that uses WRES after ROE modifies the signal.
!Capture_20220419_06.PNG!

No problems here either.

Alex

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: alexander.maestre (alexander.maestre)
Original Date: 2022-04-19T18:15:11Z


Jesse wrote:

Hank wrote:

I need to process this. I do believe this may cost us users, insofar as their NWM evaluations will be invalid for quite a while, so they may be tempted to do their own thing and drop WRES on the floor. This will impact RFC users, specifically NWRFC and MARFC, at the least.

I am trying to understand what manual intervention the users would do outside of WRES that could not also be achieved with manual intervention using WRES because the problem is outside of both WRES and the users' tools, right?

Maybe I am still missing something. What is it that we are expected to do about an upstream data issue that is not detectable by any machine? I see the suggestion in this ticket but it is still painful for the users either way, right?

Hank wrote:

I'm tempted to go through Russ to "demand" (if we can) that the publishers do something. Remove the data from their end or add a quality flag of sorts. I don't know. Something. That would then require WRDS to react, as well, by allowing folks to ask for data based on quality, for example.

My expectation would be to reduce the frequency of this occurrence in future, not so much to go back and fix the existing data. That could mean a quality flag, that could help, yes.

What am I missing here?

Alex, help me out!

Hey Jesse!
Catching up in the total entries on this ticket.

Ready to help.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T18:19:56Z


Alex, are you sure you're looking at time-series within the interval edit: of issued times [2022-04-18T02:00:00Z, 2022-04-18T16:00:00Z], as I believe that is the interval affected? Based on one of the images above, it seems not...

Regarding statistics, it may be hard to tell, but this is because the pool contains a lot of other valid time-series too.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-04-19T18:32:34Z


Alex:

Thanks. I'll review what you wrote more closely, but this could mean that the impact is drowned out and hard to notice when combined with data from a longer time period.

Jesse:

Jesse wrote:

Hank wrote:

I need to process this. I do believe this may cost us users, insofar as their NWM evaluations will be invalid for quite a while, so they may be tempted to do their own thing and drop WRES on the floor. This will impact RFC users, specifically NWRFC and MARFC, at the least.

I am trying to understand what manual intervention the users would do outside of WRES that could not also be achieved with manual intervention using WRES because the problem is outside of both WRES and the users' tools, right?

Maybe I am still missing something. What is it that we are expected to do about an upstream data issue that is not detectable by any machine? I see the suggestion in this ticket but it is still painful for the users either way, right?

A potential user who has their own code on the side, such as NWRFC who does some work themselves as well as using WRES, can toss out the bad data on their end through some code changes, so that their evaluations don't include it. A simple if-check looking at the dates would do it. From a software development point of view, it would be awful, but they are scientists, so their main goal will be end results, not software design. It would be very easy for them to put in such a check. Of course, if they don't have their own code and rely solely on the WRES, then this won't happen.

We offer no comparable alternative in the WRES. The only alternative appears to be having our users gather the data themselves, toss out what they don't want, reformat it into CSVs (presumably), and ship the data back using the data-direct method in an evaluation. That is a lot more work than adding an if-check. But, again, that only applies to folks who have their own alternative tools.

I don't expect us to do anything about the upstream data; I expect the upstream data folks to do something about it. But they won't, and, because our users only see the WRES, not WRDS and NWM/APD/WCOSS, they'll blame us. Or at least it will impact their use of our services.

Hank wrote:

I'm tempted to go through Russ to "demand" (if we can) that the publishers do something. Remove the data from their end or add a quality flag of sorts. I don't know. Something. That would then require WRDS to react, as well, by allowing folks to ask for data based on quality, for example.

My expectation would be to reduce the frequency of this occurrence in future, not so much to go back and fix the existing data. That could mean a quality flag, that could help, yes.

Agree about a quality flag. If they know the data is bad, they should be able to flag it, and WRDS should be able to let us know.

Thanks,

Hank

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-04-19T18:54:35Z


I'm about to talk to Russ about it (I think), but I'm wondering if I'm the only one worrying about this, and if that means I should just stop worrying about it. To me, having bad NWM data floating around for folks to use is a bad thing. Yet I hear of no one who is trying to solve this data-side. Jay reported that WPOD just talked about not using it. Okay. But what about the other users, like the RFCs? Have they been told anything?

And maybe my concern about our users blaming us instead of the data publisher, and then taking it out on us, is overblown.

Anyway, maybe Russ has a better feel for how much I should be concerned. I'll let you know. Thanks,

Hank

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: alexander.maestre (alexander.maestre)
Original Date: 2022-04-19T18:59:15Z


James wrote:

Alex, are you sure you're looking at time-series within the interval edit: of issued times [2022-04-18T02:00:00Z, 2022-04-18T16:00:00Z], as I believe that is the interval affected? Based on one of the images above, it seems not...

Regarding statistics, it may be hard to tell, but this is because the pool contains a lot of other valid time-series too.

I am checking the plots and figures again...

I see that the PNGs generated by WRES for the continuous evaluations ended on 04-19T01:15, so I am expecting the affected period of data to be included. That was the reason I was expecting something going on in the lead times of 3-12 hours, but the map shows something different.

Now, the Extended AnA is generated at 16Z but available in the d-store around 20Z. The plots I have show valid times ending on 2022-04-18T12Z, so we should have at least 10 values in that range. The only thing that I can think of is that, because the problem was solved at 16Z, we do not have the problem in the Extended AnA.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: alexander.maestre (alexander.maestre)
Original Date: 2022-04-19T19:06:09Z


Hank -

My two cents would be to have a qualifier. I do not want to create extra work, but I would like to know, for example, the periods when the water is not flowing because it is ice. I know I can request that information in NWIS. I do not know if I can obtain that information from the WRDS observations API, and I am not sure the NWM indicates periods of flow = 0 when the model considers the stream frozen.

In this particular case, I would like to continue running the ROEs as they did yesterday and this morning. I would prefer a qualifier from the output file that says something like .... Potential NWM error during the period [a..b] affecting the product you requested.
The concern that I have is that it seems we do not know exactly what the problem was. I got the impression that it was affecting the Extended AnA, but the headwater reference gages seem to be fine. Was the problem below the headwater reference gages? Was the problem only affecting the short range and medium range? All of CONUS?

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-04-19T19:34:17Z


Just finished talking to Russ. He did express significant concern over this issue, and said others have as well, so it's not just me.

The problem appears to have been a failover to WCOSS-2 that did not use the correct hot-start file. I think that means it had the wrong initial conditions, or none, but I can't say for sure. Russ was fuzzy on the details because he didn't know them well himself.

He asked Brian, via chat, if anything was going to be done about the bad data by the "publisher" of the data. Brian did not respond by the end of our meeting.

Until we can gain clarity on what if anything the data publisher will do about the bad data, and how that may impact WRDS and dStore, I don't think we should prioritize a fix for this issue. We need more information. I'll keep you all posted if I learn anything new.

Thanks!

Hank

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-04-19T19:53:16Z


Ah, yeah, I thought it would be something like that.

I don't think a QC flag works in this case. If you can add a QC flag, then you can make everything missing or omit the data or do something sensible like that. I think a QC flag only works when there is some sort of state-dependent QC procedure that can be applied, as in river icing. This was a simple process/procedural error and the fix is to avoid that situation in future by having better checks within the pipeline so that the process throws an error or recovers or does something sane, rather than propagating it silently.

@epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2023-06-05T16:31:39Z


I haven't re-read this whole thread, but I think there is at least a clear avenue for declaring multiple intervals if we want to support it (edit: in the new declaration language).

For example (supported):

reference_dates:
  minimum: 2551-03-17T00:00:00Z
  maximum: 2551-03-20T00:00:00Z

Could be supported:

reference_dates:
  - minimum: 2551-03-17T00:00:00Z
    maximum: 2551-03-20T00:00:00Z
  - minimum: 2551-03-26T00:00:00Z
    maximum: 2551-03-29T00:00:00Z

The same applies to other flavors of time interval.
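
Applied to the incident that motivated this ticket (bad issued times from 2022-04-18T02:00:00Z through 2022-04-18T16:00:00Z), and with the outer bounds chosen purely for illustration, that could look like:

reference_dates:
  - minimum: 2022-04-10T00:00:00Z
    maximum: 2022-04-18T01:00:00Z
  - minimum: 2022-04-18T17:00:00Z
    maximum: 2022-04-25T00:00:00Z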
