As a user, I want faster retrieval from NWIS for short-in-time evaluations #72

Open
epag opened this issue Aug 20, 2024 · 3 comments


@epag
Collaborator

epag commented Aug 20, 2024


Author Name: James (James)
Original Redmine Issue: 101014, https://vlab.noaa.gov/redmine/issues/101014
Original Date: 2022-02-02


Given a short-in-time evaluation that acquires data from NWIS
When the data is chunked in time for requests
Then the chunks should achieve a better balance of retrieval time versus data re-use


Redmine related issue(s): 108361


@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-02-02T13:50:50Z


This is a particular problem for large-in-space evaluations. While waiting a very long time for data to be acquired from NWIS in #100844, for an evaluation that spans only 10 days, I noticed messages like this:

2022-02-02T13:12:53.362+0000 [WebSource Ingest -> #82] WARN wres.io.reading.waterml.WaterMLSource - Skipping site 01052500 because multiple timeseries for variable 00060 from USGS NWIS URI https://nwis.waterservices.usgs.gov/nwis/iv?endDT=2022-01-01T00%3A00%3A00Z&format=json&parameterCd=00060&sites=01052500&startDT=2021-01-01T00%3A00%3A01Z

I want to revisit that chunking.

One alternative might be to use one-year chunks for evaluations that span one year or more and, otherwise, either the exact range required or some smaller fixed period that still promotes re-use (e.g., 3 months for evaluations longer than 6 months, else 1 month).

Obviously, this is a trade-off that adds complexity. There is nothing inherently wrong with favoring re-use for long-in-time evaluations, but the current chunking is too painful for short-in-time, large-in-space evaluations.
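The fixed-period variant of the alternative proposed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the WRES implementation: the function names are hypothetical, "one year or more" is approximated as 365 days and "longer than 6 months" as more than 183 days, and the "exact range" option is omitted. Chunks are aligned to calendar years, quarters or months so that evaluations using the same chunk length request identical chunks wherever they overlap, which is what makes re-use possible.

```python
from datetime import datetime, timedelta, timezone

# Assumed thresholds: "one year or more" ~ 365 days, "> 6 months" ~ 183 days.
ONE_YEAR = timedelta(days=365)
SIX_MONTHS = timedelta(days=183)


def chunk_months(span: timedelta) -> int:
    """Hypothetical rule: chunk length in months for an evaluation of this span."""
    if span >= ONE_YEAR:
        return 12  # calendar-year chunks, favoring re-use for long evaluations
    if span > SIX_MONTHS:
        return 3   # calendar quarters
    return 1       # calendar months


def _add_months(dt: datetime, months: int) -> datetime:
    """Advance a first-of-month datetime by a whole number of months."""
    total = dt.year * 12 + (dt.month - 1) + months
    return dt.replace(year=total // 12, month=total % 12 + 1)


def chunk_ranges(start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """Split [start, end] into calendar-aligned chunks covering the whole range."""
    months = chunk_months(end - start)
    # Begin at the calendar period (month, quarter or year) that contains `start`.
    first_month = ((start.month - 1) // months) * months + 1
    boundary = datetime(start.year, first_month, 1, tzinfo=start.tzinfo)
    chunks = []
    while boundary < end:
        nxt = _add_months(boundary, months)
        chunks.append((boundary, nxt))
        boundary = nxt
    return chunks
```

Under this sketch, a 10-day evaluation requests at most two one-month chunks per site instead of a full year, while a multi-year evaluation still receives calendar-year chunks that other long evaluations can re-use.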

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-02-02T13:55:35Z


I see a related commit:wres|cd1a3db7b6f9d971d1d3b7ce5205cef57ed307f2, which references #80554 and #86887.

I also see the main change in commit:wres|811476a207f808831555ff7ce6859529d5c428a5, which references #80554.

I will read #80554 as time allows, but the purpose of this ticket is to revisit that explicit trade-off, because I think it went too far in favor of re-use.

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-02-02T13:57:38Z


In other words, the goal is to reframe the trade-off: not to destroy performance for multi-year evaluations, but to do better for shorter ones. This will add some code complexity and may reduce data re-use overall, but it is necessary.
