Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in Start of Rains - First Year getting "NA" #9130

Open
lilyclements opened this issue Sep 11, 2024 · 2 comments
Open

Bug in Start of Rains - First Year getting "NA" #9130

lilyclements opened this issue Sep 11, 2024 · 2 comments

Comments

@lilyclements
Copy link
Contributor

lilyclements commented Sep 11, 2024

@rdstern I wanted to check something on the start of rains dialog which I think is a bug:

(1) On the start of rains dialog, the first year is always giving NA (even if there is a start)

We are running a rolling sum of rainfall from the right. So, if am doing a rolling sum of n = 2 values, I have

9, 10, 3, 2

This becomes

NA, 19, 13, 5

This is okay except for that NA. Because, we then say that start rain is:

  • If the first rainfall value is NA, then the start of rains is NA
  • If the first rolling sum value is NA, then the start of rains is NA
  • Otherwise, take the first DOY value

From our second argument there, this means that our first year will always take that NA even if there is a start day

Is this what we want or expect? If so, then I'll stop here. But, I don't think we want the first year to always be NA!

Question 1: Why are we saying that if the first rolling sum is NA, then the start of rains is NA?
(As someone who wrote this code seven years ago, this could have just been a misunderstanding at the time on my part, or I could now be missing something major). It seems this is where both of our problems are coming from. Instead, don't we just want to look at:

  • If the first rainfall value is NA, then the start of rains is NA
  • Otherwise, take the first DOY value

This is just removing a simple line from the filter in the R Code. (solution (a))

An alternative option (b) is: If we set fill to be a value (e.g., 0), then it will set the first n - 1 values as 0.

> x <- c(3, 4, NA, 1, 2, 3)
> RcppRoll::roll_sumr(x, n = 2, fill = NA, na.rm = FALSE)
# [1] NA  7 NA NA  3  5

> RcppRoll::roll_sumr(x, n = 2, fill = 0, na.rm = FALSE)
# [1]  0  7 NA NA  3  5

This helps, but, I would like to understand why we "care" about NAs in the rolling sum anyway

(2) If the last n days of the previous year is missing, then the next year has no start date (NA)

(e.g. if the rolling sum is n = 2, then, the 366th doy on the last year being NA, so the first day on the following year is NA since it's doing a rolling sum with an NA and a value, and so it gives NA)

This is fixed by solution (a) above (but not solution (b)). But, this also leads to a side-question to potentially fix/look into even if we do do solution (a) above:

If there is a missing value in the rainfall, the rolling sum set is NA. Do we want this?
Presumably, if the rolling sum of the rainfall exceeds a threshold despite an NA in it, then we are happy to ignore the NA? Because, the NA isn't going to be negative rainfall -- if we are looking for the rolling sum of over 50mm of rain in 3 days, and we have

DAY 1: 1mm
DAY 2: 48mm
DAY 3: NA
DAY 4: 20mm

Then currently that would give: NA, NA, NA, NA. But, on day 4 we have exceeded 50mm of rainfall so we don't care what the NA value is?

x <- c(1, 48, NA, 20)

RcppRoll::roll_sumr(x, n = 3, fill = NA, na.rm = FALSE)
# [1] NA NA NA NA

RcppRoll::roll_sumr(x, n = 3, fill = NA, na.rm = TRUE)
# [1] NA NA 49 68

So: Side Question: If there is a missing value in the rainfall, the rolling sum set is NA. Do we want this?

@lilyclements lilyclements changed the title Bug in Start Rains dialog Bug in Start of Rains - First Year getting "NA" Sep 11, 2024
@rdstern
Copy link
Collaborator

rdstern commented Sep 11, 2024

@lilyclements I agree that the rolling sum is missing (for a given year/season) if it finds a missing rainfall value. But I hope the first year isn't always missing. It is missing in the first year if day 1 is also the earliest starting day. Then the rolling sum needs to start at the end of the previous year - which isn't there.

@lilyclements
Copy link
Contributor Author

@rdstern makes sense - and you're right that its only missing if the starting day is day one. Is this what we want? If you're happy with how this is all working then I'll close the issue. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants