-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Single Monthly Values in Scalar and Generic Classes #42
base: i38-generic-var-templates
Are you sure you want to change the base?
Conversation
…nto i39-single-val-per-month
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall excellent, nice clean code.
A few minor suggested changes, which you can adopt at your discretion, except I do recommend DRYing up the monthly data date-checking code, which is copy-pasted.
R/climdexGenericScalar.R
Outdated
unique_months <- unique(format(valid_dates, "%Y-%m")) | ||
day_of_month <- as.integer(format(valid_dates, "%d")) | ||
|
||
# Check that the length of unique months matches the number of dates, ensuring only one value per month |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to require 12 months? Or is it legitimate to have fewer than that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no requirement for inputs to have multiples of 12 months. I’ve added a test to reflect that.
R/climdexGenericScalar.R
Outdated
northern.hemisphere = northern.hemisphere, | ||
calendar = calendar | ||
) | ||
return(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice and clean.
Q: Would it be cleaner to simplify to
return climdexGenericScalar.raw(
...
);
and not use obj
?
(This would apply in several places in this codebase.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
R/climdexGenericVector.R
Outdated
@@ -45,6 +45,9 @@ climdexGenericVector.raw <- function( | |||
if (missing(secondary)) { | |||
stop("Secondary data argument is missing.") | |||
} | |||
if (length(secondary) == 0) { | |||
stop("Secondary must not be an empty vector.") | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you instead call check.generic.argument.validity
on secondary
? (Might need to make its stop messages more generic -- or parametrize the name of the data used in them.) That would repeat some checks but it might end up being simpler to be sure we are validating everything completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve moved the secondary
and format
checks into check.generic.argument.validity
with a flag for when we pass the extra secondary and format vector parameters. I’ve moved the data & dates check to an internal validate_data_dates
function that we call with scalar or primary and secondary vector data.
R/climdexGenericVector.R
Outdated
if (!all(day_of_month == 1)) { | ||
stop("Data must be on the 1st day of each month.") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The date checking is an exact copy of code in another function. Suggest we DRY that up in a check function called in each place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to a util function check.single.month.dates
.
dates = dates, | ||
northern.hemisphere = TRUE, | ||
calendar = "gregorian" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check basic things about the output of climdexSingleMonthlyScalar.raw
, such as
- data out is the same as scalar_data
- dates out same as dates in
- etc.
Especially as we are relying on it to check the result of climdexSingleMonthlyScalar.csv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've introduced a new validation function that applies to both scalar and vector, raw and CSV construction tests, to check basic items. Please note that the output is infilled with NA values and missing dates, and some filtering is necessary to align the output with the original input set.
format = "polar", | ||
northern.hemisphere = TRUE, | ||
calendar = "gregorian" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question re. basic things about output of climdexSingleMonthlyVector.ra
checkTrue( | ||
!inherits(result, "try-error"), | ||
"Function raised an error despite valid monthly data." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand: Why wouldn't an NA value make it throw an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that climate data may be incomplete for certain dates, for example, due to sensor issues. While we cannot accept data with NA dates, we do accept missing data for all our input classes.
) | ||
|
||
checkEquals(length(scalar_obj@data[!is.na(scalar_obj@data)]), n_months, "Large dataset not handled correctly") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without myself checking for every possible error message, this looks thorough and like it covers all those error conditions. Well done.
Summary
This pull request introduces new wrapper functions to support single monthly values for both scalar and vector climate data. These functions create
ClimdexGenericScalar
andClimdexGenericVector
objects with a constraint that enforces only one value per month. Additionally, they automatically set themax.missing.days
parameter to+Inf
, enabling more flexible data handling.Key Enhancements
New
climdexSingleMonthlyScalar.raw
/.csv
andclimdexSingleMonthlyVector.raw
/.csv
Functions:max.missing.days
to+Inf
.Data Validation:
data
anddates
vectors.data
anddates
are neither entirelyNA
nor empty vectors.Index Calculations:
Test Cases:
New test cases ensure that:
NA
values, empty vectors, multiple values per month).