Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Single Monthly Values in Scalar and Generic Classes #42

Open
wants to merge 13 commits into
base: i38-generic-var-templates
Choose a base branch
from

Conversation

QSparks
Copy link
Contributor

@QSparks QSparks commented Oct 30, 2024

Summary

This pull request introduces new wrapper functions to support single monthly values for both scalar and vector climate data. These functions create ClimdexGenericScalar and ClimdexGenericVector objects with a constraint that enforces only one value per month. Additionally, they automatically set the max.missing.days parameter to +Inf, enabling more flexible data handling.

Key Enhancements

New climdexSingleMonthlyScalar.raw / .csv and climdexSingleMonthlyVector.raw / .csv Functions:

  • Added functions to handle scalar or vector climate data with a single value per month constraint.
  • Automatically sets max.missing.days to +Inf.

Data Validation:

  • Single value per month constraint: Checks to ensure that there is no more than one value per month in input data and dates vectors.
  • Ensures that data and dates are neither entirely NA nor empty vectors.
  • Validates that each date corresponds to the 1st day of each month and raises errors if there is more than one value per month or if the dates are inconsistent.

Index Calculations:

  • Added warnings for cases where single-value-per-month data might impact index calculations (e.g., exact dates or monthly statistics).

Test Cases:

New test cases ensure that:

  • Single-value-per-month constraints are enforced correctly.
  • Errors are raised for invalid or inconsistent data (e.g., NA values, empty vectors, multiple values per month).
  • Index calculations produce accurate results with single-value-per-month data.

@QSparks QSparks self-assigned this Oct 30, 2024
@QSparks QSparks requested a review from rod-glover October 30, 2024 21:22
@QSparks QSparks marked this pull request as ready for review October 30, 2024 21:22
Copy link

@rod-glover rod-glover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall excellent, nice clean code.

A few minor suggested changes, which you can adopt at your discretion, except I do recommend DRYing up the monthly data date-checking code, which is copy-pasted.

unique_months <- unique(format(valid_dates, "%Y-%m"))
day_of_month <- as.integer(format(valid_dates, "%d"))

# Check that the length of unique months matches the number of dates, ensuring only one value per month

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to require 12 months? Or is it legitimate to have fewer than that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no requirement for inputs to have multiples of 12 months. I’ve added a test to reflect that.

northern.hemisphere = northern.hemisphere,
calendar = calendar
)
return(obj)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice and clean.

Q: Would it be cleaner to simplify to

return climdexGenericScalar.raw(
  ...
);

and not use obj?

(This would apply in several places in this codebase.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

R/climdexGenericScalar.R Outdated Show resolved Hide resolved
@@ -45,6 +45,9 @@ climdexGenericVector.raw <- function(
if (missing(secondary)) {
stop("Secondary data argument is missing.")
}
if (length(secondary) == 0) {
stop("Secondary must not be an empty vector.")
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you instead call check.generic.argument.validity on secondary? (Might need to make its stop messages more generic -- or parametrize the name of the data used in them.) That would repeat some checks but it might end up being simpler to be sure we are validating everything completely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve moved the secondary and format checks into check.generic.argument.validity with a flag for when we pass the extra secondary and format vector parameters. I’ve moved the data & dates check to an internal validate_data_dates function that we call with scalar or primary and secondary vector data.

if (!all(day_of_month == 1)) {
stop("Data must be on the 1st day of each month.")
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The date checking is an exact copy of code in another function. Suggest we DRY that up in a check function called in each place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to a util function check.single.month.dates.

dates = dates,
northern.hemisphere = TRUE,
calendar = "gregorian"
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check basic things about the output of climdexSingleMonthlyScalar.raw, such as

  • data out is the same as scalar_data
  • dates out same as dates in
  • etc.

Especially as we are relying on it to check the result of climdexSingleMonthlyScalar.csv

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've introduced a new validation function that applies to both scalar and vector, raw and CSV construction tests, to check basic items. Please note that the output is infilled with NA values and missing dates, and some filtering is necessary to align the output with the original input set.

format = "polar",
northern.hemisphere = TRUE,
calendar = "gregorian"
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question re. basic things about output of climdexSingleMonthlyVector.ra

checkTrue(
!inherits(result, "try-error"),
"Function raised an error despite valid monthly data."
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand: Why wouldn't an NA value make it throw an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that climate data may be incomplete for certain dates, for example, due to sensor issues. While we cannot accept data with NA dates, we do accept missing data for all our input classes.

)

checkEquals(length(scalar_obj@data[!is.na(scalar_obj@data)]), n_months, "Large dataset not handled correctly")
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without myself checking for every possible error message, this looks thorough and like it covers all those error conditions. Well done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants