Support Single Monthly Values in Scalar and Generic Classes #42

QSparks · 2024-10-30T20:50:40Z

Summary

This pull request introduces new wrapper functions to support single monthly values for both scalar and vector climate data. These functions create ClimdexGenericScalar and ClimdexGenericVector objects constrained to one value per month. They also set the max.missing.days parameter to +Inf automatically, so that the days without data in a monthly series do not trip the missing-days threshold.

Key Enhancements

New `climdexSingleMonthlyScalar.raw` / `.csv` and `climdexSingleMonthlyVector.raw` / `.csv` Functions:

Added functions to handle scalar or vector climate data with a single value per month constraint.
Automatically sets max.missing.days to +Inf.

Data Validation:

Single value per month constraint: checks that the input data and dates vectors contain no more than one value per month.
Ensures that data and dates are neither entirely NA nor empty vectors.
Validates that each date falls on the 1st day of its month, and raises an error if there is more than one value per month or if the dates are inconsistent.

Index Calculations:

Added warnings for cases where single-value-per-month data might impact index calculations (e.g., exact dates or monthly statistics).

Test Cases:

New test cases ensure that:

Single-value-per-month constraints are enforced correctly.
Errors are raised for invalid or inconsistent data (e.g., NA values, empty vectors, multiple values per month).
Index calculations produce accurate results with single-value-per-month data.


rod-glover

Overall excellent, nice clean code.

A few minor suggested changes, which you can adopt at your discretion, except I do recommend DRYing up the monthly data date-checking code, which is copy-pasted.

rod-glover · 2024-11-04T21:33:29Z

R/climdexGenericScalar.R

+  unique_months <- unique(format(valid_dates, "%Y-%m"))
+  day_of_month <- as.integer(format(valid_dates, "%d"))
+
+  # Check that the length of unique months matches the number of dates, ensuring only one value per month


Do we need to require 12 months? Or is it legitimate to have fewer than that?

There is no requirement for inputs to have multiples of 12 months. I’ve added a test to reflect that.

rod-glover · 2024-11-04T21:35:24Z

R/climdexGenericScalar.R

+    northern.hemisphere = northern.hemisphere,
+    calendar = calendar
+  )
+  return(obj)


Nice and clean.

Q: Would it be cleaner to simplify to

return(climdexGenericScalar.raw( ... ))

and not use obj?

(This would apply in several places in this codebase.)


rod-glover · 2024-11-04T21:42:18Z

R/climdexGenericVector.R

@@ -45,6 +45,9 @@ climdexGenericVector.raw <- function(
  if (missing(secondary)) {
    stop("Secondary data argument is missing.")
  }
+  if (length(secondary) == 0) {
+    stop("Secondary must not be an empty vector.")
+  }


Could you instead call check.generic.argument.validity on secondary? (Might need to make its stop messages more generic -- or parametrize the name of the data used in them.) That would repeat some checks but it might end up being simpler to be sure we are validating everything completely.

I’ve moved the secondary and format checks into check.generic.argument.validity with a flag for when we pass the extra secondary and format vector parameters. I’ve moved the data & dates check to an internal validate_data_dates function that we call with scalar or primary and secondary vector data.
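As a rough illustration of the "parametrize the name" idea, a check along these lines could serve primary, secondary, and dates inputs with one body. This is only a sketch: the function name `check_vector_validity_sketch`, its arguments, and the all-NA message are assumptions for illustration, not the actual signature or messages of `check.generic.argument.validity` or `validate_data_dates`.

```r
# Hypothetical check with the argument name parametrized in the messages.
# Not the package's real API; names and the all-NA message are assumed.
check_vector_validity_sketch <- function(x, name) {
  if (length(x) == 0) {
    stop(sprintf("%s must not be an empty vector.", name))
  }
  if (all(is.na(x))) {
    stop(sprintf("%s must not be entirely NA.", name))
  }
  invisible(TRUE)
}

# Some NA values are tolerated; an empty vector is rejected.
check_vector_validity_sketch(c(1.5, NA, 2.0), "Primary")
empty_result <- try(
  check_vector_validity_sketch(numeric(0), "Secondary"),
  silent = TRUE
)
```

Passing the name as an argument keeps the stop messages specific to the input being checked without duplicating the checks themselves.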

rod-glover · 2024-11-04T21:44:56Z

R/climdexGenericVector.R

+  if (!all(day_of_month == 1)) {
+    stop("Data must be on the 1st day of each month.")
+  }


The date checking is an exact copy of code in another function. Suggest we DRY that up in a check function called in each place.

Moved to a util function check.single.month.dates.
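For readers following along, a shared check of this kind might look roughly like the sketch below, assembled from the lines quoted in the diff hunks above. The name `check_single_month_dates_sketch` and the duplicate-month error message are assumptions, and this is not the PR's actual `check.single.month.dates`. Dates are assumed to be base R `Date` objects; the package may use a different date class.

```r
# Hypothetical helper: enforce one value per month, dated on the 1st.
check_single_month_dates_sketch <- function(dates) {
  valid_dates <- dates[!is.na(dates)]
  unique_months <- unique(format(valid_dates, "%Y-%m"))
  day_of_month <- as.integer(format(valid_dates, "%d"))

  # One date per distinct year-month means no month is repeated.
  if (length(unique_months) != length(valid_dates)) {
    stop("Data must have at most one value per month.")  # assumed message
  }
  if (!all(day_of_month == 1)) {
    stop("Data must be on the 1st day of each month.")
  }
  invisible(TRUE)
}

# Fewer than 12 months is accepted; a repeated month is not.
ok_dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))
check_single_month_dates_sketch(ok_dates)

dup_dates <- as.Date(c("2020-01-01", "2020-01-15"))
dup_result <- try(check_single_month_dates_sketch(dup_dates), silent = TRUE)
```

Both the scalar and the vector constructors could then call the one helper instead of carrying their own copy of the checks.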

rod-glover · 2024-11-04T21:55:55Z

tests/test_single_value_per_month.R

+    dates = dates,
+    northern.hemisphere = TRUE,
+    calendar = "gregorian"
+  )


Do we need to check basic things about the output of climdexSingleMonthlyScalar.raw, such as

data out is the same as scalar_data

dates out same as dates in

etc.

Especially as we are relying on it to check the result of climdexSingleMonthlyScalar.csv

I've introduced a new validation function that applies to both scalar and vector, raw and CSV construction tests, to check basic items. Please note that the output is infilled with NA values and missing dates, and some filtering is necessary to align the output with the original input set.
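The filtering step mentioned here can be sketched in base R as follows. The helper name `aligns_with_input` and the simulated daily infill are assumptions for illustration only; the actual test helper in the PR, and the way the package stores its infilled series, may differ.

```r
# Hypothetical test helper: compare an infilled output series with its input.
# `out_data` / `out_dates` stand in for the series stored on the object.
aligns_with_input <- function(out_data, out_dates, in_data, in_dates) {
  keep <- out_dates %in% in_dates  # drop dates added by infilling
  isTRUE(all.equal(out_data[keep], in_data)) &&
    identical(format(out_dates[keep]), format(in_dates))
}

in_dates <- as.Date(c("2020-01-01", "2020-02-01"))
in_data <- c(3.2, 4.1)

# Simulated infill: every day from the first to the last input date,
# with NA everywhere except on the original input dates.
out_dates <- seq(in_dates[1], in_dates[2], by = "day")
out_data <- rep(NA_real_, length(out_dates))
out_data[match(in_dates, out_dates)] <- in_data

aligned <- aligns_with_input(out_data, out_dates, in_data, in_dates)
```

Filtering by the input dates, rather than by non-NA values, keeps the comparison valid when the input itself contains NA data.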

rod-glover · 2024-11-04T21:59:29Z

tests/test_single_value_per_month.R

+    format = "polar",
+    northern.hemisphere = TRUE,
+    calendar = "gregorian"
+  )


Same question re. basic things about the output of climdexSingleMonthlyVector.raw.

rod-glover · 2024-11-04T22:00:58Z

tests/test_single_value_per_month.R

+  checkTrue(
+    !inherits(result, "try-error"),
+    "Function raised an error despite valid monthly data."
+  )


Not sure I understand: Why wouldn't an NA value make it throw an error?

It's possible that climate data may be incomplete for certain dates, for example, due to sensor issues. While we cannot accept data with NA dates, we do accept missing data for all our input classes.

rod-glover · 2024-11-04T22:04:00Z

tests/test_single_value_per_month.R

+  )
+
+  checkEquals(length(scalar_obj@data[!is.na(scalar_obj@data)]), n_months, "Large dataset not handled correctly")
+}


Without myself checking for every possible error message, this looks thorough and like it covers all those error conditions. Well done.

QSparks added 9 commits October 25, 2024 15:10

WIP: Adding support for single vals per month

3324eac

Add testing for single val per month

4fa5cc8

Merge remote-tracking branch 'origin_ssh/i38-generic-var-templates' i…

7d026fc

…nto i39-single-val-per-month

Add examples and cross-refs to docs

bb5d3c8

Resolve merge conflicts

b1688cc

Resolve merge conflicts

3c229f7

Add single value per month testing

7bdf395

Update tests and add validation for empty or NA inputs

6c3f6ff

Add descriptive error messages to document failed checkFuncs

7e6e2a3

QSparks self-assigned this Oct 30, 2024

Use obj date.factors for single monthly value check

ec6f655

QSparks requested a review from rod-glover October 30, 2024 21:22

QSparks marked this pull request as ready for review October 30, 2024 21:22

rod-glover approved these changes Nov 4, 2024


QSparks added 3 commits November 7, 2024 12:54

Address PR feedback

c28faec

Add missing dates check, formatting

fa08c1a

Include NA cases in basic validity check

600d7d5
