#GSOC PR : Add Preprocess Function for Data Cleaning and Validation #3321

Merged Aug 2, 2024 (99 commits)
Changes from 94 commits
Commits
dccd805
Add Preprocess Function for Data Cleaning and Validation
sambhavnoobcoder Jun 27, 2024
eaa0846
Rename Function to NA_preprocess and Add Roxygen Documentation
sambhavnoobcoder Jun 29, 2024
9a887d4
Merge branch 'PecanProject:develop' into Preprocess-Function
sambhavnoobcoder Jun 29, 2024
cbc0a34
added author name and fixed roxygen formatting slightly
sambhavnoobcoder Jun 29, 2024
2879e6a
updated code to work with CNN in place of random forest model
sambhavnoobcoder Jul 11, 2024
8de0277
runner code for the NA_preprocess and NA_Downscale function.
sambhavnoobcoder Jul 11, 2024
e3403e6
Printing Evaulation metrics for the model
sambhavnoobcoder Jul 11, 2024
be698d5
Prepare metrics data for multi-axis line plot visualization
sambhavnoobcoder Jul 11, 2024
35f0a6e
Create multi-metric line plot for ensemble performance visualization
sambhavnoobcoder Jul 11, 2024
48b7c50
Add R-squared plot and combine with MSE/MAE plot
sambhavnoobcoder Jul 11, 2024
ebb32fb
Add scatter plot comparing actual vs predicted values for ensemble mo…
sambhavnoobcoder Jul 11, 2024
8cc689f
Implement Taylor Diagram for ensemble model evaluation
sambhavnoobcoder Jul 11, 2024
7a4a68a
Merge branch 'develop' into Preprocess-Function
dlebauer Jul 11, 2024
064fc30
Updated NA_downscale.Rd with changes with regards to CNN implementation
sambhavnoobcoder Jul 11, 2024
04d439f
Created NA_preprocess.Rd
sambhavnoobcoder Jul 11, 2024
e167ca4
Updated the NA_preprocess to SDA_downscale_preprocess , NA_downscale …
sambhavnoobcoder Jul 12, 2024
f28ebb2
Merge branch 'PecanProject:develop' into Preprocess-Function
sambhavnoobcoder Jul 15, 2024
e524b5e
refactored code leaving only functions in the code
sambhavnoobcoder Jul 15, 2024
a3a92f2
Updated the description of the return type of SDA_preprocess function.
sambhavnoobcoder Jul 15, 2024
9226ef5
Update SDA_downscale function to use base R pipe operator |>
sambhavnoobcoder Jul 17, 2024
f2bab83
Add explicit namespaces for non-base functions
sambhavnoobcoder Jul 17, 2024
cac3c8e
Implement dynamic carbon pool naming in SDA_downscale function
sambhavnoobcoder Jul 17, 2024
ecd5aa1
Improve data scaling to ensure consistency across train and test sets
sambhavnoobcoder Jul 20, 2024
5ce2339
Improve date handling in SDA_downscale_preprocess function
sambhavnoobcoder Jul 20, 2024
bd7cfa5
Refactor SDA_downscale function to accept covariates as direct input
sambhavnoobcoder Jul 20, 2024
a870b93
Updated description for SDA_downscale parameters
sambhavnoobcoder Jul 20, 2024
ca14c09
Renaming variables according to nomenclature standards
sambhavnoobcoder Jul 20, 2024
832801f
Updated documentation wrt variable nomenclature change
sambhavnoobcoder Jul 20, 2024
ce4a597
Add model selection feature to SDA_downscale function
sambhavnoobcoder Jul 21, 2024
d02318f
Update SDA_downscale function documentation
sambhavnoobcoder Jul 21, 2024
ac572da
Refactor SDA_downscale function to remove metrics calculation
sambhavnoobcoder Jul 21, 2024
350278f
Add calculate_metrics function for downscaling results
sambhavnoobcoder Jul 21, 2024
0c4fb82
Add documentation comments to calculate_metrics function
sambhavnoobcoder Jul 21, 2024
7574abc
Refactor SDA_downscale function for improved efficiency
sambhavnoobcoder Jul 21, 2024
6acfd74
Optimize SDA_downscale function and improve covariate handling
sambhavnoobcoder Jul 21, 2024
5b6f577
Create SDA_downscale.Rd
sambhavnoobcoder Jul 21, 2024
50ee452
Create SDA_downscale_preprocess.Rd
sambhavnoobcoder Jul 21, 2024
f812daa
Create calculate_metrics.Rd
sambhavnoobcoder Jul 21, 2024
47656a3
Merge branch 'PecanProject:develop' into Preprocess-Function
sambhavnoobcoder Jul 23, 2024
f55c2de
Delete NA_downscale.Rd
sambhavnoobcoder Jul 23, 2024
d751ffc
Delete NA_preprocess.Rd
sambhavnoobcoder Jul 23, 2024
06bf26b
Renamed function from calculate_metrics to SDA_downscale_metrics
sambhavnoobcoder Jul 23, 2024
bb66142
Refactor SDA_downscale function data prep snippet for improved effici…
sambhavnoobcoder Jul 23, 2024
4d2c6a5
Update SDA_downscale function to make seed optional
sambhavnoobcoder Jul 23, 2024
7e97841
Update SDA_downscale function documentation to improve seeding method…
sambhavnoobcoder Jul 23, 2024
fe5699d
set default model type
sambhavnoobcoder Jul 23, 2024
a20389f
Updated documentation for Default argument
sambhavnoobcoder Jul 23, 2024
1dd9e6c
Removed extra roxygen block
sambhavnoobcoder Jul 23, 2024
35f0b3e
modified title of SDA_downscale function
sambhavnoobcoder Jul 23, 2024
91236ac
Keeping date as a Date type
sambhavnoobcoder Jul 23, 2024
d01f739
Refactor SDA_downscale_preprocess for consistent date handling
sambhavnoobcoder Jul 23, 2024
62a8e44
Updated documentation to suit date type
sambhavnoobcoder Jul 24, 2024
7f782f2
Update documentation for clarification of variable data
sambhavnoobcoder Jul 24, 2024
21a615a
added namespace to functions
sambhavnoobcoder Jul 24, 2024
c8c234a
Unify output structure for RF and CNN models in SDA_downscale function
sambhavnoobcoder Jul 24, 2024
19402db
removed extra description for preprocess function
sambhavnoobcoder Jul 24, 2024
f43a50a
Changed the documentation for predictors for downscale instead of CNN
sambhavnoobcoder Jul 24, 2024
62221d9
Update modules/assim.sequential/R/downscale_function.R
sambhavnoobcoder Jul 24, 2024
0af7df7
Update modules/assim.sequential/R/downscale_function.R
sambhavnoobcoder Jul 24, 2024
1e6a484
Update modules/assim.sequential/R/downscale_function.R
sambhavnoobcoder Jul 24, 2024
529fe6f
update carbon_data call
sambhavnoobcoder Jul 24, 2024
2227fd9
updated full_data preprocess call
sambhavnoobcoder Jul 24, 2024
80e5b2d
Revert "update carbon_data call"
sambhavnoobcoder Jul 24, 2024
9f6554b
Update SDA_downscale.Rd documentation
sambhavnoobcoder Jul 24, 2024
6001fad
Change date type to Date in preprocess function
sambhavnoobcoder Jul 24, 2024
909ae68
Update SDA_downscale.Rd
sambhavnoobcoder Jul 24, 2024
9c08465
Update SDA_downscale_preprocess.Rd
sambhavnoobcoder Jul 24, 2024
3889397
Delete calculate_metrics.Rd
sambhavnoobcoder Jul 24, 2024
71c1013
Create SDA_downscale_metrics.Rd
sambhavnoobcoder Jul 24, 2024
38c9e7a
modified namespaces
sambhavnoobcoder Jul 26, 2024
460672c
Merge branch 'develop' into Preprocess-Function
mdietze Jul 26, 2024
7859206
Update NAMESPACE
sambhavnoobcoder Jul 28, 2024
e8c40ac
Update DESCRIPTION with keras3
sambhavnoobcoder Jul 28, 2024
53a9cae
Update pecan_package_dependencies.csv
sambhavnoobcoder Jul 28, 2024
b9cc4fb
Update pecan_package_dependencies.csv for some changes
sambhavnoobcoder Jul 28, 2024
d659567
Update NAMESPACE
sambhavnoobcoder Jul 28, 2024
aacc890
Reverting NAMESPACE
sambhavnoobcoder Jul 28, 2024
2d248d9
Reverting pecan_package_dependencies.csv to original
sambhavnoobcoder Jul 28, 2024
c9fdd7b
Update DESCRIPTION removing keras3
sambhavnoobcoder Jul 29, 2024
172bf55
degraded roxygen version to 7.3.1
sambhavnoobcoder Jul 29, 2024
5e43b35
Revert to last successful version
sambhavnoobcoder Jul 29, 2024
fa209b4
added keras3 to the suggests
sambhavnoobcoder Jul 29, 2024
b1bd57f
Update DESCRIPTION
sambhavnoobcoder Jul 29, 2024
fa9ed04
Update NAMESPACE
sambhavnoobcoder Jul 29, 2024
0cf995a
Update DESCRIPTION
sambhavnoobcoder Jul 29, 2024
6686b8d
Update NAMESPACE
sambhavnoobcoder Jul 29, 2024
7a66814
Update NAMESPACE
mdietze Jul 30, 2024
a25e663
Update NAMESPACE
mdietze Jul 30, 2024
b7ac546
Update NAMESPACE
mdietze Jul 30, 2024
b61fd4e
Update modules/assim.sequential/DESCRIPTION
mdietze Jul 30, 2024
d976a02
Merge branch 'develop' into Preprocess-Function
mdietze Jul 30, 2024
95018e4
Merge branch 'PecanProject:develop' into Preprocess-Function
sambhavnoobcoder Jul 31, 2024
732b966
Update pecan_package_dependencies.csv
sambhavnoobcoder Jul 31, 2024
6389e93
Update pecan_package_dependencies.csv
sambhavnoobcoder Jul 31, 2024
91ef69f
Update modules/assim.sequential/DESCRIPTION
mdietze Aug 1, 2024
1d215b7
Update modules/assim.sequential/DESCRIPTION
mdietze Aug 1, 2024
7b9ec87
Update docker/depends/pecan_package_dependencies.csv
mdietze Aug 1, 2024
91d3da6
Merge branch 'develop' into Preprocess-Function
mdietze Aug 1, 2024
b32e5e3
Merge branch 'develop' into Preprocess-Function
mdietze Aug 2, 2024
1 change: 1 addition & 0 deletions docker/depends/pecan_package_dependencies.csv
@@ -122,6 +122,7 @@
"jsonlite","*","models/stics","Imports",FALSE
"jsonlite","*","modules/data.atmosphere","Imports",FALSE
"jsonlite","*","modules/data.remote","Suggests",FALSE
"keras3","*","modules/assim.sequential","Suggests",FALSE
"knitr","*","base/visualization","Suggests",FALSE
"knitr","*","modules/data.atmosphere","Suggests",FALSE
"knitr",">= 1.42","base/db","Suggests",FALSE
1 change: 1 addition & 0 deletions modules/assim.sequential/DESCRIPTION
@@ -48,6 +48,7 @@ Suggests:
plotrix,
plyr (>= 1.8.4),
randomForest,
keras3,
raster,
readr,
reshape2 (>= 1.4.2),
273 changes: 211 additions & 62 deletions modules/assim.sequential/R/downscale_function.R
@@ -1,88 +1,237 @@
##' @title North America Downscale Function
##' @name NA_downscale
##' @author Joshua Ploshay
##' @title Preprocess Data for Downscaling
##' @name SDA_downscale_preprocess
##' @author Sambhav Dixit
##'
##' @param data In quotes, file path for .rds containing ensemble data.
##' @param coords In quotes, file path for .csv file containing the site coordinates, columns named "lon" and "lat".
##' @param date In quotes, if SDA site run, format is yyyy/mm/dd, if NEON, yyyy-mm-dd. Restricted to years within file supplied to 'data'.
##' @param C_pool In quotes, carbon pool of interest. Name must match carbon pool name found within file supplied to 'data'.
##' @param covariates SpatRaster stack, used as predictors in randomForest. Layers within stack should be named. Recommended that this stack be generated using 'covariates' instructions in assim.sequential/inst folder
##' @details This function will downscale forecast data to unmodeled locations using covariates and site locations
##' @param data_path Character. File path for .rds containing ensemble data.
##' @param coords_path Character. File path for .csv file containing the site coordinates, with columns named "lon" and "lat".
##' @param date Date. If SDA site run, format is yyyy/mm/dd; if NEON, yyyy-mm-dd. Restricted to years within the file supplied to 'data_path'.
##' @param carbon_pool Character. Carbon pool of interest. Name must match the carbon pool name found within the file supplied to 'data_path'.
##' @details This function ensures that the specified date and carbon pool are present in the input data. It also checks the validity of the site coordinates and aligns the number of rows between site coordinates and carbon data.
##'
##' @description This function uses the randomForest model.
##' @description This function reads and checks the input data, ensuring that the required date and carbon pool exist, and that the site coordinates are valid.
##'
##' @return It returns the `downscale_output` list containing lists for the training and testing data sets, models, and predicted maps for each ensemble member.

##' @return A list containing the read .rds data, the cleaned site coordinates, and the preprocessed carbon data.

NA_downscale <- function(data, coords, date, C_pool, covariates){

SDA_downscale_preprocess <- function(data_path, coords_path, date, carbon_pool) {
# Read the input data and site coordinates
input_data <- readRDS(data)
site_coordinates <- terra::vect(readr::read_csv(coords), geom=c("lon", "lat"), crs="EPSG:4326")
input_data <- readRDS(data_path)
site_coordinates <- readr::read_csv(coords_path)

# Convert input_data names to Date objects
input_date_names <- lubridate::ymd(names(input_data))
names(input_data) <- input_date_names

# Convert the input date to a Date object
standard_date <- lubridate::ymd(date)

# Ensure the date exists in the input data
if (!standard_date %in% input_date_names) {
stop(paste("Date", date, "not found in the input data."))
}

# Extract the carbon data for the specified focus year
index <- which(names(input_data) == date)
index <- which(input_date_names == standard_date)
data <- input_data[[index]]
carbon_data <- as.data.frame(t(data[which(names(data) == C_pool)]))
names(carbon_data) <- paste0("ensemble",seq(1:ncol(carbon_data)))

# Extract predictors from covariates raster using site coordinates
predictors <- as.data.frame(terra::extract(covariates, site_coordinates,ID = FALSE))

# Combine each ensemble member with all predictors
ensembles <- list()
for (i in seq_along(carbon_data)) {
ensembles[[i]] <- cbind(carbon_data[[i]], predictors)
# Ensure the carbon pool exists in the input data
if (!carbon_pool %in% names(data)) {
stop(paste("Carbon pool", carbon_pool, "not found in the input data."))
}

# Rename the carbon_data column for each ensemble member
for (i in 1:length(ensembles)) {
ensembles[[i]] <- dplyr::rename(ensembles[[i]], "carbon_data" = "carbon_data[[i]]")
carbon_data <- as.data.frame(t(data[which(names(data) == carbon_pool)]))
names(carbon_data) <- paste0("ensemble", seq(ncol(carbon_data)))

# Ensure site coordinates have 'lon' and 'lat' columns
if (!all(c("lon", "lat") %in% names(site_coordinates))) {
stop("Site coordinates must contain 'lon' and 'lat' columns.")
}

# Split the observations in each data frame into two data frames based on the proportion of 3/4
ensembles <- lapply(ensembles, function(df) {
sample <- sample(1:nrow(df), size = round(0.75*nrow(df)))
train <- df[sample, ]
test <- df[-sample, ]
split_list <- list(train, test)
return(split_list)
})

# Rename the training and testing data frames for each ensemble member
for (i in 1:length(ensembles)) {
# names(ensembles) <- paste0("ensemble",seq(1:length(ensembles)))
names(ensembles[[i]]) <- c("training", "testing")
# Ensure the number of rows in site coordinates matches the number of rows in carbon data
if (nrow(site_coordinates) != nrow(carbon_data)) {
message("Number of rows in site coordinates does not match the number of rows in carbon data.")
if (nrow(site_coordinates) > nrow(carbon_data)) {
message("Truncating site coordinates to match carbon data rows.")
site_coordinates <- site_coordinates[1:nrow(carbon_data), ]
} else {
message("Truncating carbon data to match site coordinates rows.")
carbon_data <- carbon_data[1:nrow(site_coordinates), ]
}
}

# Train a random forest model for each ensemble member using the training data
rf_output <- list()
for (i in 1:length(ensembles)) {
rf_output[[i]] <- randomForest::randomForest(ensembles[[i]][[1]][["carbon_data"]] ~ land_cover+tavg+prec+srad+vapr+nitrogen+phh2o+soc+sand,
data = ensembles[[i]][[1]],
ntree = 1000,
na.action = stats::na.omit,
keep.forest = T,
importance = T)
message("Preprocessing completed successfully.")
return(list(input_data = input_data, site_coordinates = site_coordinates, carbon_data = carbon_data))
}
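
The date and carbon-pool checks above can be tried without any PEcAn files. The sketch below is not part of this PR: it assumes a made-up in-memory list (`toy_input`) in place of the .rds file, uses base `as.Date()` where the function uses `lubridate::ymd()`, and wraps the checks in a hypothetical helper named `check_inputs`. The pool names are placeholders.

```r
# Illustrative sketch only: toy data standing in for the .rds ensemble file.
toy_input <- list(
  "2020-07-15" = data.frame(TotSoilCarb = c(1.2, 1.4), AbvGrndWood = c(3.1, 2.9)),
  "2021-07-15" = data.frame(TotSoilCarb = c(1.3, 1.5), AbvGrndWood = c(3.0, 3.2))
)

# Hypothetical helper mirroring the date and carbon-pool checks.
check_inputs <- function(input_data, date, carbon_pool) {
  input_dates <- as.Date(names(input_data))
  standard_date <- as.Date(date)
  # Stop early if the requested date is not one of the list names
  if (!standard_date %in% input_dates) {
    stop(paste("Date", date, "not found in the input data."))
  }
  data <- input_data[[which(input_dates == standard_date)]]
  # Stop early if the requested pool is not a column for that date
  if (!carbon_pool %in% names(data)) {
    stop(paste("Carbon pool", carbon_pool, "not found in the input data."))
  }
  data[[carbon_pool]]
}

ok <- check_inputs(toy_input, "2021-07-15", "TotSoilCarb")
bad_date <- tryCatch(
  check_inputs(toy_input, "2019-07-15", "TotSoilCarb"),
  error = function(e) conditionMessage(e)
)
```

Failing on a missing date or pool before any modeling starts keeps the error message close to its cause, which appears to be the intent of the checks in this function.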

##' @title SDA Downscale Function
##' @name SDA_downscale
##' @author Joshua Ploshay, Sambhav Dixit
##'
##' @param preprocessed List. Preprocessed data returned as an output from the SDA_downscale_preprocess function.
##' @param date Date. If SDA site run, format is yyyy/mm/dd; if NEON, yyyy-mm-dd. Restricted to years within file supplied to 'preprocessed' from the 'data_path'.
##' @param carbon_pool Character. Carbon pool of interest. Name must match carbon pool name found within file supplied to 'preprocessed' from the 'data_path'.
##' @param covariates SpatRaster stack. Used as predictors in downscaling. Layers within stack should be named. Recommended that this stack be generated using 'covariates' instructions in assim.sequential/inst folder
##' @param model_type Character. Either "rf" for Random Forest or "cnn" for Convolutional Neural Network. Default is Random Forest.
##' @param seed Numeric or NULL. Optional seed for random number generation. Default is NULL.
##' @details This function will downscale forecast data to unmodeled locations using covariates and site locations
##'
##' @description This function uses either Random Forest or Convolutional Neural Network model based on the model_type parameter.
##'
##' @return A list containing the training and testing data sets, models, predicted maps for each ensemble member, and predictions for testing data.

SDA_downscale <- function(preprocessed, date, carbon_pool, covariates, model_type = "rf", seed = NULL) {
carbon_data <- preprocessed$carbon_data

# Convert site coordinates to SpatVector
site_coordinates <- terra::vect(preprocessed$site_coordinates, geom = c("lon", "lat"), crs = "EPSG:4326")

# Extract predictors from covariates raster using site coordinates
predictors <- as.data.frame(terra::extract(covariates, site_coordinates, ID = FALSE))

# Dynamically get covariate names
covariate_names <- names(predictors)

# Create a single data frame with all predictors and ensemble data
full_data <- cbind(carbon_data, predictors)

# Split the observations into training and testing sets
if (!is.null(seed)) {
set.seed(seed) # Only set seed if provided
}
sample <- sample(1:nrow(full_data), size = round(0.75 * nrow(full_data)))
train_data <- full_data[sample, ]
test_data <- full_data[-sample, ]

# Prepare data for both RF and CNN
x_data <- as.matrix(full_data[, covariate_names])
y_data <- as.matrix(carbon_data)

# Calculate scaling parameters from all data
scaling_params <- list(
mean = colMeans(x_data),
sd = apply(x_data, 2, stats::sd)
)

# Normalize the data
x_data_scaled <- scale(x_data, center = scaling_params$mean, scale = scaling_params$sd)

# Generate predictions (maps) for each ensemble member using the trained models
maps <- list(ncol(rf_output))
for (i in 1:length(rf_output)) {
maps[[i]] <- terra::predict(object = covariates,
model = rf_output[[i]],na.rm = T)
# Split into training and testing sets
x_train <- x_data_scaled[sample, ]
x_test <- x_data_scaled[-sample, ]
y_train <- y_data[sample, ]
y_test <- y_data[-sample, ]

# Initialize lists for outputs
models <- list()
maps <- list()
predictions <- list()

if (model_type == "rf") {
for (i in seq_along(carbon_data)) {
ensemble_col <- paste0("ensemble", i)
formula <- stats::as.formula(paste(ensemble_col, "~", paste(covariate_names, collapse = " + ")))
models[[i]] <- randomForest::randomForest(formula,
data = train_data,
ntree = 1000,
na.action = stats::na.omit,
keep.forest = TRUE,
importance = TRUE)

maps[[i]] <- terra::predict(covariates, model = models[[i]], na.rm = TRUE)
predictions[[i]] <- stats::predict(models[[i]], test_data)
}
} else if (model_type == "cnn") {
x_train <- keras3::array_reshape(x_train, c(nrow(x_train), 1, ncol(x_train)))
x_test <- keras3::array_reshape(x_test, c(nrow(x_test), 1, ncol(x_test)))

for (i in seq_along(carbon_data)) {
model <- keras3::keras_model_sequential() |>
keras3::layer_conv_1d(filters = 64, kernel_size = 1, activation = 'relu', input_shape = c(1, length(covariate_names))) |>
keras3::layer_flatten() |>
keras3::layer_dense(units = 64, activation = 'relu') |>
keras3::layer_dense(units = 1)

model |> keras3::compile(
loss = 'mean_squared_error',
optimizer = keras3::optimizer_adam(),
metrics = c('mean_absolute_error')
)

model |> keras3::fit(
x = x_train,
y = y_train[, i],
epochs = 100,
batch_size = 32,
validation_split = 0.2,
verbose = 0
)

models[[i]] <- model

cnn_predict <- function(model, newdata, scaling_params) {
newdata <- scale(newdata, center = scaling_params$mean, scale = scaling_params$sd)
newdata <- keras3::array_reshape(newdata, c(nrow(newdata), 1, ncol(newdata)))
predictions <- stats::predict(model, newdata)
return(as.vector(predictions))
}

prediction_rast <- terra::rast(covariates)
maps[[i]] <- terra::predict(prediction_rast, model = models[[i]],
fun = cnn_predict,
scaling_params = scaling_params)

predictions[[i]] <- cnn_predict(models[[i]], x_data[-sample, ], scaling_params)
}
} else {
stop("Invalid model_type. Please choose either 'rf' for Random Forest or 'cnn' for Convolutional Neural Network.")
}

# Organize the results into a single output list
downscale_output <- list(ensembles, rf_output, maps)
downscale_output <- list(
data = list(training = train_data, testing = test_data),
models = models,
maps = maps,
predictions = predictions,
scaling_params = scaling_params
)

# Rename each element of the output list with appropriate ensemble numbers
for (i in 1:length(downscale_output)) {
names(downscale_output[[i]]) <- paste0("ensemble",seq(1:length(downscale_output[[i]])))
for (i in seq_along(carbon_data)) {
names(downscale_output$models)[i] <- paste0("ensemble", i)
names(downscale_output$maps)[i] <- paste0("ensemble", i)
names(downscale_output$predictions)[i] <- paste0("ensemble", i)
}

# Rename the main components of the output list
names(downscale_output) <- c("data", "models", "maps")

return(downscale_output)
}
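
Two ideas in `SDA_downscale` are easy to check in isolation: scaling parameters are computed once and reused for any new data, and the Random Forest formula is assembled from the covariate names. The sketch below is not part of this PR. It uses made-up covariates (`tavg`, `prec`) and base R only, and the held-out row indices are arbitrary.

```r
# Illustrative sketch only: toy covariates, not PEcAn data.
set.seed(1)
x_data <- cbind(tavg = rnorm(8, 10, 2), prec = rnorm(8, 100, 20))

# Scaling parameters computed once from all rows, as in SDA_downscale
scaling_params <- list(
  mean = colMeans(x_data),
  sd = apply(x_data, 2, stats::sd)
)
x_scaled <- scale(x_data, center = scaling_params$mean, scale = scaling_params$sd)

# Route A: take held-out rows from the already scaled matrix
test_rows <- c(2, 5)
x_test_a <- x_scaled[test_rows, ]

# Route B: rescale the raw held-out rows with the stored parameters
x_test_b <- scale(x_data[test_rows, ],
                  center = scaling_params$mean,
                  scale = scaling_params$sd)

# Both routes should agree (attributes differ, so compare values only)
same <- isTRUE(all.equal(as.vector(x_test_a), as.vector(x_test_b)))

# Formula built from covariate names, as in the "rf" branch
covariate_names <- colnames(x_data)
rf_formula <- stats::as.formula(
  paste("ensemble1", "~", paste(covariate_names, collapse = " + "))
)
```

Storing `scaling_params` in the output is what lets later predictions be scaled the same way as the training data.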

##' @title Calculate Metrics for Downscaling Results
##' @name SDA_downscale_metrics
##' @author Sambhav Dixit
##'
##' @param downscale_output List. Output from the SDA_downscale function, containing data, models, maps, and predictions for each ensemble.
##' @param carbon_pool Character. Name of the carbon pool used in the downscaling process.
##'
##' @details This function calculates performance metrics for the downscaling results. It computes Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared for each ensemble. The function uses the actual values from the testing data and the predictions generated during the downscaling process.
##'
##' @description This function takes the output from the SDA_downscale function and computes various performance metrics for each ensemble. It provides a way to evaluate the accuracy of the downscaling results without modifying the main downscaling function.
##'
##' @return A list of metrics for each ensemble, where each element contains MSE, MAE, R_squared, the actual values from the testing data, and the predicted values for the testing data.

SDA_downscale_metrics <- function(downscale_output, carbon_pool) {
metrics <- list()

# SDA_downscale returns data = list(training, testing), with ensemble columns named "ensemble<i>"
for (i in seq_along(downscale_output$predictions)) {
actual <- downscale_output$data$testing[[paste0("ensemble", i)]]
predicted <- downscale_output$predictions[[i]]

mse <- mean((actual - predicted)^2)
mae <- mean(abs(actual - predicted))
r_squared <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)

metrics[[i]] <- list(MSE = mse, MAE = mae, R_squared = r_squared, actual = actual, predicted = predicted)
}

names(metrics) <- paste0("ensemble", seq_along(metrics))

return(metrics)
}
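
The three metrics computed in `SDA_downscale_metrics` can be checked by hand on a small case. The sketch below is not part of this PR; the `actual` and `predicted` vectors are made-up numbers chosen so the arithmetic is easy to follow.

```r
# Illustrative sketch only: made-up values, not model output.
actual <- c(2.0, 4.0, 6.0, 8.0)
predicted <- c(2.5, 3.5, 6.5, 7.5)

# Every residual is +/- 0.5, so each squared error is 0.25
mse <- mean((actual - predicted)^2)
mae <- mean(abs(actual - predicted))

# Residual sum of squares is 1; total sum of squares around mean(actual) = 5 is 20
r_squared <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)

print(c(MSE = mse, MAE = mae, R_squared = r_squared))
```

With these inputs, MSE is 0.25, MAE is 0.5, and R-squared is 0.95.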
31 changes: 0 additions & 31 deletions modules/assim.sequential/man/NA_downscale.Rd

This file was deleted.

40 changes: 40 additions & 0 deletions modules/assim.sequential/man/SDA_downscale.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
