Version 3.2.0

pbreheny · Feb 18, 2025 · 92c117f
1 parent 78c4e8d
Showing 6 changed files with 41 additions and 46 deletions.
diff --git a/.version.json b/.version.json
@@ -1,6 +1,6 @@
 {
   "schemaVersion": 1,
   "label": "GitHub",
-  "message": "4.1.0",
+  "message": "4.2.0",
   "color": "blue"
 }
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: plmmr
 Title: Penalized Linear Mixed Models for Correlated Data
-Version: 4.1.1.0
-Date: 2025-01-30
+Version: 4.2.0
+Date: 2025-02-18
 Authors@R: c(
     person("Tabitha K.", "Peter", , "[email protected]", role = "aut",
            comment = c(ORCID = "0009-0005-2524-4751")),

diff --git a/NEWS.md b/NEWS.md
@@ -1,61 +1,59 @@
-# plmmr 4.1.0.5 (2025-01-22)
+# plmmr 4.2.0
 
-## Bug fixes
+- **Bug in BLUP**: We caught a mathematical error in our earlier implementation of best linear unbiased prediction. The issue had to do with an inconsistency in the scaling among the terms used in constructing this predictor. This issue impacted prediction within cross-validation as well as the `predict()` method for our `plmm` class.
 
-We have recently caught a couple of bugs in our model fitting functions -- we apologize for any errors these may have caused in downstream analysis, and we explain how we have addressed these issues below:
+- **Bug in processing delimited files**: We noticed a bug in the way that models were fit to data from delimited files. The previous version was not correctly implementing the transformation of model results from the standardized scale to the original scale due to the inadvertent addition of two rows in the `beta_vals` object (only one row should be added, for the intercept).
 
--   **Bug in BLUP**: We caught a mathematical error in our earlier implementation of best linear unbiased prediction. The issue had to do with an inconsistency in the scaling among the terms used in constructing this predictor. This issue impacted prediction within cross-validation as well as the `predict()` method for our `plmm` class. We recommend that users who had used best linear unbiased prediction (BLUP) in previous analysis re-run their analysis using this corrected version.
+- **Clarification of parallelization option for cross-validation:** The `cv_plmm()` method offers parallelization through the `cluster` option – we have now clarified in our documentation that at this time, this option is only available for analyzing data stored in-memory. We have added an example that demonstrates this option at work in the article for analyzing matrix data.
+- **Change of default settings for prediction**: The default prediction method in both `predict()` and `cv_plmm()` is now 'blup' (best linear unbiased prediction).
+- **Change in objects returned by default in** `plmm()`: By default, the main model fitting function `plmm()` now returns the filepath for `std_X` when the design matrix is stored file-backed; `plmm()` also returns `y` (the outcome vector used to fit the model), and `std_Xbeta` (the linear predictors on the standardized scale). These components are used to construct the best linear unbiased predictor.
+- **Change in arguments passed to** `predict()`: In tandem with the change in what is returned by `plmm()` by default, the `predict()` method no longer needs a separate `y` argument to be supplied for `type = 'blup'`.
+- **Change in arguments supplied to** `plmm()` and `cv_plmm()`: the option `compact_save` no longer exists; instead, `save_rds` offers the option to save .rds/.log files, and `return_fit` offers the option to return the output of `plmm()` in the current R session. Note that .log files are now only constructed when `save_rds = TRUE`.
 
--   **Bug in processing delimited files**: We noticed a bug in the way that models were fit to data from delimited files. The previous version was not correctly implementing the transformation of model results from the standardized scale to the original scale due to the inadvertent addition of two rows in the `beta_vals` object (only one row should be added, for the intercept). This error has been corrected. We recommend that users who have used the previous version of **plmmr** to analyze data from delimited files re-run their analyses.
+# plmmr 4.1.0
 
-## Other changes
+- **Restore plmm(X,y) syntax**: Where version 4.0.0 required that `create_design()` always be called prior to `plmm()` or `cv_plmm()`; this update restores the X,y syntax consistent with other packages (e.g., `glmnet`, `ncvreg`). Note that this syntax is only available for the case where the design matrix is stored in-memory as a `matrix` or `data.frame` object. The `create_design()` function is still required for cases where the design matrix/dataset is stored in an external file.
 
--   **Clarification of parallelization option for cross-validation:** The `cv_plmm()` method offers parallelization through the `cluster` option – we have now clarified in our documentation that at this time, this option is only available for analyzing data stored in-memory. We have added an example that demonstrates this option at work in the article for analyzing matrix data.
--   **Change of default settings for prediction**: The default prediction method in both `predict()` and `cv_plmm()` is now 'blup' (best linear unbiased prediction).
--   **Change in objects returned by default in** `plmm()`: By default, the main model fitting function `plmm()` now returns the filepath for `std_X` when the design matrix is stored file-backed; `plmm()` also returns `y` (the outcome vector used to fit the model), and `std_Xbeta` (the linear predictors on the standardized scale). These components are used to construct the best linear unbiased predictor.
--   **Change in arguments passed to** `predict()`: In tandem with the change in what is returned by `plmm()` by default, the `predict()` method no longer needs a separate `y` argument to be supplied for `type = 'blup'`.
--   **Change in arguments supplied to** `plmm()` and `cv_plmm()`: the option `compact_save` no longer exists; instead, `save_rds` offers the option to save .rds/.log files, and `return_fit` offers the option to return the output of `plmm()` in the current R session. Note that .log files are now only constructed when `save_rds = TRUE`.
+- **Bug fix**: The 4.0.0 version of `create_design()` required `X` to have column names, and errored out with an uninformative message if no names were supplied (see issue 61). This is now fixed -- column names are not required unless the user wants to specify an argument to `unpen`.
 
-# plmmr 4.1.0 (2024-10-23)
+- **Argument name change**: In `create_design()`, the argument to specify an outcome in the in-memory case has been renamed to `y`; this makes the syntax consistent, e.g., `create_design(X, y)`. Note again that this change is relevant to in-memory data only.
 
--   **Restore plmm(X,y) syntax**: Where version 4.0.0 required that `create_design()` always be called prior to `plmm()` or `cv_plmm()`; this update restores the X,y syntax consistent with other packages (e.g., `glmnet`, `ncvreg`). Note that this syntax is only available for the case where the design matrix is stored in-memory as a `matrix` or `data.frame` object. The `create_design()` function is still required for cases where the design matrix/dataset is stored in an external file.
+- **Internal:** Fixed LTO type mismatch bug.
 
--   **Bug fix**: The 4.0.0 version of `create_design()` required `X` to have column names, and errored out with an uninformative message if no names were supplied (see issue 61). This is now fixed -- column names are not required unless the user wants to specify an argument to `unpen`.
+# plmmr 4.0.0
 
--   **Argument name change**: In `create_design()`, the argument to specify an outcome in the in-memory case has been renamed to `y`; this makes the syntax consistent, e.g., `create_design(X, y)`. Note again that this change is relevant to in-memory data only.
+- **Major re-structuring of preprocessing pipeline:** Data from external files must now be processed with `process_plink()` or `process_delim()`. All data (including in-memory data) must be prepared for analysis via `create_design()`. This change ensures that data are funneled into a uniform format for analysis.
 
--   **Internal:** Fixed LTO type mismatch bug.
+- **Documentation updated:** The vignettes for the package are now all revised to include examples of the complete pipeline with the new `create_design()` syntax. There is an article for each type of data input (matrix/data.frame, delimited file, and PLINK).
 
-# plmmr 4.0.0 (2024-10-07)
+- **CRAN:** The package is on CRAN now.
 
--   **Major re-structuring of preprocessing pipeline:** Data from external files must now be processed with `process_plink()` or `process_delim()`. All data (including in-memory data) must be prepared for analysis via `create_design()`. This change ensures that data are funneled into a uniform format for analysis.
+# plmmr 3.2.0
+_2024-09-02_
 
--   **Documentation updated:** The vignettes for the package are now all revised to include examples of the complete pipeline with the new `create_design()` syntax. There is an article for each type of data input (matrix/data.frame, delimited file, and PLINK).
+- **bigsnpr now in Suggests, not Imports:** The essential filebacking support is now all done with `bigmemory` and `bigalgebra`. The `bigsnpr` package is used only for processing PLINK files.
 
--   **CRAN:** The package is on CRAN now.
+- **dev branch gwas_scale** has a version of the pipeline that runs completely file-backed.
 
-# plmmr 3.2.0 (2024-09-02)
+# plmmr 3.1.0
+_2024-07-13_
 
--   **bigsnpr now in Suggests, not Imports:** The essential filebacking support is now all done with `bigmemory` and `bigalgebra`. The `bigsnpr` package is used only for processing PLINK files.
+- **Enhancement:** To make `plmmr` have better functionality for writing scripts, the functions `process_plink()`, `plmmm()`, and `cv_plmm()` now (optionally) write '.log' files, as in PLINK.
 
--   **dev branch gwas_scale** has a version of the pipeline that runs completely file-backed.
+- **Enhancement:** In cases where users are working with large datasets, it may not be practical or desirable for all the results returned by `plmmm()` or `cv_plmm()` to be saved in a single '.rds' file. There is now an option in both of these model fitting functions called 'compact_save', which gives users the option to save the output in multiple, smaller '.rds' files.
 
-# plmmr 3.1.0 (2024-07-13)
+- **Argument removed:** Argument `std_needed` is no longer available in `plmm()` and `cv_plmm()` functions.
 
--   **Enhancement:** To make `plmmr` have better functionality for writing scripts, the functions `process_plink()`, `plmmm()`, and `cv_plmm()` now (optionally) write '.log' files, as in PLINK.
+# plmmr 3.0.0
+_2024-06-27_
 
--   **Enhancement:** In cases where users are working with large datasets, it may not be practical or desirable for all the results returned by `plmmm()` or `cv_plmm()` to be saved in a single '.rds' file. There is now an option in both of these model fitting functions called 'compact_save', which gives users the option to save the output in multiple, smaller '.rds' files.
+- **Bug fix:** Cross-validation implementation issues fixed. Previously, the full set of eigenvalues were used inside CV folds, which is not ideal as it involves information from outside the fold. Now, the entire modeling process is cross-validated: the standardization, the eigendecomposition of the relatedness matrix, the model fitting, and the backtransformation onto the original scale for prediction.
 
--   **Argument removed:** Argument `std_needed` is no longer available in `plmm()` and `cv_plmm()` functions.
+- **Computational speedup:** The standardization and rotation of filebacked data are now much faster; `bigalgebra` and `bigmemory` are now used for these computations.
 
-# plmmr 3.0.0 (2024-06-27)
+- **Internal:** On the standardized scale, the intercept of the PLMM is the mean of the outcome. This derivation considerably simplifies the handling of the intercept internally during model fitting.
 
--   **Bug fix:** Cross-validation implementation issues fixed. Previously, the full set of eigenvalues were used inside CV folds, which is not ideal as it involves information from outside the fold. Now, the entire modeling process is cross-validated: the standardization, the eigendecomposition of the relatedness matrix, the model fitting, and the backtransformation onto the original scale for prediction.
+# plmmr 2.2.1
+_2024-03-16_
 
--   **Computational speedup:** The standardization and rotation of filebacked data are now much faster; `bigalgebra` and `bigmemory` are now used for these computations.
-
--   **Internal:** On the standardized scale, the intercept of the PLMM is the mean of the outcome. This derivation considerably simplifies the handling of the intercept internally during model fitting.
-
-# plmmr 2.2.1 (2024-03-16)
-
--   **Name change:** Changed package name to `plmmr`; note that `plmm()`, `cv_plmm()`, and other functions starting with `plmm_` have not changed names.
+- **Name change:** Changed package name to `plmmr`; note that `plmm()`, `cv_plmm()`, and other functions starting with `plmm_` have not changed names.
diff --git a/R/zzz.R b/R/zzz.R
@@ -1,9 +1,5 @@
-.onAttach <- function(libname, pkgname) {
-  packageStartupMessage("This is ", pkgname, " ", utils::packageVersion(pkgname), ".\n")
-}
-
 # Define an environment to store the state
 .plmmr_env <- new.env()
 
 # Initialize the state
-.plmmr_env$warning_shown <- FALSE
+.plmmr_env$warning_shown <- FALSE
diff --git a/README.md b/README.md
@@ -1,5 +1,6 @@
 <!-- badges: start -->
 [![GitHub version](https://img.shields.io/static/v1?label=GitHub&message=4.1.1.0&color=blue&logo=github)](https://github.com/pbreheny/plmmr) 
+[![CRAN version](https://img.shields.io/cran/v/plmmr?logo=R)](https://cran.r-project.org/package=plmmr)
 [![R-CMD-check](https://github.com/pbreheny/plmmr/workflows/R-CMD-check/badge.svg)](https://github.com/pbreheny/plmmr/actions) 
 [![Codecov test coverage](https://codecov.io/gh/pbreheny/plmmr/branch/master/graph/badge.svg)](https://app.codecov.io/gh/pbreheny/plmmr?branch=master)
 <!-- badges: end -->

diff --git a/inst/CITATION b/inst/CITATION
@@ -9,7 +9,7 @@ bibentry(
   title = "plmmr: an R package to fit penalized linear mixed models for genome-wide association data with complex correlation structure",
   journal = "arXiv preprint",
   year = "2025",
-  doi = "https://doi.org/10.48550/arXiv.2502.01577")
+  doi = "10.48550/arXiv.2502.01577")
 
 bibentry(
   bibtype="Article",
@@ -21,5 +21,5 @@ bibentry(
   volume = "45",
   pages = "427--444",
   number = "5",
-  url = "https://doi.org/10.1002/gepi.22384")
+  doi = "10.1002/gepi.22384")