Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add chromPeakSummary function and a fix #772

Open
wants to merge 25 commits into
base: devel
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
2c57df3
feat: report m/z values in chromPeaks matrix for XChromatograms
jorainer Sep 16, 2024
dacd665
fix: peak shape quality calculation for gap filling
jorainer Sep 18, 2024
188541c
feat: add chromPeakSummary generic (issue #705)
jorainer Sep 18, 2024
e5aa3ff
docs: add documentation for chromPeakSummary generic
jorainer Sep 18, 2024
b59b15e
Add internal function to calculate beta metrics
pablovgd Sep 18, 2024
c017cba
Fixed requested changes for PR #767
pablovgd Sep 19, 2024
fe7509c
Fix for PR #767
pablovgd Sep 19, 2024
670de04
Merge pull request #767 from pablovgd/issue705
jorainer Sep 19, 2024
c581cde
added chromPeakSummary method
pablovgd Sep 20, 2024
f9d4b0d
requested changes for PR #768
pablovgd Sep 20, 2024
e7ee120
Merge pull request #768 from pablovgd/issue705
jorainer Sep 23, 2024
dd68e2f
Added section on peak quality to vignette.
pablovgd Sep 23, 2024
a49539e
Fixed typos in peak quality vignette.
pablovgd Sep 23, 2024
13cfaf3
Merge pull request #770 from pablovgd/issue705
jorainer Sep 24, 2024
156788a
refactor: little fixes
jorainer Sep 24, 2024
fd9cf65
Address William's comments
jorainer Oct 1, 2024
39cffa7
Merge branch 'devel' into jomain
jorainer Oct 8, 2024
a3bc33b
bump version
jorainer Oct 8, 2024
dc9ae94
Update xcms.Rmd
philouail Oct 28, 2024
ac3e209
Update xcms.Rmd
philouail Oct 28, 2024
23dd061
Update xcms.Rmd
philouail Oct 28, 2024
9f1b7b6
Merge pull request #776 from sneumann/phili
jorainer Oct 29, 2024
93d8b2d
Merge branch 'devel' into jomain
jorainer Oct 30, 2024
97b781e
Merge remote-tracking branch 'origin/jomain' into jomain
jorainer Oct 30, 2024
46ca5ae
Update NEWS.md
jorainer Oct 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
Package: xcms
Version: 4.3.3
Version: 4.3.4
Title: LC-MS and GC-MS Data Analysis
Description: Framework for processing and visualization of chromatographically
separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF,
Expand Down
6 changes: 4 additions & 2 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -461,7 +461,8 @@ export("CentWaveParam",
"CleanPeaksParam",
"MergeNeighboringPeaksParam",
"FilterIntensityParam",
"ChromPeakAreaParam")
"ChromPeakAreaParam",
"BetaDistributionParam")
## Param class methods.

## New Classes
Expand Down Expand Up @@ -530,7 +531,8 @@ exportMethods("hasChromPeaks",
"featureSpectra",
"chromPeakSpectra",
"chromPeakChromatograms",
"featureChromatograms"
"featureChromatograms",
"chromPeakSummary"
)

## feature grouping functions and methods.
Expand Down
12 changes: 12 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,17 @@
# xcms 4.3

## Changes in version 4.3.4

- Address issue #765: peak detection on chromatographic data: report a
chromatogram's `"mz"`, `"mzmin"` and `"mzmax"` as the mean m/z and lower and
upper m/z in the `chromPeaks()` matrix.
- Fix calculation of the correlation coefficient for peak shape similarity with
an idealized bell shape (*beta*) during gap filling for centWave-based
chromatographic peak detection with parameter `verboseBetaColumns = TRUE`.
- Add `chromPeakSummary` generic (issue #705).
- Add `chromPeakSummary()` method to calculate the *beta* quality metrics.


## Changes in version 4.3.3

- Fix issue #755: `chromatogram()` with `msLevel = 2` fails to extract
Expand Down
58 changes: 58 additions & 0 deletions R/AllGenerics.R
Original file line number Diff line number Diff line change
Expand Up @@ -535,6 +535,64 @@ setGeneric("chromPeakData<-", function(object, value)
setGeneric("chromPeakSpectra", function(object, ...)
standardGeneric("chromPeakSpectra"))

#' @title Chromatographic peak summaries
#'
#' @name chromPeakSummary
#'
#' @description
#'
#' The `chromPeakSummary()` method calculates summary statistic or other
jorainer marked this conversation as resolved.
Show resolved Hide resolved
#' metrics for each of the identified chromatographic peaks in an *xcms*
#' result object, such as the [XcmsExperiment()]. Different metrics can be
#' calculated, depending (and configured) using dedicated *parameter* classes.
jorainer marked this conversation as resolved.
Show resolved Hide resolved
#' As a result, the method returns a `matrix` or `data.frame` with one row
#' per chromatographic peak. Each column contains calculated values, depending
#' on the used method/parameter class.
#'
#' Currently implemented methods/parameter classes are:
#'
#' - `BetaDistributionParam`: calculates the *beta_cor* and *beta_snr* quality
#' metrics as described in (Kumler 2023) representing the result from a
jorainer marked this conversation as resolved.
Show resolved Hide resolved
#' (correlation) test of similarity to a bell curve and the signal-to-noise
jorainer marked this conversation as resolved.
Show resolved Hide resolved
#' ratio calculated on the residulas of this test.
jorainer marked this conversation as resolved.
Show resolved Hide resolved
#'
#' @param BPPARAM Parallel processing setup. See [bpparam()] for details.
#'
#' @param chunkSize `integer(1)` defining the number of samples from which data
#' should be loaded and processed at a time.
#'
#' @param msLevel `integer(1)` with the MS level of the chromatographic peaks
#' on which the metric should be calculated.
#'
#' @param object an *xcms* result object containing information on
#' identified chromatographic peaks.
#'
#' @param param a parameter object defining the method/summaries that should
#' be calculated (see description above for supported parameter classes).
#'
#' @param ... additional arguments passed to the method implementation.
#'
#' @return
#'
#' A `matrix` or `data.frame` with the same number of rows as there are
#' chromatographic peaks. Columns contain the calculated values. The number of
#' columns, their names and content depend on the used parameter object. See
#' the respective documentation above for more details.
#'
#' @author Pablo Vangeenderhuysen, Johannes Rainer
#'
#' @md
#'
#' @references
#'
#' Kumler W, Hazelton B J and Ingalls A E (2023) "Picky with peakpicking:
#' assessing chromatographic peak quality with simple metrics in metabolomics"
#' *BMC Bioinformatics* 24(1):404. doi: 10.1186/s12859-023-05533-4
#'
#' @export
setGeneric("chromPeakSummary", function(object, param, ...)
standardGeneric("chromPeakSummary"))

setGeneric("collect", function(object, ...) standardGeneric("collect"))
setGeneric("consecMissedLimit", function(object, ...)
standardGeneric("consecMissedLimit"))
Expand Down
5 changes: 5 additions & 0 deletions R/DataClasses.R
Original file line number Diff line number Diff line change
Expand Up @@ -2182,3 +2182,8 @@ setClass("FilterIntensityParam",
msg
else TRUE
})

setClass("BetaDistributionParam",
contains = "Param"
)

46 changes: 43 additions & 3 deletions R/XcmsExperiment-functions.R
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for implementing this! I haven't checked it for bugs but the logic looks sound.

Original file line number Diff line number Diff line change
Expand Up @@ -520,8 +520,9 @@
## consider adding 0 or NA intensity for those.
mat <- do.call(rbind, xsub)
if (nrow(mat)) {
nr <- vapply(xsub, nrow, NA_integer_)
## can have 0, 1 or x values per rt; repeat rt accordingly
rts <- rep(rt[keep], vapply(xsub, nrow, integer(1L)))
rts <- rep(rt[keep], nr)
maxi <- which.max(mat[, 2L])[1L]
mmz <- do.call(mzCenterFun, list(mat[, 1L], mat[, 2L]))
if (is.na(mmz)) mmz <- mat[maxi, 1L]
Expand All @@ -530,15 +531,54 @@
sum(mat[, 2L], na.rm = TRUE) *
((rtr[2L] - rtr[1L]) / max(1L, (length(keep) - 1L)))
)
if ("beta_cor" %in% cn)
if ("beta_cor" %in% cn) {
res[i, c("beta_cor", "beta_snr")] <- .get_beta_values(
mat[, 2L], rts)
vapply(xsub[nr > 0], function(z) sum(z[, "intensity"]),
NA_real_),
rt[keep][nr > 0])
}
}
}
}
res[!is.na(res[, "maxo"]), , drop = FALSE]
}


#' Calculates quality metrics for a chromatographic peak.
#'
#' @param x `list` of peak matrices (from a single MS level and from a single
#' file/sample).
#'
#' @param rt retention time for each peak matrix.
#'
#' @param peakArea `matrix` defining the chrom peak area.
#'
#' @author Pablo Vangeenderhuysen
#'
#' @noRd
.chrom_peak_beta_metrics <- function(x, rt, peakArea, ...) {
res <- matrix(NA_real_, ncol = 2L, nrow = nrow(peakArea))
rownames(res) <- rownames(peakArea)
colnames(res) <- c("beta_cor","beta_snr")
for (i in seq_len(nrow(res))) {
rtr <- peakArea[i, c("rtmin", "rtmax")]
keep <- which(between(rt, rtr))
if (length(keep)) {
xsub <- lapply(x[keep], .pmat_filter_mz,
mzr = peakArea[i, c("mzmin", "mzmax")])
nr <- vapply(xsub, nrow, NA_integer_)
res[i, c("beta_cor", "beta_snr")] <- .get_beta_values(
vapply(xsub[nr > 0], function(z) sum(z[, "intensity"]), NA_real_),
rt[keep][nr > 0])
}
}
res
}





#' Difference to the original code is that the weighted mean is also calculated
#' if some of the peak intensities in the profile matrix are 0
#'
Expand Down
36 changes: 36 additions & 0 deletions R/XcmsExperiment.R
Original file line number Diff line number Diff line change
Expand Up @@ -2013,3 +2013,39 @@ setMethod(
object[i = sort(unique(file)), keepAdjustedRtime = keepAdjustedRtime,
keepFeatures = keepFeatures, ...]
})

#' @rdname chromPeakSummary
setMethod(
"chromPeakSummary",
signature(object = "XcmsExperiment", param = "BetaDistributionParam"),
function(object, param, msLevel = 1L, chunkSize = 2L, BPPARAM = bpparam()) {
if (length(msLevel) != 1)
stop("Can only perform peak metrics for one MS level at a time.")
if (!hasChromPeaks(object, msLevel = msLevel))
stop("No ChromPeaks definitions for MS level ", msLevel, " present.")
## Define region to calculate metrics from for each file
cp <- chromPeaks(object, msLevel = msLevel)
f <- factor(cp[,"sample"], seq_along(object))
pal <- split.data.frame(cp[, c("mzmin", "mzmax", "rtmin", "rtmax")], f)
names(pal) <- seq_along(pal)
## Manual chunk processi ng because we have to split `object` and `pal`
idx <- seq_along(object)
chunks <- split(idx, ceiling(idx / chunkSize))
pb <- progress_bar$new(format = paste0("[:bar] :current/:",
"total (:percent) in ",
":elapsed"),
total = length(chunks) + 1L, clear = FALSE)
pb$tick(0)
# mzf <- "wMean"
res <- lapply(chunks, function(z, ...) {
pb$tick()
.xmse_integrate_chrom_peaks(
.subset_xcms_experiment(object, i = z, keepAdjustedRtime = TRUE,
ignoreHistory = TRUE),
pal = pal[z], intFun = .chrom_peak_beta_metrics,
msLevel = msLevel, BPPARAM = BPPARAM)
})
res <- do.call(rbind, res)
pb$tick()
res
})
29 changes: 18 additions & 11 deletions R/do_findChromPeaks-functions.R
Original file line number Diff line number Diff line change
Expand Up @@ -3720,14 +3720,20 @@ peaksWithCentWave <- function(int, rt,
#' requires at least 5 scans or it will return NA for both parameters.
#'
#' @param intensity A numeric vector corresponding to the peak intensities
#'
#' @param rtime A numeric vector corresponding to the retention times of each
#' intensity. If not provided, intensities will be assumed to be equally spaced.
#' intensity. Retention times are expected to be in increasing order
#' without duplicates. If not provided, intensities will be assumed to be
#' equally spaced.
#'
#' @param skews A numeric vector of the skews to try, corresponding to the
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be increasingly
#' right-skewed, while values greater than 5 will be left-skewed.
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wkumler , I needed to read that a few times. So a value of 5 means symmetric ? Why is it not zero centered, so that abs() tells you the skewedness (regardless in what direction) and you can <0 or >0 if you only want the direction ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, thank you for the review. I'm using the language inherent to the dbeta() function where the values of 2-5 correspond to the "positive parameters" alpha and beta (or, in R, shape1 and shape2). I'm absolutely open to rescaling these and I agree that a positive/negative skew would be more intuitive. If we're open to rescaling and we have a parameter to pass now in chromPeakSummary then we also may want to allow a wider variety of numbers - the values of 5 matched my intuition for peak shape corners but other folks may want more/less tail and more/less skew.

#' increasingly right-skewed, while values greater than 5 will be
#' left-skewed.
#'
#' @param zero.rm Boolean value controlling whether "missing" scans are dropped
#' prior to curve fitting. The default, TRUE, will remove intensities of zero
#' or NA
#' prior to curve fitting. The default, TRUE, will remove intensities of
#' zero or NA
#'
#' @author William Kumler
#'
Expand All @@ -3740,21 +3746,22 @@ peaksWithCentWave <- function(int, rt,
rtime <- rtime[keep]
intensity <- intensity[keep]
}
if(length(intensity)<5){
if (length(intensity) < 5) {
best_cor <- NA
beta_snr <- NA
} else {
beta_sequence <- rep(.scale_zero_one(rtime), each=length(skews))
beta_sequence <- rep(.scale_zero_one(rtime), each = length(skews))
beta_vals <- t(matrix(dbeta(beta_sequence, shape1 = skews, shape2 = 5),
nrow = length(skews)))
# matplot(beta_vals)
beta_cors <- cor(intensity, beta_vals)
best_cor <- max(beta_cors)
best_curve <- beta_vals[, which.max(beta_cors)]
noise_level <- sd(diff(.scale_zero_one(best_curve)-.scale_zero_one(intensity)))
beta_snr <- log10(max(intensity)/noise_level)
noise_level <- sd(diff(.scale_zero_one(best_curve) -
.scale_zero_one(intensity)))
beta_snr <- log10(max(intensity) / noise_level)
}
c(best_cor=best_cor, beta_snr=beta_snr)
c(best_cor = best_cor, beta_snr = beta_snr)
}


Expand All @@ -3769,5 +3776,5 @@ peaksWithCentWave <- function(int, rt,
#'
#' @noRd
.scale_zero_one <- function(num_vec){
(num_vec-min(num_vec))/(max(num_vec)-min(num_vec))
(num_vec-min(num_vec)) / (max(num_vec) - min(num_vec))
}
5 changes: 5 additions & 0 deletions R/functions-Params.R
Original file line number Diff line number Diff line change
Expand Up @@ -397,3 +397,8 @@ FilterIntensityParam <- function(threshold = 0, nValues = 1L, value = "maxo") {
new("FilterIntensityParam", threshold = as.numeric(threshold),
nValues = as.integer(nValues), value = value)
}

#' @rdname chromPeakSummary
BetaDistributionParam <- function() {
new("BetaDistributionParam")
}
12 changes: 6 additions & 6 deletions R/functions-XCMSnExp.R
Original file line number Diff line number Diff line change
Expand Up @@ -279,6 +279,7 @@ dropGenericProcessHistory <- function(x, fun) {
valsPerSpect = valsPerSpect, rtrange = rtr,
mzrange = mzr)
if (length(mtx)) {
## mtx: time, mz, intensity
if (any(!is.na(mtx[, 3]))) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually recommend avoiding hardcoded column numbers, imagine someone came up with (time, ccs, mz, intensity). Might need that mtx is created with named columns somewhere above. Is that the only occurance of a hardcoded column number in the PR ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - generally. Here, we load the mtx matrix with a function that is supposed to return a 3 column matrix with the columns exactly in the expected order. Thus, IMHO for this case (as long as there is no user input involved) it's save to use access-by-index. Also, because here we're calling this in a loop thousands of times, the way how we subset could affect the performance.

I did some timings on that:

mtx <- cbind(time = 1:5, mz = 1:5, intensity = c(2342.2, 123.1, 231.1, 23.1, 123.23))
int_col <- 3L

library(microbenchmark)
> microbenchmark(mtx[, 3L], mtx[, "intensity"], mtx[, int_col])
Unit: nanoseconds
               expr min    lq   mean median    uq  max neval cld
          mtx[, 3L] 354 363.5 437.30  380.5 477.5  948   100   a
 mtx[, "intensity"] 389 404.0 543.07  412.0 474.0 5036   100   a
     mtx[, int_col] 381 399.5 469.45  407.0 484.0 1135   100   a

## How to calculate the area: (1)sum of all intensities / (2)by
## the number of data points (REAL ones, considering also NAs)
Expand All @@ -290,21 +291,20 @@ dropGenericProcessHistory <- function(x, fun) {
## as e.g. centWave. Using max(1, ... to avoid getting Inf in
## case the signal is based on a single data point.
## (3) rtr[2] - rtr[1]
res[i, "into"] <- sum(mtx[, 3], na.rm = TRUE) *
res[i, "into"] <- sum(mtx[, 3L], na.rm = TRUE) *
((rtr[2] - rtr[1]) /
max(1, (sum(rtim >= rtr[1] & rtim <= rtr[2]) - 1)))
maxi <- which.max(mtx[, 3])
maxi <- which.max(mtx[, 3L])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, above was not the only hardcoded 3 :-) and more below ...

res[i, c("rt", "maxo")] <- mtx[maxi[1], c(1, 3)]
res[i, c("rtmin", "rtmax")] <- rtr
## Calculate the intensity weighted mean mz
meanMz <- do.call(mzCenterFun, list(mtx[, 2], mtx[, 3]))
if (is.na(meanMz)) meanMz <- mtx[maxi[1], 2]
res[i, "mz"] <- meanMz

if ("beta_cor" %in% cn) {
if ("beta_cor" %in% cn)
res[i, c("beta_cor", "beta_snr")] <- .get_beta_values(
mtx[, 3L], mtx[, 1L])
}
vapply(split(mtx[, 3L], mtx[, 1L]), sum, NA_real_),
unique(mtx[, 1L]))
} else {
res[i, ] <- rep(NA_real_, ncols)
}
Expand Down
24 changes: 18 additions & 6 deletions R/methods-Chromatogram.R
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,10 @@
#' @return
#'
#' If called on a `Chromatogram` object, the method returns an [XChromatogram]
#' object with the identified peaks. See [peaksWithCentWave()] for details on
#' the peak matrix content.
#' object with the identified peaks. Columns `"mz"`, `"mzmin"` and `"mzmax"` in
#' the `chromPeaks()` peak matrix provide the mean m/z and the maximum and
#' minimum m/z value of the `Chromatogram` object. See [peaksWithCentWave()]
#' for details on the remaining columns.
#'
#' @seealso [peaksWithCentWave()] for the downstream function and [centWave]
#' for details on the method.
Expand Down Expand Up @@ -70,10 +72,18 @@ setMethod("findChromPeaks", signature(object = "Chromatogram",
rt = rtime(object)),
as(param, "list")))
object <- as(object, "XChromatogram")
chromPeaks(object) <- res
chromPeaks(object) <- .add_mz(res, object@mz)
object
})

.add_mz <- function(x, mz = c(NA_real_, NA_real_)) {
nx <- nrow(x)
tmp <- cbind(mz = rep(mean(mz), nx),
mzmin = rep(mz[1L], nx),
mzmax = rep(mz[2L], nx))
cbind(tmp, x)
}

#' @title matchedFilter-based peak detection in purely chromatographic data
#'
#' @description
Expand All @@ -97,8 +107,10 @@ setMethod("findChromPeaks", signature(object = "Chromatogram",
#' @return
#'
#' If called on a `Chromatogram` object, the method returns a `matrix` with
#' the identified peaks. See [peaksWithMatchedFilter()] for details on the
#' matrix content.
#' the identified peaks. Columns `"mz"`, `"mzmin"` and `"mzmax"` in
#' the `chromPeaks()` peak matrix provide the mean m/z and the maximum and
#' minimum m/z value of the `Chromatogram` object. See
#' [peaksWithMatchedFilter()] for details on the remaining columns.
#'
#' @seealso [peaksWithMatchedFilter()] for the downstream function and
#' [matchedFilter] for details on the method.
Expand Down Expand Up @@ -134,7 +146,7 @@ setMethod("findChromPeaks", signature(object = "Chromatogram",
rt = rtime(object)),
as(param, "list")))
object <- as(object, "XChromatogram")
chromPeaks(object) <- res
chromPeaks(object) <- .add_mz(res, object@mz)
object
})

Expand Down
Loading
Loading