-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add chromPeakSummary function and a fix #772
base: devel
Are you sure you want to change the base?
Conversation
- Report also a chromatogram's m/z values if `findChromPeaks` is run on a `Chromatogram` or `Chromatograms` object (issue #765).
- Peak shape quality (similarity to gaussian shape) calculation is now performed using the EIC representation of a chromatographic peak, i.e. with intensities of mass peaks for the same retention time (but different m/z) summed.
Add internal function to calculate beta metrics
added chromPeakSummary method
Added peak quality section to xcms vignette.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great ! Also thanks for the small vignette ! The code looks good to me, thank @pablovgd, excited to incorporate it in the overall workflow and see how to integrate it with the other preprocessing steps to improve them.
PS: I think the GHA needs to be updated, i they are failing because of warnings.
R/XcmsExperiment-functions.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for implementing this! I haven't checked it for bugs but the logic looks sound.
Thanks @wkumler for your review! I have now addressed all your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for the PR !
What I couldn't attach to a specific line, would it make sense to add the paper to https://github.com/sneumann/xcms/blob/devel/inst/CITATION ?
Yours, Steffen
#' @param skews A numeric vector of the skews to try, corresponding to the | ||
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be increasingly | ||
#' right-skewed, while values greater than 5 will be left-skewed. | ||
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @wkumler , I needed to read that a few times. So a value of 5 means symmetric ? Why is it not zero centered, so that abs()
tells you the skewedness (regardless in what direction) and you can <0
or >0
if you only want the direction ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, thank you for the review. I'm using the language inherent to the dbeta()
function where the values of 2-5 correspond to the "positive parameters" alpha and beta (or, in R, shape1 and shape2). I'm absolutely open to rescaling these and I agree that a positive/negative skew would be more intuitive. If we're open to rescaling and we have a parameter to pass now in chromPeakSummary
then we also may want to allow a wider variety of numbers - the values of 5 matched my intuition for peak shape corners but other folks may want more/less tail and more/less skew.
@@ -279,6 +279,7 @@ dropGenericProcessHistory <- function(x, fun) { | |||
valsPerSpect = valsPerSpect, rtrange = rtr, | |||
mzrange = mzr) | |||
if (length(mtx)) { | |||
## mtx: time, mz, intensity | |||
if (any(!is.na(mtx[, 3]))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually recommend avoiding hardcoded column numbers, imagine someone came up with (time, ccs, mz, intensity)
. Might need that mtx is created with named columns somewhere above. Is that the only occurance of a hardcoded column number in the PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree - generally. Here, we load the mtx
matrix
with a function that is supposed to return a 3 column matrix with the columns exactly in the expected order. Thus, IMHO for this case (as long as there is no user input involved) it's save to use access-by-index. Also, because here we're calling this in a loop thousands of times, the way how we subset could affect the performance.
I did some timings on that:
mtx <- cbind(time = 1:5, mz = 1:5, intensity = c(2342.2, 123.1, 231.1, 23.1, 123.23))
int_col <- 3L
library(microbenchmark)
> microbenchmark(mtx[, 3L], mtx[, "intensity"], mtx[, int_col])
Unit: nanoseconds
expr min lq mean median uq max neval cld
mtx[, 3L] 354 363.5 437.30 380.5 477.5 948 100 a
mtx[, "intensity"] 389 404.0 543.07 412.0 474.0 5036 100 a
mtx[, int_col] 381 399.5 469.45 407.0 484.0 1135 100 a
((rtr[2] - rtr[1]) / | ||
max(1, (sum(rtim >= rtr[1] & rtim <= rtr[2]) - 1))) | ||
maxi <- which.max(mtx[, 3]) | ||
maxi <- which.max(mtx[, 3L]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, above was not the only hardcoded 3
:-) and more below ...
Regarding the CITATIONS - I added the reference to the respective man page. I think adding that as a CITATION for xcms itself might be bit too much, as the paper deals more with peak shape quality, independently of xcms? |
For now I only bumped the version and fixed the NEWS.md, but I did not address the other points above - can we maybe discuss them again @sneumann ? |
This PR adds the
chromPeakSummary()
method and a first implementation to calculate the peak shape quality from @wkumler on chrom peak results (thanks to @pablovgd for contributing).In addition, it fixes a bug in the calculation of the beta scores during gap filling.