Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MapRTTransformer does not change RT according to trafoXML #276

Closed
jonasscheid opened this issue Jul 21, 2023 · 6 comments
Closed

MapRTTransformer does not change RT according to trafoXML #276

jonasscheid opened this issue Jul 21, 2023 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@jonasscheid
Copy link
Collaborator

jonasscheid commented Jul 21, 2023

Description of the bug

In the map_alignment step the MapAlignerIdentification computes a trafoXML for each respective run/replicate based on a set of replicate idXMLs. However, the MapRTTransformer inserts values not corresponding to the actual information in the trafoXML. If I only pick one idXML from Comet and do MapAlignerIdentification -> MapRTTransformer I get the following:

Example for a peptide AAAAAAQSVY that is only found once in a run:
Before MapAligner: (from idXML)

<PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="461.735076031900007" RT="2015.799999999999955" spectrum_reference="controllerType=0 controllerNumber=1 scan=6631" >
            <PeptideHit score="3.315649867374006e-03" sequence="AAAAAAQSVY" charge="2" aa_before="A" aa_after="A" start="7" end="16" protein_refs="PH_1213" >

After MapAligner (trafoXML):
<Pair from="2015.8" to="2029.05" note="AAAAAAQSVY"/>

After MapRTTransformer:

<PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="461.735076031900007" RT="2018.374633876312828" spectrum_reference="controllerType=0 controllerNumber=1 scan=6631" >
            <PeptideHit score="3.315649867374006e-03" sequence="AAAAAAQSVY" charge="2" aa_before="A" aa_after="A" start="7" end="16" protein_refs="PH_1213" >

Example with a peptide ATYPYQVVR found 3 times in a replicate:
Before MapAligner: (from idXML)

<PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="548.794127531900017" RT="2471.400000000000091" spectrum_reference="controllerType=0 controllerNumber=1 scan=9033" >
            <PeptideHit score="7.333682556312206e-03" sequence="ATYPYQVVR" charge="2" aa_before="A" aa_after="A" start="240" end="248" protein_refs="PH_217" >
        <PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="548.793700531899958" RT="2479.400000000000091" spectrum_reference="controllerType=0 controllerNumber=1 scan=9076" >
            <PeptideHit score="4.211208292840946e-03" sequence="ATYPYQVVR" charge="2" aa_before="A" aa_after="A" start="240" end="248" protein_refs="PH_217" >
       <PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="548.793395031899991" RT="2487.5" spectrum_reference="controllerType=0 controllerNumber=1 scan=9118" >
            <PeptideHit score="5.747126436781609e-03" sequence="ATYPYQVVR" charge="2" aa_before="A" aa_after="A" start="240" end="248" protein_refs="PH_217" >

After MapAligner (trafoXML):
<Pair from="2479.4" to="2487.925" note="ATYPYQVVR"/>

After MapRTTransformer:

        <PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="548.794127531900017" RT="2474.90907977266761" spectrum_reference="controllerType=0 controllerNumber=1 scan=9033" >
            <PeptideHit score="7.333682556312206e-03" sequence="ATYPYQVVR" charge="2" aa_before="A" aa_after="A" start="240" end="248" protein_refs="PH_217" >
        <PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="548.793700531899958" RT="2482.925487953464199" spectrum_reference="controllerType=0 controllerNumber=1 scan=9076" >
            <PeptideHit score="4.211208292840946e-03" sequence="ATYPYQVVR" charge="2" aa_before="A" aa_after="A" start="240" end="248" protein_refs="PH_217" >
        <PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="548.793395031899991" RT="2491.042101236520466" spectrum_reference="controllerType=0 controllerNumber=1 scan=9118" >
            <PeptideHit score="5.747126436781609e-03" sequence="ATYPYQVVR" charge="2" aa_before="A" aa_after="A" start="240" end="248" protein_refs="PH_217" >

There seems to be something wrong with mapping the transformed RTs from the trafoXML to idXML. Needs to be investigated further in the respective OpenMS implementation

Edit: Transforming RT of an mzml has the same issue

Command used and terminal output

nextflow run nf-core/mhcquant -r dev -profile test_full,docker --outdir test_quant

Relevant files

test_quant_issue.zip

System information

No response

@jonasscheid jonasscheid added the bug Something isn't working label Jul 21, 2023
@jonasscheid
Copy link
Collaborator Author

Unfortunately, the to=xx value in the trafoXML cannot be found in the transformed file. That means there is not even a false mapping of RT values happening 🤔.

@timosachsenberg bug in OpenMS?

@jpfeuffer
Copy link

Doesn't the Transformer fit a curve to the found points? If that is the case, it is to be expected that the regression does not necessarily go through all points.
Do you have the full trafoXML somewhere? Is there maybe a description of the curve/line?

@jpfeuffer
Copy link

Yes, I am pretty sure the points are just anchors. The real transformation happens through intercept and slope in the beginning of the file.

@jonasscheid
Copy link
Collaborator Author

jonasscheid commented Jul 25, 2023

Ah that makes sense! Indeed if I use slope and intercept with the from value I end up with the correct transformation. Ok, then everything is as it should be 🙌🏼 Thanks a lot @jpfeuffer 🙏🏼

A quick follow-up question: Would you recommend b_spline transformation over linear transformation?
Doc of MapAlignerIdentification says

This algorithm has been tested mostly with the "b_spline" model

@jpfeuffer
Copy link

Not necessarily. A lot of MS programs these days just use a linear transformation AFAIK and it turns out OK. On the other hand, splines wont be a bottleneck in runtime and might yield more overlaps (hopefully without overfitting).

I don't think there is a lot that can go wrong with the algorithm itself so the warning might be overly cautious. Especially if you evaluate the results carefully.

@jonasscheid
Copy link
Collaborator Author

Alright! That is very good to know. Thanks again 🙏🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants