Cosine in Dash interface and JSON don't always match #199

Open
helenamrusso opened this issue May 11, 2024 · 3 comments

@helenamrusso

I would like to report what I believe to be a bug in the metabolomics spectrum resolver. I use it to retrieve the cosine similarity for a list of USIs, which has generally worked well and gives me what I need. However, today I noticed cases in which the JSON output does not show the correct cosine similarity.

Example:
Dash interface - cos 0.7962, and is indeed a good match: https://metabolomics-usi.gnps2.org/dashinterface/?usi1=mzspec:MSV000085142:vehicle_LI_C_Se[…]90&cosine=standard&fragment_mz_tolerance=0.1&grid=False

JSON - cos 0.01299: https://metabolomics-usi.gnps2.org/json/mirror/?usi1=mzspec:MSV000085142:vehicle_LI_C_Sept[…]nnotate_peaks=%5B%5B95.08549499511719%5D%2C%20%5B%5D%5D

I manually checked many pairs, and in most cases the two values match exactly. But with large lists, I can't tell how many other pairs are affected in the same way as this one.

@mwang87
Owner

mwang87 commented May 11, 2024 via email

@helenamrusso
Author

I did more investigation into this issue and I have some more information.

Please consider this USI as an example: mzspec:MSV000085142:vehicle_LI_C_Sept_m2:scan:137

In the web interface, the precmz is 188.1761.
In the JSON file, the precmz is 709.1234.

I checked this dataset in MassIVE and filtered for the filename (https://massive.ucsd.edu/ProteoSAFe/dataset_files.jsp?task=a1375e1eca11456f9bed4b71c3f12f8d#%7B%22table_sort_history%22%3A%22main.collection_asc%22%2C%22main.file_descriptor_input%22%3A%22vehicle_LI_C_Sept_m2%22%7D). There are two files with the same filename in different folders: one contains negative-mode data and the other positive-mode data.

I downloaded both files and inspected scan 137 in each:
In positive mode: m/z 188.1761
In negative mode: m/z 709.1235

Therefore, in this case, the Dash interface is showing the positive-mode data, while the JSON output is showing the negative-mode data.

PS: as background, I got this USI (mzspec:MSV000085142:vehicle_LI_C_Sept_m2:scan:137) from fastMASST searches, and the fastMASST result reports this USI with a precmz of 188.
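For large lists, the precursor m/z comparison described above could serve as a quick screen for pairs where the two outputs resolved to different spectra. The following is only a minimal sketch: it assumes the precursor m/z values from both sources have already been collected into tuples, and the function name, the record layout, and the 0.01 tolerance are all illustrative assumptions, not part of the resolver.

```python
import math
from typing import Iterable, List, Tuple


def flag_precursor_mismatches(
    records: Iterable[Tuple[str, float, float]],
    tol: float = 0.01,  # assumed absolute tolerance in m/z units
) -> List[str]:
    """Return the USIs whose two precursor m/z values disagree.

    Each record is (usi, precmz_from_source_a, precmz_from_source_b).
    How the two values are obtained is outside the scope of this sketch.
    """
    flagged = []
    for usi, precmz_a, precmz_b in records:
        # Compare on an absolute scale only; rel_tol is disabled.
        if not math.isclose(precmz_a, precmz_b, rel_tol=0.0, abs_tol=tol):
            flagged.append(usi)
    return flagged


if __name__ == "__main__":
    example = [
        # Values reported in this issue for the two outputs.
        ("mzspec:MSV000085142:vehicle_LI_C_Sept_m2:scan:137", 188.1761, 709.1234),
    ]
    print(flag_precursor_mismatches(example))
```

Any USI returned by this check would be a candidate for the same filename collision, and its cosine score should not be trusted until the source file is confirmed.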

@bittremieux
Collaborator

Thanks for the detailed investigation. This is an interesting edge case. The USI standard details how to distinguish multiple runs with the same file name in a single dataset, using the subfolder mechanism in section 3.6.1.

So in this case, the unique USIs would be:

  • mzspec:MSV000085142:[pos-mzXML]vehicle_LI_C_Sept_m2:scan:137
  • mzspec:MSV000085142:[LI carnitine treatment_Yiming/neg-mzXML]vehicle_LI_C_Sept_m2:scan:137

However, our resolver doesn't seem to support this format, nor does the general MassIVE resolver. It does seem to return all matching files though.
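To make the subfolder-qualified form concrete, here is a rough sketch of how such a USI could be split into its parts. This is not the resolver's actual parsing code: the regular expression, the `ParsedUSI` type, and `parse_usi` are hypothetical names, and the pattern only covers the simple `mzspec:<collection>:[<subfolder>]<run>:<index type>:<index>` shape seen in the two examples above (no interpretation suffix, no colons inside the run name).

```python
import re
from typing import NamedTuple, Optional


class ParsedUSI(NamedTuple):
    collection: str
    subfolder: Optional[str]  # None when no [subfolder] prefix is present
    run: str
    index_type: str
    index: str


# Hypothetical pattern, written only for the USI shapes shown in this thread.
_USI_RE = re.compile(
    r"^mzspec:(?P<collection>[^:]+):"
    r"(?:\[(?P<subfolder>[^\]]*)\])?"  # optional bracketed subfolder
    r"(?P<run>[^:]+):"
    r"(?P<index_type>[^:]+):"
    r"(?P<index>[^:]+)$"
)


def parse_usi(usi: str) -> ParsedUSI:
    """Split a USI into collection, optional subfolder, run, and index."""
    match = _USI_RE.match(usi)
    if match is None:
        raise ValueError(f"Unrecognized USI: {usi!r}")
    return ParsedUSI(**match.groupdict())


if __name__ == "__main__":
    print(parse_usi("mzspec:MSV000085142:[pos-mzXML]vehicle_LI_C_Sept_m2:scan:137"))
    print(parse_usi("mzspec:MSV000085142:vehicle_LI_C_Sept_m2:scan:137"))
```

With the subfolder available as a separate field, a resolver could use it to pick one of several files that share a run name.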

So it seems like the solution must be two-fold:

  1. Proper resolving of USIs containing subfolders through MassIVE and our resolver.
  2. Proper reporting of unique USIs from MASST.

And maybe:
3. Give an error message if a non-unique USI is provided to the resolver?
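As an illustration of points 1 and 3, a lookup could match on the run name, narrow by subfolder when one is given, and refuse to guess when more than one file remains. This is only a sketch under assumptions: `find_run_file` and `AmbiguousUSIError` are hypothetical names, the dataset is represented as a plain list of relative paths, and the `.mzXML` file names in the example are assumed from the folder names rather than taken from the dataset listing.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


class AmbiguousUSIError(ValueError):
    """Raised when a run name matches more than one file in a dataset."""


def find_run_file(
    paths: Iterable[str], run: str, subfolder: Optional[str] = None
) -> str:
    """Return the single dataset path matching the run (and subfolder, if given)."""
    matches = []
    for raw in paths:
        path = PurePosixPath(raw)
        if path.stem != run:
            continue  # different run name
        if subfolder is not None and str(path.parent) != subfolder:
            continue  # same run name, but in another folder
        matches.append(raw)
    if not matches:
        raise FileNotFoundError(f"No file found for run {run!r}")
    if len(matches) > 1:
        # Report every candidate so the caller can pick a subfolder.
        raise AmbiguousUSIError(
            f"Run {run!r} matches {len(matches)} files: {matches}"
        )
    return matches[0]


if __name__ == "__main__":
    # Assumed file layout, for illustration only.
    files = [
        "pos-mzXML/vehicle_LI_C_Sept_m2.mzXML",
        "LI carnitine treatment_Yiming/neg-mzXML/vehicle_LI_C_Sept_m2.mzXML",
    ]
    print(find_run_file(files, "vehicle_LI_C_Sept_m2", subfolder="pos-mzXML"))
```

Raising on ambiguity, rather than returning the first match, would have surfaced the positive/negative collision immediately instead of letting two outputs silently disagree.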
