MSP files management
IDSL.FSA was designed to manage mass spectrometry files in .msp format with various structures, without any pre-processing. To that end, IDSL.FSA provides multiple easy-to-use modules for managing .msp files, a number of which are summarized below:
The msp2FSdb module can generate organized Fragmentation Spectra DataBase (FSDB) libraries for data parsing from one or multiple .msp files for comprehensive screening. Additionally, this module can deconvolute MSP blocks that contain multiple PrecursorMZ values in a single .msp line (e.g. PrecursorMZ: 208.0615, 146.0611 for N-Benzoylserine in negative mode). The msp2FSdb module was designed to be consistent with various .msp file structures, particularly those from the NIST, GNPS, MoNA, and IDSL.CSA libraries. In general, the msp2FSdb module can work with any .msp file as long as Num Peaks rows are available in the file.
msp2FSdb(path, MSPfile_vector = "", massIntegrationWindow = 0, allowedNominalMass = FALSE,
allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01, number_processing_threads = 1)
path: address of .msp file
MSPfile_vector: a vector of .msp file names
massIntegrationWindow: Mass accuracy in Da
allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.
allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.
noiseRemovalRatio: noise removal ratio relative to the basepeak to measure entropy similarity score (0-1)
number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments.
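The multi-precursor case mentioned above can be illustrated with a minimal base R sketch. The helper name split_precursor_mz is hypothetical and is not part of IDSL.FSA; it only shows how a line such as the N-Benzoylserine example can be split into individual precursor m/z values, not how msp2FSdb does it internally.

```r
# Hypothetical helper (not an IDSL.FSA function): split one PrecursorMZ
# line into its individual numeric m/z values.
split_precursor_mz <- function(msp_line) {
  # Drop the "PrecursorMZ:" key (case-insensitive) and keep the value part
  value <- sub("^[[:space:]]*PrecursorMZ[[:space:]]*:[[:space:]]*", "",
               msp_line, ignore.case = TRUE)
  # Split on commas, trim whitespace, and convert each piece to a number
  as.numeric(trimws(strsplit(value, ",", fixed = TRUE)[[1]]))
}

precursors <- split_precursor_mz("PrecursorMZ: 208.0615, 146.0611")
print(precursors)
```

Each value obtained this way could then be treated as the precursor of its own spectrum entry.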
The msp2TrainingMatrix module can generate an aligned match table using ions from individual MSP blocks.
msp2TrainingMatrix(path, MSPfile = "", minDetectionFreq = 100, selectedFSdbIDs = NULL, dimension = "wide",
massAccuracy = 0.01, allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE,
noiseRemovalRatio = 0.01, number_processing_threads = 1)
path: Address of .msp file
MSPfile: A .msp file name or FSDB in .Rdata format
minDetectionFreq: The minimum detection frequency for an ion across all MSP blocks
selectedFSdbIDs: selected MSP block/FSDB IDs to limit the screening to specific ion blocks
dimension: c("wide", "long"). wide or long alignment matrix output
massAccuracy: Mass accuracy (Da)
allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.
allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.
noiseRemovalRatio: noise removal ratio relative to the basepeak to measure entropy similarity score (0-1)
number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments.
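As a rough illustration of the noiseRemovalRatio parameter, the sketch below drops peaks whose intensity falls below a fraction of the base peak. The function remove_noise and the peak values are hypothetical, and whether the package uses a strict or non-strict comparison is an assumption here.

```r
# Hypothetical helper (not an IDSL.FSA function): keep only the peaks whose
# intensity is at least noiseRemovalRatio times the base peak intensity.
remove_noise <- function(mz, intensity, noiseRemovalRatio = 0.01) {
  keep <- intensity >= noiseRemovalRatio * max(intensity)
  data.frame(mz = mz[keep], intensity = intensity[keep])
}

# Illustrative peak list; the threshold is 0.01 * 999 = 9.99
spectrum <- remove_noise(
  mz = c(77.0386, 100.0393, 146.0611),
  intensity = c(5, 250, 999),
  noiseRemovalRatio = 0.01
)
print(spectrum)
```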
The mgf2msp module can convert Mascot generic format files (.mgf) into the NIST mass spectra format (.msp). The mgf2msp module is fast, requiring <2 sec for .mgf files with ~5,000 fragmentation blocks on a single thread. The converted files are stored in the same directory with the .msp extension.
mgf2msp(path, MGFfile = "")
path: Location of the original .mgf file
MGFfile: Name of the mgf file with its extension
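To give a feel for what such a conversion involves, here is a minimal base R sketch that turns one MGF block into MSP-style lines. The function mgf_block_to_msp and the field renaming (TITLE to Name, PEPMASS to PrecursorMZ) are assumptions made for illustration; the actual mgf2msp module may map fields differently.

```r
# Illustrative MGF block with made-up values
mgf_block <- c(
  "BEGIN IONS",
  "TITLE=example_spectrum",
  "PEPMASS=208.0615",
  "100.0393 250",
  "146.0611 999",
  "END IONS"
)

# Hypothetical converter (not the IDSL.FSA implementation)
mgf_block_to_msp <- function(block) {
  # Drop the block delimiters
  body <- block[!(block %in% c("BEGIN IONS", "END IONS"))]
  # KEY=VALUE lines are metadata; the remaining lines are "m/z intensity" peaks
  is_meta <- grepl("=", body, fixed = TRUE)
  meta <- body[is_meta]
  peaks <- body[!is_meta]
  keys <- sub("=.*$", "", meta)
  values <- sub("^[^=]*=", "", meta)
  # Assumed renaming of two common MGF keys to MSP-style field names
  rename <- c(TITLE = "Name", PEPMASS = "PrecursorMZ")
  keys <- ifelse(keys %in% names(rename), rename[keys], keys)
  # The Num Peaks row precedes the peak list in the MSP block
  c(paste0(keys, ": ", values),
    paste0("Num Peaks: ", length(peaks)),
    peaks)
}

msp_block <- mgf_block_to_msp(mgf_block)
writeLines(msp_block)
```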
In many instances, public .msp libraries include both positive and negative fragmentation data in one .msp file. Thus, IDSL.FSA provides a module, mspPosNegSplitter, to separate positive and negative MSP blocks for rapid and efficient annotation. This module is easy to use:
mspPosNegSplitter(path, MSPfile = "", number_processing_threads = 1)
path: Location of the original .msp file
MSPfile: Name of the .msp file with its extension
number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments
The separated MSP blocks are stored in the same directory with "_Pos" and "_Neg" suffixes.
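The idea behind the split can be sketched in base R as follows. This is not the mspPosNegSplitter implementation: the helper block_mode, the Ion_mode field name, and the blank-line block separator are assumptions, since ion mode fields vary between libraries. The sketch also keeps the result in memory instead of writing files.

```r
# Illustrative MSP content with two blocks separated by blank lines
msp_lines <- c(
  "Name: compound A", "Ion_mode: Positive", "Num Peaks: 1", "100.0 999", "",
  "Name: compound B", "Ion_mode: Negative", "Num Peaks: 1", "146.1 999", ""
)

# Give every line a block index; a blank line closes the block it belongs to
is_blank <- msp_lines == ""
block_id <- cumsum(c(FALSE, head(is_blank, -1)))
blocks <- split(msp_lines, block_id)

# Hypothetical helper: read the ion mode of one block as "Pos", "Neg", or NA
block_mode <- function(block) {
  mode_line <- grep("^Ion_?mode[[:space:]]*:", block,
                    ignore.case = TRUE, value = TRUE)
  if (length(mode_line) == 0) {
    return(NA_character_)
  }
  value <- trimws(sub("^[^:]*:", "", mode_line[1]))
  if (grepl("^p", value, ignore.case = TRUE)) {
    "Pos"
  } else if (grepl("^n", value, ignore.case = TRUE)) {
    "Neg"
  } else {
    NA_character_
  }
}

modes <- vapply(blocks, block_mode, character(1))
pos_lines <- unlist(blocks[which(modes == "Pos")], use.names = FALSE)
neg_lines <- unlist(blocks[which(modes == "Neg")], use.names = FALSE)
```

The two vectors could then be written out with writeLines to files carrying the "_Pos" and "_Neg" suffixes.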
The FSdb2precursorType module can detect potential ionization pathways for molecular formulas using a vector of InChIKey values from an FSDB. This module matches only the first 14 InChIKey letters, and therefore may return multiple potential precursor types. It returns a frequency matrix for each InChIKey in the FSDB; the column headers of the matrix represent precursor types.
FSdb2precursorType(InChIKeyVector, libFSdb, tableIndicator = "Frequency", number_processing_threads = 1)
InChIKeyVector: A vector of InChIKey values. This value may contain whole InChIKey strings or first 14 InChIKey letters.
libFSdb: An FSDB produced by the IDSL.FSA package, i.e. an MSP library reference file converted with the msp2FSdb module.
tableIndicator: c("Frequency", "PrecursorMZ"). To show frequency or a median of PrecursorMZ values in the output dataframe for each precursor type.
number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments
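The 14-letter matching and the frequency output can be pictured with the base R sketch below. The data frame lib, its column names, and the placeholder InChIKey strings are hypothetical and do not reflect the internal FSDB structure; the sketch only shows why records that share the first InChIKey block are counted together.

```r
# Hypothetical library records with placeholder InChIKey strings
lib <- data.frame(
  InChIKey = c("AAAAAAAAAAAAAA-BBBBBBBBSA-N",
               "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
               "DDDDDDDDDDDDDD-UHFFFAOYSA-N"),
  PrecursorType = c("[M-H]-", "[M+Cl]-", "[M-H]-"),
  stringsAsFactors = FALSE
)

# Match on the first 14 InChIKey letters only
lib$key14 <- substr(lib$InChIKey, 1, 14)

# Frequency of each precursor type per 14-letter key;
# the column headers are the precursor types
freq <- table(lib$key14, lib$PrecursorType)

# A query may be a whole InChIKey or only its first 14 letters
query <- substr("AAAAAAAAAAAAAA-ZZZZZZZZSA-N", 1, 14)
print(freq[query, ])
```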
The FSA_msp2Cytoscape module performs pairwise MSP block analysis to create Cytoscape network files. This module is especially useful for finding related peaks in an analysis.
FSA_msp2Cytoscape(path, MSPfile = "", mspVariableVector = NULL, mspNodeID = NULL,
massError = 0.01, RTtolerance = NA, minEntropySimilarity = 0.75, allowedNominalMass = FALSE,
allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01, number_processing_threads = 1)
path: address of .msp file
MSPfile: A .msp file name or FSDB in .Rdata format
mspVariableVector: a vector of MSP variables
mspNodeID: MSP node ID, which is the ID required for the specsim ID generation
massError: Mass accuracy in Da
RTtolerance: Retention time tolerance (min) to match MSP blocks. Select NA to ignore retention time match. This option is especially beneficial to find co-occurring compounds.
minEntropySimilarity: Minimum entropy similarity score
allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.
allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.
noiseRemovalRatio: noise removal ratio relative to the basepeak to measure entropy similarity score (0-1)
number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments
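For readers unfamiliar with the entropy similarity score behind minEntropySimilarity and allowedWeightedSpectralEntropy, the sketch below follows the commonly published definition of spectral entropy similarity. It assumes the two spectra are already aligned to the same m/z grid, with zeros for missing ions. The function names are hypothetical, and it is an assumption that IDSL.FSA computes the score in exactly this way.

```r
# Shannon entropy of a normalized intensity vector (zeros are skipped)
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Hypothetical helper: entropy similarity of two aligned intensity vectors
entropy_similarity <- function(int_a, int_b, weighted = TRUE) {
  prepare <- function(x) {
    p <- x / sum(x)
    if (weighted) {
      s <- shannon_entropy(p)
      # Published weighting rule: re-weight low-entropy spectra (S < 3)
      if (s < 3) {
        p <- p^(0.25 + 0.25 * s)
        p <- p / sum(p)
      }
    }
    p
  }
  a <- prepare(int_a)
  b <- prepare(int_b)
  ab <- (a + b) / 2  # 1:1 mixture of the two spectra
  1 - (2 * shannon_entropy(ab) - shannon_entropy(a) - shannon_entropy(b)) / log(4)
}

sim_same <- entropy_similarity(c(10, 50, 100), c(10, 50, 100))
sim_disjoint <- entropy_similarity(c(100, 0), c(0, 100))

# A pair of MSP blocks would be linked when the score reaches the cutoff
linked <- sim_same >= 0.75
```

Identical spectra score 1 and spectra with no shared ions score 0, so a cutoff such as 0.75 keeps only closely related pairs.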
The FSA_uniqueMSPblockTagger module performs pairwise MSP block analysis to remove similar MSP blocks in an .msp file.
FSA_uniqueMSPblockTagger(path, MSPfile = "", aggregateBy = "Name", massError = 0.01,
RTtolerance = NA, minEntropySimilarity = 0.75, noiseRemovalRatio = 0.01,
allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE, plotSpectra = FALSE,
number_processing_threads = 1)
path: address of .msp file
MSPfile: A .msp file name or FSDB in .Rdata format
aggregateBy: the MSP variable by which the MSP blocks are aggregated
massError: Mass accuracy in Da
RTtolerance: Retention time tolerance (min) to match MSP blocks. Select NA to ignore retention time match. This option is especially beneficial to find co-occurring compounds.
minEntropySimilarity: Minimum entropy similarity score
noiseRemovalRatio: noise removal ratio relative to the basepeak to measure entropy similarity score (0-1)
allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.
allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.
plotSpectra: c(TRUE, FALSE). Select TRUE to plot similar spectra in individual folders
number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments.
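The effect of RTtolerance on which MSP blocks get compared can be sketched as below. The helper rt_candidate_pairs and the retention times are hypothetical; the sketch only illustrates the documented behavior that NA disables the retention time match, not the package internals.

```r
# Hypothetical helper: list the pairs of MSP blocks whose retention times
# (in minutes) are close enough to be compared.
rt_candidate_pairs <- function(rt, RTtolerance = NA) {
  n <- length(rt)
  if (is.na(RTtolerance)) {
    # NA disables the retention time filter: every pair is a candidate
    rt_ok <- matrix(TRUE, n, n)
  } else {
    rt_ok <- abs(outer(rt, rt, "-")) <= RTtolerance
  }
  # Keep each unordered pair once (upper triangle, no self-pairs)
  which(rt_ok & upper.tri(rt_ok), arr.ind = TRUE)
}

rt <- c(5.02, 5.08, 7.40)
rt_pairs <- rt_candidate_pairs(rt, RTtolerance = 0.1)
all_pairs <- rt_candidate_pairs(rt)
print(rt_pairs)
```

Only the candidate pairs would then go on to the entropy similarity comparison.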
The FSA_uniqueMSPblockTaggerUntargeted module performs pairwise MSP block analysis to remove similar MSP blocks in .msp files using only a retention time window.
FSA_uniqueMSPblockTaggerUntargeted(path, MSPfile_vector, minCSAdetectionFrequency = 20,
minEntropySimilarity = 0.75, massError = 0.01, massErrorPrecursor = 0.01,
RTtolerance = 0.1, noiseRemovalRatio = 0.01, allowedNominalMass = FALSE,
allowedWeightedSpectralEntropy = TRUE, plotSpectra = FALSE,
number_processing_threads = 1)
path: address of .msp file
MSPfile_vector: a vector of .msp file names
minCSAdetectionFrequency: minimum CSA detection frequency
minEntropySimilarity: Minimum entropy similarity score
massError: Mass accuracy in Da
massErrorPrecursor: Mass accuracy of precursor in Da
RTtolerance: Retention time tolerance (min) to match MSP blocks.
noiseRemovalRatio: noise removal ratio relative to the basepeak to measure entropy similarity score (0-1)
allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.
allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.
plotSpectra: c(TRUE, FALSE). Select TRUE to plot similar spectra in individual folders
number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments.