🏹 arcMS

arcMS converts (HD)MS^E data acquired with Waters UNIFI to a tabular format for use in R or Python, and the converted files are compact on disk. It supports data acquired with ion mobility (HDMS^E) or without it (MS^E). mzML files can also be converted (see convert_mzml_to_parquet()).

Two output data file formats can be obtained:

the Apache Parquet format for minimal filesize and fast access.
the HDF5 format, with fast access but larger filesize.

arcMS stands for accessible, rapid and compact. The name also comes from the French word arc, which means bow, to emphasize that the package is compatible with the Apache Arrow library.

A companion R/Shiny app is available at https://github.com/leesulab/arcms-dataviz for fast visualization of the converted data (Parquet format) as 2D plots or as TIC, BPI or EIC chromatograms.

See vignette("open-files") for details on how converted files can be opened in R or Python, and the full tutorial for how to query, filter and aggregate the data (e.g. to obtain chromatograms or spectra).

⬇️ Installation

You can install arcMS in R with the following command:

install.packages("pak")
pak::pak("leesulab/arcMS")

To use the HDF5 format, the rhdf5 package needs to be installed:

pak::pak("rhdf5")

🚀 Usage

First load the package:

library("arcMS")

Then create the connection parameters for the UNIFI API (this retrieves a token). See vignette("api-configuration") to learn how to configure the API and register a client app.

con = create_connection_params(apihosturl = "http://localhost:50034/unifi/v1", identityurl = "http://localhost:50333/identity/connect/token")

If arcMS and the R session run on a different computer from the one where the UNIFI API is installed, replace localhost with the IP address of the computer hosting the UNIFI API.

con = create_connection_params(apihosturl = "http://192.0.2.0:50034/unifi/v1", identityurl = "http://192.0.2.0:50333/identity/connect/token")
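
Since the same host appears in both URLs, it can be convenient to build them from a single value. The snippet below is a minimal sketch: unifi_urls() is a hypothetical helper, not part of arcMS, and the ports and paths are simply copied from the examples above.

```r
# Hypothetical helper (not part of arcMS): build both endpoint URLs from one host.
# Ports 50034 and 50333 and the URL paths are taken from the examples above.
unifi_urls <- function(host = "localhost") {
  list(
    apihosturl  = sprintf("http://%s:50034/unifi/v1", host),
    identityurl = sprintf("http://%s:50333/identity/connect/token", host)
  )
}

urls <- unifi_urls("192.0.2.0")
print(urls$apihosturl)
print(urls$identityurl)

# With arcMS loaded, the two values can then be passed on:
# con = create_connection_params(apihosturl = urls$apihosturl,
#                                identityurl = urls$identityurl)
```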

These connection parameters are then used to access the UNIFI folders. The following function lists the folders and their IDs (e.g. abe9c297-821e-4152-854a-17c73c9ff68c in the example below).

folders = folders_search()
folders

#>                                     id       name               path folderType
#> 3 abe9c297-821e-4152-854a-17c73c9ff68c Christelle Company/Christelle    Project
#> 4 abe7a0e6-99d2-4e57-a618-f4b085f48443 EMMANUELLE Company/EMMANUELLE    Project
#>                               parentId
#> 3 7c3a0fc7-3805-4c14-ab68-8da3e115702e
#> 4 7c3a0fc7-3805-4c14-ab68-8da3e115702e
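
The folder ID can also be extracted by name instead of being copied by hand. The sketch below uses a stand-in data frame rebuilt from the printed output above, so that it runs without a UNIFI server; with a live connection, the object returned by folders_search() would be used instead.

```r
# Stand-in for the result of folders_search(), rebuilt from the printed output above
folders <- data.frame(
  id = c("abe9c297-821e-4152-854a-17c73c9ff68c",
         "abe7a0e6-99d2-4e57-a618-f4b085f48443"),
  name = c("Christelle", "EMMANUELLE"),
  path = c("Company/Christelle", "Company/EMMANUELLE"),
  folderType = c("Project", "Project"),
  parentId = "7c3a0fc7-3805-4c14-ab68-8da3e115702e",
  stringsAsFactors = FALSE
)

# Select the ID of the folder whose name matches
folder_id <- folders$id[folders$name == "Christelle"]
print(folder_id)
```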

With a folder ID, we can access the list of Analysis items in the folder:

ana = analysis_search("abe9c297-821e-4152-854a-17c73c9ff68c")
ana

Finally, with an Analysis ID, we can get the list of samples (injections) acquired in this Analysis:

samples = get_samples_list("e236bf99-31cd-44ae-a4e7-74915697df65")
samples

Once we get a sample ID, we can use it to download the sample data, using the future framework for parallel processing:

library(future)
plan(multisession)
convert_one_sample_data(sample_id = "0134efbf-c75a-411b-842a-4f35e2b76347")

This command retrieves the sample name (sample_name) and the name of its parent analysis (analysis_name), creates a folder named analysis_name in the working directory, and saves the sample data as sample_name.parquet and its metadata as sample_name-metadata.json (the metadata is also stored in the Parquet file).
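
To make the naming scheme concrete, the sketch below builds the two expected paths for hypothetical analysis and sample names. It assumes that both files are written inside the analysis_name folder; the real names come from UNIFI.

```r
# Hypothetical names; the real ones are retrieved from UNIFI during conversion
analysis_name <- "my_analysis"
sample_name <- "my_sample"

# Paths relative to the working directory, following the naming scheme described above
parquet_path <- file.path(analysis_name, paste0(sample_name, ".parquet"))
metadata_path <- file.path(analysis_name, paste0(sample_name, "-metadata.json"))

print(parquet_path)
print(metadata_path)
```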

With an Analysis ID, we can convert and save all samples from the chosen Analysis:

convert_all_samples_data(analysis_id = "e236bf99-31cd-44ae-a4e7-74915697df65")
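
When only some samples of an Analysis are needed, the single-sample function can be called in a loop. The sketch below is one possible way to do this: convert_selected() is a hypothetical wrapper, not part of arcMS, and the converter is passed as an argument so that the logic can be tried with a stand-in function when no UNIFI server is reachable.

```r
# Hypothetical wrapper (not part of arcMS): convert a chosen subset of samples,
# continue past failures, and report which IDs succeeded.
convert_selected <- function(sample_ids, convert = arcMS::convert_one_sample_data) {
  vapply(sample_ids, function(id) {
    tryCatch({
      convert(sample_id = id)
      TRUE
    }, error = function(e) {
      message("Conversion failed for ", id, ": ", conditionMessage(e))
      FALSE
    })
  }, logical(1))
}

# Dry run with a stand-in converter that fails for one made-up ID
fake_convert <- function(sample_id) {
  if (sample_id == "bad-id") stop("not found")
  invisible(NULL)
}
status <- convert_selected(
  c("0134efbf-c75a-411b-842a-4f35e2b76347", "bad-id"),
  convert = fake_convert
)
print(status)
```

With arcMS loaded and a connection configured, omitting the convert argument would call convert_one_sample_data() for each ID.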

To use the HDF5 format instead of Parquet, the format argument can be used as below:

convert_one_sample_data(sample_id = "0134efbf-c75a-411b-842a-4f35e2b76347", format = "hdf5")

convert_all_samples_data(analysis_id = "e236bf99-31cd-44ae-a4e7-74915697df65", format = "hdf5")

This saves each sample's data and metadata together in a single .h5 file.

Other functions are available to collect the data from the API into an R object without saving it, and then to save this R object to a Parquet file (see vignette("collect-save-functions")). CCS values can also be retrieved in addition to bin index and drift time values; see vignette("get-ccs-values").

Parquet and HDF5 files can be opened easily in R with the arrow and rhdf5 packages, respectively. Parquet files contain both low and high energy spectra (HDMS^E). HDF5 files contain low energy data in the “ms1” dataset, high energy data in the “ms2” dataset, and metadata in the “samplemetadata” and “spectrummetadata” datasets read in the example below. The fromJSON function from the jsonlite package imports the metadata JSON file (associated with the Parquet file) as a list of data frames.

sampleparquet = arrow::read_parquet("sample.parquet")
metadataparquet = jsonlite::fromJSON("sample-metadata.json")

samplems1hdf5 = rhdf5::h5read("sample.h5", name = "ms1")
samplems2hdf5 = rhdf5::h5read("sample.h5", name = "ms2")
samplemetadatahdf5 = rhdf5::h5read("sample.h5", name = "samplemetadata")
spectrummetadatahdf5 = rhdf5::h5read("sample.h5", name = "spectrummetadata")
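
Once loaded, the data can be aggregated like any other data frame, for example to obtain a total ion chromatogram (TIC). The sketch below uses a small toy data frame so that it runs without a converted file. The column names rt, mslevel and intensity, and the coding of mslevel as 1 and 2, are assumptions; check names(sampleparquet) for the actual columns.

```r
# Toy stand-in for a converted file; column names and mslevel coding are assumptions
spectra <- data.frame(
  rt = c(0.1, 0.1, 0.2, 0.2, 0.2),
  mslevel = c(1, 2, 1, 1, 2),
  intensity = c(100, 40, 250, 50, 10)
)

# Keep the low energy scans, then sum intensities per retention time
low_energy <- spectra[spectra$mslevel == 1, ]
tic <- aggregate(intensity ~ rt, data = low_energy, FUN = sum)
print(tic)
```

For large files, the same filter and sum can be expressed with arrow and dplyr so that only the required rows are read into memory, as described in the tutorial mentioned above.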

✨ Shiny App

A Shiny application is available to make the package easier to use. To run the app, use the following command (a few additional packages may need to be installed):

run_app()

📖 Citing

When using arcMS or referencing it in an academic article, please include the following citation:

Le Roux, J.; Sade, J. arcMS: Transformation of Multi-Dimensional High-Resolution Mass Spectrometry Data to Columnar Format for Compact Storage and Fast Access. Bioinformatics Advances 2024, 4 (1). https://doi.org/10.1093/bioadv/vbae160.

