Releases · iquasere/MOSCA

30 Jan 09:36

iquasere

2.3.0

837336b

Merging of paired reads when no assembly is performed

MOSCA was calling genes directly from the preprocessed reads.
Now, it merges paired-end reads first, and then calls the genes on those reads.
When gene calling, MOSCA still considers the data as reads (-complete=0), not complete genomes (-complete=1).

Update on `sortmerna` functions

SortMeRNA databases have been updated, and are now provided as a tar file containing multiple database files. Each of these databases can be used separately for a specific type of search. MOSCA now provides the sortmerna_database parameter, which sets which database will be used:

if fast, MOSCA will use the smr_v4.3_fast_db.fasta database.
if default, MOSCA will use the smr_v4.3_default_db.fasta database.
if sensitive, MOSCA will use the smr_v4.3_sensitive_db.fasta database.
if sensitive_with_rfam, MOSCA will use the smr_v4.3_sensitive_db_rfam_seeds.fasta database.

Only one database file can be used at a time.
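
The option-to-file mapping above can be sketched as a simple lookup. This is only an illustration: the dictionary contents come from this release note, but the function name sortmerna_database_file is hypothetical and is not necessarily how MOSCA resolves the parameter.

```python
# Mapping copied from the release note; the helper below is hypothetical.
SORTMERNA_DATABASES = {
    "fast": "smr_v4.3_fast_db.fasta",
    "default": "smr_v4.3_default_db.fasta",
    "sensitive": "smr_v4.3_sensitive_db.fasta",
    "sensitive_with_rfam": "smr_v4.3_sensitive_db_rfam_seeds.fasta",
}


def sortmerna_database_file(option: str) -> str:
    """Return the single database file selected by a sortmerna_database value."""
    try:
        return SORTMERNA_DATABASES[option]
    except KeyError:
        valid = ", ".join(SORTMERNA_DATABASES)
        raise ValueError(
            f"Unknown sortmerna_database '{option}'; expected one of: {valid}"
        ) from None


print(sortmerna_database_file("sensitive"))  # smr_v4.3_sensitive_db.fasta
```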

`minimum_read_length` parameter split for MG and MT

Now, minimum length of reads for further analysis can be set with the minimum_mg_read_length and minimum_mt_read_length parameters.
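
For configs written before this release, a small helper along these lines could carry an old minimum_read_length value over to the two new parameters. The helper name split_minimum_read_length is hypothetical, the value 100 is a placeholder, and MOSCA itself may not perform any such migration.

```python
def split_minimum_read_length(config: dict) -> dict:
    """Copy a legacy minimum_read_length into the MG and MT parameters.

    Hypothetical helper: returns a new dict and leaves the input untouched.
    """
    updated = dict(config)
    legacy = updated.pop("minimum_read_length", None)
    if legacy is not None:
        # Values already set with the new parameter names take precedence.
        updated.setdefault("minimum_mg_read_length", legacy)
        updated.setdefault("minimum_mt_read_length", legacy)
    return updated


old_config = {"minimum_read_length": 100}  # placeholder value
print(split_minimum_read_length(old_config))
# {'minimum_mg_read_length': 100, 'minimum_mt_read_length': 100}
```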

Added minimum_envs folder and contents

For commands and resources to update envs when needed

Also, some fixes

Converting readcounts (for MG and MT) to int was turning them all to zeros (because they are normalized). MOSCA now keeps them as float.
Disabled printing of MOSCA's TXT logo, which fails in the tests for reasons still unknown.
Fix on Summary Report: rows now have information for both "Name" and "Sample" levels (before, there were separate rows for "Name" and for "Sample").
Another fix on Summary Report: annotated genes were not being counted properly.
When not performing assembly, General Report was not importing the readcounts correctly. Now, it does.
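
To illustrate the readcounts fix listed above: casting values below 1 to int truncates them to zero. The numbers here are made up, and the snippet only assumes that normalized readcounts fall below 1, as the note implies.

```python
# Made-up normalized readcounts, assumed to lie below 1.
normalized = [0.12, 0.57, 0.98]

as_int = [int(value) for value in normalized]      # truncates towards zero
as_float = [float(value) for value in normalized]  # keeps the values

print(as_int)    # [0, 0, 0]
print(as_float)  # [0.12, 0.57, 0.98]
```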

15 Jan 10:32

iquasere

2.2.1

a9dd6b2

Added default parameters JSON

I hadn't updated MOSCA's recipe in Bioconda to include the new default_config.json file. This release has no code updates, but serves to include the file in MOSCA's recipe.

04 Jan 10:31

iquasere

2.2.0

a9dd6b2

Default parameters, input sanitization and final reports updates

MOSCA now has default parameters

These default parameters are set by the default_config.json file.
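
A minimal sketch of how defaults and user settings could be combined, assuming user-provided values simply override the defaults. The function apply_defaults is hypothetical, the parameter values are placeholders, and the real loading of default_config.json may differ.

```python
def apply_defaults(user_config: dict, default_config: dict) -> dict:
    """Return the defaults, overridden by any value the user provided."""
    return {**default_config, **user_config}


# Placeholder values; parameter names are taken from other MOSCA release notes.
defaults = {"do_iterative_binning": False, "minimum_fold_change": 1}
user = {"minimum_fold_change": 2}

print(apply_defaults(user, defaults))
# {'do_iterative_binning': False, 'minimum_fold_change': 2}
```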

Input quality checking

Implemented checking of invalid names in experiments - names can't start with a number, including a float (e.g., 5AA or .5Name).
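
The rule above could be expressed with a regular expression like the following. This is a sketch covering only the two cases named in the note (a leading digit, or a leading dot followed by a digit); the function name is hypothetical and MOSCA's actual check may be stricter.

```python
import re

# Matches names that begin with a digit, or with "." followed by a digit.
_STARTS_WITH_NUMBER = re.compile(r"^\.?\d")


def is_valid_experiment_name(name: str) -> bool:
    """Return False when the name starts with a number (integer or float)."""
    return _STARTS_WITH_NUMBER.match(name) is None


for candidate in ("5AA", ".5Name", "Sample1"):
    print(candidate, is_valid_experiment_name(candidate))
# 5AA False
# .5Name False
# Sample1 True
```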

Updates on final reports

Renamed Protein report to General report.
New report - Expression. This report includes only genes expressed.
Technical report was renamed to Versions. It is now also exported as Excel, because it carries information on every environment.

Implemented minimum value imputation

Implemented for MP analysis, but not yet available as an option. For now, it is a feature in preparation.

No more build_deps in Dockerfile

It's no longer needed, conda handles it all.

Dependencies update

Pinned snakemake version to <8 - some of its new functionality is incompatible with MOSCA's implementation.
Added pandas as dependency - mosca.py now has functions that require it.
Updated to newest versions of UPIMAPI, reCOGnizer and KEGGCharter - this allowed removing the parameters related to database download.

Blocked MGMT test

Because GitHub Actions doesn't provide enough disk space for it.

Also, several fixes

Fix on DE handling multiple samples
Fix on KEGGCharter handling multiple samples
Fix on multi_sheet excel handling multiple samples and numbering
Fix on converting RAW spectra to MGF outside a container environment
MOSCA now prints snakemake command properly
Fix on adding normalized matrices to entry report
Several fixes on summary report
Necessary repairs to EC numbers and KEGG IDs, as those come from UPIMAPI in a format incompatible with KEGGCharter
Fix on inputting mods to generate_parameters_file function

28 Apr 14:39

iquasere

2.1.0

dfe4714

Reintroduction of MOSCA into Bioconda

Since MOSCA 1.3.6, the list of dependencies of the pipeline had become too complex for conda to manage.
This release makes use of snakemake environments to simplify the minimal environment required to install MOSCA. MOSCA now only requires snakemake.

Now MOSCA uses snakemake's rules

All the rules have been moved to corresponding .smk files. This has simplified the main script considerably.
Script files can no longer be run through the command line, however. The interface is now with snakemake directly.
This is a first step towards producing a web service.

Added schema for validating config.json

config.schema.yaml checks that all the needed information is present, and in the correct format, in the input config file.

New parameter

metaproteomics_add_reference_proteomes: New option to skip searching for reference proteomes of the organisms identified. Helps save a lot of time during Peptide-to-Spectrum Matching.

Tests have been reformatted

The complete MGMP test has been reintroduced; however, it still fails because of too much disk usage. It'll be a problem for another time.

Several fixes and improvements

params.method was not being correctly read on de_analysis.R.
config.json is now explicitly required.
tmp directory when handling SortMeRNA is created inside SortMeRNA output directory.
Removed pandas warnings concerning reading files without low_memory=False.
Memory allocated in metaproteomics now in G instead of M.
Removed UPIMAPI apt dependencies - they are no longer needed.
Fix on reading method for normalization.
Fix on parsing conditions in de_analysis.R.

16 Jan 22:51

iquasere

2.0.0

eaafad5

Metaproteogenomics - a new level of omics analyses

New workflow of metaproteomics analyses, based on metagenomics (MG) results.

This new layer of analysis allows spectra - both in raw and standard formats - to be input to MOSCA for metaproteomics (MP) analysis.

MOSCA's MP workflow is as follows:

1. Database construction

A database is built from MG results, aiming to include all sequences that can possibly be in the datasets. This includes:

the genes identified by FragGeneScan on the MG gene calling step
reference proteomes retrieved from UniProt of the taxa identified in the annotation step with UPIMAPI
the cRAP database
the protease sequence - the only sequence automatically available for now is Trypsin, all others must be inputted manually

This database will then be submitted for a first round of Peptide-to-Spectrum matching with SearchCLI and PeptideShaker. All proteins with at least one Peptide-to-Spectrum match (PSM) are collected for the final database - the metaproteogenomics database.

2. Peptide-to-Spectrum matching

SearchCLI is used for obtaining PSMs from inputted spectra, using as reference the database constructed in the previous step. SearchCLI is used with three search engines - X!Tandem, MyriMatch and MS-GF+. More engines might be added in the future.

3. Protein inference

PeptideShaker is used for protein inference and quantification, based on spectracounts. PSMs are selected at a 5 % local False Discovery Rate, and only peptides with two or more PSMs and only proteins with two or more peptides identified are selected for further analysis.
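
The two count-based filters described above can be sketched in plain Python. This is not PeptideShaker's implementation: it assumes the PSMs have already passed the FDR cut, and that a protein's peptide count only includes peptides that themselves passed the PSM filter. The function name and the example identifiers are made up.

```python
from collections import Counter, defaultdict


def select_proteins(psms, min_psms=2, min_peptides=2):
    """Keep peptides with >= min_psms PSMs, then proteins with >= min_peptides peptides.

    psms is an iterable of (protein, peptide) pairs, one per PSM.
    """
    psm_counts = Counter(psms)  # number of PSMs per (protein, peptide) pair
    peptides_by_protein = defaultdict(set)
    for (protein, peptide), count in psm_counts.items():
        if count >= min_psms:
            peptides_by_protein[protein].add(peptide)
    return {
        protein: sorted(peptides)
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    }


# Made-up PSMs: P1 has two peptides with two PSMs each, P2 has only one such peptide.
example_psms = [
    ("P1", "AAK"), ("P1", "AAK"), ("P1", "CCR"), ("P1", "CCR"),
    ("P2", "DDK"), ("P2", "DDK"), ("P2", "EER"),
]
print(select_proteins(example_psms))  # {'P1': ['AAK', 'CCR']}
```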

4. Normalization, imputation and differential protein expression analysis

Spectracounts are normalized with Variance Stabilizing Normalization. Missing values are imputed using Local Least Squares Imputation.

Normalized and imputed spectracounts are then submitted for differential protein expression analysis with Reproducibility-Optimized Test Statistics. Log2foldchange and p-values are retrieved for reporting.

5. Metabolic pathway representation and final reportings

All following steps are performed as closely as possible to metatranscriptomics (gene expression) analysis.

Metabolic maps are built with KEGGCharter, showing protein expression levels from MP and genomic potential from MG.

Final reports include all results from MG, and report on differential expression analysis of proteins.

Other updates

MOSCA's workflow has grown by around 40 %.

MOSCA is now compatible with the six-month-old updates of UniProt, through UPIMAPI. It includes the parsing of taxonomic columns, to continue representing taxonomic kronas.

Snakemake conda environments are now used, instead of one single environment. This has made it possible again to build MOSCA's environments, and may signal the return of MOSCA to Bioconda.

31 May 17:47

iquasere

1.6.1

1bfb799

Re-added KEGGCharter to workflow

KEGGCharter is again run from "MOSCA_Entry_Report".
Changed its output filename in the rule because the tool now only outputs in TSV.

Also some fixes in environment.yml

fixed perl version
added subversion

24 May 10:21

iquasere

1.6.0

0b15425

Stand-alone metatranscriptomics workflow implemented

Metatranscriptomics can be used as reference without metagenomics

If MG is not inputted, MT will be used for the MG part of MOSCA's workflow - assembly, binning, gene calling and annotation.
Trinity and rnaSPAdes now available as assembler options
rule join_reads now considers possibility of MT as reference

Changes in config.json

experiments.tsv integrated into config.json as a parameter (list of dictionaries)
adapted config.json column names to MOSGUITO
New parameter - "suffix"
- This parameter specifies a suffix that follows the _R1/_R2 markers in file names (e.g., _L001 would serve for the files mg_R1_L001.fq and mg_R2_L001.fq)
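
As a sketch of how the "suffix" fits into the paired file names (the helper paired_read_files is hypothetical, and the "fq" extension is just the one used in the example above):

```python
def paired_read_files(name: str, suffix: str = "", extension: str = "fq"):
    """Build the forward and reverse file names for a paired-end sample."""
    return tuple(f"{name}_R{mate}{suffix}.{extension}" for mate in (1, 2))


print(paired_read_files("mg", "_L001"))  # ('mg_R1_L001.fq', 'mg_R2_L001.fq')
print(paired_read_files("mg"))           # ('mg_R1.fq', 'mg_R2.fq')
```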

Adaptations for new versions of tools

SortMeRNA 4 fully implemented
Always gzips SortMeRNA output
UPIMAPI used directly instead of DIAMOND
- MOSCA now accepts UPIMAPI's three options for database: "taxids", "uniprot" or "swissprot"
Small adjustment on CI to allow running reCOGnizer with mini cdd.tar.gz
Fixed krona version (to 2.5) for compatibility with MaxBin2 - MaxBin2 dependencies are presenting problems for higher versions, and krona's more recent versions would force the installation of those damaged dependencies

Added technical files, removed old scripts

added .gitignore
join_information.py deprecated, replaced by mosca_tools functions and rules in Snakefile

Changes in environment and CI files

install.bash no longer installs mamba
added gmcloser to environment.yml
added simplified cdd.tar.gz for CI
added test for complete workflow of MOSCA
new default for max-ref-number with metaquast - is now 0 to allow running CI

Miscellaneous fixes

fix on snakefile - checks if "Name" in "experiments" is ""
bins and DE results go to the folders of their respective "samples"
several fixes on reporting
fix on alignment functions in mosca_tools.py
fix on de_analysis.R
fix on obtaining directories for Illumina adapters and rRNA databases on preprocessing step

03 Aug 09:33

iquasere

1.5.1

ecc5d11

Fixed high quality bins evaluation

MOSCA was evaluating the high quality bins incorrectly.

Best probability threshold is now written at the end of iterative binning.

Assigned minus 1 thread in Snakefile for quantification rule.

Allows upimapi to run simultaneously.

metaSPAdes upped to version 3.15 to not run out of memory.

Fixed some bugs in name assignment.

26 Jul 16:42

iquasere

1.5.0

5f69b5e

Iterative binning for best binning

do_iterative_binning option now available!

Iterative binning cycles between MaxBin and CheckM - MaxBin obtains the bins, CheckM checks their quality
Iterative binning cycles through many probability thresholds to determine the value that gives the best binning
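
The cycle can be sketched as a loop over thresholds. Everything here is an assumption made for illustration: the two callables stand in for the MaxBin and CheckM steps, "best" is taken to mean the most high quality bins, and the thresholds and bin labels are made up.

```python
def best_probability_threshold(thresholds, run_binning, count_high_quality_bins):
    """Try each threshold and return (threshold, score) for the best binning.

    run_binning stands in for the binning step, count_high_quality_bins for
    the quality check. Ties keep the first threshold that reached the score.
    """
    best_threshold, best_score = None, -1
    for threshold in thresholds:
        bins = run_binning(threshold)
        score = count_high_quality_bins(bins)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Stand-in results: quality labels of the bins obtained at each threshold.
fake_bins = {
    0.5: ["high", "low"],
    0.7: ["high", "high", "low"],
    0.9: ["high"],
}
result = best_probability_threshold(
    thresholds=[0.5, 0.7, 0.9],
    run_binning=lambda threshold: fake_bins[threshold],
    count_high_quality_bins=lambda bins: bins.count("high"),
)
print(result)  # (0.7, 2)
```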

New option for differential expression - minimum_fold_change!

Determine padj for up or down expression, instead of just 0 difference

14 Jun 09:10

iquasere

1.4.0

b4cef84

Can now be installed from source code

Automatic setup from source code is now functional, and suggested installation method is through the bash script.
