SortMeRNA database version #1354

Open
wzheng0520 opened this issue Aug 16, 2024 · 3 comments

@wzheng0520

Description of feature

Hi,

Thanks for providing this wonderful pipeline!

After digging into SortMeRNA and this pipeline, I am curious about the default SortMeRNA database version we are currently using.

According to https://github.com/nf-core/rnaseq/blob/master/assets/rrna-db-defaults.txt, the rRNA database points to an old SortMeRNA rRNA database version (SILVA 119). However, starting with SortMeRNA version 4.3.4, a newer SILVA database version (SILVA 138) can be used, which permits commercial use. As I understand it, to use the newly generated rRNA databases we would need to download them first and then pass them to SortMeRNA. Is there any plan to update the default database in the RNA-seq pipeline so that the newer SILVA database can be used?

Sincerely
Winnie Zheng

@MatthiasZepper
Member

As far as SortMeRNA itself is concerned, the current module version is 4.3.6, but 4.3.7 is out. So it would be very welcome if you updated the module to use the latest version. The pipeline will then most likely pick it up when the next release is due.

> the rRNA database points to an old SortMeRNA rRNA database version (SILVA 119). However, starting with SortMeRNA version 4.3.4, a newer SILVA database version (SILVA 138) can be used, which permits commercial use.

Sorry, I can't follow here. Indeed, we are pointing to the references for version 4.3.4 in the pipeline, but those files do not seem to have been updated in the last five years. You can also always supply the ribo_database_manifest parameter to specify your own.

Did you consider a reference from another source? And how would commercial use matter - because of some restrictions on the SILVA database?
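As a rough illustration of the ribo_database_manifest route mentioned above, here is a minimal Python sketch that writes a custom manifest. It assumes the manifest is a plain text file with one FASTA path or URL per line, like the linked rrna-db-defaults.txt; the helper name `write_manifest` and the example paths are hypothetical, not part of the pipeline.

```python
from pathlib import Path


def write_manifest(fasta_paths, manifest_path):
    """Write one rRNA FASTA path (or URL) per line and return the manifest path.

    Assumption: the pipeline reads the manifest as plain text, one entry per line.
    """
    manifest = Path(manifest_path)
    # Drop empty entries so the manifest contains no blank lines.
    lines = [str(p).strip() for p in fasta_paths if str(p).strip()]
    if not lines:
        raise ValueError("at least one FASTA path is required")
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

For example, `write_manifest(["/data/rrna/smr_v4.3_default_db.fasta"], "rrna-db-custom.txt")` would produce a one-line file (the `/data/rrna` location is made up) that could then be passed to the pipeline via the ribo_database_manifest parameter.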

@wzheng0520
Author

Hi Matthias,

Thanks for your quick reply!

I wanted to let you know that as of SILVA database version 138 or newer, there are no longer any licensing restrictions on commercial use, according to SILVA's licensing information.

Furthermore, SortMeRNA has updated their database builds based on SILVA 138. Although these are not included with the usual rRNA databases on GitHub, SortMeRNA provided a download link to the newer SILVA database version in response to SortMeRNA issue #282.

The new rRNA content databases include:

smr_v4.3_default_db.fasta
smr_v4.3_fast_db.fasta
smr_v4.3_sensitive_db_rfam_seeds.fasta
smr_v4.3_sensitive_db.fasta

These updates should be useful for your current work.

@MatthiasZepper
Member

Ah, I see! They now distribute the references as an extra asset in selected releases instead of committing them to the main repo. Thanks for pointing this out!

For release 3.16 of the pipeline, we should indeed look into this. Since the references are compressed into an archive, we can't just update the paths. We would need to implement a download and extraction step in the pipeline, or alternatively submit the uncompressed versions to our nf-core test data repo or mirror them on AWS. I will add this to the roadmap.
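Until such a step exists in the pipeline, the download and extraction could be done by hand. The following is only a sketch under stated assumptions: it assumes the release asset is a tar archive (optionally gzip-compressed) containing the .fasta files, and the function names are hypothetical. The archive URL is deliberately left as a parameter, because the thread does not state it.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path


def download_archive(url, archive_path):
    """Download the database archive; the caller must supply the real URL."""
    urllib.request.urlretrieve(url, archive_path)
    return Path(archive_path)


def extract_fasta(archive_path, dest_dir):
    """Extract only the .fasta members of a tar archive into dest_dir (flattened)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path) as tar:  # compression is auto-detected
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(".fasta")):
                continue
            # Keep only the base name so archive paths cannot escape dest_dir.
            target = dest / Path(member.name).name
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)
    return sorted(extracted)
```

The paths returned by `extract_fasta` could then be listed in a custom manifest and supplied through the ribo_database_manifest parameter.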

MatthiasZepper added this to the 3.16.0 milestone on Aug 20, 2024