Support for Additional Profiles and Clusters #29

Merged
merged 24 commits into from
Oct 24, 2024

Conversation

kylacochrane
Contributor

@kylacochrane kylacochrane commented Sep 27, 2024

This update aims to enhance the pipeline by integrating additional reference profiles and clusters from user-provided database parameters:

  • --db_profiles will be incorporated through the APPEND_PROFILES process (which follows LOCIDEX_MERGE_REF).
  • --db_clusters will be integrated via the APPEND_CLUSTERS process (which follows CLUSTER_FILE).

Each parameter is required by its respective process, and the two must be supplied together; providing only one is not supported.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Sep 27, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit a78a8ad

+| ✅ 144 tests passed       |+
#| ❔  23 tests were ignored |#
!| ❗   4 tests had warnings |!

❗ Test warnings:

❔ Tests ignored:

  • files_exist - File is ignored: assets/nf-core-gasnomenclature_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-gasnomenclature_logo_dark.png
  • files_exist - File is ignored: docs/images/nf-core-gasnomenclature_logo_light.png
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • nextflow_config - Config variable ignored: params.max_cpus
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File does not exist: assets/nf-core-gasnomenclature_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-gasnomenclature_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-gasnomenclature_logo_dark.png
  • files_unchanged - File ignored due to lint config: docs/README.md
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/gasnomenclature/gasnomenclature/.github/workflows/awstest.yml
  • actions_awsfulltest - actions_awsfulltest
  • pipeline_name_conventions - pipeline_name_conventions

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.1
  • Run at 2024-10-22 13:51:26

@kylacochrane kylacochrane marked this pull request as ready for review October 4, 2024 20:53
Member

@emarinier emarinier left a comment

Looks good to me!

main.nf Outdated

if ((params.db_profiles && !params.db_clusters) || (!params.db_profiles && params.db_clusters)) {
Member

@emarinier emarinier Oct 8, 2024

You don't have to change this, because it's probably easier to understand the way you have it, but you could do this with an XOR: true only if one of the two is true.

I don't think Python (edit: Nextflow/Groovy) has an actual logical XOR operator, so it might reduce to something like:

if bool(params.db_profiles) != bool(params.db_clusters):

but again, just a comment, not necessary to change.
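The XOR idea in this comment can be sketched in Python, following the pseudocode above. This is only an illustration of the truthiness comparison, not the Groovy code that ended up in main.nf; the function names `exactly_one_set` and `validate_db_params` and the error message are hypothetical.

```python
def exactly_one_set(db_profiles, db_clusters):
    """Return True when exactly one of the two values is truthy (logical XOR)."""
    # Coerce both values to booleans first so that None, "" and a path
    # string compare by truthiness rather than by value.
    return bool(db_profiles) != bool(db_clusters)


def validate_db_params(db_profiles=None, db_clusters=None):
    """Raise ValueError if only one of the two database parameters is given.

    Hypothetical helper: the names and the message are assumptions made
    for this sketch, not taken from the pipeline.
    """
    if exactly_one_set(db_profiles, db_clusters):
        raise ValueError("--db_profiles and --db_clusters must be provided together")


# Both supplied or both omitted pass silently; a single one is rejected.
validate_db_params("profiles.tsv", "clusters.tsv")
validate_db_params()
```

Comparing the two coerced booleans gives the same result as the longer `(a and not b) or (not a and b)` form used in the original condition.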

Contributor Author

Great suggestion Eric - played around a bit and got it working here: 7d54942

@sgsutcliffe

It looks good, nothing to change. I always have suggestions though!

It could be made clearer in the README.md (or with additional error reporting, but that is more work). Basically, it should be emphasized that any address levels in the additional databases that are not in the samplesheet address will be dropped. An error could be triggered if the maximum address size in the samplesheet is smaller than the number of columns/levels in the database, as those extra levels will be dropped (I believe, based on how csvtk concat works). For the README.md, you could emphasize that headers must match and do something like this for the example reference database:
address 1.1. ... . n

sample_id l1 l2 ... ln
sampleA 1 1 ... 1
sampleB 1 1 ... 2
sampleC 2 1 ... 1
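The column-dropping behaviour described in this comment can be modelled with a short Python sketch. It assumes that rows are concatenated onto the first file and that only the first file's columns are kept, which is how the comment expects csvtk concat to behave; it is a simplified stand-in, not csvtk itself. The function name `concat_keep_first_header` and the sample data are hypothetical.

```python
import csv
import io


def concat_keep_first_header(first_tsv, second_tsv):
    """Concatenate two TSV strings by rows, keeping only the first file's columns.

    Simplified model (an assumption for illustration): columns that exist
    only in the second file are silently dropped.
    """
    first = csv.DictReader(io.StringIO(first_tsv), delimiter="\t")
    second = csv.DictReader(io.StringIO(second_tsv), delimiter="\t")
    header = first.fieldnames
    rows = [dict(row) for row in first]
    for row in second:
        # Keep only columns named in the first header; fill gaps with "".
        rows.append({col: row.get(col, "") for col in header})
    return header, rows


# The reference has two address levels; the database has three.
reference = "sample_id\tl1\tl2\nsampleA\t1\t1\n"
database = "sample_id\tl1\tl2\tl3\nsampleB\t1\t1\t2\n"

header, rows = concat_keep_first_header(reference, database)
print(header)   # ['sample_id', 'l1', 'l2']
print(rows[1])  # {'sample_id': 'sampleB', 'l1': '1', 'l2': '1'}
```

In this model the `l3` value for sampleB disappears without any warning, which is the situation the comment suggests documenting or reporting as an error.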

Member

@apetkau apetkau left a comment

Thanks so much for making these changes Kyla. Great work 😄

A few in-line comments below

nextflow_schema.json Outdated
tests/pipelines/main_append_databases.nf.test
modules/local/append_clusters/main.nf Outdated
modules/local/append_profiles/main.nf Outdated
@kylacochrane
Contributor Author

Thank you everyone for your review and suggestions!
I have updated the README, but it will need further changes once the new version of GAS CALL is in use, since that version will introduce simplified formatting.

@kylacochrane kylacochrane requested a review from apetkau October 15, 2024 14:10
Member

@apetkau apetkau left a comment

Thanks so much Kyla for addressing all my comments. And for adding all those tests 😄. Amazing work.

I just have one more question given in-line.

modules/local/append_profiles/main.nf Outdated
modules/local/append_profiles/main.nf Outdated
@kylacochrane kylacochrane requested a review from apetkau October 15, 2024 21:03
Member

@apetkau apetkau left a comment

Thanks so much for all the great work you've done with this Kyla. It looks amazing and handles so many more situations with sample names. I really appreciate it 😄

# Calculate the frequency of each sample_id across both sources
csvtk freq -t -f id combined_profiles.tsv > sample_counts.tsv

# For any sample_id that appears in both the reference and database, add a 'db_' prefix to the sample_id from the database
Member

That's really cool. This would solve the issue with duplicates in all situations then 😄 . Thanks so much.
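The duplicate handling under review (prefixing a database sample_id with 'db_' when it also appears in the reference) can be sketched in Python as follows. This is an illustration of the idea only, not the csvtk-based implementation in the module; the function name `prefix_database_duplicates` and the sample IDs are hypothetical.

```python
def prefix_database_duplicates(reference_ids, database_ids, prefix="db_"):
    """Return the database IDs, prefixing any that also occur in the reference.

    Hypothetical helper for illustration; the module itself counts IDs
    with csvtk rather than using Python.
    """
    reference = set(reference_ids)
    return [
        f"{prefix}{sample_id}" if sample_id in reference else sample_id
        for sample_id in database_ids
    ]


reference_ids = ["sampleA", "sampleB"]
database_ids = ["sampleB", "sampleC"]

print(prefix_database_duplicates(reference_ids, database_ids))
# ['db_sampleB', 'sampleC']
```

Only the database copy is renamed, so the reference sample keeps its original ID and the combined table has no clashing sample_id values.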

}
}

test("Test pipeline when appended profiles or clusters have sample_id overlap") {
Member

Thank you for adding this test 😄

@kylacochrane kylacochrane merged commit 574877c into dev Oct 24, 2024
4 checks passed
4 participants