Several fixes after running a full dataset #24

Open
wants to merge 14 commits into
base: dev

Contributor

LeonHafner commented Oct 17, 2024

Changes:

  • fix handling of multiple chromosomes by sorting the bed files (issue #19, "Wrong sorting of ROSE chrom_sizes and bed")
  • fix the TPM calculation (this might also fix issue #21, "files input at calculate_tpm.py are causing error"; I only just saw that issue)
  • add memory parameter to ChromHMM binarizeBams and LearnModel
  • fix duplicated gene versions in DYNAMITE:PREPROCESS
  • fix error in DYNAMITE where the test set was smaller than 1 sample
  • change dynamite error strategy (set to ignore) to handle tasks with too few samples
  • fix duplicated gene versions in TF_TG_SCORE

LeonHafner requested a review from nictru October 17, 2024 19:50
LeonHafner linked an issue Nov 5, 2024 that may be closed by this pull request
LeonHafner marked this pull request as ready for review November 7, 2024 16:16
LeonHafner self-assigned this Nov 18, 2024

github-actions bot commented Dec 12, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 13313a7

+| ✅ 214 tests passed       |+
#| ❔  11 tests were ignored |#
!| ❗  12 tests had warnings |!

❗ Test warnings:

  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in nextflow.config: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in main.nf: A stub section should mimic the execution of the original module as best as possible
  • pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required

❔ Tests ignored:

  • files_exist - File is ignored: assets/multiqc_config.yml
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/base.html
  • template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/configuration.html
  • template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/macros.html
  • template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/network.html
  • template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/snp.html
  • template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/tf.html
  • template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/tg.html
  • multiqc_config - multiqc_config

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2025-01-04 19:25:17

Comment on lines 128 to 130
withName: "RUN_DYNAMITE" {
    errorStrategy = "ignore"
}
Collaborator

In what situations is this relevant? By setting the errorStrategy to ignore, we also prevent the pipeline from retrying a task that fails for other reasons, such as too little RAM.

Contributor Author

Dynamite fails with exitStatus 139 when running with too little data, so I adapted the errorStrategy accordingly.

Comment on lines -194 to +196
- rndselect=sample(x=nrow(M),size=as.numeric(argsL$testsize)*nrow(M))
+ # Test on a single example if dataset size is too small
+ rndselect=sample(x=nrow(M),size=ifelse(as.numeric(argsL$testsize)*nrow(M) < 1, 1, as.numeric(argsL$testsize)*nrow(M)))
Collaborator

This file is a copy of the upstream file, and I would like to keep it identical if possible.

Contributor Author

We could open a PR to their repository; however, the tool does not appear to be actively maintained. The last open PR is from 2020 and is still unanswered.

The changes make the tool more robust on small datasets, which is essential for a reliable pipeline and for the run on our lactation data.
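
For readers less familiar with the R one-liner, the clamping logic in the changed line can be sketched in Python. This is an illustration only, not the DYNAMITE code: `holdout_size` and `split_indices` are hypothetical names, and rounding the product down with `math.floor` is an assumption about how a fractional size should be handled.

```python
import math
import random


def holdout_size(n_rows: int, test_fraction: float) -> int:
    """Return the number of test rows, clamped to at least one."""
    return max(1, math.floor(test_fraction * n_rows))


def split_indices(n_rows: int, test_fraction: float, seed: int = 0):
    """Randomly split row indices 0..n_rows-1 into (train, test) lists."""
    if n_rows < 1:
        raise ValueError("need at least one row to draw a test set")
    rng = random.Random(seed)
    test = sorted(rng.sample(range(n_rows), holdout_size(n_rows, test_fraction)))
    held_out = set(test)
    train = [i for i in range(n_rows) if i not in held_out]
    return train, test


# With 3 rows and a 20% test fraction, the unclamped size would round to 0.
train, test = split_indices(n_rows=3, test_fraction=0.2)
print(len(train), len(test))
```

With a single input row the sketch puts that row in the test set and leaves the training set empty, so very small tasks can still fail later in the run.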

- df_lengths = df_lengths.loc[df_counts.index]
+ df_lengths = df_lengths.loc[df_lengths.index.isin(df_counts.index)]
Collaborator

Did you check if it can happen that the dataframes have different orders?

If we cannot be entirely sure of this, it might be better to build the intersection of both indices and then subset both data frames to that intersection.

Contributor Author

I switched to the index intersection you proposed. It does not seem necessary for our use case, since both data frames have the same order, but it is definitely more robust this way.
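
The intersection approach discussed in this thread might look roughly like the sketch below. The toy data, gene IDs and the `sample_a` and `length` column names are made up for illustration; only the `df_counts` and `df_lengths` names come from the diff, and the code actually committed to `calculate_tpm.py` may differ.

```python
import pandas as pd

# Hypothetical toy inputs: the two tables list genes in different orders
# and only partially overlap.
df_counts = pd.DataFrame(
    {"sample_a": [10, 20, 30]},
    index=["gene2", "gene1", "gene4"],
)
df_lengths = pd.DataFrame(
    {"length": [1000, 2000, 1500]},
    index=["gene1", "gene2", "gene3"],
)

# Genes present in both tables.
shared = df_counts.index.intersection(df_lengths.index)

# Selecting both frames with the same label list gives them the same rows
# in the same order, so row-wise arithmetic lines up.
df_counts = df_counts.loc[shared]
df_lengths = df_lengths.loc[shared]

print(df_counts.index.equals(df_lengths.index))
```

By comparison, the `isin` filter in the diff keeps `df_lengths` in its own row order and leaves `df_counts` untouched, so it only aligns the two tables when they already share ordering and content.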

@@ -14,38 +15,34 @@ def format_yaml_like(data: dict, indent: int = 0) -> str:
"""
yaml_str = ""
for key, value in data.items():
spaces = " " * indent
spaces = " " * indent
Collaborator

I guess this was not on purpose

Contributor Author

I think four spaces are the default for nf-core modules.

Successfully merging this pull request may close these issues.

  • files input at calculate_tpm.py are causing error (#21)
  • Wrong sorting of ROSE chrom_sizes and bed (#19)