Several fixes after running a full dataset #24

LeonHafner · 2024-10-17T19:50:38Z

Changes:

fix handling of multiple chromosomes by sorting the bed files (issue "Wrong sorting of ROSE chrom_sizes and bed", #19)
fix TPM calculation (might also fix issue "files input at calculate_tpm.py are causing error", #21; I only just saw that issue)
add memory parameter to ChromHMM binarizeBams and LearnModel
fix duplicated gene versions in DYNAMITE:PREPROCESS
fix error in DYNAMITE where the test set was smaller than 1 sample
change dynamite error strategy (set to ignore) to handle tasks with too few samples
fix duplicated gene versions in TF_TG_SCORE

github-actions · 2024-12-12T12:50:41Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 13313a7

+| ✅ 214 tests passed       |+
#| ❔  11 tests were ignored |#
!| ❗  12 tests had warnings |!

❗ Test warnings:

readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
pipeline_todos - TODO string in nextflow.config: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
pipeline_todos - TODO string in main.nf: A stub section should mimic the execution of the original module as best as possible
pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
pipeline_todos - TODO string in base.config: Check the defaults for all processes
pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required

❔ Tests ignored:

files_exist - File is ignored: assets/multiqc_config.yml
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/base.html
template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/configuration.html
template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/macros.html
template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/network.html
template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/snp.html
template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/tf.html
template_strings - Ignoring Jinja template strings in file /home/runner/work/tfactivity/tfactivity/modules/local/report/create/app/templates/tg.html
multiqc_config - multiqc_config

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-tfactivity_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-tfactivity_logo_light.png
files_exist - File found: docs/images/nf-core-tfactivity_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: conf/igenomes_ignored.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-tfactivity_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowTfactivity.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Found nf-schema plugin
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: validation.help.enabled
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable found: validation.help.beforeText
nextflow_config - Config variable found: validation.help.afterText
nextflow_config - Config variable found: validation.help.command
nextflow_config - Config variable found: validation.summary.beforeText
nextflow_config - Config variable found: validation.summary.afterText
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config variable (correctly) not found: params.max_cpus
nextflow_config - Config variable (correctly) not found: params.max_memory
nextflow_config - Config variable (correctly) not found: params.max_time
nextflow_config - Config variable (correctly) not found: params.validationFailUnrecognisedParams
nextflow_config - Config variable (correctly) not found: params.validationLenientMode
nextflow_config - Config variable (correctly) not found: params.validationSchemaIgnoreParams
nextflow_config - Config variable (correctly) not found: params.validationShowHiddenParams
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 0.0.1dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.min_peak_occurrence= 1
nextflow_config - Config default value correct: params.window_size= 50000
nextflow_config - Config default value correct: params.decay= true
nextflow_config - Config default value correct: params.expression_aggregation= mean
nextflow_config - Config default value correct: params.affinity_aggregation= max
nextflow_config - Config default value correct: params.chromhmm_states= 10
nextflow_config - Config default value correct: params.chromhmm_threshold= 0.75
nextflow_config - Config default value correct: params.chromhmm_enhancer_marks= H3K27ac,H3K4me1
nextflow_config - Config default value correct: params.chromhmm_promoter_marks= H3K4me3
nextflow_config - Config default value correct: params.rose_tss_window= 2500
nextflow_config - Config default value correct: params.rose_stitching_window= 12500
nextflow_config - Config default value correct: params.min_count= 50
nextflow_config - Config default value correct: params.min_tpm= 1.0
nextflow_config - Config default value correct: params.min_count_tf= 50
nextflow_config - Config default value correct: params.min_tpm_tf= 1.0
nextflow_config - Config default value correct: params.dynamite_ofolds= 3
nextflow_config - Config default value correct: params.dynamite_ifolds= 6
nextflow_config - Config default value correct: params.dynamite_alpha= 0.1
nextflow_config - Config default value correct: params.dynamite_randomize= false
nextflow_config - Config default value correct: params.dynamite_min_regression= 0.1
nextflow_config - Config default value correct: params.alpha= 0.05
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/nf-core-tfactivity_logo_light.png matches the template
files_unchanged - docs/images/nf-core-tfactivity_logo_light.png matches the template
files_unchanged - docs/images/nf-core-tfactivity_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 24.04.2, Config: 24.04.2
plugin_includes - No wrong validation plugin imports have been found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (0 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: template_version_comment.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
modules_config - conf/modules.config found and not ignored.
modules_config - CLEAN_BED found in conf/modules.config and Nextflow scripts.
modules_config - FILTER_CONVERT_GTF found in conf/modules.config and Nextflow scripts.
modules_config - SORT_BED found in conf/modules.config and Nextflow scripts.
modules_config - SORT_CHROM_SIZES found in conf/modules.config and Nextflow scripts.
modules_config - CONSTRUCT_TSS found in conf/modules.config and Nextflow scripts.
modules_config - FILTER_PREDICTIONS found in conf/modules.config and Nextflow scripts.
modules_config - STITCHING found in conf/modules.config and Nextflow scripts.
modules_config - TSS_OVERLAP found in conf/modules.config and Nextflow scripts.
modules_config - FILTER_OVERLAPS found in conf/modules.config and Nextflow scripts.
modules_config - UNSTITCHED_REGIONS found in conf/modules.config and Nextflow scripts.
modules_config - CONCAT_AND_SORT found in conf/modules.config and Nextflow scripts.
modules_config - BEDTOOLS_SORT found in conf/modules.config and Nextflow scripts.
modules_config - BEDTOOLS_MERGE found in conf/modules.config and Nextflow scripts.
modules_config - ANNOTATE_SAMPLES found in conf/modules.config and Nextflow scripts.
modules_config - CONCAT_SAMPLES found in conf/modules.config and Nextflow scripts.
modules_config - FILTER_MIN_OCCURRENCE found in conf/modules.config and Nextflow scripts.
modules_config - RUN_DYNAMITE found in conf/modules.config and Nextflow scripts.
modules_config - COMBINE_TFS_PER_ASSAY found in conf/modules.config and Nextflow scripts.
modules_config - COMBINE_TGS_PER_ASSAY found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.0.2

Run details

nf-core/tools version 3.0.2
Run at 2025-01-04 19:25:17

nictru · 2024-12-19T14:30:24Z

conf/modules.config

+    withName: "RUN_DYNAMITE" {
+        errorStrategy = "ignore"
+    }


In what situations is this relevant? By setting the errorStrategy to ignore, we also prevent the pipeline from retrying when the process fails for other reasons, such as too little RAM.

LeonHafner · 2025-01-04

Dynamite fails with exitStatus 139 when it is run on too little data, so I adapted the errorStrategy accordingly.

nictru · 2024-12-19T14:32:05Z

bin/DYNAMITE.R

-                rndselect=sample(x=nrow(M),size=as.numeric(argsL$testsize)*nrow(M))
+                # Test on a single example if dataset size is too small
+                rndselect=sample(x=nrow(M),size=ifelse(as.numeric(argsL$testsize)*nrow(M) < 1, 1, as.numeric(argsL$testsize)*nrow(M)))


This file is a copy from here, and I would like to keep it identical if possible.

LeonHafner · 2025-01-03

We could open a PR against their repository; however, the tool does not appear to be actively maintained. The last open PR is from 2020 and is still unanswered.

The changes make the tool more robust for small datasets, which is essential for a reliable pipeline and for the run on our lactation data.
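For readers less familiar with R, the guard in the changed line can be restated as a minimal Python sketch. This is not DYNAMITE's code: `pick_test_indices`, `n_rows`, and `test_fraction` are hypothetical names, the sketch assumes a fractional size is truncated to an integer before sampling, and the upper clamp to `n_rows` is an extra safety check that is not in the diff.

```python
import random
from typing import List, Optional


def pick_test_indices(n_rows: int, test_fraction: float,
                      seed: Optional[int] = None) -> List[int]:
    """Draw row indices for a test set, keeping at least one row."""
    if n_rows < 1:
        raise ValueError("cannot draw a test set from an empty matrix")
    # Truncate the fractional size, then clamp it to the range [1, n_rows].
    size = min(n_rows, max(1, int(test_fraction * n_rows)))
    rng = random.Random(seed)
    # Sample without replacement, as a train/test split requires.
    return rng.sample(range(n_rows), size)
```

With three rows and a test fraction of 0.2 the unclamped size would be 0; the clamp raises it to one row, which is the case the fix targets.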

nictru · 2024-12-19T14:35:08Z

modules/local/counts/calculate_tpm/templates/calculate_tpm.py

-df_lengths = df_lengths.loc[df_counts.index]
+df_lengths = df_lengths.loc[df_lengths.index.isin(df_counts.index)]


Did you check whether the dataframes can have different orders?

If we cannot be entirely sure, it might be better to build the intersection of both indices and then subset both dataframes to that intersection.

LeonHafner · 2025-01-04

I switched to the index intersection you proposed. It does not seem necessary for our use case, since both data frames have the same order, but it is definitely more robust this way.
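The intersection approach discussed in this thread could look roughly like the sketch below. It is an illustration under assumptions, not the code in `calculate_tpm.py`: the function name `calculate_tpm`, the use of a `pd.Series` for gene lengths, and the variable names are all hypothetical. Both inputs are subset with the same `common` index, so rows line up regardless of their original order.

```python
import pandas as pd


def calculate_tpm(df_counts: pd.DataFrame, gene_lengths: pd.Series) -> pd.DataFrame:
    """Compute TPM on the genes present in both the counts and the lengths."""
    # Genes that occur in both inputs; anything else is dropped.
    common = df_counts.index.intersection(gene_lengths.index)
    counts = df_counts.loc[common]
    lengths = gene_lengths.loc[common]
    # Length-normalised read rate per gene and sample.
    rate = counts.div(lengths, axis=0)
    # Scale every sample (column) so that its values sum to one million.
    return rate.div(rate.sum(axis=0), axis=1) * 1e6
```

Compared with `df_lengths.loc[df_counts.index]`, this does not raise a `KeyError` when a gene has counts but no length, and compared with the `isin` filter it also removes genes that are missing from the lengths table on the counts side.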

nictru · 2024-12-19T14:35:56Z

modules/local/fimo/combine_results/templates/combine_results.py

@@ -14,38 +15,34 @@ def format_yaml_like(data: dict, indent: int = 0) -> str:
    """
    yaml_str = ""
    for key, value in data.items():
-        spaces = "  " * indent
+        spaces = "    " * indent


I guess this was not on purpose.

LeonHafner · 2025-01-03

I think four spaces are the default for nf-core modules.
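To make the effect of the indentation change concrete, here is a small sketch of a recursive YAML-like formatter. The diff only shows the first lines of `format_yaml_like`, so the body below is an assumption, and the function name `format_yaml_like_sketch` and the `indent_width` parameter are hypothetical additions for illustration.

```python
def format_yaml_like_sketch(data: dict, indent: int = 0, indent_width: int = 4) -> str:
    """Format a nested dict as a YAML-like string (illustrative only)."""
    yaml_str = ""
    spaces = " " * indent_width * indent
    for key, value in data.items():
        if isinstance(value, dict):
            # Nested mappings go on the following lines, one level deeper.
            nested = format_yaml_like_sketch(value, indent + 1, indent_width)
            yaml_str += f"{spaces}{key}:\n{nested}"
        else:
            yaml_str += f"{spaces}{key}: {value}\n"
    return yaml_str
```

With `indent_width=4`, a nested entry such as `{"TASK": {"python": "3.11"}}` is written with four leading spaces before `python`, and with `indent_width=2` it gets two.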

LeonHafner added 8 commits September 20, 2024 14:04

add sorting for chrom_sizes

23cca51

fix calculate_tpm

f61345d

define memory parameter for ChromHMM binarizeBams

86b199f

ChromHMM LearnModel add memory parameter

ef367ec

dynamite preprocess fix duplicates

c94e85a

fix dynamite balanced sampling

ba2f075

change dynamite error strategy

1f6422a

Fix tf_tg duplicate genes after version clipping

0b983e5

LeonHafner requested a review from nictru October 17, 2024 19:50

LeonHafner linked an issue Nov 5, 2024 that may be closed by this pull request

Wrong sorting of ROSE chrom_sizes and bed #19

Open

LeonHafner marked this pull request as ready for review November 7, 2024 16:16

LeonHafner self-assigned this Nov 18, 2024

add staging for fimo files

3c5908b

LeonHafner added 3 commits December 12, 2024 19:39

change fimo outdir transfer

a57d808

increase memory of report

18d0cfe

streamline fimo output

45ba08d

nictru reviewed Dec 19, 2024

LeonHafner added 2 commits January 4, 2025 19:37

dynamite error strategy

cb8e10a

add index intersection

13313a7

LeonHafner requested a review from nictru January 4, 2025 19:25

LeonHafner mentioned this pull request Jan 4, 2025

Add end2end part to make it accessible to more users #18

Open
