arboratornf (Arborator)

This Nextflow pipeline implements Arborator. Arborator takes as input JSON-formatted genomic profiles and corresponding metadata, groups genomic profiles according to specified metadata criteria, and then summarizes each group and its corresponding metadata.

Input

The input to the pipeline is a standard sample sheet (passed as --input samplesheet.csv) that looks like:

sample	mlst_alleles	metadata_partition	metadata_1	metadata_2	metadata_3	metadata_4	metadata_5	metadata_6	metadata_7	metadata_8
SampleA	sampleA.mlst.json	partition_meta	meta_1	meta_2	meta_3	meta_4	meta_5	meta_6	meta_7	meta_8

The columns of the samplesheet are defined as follows:

sample: The sample ID name. This name should not contain spaces.
mlst_alleles: A URI path to a JSON-formatted genomic profile. An example of this file is provided in tests/data/profiles/S1.mlst.json.
metadata_partition: The specific metadata column used to partition the genomic profiles. For example, this column might refer to the outbreak number and then contain entries such as "1", "2", etc. If an individual sample row in the sample sheet is missing the metadata_partition entry, that sample will not be included in the analysis and will instead be reported in the arborator/metadata.excluded.tsv output file.
metadata_1..metadata_8: Metadata that will be associated with each genomic profile. These metadata will be summarized in the Arborator outputs.
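
The handling of a missing metadata_partition entry can be illustrated with a short Python sketch. This is not the pipeline's own code: the function name split_by_partition is hypothetical, the sample sheet is assumed to be comma-delimited, "missing" is assumed to mean an empty or whitespace-only cell, and sampleB.mlst.json is a made-up file name used only for the example.

```python
import csv
import io


def split_by_partition(samplesheet_text):
    """Return (included, excluded) lists of sample IDs.

    Assumption: a sample counts as "missing" its partition when the
    metadata_partition cell is empty or contains only whitespace.
    """
    included, excluded = [], []
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    for row in reader:
        partition = (row.get("metadata_partition") or "").strip()
        if partition:
            included.append(row["sample"])
        else:
            excluded.append(row["sample"])
    return included, excluded


# Hypothetical two-row sample sheet; SampleB has no partition entry.
example = (
    "sample,mlst_alleles,metadata_partition,metadata_1\n"
    "SampleA,sampleA.mlst.json,1,meta_1\n"
    "SampleB,sampleB.mlst.json,,meta_1\n"
)
included, excluded = split_by_partition(example)
print("included:", included)
print("excluded:", excluded)
```

A check like this may help predict which samples would end up in arborator/metadata.excluded.tsv before a run is launched.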

The name of each metadata column (metadata_partition and metadata_1..metadata_8) is provided using the following parameters:

--metadata_partition_name: The name of the metadata_partition column (for example: "outbreak").
--metadata_1_header through --metadata_8_header: The name of each individual metadata column (for example: "organism" or "source").

Entries in the metadata_partition column in the sample sheet, as well as the name provided by the metadata_partition_name parameter, must contain only alphanumeric, _, ., and - characters.

Entries in the metadata columns in the sample sheet (metadata_1 through metadata_8), as well as the name provided by the metadata header parameters (metadata_1_header through metadata_8_header), may not contain newlines, tabs, ", ', \, |, ;, >, or < characters.
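
The two character rules above can be checked before a sample sheet is submitted. The following is a minimal Python sketch and not the pipeline's own validation, which is performed by nf-validation against assets/schema_input.json. The function names and the regular expression are assumptions derived from the rules as written, and "alphanumeric" is assumed here to mean ASCII letters and digits.

```python
import re

# Assumed pattern: one or more ASCII alphanumeric, "_", "." or "-" characters.
PARTITION_OK = re.compile(r"[A-Za-z0-9_.\-]+")

# Characters listed above as not allowed in metadata entries and headers:
# newline, tab, double quote, single quote, backslash, |, ;, > and <.
METADATA_FORBIDDEN = set("\n\t\"'\\|;><")


def is_valid_partition(value):
    """Check a metadata_partition entry or the metadata_partition_name value.

    An empty string is reported as not valid here; note that the pipeline
    treats a missing partition entry as an excluded sample instead.
    """
    return PARTITION_OK.fullmatch(value) is not None


def is_valid_metadata(value):
    """Check a metadata_1..metadata_8 entry or a metadata header value."""
    return not any(ch in METADATA_FORBIDDEN for ch in value)


for candidate in ["outbreak-1", "outbreak 1"]:
    print(repr(candidate), is_valid_partition(candidate))
for candidate in ["Escherichia coli", "a|b"]:
    print(repr(candidate), is_valid_metadata(candidate))
```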

An example of the sample sheet is available in tests/data/samplesheets/samplesheet.csv and corresponding example metadata headers are available in assets/parameters.yaml.

Furthermore, the structure of the sample sheet is programmatically defined in assets/schema_input.json. Validation of the sample sheet is performed by nf-validation.

Parameters

The mandatory parameters are --input, which specifies the samplesheet as described above, and --outdir, which specifies the output results directory. You may also wish to provide -profile singularity to use Singularity containers and -r [branch] to specify which GitHub branch you would like to run. Metadata-related parameters are described above in Input.

Optional

The optional parameters are as follows:

Arborator

--ar_config: The Arborator-specific config file for specifying the operations used when summarizing metadata and how such metadata should be displayed in the output.
--ar_thresholds: The clustering thresholds used by Arborator. These thresholds must be provided as a list of integers.

Further parameters (defaults from nf-core) are defined in nextflow_schema.json.

Running

To execute the pipeline, please run:

nextflow run phac-nml/arboratornf -profile singularity -r main -latest --input assets/samplesheet.csv -params-file assets/parameters.yaml --outdir results

Here, samplesheet.csv is structured as specified in the Input section, and parameters.yaml provides the parameters for renaming metadata column headers. These parameters may be specified either individually on the command line or collectively in a parameters file.
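
When the metadata header parameters are given individually on the command line rather than in a parameters file, the command can become long. The Python sketch below assembles such a command as an argument list, using only the options named in this README. The function name build_command and the example values ("outbreak", "organism", "source") are illustrative assumptions, not part of the pipeline.

```python
def build_command(samplesheet, outdir, partition_name, headers):
    """Build a `nextflow run` argument list for arboratornf.

    headers maps a metadata column number (1 to 8) to its header name,
    e.g. {1: "organism", 2: "source"}.
    """
    cmd = [
        "nextflow", "run", "phac-nml/arboratornf",
        "-profile", "singularity",
        "-r", "main", "-latest",
        "--input", samplesheet,
        "--outdir", outdir,
        "--metadata_partition_name", partition_name,
    ]
    for index in sorted(headers):
        if not 1 <= index <= 8:
            raise ValueError(f"metadata column number out of range: {index}")
        cmd += [f"--metadata_{index}_header", headers[index]]
    return cmd


cmd = build_command(
    "assets/samplesheet.csv",
    "results",
    "outbreak",
    {1: "organism", 2: "source"},
)
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) on a machine where Nextflow is installed.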

Additional details on usage of the pipeline are found in docs/usage.md.

Output

This pipeline outputs a JSON-formatted file for loading metadata into IRIDA Next. The format of this file is specified in our Pipeline Standards for the IRIDA Next JSON. The file is written directly within the --outdir provided to the pipeline, with the name iridanext.output.json.gz (e.g., [outdir]/iridanext.output.json.gz).

An example of what the contents of the IRIDA Next JSON-formatted file look like for this particular pipeline is as follows:

{
    "files": {
        "global": [
            {
                "path": "arborator/metadata.included.tsv"
            },
            {
                "path": "arborator/metadata.excluded.tsv"
            },
            {
                "path": "arborator/cluster_summary.tsv"
            },
            // ...
        ],
        "samples": {

        }
    },
    "metadata": {
        "samples": {

        }
    }
}

Within the files section of this JSON-formatted file, all of the output paths are relative to the outdir. Therefore, "path": "arborator/cluster_summary.tsv" refers to a file located at [outdir]/arborator/cluster_summary.tsv.
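
As an illustration of how these relative paths resolve, the following Python sketch reads the gzipped IRIDA Next JSON from an output directory and joins each global file path onto that directory. The function name global_output_paths is hypothetical, and the sketch assumes only the "files" and "global" keys shown in the example above.

```python
import gzip
import json
from pathlib import Path


def global_output_paths(outdir):
    """Return the global output files listed in iridanext.output.json.gz.

    Each "path" entry is relative to outdir, so it is joined onto outdir
    to obtain a usable location.
    """
    outdir = Path(outdir)
    json_path = outdir / "iridanext.output.json.gz"
    with gzip.open(json_path, "rt", encoding="utf-8") as handle:
        data = json.load(handle)
    return [outdir / entry["path"] for entry in data["files"]["global"]]
```

For example, global_output_paths("results") would include results/arborator/cluster_summary.tsv if that entry is present in the JSON file.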

The arborator/metadata.included.tsv and arborator/metadata.excluded.tsv output files summarize which samples were analyzed and which were not. Samples that contain missing data for the metadata_partition column will not be included in analysis and will be reported in the arborator/metadata.excluded.tsv output file.

Additional details on output files are found in docs/output.md.

Test profile

To run the pipeline with the test profile, please run:

nextflow run phac-nml/arboratornf -profile docker,test -r main -latest --outdir results

Legal

Licensed under the MIT License (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License at:

https://opensource.org/license/mit/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
