nf-cdstransfer

Overview

This Nextflow workflow processes a sample sheet (samplesheet.csv), retrieves files from Synapse by entityId, and uploads them to an AWS S3 bucket. The workflow consists of three main steps:

  1. SAMPLESHEET_SPLIT: Filters and samples rows from the input CSV based on file size and retrieves relevant metadata.
  2. SYNAPSE_GET: Downloads the files from Synapse using the entityId from the sample sheet.
  3. CDS_UPLOAD: Uploads the downloaded files to a specified AWS S3 bucket.
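
At a high level, the three steps are wired together as in the following minimal DSL2 sketch (the actual channel and process wiring in main.nf may differ):

    workflow {
        // Step 1: parse, filter, and sample the sample sheet
        SAMPLESHEET_SPLIT(params.input)

        // Step 2: download each sampled entity from Synapse
        SYNAPSE_GET(SAMPLESHEET_SPLIT.out)

        // Step 3: upload each downloaded file to its S3 destination
        CDS_UPLOAD(SYNAPSE_GET.out)
    }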

Input

Parameters

  • params.input: The path to the sample sheet CSV file (samplesheet.csv). This file must contain the columns entityId, file_url_in_cds, and File_Size.
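
For illustration, a minimal samplesheet.csv could look like this (the Synapse IDs, bucket, and sizes below are made up; only the three columns above are assumed):

    entityId,file_url_in_cds,File_Size
    syn11111111,s3://example-cds-bucket/project/sample1.bam,1048576
    syn22222222,s3://example-cds-bucket/project/sample2.bam,20971520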

Workflow Details

Step 1: SAMPLESHEET_SPLIT

The SAMPLESHEET_SPLIT workflow reads the input CSV file, filters the rows based on the file size, randomly samples five rows, and maps the necessary metadata.

Input

  • samplesheet.csv: A CSV file with columns including entityId, file_url_in_cds, and File_Size.

Processing

  • Keeps only rows where File_Size is less than 50 MB.
  • Randomly samples 5 rows from the filtered data.
  • Maps each remaining row to its entityId and aws_uri (the S3 destination URL taken from file_url_in_cds).
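
This logic maps onto standard Nextflow channel operators. A minimal sketch, assuming File_Size is given in bytes and the column names listed above:

    ch_samples = Channel
        .fromPath(params.input)
        .splitCsv(header: true)
        // keep only files under 50 MB
        .filter { row -> (row.File_Size as long) < 50 * 1024 * 1024 }
        // randomly sample 5 rows
        .randomSample(5)
        // emit (entityid, aws_uri) tuples for the next step
        .map { row -> tuple(row.entityId, row.file_url_in_cds) }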

Output

  • A channel of tuples containing entityid and aws_uri.

Step 2: SYNAPSE_GET

The SYNAPSE_GET process downloads files from Synapse based on the entityId retrieved from the sample sheet.

Input

  • entityid: The Synapse entityId used to identify and download files.

Output

  • A tuple containing the meta information and the downloaded file path.

Dependencies

  • Requires a Synapse authentication token (SYNAPSE_AUTH_TOKEN).
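
A minimal sketch of such a process, assuming the metadata map travels alongside the entityId and that the process container provides the synapseclient CLI (which reads SYNAPSE_AUTH_TOKEN from its environment):

    process SYNAPSE_GET {
        secret 'SYNAPSE_AUTH_TOKEN'

        input:
        tuple val(meta), val(entityid)

        output:
        tuple val(meta), path('*')

        script:
        """
        synapse get ${entityid}
        """
    }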

Step 3: CDS_UPLOAD

The CDS_UPLOAD process uploads the files downloaded from Synapse to an AWS S3 bucket using the AWS CLI.

Input

  • meta: The metadata associated with the file, including the destination S3 URI.
  • entity: The path to the file that will be uploaded.

Output

  • A tuple containing the meta information and the file path after upload.

Dependencies

  • Requires AWS credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_SESSION_TOKEN).

    Set these as Nextflow secrets, e.g. nextflow secrets set AWS_ACCESS_KEY_ID <your_access_key_id>
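
A minimal sketch of the upload step, assuming the destination URI is carried as meta.aws_uri and the process container provides the AWS CLI:

    process CDS_UPLOAD {
        secret 'AWS_ACCESS_KEY_ID'
        secret 'AWS_SECRET_ACCESS_KEY'

        input:
        tuple val(meta), path(entity)

        output:
        tuple val(meta), path(entity)

        script:
        """
        aws s3 cp ${entity} ${meta.aws_uri}
        """
    }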

Running the Workflow

Prerequisites

  • Ensure Nextflow is installed.
  • Ensure you can pull the required container images (synapseclient, awscli).
  • Ensure you have the appropriate credentials for Synapse and AWS.

Command

Run the workflow with the following command:

nextflow run main.nf --input path/to/samplesheet.csv

Example

nextflow run main.nf --input samplesheet.csv

Outputs

  • The final output is the set of files successfully uploaded to the specified AWS S3 bucket.

Secrets

The following credentials must be set as Nextflow secrets:

  • SYNAPSE_AUTH_TOKEN: Synapse authentication token.
  • AWS_ACCESS_KEY_ID: AWS access key ID.
  • AWS_SECRET_ACCESS_KEY: AWS secret access key.
  • AWS_SESSION_TOKEN (optional): AWS session token for temporary credentials.
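
Register each one with the Nextflow secrets store before launching, for example:

    nextflow secrets set SYNAPSE_AUTH_TOKEN <your_synapse_token>
    nextflow secrets set AWS_ACCESS_KEY_ID <your_access_key_id>
    nextflow secrets set AWS_SECRET_ACCESS_KEY <your_secret_access_key>
    nextflow secrets set AWS_SESSION_TOKEN <your_session_token>    # optional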

License

This workflow is provided as-is without any warranties. Modify and use it at your own risk.

