Merge branch 'RTIInternational:master' into master
carynwillis authored Nov 18, 2024
2 parents bda94ac + 7bcb1a8 commit 8f32eef
Showing 7 changed files with 301 additions and 238 deletions.
218 changes: 0 additions & 218 deletions cellranger_extract_rename/v1.0/rename_files.sh

This file was deleted.

@@ -1,11 +1,12 @@
FROM joshkeegan/zip:3.19.1
FROM alpine:3.20.3

LABEL maintainer="David Williams <[email protected]>"
LABEL description="This Docker image contains an in-house script written to extract and rename files coming from the 'outs.zip' file from Cellranger"
LABEL description="This Docker image contains an in-house script written to extract and rename files coming from the 'outs.zip' file or outs directory from Cellranger"
LABEL software-version="1.1"

RUN apk update
RUN apk add vim vim-doc vim-tutor
RUN apk add bash
RUN apk add --no-cache vim vim-doc vim-tutor zip unzip
RUN apk add --no-cache bash

# Add scripts to make it run
ADD rename_files.sh /
@@ -1,6 +1,6 @@
# Description

This Docker image contains an in-house script written to extract and rename key files from the large 'outs.zip' file from the Cell Ranger process. Its purpose is to fit in the single cell RNA sequencing workflow, as well as ATACseq and potentially GUIDEseq, in the RMIP project. The Input to this should be ZIP file(s) coming from a Cell Ranger run, and the output is an output directory with renamed files and a renamed copy of the input 'outs.zip' file. Files and relative paths within the 'outs.zip' file are listed in the table below:
This Docker image contains an in-house script written to extract and rename key files from the large 'outs.zip' file or large 'outs' directory from the Cell Ranger process. Its purpose is to fit in the single cell RNA sequencing workflow, as well as ATACseq and potentially GUIDEseq, in the RMIP project. The Input to this should be a ZIP file or directory coming from a Cell Ranger run, and the output is an output directory with renamed files and a renamed copy of the input 'outs.zip' file (if given). Files and relative paths within the 'outs.zip' file are listed in the table below:

| Directory | Filename | Description | Link |
| -- | -- | -- | -- |
@@ -11,53 +11,54 @@ This Docker image contains an in-house script written to extract and rename key
| ./ | possorted_genome_bam.bam.bai | Index file associated with the BAM file. | https://www.10xgenomics.com/support/software/cell-ranger/analysis/outputs/cr-outputs-bam |
| ./ | filtered_feature_bc_matrix.h5 | Filtered feature-barcode matrices describing the number of UMIs associated with a feature and a barcode. | https://www.10xgenomics.com/support/software/cell-ranger/analysis/outputs/cr-outputs-h5-matrices |

Each of these files are copied to an output directory and prepended with a "linker" for a given sample. Additionally, the given ZIP file is copied to this directory and also prepended with this linker. Given the example linker `RMIP_001_002_A_003_B`, this can be separated into different components, delimited by `_`. Details on the linker and its format are below.
Each of these files are copied to an output directory and prepended with a "linker" for a given sample. Additionally, the given ZIP file is copied to this directory and also prepended with this linker. Given the example linker `RMIP_001_allo2_A_003_B`, this can be separated into different components, delimited by `_`. Details on the linker and its format are below.

| Name | Component | Description |
| -- | -- | -- |
| RMIP identifier | `RMIP` | Goes at the beginning of each linker |
| Project Identifier | `001` | Numeric only |
| Participant ID | `002` | Alphanumeric only, no length restrictions |
| Participant ID | `allo2` | Alphanumeric only, no length restrictions |
| Discriminator | `A` | Alphabetic only - combination with "Identifier" uniquely identifies every collection event |
| Identifier | `003` | Numeric only - combination with "Discriminator" uniquely identifies every collection event |
| Vial identifier (alphabetic) | `B` | Alphabetic only - identifies specific collection aliquot - optional if only one vial |
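
The linker format in the table above can be checked with a single regular expression. The following is a minimal sketch, not code taken from `rename_files.sh`: the function name `is_valid_linker` and the variable `linker_pattern` are hypothetical, and the pattern assumes that numeric and alphabetic components may be of any length, since the table does not state length limits for them.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; the real rename_files.sh may validate differently.
# Components: RMIP, project (numeric), participant (alphanumeric),
# discriminator (alphabetic), identifier (numeric), optional vial (alphabetic).
# Assumption: component lengths are unrestricted.
linker_pattern='^RMIP_[0-9]+_[A-Za-z0-9]+_[A-Za-z]+_[0-9]+(_[A-Za-z]+)?$'

is_valid_linker() {
    # Returns 0 when the argument matches the linker pattern, 1 otherwise.
    [[ "$1" =~ $linker_pattern ]]
}

for candidate in RMIP_001_allo2_A_003_B RMIP_001_allo2_A_003 RMIP_001_allo_2_A_003_B; do
    if is_valid_linker "$candidate"; then
        echo "valid:   $candidate"
    else
        echo "invalid: $candidate"
    fi
done
```

The third candidate is rejected because the extra `_` splits the participant ID into two components, which leaves a numeric value where the alphabetic discriminator is expected.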

## Sample usage

This script is used to extract results from the large output ZIP file from Cell Ranger.
This script is used to extract results from the output of a Cell Ranger run.

Build
```
docker build -t cellranger_extract_rename:v1 .
docker build -t cellranger_extract_rename:v1.1 .
```

Run
```
docker run -it -v $PWD:/data cellranger_extract_rename:v1 rename_files.sh
docker run -it -v $PWD:/data cellranger_extract_rename:v1.1 rename_files.sh
```

Usage info:
```
Usage: ./rename_files.sh [OPTIONS]
Usage: /rename_files.sh [OPTIONS]
Options:
-h, --help Display this help message
-v, --verbose Enable verbose mode
-l, --linker STRING Specify name of linker to prepend to extracted files (format 'RMIP_<ddd>_<ddd>_<w>_<ddd>_<w>') - Required
e.g. linker='RMIP_001_001_A_001_A'
-l, --linker STRING Specify name of linker to prepend to extracted files (format 'RMIP_<ddd>_<alphanum>_<w>_<ddd>_<w>') - Required
e.g. linker='RMIP_001_allo1_A_001_A'
Note that the Vial Identifier (last letter) is optional
-z, --zip_file STRING/PATH Specify name and path of ZIP file to read, decompress, and rename - Required
-i, --input STRING/PATH Specify name and path of either ZIP file to read OR input directory - Required
-o, --output_dir STRING/PATH Specify directory where to put extracted files. Default = '.'
Example usage
Required flags: ./rename_files.sh -z outs.zip -l RMIP_001_001_A_001_A
Verbose mode: ./rename_files.sh -v -z outs.zip -l RMIP_001_001_A_001_B
Writing to output directory: ./rename_files.sh -z outs.zip -l RMIP_001_001_A_001_C -o test_output
Required flags (ZIP input): ./rename_files.sh -i outs.zip -l RMIP_001_allo1_A_001_A
Required flags (DIRECTORY input): ./rename_files.sh -i outs -l RMIP_001_allo1_A_001_B
Writing to output directory: ./rename_files.sh -i outs.zip -l RMIP_001_allo1_A_001_C -o outs
Verbose mode: ./rename_files.sh -v -i outs.zip -l RMIP_001_allo1_A_001_D
```
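
The `-i` option accepts either a ZIP file or a directory, so the script has to branch on the input type. The sketch below shows one way that branch could look. It is an assumption, not the committed implementation: the function name `stage_outputs` is hypothetical, the file list covers only the two files visible in the table above, the files are assumed to sit at the root of the archive or directory, and `_` is assumed as the separator between the linker and the original filename.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; the real rename_files.sh may be structured differently.
stage_outputs() {
    local input="$1" linker="$2" output_dir="${3:-.}"
    # Assumption: partial file list, both files at the top level of the input.
    local names=(possorted_genome_bam.bam.bai filtered_feature_bc_matrix.h5)
    local name tmp

    mkdir -p "$output_dir" || return 1

    if [[ -d "$input" ]]; then
        # Directory input: copy each key file with the linker prepended.
        for name in "${names[@]}"; do
            cp "$input/$name" "$output_dir/${linker}_${name}" || return 1
        done
    elif [[ -f "$input" && "$input" == *.zip ]]; then
        # ZIP input: extract only the key files into a scratch directory.
        tmp="$(mktemp -d)" || return 1
        unzip -q "$input" "${names[@]}" -d "$tmp" || return 1
        for name in "${names[@]}"; do
            mv "$tmp/$name" "$output_dir/${linker}_${name}" || return 1
        done
        rm -rf "$tmp"
        # Keep a renamed copy of the archive, as the README describes.
        cp "$input" "$output_dir/${linker}_$(basename "$input")" || return 1
    else
        echo "Input must be a ZIP file or a directory: $input" >&2
        return 1
    fi
}
```

Under these assumptions, `stage_outputs outs RMIP_001_allo1_A_001_B test_output` would copy the two files from `outs/` into `test_output/` with the linker prepended, and no archive copy is made because the input was a directory.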

## Files included

- `Dockerfile`: the Docker file used to build this image
- `rename_files.sh`: Bash shell script that serves as the main executable when the Docker container is run. Expected behavior is to take a specified input ZIP file in the current working directory, unzip specific files from it, and rename those files to include a prefix signifying a sample name. This writes those specific files and a renamed copy of the given ZIP file to an output directory.
- `rename_files.sh`: Bash shell script that serves as the main executable when the Docker container is run. Expected behavior is to take a specified input ZIP file OR directory in the current working directory, extract specific files from it, and rename those files to include a prefix signifying a sample name. This writes those specific files and a renamed copy of the ZIP file (if given) to an output directory.

## Contact
