RefChef creates folders to store your references. The names of these folders are based on:

1. The [`master.yaml`](./specs.md#master.yaml) key (which should match the 'name' entry under 'metadata' in `master.yaml`).

2. The 'component' entry under 'levels' in [`master.yaml`](./specs.md#master.yaml).

Here is the collapsed file tree that RefChef created in the tutorial, annotated with what each directory name is based on:

```bash
/Users/jwalla12/references #this directory is specified in refchef-cook or the config files
└── S_cerevisiae #this is named after the 'key' and the 'name' entry under 'metadata' in master.yaml
    ├── bowtie2_index #this folder is created in the master.yaml `commands` section.
    ├── bwa_index #this folder is created in the master.yaml `commands` section.
    ├── gtf #this is named after the 'component' entry under 'levels' in master.yaml
    └── primary #this is named after the 'component' entry under 'levels' in master.yaml
```

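As a rough illustration of this naming scheme, the shell function below builds the path where a component's files end up: the reference directory, then the `master.yaml` key, then the component name. `component_dir` is a hypothetical helper written for this example, not part of RefChef, and it assumes the `<reference directory>/<key>/<component>` layout shown in the tree.

```bash
# Hypothetical helper (not part of RefChef): print the directory that holds a
# component, assuming the <reference directory>/<key>/<component> layout.
component_dir() {
  local output_dir="${1%/}"   # reference directory (refchef-cook -o or config file), trailing slash removed
  local key="$2"              # top-level key in master.yaml, e.g. S_cerevisiae
  local component="$3"        # 'component' entry under 'levels', e.g. primary
  printf '%s/%s/%s\n' "$output_dir" "$key" "$component"
}

component_dir /Users/jwalla12/references S_cerevisiae primary
# prints /Users/jwalla12/references/S_cerevisiae/primary
```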
Here is the expanded file tree:

```bash
/Users/jwalla12/references
└── S_cerevisiae
    ├── bowtie2_index
    │   └── metadata.txt
    ├── bwa_index
    │   └── metadata.txt
    ├── gtf
    │   ├── CHECKSUMS
    │   ├── Saccharomyces_cerevisiae.R64-1-1.87.gtf
    │   ├── final_checksums.md5
    │   ├── metadata.txt
    │   └── postdownload-checksums.md5
    └── primary
        ├── CHECKSUMS
        ├── Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa
        ├── bowtie2_index -> /Users/jwalla12/references/S_cerevisiae/bowtie2_index
        ├── bwa_index -> /Users/jwalla12/references/S_cerevisiae/bwa_index
        ├── final_checksums.md5
        ├── metadata.txt
        └── postdownload-checksums.md5
```
This shows that RefChef has created symlinks to the bowtie2 and bwa index directories inside `/Users/jwalla12/references/S_cerevisiae/primary`. This linking of reference and index is triggered by:

1. The `src:` line in `bowtie2.yaml` and `bwa.yaml`.
2. Listing these components under `indices:` in the `levels` section of `master.yaml`.

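To see the resulting layout without running RefChef, the shell sketch below recreates it in a scratch directory: it makes `primary`, `bowtie2_index`, and `bwa_index` folders and links each index folder into `primary`, mirroring the `->` entries in the expanded tree. RefChef creates these links itself; this is only an illustration, and the use of `mktemp -d` and `ln -sn` is our choice, not RefChef's implementation.

```bash
# Illustration only: rebuild the primary/index link layout from the expanded
# tree in a throwaway directory. RefChef makes these links itself when src: is set.
root="$(mktemp -d)/S_cerevisiae"
mkdir -p "$root/primary" "$root/bowtie2_index" "$root/bwa_index"
for index in bowtie2_index bwa_index; do
  # -s makes a symbolic link; -n avoids descending into an existing link
  ln -sn "$root/$index" "$root/primary/$index"
done
ls -l "$root/primary"
```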
If we look at the output from [`refchef-menu`](./usage.md#refchef-menu), we see the UUID for the primary reference file, which is `dff337a6-9a1d-3313-8ced-dc6f3bfc9689`.

```bash
┌ 🐶 RefChef Menu ────────────────────────┬───────────┬───────────────────────────────────────────┬──────────────────────────────────────┐
│ name         │ organism                 │ component │ description                               │ uuid                                 │
├──────────────┼──────────────────────────┼───────────┼───────────────────────────────────────────┼──────────────────────────────────────┤
│ S_cerevisiae │ Saccharomyces cerevisiae │ primary   │ corresponds to genbank id GCA_000146045.2 │ dff337a6-9a1d-3313-8ced-dc6f3bfc9689 │
└──────────────┴──────────────────────────┴───────────┴───────────────────────────────────────────┴──────────────────────────────────────┘
```
In this excerpt from `bowtie2.yaml`, note that the primary reference's UUID is given in the `src:` entry of the `bowtie2_index` component (nested under `levels` and `indices`).

```yaml
S_cerevisiae:
  levels:
    indices:
    - component: bowtie2_index
      complete:
        status: false
      src: dff337a6-9a1d-3313-8ced-dc6f3bfc9689
```
This records which primary reference was used to create the index.

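Because `src:` holds a UUID rather than a path, it can help to look the UUID up in `master.yaml`. The sketch below does this with `awk`: `component_for_uuid` is a hypothetical helper (not a RefChef command) that prints the component whose `uuid:` matches. It assumes the layout used in these docs, where each `- component:` line comes before its `uuid:` line, and it scans text rather than parsing YAML, so treat it as a convenience, not a validator.

```bash
# Hypothetical helper: print the component in master.yaml ($1) whose uuid matches $2.
# Text scan only; assumes each "- component:" line precedes its "uuid:" line.
component_for_uuid() {
  awk -v id="$2" '
    $1 == "-" && $2 == "component:" { comp = $3 }   # remember the current component
    $1 == "uuid:" && $2 == id       { print comp; found = 1 }
    END { if (!found) exit 1 }                        # non-zero status if no match
  ' "$1"
}

# Usage: which component does the bowtie2 index src point to?
#   component_for_uuid /Users/jwalla12/remote_references/master.yaml dff337a6-9a1d-3313-8ced-dc6f3bfc9689
```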
---

### **master.yaml** <a name="master.yaml"></a>

**overview**
RefChef uses YAML files that are composed of nested entry and value pairs, for example the entry and value pair `common_name`: `yeast`. The spacing and indentation of the entries and values are meaningful: RefChef uses the convention of indenting each nested level by 2 spaces, and each entry is separated from its value by a `:` and a space. Some entries are preceded by a `-` and a space (such as `- component:` and the commands under the `commands` header); these mark items in a YAML list and are required for RefChef to process the YAML properly.

See the [`master.yaml` file specifications](./specs.md#master.yaml) for more information.

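Because indentation mistakes are easy to make by hand, a quick check before running `refchef-cook` can save a failed run. The `awk` sketch below (a hypothetical `check_indent` helper, not part of RefChef) flags lines whose leading whitespace contains a tab or whose indentation is not a multiple of 2 spaces. It only checks the 2-space convention described above and does not replace a real YAML parser.

```bash
# Hypothetical helper: report lines that break the 2-space indentation convention.
# Exits non-zero if any line is flagged.
check_indent() {
  awk '
    /^ *$/  { next }                            # skip blank lines
    /^ *\t/ { printf "%s:%d: tab in indentation\n", FILENAME, FNR; bad = 1; next }
    {
      rest = $0
      sub(/^ */, "", rest)                      # strip leading spaces
      n = length($0) - length(rest)             # number of leading spaces
      if (n % 2 != 0) {
        printf "%s:%d: %d leading spaces\n", FILENAME, FNR, n
        bad = 1
      }
    }
    END { if (bad) exit 1 }
  ' "$@"
}

# Usage: check_indent master.yaml bowtie2.yaml && echo "indentation looks consistent"
```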
Example `master.yaml` before processing:
```yaml
S_cerevisiae:
  metadata:
    name: S_cerevisiae
    common_name: yeast
    ncbi_taxon_id: 4932
    organism: Saccharomyces cerevisiae
    organization: ensembl
    custom: no
    description: corresponds to genbank id GCA_000146045.2
    downloader: joselynn wallace
    ensembl_release_number: 87
    accession:
      genbank:
      refseq:
  levels:
    references:
    - component: primary
      complete:
        status: false
      commands:
      - wget ftp://ftp.ensembl.org/pub/release-87/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz
      - wget ftp://ftp.ensembl.org/pub/release-87/fasta/saccharomyces_cerevisiae/dna/CHECKSUMS
      - md5 *.gz > postdownload-checksums.md5
      - gunzip *.gz
      - md5 *.* > final_checksums.md5
```

The string of text entered as the `key` (`S_cerevisiae` in the example above) is used to create a folder inside the output directory you specify in your config file (`cfg.ini` or `cfg.yaml`) or in the `refchef-cook` arguments. In the quickstart example, we used `/Users/jwalla12/references` as the output directory for `refchef-cook`. Here is the collapsed file tree that RefChef created; note that the folder containing the primary reference is nested inside a folder named `S_cerevisiae`, after the `key`.

```bash
/Users/jwalla12/references #this directory is specified in refchef-cook or the config files
└── S_cerevisiae
    ├── bowtie2_index
    ├── bwa_index
    ├── gtf
    └── primary
```

**master.yaml metadata**
The `metadata` section of `master.yaml` contains information about the references, including the organism name, NCBI taxon ID, and so on.

!!! Caution
    When running a new YAML file to add information to a primary reference, metadata entries already present in the initial [`master.yaml`](#master.yaml) file can be omitted (for example, `ncbi_taxon_id:` and `common_name:`). When adding indices or annotations to a primary reference already in [`master.yaml`](#master.yaml), the metadata in [`master.yaml`](#master.yaml) will be overwritten by the metadata in the new YAML file. This can be helpful when you want to update the metadata fields.

**master.yaml levels**
The `levels` section contains higher-level information about the references, including when they were downloaded and the exact commands used to download and process them.

!!! Caution
    The entry `status` must be set to `false` for RefChef to execute the commands in the code block. If it is set to `true`, the commands will not execute (even if the `-e` flag is set). After a code block is executed, `status` changes from `false` to `true` automatically, and a `time:` entry appears next to `status` under `complete`. The `time:` entry is populated with the date and time stamp of when the reference was downloaded.

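Since `status: false` is what marks a block as still to be run, it can be handy to list the pending components before calling `refchef-cook -e`. The `awk` sketch below is a hypothetical helper (`pending_components`, not a RefChef command); as a plain text scan, it assumes each `status:` line belongs to the most recent `- component:` line above it, as in the examples on this page.

```bash
# Hypothetical helper: list components in master.yaml ($1) whose complete.status is false.
# Text scan only; assumes "status:" follows its "- component:" line.
pending_components() {
  awk '
    $1 == "-" && $2 == "component:" { comp = $3 }
    $1 == "status:" && $2 == "false" { print comp }
  ' "$1"
}

# Usage: pending_components /Users/jwalla12/remote_references/master.yaml
```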
**master.yaml commands**
This portion of `master.yaml` should be populated with the specific commands you want to execute to download and process your reference. Each command should be prefixed with a `-` and a space.

!!! Caution
    Each time files are processed using a set of commands in the YAML, the last command must run `md5` on all of the files and direct the output to a file called `final_checksums.md5`.

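The example recipes call `md5`, the checksum tool that ships with macOS; on most Linux systems the equivalent is `md5sum` from GNU coreutils, and the two print their results in different formats. If you check files by hand, a small wrapper like the one below (a hypothetical `md5_all`, not part of RefChef) picks whichever tool is installed. Whether RefChef accepts `md5sum` in place of `md5` inside a recipe is not stated here, so treat that substitution as an assumption to test.

```bash
# Hypothetical wrapper: checksum the given files with md5sum (GNU/Linux)
# if it is available, otherwise with md5 (macOS/BSD).
md5_all() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$@"
  else
    md5 "$@"
  fi
}

# Usage, mirroring the last recipe command:
#   md5_all *.* > final_checksums.md5
```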
---

### **cfg.yaml** <a name="cfg.yaml"></a>
**overview**
RefChef requires configuration information, which can be passed as arguments or specified in a configuration file. A `cfg.yaml` file is one option for configuration.

See the [`cfg.yaml` file specifications](./specs.md#cfg.yaml) for the fields it should contain and their expected format.

**example:**
```yaml
config-yaml:
  path-settings:
    reference-directory: /Users/jwalla12/references
    git-directory: /Users/jwalla12/remote_references
    remote-repository: jrwallace/remote_references
  log-settings:
    log: 'yes'
```
---
### **cfg.ini** <a name="cfg.ini"></a>
**overview**
RefChef requires configuration information, which can be passed as arguments or specified in a configuration file. A `cfg.ini` file is one option for configuration.

See the [`cfg.ini` file specifications](./specs.md#cfg.ini) for the fields it should contain and their expected format.

**example:**

```ini
[path-settings]
reference-directory=/Users/jwalla12/references
git-directory=/Users/jwalla12/remote_references
remote-repository=jrwallace/remote_references
[log-settings]
log=yes
[runtime-settings]
break-on-error=yes
verbose=yes
```

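Each setting in `cfg.ini` is a `key=value` line under a `[section]` header. To check what a given file contains, the sketch below reads one value with `awk`. `ini_get` is a hypothetical helper for illustration, not how RefChef itself parses the file, and it assumes the simple layout of the example (no spaces around `=`, no quoted values, no trailing comments).

```bash
# Hypothetical helper: print the value of key $3 in section [$2] of INI file $1.
# Assumes plain key=value lines with no spaces around "=".
ini_get() {
  awk -F= -v sec="$2" -v key="$3" '
    /^\[.*\]$/ { in_sec = ($0 == ("[" sec "]")); next }   # track the current section
    in_sec && $1 == key { sub(/^[^=]*=/, ""); print; exit }
  ' "$1"
}

# Usage: ini_get cfg.ini path-settings reference-directory
```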
**RefChef comes with two commands:**

[**`refchef-cook`**](./usage.md#refchef-cook):
Reads recipes and executes the commands that retrieve the references, indices, or annotations described in [`master.yaml`](./inputs.md#master.yaml).

[**`refchef-menu`**](./usage.md#refchef-menu):
Lists all references present in the system, based on [`master.yaml`](./inputs.md#master.yaml), and lets the user filter the list by metadata options.

data:image/s3,"s3://crabby-images/439b7/439b7eedcef7ba7aa19a698090b4f2ecf2101d44" alt="Diagram" | ||
|
||
**RefChef requires a [`master.yaml`](./inputs.md#master.yaml) file:** | ||
|
||
In addition to the [`refchef-cook`](./usage.md#refchef-cook) and [`refchef-menu`](./usage.md#refchef-menu) commands, RefChef requires a [`master.yaml`](./inputs.md#master.yaml) containing a list of references, indices, annotations, and metadata, as well as the commands necessary to download and process the files. | ||
When [`refchef-cook`](./usage.md#refchef-cook) is executed, RefChef will append the [`master.yaml`](./inputs.md#master.yaml) to change the `complete` option from `false` to `true`and will also add a `uuid` for each reference, the date the files were downloaded and their location, as well as a complete list of files downloaded. | ||
Based on the arguments you pass to [`refchef-cook`](./usage.md#refchef-cook), it will either commit those changes to [`master.yaml`](./inputs.md#master.yaml) to a local repository or commit and push the changes to a remote repository. | ||
|
||
**RefChef requires configuration information:**

[`refchef-cook`](./usage.md#refchef-cook) and [`refchef-menu`](./usage.md#refchef-menu) both require some configuration information, including:

1. Where you'd like the references to be saved
2. The local git repository for version control of references
3. The remote GitHub repository for version control of reference sequences (optional).

This information can be specified in a [`cfg.yaml`](./inputs.md#cfg.yaml) file or a [`cfg.ini`](./inputs.md#cfg.ini) file, or passed as arguments to [`refchef-cook`](./usage.md#refchef-cook).

This quickstart assumes that [bwa](http://bio-bwa.sourceforge.net/) and [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) are installed and on your `PATH`.

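One way to confirm this before starting is to check that each tool the recipes call can be found on your `PATH`. The function below is a hypothetical helper (`require_tools`) built on the POSIX `command -v` builtin; the tool names passed to it are the ones this quickstart's recipes run (`bwa`, `bowtie2-build`, `wget`).

```bash
# Hypothetical helper: report any command that is not on PATH.
# Returns non-zero if at least one is missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_tools bwa bowtie2-build wget || echo "Install the missing tools before continuing." >&2
```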
Create a [remote repository](https://help.github.com/en/articles/creating-a-new-repository) and [clone it](https://help.github.com/en/articles/cloning-a-repository).

Create a directory for refchef to save your references.

Create a [`master.yaml`](./inputs.md#master.yaml) file and save it in your local git repository directory. Here is a [`master.yaml`](./inputs.md#master.yaml) file that will download a yeast genome from Ensembl:

```yaml
S_cerevisiae:
  metadata:
    name: S_cerevisiae
    common_name: yeast
    ncbi_taxon_id: 4932
    organism: Saccharomyces cerevisiae
    organization: ensembl
    custom: no
    description: corresponds to genbank id GCA_000146045.2
    downloader: joselynn wallace
    ensembl_release_number: 87
    accession:
      genbank:
      refseq:
  levels:
    references:
    - component: primary
      complete:
        status: false
      commands:
      - wget ftp://ftp.ensembl.org/pub/release-87/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz
      - wget ftp://ftp.ensembl.org/pub/release-87/fasta/saccharomyces_cerevisiae/dna/CHECKSUMS
      - md5 *.gz > postdownload-checksums.md5
      - gunzip *.gz
      - md5 *.* > final_checksums.md5
```
Pass the configuration arguments in a config file or directly to [`refchef-cook`](./usage.md#refchef-cook) (as seen in the following example):

```bash
refchef-cook -e -o /Users/jwalla12/references -gl /Users/jwalla12/remote_references -gr jrwallace/remote_references --git commit -l
```
After [`refchef-cook`](./usage.md#refchef-cook) is run, [`master.yaml`](./inputs.md#master.yaml) will reflect that you have downloaded the reference and it will now look like this:
```yaml
S_cerevisiae:
  metadata:
    name: S_cerevisiae
    common_name: yeast
    ncbi_taxon_id: 4932
    organism: Saccharomyces cerevisiae
    organization: ensembl
    custom: false
    description: corresponds to genbank id GCA_000146045.2
    downloader: joselynn wallace
    ensembl_release_number: 87
    accession:
      genbank: null
      refseq: null
  levels:
    references:
    - component: primary
      complete:
        status: true
        time: '2019-07-25 09:08:37.478553'
      commands:
      - wget ftp://ftp.ensembl.org/pub/release-87/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz
      - wget ftp://ftp.ensembl.org/pub/release-87/fasta/saccharomyces_cerevisiae/dna/CHECKSUMS
      - md5 *.gz > postdownload-checksums.md5
      - gunzip *.gz
      - md5 *.* > final_checksums.md5
      location: /Users/jwalla12/references/S_cerevisiae/primary
      files:
      - metadata.txt
      - postdownload-checksums.md5
      - CHECKSUMS
      - final_checksums.md5
      - Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa
      uuid: dff337a6-9a1d-3313-8ced-dc6f3bfc9689
```

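The `uuid` of the primary reference is the value needed for the `src:` entry of the index files in the next steps. Rather than copying it by eye, you could pull it out with a text scan like the one below. `uuid_for_component` is a hypothetical helper, not a RefChef command; it assumes the layout shown above (a `uuid:` line following its `- component:` line) and prints only the first match, so it would need narrowing if several keys share a component name.

```bash
# Hypothetical helper: print the uuid recorded for component $2 in master.yaml ($1).
# Text scan only; prints the first match.
uuid_for_component() {
  awk -v want="$2" '
    $1 == "-" && $2 == "component:" { comp = $3 }
    $1 == "uuid:" && comp == want   { print $2; exit }
  ' "$1"
}

# Usage: uuid_for_component /Users/jwalla12/remote_references/master.yaml primary
```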
Make another YAML file to create a bowtie2 index of this genome, and name it `bowtie2.yaml`. Its `src:` entry is the `uuid` of the primary reference shown above.

```yaml
S_cerevisiae:
  levels:
    indices:
    - component: bowtie2_index
      complete:
        status: false
      src: dff337a6-9a1d-3313-8ced-dc6f3bfc9689
      commands:
      - mkdir /Users/jwalla12/references/S_cerevisiae/bowtie2_index
      - cd /Users/jwalla12/references/S_cerevisiae/bowtie2_index
      - ln -s /Users/jwalla12/references/S_cerevisiae/primary/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa ./Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa
      - bowtie2-build Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa S_cerevisiae
      - md5 ./*.* > ./final_checksums.md5
```
Then run [`refchef-cook`](./usage.md#refchef-cook), passing the new YAML file with `-n`, to add it to [`master.yaml`](./inputs.md#master.yaml).

```bash
refchef-cook -e -o /Users/jwalla12/references -gl /Users/jwalla12/remote_references -gr jrwallace/remote_references -n /Users/jwalla12/remote_references/bowtie2.yaml -g commit -l
```
Make another YAML file to create a bwa index of this genome, and name it `bwa.yaml`.
```yaml
S_cerevisiae:
  levels:
    indices:
    - component: bwa_index
      complete:
        status: false
      src: dff337a6-9a1d-3313-8ced-dc6f3bfc9689
      commands:
      - mkdir /Users/jwalla12/references/S_cerevisiae/bwa_index
      - cd /Users/jwalla12/references/S_cerevisiae/bwa_index
      - ln -s /Users/jwalla12/references/S_cerevisiae/primary/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa ./Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa
      - bwa index -p S_cerevisiae Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa
      - md5 ./*.* > ./final_checksums.md5
```

Then run [`refchef-cook`](./usage.md#refchef-cook), passing the new YAML file with `-n`, to add it to [`master.yaml`](./inputs.md#master.yaml).

```bash
refchef-cook -e -o /Users/jwalla12/references -gl /Users/jwalla12/remote_references -gr jrwallace/remote_references -n /Users/jwalla12/remote_references/bwa.yaml -g commit -l
```

We can also track annotation files for the reference genome. Make the following YAML file and call it `gtf.yaml`:

```yaml
S_cerevisiae:
  levels:
    annotations:
    - component: gtf
      complete:
        status: false
      commands:
      - wget ftp://ftp.ensembl.org/pub/release-87/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.87.gtf.gz
      - wget ftp://ftp.ensembl.org/pub/release-87/gtf/saccharomyces_cerevisiae/CHECKSUMS
      - md5 *.gz > postdownload-checksums.md5
      - gunzip *.gz
      - md5 *.* > final_checksums.md5
```
Then run [`refchef-cook`](./usage.md#refchef-cook), passing the new YAML file with `-n`, to add it to [`master.yaml`](./inputs.md#master.yaml).

```bash
refchef-cook -e -o /Users/jwalla12/references -gl /Users/jwalla12/remote_references -gr jrwallace/remote_references -n /Users/jwalla12/remote_references/gtf.yaml -g commit -l
```
We can see what references are available using [`refchef-menu`](./usage.md#refchef-menu):
```bash
refchef-menu -f /Users/jwalla12/remote_references/master.yaml
```
```
┌ 🐶 RefChef Menu ────────────────────────┬───────────────┬───────────────────────────────────────────┬──────────────────────────────────────┐
│ name         │ organism                 │ component     │ description                               │ uuid                                 │
├──────────────┼──────────────────────────┼───────────────┼───────────────────────────────────────────┼──────────────────────────────────────┤
│ S_cerevisiae │ Saccharomyces cerevisiae │ gtf           │ corresponds to genbank id GCA_000146045.2 │ 5f7ae94c-2e51-3cc6-bcbf-6e251c75ef2f │
│ S_cerevisiae │ Saccharomyces cerevisiae │ bowtie2_index │ corresponds to genbank id GCA_000146045.2 │ 93393699-cb40-3ad7-ac07-ae4bdb1efd3e │
│ S_cerevisiae │ Saccharomyces cerevisiae │ bwa_index     │ corresponds to genbank id GCA_000146045.2 │ dff337a6-9a1d-3313-8ced-dc6f3bfc9689 │
│ S_cerevisiae │ Saccharomyces cerevisiae │ primary       │ corresponds to genbank id GCA_000146045.2 │ dff337a6-9a1d-3313-8ced-dc6f3bfc9689 │
└──────────────┴──────────────────────────┴───────────────┴───────────────────────────────────────────┴──────────────────────────────────────┘
```
We can also get this information if we look at [`master.yaml`](./inputs.md#master.yaml):
```yaml
S_cerevisiae:
  metadata:
    name: S_cerevisiae
    common_name: yeast
    ncbi_taxon_id: 4932
    organism: Saccharomyces cerevisiae
    organization: ensembl
    custom: false
    description: corresponds to genbank id GCA_000146045.2
    downloader: joselynn wallace
    ensembl_release_number: 87
    accession:
      genbank: null
      refseq: null
  levels:
    references:
    - component: primary
      complete:
        status: true
        time: '2019-07-25 16:26:42.700668'
      commands:
      - wget ftp://ftp.ensembl.org/pub/release-87/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz
      - wget ftp://ftp.ensembl.org/pub/release-87/fasta/saccharomyces_cerevisiae/dna/CHECKSUMS
      - md5 *.gz > postdownload-checksums.md5
      - gunzip *.gz
      - md5 *.* > final_checksums.md5
      location: /Users/jwalla12/references/S_cerevisiae/primary
      files:
      - metadata.txt
      - postdownload-checksums.md5
      - CHECKSUMS
      - final_checksums.md5
      - Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa
      uuid: dff337a6-9a1d-3313-8ced-dc6f3bfc9689
    indices:
    - component: bowtie2_index
      complete:
        status: true
        time: '2019-07-25 16:26:43.971349'
      src: dff337a6-9a1d-3313-8ced-dc6f3bfc9689
      commands:
      - mkdir /Users/jwalla12/references/yeast_refs/bowtie2_index
      - cd /Users/jwalla12/references/yeast_refs/bowtie2_index
      - ln -s /Users/jwalla12/references/yeast_refs/primary/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa
        /Users/jwalla12/references/yeast_refs/bowtie2_index/
      - bowtie2-build /Users/jwalla12/references/yeast_refs/bowtie2_index/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa
        S_cerevisiae
      - md5 /Users/jwalla12/references/yeast_refs/bowtie2_index/*.* > /Users/jwalla12/references/yeast_refs/bowtie2_index/final_checksums.md5
      location: /Users/jwalla12/references/S_cerevisiae/bowtie2_index
      files:
      - metadata.txt
      uuid: 84928c3e-af1a-11e9-a45e-8c8590bd206d
    - component: bwa_index
      complete:
        status: true
        time: '2019-07-25 16:26:45.183284'
      src: dff337a6-9a1d-3313-8ced-dc6f3bfc9689
      commands:
      - mkdir /Users/jwalla12/references/yeast_refs/bwa_index
      - cd /Users/jwalla12/references/yeast_refs/bwa_index
      - ln -s /Users/jwalla12/references/yeast_refs/primary/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa
        /Users/jwalla12/references/yeast_refs/bwa_index/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa
      - bwa index /Users/jwalla12/references/yeast_refs/bwa_index/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa
        > /Users/jwalla12/references/yeast_refs/bwa_index/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa
      - md5 /Users/jwalla12/references/yeast_refs/bwa_index/*.* > /Users/jwalla12/references/yeast_refs/bwa_index/final_checksums.md5
      location: /Users/jwalla12/references/S_cerevisiae/bwa_index
      files:
      - metadata.txt
      uuid: 854b7780-af1a-11e9-a9f8-8c8590bd206d
    annotations:
    - component: gtf
      complete:
        status: true
        time: '2019-07-25 16:26:54.326082'
      commands:
      - wget ftp://ftp.ensembl.org/pub/release-87/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.87.gtf.gz
      - wget ftp://ftp.ensembl.org/pub/release-87/gtf/saccharomyces_cerevisiae/CHECKSUMS
      - md5 *.gz > postdownload-checksums.md5
      - gunzip *.gz
      - md5 *.* > final_checksums.md5
      location: /Users/jwalla12/references/S_cerevisiae/gtf
      files:
      - metadata.txt
      - postdownload-checksums.md5
      - Saccharomyces_cerevisiae.R64-1-1.87.gtf
      - CHECKSUMS
      - final_checksums.md5
      uuid: 5f7ae94c-2e51-3cc6-bcbf-6e251c75ef2f
```

# Specifications for `master.yaml`
### `master.yaml` <a name="master.yaml"></a>

The [`master.yaml`](./inputs.md#master.yaml) file is the main source of information that RefChef uses to retrieve references, indices, and annotations. It is composed of a sequence of blocks, one for each reference. Each block in [`master.yaml`](./inputs.md#master.yaml) starts with a `key`, followed by `metadata` and `levels`.

See the [`master.yaml` overview and usage](./inputs.md#master.yaml) for more information.

---
```yaml
reference_test1:
  metadata:
    name: reference_test1
    species: mouse
    organization: ucsc
    downloader: fgelin
  levels:
    references:
      - component: primary
        complete:
          status: false
        commands:
          - wget -nv https://s3.us-east-2.amazonaws.com/refchef-tests/chr1.fa.gz
          - md5 *.fa.gz > postdownload_checksums.md5
          - gunzip *.gz
          - md5 *.fa > final_checksums.md5
```

The `key` section consists of:

`<reference_name>:`
Expected format: String where <reference_name\> is the name of the reference.

---

The `metadata` section consists of:

>`metadata.name`
>Expected format: <reference_name\> string, should be the same as the block's `key`
>`metadata.common_name`
>Expected format: string
>`metadata.ncbi_taxon_id`
>Expected format: integer, based on [NCBI taxon ID](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi)
>`metadata.organism`
>Expected format: string
>`metadata.organization`
>Expected format: string
>`metadata.custom`
>Expected format: string
>`metadata.description`
>Expected format: string
>`metadata.downloader`
>Expected format: string
>`metadata.ensembl_release_number`
>Expected format: integer
>>`metadata.accession.genbank`
>>Expected format: string
>>`metadata.accession.refseq`
>>Expected format: string

---

The `levels` section consists of:

>`levels.<type>`
>Where <type\>: `references`, `annotations`, or `indices`
>>`levels.<type>.- component`
>>Expected format: string
>>>`levels.<type>.complete.status`
>>>Expected format: boolean (note that if `complete.status` is set to `true`, RefChef will skip the current block and not retrieve any files. RefChef automatically changes the status to `true` after retrieving files for the first time.)
>>`levels.<type>.src`
Expected format: UUID string of an existing reference. When an index is added for a reference, RefChef creates a symlink in the reference folder that points to the index folder.

>>`levels.<type>.commands`
Expected format: list of commands to download and process each reference; each command should start with `- `.

After [`refchef-cook`](./usage.md#refchef-cook) is run and references are downloaded, `levels.<type>.complete.status: false` will change to `levels.<type>.complete.status: true`, and the following fields will be added to `master.yaml`:

>>>`levels.<type>.complete.time`
>>>Expected format: RefChef will autopopulate this field with the date and time stamp the reference was downloaded if `levels.<type>.complete.status: true`
>>`levels.<type>.location`
Expected format: RefChef will autopopulate this field with the directory where downloaded files are stored if `levels.<type>.complete.status: true`
>>`levels.<type>.files`
Expected format: RefChef will autopopulate this field with a list of files that were downloaded if `levels.<type>.complete.status: true`
>>`levels.<type>.uuid`
Expected format: RefChef will autopopulate this field with a UUID for your reference file if `levels.<type>.complete.status: true`

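Both `src` and the autopopulated `uuid` are UUID strings, so a quick format check before running `refchef-cook` can catch a truncated or mistyped `src`. The sketch below uses `grep -E` to test the standard 8-4-4-4-12 hexadecimal layout. It deliberately ignores the version digit, since the UUIDs in these docs include both version 3 (`dff337a6-9a1d-3313-8ced-dc6f3bfc9689`) and version 1 (`84928c3e-af1a-11e9-a45e-8c8590bd206d`) values. `is_uuid` is a hypothetical helper, and a well-formed UUID is not proof that a matching reference exists.

```bash
# Hypothetical helper: succeed if $1 looks like a UUID (8-4-4-4-12 hex digits).
# The version digit is not checked.
is_uuid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Usage: is_uuid dff337a6-9a1d-3313-8ced-dc6f3bfc9689 && echo "src looks like a UUID"
```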
---

### `cfg.yaml` <a name="cfg.yaml"></a>

A `cfg.yaml` file should follow these specifications:

>>`config-yaml.path-settings.reference-directory`
Expected format: String, path to reference storage directory

>>`config-yaml.path-settings.git-directory`
Expected format: String, path to local git repository

>>`config-yaml.path-settings.remote-repository`
Expected format: String, remote git repository, should be in the format of `user/repo`

>>`config-yaml.log-settings.log`
Expected format: String, should be either 'yes' or 'no' in single quotes, indicating whether or not log files will be made

Also see the [`cfg.yaml` overview and example.](./inputs.md#cfg.yaml)

---
### `cfg.ini` <a name="cfg.ini"></a>

If you use a `cfg.ini` file, it should follow these specs:

`[path-settings].reference-directory=`
Expected format: String, path to the reference storage directory

`[path-settings].git-directory=`
Expected format: String, path to the local git repository

`[path-settings].remote-repository=`
Expected format: String, remote git repository in the format `user/repo`

`[log-settings].log=`
Expected format: String, either 'yes' or 'no', indicating whether log files will be made

`[runtime-settings].break-on-error=`
Expected format: String, either 'yes' or 'no', indicating how RefChef should respond when it encounters an error

`[runtime-settings].verbose=`
Expected format: String, either 'yes' or 'no', turning verbose output on or off

Also see the [`cfg.ini` overview and example.](./usage.md#cfg.ini)
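
The `cfg.ini` rules can be checked in the same way. This sketch reads the file with Python's standard `configparser` module and reports missing options, non-`yes`/`no` switches, and a malformed `user/repo` value. The function `check_cfg_ini` and the `SAMPLE_CFG_INI` contents are hypothetical; that RefChef parses `cfg.ini` with `configparser`, and whether values are quoted, are assumptions, so the sketch tolerates optional quotes.

```python
import configparser
import re
from typing import List

# Assumed shape of "user/repo": two non-empty parts without slashes or whitespace.
REPO_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")

# (section, option) pairs named in the cfg.ini spec.
REQUIRED = [
    ("path-settings", "reference-directory"),
    ("path-settings", "git-directory"),
    ("path-settings", "remote-repository"),
    ("log-settings", "log"),
    ("runtime-settings", "break-on-error"),
    ("runtime-settings", "verbose"),
]
YES_NO_OPTIONS = {
    ("log-settings", "log"),
    ("runtime-settings", "break-on-error"),
    ("runtime-settings", "verbose"),
}

# Hypothetical example file; the paths and repository are placeholders.
SAMPLE_CFG_INI = """\
[path-settings]
reference-directory=/data/references
git-directory=/data/references/git
remote-repository=user/repo

[log-settings]
log=yes

[runtime-settings]
break-on-error=no
verbose=yes
"""


def _unquote(value: str) -> str:
    # Strip whitespace and optional quotes, since the spec does not say whether quotes are used.
    return value.strip().strip("'\"")


def check_cfg_ini(text: str) -> List[str]:
    """Return a list of problems; an empty list means the text matches the spec."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems: List[str] = []
    for section, option in REQUIRED:
        if not parser.has_option(section, option):
            problems.append(f"missing [{section}] {option}")
            continue
        value = _unquote(parser.get(section, option))
        if (section, option) in YES_NO_OPTIONS and value not in ("yes", "no"):
            problems.append(f"[{section}] {option} must be 'yes' or 'no'")
    remote = parser.get("path-settings", "remote-repository", fallback=None)
    if remote is not None and not REPO_PATTERN.match(_unquote(remote)):
        problems.append("[path-settings] remote-repository must look like user/repo")
    return problems


if __name__ == "__main__":
    print(check_cfg_ini(SAMPLE_CFG_INI))  # []
```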
### **Download reference, local repository `master.yaml` version control:**
data:image/s3,"s3://crabby-images/88473/88473feedcb92d14e8e3a65a2390cfb206c82d28" alt="Diagram"

### **Download reference, remote repository `master.yaml` version control:**
data:image/s3,"s3://crabby-images/befac/befac50ddf1f797614704c03573615635b6eb27b" alt="Diagram"

### **Download new reference, local repository `master.yaml` version control:**
data:image/s3,"s3://crabby-images/d5313/d5313591180cae77156b772cd1a42d120322cf9e" alt="Diagram"

### **Add manually downloaded reference, append commands to `master.yaml`, do not execute commands, local repository `master.yaml` version control:**
data:image/s3,"s3://crabby-images/3c80b/3c80b04d8b30b9f973fe48c7073ff9f844d64640" alt="Diagram"

### **`refchef-menu` to view references available on the system:**
data:image/s3,"s3://crabby-images/ec277/ec277cc324b6627255d042a151b4bb5bbb1c6ca4" alt="Diagram"