Showing 1 changed file with 13 additions and 34 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,56 +1,35 @@ | ||
# Assembly pipeline for Mytilus genomes | ||
|
||
Assembly pipeline from 10x chromium reads from the preprint | ||
"Three new genome assemblies of blue mussel lineages: North and South European Mytilus edulis and Mediterranean Mytilus galloprovincialis" bioRxiv ([https://doi.org/10.1101/2022.09.02.506387](https://doi.org/10.1101/2022.09.02.506387)). | ||
|
||
[`snakemake`](https://snakemake.readthedocs.io/en/stable/) (in a conda environment for example) and | ||
[`singularity`](https://github.com/hpcng/singularity) need to be installed. | ||
|
||
## Supernova storage workarounds | ||
|
||
Supernova uses a large amount of storage for temporary and final results. | ||
|
||
The supernova results are stored on a remote NAS that must first be mounted on my system. | ||
``` | ||
sshfs nas4:/share/sea/sea/projects/ref_genomes/assembly_10x/results/supernova_assemblies \ | ||
results/supernova_assemblies \ | ||
-o idmap=user,compression=no,uid=1000,gid=1000,allow_root | ||
``` | ||
|
||
I also use a 4T disk as temporary local storage for supernova computation | ||
I also used a 4T disk as temporary local storage for supernova computation | ||
`sudo mount /dev/sd[x]1 /data/ref_genomes/assembly_10x/tmp` | ||
|
||
To run use: | ||
``` | ||
conda activate snake_env | ||
snakemake --use-conda --conda-frontend mamba --conda-prefix .conda \ | ||
--use-singularity --singularity-args "-B /nas_sea:/nas_sea" \ | ||
-j {threads} | ||
``` | ||
|
||
Final versions are *_v6.pseudohap.fasta.gz and they correspond to: | ||
- mgal_01 | ||
- medu_01 | ||
- mtro_01 | ||
|
||
Another version of mtro, tros_v7 (also called mtro_02), has been produced; it is improved by LRScaf scaffolding with nanopore reads, scaffolding on the *Mytilus coruscus* reference genome, and Pilon corrections. | ||
## How to run | ||
|
||
To run use: | ||
``` | ||
conda activate snake_env | ||
snakemake --use-conda --conda-frontend mamba --conda-prefix .conda \ | ||
snakemake --use-conda \ | ||
--use-singularity --singularity-args "-B /nas_sea:/nas_sea" \ | ||
-j {threads} mtro_improvement | ||
``` | ||
|
||
## Calling for pop check | ||
|
||
This part uses another dataset of reference individuals called with angsd. | ||
For comparison we also call with angsd (in particular, ANGSD sets the major allele as REF in the BCF, which makes its output incompatible with bcftools call). | ||
``` | ||
ln -s /data2/myt_popgen/angsd_calling/results/post_analysis/subset.sites resources/angsd_subset.sites | ||
ln -s /data2/myt_popgen/angsd_calling/results/post_analysis/subset.beagle.gz resources/angsd_ref_subset.beagle.gz | ||
-j {threads} \ | ||
[either all_v6, asm_improvement, stats, repeats, annotation, finalize or ncbi_submission (see workflow/Snakefile)] | ||
``` | ||
|
||
## Annotation tools to build beforehand | ||
|
||
``` | ||
sudo singularity build -F resources/cactus_v1.3.0-gpu.sif \ | ||
docker://quay.io/comparative-genomics-toolkit/cactus:v1.3.0-gpu | ||
sudo singularity build resources/cat.sif docker://quay.io/ucsc_cgl/cat | ||
``` |
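
The README states that `snakemake` and `singularity` need to be installed before running the pipeline. A minimal sketch of a pre-flight check for this is given below; `missing_tools` is a hypothetical helper name, not part of the repository, and it only tests whether each command can be found on `PATH`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the name of
# every command in the argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example use before launching the workflow (commented out so that
# sourcing this file has no side effects):
#   missing="$(missing_tools snakemake singularity)"
#   [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; exit 1; }
```

The function only reports missing commands and leaves the decision to abort to the caller, so it can also be reused for other tools the workflow may depend on.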