Improve README
alxsimon committed Sep 16, 2022
1 parent bc3cafa commit de6c471
Showing 1 changed file with 13 additions and 34 deletions.
47 changes: 13 additions & 34 deletions README.md
@@ -1,56 +1,35 @@
# Assembly pipeline for Mytilus genomes

Assembly pipeline from 10x Chromium reads from the preprint
"Three new genome assemblies of blue mussel lineages: North and South European Mytilus edulis and Mediterranean Mytilus galloprovincialis" bioRxiv ([https://doi.org/10.1101/2022.09.02.506387](https://doi.org/10.1101/2022.09.02.506387)).

[`snakemake`](https://snakemake.readthedocs.io/en/stable/) (in a conda environment, for example) and
[`singularity`](https://github.com/hpcng/singularity) need to be installed.
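
Before launching the workflow, it can help to confirm that both tools are on the `PATH`. The following is a minimal sketch, not part of this repository: `check_tools` is a hypothetical helper name, and it only relies on the standard `command -v` shell builtin.

```shell
# check_tools is a hypothetical helper, not part of this repository.
# It prints each missing command to stderr and returns non-zero if any is missing.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Report, without aborting, whether the two required tools are available.
check_tools snakemake singularity || echo "install the tools listed above first" >&2
```

The helper takes any list of command names, so other tools the workflow needs can be checked with the same call.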

## Supernova storage workarounds

Supernova uses a large amount of storage for temporary and final results.

The supernova results are stored on a remote NAS that must first be mounted on my system.
```
sshfs nas4:/share/sea/sea/projects/ref_genomes/assembly_10x/results/supernova_assemblies \
results/supernova_assemblies \
-o idmap=user,compression=no,uid=1000,gid=1000,allow_root
```

I also use a 4T disk as a temporary local storage for supernova computation
I also used a 4T disk as a temporary local storage for supernova computation
`sudo mount /dev/sd[x]1 /data/ref_genomes/assembly_10x/tmp`

To run use:
```
conda activate snake_env
snakemake --use-conda --conda-frontend mamba --conda-prefix .conda \
--use-singularity --singularity-args "-B /nas_sea:/nas_sea" \
-j {threads}
```

Final versions are `*_v6.pseudohap.fasta.gz` and they correspond to:
- mgal_01
- medu_01
- mtro_01

Another version of mtro, tros_v7 (also called mtro_02), was improved with LRScaf using nanopore reads, scaffolding on the *Mytilus coruscus* reference genome, and Pilon corrections.
## How to run

To run use:
```
conda activate snake_env
snakemake --use-conda --conda-frontend mamba --conda-prefix .conda \
snakemake --use-conda \
--use-singularity --singularity-args "-B /nas_sea:/nas_sea" \
-j {threads} mtro_improvement
```

## Calling for pop check

This part uses another dataset of reference individuals called with angsd.
For comparison, we also call with angsd (in particular, ANGSD puts the major allele as REF in the bcf, which makes it incompatible with bcftools call).
```
ln -s /data2/myt_popgen/angsd_calling/results/post_analysis/subset.sites resources/angsd_subset.sites
ln -s /data2/myt_popgen/angsd_calling/results/post_analysis/subset.beagle.gz resources/angsd_ref_subset.beagle.gz
-j {threads} \
[either all_v6, asm_improvement, stats, repeats, annotation, finalize or ncbi_submission (see workflow/Snakefile)]
```

## Annotation tools to build beforehand

```
sudo singularity build -F resources/cactus_v1.3.0-gpu.sif \
docker://quay.io/comparative-genomics-toolkit/cactus:v1.3.0-gpu
sudo singularity build resources/cat.sif docker://quay.io/ucsc_cgl/cat
```
