SyntheticSequenceEvaluation

Pipeline to evaluate synthetic sequences.

Installation & Setup

Conda setup

In the base environment on the Lambda server, run the following commands:

conda create -n mdhpipeline -c rapidsai -c nvidia -c conda-forge  \
    cuml=22.04 python=3.9 cudatoolkit=11.2 \
    jupyterlab pytorch
conda activate mdhpipeline
export IBM_POWERAI_LICENSE_ACCEPT=yes
pip install -U scikit-learn

conda install matplotlib
conda uninstall pytorch
pip uninstall torch
pip install torch --no-cache-dir
pip install plotly==5.8.2
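
Once the environment is built, a quick check can confirm that the key packages resolved. The sketch below is not part of this repository: `check_packages` is a hypothetical helper, and the package list is an assumption that simply mirrors the install commands above. It only tests whether each package can be found, not whether CUDA works.

```python
import importlib.util


def check_packages(names):
    """Return {package_name: bool}, True when the package can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names for the packages installed above
# (assumption: scikit-learn is imported as "sklearn", pytorch as "torch").
EXPECTED = ["cuml", "torch", "sklearn", "matplotlib", "plotly"]

for name, found in check_packages(EXPECTED).items():
    print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Run it inside the activated mdhpipeline environment; any line marked MISSING points at an install step that needs repeating.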

PyPI setup

python3 -m venv env
source env/bin/activate
pip3 install -U pip setuptools wheel
pip3 install -r requirements/dev.txt
pip3 install -r requirements/requirements.txt
pip3 install -e .
pip3 install torch
pip3 install pytorch-lightning

Running AlphaFold

First, log in to Lambda. Then open .bashrc in vim, press i to enter insert mode, and add the two aliases below:

vim ~/.bashrc
i

alias alphafold_container='/software/singularity/bin/singularity exec --nv -B /lambda_stor/ /lambda_stor/data/hsyoo/AlphaFoldImage/alphafold.sif bash'
alias alphafold_env='source /opt/miniconda3/etc/profile.d/conda.sh; conda activate alphafold'

Press ESC, type :wq, and press Enter to save the changes and exit vim.

Then run source ~/.bashrc or log in again so that the aliases take effect. Then run:

/software/singularity/bin/singularity exec --nv -B /lambda_stor/ -B /software /lambda_stor/data/hsyoo/AlphaFoldImage/alphafold.sif bash
alphafold_env
mkdir examplerun
cd examplerun
cp -r /opt/alphafold/* .
vim test_seq.fasta # this is where you paste in the protein sequence whose 3D structure you would like AlphaFold to predict.
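
If you would rather create test_seq.fasta from a script than paste into vim, something like the following could work. This is a minimal sketch, not part of the repository: `write_fasta` is a hypothetical helper, and the 60-column wrap width is an assumed convention rather than an AlphaFold requirement.

```python
from pathlib import Path


def write_fasta(path, header, sequence, width=60):
    """Write a single FASTA record, wrapping the sequence at `width` columns."""
    # Remove whitespace and newlines that often sneak in when pasting a sequence.
    cleaned = "".join(sequence.split()).upper()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    Path(path).write_text(f">{header}\n" + "\n".join(lines) + "\n")
```

For example, `write_fasta("test_seq.fasta", "my_protein", seq)` writes one record, where `seq` is the amino-acid string you would otherwise paste by hand.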

If running AlphaFold with lambda_fold.py

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6 python /homes/lind/gene_transformer/examples/folding/lambda_fold.py -i /homes/lind/examplerun/alphafold_fasta -o /homes/lind/MDH-pipeline/run_alphafold/alphafold_results

Note that -i should point to a directory that contains FASTA files, not to a FASTA file itself, and -o should point to an empty directory.
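
The two path requirements above can be checked before launching a long run. The sketch below is an illustration under stated assumptions, not part of lambda_fold.py: `check_fold_dirs` is a hypothetical helper, and the set of FASTA file extensions is a guess at what the input directory holds.

```python
from pathlib import Path

# Assumed FASTA extensions; adjust to match the files you actually use.
FASTA_SUFFIXES = {".fasta", ".fa", ".faa"}


def check_fold_dirs(input_dir, output_dir):
    """Return a list of problems; an empty list means both paths look usable."""
    problems = []
    in_path, out_path = Path(input_dir), Path(output_dir)

    if not in_path.is_dir():
        problems.append(f"-i must be a directory, got: {in_path}")
    elif not any(p.is_file() and p.suffix.lower() in FASTA_SUFFIXES
                 for p in in_path.iterdir()):
        problems.append(f"no FASTA files found in: {in_path}")

    if not out_path.is_dir():
        problems.append(f"-o directory does not exist: {out_path}")
    elif any(out_path.iterdir()):
        problems.append(f"-o directory is not empty: {out_path}")

    return problems
```

Calling `check_fold_dirs` with the same values you plan to pass to -i and -o returns human-readable messages, so a mistake such as pointing -i at a single file shows up before any GPU time is spent.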

If running AlphaFold without using lambda_fold.py

./run.sh -d /lambda_stor/data/hsyoo/AlphaFoldData -o test_out -f test_seq.fasta \
    -t 2020-05-01 -p casp14 -m model_1,model_2,model_3,model_4,model_5 \
    -a 0

In the last command, -a selects the GPU to use; here it is GPU 0. Run nvidia-smi to check which GPUs are available on this Lambda node.
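
Picking a value for -a can also be scripted. The sketch below is an assumption-laden illustration rather than part of the pipeline: it queries nvidia-smi in CSV mode and treats the GPU with the least memory in use as the best candidate, which is only a heuristic for "available". The helper names are hypothetical, and the parser assumes every field is a plain integer.

```python
import subprocess

# nvidia-smi CSV query: one row per GPU, values in MiB, no header or units.
QUERY = ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text):
    """Parse 'index, used, total' rows into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({"index": index, "used_mib": used, "total_mib": total})
    return gpus


def least_loaded_gpu(gpus):
    """Return the index of the GPU with the least memory in use, or None."""
    return min(gpus, key=lambda g: g["used_mib"])["index"] if gpus else None


def query_gpus():
    """Run nvidia-smi; return [] if it is not installed or exits with an error."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(result.stdout)
```

On the node, `least_loaded_gpu(query_gpus())` gives a candidate index to pass as -a; it returns None when nvidia-smi is unavailable.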

Now, AlphaFold should start to run. This may take an hour.

After AlphaFold finishes running, run the following from your local machine to copy the AlphaFold results into a local directory of your choice.

scp -r username@lambda:/homes/lind/examplerun/ local/directory/path

For example, with your own username and host substituted, this might be scp -r username@lambda:/homes/lind/examplerun/* ~/Documents/mdh_results.

Running generate.py

Example 1: Running `generate_fasta`

python3 generate.py --mode get_fasta --config /homes/lind/MDH-pipeline/mdh_gpt.yaml --load_pt_checkpoint /homes/mzvyagin/gpt2_mdh_example/gpt2_earnest_river_122_mdh.pt --fasta_path /homes/lind/MDH-pipeline/fasta/fasta_test3.fasta

Example 2: Running `fasta_to_embeddings`

python3 generate.py --mode get_embeddings --config /homes/lind/MDH-pipeline/mdh_gpt.yaml --load_pt_checkpoint /homes/mzvyagin/gpt2_mdh_example/gpt2_earnest_river_122_mdh.pt --fasta_path /homes/lind/MDH-pipeline/fasta/fasta_test3.fasta --embeddings_output_path /homes/lind/MDH-pipeline/embeddings/embeddings_test3.npy
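
After the embeddings file is written, it can be worth checking that it lines up with the FASTA file it came from. The sketch below rests on an assumption the README does not state: that the .npy array holds one row per sequence, so its first axis should equal the number of FASTA records. Both helper names are hypothetical.

```python
def count_fasta_records(fasta_path):
    """Count FASTA records by counting header lines (those starting with '>')."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def embeddings_match_fasta(fasta_path, embeddings_path):
    """Return True when the array's first axis equals the FASTA record count.

    Assumption: the embeddings array is laid out with one row per sequence.
    """
    import numpy as np  # imported lazily so the FASTA helper works without NumPy

    embeddings = np.load(embeddings_path)
    return embeddings.shape[0] == count_fasta_records(fasta_path)
```

Pass the same values used for --fasta_path and --embeddings_output_path; a False result suggests either a truncated run or a different array layout than assumed here.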

Running visualize.py

Example 1: Get t-SNE plots

python3 visualize.py --mode tsne --embed_path /homes/mzvyagin/MDH/perlmutter_data/gpt2_generated_embeddings.npy --fasta_path /homes/mzvyagin/MDH/perlmutter_data/globalstep2850.fasta --save_path /homes/lind/MDH-pipeline/visualize/tsne_gen.png --get_subplots True

Example 2: Get UMAP plots

python3 visualize.py --mode umap --embed_path /homes/mzvyagin/MDH/perlmutter_data/gpt2_generated_embeddings.npy --fasta_path /homes/mzvyagin/MDH/perlmutter_data/globalstep2850.fasta --save_path /homes/lind/MDH-pipeline/visualize/umap_gen.png --get_subplots True

Example 3: Get embedding distance vs. alignment score plots

python3 visualize.py --mode align_plot --embed_path /homes/mzvyagin/MDH/perlmutter_data/inference-test-embeddings.npy --fasta_path /homes/mzvyagin/MDH/gene_transformer/gene_transformer/data/full_mdh_fasta/test.fasta --save_path /homes/lind/MDH-pipeline/visualize/embed_dist_vs_align_score_test.png --alignment_type global --num_workers 70

Example 4: Get a histogram of mean/max/min alignment scores between two collections of sequences

python3 visualize.py --mode align_hist_mean_max_min --embed_path /homes/mzvyagin/mdh_gpt2/likely-wood-857/likely-wood-857-1024concat_seqs_fasta_embeddings_mean.npy --fasta_path /homes/mzvyagin/mdh_gpt2/likely-wood-857/likely-wood-857_1024concat_seqs.fasta --embed_path2 /homes/mzvyagin/mdh_gpt2/likely-wood-857/likely-wood-857-test_fasta_embeddings_mean.npy --fasta_path2 /homes/mzvyagin/MDH/gene_transformer/gene_transformer/data/full_mdh_fasta/test.fasta --save_path /homes/lind/MDH-pipeline/visualize/gen_test_hist_mean_max_min.png --alignment_type global --num_workers 70
