alphapulldown 2.0.1 - create_individual_features.py with pre-computed mmseqs MSAs #486

Open
wresch opened this issue Jan 31, 2025 · 3 comments

Comments

@wresch

wresch commented Jan 31, 2025

When I generate MSAs locally with ColabFold, I end up with a folder like

$ ls -lh pulldown_cf_msas
-rw-r--r-- 1 user group 1.9M Jan 31 17:06 O76094.a3m
-rw-r--r-- 1 user group 9.5M Jan 31 17:06 P08240.a3m
-rw-r--r-- 1 user group 935K Jan 31 17:06 P09132.a3m
-rw-r--r-- 1 user group 2.0M Jan 31 17:06 P22090.a3m

after renaming the sequences.

Then I try to use the existing MSAs with

create_individual_features.py \
  --fasta_paths=bait.fasta,candidates.fasta \
  --output_dir=pulldown_cf_msas \
  --use_precomputed_msas=True \
  --use_mmseqs2 \
  --max_template_date=2023-01-01 \
  --skip_existing=False

Unlike older versions (at least 0.30.7 and 1.0.4), this now hits the ColabFold API and regenerates the MSA. At least, it sends a request and fails on systems without direct internet access.

It looks like use_precomputed_msas is not passed to the feature generation function when MMseqs2 is used. If I don't specify --use_mmseqs2, it regenerates the MSA with the AlphaFold pipeline because it expects a subdirectory containing the MSAs generated by that pipeline instead of the single .a3m file.
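For reference, the behavior I would expect is roughly the sketch below: reuse the local .a3m when the flag is set and the file exists, and only contact the server otherwise. This is not AlphaPulldown's actual code. `load_or_compute_msa` and `fetch_remote` are hypothetical names used only for illustration, and the `<output_dir>/<name>.a3m` layout is assumed from the folder listing above.

```python
from pathlib import Path
from typing import Callable, Tuple


def load_or_compute_msa(
    name: str,
    sequence: str,
    output_dir: str,
    use_precomputed_msas: bool,
    fetch_remote: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (a3m_text, source), where source is 'precomputed' or 'remote'.

    Hypothetical helper: fetch_remote stands in for whatever call contacts
    the MSA server. It is only invoked when reuse is disabled or when no
    local <name>.a3m file exists.
    """
    a3m_path = Path(output_dir) / f"{name}.a3m"
    if use_precomputed_msas and a3m_path.is_file():
        # Reuse the local alignment; no network access is needed.
        return a3m_path.read_text(), "precomputed"
    # Fall back to the remote search.
    return fetch_remote(sequence), "remote"
```

With this shape, an offline system only fails when a file is genuinely missing, not for every sequence.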

Some of this is meant for large-ish workflows, so we want to avoid hitting the API too much; we do eventually get blocked.

The documentation suggests this should still be possible (https://github.com/KosinskiLab/AlphaPulldown?tab=readme-ov-file#run-mmseqs2-locally). Am I missing something?

@christophista

Hello
Thank you for opening the issue. I have the same problem.
The pipeline for running

  1. colabfold_search,
  2. rename and then
  3. create_individual_features.py

also accesses the MSA server and performs the alignment there again.
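To rule out a naming mismatch after the rename step, it may help to confirm that every sequence has a matching .a3m file before running step 3. A minimal sketch, assuming the expected file name is the first whitespace-delimited token of the FASTA header plus `.a3m`. That naming rule is an assumption, and `fasta_ids` and `missing_a3m` are hypothetical helpers, not part of AlphaPulldown:

```python
from pathlib import Path
from typing import Iterable, List


def fasta_ids(fasta_path: str) -> List[str]:
    """Collect the first token of each '>' header line in a FASTA file."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    ids.append(header.split()[0])
    return ids


def missing_a3m(fasta_paths: Iterable[str], msa_dir: str) -> List[str]:
    """Return sequence IDs that have no <id>.a3m file in msa_dir."""
    msa_root = Path(msa_dir)
    return [
        seq_id
        for path in fasta_paths
        for seq_id in fasta_ids(path)
        if not (msa_root / f"{seq_id}.a3m").is_file()
    ]
```

For example, `missing_a3m(["bait.fasta", "candidates.fasta"], "pulldown_cf_msas")` returning an empty list would mean every sequence has a local alignment, so any remaining server request is not caused by missing files.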

Thank you for your help in advance!

@sonaida

sonaida commented Feb 5, 2025

Hello,
I have the same problem trying to run MMseqs2 locally.
Here are two log files from create_individual_features.py, using the same code and the same alignment file.

Before: (AlphaPulldown/1.0.4) - works fine

I0205 18:09:59.794106 140456668571456 templates.py:857] Using precomputed obsolete pdbs /opt/gensoft/data/AlphaPulldown/1.0.4/pdb_mmcif/obsolete.dat.
I0205 18:09:59.886897 140456668571456 create_individual_features.py:191] running mmseq now
I0205 18:09:59.887056 140456668571456 objects.py:189] You chose to calculate MSA with mmseq2.
Please also cite: Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold: Making protein folding accessible to all. Nature Methods (2022) doi: 10.1038/s41592-022-01488-1
I0205 18:09:59.887385 140456668571456 objects.py:198] looking for possible precomputed a3m at Colab_msas_bait-gp03/BAB65153.1.a3m
I0205 18:09:59.887429 140456668571456 objects.py:200] input is Colab_msas_bait-gp03/BAB65153.1.a3m
I0205 18:09:59.890167 140456668571456 objects.py:203] Finished parsing the precalculated a3m_file
Now will search for template
I0205 18:09:59.898038 140456668571456 objects.py:242] will search for templates in local template database
I0205 18:09:59.899698 140456668571456 hmmbuild.py:121] Launching subprocess ['/opt/conda/envs/AlphaPulldown/bin/hmmbuild', '--amino', '/local/scratch/tmp/tmpi9vuoqyf/output.hmm', '/local/scratch/tmp/tmpi9vuoqyf/query.msa']
I0205 18:09:59.901239 140456668571456 utils.py:36] Started hmmbuild query
I0205 18:09:59.981078 140456668571456 hmmbuild.py:128] hmmbuild stdout:

Now: (AlphaPulldown/2.0.1)

I0205 17:51:08.065152 139704594203264 create_individual_features.py:308] Running MMseqs2 for feature generation...
I0205 17:51:08.067650 139704594203264 objects.py:188] You chose to calculate MSA with mmseq2.
Please also cite: Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold: Making protein folding accessible to all. Nature Methods (2022) doi: 10.1038/s41592-022-01488-1
0%| | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT: 0%| | 0/150 [elapsed: 00:00 remaining: ?]
PENDING: 0%| | 0/150 [elapsed: 00:00 remaining: ?]E0205 17:51:09.542120 139704594203264 colabfold.py:219] Sleeping for 6s. Reason: PENDING
PENDING: 0%| | 0/150 [elapsed: 00:07 remaining: ?]E0205 17:51:16.126149 139704594203264 colabfold.py:219] Sleeping for 9s. Reason: PENDING
RUNNING: 0%| | 0/150 [elapsed: 00:16 remaining: ?]
RUNNING: 6%|▌ | 9/150 [elapsed: 00:16 remaining: 04:22]E0205 17:51:25.705300 139704594203264 colabfold.py:219] Sleeping for 9s. Reason: RUNNING
COMPLETE: 6%|▌ | 9/150 [elapsed: 00:26 remaining: 04:22]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:26 remaining: 00:00]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:27 remaining: 00:00]
I0205 17:51:36.024016 139704594203264 batch.py:799] Sequence 0 found no templates

@Jun2BCR

Jun2BCR commented Feb 6, 2025

Hello,
Thank you so much for bringing up this question. I am experiencing the same issue. It would be highly appreciated if someone could help.
