Subsetting databases #39
Comments
I noticed that this seems to work with afdb_rep_v4. Perhaps something is missing from the reference genomes?
I'm sorry, there was a bug in assigning the mode for database reading. Thank you for reporting this; please check whether it is solved in the latest version.
Hi,
Latest version of Foldcomp. Subsetting 'a_thaliana' should work with foldcomp at the latest commit.
Ok, great. Does this include the binaries you distribute or only the pip installation/git clone?
Please use git clone to get the latest update. The Python distribution has not been updated with the latest commit. As for the mmseqs2 part, I'm not sure what happened; I'll check this with the mmseqs2 developers.
Ok, thanks for the help!
Hi,
Thank you for the great resource!
I am having trouble subsetting databases and decompressing subsets of the databases you provide here: https://foldcomp.steineggerlab.workers.dev
According to the instructions, I should be able to decompress a subset of a database given an "id_list.txt".
This is how I do it for e.g. A. thaliana:
head -n 1 data/a_thaliana.lookup
0 AF-A0A178UFC4-F1-model_v4.pdb 0
As I understand it, the ID here is "AF-A0A178UFC4-F1-model_v4".
Now, I write this into a file called id_list.txt, then I run the command:
foldcomp decompress --id-list id_list.txt data/a_thaliana
with the response:
Decompressing files in data/a_thaliana using 1 threads
Output directory: data/a_thaliana_pdb/
[Warning] AF-A0A178UFC4-F1-model_v4 not found in database.
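The manual step above (reading a name from the lookup file and writing it to id_list.txt) can be sketched in Python. This is only an illustration under stated assumptions: the lookup file is taken to be whitespace-separated with the entry name in the second column, as in the `head` output above, and the helper names `read_lookup_names` and `write_id_list` are hypothetical, not part of Foldcomp. Whether the tool matches names with or without the ".pdb" suffix is not settled in this thread, so the suffix handling is left as an option.

```python
def read_lookup_names(lookup_path):
    """Return the second column (entry name) of every row in a lookup file.

    Assumes whitespace-separated rows such as:
    0  AF-A0A178UFC4-F1-model_v4.pdb  0
    """
    names = []
    with open(lookup_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed rows
                names.append(fields[1])
    return names


def write_id_list(names, out_path, strip_suffix=""):
    """Write one name per line, optionally removing a trailing suffix.

    strip_suffix="" keeps names exactly as they appear in the lookup file;
    strip_suffix=".pdb" reproduces the form used in this issue.
    """
    with open(out_path, "w", encoding="utf-8") as handle:
        for name in names:
            if strip_suffix and name.endswith(strip_suffix):
                name = name[: -len(strip_suffix)]
            handle.write(name + "\n")
```

For example, `write_id_list(read_lookup_names("data/a_thaliana.lookup")[:1], "id_list.txt")` writes the first name verbatim, and passing `strip_suffix=".pdb"` writes it without the extension, so both forms can be tried against the same database.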
I have tried many different ways of naming the IDs based on what is in a_thaliana.lookup, but nothing seems to work. The same happens when using mmseqs to subset the database:
"""
createsubdb --subdb-mode 0 --id-mode 1 id_list.txt a_thaliana test_sel/output_foldcomp_db
MMseqs Version: ad6dfc66d7bbc4fd626fc19adf10ba587bc137c4
Subdb mode 0
Database ID mode 1
Verbosity 3
Could not find name AF-A0A178UFC4-F1-model_v4 in lookup
Time for merging to output_foldcomp_db: 0h 0m 0s 1ms
Time for processing: 0h 0m 0s 34ms
"""
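One thing that can be ruled out locally is a plain string mismatch between id_list.txt and the name column of the lookup file. The sketch below assumes, without confirmation from this thread, that names are compared exactly against the second lookup column; the function name `missing_from_lookup` is hypothetical and not part of Foldcomp or MMseqs2.

```python
def missing_from_lookup(id_list_path, lookup_path):
    """Return the IDs from id_list_path that do not appear verbatim
    in the second column of the lookup file."""
    with open(lookup_path, encoding="utf-8") as handle:
        rows = (line.split() for line in handle)
        known = {fields[1] for fields in rows if len(fields) >= 2}

    with open(id_list_path, encoding="utf-8") as handle:
        wanted = [line.strip() for line in handle if line.strip()]

    # Preserve the order of the ID list so the report is easy to read.
    return [name for name in wanted if name not in known]
```

With the lookup row shown earlier, an ID list containing "AF-A0A178UFC4-F1-model_v4" would be reported as missing under this exact-match assumption, while "AF-A0A178UFC4-F1-model_v4.pdb" would not. If both forms fail in the tool itself, the cause lies elsewhere than the naming.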
Can you please explain what I am doing wrong and how to properly specify the IDs?
Best,
Patrick