Homo-oligomer prediction? #8

Open
heejongkim opened this issue Jul 21, 2023 · 6 comments

Comments

@heejongkim

Hi,
Thanks for releasing a fantastic package to the scientific community.
I just started testing with the example inputs to understand the input requirements and formats.

Here's my primary question:
During the test, I got stuck on how to format the input files (FASTA and crosslinking data) for homodimer or homo-oligomer prediction. Have you tried this kind of case, or designed the package for it?

And my side question is:
For the input, do I have to follow the "A", "B", "C" naming scheme, or can I be flexible on that? I tested a few different ways but none worked very well.

Thank you very much.

best,
heejong

@lhatsk
Collaborator

lhatsk commented Jul 21, 2023

Hi Heejong,

We focused on heteromeric assemblies in this release, since homomers pose a different challenge. Nevertheless, you can predict homomers. We cannot distinguish intra- and inter-protein links in this case, so you would just define them as self-links:

5 A 15 A 0.1
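
As a minimal sketch, assuming each crosslink line uses the column order seen in the example above (residue, chain, residue, chain, score), self-links for a homomer could be written out as follows. The helper `format_self_links` and the input tuples are hypothetical and not part of AlphaLink.

```python
def format_self_links(pairs, chain="A"):
    """Format (residue1, residue2, score) tuples as self-link lines.

    Assumes the column order 'res1 chain1 res2 chain2 score' from the
    example line; this helper is hypothetical, not an AlphaLink API.
    """
    return [f"{r1} {chain} {r2} {chain} {score}" for r1, r2, score in pairs]


# Made-up residue pairs, purely for illustration.
links = format_self_links([(5, 15, 0.1), (42, 87, 0.05)])
print("\n".join(links))
```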

If, however, you would like to only include them as inter-chain links, it gets a little more complicated.

One option is to replicate the features. Say you have a homo-dimer: AlphaLink will generate A.feature.pkl.gz and A.uniprot.pkl.gz. You could copy them to B.feature.pkl.gz and B.uniprot.pkl.gz and change chains.txt from "A A" to "A B". Now you can include the inter-chain links as

5 A 15 B 0.1
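
The copy-and-rename step could be scripted roughly like this. It is only a sketch: the function name `replicate_chain` is hypothetical, the locations of the feature directory and chains.txt are passed in because they are not stated here, and chains.txt is assumed to hold whitespace-separated chain names such as "A A".

```python
import shutil
from pathlib import Path


def replicate_chain(feature_dir, chains_file, src="A", dst="B"):
    """Copy the feature files of chain `src` to `dst` and rename the
    second `src` entry in chains.txt (e.g. "A A" becomes "A B").

    Hypothetical helper; the paths and the chains.txt layout are assumptions.
    """
    feature_dir = Path(feature_dir)
    for suffix in ("feature.pkl.gz", "uniprot.pkl.gz"):
        shutil.copyfile(feature_dir / f"{src}.{suffix}",
                        feature_dir / f"{dst}.{suffix}")

    chains_file = Path(chains_file)
    chains = chains_file.read_text().split()
    positions = [i for i, name in enumerate(chains) if name == src]
    if len(positions) < 2:
        raise ValueError(f"expected at least two '{src}' entries in {chains_file}")
    chains[positions[1]] = dst  # rename only the second copy
    chains_file.write_text(" ".join(chains) + "\n")
```

Calling `replicate_chain(feature_dir, chains_file)` with your own paths would then leave both A.* and B.* feature files in place, so links such as the one above can refer to chain B.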

Alternatively, just ignore intra-chain links altogether by inserting the following here: https://github.com/Rappsilber-Laboratory/AlphaLink2/blob/main/unifold/dataset.py#L153

    if i == j:
        continue

Note that you would need to run python setup.py install again afterwards to propagate the changes.
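
To illustrate the effect of that check, here is a toy model. It assumes that chain-level links are expanded onto every pair of chain copies in a double loop over chain indices `i` and `j`; this is an assumption for illustration only and is not the code in unifold/dataset.py. The function `place_links` is hypothetical.

```python
def place_links(chains, links, skip_same_copy=False):
    """Expand chain-level links onto every pair of chain copies.

    chains: chain labels in complex order, e.g. ["A", "A"] for a homodimer.
    links:  (res1, chain1, res2, chain2) tuples.
    Returns a sorted list of (copy_i, res1, copy_j, res2) placements.
    Toy model only; not AlphaLink's actual implementation.
    """
    placed = set()
    for i, chain_i in enumerate(chains):
        for j, chain_j in enumerate(chains):
            if skip_same_copy and i == j:
                continue  # mirrors the suggested edit: drop links within one copy
            for r1, c1, r2, c2 in links:
                if (c1, c2) == (chain_i, chain_j):
                    placed.add((i, r1, j, r2))
    return sorted(placed)


links = [(5, "A", 15, "A")]
print(place_links(["A", "A"], links))                       # all copy pairs
print(place_links(["A", "A"], links, skip_same_copy=True))  # inter-copy only
```

In this model, the self-link is placed both within and between copies by default, and only between copies once the `i == j` case is skipped.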

And my side question is: For the input, do I have to follow the "A", "B", "C" naming scheme, or can I be flexible on that? I tested a few different ways but none worked very well.

At the moment, you would need to adhere to the A, B, C, ... naming scheme. Uni-Fold internally maps the sequences, in order, to A, B, C, ... The final mapping can be found in "chain_id_map.json" in the output directory.
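
The in-order naming can be pictured with a small sketch. It only illustrates the idea of assigning A, B, C, ... by record order; it is not Uni-Fold's implementation, it does not reproduce the format of chain_id_map.json, and the function name and example sequences are made up.

```python
from string import ascii_uppercase


def assign_chain_ids(fasta_text):
    """Map FASTA records, in file order, to chain IDs A, B, C, ...

    Hypothetical helper for illustration; not part of Uni-Fold or AlphaLink.
    """
    headers = [line[1:].strip()
               for line in fasta_text.splitlines()
               if line.startswith(">")]
    if len(headers) > len(ascii_uppercase):
        raise ValueError("more records than single-letter chain IDs")
    return dict(zip(ascii_uppercase, headers))


# Made-up two-record FASTA.
fasta = ">Protein1\nMKTAYIAK\n>Protein2\nGSHMLEDP\n"
print(assign_chain_ids(fasta))
```

With this scheme, a crosslink involving the first FASTA record would be written with chain "A" and one involving the second record with chain "B".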

What went wrong in your case? What would you prefer, just using the sequence ID from the FASTA? The generic naming scheme makes things easier, especially for homo-multimeric targets.

Hope this helps,
Kolja

/edit updated the code snippet to conform with the recent update.

@heejongkim
Author

Hi Kolja,
Thanks for the guidance. I will give it a shot and get back to you soon.

The error I ran into had more to do with the naming scheme of the filename.
E.g., my filename was Protein1_Protein2.fasta, which has the entries >Protein1 and >Protein2.
So AlphaLink ended up facing two choices, Protein1.fasta and Protein1_Protein2.fasta, and that might've caused the issue.

best,
heejong

@lhatsk
Collaborator

lhatsk commented Aug 2, 2023

I fixed the handling of FASTA filenames with multiple underscores, which hopefully also resolves your issue.

@heejongkim
Author

Awesome. Much appreciated.
Will give it a shot!

@heejongkim
Author

Hi Kolja,

I'm finally circling back to this matter.

I'm actively testing the homodimer situation right now but, in the meantime, I got another more complex situation.

What if you have a 5-subunit complex, consisting of a homodimer and a homotrimer, and they all interact with each other?
I've thought about it, but I feel like I may inflate the ambiguous information so much that it hinders the inference.
If you have any suggestions towards proper setup for the inference, that would be awesome.

Thank you so much.

best,
heejong

@lhatsk
Collaborator

lhatsk commented Jan 30, 2024

How many links do you have per interaction? I usually just keep them; the network seems to be able to deal with it fairly well. If the results are bad, remove the homomeric links as suggested here: #8 (comment)
