Dependency issues #1

Open
carbondrop-nick opened this issue Oct 2, 2024 · 10 comments

Comments

@carbondrop-nick

Super cool, but hard to get going. On top of the stated dependencies I also had to add a lot of modules to my environment to get this working:
ipdb einops ml-collections dm-tree ipywidgets jupyter einx torch_geometric tmtools openmm POT rdkit mdtraj pdbfixer

The "sampling" module seems to be missing. The imported configs, inference, and data.loader point to modules that don't exist (unless you meant the ones under Pretrain?).

mdtraj also requires C++ tools to install, and pdbfixer can't be installed via pip (it has to come from conda-forge instead). It would be great to find a better way to distribute this.
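A quick way to see which of the extra packages listed above are still absent from an environment is to probe for their import modules. This is only a sketch: the mapping from package name to import name below is an assumption based on how these packages are usually imported (for example dm-tree as `tree` and POT as `ot`), and jupyter is left out because it is a launcher rather than a library the code imports.

```python
from importlib.util import find_spec

# Package name as installed -> top-level module name as imported.
# These import names are assumptions based on common usage, not taken from the repo.
IMPORT_NAMES = {
    "ipdb": "ipdb",
    "einops": "einops",
    "ml-collections": "ml_collections",
    "dm-tree": "tree",
    "ipywidgets": "ipywidgets",
    "einx": "einx",
    "torch_geometric": "torch_geometric",
    "tmtools": "tmtools",
    "openmm": "openmm",
    "POT": "ot",
    "rdkit": "rdkit",
    "mdtraj": "mdtraj",
    "pdbfixer": "pdbfixer",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the package names whose import module cannot be located."""
    return sorted(pkg for pkg, mod in import_names.items() if find_spec(mod) is None)


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only locates modules; it does not import them, so it will not catch a package that is installed but broken.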

@WillHua127
Owner

I am working on the open-source release and will make everything better!

@pgmikhael

Hi,

Super exciting work! I will just reiterate the above. It would also be helpful in general not to import all functions from a script (from file import *), since it makes it harder to follow where things break when/if they do.
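To illustrate the point about star imports, here is a small sketch using two standard-library modules rather than anything from this repo. Both `math` and `cmath` export `sqrt`, so the second star import silently replaces the first, and nothing at the call site shows which one is in use.

```python
import cmath
import math

# Run the star imports in a throwaway namespace so the demo stays contained.
namespace = {}
exec("from math import *\nfrom cmath import *", namespace)

# Both modules export sqrt; the later star import replaced the earlier one.
shadowed = namespace["sqrt"] is cmath.sqrt

# Explicit, qualified imports keep the origin visible where the function is called.
real_root = math.sqrt(4.0)
complex_root = cmath.sqrt(-4.0)
```

With qualified names, a traceback or a quick search points straight at the module that defines the function.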

@WillHua127
Owner

Thanks Peter! I am cleaning everything up and plan to release it all in Nov.

@WillHua127
Owner

Things should be fixed now. Let me know if you can run enzymeflow_demo.ipynb.

@carbondrop-nick
Author

carbondrop-nick commented Oct 7, 2024

Thanks! I think we're pretty close, but I am running into a few key naming issues in the pretrain. It looks like ProteinLigandNetwork is expecting some slightly different keys than the ones present:
Missing key(s) in state_dict:
"guide_ligand_mpnn.mpnn.atom_convs.0.lin.weight", "guide_ligand_mpnn.mpnn.atom_convs.1.lin.weight", "guide_ligand_mpnn.mpnn.mol_conv.lin.weight"
Unexpected key(s) in state_dict:
"guide_ligand_mpnn.mpnn.atom_convs.0.lin_src.weight", "guide_ligand_mpnn.mpnn.atom_convs.0.lin_dst.weight", "guide_ligand_mpnn.mpnn.atom_convs.1.lin_src.weight
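A mismatch like the one above can be inspected before loading by comparing the two sets of parameter names directly. The sketch below is plain Python over key names; the helper `diff_state_dict_keys` is hypothetical, and with PyTorch the inputs would presumably come from `model.state_dict().keys()` and the keys of the loaded checkpoint.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, in the sense of the error above.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not expect
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Keys copied from the error message in this thread (first conv layer only).
model_keys = ["guide_ligand_mpnn.mpnn.atom_convs.0.lin.weight"]
checkpoint_keys = [
    "guide_ligand_mpnn.mpnn.atom_convs.0.lin_src.weight",
    "guide_ligand_mpnn.mpnn.atom_convs.0.lin_dst.weight",
]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
```

A pattern where one `lin` weight is replaced by `lin_src` and `lin_dst` suggests the layer definition itself differs between the two environments, rather than the checkpoint being corrupted.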

@pgmikhael

This may be a matter of the pyg version: different versions can ship different model implementations. Installing torch_geometric with pip at the specified version seems to work.
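One way to confirm which version is actually installed is to query the package metadata and compare it with the pin. This is a sketch: the helper names are hypothetical, and the pinned version is deliberately left as an argument because the thread does not state it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches_pin(dist_name, pinned):
    """True only if the distribution is installed at exactly the pinned version."""
    return installed_version(dist_name) == pinned


if __name__ == "__main__":
    # Compare the printed value against the version the repo specifies.
    print("torch_geometric:", installed_version("torch_geometric"))
```

Running this inside the notebook kernel, not just the shell, helps rule out the case where the kernel points at a different environment.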

@carbondrop-nick
Author

Well spotted! Indeed I was working with a later version of torch_geometric.

@carbondrop-nick
Author

Looks like
meta_eval_csv = pd.read_csv('data/metadata_eval.csv')
is referring to a flat file that contains absolute paths on Will's computer instead of relative paths:
/Users/willhua/Desktop/EnzymeFlow/data/processed_eval/msa/P07964/P07964.pkl
I edited the file to remove all instances of /Users/willhua/Desktop/EnzymeFlow/ and it seemed to work fine.
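The same edit can be scripted rather than done by hand. The sketch below uses only the standard library and strips the prefix from any cell that starts with it, which is slightly narrower than removing every occurrence. The column name in the sample is hypothetical, since the real columns of metadata_eval.csv are not shown in this thread.

```python
import csv
import io

# Prefix taken from the path quoted in this thread.
ABS_PREFIX = "/Users/willhua/Desktop/EnzymeFlow/"


def relativize(value, prefix=ABS_PREFIX):
    """Drop the machine-specific prefix if the value starts with it."""
    return value[len(prefix):] if value.startswith(prefix) else value


def relativize_csv(text, prefix=ABS_PREFIX):
    """Return CSV text with the prefix stripped from every cell that has it."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow([relativize(cell, prefix) for cell in row])
    return out.getvalue()


# "msa_path" is a made-up column name used only for this illustration.
sample = "msa_path\n" + ABS_PREFIX + "data/processed_eval/msa/P07964/P07964.pkl\n"
cleaned = relativize_csv(sample)
```

To apply it to the real file, read data/metadata_eval.csv as text, pass it through `relativize_csv`, and write the result back, keeping a copy of the original.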

@WillHua127
Owner

I am working on multi-motif scaffolding, i.e., EnzymeFlow generates enzyme motifs, and I am trying to find the seq_idx or seq_position that maps the motifs back to the whole enzyme. Let me know if you have suggestions or would like to collaborate on this.

@carbondrop-nick
Author

A good answer would be pretty far beyond me. Enzyme "motif" is a messy concept since catalytic machinery has to be able to access multiple states, making hidden dynamic trajectories important. I would stick to the hidden representation and just try to consider the "foldiness" of the protein separately from its "enzyminess" for a given chemical reaction (enzyminess = 0 for most reactions, >0 for known reaction(s)), but that is much easier said than done.

3 participants