Decoder appears to decode random initial fingerprints to the same SMILES #22

andreirekesh · 2024-08-02T23:02:56Z

I've recently been interested in running SynNet with the most recent version of the Enamine US Stock building blocks (BBs). I ran steps 0-2 to preprocess the data and then tried reward-guided molecule generation with the genetic algorithm (GA), following the instructions in the README. However, I notice that even for the initial randomly generated fingerprints, 70-80 of the initial 100 decode to the same SMILES string:

CC(C)(C)OC(=O)N1CC2NCCN(S(=O)(=O)CC(=O)c3ccccc3)C2C1

This duplication causes the GA population update to hang forever: too few unique new molecules are found to add to the pool, so parent_idx never reaches num_population in a given step of the algorithm.
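
To make the failure mode concrete, here is a minimal sketch of the two pieces involved: counting how often the decoder collapses to one SMILES, and a fill loop that gives up after a fixed number of attempts instead of spinning forever. This is not SynNet's actual code. The names `duplicate_summary`, `fill_population`, `propose`, and `max_attempts` are hypothetical, and the sketch assumes decoded molecules arrive as plain (already canonical) SMILES strings, with `None` for a failed decode.

```python
from collections import Counter


def duplicate_summary(smiles_list):
    """Return (most_common_smiles, its_count, n_unique) for decoded SMILES.

    Assumption: strings are already canonical, so equal molecules compare equal.
    Failed decodes (None) are ignored.
    """
    counts = Counter(s for s in smiles_list if s is not None)
    if not counts:
        return None, 0, 0
    top, n = counts.most_common(1)[0]
    return top, n, len(counts)


def fill_population(propose, num_population, max_attempts=1000):
    """Collect unique candidates from propose() until num_population is reached.

    Hypothetical stand-in for a GA update step. Unlike an unbounded loop,
    it stops after max_attempts calls and returns whatever it has found.
    """
    seen = set()
    population = []
    attempts = 0
    while len(population) < num_population and attempts < max_attempts:
        attempts += 1
        candidate = propose()
        if candidate is None or candidate in seen:
            continue  # duplicate or failed decode: does not advance the fill
        seen.add(candidate)
        population.append(candidate)
    return population, attempts


if __name__ == "__main__":
    # Toy data only: three copies of one string plus two distinct ones.
    decoded = ["CCO", "CCO", "CCO", "CCN", "c1ccccc1"]
    print(duplicate_summary(decoded))

    # A proposer that can only ever yield two distinct strings cannot fill
    # a population of 5, so the loop exits on max_attempts instead of hanging.
    stream = iter(["CCO", "CCO", "CCN"] * 20)
    population, attempts = fill_population(lambda: next(stream), 5, max_attempts=30)
    print(len(population), attempts)
```

With a cap like this, a decoder that keeps returning the same molecule shows up as a short population and an exhausted attempt budget rather than as a hang, which makes the underlying collapse easier to spot.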

Could this be the result of the difference in the Enamine stock between the time of publication and now? Any help is appreciated!

Thank you,
Andrei
