This repository has been archived by the owner on Jul 10, 2024. It is now read-only.

Meaningless Stereo is sometimes honored #4

Open
tylerperyea opened this issue Dec 20, 2013 · 2 comments

Comments

@tylerperyea
Contributor

Some meaningless stereo annotations (wedge/dash bonds) produce different hashes than non-annotated bonds.

Example 1

Compare:

C[C@H]1OC(C)O[C@@H](C)O1

[image: stereononsense]

WDF2GBCFX-X5KQLPPFPK-XK7RRPGCCM2-XK25W2RXGM3Z

vs

CC1OC(C)OC(C)O1

[image: stereononsense2]

WDF2GBCFX-X5KQLPPFPK-XK7RRPGCCM2-XK23BSZ142DG

In this example, there are 3 stereo centers that could be annotated. However, out of the 8 absolute permutations, only 2 are actually unique:
[image: stereononsenseexplained]

You'll notice that of all 8 possibilities, only 2 are non-degenerate. In both of them, at least two adjacent methyl groups must be on the same side of the ring, so the annotation in the first structure adds no information.
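The counting argument can be checked with a small sketch. This is a toy model, not the hashing code in this repository: it assumes every stereo center sits at an equivalent ring position, writes each substituent as up (`u`) or down (`d`), and treats two assignments as the same molecule when they differ by a rotation about the ring axis or by turning the ring over. The names `canonical` and `unique_isomers` are made up for illustration.

```python
from itertools import product


def canonical(config):
    """Return the smallest equivalent up/down string for a symmetric ring.

    Assumption (toy model): all centers are equivalent ring positions, so
    two assignments are the same molecule if one is a cyclic shift of the
    other, or a cyclic shift of the other turned over (order reversed and
    every up/down swapped).
    """
    n = len(config)
    original = tuple(config)
    turned_over = tuple("d" if c == "u" else "u" for c in reversed(config))
    variants = []
    for start in (original, turned_over):
        for k in range(n):
            variants.append(start[k:] + start[:k])
    return "".join(min(variants))


def unique_isomers(n_centers):
    """Collapse all 2**n up/down assignments into distinct canonical forms."""
    return sorted({canonical(c) for c in product("ud", repeat=n_centers)})


if __name__ == "__main__":
    forms = unique_isomers(3)
    print(len(list(product("ud", repeat=3))), "assignments ->", len(forms), "unique")
```

With three centers the 8 assignments collapse to 2 classes (all on one side, or one methyl opposite the other two), which matches the picture above. Only proper rotations are used on purpose, so mirror images would stay distinct in cases where they really are different molecules.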

The InChI algorithm does handle this specific case (possibly by accident), but it does not handle the general issue, as explained in example 2.

Example 2

Compare:

[C@H](C)1CCC(C)CC1

[image: stereononsense3]

T75RBW5S8-8D9T563A7Y-8YC8NQXD9W5-8Y5APDLVJ782

vs

C(C)1CCC(C)CC1

[image: stereononsense4]

T75RBW5S8-8D9T563A7Y-8YC8NQXD9W5-8Y5VPCVHUV1Z

Again, these should be equivalent, but currently generate different hashes. For reasons I can't imagine, the two above also generate different InChIs. This is especially odd, considering it's a much simpler case of the general problem explained in example 1.

@caodac
Contributor

caodac commented Dec 20, 2013

Gorgeous embedded images!

@tylerperyea
Contributor Author

Thanks! I'm probably getting carried away with them...

On the resolution of this, one naive solution is to do the following:

  1. Mark all potential non-annotated stereo centers for which R/S is not applicable
  2. Generate canonical hashes for all possible absolute permutations at those centers
  3. Canonically hash the unique set of possible results with current known R/S configurations
  4. Apply this result as the exclusive form of stereochemical encoding in the hash

I believe that would work, in theory ... but a few of the logistics are problematic. Also, there is the potential for a combinatorial explosion in some of these cases. A non-annotated inositol is the worst real case I can think of (with 2^6 = 64 naive permutations), but I'm sure there are worse examples that are still relevant.
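A rough sketch of the four steps, under heavy assumptions: the molecule is reduced to a toy model of equivalent ring positions marked up (`u`), down (`d`), or unannotated (`None`), every unannotated center is treated as free (step 1 is not really implemented), and SHA-1 stands in for the real hash. `canonical` and `stereo_hash` are hypothetical names, not functions from this codebase.

```python
import hashlib
from itertools import product


def canonical(config):
    """Smallest equivalent string under ring rotation or turning the ring over.

    Assumption (toy model): all centers are equivalent positions on one ring.
    """
    n = len(config)
    original = tuple(config)
    turned_over = tuple("d" if c == "u" else "u" for c in reversed(config))
    variants = [s[k:] + s[:k] for s in (original, turned_over) for k in range(n)]
    return "".join(min(variants))


def stereo_hash(centers):
    """Hash the set of distinct stereoisomers compatible with the annotation.

    centers: sequence of "u", "d", or None (None = unannotated center).
    """
    # Step 2: enumerate every absolute assignment at the unannotated centers.
    options = [("u", "d") if c is None else (c,) for c in centers]
    # Step 3: keep only the unique canonical forms, in a stable order.
    forms = sorted({canonical(p) for p in product(*options)})
    # Step 4: the hash of that set is the only stereo encoding used.
    return hashlib.sha1("|".join(forms).encode("ascii")).hexdigest()[:12]


if __name__ == "__main__":
    # Example 1: two methyls marked on the same side vs. no annotation.
    print(stereo_hash(["u", None, "u"]) == stereo_hash([None, None, None]))
    # Example 2: one of two centers marked vs. no annotation.
    print(stereo_hash(["u", None]) == stereo_hash([None, None]))
    # A meaningful annotation still changes the hash.
    print(stereo_hash(["u", "u", "u"]) != stereo_hash([None, None, None]))
```

In this model both meaningless annotations from the examples hash the same as the bare structure, while a fully specified all-same-side form does not. The cost is the enumeration in step 2, which doubles with each unannotated center.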

tylerperyea added a commit that referenced this issue May 1, 2019
fix with some tests for issue #4, #3, #7, needs evaluation