E/Z perception on tautomers #5
Comments
Yeah, this example highlights the subtle difference between using SMILES vs MOL as input. In the latter case, the coordinates are preserved, and any double bond generated by tautomerization takes on whatever configuration is implied by the coordinates. I guess the question is whether the Z in the input is real or just a byproduct of how it's drawn. |
I would tend to say any E/Z configuration that emerges from generated / provided coordinates after tautomer generation should be considered bogus. In fact, I wonder how much things would change if you explicitly ignore E/Z in cases where that bond can become single via a tautomer ... I think that solution is 95% correct, but I'd have to see if there are problem cases... |
By default, any double bond generated by tautomer enumeration should not have any E/Z stereo annotated. The problem might be due to the output serializer, which automatically perceives E/Z stereo from the coordinates regardless of whether E/Z stereo is annotated or not. |
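To make the coordinate dependence concrete, here is a minimal sketch of how a serializer might perceive cis/trans from 2D coordinates alone. The function names are made up for illustration and are not the standardizer's API. It also only classifies one chosen substituent on each end of the bond; a real E/Z label would additionally need CIP priorities.

```python
def side(p, a, b):
    """Return +1, -1 or 0: which side of the line a->b the point p lies on (2D cross product sign)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)


def perceive_cis_trans(sub_a, a, b, sub_b):
    """Classify substituent sub_a (on atom a) and sub_b (on atom b) around the double bond a=b."""
    sa, sb = side(sub_a, a, b), side(sub_b, a, b)
    if sa == 0 or sb == 0:
        return "undefined"  # a substituent is drawn collinear with the bond
    return "cis" if sa == sb else "trans"


# Same connectivity, two drawings: only the position of one substituent differs.
a, b = (0.0, 0.0), (1.0, 0.0)
drawing_1 = perceive_cis_trans((-0.5, 0.87), a, b, (1.5, 0.87))   # both substituents above the bond
drawing_2 = perceive_cis_trans((-0.5, 0.87), a, b, (1.5, -0.87))  # substituents on opposite sides
print(drawing_1, drawing_2)
```

If a tautomer-generated double bond is run through something like this, the label it gets reflects only how the input happened to be drawn.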
That makes sense. The simple solution, then, is to force a "double either" bond type on all tautomer-generated bonds, rather than a typical "double". I believe you can do that explicitly, and it should behave exactly like an unannotated double bond, regardless of coordinates. |
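As a rough illustration of the "double either" idea at the molfile level: in a V2000 bond line, the fourth 3-character field is the bond stereo, and for a double bond the value 3 means "cis or trans (either)". The sketch below rewrites that field for selected bonds. The helper names and the `generated` index set are hypothetical, and whether the standardizer's reader honors this flag is an assumption.

```python
def mark_double_either(bond_line):
    """Return a V2000 bond line with stereo set to 3 (either) if the bond is a double bond."""
    padded = bond_line.rstrip("\n").ljust(12)
    bond_type = int(padded[6:9])       # columns 7-9: bond type (2 = double)
    if bond_type != 2:
        return bond_line.rstrip("\n")  # leave non-double bonds untouched
    return padded[:9] + "  3" + padded[12:]  # columns 10-12: bond stereo


def mark_generated_bonds(bond_lines, generated):
    """Apply mark_double_either only to the bonds whose 0-based index is in `generated`."""
    return [
        mark_double_either(line) if i in generated else line
        for i, line in enumerate(bond_lines)
    ]


bonds = ["  1  2  2  0", "  2  3  1  0", "  3  4  2  0"]
# Assume (hypothetically) that only the first bond was created by tautomer generation.
print(mark_generated_bonds(bonds, {0}))
```

Only the bonds flagged as tautomer-generated are touched, so double bonds with genuine stereo in the input keep their coordinate-derived configuration.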
That is kind of unsatisfying: if one had started with the 'canonical tautomer', it would be considered to have E/Z stereo, but the other molecule would get a different InChI with an explicit lack of E/Z stereo. The point is that we expect all these structures to be in equilibrium with one another. Can we generate both explicit E and Z when you have tautomer-generated double bonds? Then choose one canonical structure across the whole set. If that one winds up having E or Z, so what; at least all the right starting structures will be grouped together. Noel |
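A toy sketch of that proposal, under heavy assumptions: each tautomer is reduced to a skeleton string plus a count of tautomer-generated double bonds, and "canonical" simply means the smallest variant in sort order. None of these names or representations come from the standardizer.

```python
from itertools import product


def stereo_variants(skeleton, n_generated_bonds):
    """All E/Z assignments for the tautomer-generated double bonds of one tautomer."""
    return [(skeleton, labels) for labels in product("EZ", repeat=n_generated_bonds)]


def canonical_form(tautomers):
    """tautomers: dict mapping skeleton -> number of tautomer-generated double bonds.

    Enumerates every E/Z variant of every tautomer and returns the smallest one.
    """
    variants = [
        variant
        for skeleton, n in tautomers.items()
        for variant in stereo_variants(skeleton, n)
    ]
    return min(variants)


# Hypothetical tautomer set; the input's own E/Z on these bonds is never consulted.
key = canonical_form({"tautomer_B": 2, "tautomer_A": 1})
print(key)
```

Because the full E/Z set is enumerated regardless of how the input was drawn, an E-drawn and a Z-drawn input that share a tautomer set end up with the same key, even though that key itself carries an E or Z label.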
I think that's reasonable. I believe you're saying that any E/Z center annotations on tautomeric bonds should be effectively disregarded/collapsed. I think I agree with this. The simplest approach, I believe, is just to force all double bonds that can undergo tautomerism into double/either bonds. However, this is where I need someone more knowledgeable about organic chem than I am. I am embarrassed to admit that I have a hard time distinguishing some of the "equilibrium problems" from "resonance / failures of the valence bond model". For example, the following are equivalent in the standardizer: [conjugate] https://f.cloud.github.com/assets/1581898/1789710/f4db43ce-6960-11e3-9aa2-5f55bdbab1ee.png Is it actually the case that the double bonds are interconverting, allowing free rotation around the single bonds (in which case all E/Z is then an illusion)? Or is it that this is a conjugated pi system, with a locked conformation, imperfectly described as alternating double bonds (in which case we need to respect orientation regardless of which form is used for drawing)? More practically, are the following also strictly equivalent to the above: [others] https://f.cloud.github.com/assets/1581898/1789754/4e1c1a70-6962-11e3-9fa0-8a72a71d5012.png |
Ok, good point. The electrons are delocalized, but the geometry is relatively fixed. http://onlinelibrary.wiley.com/doi/10.1002/jhet.5570200439/abstract I can't get the PDF, but the abstract seems to imply that the hydrazide (with free rotation) is in equilibrium with the hydrazone in solution, so there is not a true E/Z center here. So, getting back to my point: when you create an E/Z center, you should include both E and Z as tautomers ... If it were, then .... -N-N=[eingang]ring_kekule_one |
Perhaps this is only a problem for compounds with a heteroatom at either end of the conjugation, or maybe also in the middle of the conjugation. Noel
From: Tyler Peryea
Not sure if I followed you completely ... but I think I get what you're saying. From what I can see, generating both E and Z forms for every new tautomer, and arbitrarily picking one canonical form with one canonical E/Z isomerism, is reasonable. I think it's functionally equivalent to ignoring E/Z isomerism among atoms involved in tautomerism (which also seems reasonable to me). If there are times when E/Z in alterable bonds is meaningful, though, I'm not sure I see the general rule that could be coded. To further clarify, the following appear quite different to me, and I'd intuitively give them different hashes. But technically the difference is in the s-cis/s-trans conformation, so both produce the same SMILES, InChI, etc ... Once the bonds are alternated, the superficial annotation becomes a real annotation, and that center becomes true cis, which suddenly makes these two structures distinct. If we say all such cases are just considered to be interconverting, this is pretty easy. But I don't think that's the case here (is it?). If these two things are different, we'd have to make sure the tautomer canonicalizer lands on a form that preserves their differences. I'm not sure if that's reasonable or not ... |
Sorry, I just observed that these are the same. They generate the same InChIKeys: But different hashes: I'm not really sure if they should be the same or different ... I will do some literature searches / phone-a-friend. |
Again, this is similar to the original example. The only reason they would be different is the coordinates. If you feed the input as SMILES, then the hash keys should be the same. Notice that InChI says that there is an E/Z (not sure which) in this example. |
Ok, I've just pushed upstream the fix ( |
In certain cases, unspecified E/Z information is encoded as known (or known E/Z information is lost) as a result of tautomer generation.
Example 1
Consider the following two structures, which have the same SMILES but are drawn differently (molfiles at the bottom).
Compare:
vs
Notice that while the SMILES representations are exactly the same, the structures still get different hashes based on their initial coordinates. This happens because the canonical tautomer has a different E/Z bond location than the one drawn above:
After selecting the preferred tautomer, E/Z is apparently recalculated based on the original atom coordinates. This leads two apparently identical structures to have different hashes.
The resolution to this problem isn't trivial, and it is more a shortcoming of valence bond theory than of the encoding in general. This will require a bit of research, and an expert should be consulted. My intuition is that a cis/trans designation should be allowed if (and only if) both bonded atoms involved remain in an sp2 hybridized state across all tautomers (so that the atoms and their substituents remain coplanar).
If this is accurate, there is an unfortunate corollary: the preferred tautomer in the above example is either wrong, or should capture cis/trans information about the exocyclic bond, even though it is not explicitly a double bond.
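The intuition above can be sketched as a simple check. This is only an illustration under a loud assumption: an atom is treated as sp2 in a given tautomer exactly when it takes part in some double bond there, which is a simplification (it would miss, for example, a planar amide-type nitrogen). The function names and the atom numbering are invented for the example.

```python
def atoms_in_double_bonds(double_bonds):
    """Set of atom indices that take part in at least one double bond."""
    return {atom for bond in double_bonds for atom in bond}


def stereo_allowed(bond, tautomers):
    """True if both atoms of `bond` are (approximately) sp2 in every tautomer.

    bond:      (i, j) pair of atom indices
    tautomers: list of sets of (i, j) double bonds, one set per tautomer
    """
    return all(set(bond) <= atoms_in_double_bonds(t) for t in tautomers)


# Hypothetical conjugated chain 1..5 with two tautomers that shift the double bonds.
tautomers = [
    {(1, 2), (3, 4)},  # tautomer A: 1=2-3=4-5
    {(2, 3), (4, 5)},  # tautomer B: 1-2=3-4=5
]
print(stereo_allowed((2, 3), tautomers))  # atoms 2 and 3 are sp2 in both forms
print(stereo_allowed((1, 2), tautomers))  # atom 1 loses its double bond in tautomer B
```

Note that bond 2-3 passes the check even though it is drawn as a single bond in tautomer A, which is exactly the corollary: the chosen tautomer would need to carry cis/trans information on a bond that is not explicitly double.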
The molfiles for the above structures are posted here for convenience: