This repository has been archived by the owner on Jul 10, 2024. It is now read-only.

Non-stereo Encoding Problem with Explicit Hydrogens #7

Open
tylerperyea opened this issue Dec 20, 2013 · 3 comments

@tylerperyea
Contributor

In certain cases, explicit hydrogens seem to cause trouble for the atom-labelling layer of the hash. In these cases, the SMILES generated by the standardizer produces a different hash than the input molfile itself.

Consider the following poorly laid-out structure:
[image: encodeprob]
[molfile below]

Generating the hash directly from this molfile gives this Std_SMILES:

[H][C@@]12CC3=C(C(O)=C(OC)C(C)=C3)[C@@]([H])(N1C)[C@@]4([H])N([C@H]2O)[C@@]5([H])COC(=O)[C@]8(CS[C@]4([H])C6=C5C7=C(OCO7)C(C)=C6OC(C)=O)NCCC9=C8C=C(OC)C(O)=C9

And this hash:

DCLRH149F-FGAV2BD6PA-FA8DSLTXL4L-FALJX635AFC5

However, when that same SMILES is fed into the standardizer, I get:

DCLRH149F-FFMPLZ16VC-FC1Y2MQMGXU-FCUZ42LBF8VB

If the explicit hydrogens are removed entirely:
[image: encodeprob2]

The output hash is now compatible with the one from the SMILES (the first three blocks match):

DCLRH149F-FFMPLZ16VC-FC1Y2MQMGXU-FCU1SY5C8458
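To make the comparison concrete, here is a minimal sketch that reports which dash-separated blocks differ between two of the hashes quoted above. The helper name `differing_blocks` is hypothetical and not part of the hashing library; it only splits on `-` and makes no assumption about what each block encodes.

```python
def differing_blocks(hash_a: str, hash_b: str) -> list[int]:
    """Return the 0-based indices of the dash-separated blocks that differ."""
    blocks_a = hash_a.split("-")
    blocks_b = hash_b.split("-")
    if len(blocks_a) != len(blocks_b):
        raise ValueError("hashes have different numbers of blocks")
    return [i for i, (a, b) in enumerate(zip(blocks_a, blocks_b)) if a != b]


# Hashes copied verbatim from this issue.
MOLFILE_HASH = "DCLRH149F-FGAV2BD6PA-FA8DSLTXL4L-FALJX635AFC5"  # explicit-H molfile
SMILES_HASH = "DCLRH149F-FFMPLZ16VC-FC1Y2MQMGXU-FCUZ42LBF8VB"   # Std_SMILES re-fed
NO_H_HASH = "DCLRH149F-FFMPLZ16VC-FC1Y2MQMGXU-FCU1SY5C8458"     # hydrogens removed

print(differing_blocks(MOLFILE_HASH, SMILES_HASH))  # [1, 2, 3]
print(differing_blocks(SMILES_HASH, NO_H_HASH))     # [3]
```

Only the first block is shared between the explicit-hydrogen molfile and its own SMILES, whereas the hydrogen-free structure differs from the SMILES in the last block only.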

Molfile for the explicit-hydrogen version:


  Ketcher 12201304332D 1   1.00000     0.00000     0

 59 67  0     1  0            999 V2000
   -2.2321   -1.8660    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7321   -1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5981   -0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5981    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.4641    1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.4641    2.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.3301    2.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5981    2.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5981    3.5000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -3.4641    4.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7321    2.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.8660    2.5000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7321    1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.8660    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4740    1.2647    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7321    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.9071   -0.4750    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    1.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -1.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.8660   -1.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.8660   -2.5000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.8660   -1.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8561   -2.3746    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.4488   -3.1947    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5544   -3.0234    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    5.3132   -1.2000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.5741   -1.3179    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.8632    0.2250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.0294    0.9234    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9488    1.1197    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
    0.8660    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8811    1.3246    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.7321    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.4244    1.4848    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.6097    2.0768    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7419    1.9858    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.7927    2.9165    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.4641    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.9301    0.3000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.4641   -1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2072   -1.6691    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.8005   -2.5827    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8060   -2.4781    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981   -1.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.7321   -1.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.6506   -0.2222    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    6.4172    0.1894    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.4966    1.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.8342    1.6954    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.0136    2.6792    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.2512    3.3264    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.4306    4.3102    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.3096    2.9897    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5473    3.6370    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.7266    4.6207    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.1303    2.0060    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.8676    1.3838    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  1     0  0
  2  3  1  0     0  0
  3  4  1  0     0  0
  4  5  1  0     0  0
  5  6  2  0     0  0
  6  7  1  0     0  0
  6  8  1  0     0  0
  8  9  1  0     0  0
  9 10  1  0     0  0
  8 11  2  0     0  0
 11 12  1  0     0  0
 11 13  1  0     0  0
  4 13  2  0     0  0
 13 14  1  0     0  0
 14 15  1  1     0  0
 14 16  1  0     0  0
  2 16  1  0     0  0
 16 17  1  0     0  0
 14 18  1  0     0  0
 18 19  1  1     0  0
 18 20  1  0     0  0
 20 21  1  0     0  0
  2 21  1  0     0  0
 21 22  1  1     0  0
 20 23  1  0     0  0
 23 24  1  1     0  0
 23 25  1  0     0  0
 25 26  1  0     0  0
 26 27  1  0     0  0
 27 28  2  0     0  0
 29 27  1  0     0  0
 29 30  1  6     0  0
 30 31  1  0     0  0
 31 32  1  0     0  0
 18 32  1  0     0  0
 32 33  1  1     0  0
 32 34  1  0     0  0
 34 35  1  0     0  0
 35 36  1  0     0  0
 36 37  1  0     0  0
 37 38  1  0     0  0
 37 39  2  0     0  0
 35 40  2  0     0  0
 40 41  1  0     0  0
 40 42  1  0     0  0
 42 43  1  0     0  0
 43 44  1  0     0  0
 44 45  1  0     0  0
 45 46  1  0     0  0
 42 46  2  0     0  0
 46 47  1  0     0  0
 23 47  1  0     0  0
 34 47  2  0     0  0
 29 48  1  0     0  0
 48 49  1  0     0  0
 49 50  1  0     0  0
 50 51  1  0     0  0
 51 52  1  0     0  0
 52 53  2  0     0  0
 53 54  1  0     0  0
 53 55  1  0     0  0
 55 56  1  0     0  0
 56 57  1  0     0  0
 55 58  2  0     0  0
 58 59  1  0     0  0
 29 59  1  0     0  0
 51 59  2  0     0  0
M  END
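
For anyone trying to reproduce this, a rough sketch of how the explicit hydrogens and the stereo flags on their bonds could be listed from a V2000 molfile. It assumes the fixed-width V2000 layout (counts on the fourth line, element symbol in columns 32-34 of each atom line, four 3-character fields at the start of each bond line). The function name `explicit_h_bonds` and the three-atom `SAMPLE` are made up for illustration and are not part of the hashing code.

```python
def explicit_h_bonds(molfile: str) -> list[tuple[int, int, int]]:
    """List (h_atom, neighbour_atom, stereo_flag) for every bond to an explicit H.

    Atom numbers are 1-based, as in the molfile. In V2000 bond lines the
    stereo flag is 0 for a plain bond, 1 for up (wedge), 6 for down (hash).
    """
    lines = molfile.splitlines()
    counts = lines[3]  # 4th line: aaabbb... (atom count, bond count)
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    symbols = [ln[31:34].strip() for ln in atom_lines]

    result = []
    for ln in bond_lines:
        a, b, stereo = int(ln[0:3]), int(ln[3:6]), int(ln[9:12])
        for h, other in ((a, b), (b, a)):
            if symbols[h - 1] == "H":
                result.append((h, other, stereo))
    return result


# Tiny illustrative molfile (not a real structure): C with a wedged H and an N.
SAMPLE = "\n".join([
    "",
    "  sketch",
    "",
    "  3  2  0  0  0  0            999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0000    1.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  1",
    "  1  3  1  0",
    "M  END",
])

print(explicit_h_bonds(SAMPLE))  # [(2, 1, 1)]
```

In the molfile above, the explicit hydrogens are atoms 1, 15, 19, 24 and 33, and each sits on a bond whose stereo flag is 1, so all five carry stereo information rather than being plain hydrogens.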
@caodac
Contributor

caodac commented Dec 22, 2013

This example isn't so much about explicit H's but more about their parities. To fix this we need to be able to define a canonical set of parity flags for the specified stereocenters. This is far from a trivial fix.

@tylerperyea
Contributor Author

It can't just be parity though, because this messes up the atom label layer, not just the stereo layer. If it only had a problem with stereo, that'd be less concerning.

@caodac
Contributor

caodac commented Jan 4, 2014

This particular example should now work properly from the recent commit 9b38dbd.

tylerperyea added a commit that referenced this issue May 1, 2019
fix with some tests for issue #4, #3, #7, needs evaluation