Missing matches in the result #1

hadoth · 2023-07-11T13:52:43Z

Steps to reproduce:

Set the query molecule to didD@@QInUxV`@@B and run the search using the idorsia_toy_space_a.txt synthon space.

Expected behavior:

For Synthon_A of snar_b-25 there should be hits for four different synthons: dcLDpEtKhhbSiIf^v[hHBf@@, dcLDpEtKichYAIeY~kh@bf@@, dmtDPITKickHhdhcJz@Hf@@ and dcNDPAWPnfNdfUgzn`BJX@@ (560 in total).

Actual result:

Results for only one synthon are returned (440 in total).

Probable cause:

The `break` statements at lines 2763 and 2767 of SynthonSpace.java. As a result, mapped_frag always has size 1. This seems to be done on purpose, but it leads to rather unexpected behavior.
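To illustrate the suspected effect, here is a minimal sketch. It is not the actual SynthonSpace.java code: the class `BreakDemo`, the method `collectMatches`, and the `stopAtFirst` switch are hypothetical stand-ins. It only shows how an unconditional `break` after the first match keeps a result list at size 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the matching loop; NOT the real SynthonSpace code.
class BreakDemo {

    // Collects matching synthons. With stopAtFirst == true the loop leaves
    // after the first match, so the returned list never exceeds one entry.
    static List<String> collectMatches(List<String> candidates, boolean stopAtFirst) {
        List<String> mappedFrag = new ArrayList<>();
        for (String candidate : candidates) {
            mappedFrag.add(candidate);
            if (stopAtFirst) {
                break; // early exit: remaining candidates are never inspected
            }
        }
        return mappedFrag;
    }

    public static void main(String[] args) {
        List<String> synthons = List.of("synthon1", "synthon2", "synthon3", "synthon4");
        System.out.println(collectMatches(synthons, true).size());  // prints 1
        System.out.println(collectMatches(synthons, false).size()); // prints 4
    }
}
```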

lithom · 2023-08-29T07:41:30Z

Thanks for testing the hyperspace software thoroughly!

The observed behavior is indeed "a feature" and intentional. One of the challenges in implementing the algorithm was handling "very general" queries in a reasonable way. The reasoning behind this `break` is that if we find a complete substructure hit inside a single building block, we assume that the query is very general and will probably generate a huge number of hits. In the toy space, 500 / 37k is >1% of the complete space; in large spaces this might be millions or billions of structures, probably "more than we can easily handle in subsequent processing of the structures". The measures for handling "excessive results" are also described in the JCIM publication, in the subsection "Handling excessive enumeration of results" (it was not in the preprint, but was rightly requested by one of the reviewers).

I agree that this is somewhat confusing, and it might cut off interesting structures. The first implementation of the software used a "process all structures, then report the full result at once" approach, so it was really necessary to have such rather strict cutoff criteria. I have since extended the software so that it can also "continuously stream" results, so one option would be to remove these cutoff mechanisms from the algorithm. I don't know if this would really be helpful, though. Alternatively, it could make sense to include a "results might be truncated" flag in the results whenever one of the cutoff mechanisms is engaged (there are also two other hard limits in the algorithm).
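A rough sketch of what such a flag could look like, purely for discussion. All names here (`TruncationFlagSketch`, `SearchResult`, `mightBeTruncated`, `maxHits`) are hypothetical and not taken from the hyperspace code base, and the single hit limit stands in for whichever cutoff mechanism is engaged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in the hyperspace code base.
class TruncationFlagSketch {

    // Result container that carries the hits plus a "might be truncated" flag.
    static final class SearchResult {
        final List<String> hits;
        final boolean mightBeTruncated;

        SearchResult(List<String> hits, boolean mightBeTruncated) {
            this.hits = hits;
            this.mightBeTruncated = mightBeTruncated;
        }
    }

    // Collects at most maxHits candidates. The flag is set only if a cutoff
    // actually discarded something, i.e. a candidate was left unprocessed.
    static SearchResult search(List<String> candidates, int maxHits) {
        List<String> hits = new ArrayList<>();
        boolean truncated = false;
        for (String candidate : candidates) {
            if (hits.size() >= maxHits) {
                truncated = true; // cutoff engaged: report it instead of failing silently
                break;
            }
            hits.add(candidate);
        }
        return new SearchResult(hits, truncated);
    }

    public static void main(String[] args) {
        SearchResult r = search(List.of("a", "b", "c", "d"), 1);
        System.out.println(r.hits.size() + " hit(s), truncated: " + r.mightBeTruncated);
    }
}
```

With a flag like this, a caller could at least tell a complete result apart from one that was cut short, and decide whether to rerun a more specific query.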
