Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluator should fail individually on bad SMILES from ThermoML instead of entire file #620

Open
lilyminium opened this issue Feb 11, 2025 · 0 comments

Comments

@lilyminium
Copy link
Contributor

Describe the bug

When Evaluator parses a ThermoML file, it tries to read SMILES. If the SMILES fails (e.g. on a radical error), the exception means the rest of the file is also ignored, instead of that singular entry. Due to differences in the OpenFF toolkit over different versions (e.g. the addition of the radical error), this means some past work is unnecessarily highly unreproducible (e.g. the filtering of the Sage training dataset for 2.0). Beyond reproducibility concerns, this also impacts the reading of any properties later in files that contain invalid molecules, which I ran into while trying to read viscosities.

Desired solution

Wrap the call below, or the individual smiles call within it, in a try/except:

compound = _Compound.from_xml_node(node, namespace)

Output

Computing environment (please complete the following information):

  • Operating system
  • Output of running conda list

Additional context

FWIW I don't think (~70% sure) that fixing this issue will make the Sage filtering 100% reproducible, but... less unreproducible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant