Compound conversion: Add index to output #1189

Open
hechth opened this issue Jan 31, 2022 · 12 comments

Comments

@hechth
Contributor

hechth commented Jan 31, 2022

The compound conversion tool, which is part of the chemical toolbox, doesn't attach indices to the records it processes and silently drops invalid lines. This makes working with larger files problematic, because the output can no longer be associated with the inputs.

Is there a way to add indices to the files to indicate which output belongs to which input or is the only option to run collections and have one identifier per job?

@bgruening
Owner

@hechth are you talking about this tool? https://github.com/bgruening/galaxytools/blob/master/chemicaltoolbox/openbabel/ob_convert.xml

Which input format are you using?

@hechth
Contributor Author

hechth commented Jan 31, 2022

@bgruening Indeed!

I'm using a plain list of InChI strings, i.e. what the tool calls the inchi format.

Some example data is attached.
inchi.zip

@bgruening
Owner

Can you try adding an additional column (https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/devteam/add_value/addValue/1.0.0) to the inchi file? Is that preserved by openbabel?

@hechth
Contributor Author

hechth commented Jan 31, 2022

I tried using 2 columns separated by a comma (,), but that didn't change anything in this history (https://umsa.cerit-sc.cz/u/hechth/h/compound-convert-test).

@bgruening
Owner

Try adding a new tab-separated column using the tool from above.

@hechth
Contributor Author

hechth commented Feb 1, 2022

Nope. I tried adding a column manually, with tabs, with commas, and with the Galaxy tool, but the result is always the same: no index in the output, and invalid data gets dropped silently.

@bgruening
Owner

Maybe @simonbray has an idea?
This tool simply uses openbabel, so if openbabel cannot deal with this, I think we are out of luck here.

@simonbray
Collaborator

Can you use a different file format? I think inchi is in general not a good choice for the input.

With smiles or sdf you can specify the index in the molecule name/title.
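
As a rough illustration of that idea, a SMILES line can hold the structure followed by whitespace and a free-text title, so an index can travel in the title field. The helper name `add_index_titles` and the `mol_` prefix below are made up for this sketch and are not part of the Galaxy tool.

```python
def add_index_titles(smiles_list, prefix="mol"):
    """Return SMILES lines whose title field carries a running index.

    Hypothetical helper: each line is "<SMILES><TAB><prefix>_<n>".
    """
    return [f"{smi}\t{prefix}_{i}" for i, smi in enumerate(smiles_list, start=1)]


lines = add_index_titles(["CCO", "c1ccccc1", "CC(=O)O"])
print("\n".join(lines))
```

Whether the title survives a given conversion depends on the output format, so this is worth checking on a small file first.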

@hechth
Contributor Author

hechth commented Feb 2, 2022

I explicitly want the inchi, since I want to compute smiles from inchi.

I also don't get why indexing is possible with SMILES and not with inchi? They're both just texts ...

@simonbray
Collaborator

> I also don't get why indexing is possible with SMILES and not with inchi? They're both just texts ...

What I meant is that SMILES has a name/title/label to which you can append an index.

> I explicitly want the inchi, since I want to compute smiles from inchi.

I think, as @bgruening said, we are limited by the underlying software. Maybe you can use a Galaxy workaround like this? https://usegalaxy.eu/u/sbray/h/inchi-index

@hechth
Contributor Author

hechth commented Feb 3, 2022

In this scenario the join works because the inchi doesn't change. But if we actually change specific parts of it, the strings are no longer identical, so the workaround doesn't work.
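
To make the limitation concrete, here is a minimal sketch of such a join, assuming it matches rows on the exact InChI string (the linked history is not reproduced here, so this is a guess at its logic). The function name `join_on_inchi` and the sample strings are illustrative only.

```python
def join_on_inchi(indexed_inputs, output_rows):
    """Re-attach input indices to output rows by exact InChI string match.

    indexed_inputs: list of (index, inchi)
    output_rows:    list of (inchi, value)
    Returns (matched, unmatched). Duplicate InChI strings in the input
    would collide in the lookup table; this sketch ignores that case.
    """
    index_by_inchi = {inchi: idx for idx, inchi in indexed_inputs}
    matched, unmatched = [], []
    for inchi, value in output_rows:
        if inchi in index_by_inchi:
            matched.append((index_by_inchi[inchi], inchi, value))
        else:
            unmatched.append((inchi, value))
    return matched, unmatched


inputs = [(1, "InChI=1S/CH4/h1H4"), (2, "InChI=1S/H2O/h1H2")]
# The second output string was altered on purpose, so it no longer
# matches its input and the index cannot be recovered for it.
outputs = [("InChI=1S/CH4/h1H4", "C"), ("InChI=1/H2O/h1H2", "O")]

matched, unmatched = join_on_inchi(inputs, outputs)
print(matched)
print(unmatched)
```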

If I come up with a solution, should I just PR it here? Otherwise, I think I could solve our specific needs with a targeted tool.

Thank you very much for your support and for looking into this!
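
A targeted tool along those lines could convert line by line and keep the input line number, so failures are reported instead of dropped. The sketch below is only an outline under that assumption: `convert_with_index` and `toy_convert` are hypothetical names, and the toy lookup table stands in for a real converter.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, str, str]  # (input line number, converted value, status)


def convert_with_index(lines: Iterable[str], convert: Callable[[str], str]) -> List[Row]:
    """Convert each line separately and keep its 1-based line number.

    A line whose conversion raises is kept as a "failed" row rather
    than being skipped, so outputs stay aligned with inputs.
    """
    rows: List[Row] = []
    for idx, raw in enumerate(lines, start=1):
        try:
            rows.append((idx, convert(raw.strip()), "ok"))
        except Exception:
            rows.append((idx, "", "failed"))
    return rows


# Stand-in converter for the demo; it raises KeyError on unknown input.
# With Open Babel's Python bindings installed, something like
# pybel.readstring("inchi", s).write("smi") could be wrapped here instead
# (untested in this sketch).
TOY_TABLE = {"InChI=1S/CH4/h1H4": "C", "InChI=1S/H2O/h1H2": "O"}


def toy_convert(inchi: str) -> str:
    return TOY_TABLE[inchi]


rows = convert_with_index(
    ["InChI=1S/CH4/h1H4", "not an inchi", "InChI=1S/H2O/h1H2"], toy_convert
)
for idx, value, status in rows:
    print(f"{idx}\t{value}\t{status}")
```

Because every input line yields exactly one output row, the index also survives when the converted string differs from the input.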

@simonbray
Collaborator

Yes, PRs are always welcome, thanks!

3 participants