Feature request: fermentation products #144
We have never actually parsed the Bergey's full text like this, only abstracts. Why did you select Table 3 instead of Table 2, or both? Could we focus just on the tables? There are PDF table parsers that, depending on how the PDF is built, might be able to automate this. Otherwise it will be a Natural Language Processing project.
@lwaldron I don't think every chapter is formatted consistently enough that Table 1 or Table 2 would always contain fermentation products. I highlighted several different examples because the information isn't always in the same place. I haven't thought much about how to solve this, since I didn't realize you were working on it until we talked today, but it might be possible to curate at least a medium-to-large chunk just by parsing tables that mention "fermentation products".
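Since the relevant table number varies by chapter, one way to find candidate rows is to search every extracted table for the phrase instead of relying on a fixed table number. The sketch below assumes the tables have already been pulled out of a chapter PDF by some table parser (pdfplumber's `Page.extract_tables()`, for instance, returns each table as a list of rows of cell strings, where cells can be `None`) and that the trait name sits in the first column. `find_keyword_rows`, the table labels, and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: tables already extracted from one chapter PDF,
# keyed by a label such as "Table 2". Each table is a list of rows, each row
# a list of cell strings (None where the parser could not read a cell).
Tables = Dict[str, List[List[Optional[str]]]]


def normalize(cell: Optional[str]) -> str:
    """Lower-case a cell and collapse the line breaks and repeated spaces
    that PDF extraction often leaves inside cells."""
    return " ".join((cell or "").split()).lower()


def find_keyword_rows(
    tables: Tables, keyword: str = "fermentation products"
) -> List[Tuple[str, int]]:
    """Return (table label, row index) for every row whose first cell
    contains the keyword, whatever the table's number."""
    keyword = normalize(keyword)
    hits: List[Tuple[str, int]] = []
    for label, rows in tables.items():
        for i, row in enumerate(rows):
            # Assumption: the characteristic name is in the first column.
            if row and keyword in normalize(row[0]):
                hits.append((label, i))
    return hits


# Invented toy data, not taken from Bergey's.
example: Tables = {
    "Table 2": [["Characteristic", "Species A", "Species B"],
                ["Motility", "+", "-"]],
    "Table 3": [["Characteristic", "Species A", "Species B"],
                ["Fermentation\nproducts", "acetate, butyrate", "lactate"]],
}
print(find_keyword_rows(example))  # [('Table 3', 1)]
```

Matching only the first cell would miss transposed tables where taxa are rows and traits are columns; if some chapters are laid out that way, checking every cell of the header row instead is a small change.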
Yeah, at the level of tables, a semi-automated process could look something like this (not necessarily three people, but just to show how it could be broken up; Person 1 would probably need to be more of a microbiome expert than Persons 2 and 3):
If you start noting tables you'd like to see ingested, I'll put them on a priority list for as soon as I can find someone with bandwidth. It doesn't have to be just fermentation products; once we're pulling those from a table like this, we might as well pull the other rows too.
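Pulling every row of a matched table, rather than only the fermentation row, amounts to flattening it into one record per taxon and characteristic. A minimal sketch, assuming the header row holds taxon names after its first cell and each later row starts with a characteristic name; `table_to_records` and the toy table are hypothetical.

```python
from typing import List, Optional, Tuple

Row = List[Optional[str]]


def clean(cell: Optional[str]) -> str:
    """Collapse line breaks and repeated spaces left by PDF extraction."""
    return " ".join((cell or "").split())


def table_to_records(rows: List[Row]) -> List[Tuple[str, str, str]]:
    """Flatten a characteristic-by-taxon table into
    (taxon, characteristic, value) records."""
    # Assumption: row 0 is the header and its cells after the first are taxa.
    taxa = [clean(c) for c in rows[0][1:]]
    records: List[Tuple[str, str, str]] = []
    for row in rows[1:]:
        if not row:
            continue
        characteristic = clean(row[0])
        for taxon, cell in zip(taxa, row[1:]):
            value = clean(cell)
            if value:  # skip blank cells instead of recording empty values
                records.append((taxon, characteristic, value))
    return records


# Invented toy table, not taken from Bergey's.
toy = [
    ["Characteristic", "Species A", "Species B"],
    ["Motility", "+", "-"],
    ["Fermentation\nproducts", "acetate, butyrate", None],
]
for record in table_to_records(toy):
    print(record)
```

A long format like this keeps fermentation products and every other characteristic in one shape, so rows from differently sized tables can be concatenated before curation.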
I would like to take a shot at being Person 2 and Person 3! I just need Person 1 to direct me to the tables, since I'm not a microbiome expert.
@JonathanYe3 let me know what you need when you get around to this.
Thanks @kbeckenrode! All I need is a list of the tables we want to parse and clean up. Can @lwaldron confirm whether this PDF (https://drive.google.com/file/d/1L1sgkZClTp3NjDW3RgAnZvzx4fiobyCG/view?usp=sharing) is the only one we're working with?
@JonathanYe3 we'd like to capture any table that describes fermentation. Let me know if you need help.
Yup, thank you! I think Eric Yu is working hard on this project, so best of luck to him.
It would be very useful to be able to pull glucose fermentation products out of Bergey's data, for example because human microbiome-related publications regularly claim "X organism is a butyrate producer", and this is only sometimes correct.
Here's a linked example of a Bergey's chapter (https://drive.google.com/file/d/1L1sgkZClTp3NjDW3RgAnZvzx4fiobyCG/view?usp=sharing). I've highlighted some of the different ways this data is displayed for various bugs. You need to download it to see the highlights (sorry); the Google Drive viewer does not show them. Parsing out this data would be challenging, but it would be really useful to include as a phenotype in the future if possible.
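One reason to parse the products rather than store raw cell text is to check claims like "X organism is a butyrate producer". A minimal sketch, assuming a products cell lists names separated by commas, semicolons, or "and"; `parse_products`, `produces`, and the acid-to-anion renaming rule are hypothetical heuristics, not a curation standard.

```python
import re
from typing import List

# Separators assumed to appear between product names within one table cell.
SEPARATORS = re.compile(r"\s*(?:,|;|\band\b)\s*", re.IGNORECASE)


def parse_products(cell: str) -> List[str]:
    """Split a fermentation-products cell into normalized product names."""
    products = []
    for part in SEPARATORS.split(cell.strip().rstrip(".")):
        name = part.strip().lower()
        if not name:
            continue
        # Heuristic: "butyric acid" -> "butyrate", "lactic acid" -> "lactate".
        name = re.sub(r"ic acid$", "ate", name)
        products.append(name)
    return products


def produces(cell: str, product: str) -> bool:
    """True if the (already normalized) product name appears in the cell."""
    return product.lower() in parse_products(cell)


print(parse_products("Acetic acid, butyric acid and lactic acid."))
# ['acetate', 'butyrate', 'lactate']
print(produces("acetate; succinate", "butyrate"))  # False
```

This only covers products listed in table cells; products described in running text would still need the Natural Language Processing route mentioned earlier in the thread.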