Explain how literatures are classified. Classify literature according to model organisms #26
Comments
On 2016-09-22 Chris wrote:
Hi Michael,
I'm not exactly sure what you're looking for, but if you're looking for keywords to pull out papers for those species I guess I would suggest the following:
for drosophila: Drosophila, Drosophila melanogaster, D. melanogaster, fruit fly
for C. elegans: C. elegans, Caenorhabditis elegans, Caenorhabditis
for arabidopsis: Arabidopsis, A. thaliana, Arabidopsis thaliana
for mouse: Mus musculus, musculus, murine, mouse, mice
for zebrafish: zebrafish, Danio rerio, D. rerio
As for other model organisms of interest to the curation and biomedical community:
budding yeast: Saccharomyces cerevisiae
fission yeast: Schizosaccharomyces pombe
slime mold: Dictyostelium discoideum
Norway brown rat: Rattus norvegicus
black rat: Rattus rattus
sea squirt: Ciona intestinalis
African clawed frog: Xenopus laevis
Western clawed frog: Xenopus tropicalis
Bacteria: Escherichia coli (E. coli), Bacillus subtilis (B. subtilis)
There's an extensive list on Wikipedia: https://en.wikipedia.org/wiki/List_of_model_organisms
I hope that helps,
Chris
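A keyword scan over subject, title and journal name using the lists above could be sketched roughly as follows. This is only an illustration, not the code TPC actually uses: the dictionary layout and the names `ORGANISM_KEYWORDS` and `classify` are assumptions, and matching is case-insensitive on whole words so that, for example, "mice" does not fire inside a longer word.

```python
import re

# Keyword lists taken from the suggestions above. The dictionary layout
# and all names here are illustrative assumptions, not TPC's actual code.
ORGANISM_KEYWORDS = {
    "drosophila": ["Drosophila", "Drosophila melanogaster", "D. melanogaster", "fruit fly"],
    "C. elegans": ["C. elegans", "Caenorhabditis elegans", "Caenorhabditis"],
    "arabidopsis": ["Arabidopsis", "A. thaliana", "Arabidopsis thaliana"],
    "mouse": ["Mus musculus", "musculus", "murine", "mouse", "mice"],
    "zebrafish": ["zebrafish", "Danio rerio", "D. rerio"],
}


def _compile(keywords):
    # Escape each keyword (several contain "."), try longer ones first,
    # and require that the match is not embedded in a longer word.
    alternatives = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)", re.IGNORECASE)


PATTERNS = {organism: _compile(kws) for organism, kws in ORGANISM_KEYWORDS.items()}


def classify(*fields):
    """Return the sorted list of organisms whose keywords occur in any field."""
    text = " ".join(f for f in fields if f)
    return sorted(org for org, pattern in PATTERNS.items() if pattern.search(text))


if __name__ == "__main__":
    print(classify("Gene knockout in mice and zebrafish", "Some Journal"))
```

A paper can land in more than one literature this way, and a keyword hit only shows that the organism is mentioned, not that the paper reports experiments on it.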
Hi Michael,
I wasn't quite sure what you meant by 'subject' in your email, but here are a few other thoughts for classifying literature by organism.
Paper titles and abstracts are probably reasonably good sources of organism names, with all the usual caveats about false positives (e.g. organism is mentioned but the paper does not contain experiments about it) and false negatives (e.g. authors mention mouse but also do an experiment in a human cell line that they don't mention).
For papers that have already been indexed by PubMed, the MESHHeadingList and ChemicalList tags in the XML could also be used.
The list of organisms on the GO annotation downloads page may be helpful, since it indicates organisms for which there was at least sufficient interest to generate GO annotations: http://geneontology.org/page/download-annotations
Other microbial species that might be of interest are listed in Table 1 of this WormBook chapter: http://www.wormbook.org/chapters/www_intermicrobpath/intermicrobpath.html
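For already-indexed papers, pulling the MeSH headings and chemicals out of a PubMed XML record might look like the sketch below. It assumes the element names used in PubMed's XML export (`MeshHeadingList`/`DescriptorName` and `ChemicalList`/`NameOfSubstance`); the `SAMPLE` record is a hand-written, heavily trimmed fragment for illustration, not a real download, and the function names are made up.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed fragment for illustration only;
# a real PubMed record carries many more elements and attributes.
SAMPLE = """
<PubmedArticle>
  <MedlineCitation>
    <ChemicalList>
      <Chemical><NameOfSubstance>Pigments, Biological</NameOfSubstance></Chemical>
    </ChemicalList>
    <MeshHeadingList>
      <MeshHeading>
        <DescriptorName>Grasshoppers</DescriptorName>
        <QualifierName>genetics</QualifierName>
      </MeshHeading>
      <MeshHeading><DescriptorName>Phenotype</DescriptorName></MeshHeading>
      <MeshHeading><DescriptorName>Phylogeny</DescriptorName></MeshHeading>
    </MeshHeadingList>
  </MedlineCitation>
</PubmedArticle>
"""


def mesh_descriptors(xml_text):
    """Return the MeSH descriptor names of a record, in document order."""
    root = ET.fromstring(xml_text)
    return [d.text.strip() for d in root.iter("DescriptorName") if d.text]


def chemicals(xml_text):
    """Return the substance names listed under ChemicalList."""
    root = ET.fromstring(xml_text)
    return [c.text.strip() for c in root.iter("NameOfSubstance") if c.text]


if __name__ == "__main__":
    print(mesh_descriptors(SAMPLE))
    print(chemicals(SAMPLE))
```

The descriptor names can then be fed to the same keyword matching used for titles and abstracts.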
More from Chris and Michael: Michael: Chris:
We could also solicit input from the various MODs here, as most groups probably have PubMed keyword searches that get them a reasonably good list of papers, or at least could help us avoid any obvious pitfalls.
In the nxml files there is a 'subject' line, but I don't know what
M.
On 09/27/2016 10:59 AM, vanaukenk wrote:
Hi Michael, I wasn't quite sure what you meant by 'subject' in your email, but
Paper titles and abstracts are probably reasonably good sources of
For papers that have already been indexed by PubMed, the
The list of organisms on the GO annotation downloads page may be
http://geneontology.org/page/download-annotations
Other microbial species that might be of interest are listed in Table
http://www.wormbook.org/chapters/www_intermicrobpath/intermicrobpath.html
Okay, thanks. I was looking at the XML display for papers in PubMed and searching the page for 'Subject' but couldn't find anything there. For example:
I attached an example. Open it in a text editor and search for the
M.
On 09/27/2016 11:17 AM, vanaukenk wrote:
Okay, thanks. I was looking at the XML display for papers in PubMed
Interesting example. Comparing the nxml subjects: with the MESHHeadings:
Grasshoppers
Phenotype
Phylogeny
Pigmentation
Polymorphism, Genetic
Population Dynamics
Survival Analysis
there isn't much, if any, overlap, and the actual organism, grasshoppers, is only represented in the MESHHeadingList.
I don't know how representative this example is, but it suggests that a union of the nxml subjects with the MESH terms might be the best source for mining subjects for literature classification.
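The union idea could be sketched as below, assuming the nxml files follow the usual JATS layout in which subjects sit in `<subject>` elements under `<article-categories>`. The `NXML_SAMPLE` fragment and its subject values are invented for illustration (the subjects of the attached example are not reproduced in this thread), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented fragment for illustration; the subject values are placeholders,
# not the subjects of the example paper discussed above.
NXML_SAMPLE = """
<article>
  <front>
    <article-meta>
      <article-categories>
        <subj-group>
          <subject>Research Article</subject>
          <subject>Evolutionary Biology</subject>
        </subj-group>
      </article-categories>
    </article-meta>
  </front>
</article>
"""


def nxml_subjects(nxml_text):
    """Collect the text of every <subject> element in an nxml document."""
    root = ET.fromstring(nxml_text)
    return {"".join(s.itertext()).strip() for s in root.iter("subject")}


def combined_terms(nxml_text, mesh_headings):
    """Union of nxml subjects and MeSH headings, as one set of terms to mine."""
    return nxml_subjects(nxml_text) | {h.strip() for h in mesh_headings}


if __name__ == "__main__":
    mesh = ["Grasshoppers", "Phenotype", "Phylogeny"]
    print(sorted(combined_terms(NXML_SAMPLE, mesh)))
```

With the union, an organism that appears only in the MeSH headings, as grasshoppers does here, still reaches the classifier.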
Well, in this case the animal is mentioned in the title. I don't
M.
On 09/27/2016 11:57 AM, vanaukenk wrote:
Interesting example. Comparing the nxml subjects: with the MESHHeadings:
Grasshoppersgenetics Phenotype Phylogeny Pigmentation Polymorphism, Genetic Population Dynamics Survival Analysis
there isn't much, if any overlap, and the actual organism,
I don't know how representative this example is, but it suggests that
Yes, I see the point about not wanting to have to go to a third source. |
Transferring from email thread to this ticket.
Michael wrote on 2016-09-15:
There was a request to classify the PMCOA corpus according to more
topics than just Biology, Medicine, Genetics, Genomics etc, i.e.,
according to model organisms. While ultimately we will use an SVM for
that, right now I am scanning subject, title and journal name for
keywords to classify the corpus and make what's known in TPC as
literatures. So I would appreciate if you could contribute keywords for
those fields (subject, article title and journal name) for the following
organisms:
drosophila
C. elegans
arabidopsis
mouse
zebrafish
If you can think of any other model organism that would be of interest
to the curation and biomedical community, let me know.
Michael.