Matching gene synonyms now produces false positives #70
Comments
hdac-6, a synonym for hda-6 that is mentioned 37 times in 56039, is now returned.
I still see false positives. OK, I see now that this is on the to-do list. @valearna, let us know when we need to retest this.
We still need to discuss this issue before I can modify the code again, @draciti. The new version of the pipeline extracts a gene if any of its synonyms are mentioned in the text, even if the gene's main name is never mentioned. I thought this was what we wanted, but maybe I got it wrong. If we want to recognize genes and their synonyms as separate entities, what should we do when the text mentions both the main gene name and its synonyms?
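To make the behavior under discussion concrete, here is a minimal sketch of name matching with and without synonyms. It is not the pipeline's actual code: the `extract_genes` function, the `GENES` dictionary, and the gene ID `GENE_HDA6` are hypothetical placeholders, and only the hda-6/hdac-6 pair comes from this thread.

```python
import re

# Hypothetical stand-in for the gene name list plus the gin_synonyms table.
# "GENE_HDA6" is a placeholder ID, not a real WBGene identifier.
GENES = {
    "GENE_HDA6": {"name": "hda-6", "synonyms": ["hdac-6"]},
}


def extract_genes(text, genes, use_synonyms):
    """Return the IDs of genes whose name (or, optionally, a synonym) occurs in text."""
    found = set()
    for gene_id, entry in genes.items():
        terms = [entry["name"]]
        if use_synonyms:
            terms += entry["synonyms"]
        for term in terms:
            # Require that the term is not embedded in a longer word or
            # hyphenated token, so "hda-6" does not match inside "xhda-6".
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            if re.search(pattern, text):
                found.add(gene_id)
                break
    return found


text = "We measured hdac-6 expression in adult worms."
with_synonyms = extract_genes(text, GENES, use_synonyms=True)
without_synonyms = extract_genes(text, GENES, use_synonyms=False)
```

In this sketch the gene is returned with `use_synonyms=True` even though its main name never appears in the text, which is the case described above. With `use_synonyms=False` nothing is returned.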
We decided to remove synonyms from the list.
I'm reopening this issue as it looks like the ACKnowledge pipeline might still be extracting genes by recognition of gene name synonyms. See, for example, WBPaper00065553, where we extracted M4 (WBGene00249437).
Looking for matches to gene synonyms from the gin_synonyms table now produces false positives in the gene lists.
For the six papers in testing:
56039 - TOR (WBGene00002583)
56066 - Ny (WBGene00003066)
56066 - SL2 (WBGene00004836)
56066 - eT1 (WBGene00006772)
56075 - B1 (WBGene00002053)
56080 - JP (WBGene00002179)
56080 - M3 (WBGene00249437)
56091 - no false positives
56095 - CAT (WBGene00000831)
56095 - M3 (WBGene00249437)
I still need to double-check if looking for synonyms is retrieving true positive genes that we missed before.
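One way to approach that double-check is to compare, per paper, the genes extracted with and without synonym matching, and then review only the difference by hand. The sketch below assumes the two result sets are already available as dictionaries keyed by paper ID. The function name `synonym_only_hits`, the example data, and the placeholder ID `GENE_X` are illustrative and not taken from the pipeline.

```python
def synonym_only_hits(with_synonyms, without_synonyms):
    """Per paper, list the genes found only when synonym matching is enabled."""
    return {
        paper: sorted(genes - without_synonyms.get(paper, set()))
        for paper, genes in with_synonyms.items()
    }


# Illustrative input: "GENE_X" is a hypothetical gene matched by its main name.
with_syn = {"56039": {"WBGene00002583", "GENE_X"}}
without_syn = {"56039": {"GENE_X"}}

to_review = synonym_only_hits(with_syn, without_syn)
```

Each gene left in `to_review` was extracted through a synonym alone, so it is either a false positive like the ones listed above or a true positive that was missed before.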