Matching gene synonyms now produces false positives #70

Open
vanaukenk opened this issue Mar 14, 2019 · 5 comments

@vanaukenk
Collaborator

Looking for matches to gene synonyms from the gin_synonyms table now produces false positives in the gene lists.

For the six papers in testing:

56039 - TOR (WBGene00002583)

56066 - Ny (WBGene00003066)
56066 - SL2 (WBGene00004836)
56066 - eT1 (WBGene00006772)

56075 - B1 (WBGene00002053)

56080 - JP (WBGene00002179)
56080 - M3 (WBGene00249437)

56091 - no false positives

56095 - CAT (WBGene00000831)
56095 - M3 (WBGene00249437)

I still need to double-check if looking for synonyms is retrieving true positive genes that we missed before.
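
One way to do that double-check, sketched under assumptions: diff the per-paper gene lists from a run without synonym matching against a run with it, and review only the genes that are new. The function name, the dictionary layout, and the `GENE_X` placeholder are hypothetical; the two WBGene IDs for 56095 are the ones listed above.

```python
def newly_returned(before, after):
    """Return, per paper, the genes that appear only in the `after` run."""
    return {
        paper: sorted(set(genes) - set(before.get(paper, [])))
        for paper, genes in after.items()
    }


# Invented example data: GENE_X stands in for any gene found by both runs.
before = {"56091": ["GENE_X"], "56095": ["GENE_X"]}
after = {
    "56091": ["GENE_X"],
    "56095": ["GENE_X", "WBGene00000831", "WBGene00249437"],
}

for paper, genes in newly_returned(before, after).items():
    print(paper, genes)
```

Each gene in the diff is either a false positive or a true positive that was missed before, so this is the set a curator would need to look at.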

@vanaukenk
Collaborator Author

hdac-6, synonym for hda-6 and mentioned 37 times in 56039, is now returned.

@draciti
Collaborator

draciti commented Mar 19, 2019

I still see false positives. OK, I see now that this is on the to-do list. @valearna, let us know when we need to retest this.

@valearna
Collaborator

valearna commented Mar 19, 2019

We still need to discuss this issue before I can modify the code again, @draciti. The new version of the pipeline extracts a gene if its synonyms are mentioned in the text, even if the gene itself is never mentioned. I thought this was what we wanted, but maybe I got it wrong. If we want to recognize genes and their synonyms as separate entities, what should we do when a paper mentions both the main gene and its synonyms?
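
To make the question concrete, here is a minimal sketch of the behaviour described above. It is not the pipeline's actual code: the `SYNONYMS` layout, the `extract_genes` function, and the `use_synonyms` flag are all assumptions, and the sample sentence is invented. Only the hda-6/hdac-6 pair and the M3 synonym for WBGene00249437 come from this thread.

```python
import re

# Hypothetical lookup of primary identifier -> synonyms.
SYNONYMS = {
    "hda-6": ["hdac-6"],
    "WBGene00249437": ["M3"],
}


def extract_genes(text, use_synonyms):
    """Map each matched gene to the names that matched and their counts."""
    found = {}
    for gene, synonyms in SYNONYMS.items():
        names = [gene] + (synonyms if use_synonyms else [])
        for name in names:
            # Match whole tokens only, treating '-' as part of a gene name.
            pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
            count = len(re.findall(pattern, text))
            if count:
                found.setdefault(gene, {})[name] = count
    return found


# Invented sentence in which M3 is not meant as a gene name.
text = "Loss of hdac-6 affected the M3 neuron."
print(extract_genes(text, use_synonyms=True))
print(extract_genes(text, use_synonyms=False))
```

With synonyms enabled, both genes are returned although neither primary name occurs in the text; with synonyms disabled, nothing is returned. Keeping the per-name counts is one possible answer to the last question: a gene matched only through a synonym can then be told apart from one matched through its main name.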

@valearna
Collaborator

We decided to remove synonyms from the list

@valearna valearna closed this as completed Apr 2, 2019
@valearna valearna transferred this issue from another repository Jun 28, 2019
@vanaukenk
Collaborator Author

I'm reopening this issue as it looks like the ACKnowledge pipeline might still be extracting genes by recognition of gene name synonyms.

See, for example, WBPaper00065553, where we extracted M4 (WBGene00249437).

@vanaukenk vanaukenk reopened this Oct 23, 2024