Grounding Assist: Capture hints in paper #1148

Open
jvwong opened this issue Mar 20, 2023 · 1 comment

jvwong commented Mar 20, 2023

Description

Q: What is the name of the feature?

A: Grounding Assist

Q: What does this feature enable the user to do?

A: Indirectly, disambiguate a name for a bioentity (e.g. gene) more accurately

Q: What information must the user provide to use the feature?

A: (1) Article information (2) names of bioentities

Q: What are the applicable constraints, e.g. compatibility or performance?

A: There are three main cases to consider:

  1. Default: No prior information is available
  2. Bioentity database identifiers are available
  3. Species information is available
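
The three cases above can be sketched as a small dispatch on whatever prior information was extracted for an article. This is only an illustration: `ArticleHints`, `grounding_case`, and the rule that identifiers take precedence over species are assumptions, not part of the factoid codebase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArticleHints:
    """Hypothetical container for prior information extracted from an article."""
    entity_ids: List[str] = field(default_factory=list)    # e.g. "ncbigene:7157"
    organism_ids: List[str] = field(default_factory=list)  # e.g. NCBI Taxonomy "9606"


def grounding_case(hints: ArticleHints) -> str:
    """Return which of the three cases applies.

    Assumption: a database identifier is more specific than a species,
    so it is checked first.
    """
    if hints.entity_ids:
        return "identifiers"
    if hints.organism_ids:
        return "species"
    return "default"
```

For example, an article with only a species mention would fall under `grounding_case(ArticleHints(organism_ids=["9606"]))`, which selects the `"species"` case.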

Q: How does this feature affect each class of user (persona)?

A: Synonyms and orthologues account for a large proportion (30%) of observed errors. Hints could conceivably mitigate other types of errors as well (e.g. spelling issues) and enable features such as a true "type-ahead" autocomplete.

  • Uses

    • curation: in normalization
    • post-submission: in an automated error flagging system (even if not available at curation time)
    • triage: e.g. classifier to more accurately identify potential articles and authors
    • information extraction: e.g. context provided by authors
  • Users

    • Biologist: Eventually, better search across deposited data, better discovery
    • Editor: Increased quality and trust in the accuracy of Biofactoid data
    • Computational biologist: Increased fidelity of Biofactoid data, better data integration
    • Curator: Increased fidelity of Biofactoid curation

Specification

Sources of bioentity information

  • Considerations
    • Entity types
    • Consistent concepts (gene product, family)
    • Compatible Identifiers
    • Scope
    • Accuracy (curated vs NLP)
    • Format (file, web service)
    • Latency (seconds)
    • Hardware (GPU)
  • Providers
    • Curated
      • PubMed
    • Natural Language Processing
      • PubTator3
      • Reach

Scoring algorithm

To be determined. The algorithm should consider:

  1. Location: Prioritization based on mention in title vs abstract vs body
  2. Type: Local hint (e.g. entity database IDs) vs global (e.g. species)
  3. Reliability of source
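
Since the algorithm is still undecided, the following is only one possible shape for it: a multiplicative weight that combines the three considerations listed above. All names and numeric weights are placeholders chosen for illustration, not measured or agreed values.

```python
from dataclasses import dataclass

# Placeholder weights (assumptions): higher means the hint is trusted more.
SECTION_WEIGHT = {"title": 1.0, "abstract": 0.6, "body": 0.3}  # 1. location
SCOPE_WEIGHT = {"local": 1.0, "global": 0.5}                   # 2. type
SOURCE_RELIABILITY = {"curated": 1.0, "nlp": 0.7}              # 3. reliability


@dataclass(frozen=True)
class HintFeatures:
    """Hypothetical description of where a hint came from."""
    section: str  # "title", "abstract" or "body"
    scope: str    # "local" (e.g. entity database ID) or "global" (e.g. species)
    source: str   # "curated" or "nlp"


def hint_weight(features: HintFeatures) -> float:
    """Combine the three factors into a single weight in (0, 1]."""
    return (
        SECTION_WEIGHT[features.section]
        * SCOPE_WEIGHT[features.scope]
        * SOURCE_RELIABILITY[features.source]
    )
```

A multiplicative form means a weak value on any one factor lowers the overall weight; an additive or learned combination would be an equally valid starting point.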

Tasks

The factoid project should be responsible solely for obtaining bioentity hints for a given article:

    1. Define a Hint model
    2. PubTator3 Hint provider
    3. Organism Hint ranking
    4. General Hints API
    5. Retrieve and store Hints on create/update of Document
    6. Augment grounding-search query with Hints

At least for network curation, grounding-search should be responsible for scoring search hits in light of hints.
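
To make the division of labour concrete, here is a minimal sketch of tasks 1, 3, and 6: a Hint model, a frequency-based organism ranking, and a query augmented with that ranking. The field names, the ranking rule, and the query keys (`q`, `organismOrdering`) are assumptions for illustration; they should be checked against the actual factoid and grounding-search interfaces.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hint:
    """Hypothetical Hint model: one entity mention found in an article."""
    text: str         # mention as written in the article, e.g. "mouse"
    entity_type: str  # e.g. "organism" or "ggp" (gene or gene product)
    db: str           # identifier namespace, e.g. "taxonomy" or "ncbigene"
    xref_id: str      # identifier within that namespace, e.g. "10090"
    section: str      # where the mention occurred: "title", "abstract", ...


def rank_organisms(hints: List[Hint]) -> List[str]:
    """Order taxon IDs by how often each organism is mentioned (most first)."""
    counts = Counter(h.xref_id for h in hints if h.entity_type == "organism")
    return [taxon_id for taxon_id, _ in counts.most_common()]


def augment_query(name: str, hints: List[Hint]) -> Dict[str, object]:
    """Attach the organism ranking to a search query for a bioentity name.

    Scoring of the search hits stays with the search service; this function
    only forwards the hints.
    """
    return {"q": name, "organismOrdering": rank_organisms(hints)}
```

Usage, with made-up hints:

```python
hints = [
    Hint("mouse", "organism", "taxonomy", "10090", "title"),
    Hint("mice", "organism", "taxonomy", "10090", "abstract"),
    Hint("human", "organism", "taxonomy", "9606", "abstract"),
]
query = augment_query("p53", hints)
```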

References

  • Entity normalization
    • Chen, L., Liu, H. & Friedman, C. Gene name ambiguity of eukaryotic nomenclatures. Bioinformatics 21, 248–256 (2005)
    • Gyori, B. M. et al. Gilda: biomedical entity text normalization with machine-learned disambiguation as a service. Bioinform Adv 2, (2022)
    • Wei, C.-H. et al. GNorm2: an improved gene name recognition and normalization system. Bioinformatics 39, btad599 (2023)
  • Entity identification
    • Luo, L. et al. AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning. Bioinformatics 39, (2023)
  • Species
    • Pafilis, E. et al. The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text. PLoS ONE 8, e65390 (2013)
    • Wei, C.-H. et al. SR4GN: A Species Recognition Software Tool for Gene Normalization. PLoS ONE 7, e38460 (2012)
    • Luo, L. et al. Assigning species information to corresponding genes by a sequence labeling framework. Database 2022, baac090 (2022)
  • Applications
    • Wei, C.-H. et al. PubTator 3.0: an AI-powered Literature Resource for Unlocking Biomedical Knowledge. arXiv (2024)
jvwong changed the title from "Grounding accuracy: Provide a species hint" to "Grounding accuracy: Provide hints from paper" on Mar 20, 2024
jvwong changed the title from "Grounding accuracy: Provide hints from paper" to "Bioentity normalization accuracy: Store hints extracted from paper" on Mar 20, 2024
jvwong changed the title from "Bioentity normalization accuracy: Store hints extracted from paper" to "Assisting bioentity normalization: Capture hints in paper" on Mar 20, 2024
jvwong changed the title from "Assisting bioentity normalization: Capture hints in paper" to "Grounding Assist: Capture hints in paper" on Mar 20, 2024
fileoy pushed a commit that referenced this issue Jul 11, 2024
jvwong commented Aug 15, 2024
