Annotate a Dataset for NER Training #67

Open
grayJiaaoLi opened this issue May 28, 2024 · 0 comments
Labels
User Story Label for User Stories

Comments


grayJiaaoLi commented May 28, 2024

User story

  1. As a data engineer
  2. I want/need to prepare an annotated dataset for NER training
  3. So that the NER model can be trained on accurately tagged data

Acceptance criteria

  • Select a suitable number of Q&A pairs from HuggingFace

    • Start with 50-100 Q&A Pairs
  • Optional: Use tools like Doccano to tag entities according to the defined

  • Store the NER training dataset

    • Upload the NER-annotated data to a separate directory on HuggingFace
    • Ensure the annotated dataset can be used to train the NER model
  • The list should contain objects such as:

    • Entity types: Project_Name, Technology_Name, (Organization_Name), ...
    • Entities: Kubernetes, Docker, gRPC,...
    • Relationships: Depends_On, Complements, (Conflicts_with), ...
  • Here is an example:

    • "Example Text"
      • Project_Name: Kubernetes, ...
      • Technology_Name: Docker, gRPC, ...
      • (Organization_Name: Google, Red Hat, ...)
      • Relationship: ...
  • Store the list in a format that can be used for the NER model training

  • This part of the work does not have to be automated, but it can be
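
If the annotation is partly automated, the selection and storage steps above could be sketched roughly as follows. This is a minimal sketch, not the project's implementation: the `question`/`answer` field names, the `GAZETTEER` contents (taken from the examples in this issue), and the helper names are all assumptions. The output is a span-based JSONL layout (`text` plus a list of `[start, end, label]` entries) of the kind annotation tools such as Doccano work with, so the pre-tagged records can still be reviewed and corrected by hand. Relationships are not covered here.

```python
import json
import random
import re

# Hypothetical gazetteer built from the example entities in this issue.
GAZETTEER = {
    "Project_Name": ["Kubernetes"],
    "Technology_Name": ["Docker", "gRPC"],
    "Organization_Name": ["Google", "Red Hat"],
}


def sample_pairs(pairs, k=50, seed=0):
    """Pick up to k Q&A pairs reproducibly (k=50 matches the suggested start)."""
    rng = random.Random(seed)
    return rng.sample(pairs, min(k, len(pairs)))


def find_entities(text):
    """Return sorted [start, end, label] character spans for whole-word matches."""
    spans = []
    for label, names in GAZETTEER.items():
        for name in names:
            # Lookarounds avoid matching inside longer words.
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            for match in re.finditer(pattern, text):
                spans.append([match.start(), match.end(), label])
    return sorted(spans)


def to_jsonl(pairs):
    """Serialize pairs as one JSON object per line: text plus entity spans."""
    lines = []
    for pair in pairs:
        # Assumed field names; adjust to the actual dataset columns.
        text = pair["question"] + " " + pair["answer"]
        lines.append(json.dumps({"text": text, "label": find_entities(text)}))
    return "\n".join(lines)


example = {
    "question": "Does Kubernetes need Docker?",
    "answer": "Google says gRPC helps.",
}
record = json.loads(to_jsonl(sample_pairs([example])))
```

Slicing `record["text"]` with each span in `record["label"]` recovers the tagged entity, which is a quick way to check that the offsets are correct before uploading the file.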

Definition of done (DoD)

  • Bill of Materials in the planning document has been updated
  • All feature branches have been merged and closed
  • New feature code has been documented
  • Potential new licenses have been checked
  • All GitHub Actions are passing
  • The requirements.txt is updated

DoD general criteria

  • Feature has been fully implemented
  • Feature has been merged into the mainline
  • All acceptance criteria were met
  • Product owner approved features
  • All tests are passing
  • Developers agreed to release
@grayJiaaoLi grayJiaaoLi added the User Story Label for User Stories label May 28, 2024
@grayJiaaoLi grayJiaaoLi moved this to Product Backlog in amos2024ss08-feature-board May 28, 2024
@grayJiaaoLi grayJiaaoLi changed the title Annote a Dataset for NER Trainning Annotate a Dataset for NER Trainning May 28, 2024