A little text annotation tool I made for training named entity recognition models

KW-M/NER_Annotate

Named Entity Annotate

A little tool I made for annotating text to train spaCy's (or any) Named Entity Recognition model.

⚠️ Named Entity Annotate is generally functional; however, the frontend has issues with complex text selection. I don't have the time to resolve these issues, so I've put this project on hold indefinitely. Please do fork it and open a pull request if you want to take a stab at fixing them!

Docs

Install

  1. Clone or download this repo
  2. Copy the repo folder into your Python project
  3. Import modules from Named_Entity_Annotate

Getting Started

Named_Entity_Annotate's main function is Server.run()

from Named_Entity_Annotate import Server, Generators

source_generator = Generators.parse_json_string(Generators.from_folder('examples'))

save_callback = Generators.save_line_to_file('output.json')

Server.run(
  avalable_entitiy_labels=["PRODUCT","org","GpE","LOC","MONEY","TIME",],
  next_example_generator=source_generator,
  save_example_callback=save_callback
)

Recipes for common functionality

Sources

  • Get examples from a folder of plain text files:

    source_generator = Generators.make_empty_ent_dict_with_text(Generators.from_folder('examples_folder'))
  • Get examples from a file where each line is a plain text example:

    source_generator = Generators.make_empty_ent_dict_with_text(Generators.from_file('examples_list_file.txt'))
  • Get examples from a folder where each file is already JSON, formatted like the data format below:

    source_generator = Generators.parse_json_string(Generators.from_folder('examples_folder'))

Methods

Source Generators

Note: Server.run() expects each source example to already be a formatted dict, so every source generator here needs to be wrapped in a modifier generator.

  • See the recipes for common functionality above for some examples.

  • Get examples from a folder of plain text files:

    source_generator = Generators.make_empty_ent_dict_with_text(Generators.from_folder('examples_folder'))

Save Example Callbacks

  • Save examples to a folder

    save_callback = Generators.save_as_file_in_folder('annotated_examples_folder',output_file_extension="json")
  • Save each example as JSON on its own line of a file

    save_callback = Generators.save_line_to_file('annotated_examples_list.json')
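
To use the annotations later, the line-per-example file has to be read back. The sketch below assumes each non-empty line of the output file holds one complete JSON document, as the description above suggests. `load_annotations` is a hypothetical helper, not part of Named_Entity_Annotate.

```python
import json
from typing import Dict, List


def load_annotations(path: str) -> List[Dict]:
    """Read a file with one JSON document per line (assumed layout)."""
    examples = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                examples.append(json.loads(line))
    return examples
```

Calling `load_annotations('annotated_examples_list.json')` would then return a list with one dict per saved example, provided the file really has that layout.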

Architecture

Named_Entity_Annotate follows a basic pipeline model. When the web app requests a new example, the server calls the pipeline, which ends at the source generator. The source generator returns the next example as text or JSON, and each generator modifier can modify the example and pass it on. The last generator modifier returns the fully formed JSON to the Named_Entity_Annotate server, which sends it to the web app:

[Annotator WebApp]--< Generator Modifier(Previous Modifier Return Value) << Generator Modifier(Source Generator Return Value) << Source Generator()
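
The chain above can be sketched with plain Python generators. This is only an illustration of the pattern, not the library's own code: the function names are hypothetical, and the `text` and `entities` keys are placeholders, since the actual data format is not shown here.

```python
import json
from typing import Dict, Iterable, Iterator


def source_generator() -> Iterator[str]:
    """Hypothetical source: yields one raw JSON string per example."""
    yield '{"text": "Apple opened a store in Paris."}'
    yield '{"text": "The invoice totals $40."}'


def parse_json(upstream: Iterable[str]) -> Iterator[Dict]:
    """Hypothetical modifier: turns each JSON string into a dict."""
    for raw in upstream:
        yield json.loads(raw)


def add_empty_entities(upstream: Iterable[Dict]) -> Iterator[Dict]:
    """Hypothetical modifier: ensures an (assumed) 'entities' key exists."""
    for example in upstream:
        example.setdefault("entities", [])
        yield example


# Each modifier wraps the previous stage, so the source runs last in the
# call chain and nothing is read until the consumer asks for an example.
pipeline = add_empty_entities(parse_json(source_generator()))
first_example = next(pipeline)
```

Because generators are lazy, each `next()` call pulls exactly one example through every stage, which matches the one-example-per-request behaviour described above.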

When you finish annotating an example in the browser, the web app sends the JSON back to the Python server, which calls your save callback with the JSON data.

[Annotator WebApp]--> Save Callback(Saved Data in the JSON format mentioned above)
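
A custom save callback could look roughly like the sketch below. It assumes the callback is invoked with a single JSON-serializable dict per annotated example; the real call signature is not documented here, so treat that as an assumption. `make_jsonl_saver` is a hypothetical name.

```python
import json
from typing import Callable, Dict


def make_jsonl_saver(path: str) -> Callable[[Dict], None]:
    """Build a callback that appends each example to `path` as one JSON line."""

    def save(example: Dict) -> None:
        # Assumption: `example` is the annotated data as a plain dict.
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(example) + "\n")

    return save
```

Opening the file in append mode on every call keeps earlier annotations intact if the server is restarted partway through a session.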

Source Generators

Source Modifiers
