Named entity recognition for proper names, based on Brazilian letters.

guilhermenoronha/NameAnonPT

Named Entity Recognition for Brazilian Portuguese.

This repository is a study of applying Named Entity Recognition (NER) to detect proper nouns in Portuguese.

The study was carried out on the CE-DOHS corpus (Corpus Eletrônico de Documentos Históricos do Sertão).

CE-DOHS was preprocessed and then annotated using Label Studio. The study aimed to measure the NER F-score of a BiLSTM-CRF deep learning model.

The whole process can be found in the Full Pipeline notebook.

The F-score obtained was 0.97.
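For readers unfamiliar with the metric, here is a minimal sketch of how an F-score can be computed from gold and predicted tags. It is only an illustration, not the code used in the Full Pipeline notebook: the function name `f_score`, the `PER` label, and the token-level counting are all assumptions, and the notebook may score at entity level or use a different tag scheme.

```python
def f_score(gold, predicted, label="PER"):
    """Token-level precision, recall and F1 for a single label.

    `label` is a hypothetical tag name for proper names; the tag set
    actually used in the annotated corpus may differ.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example with made-up tags (not taken from CE-DOHS).
gold = ["O", "PER", "PER", "O", "O", "PER"]
pred = ["O", "PER", "O", "O", "PER", "PER"]
precision, recall, f1 = f_score(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy example two name tokens are found, one is missed, and one is a false alarm, so precision, recall and F1 are all 2/3.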

How to use: Notebook version

  1. conda create -n NameAnonPT
  2. conda activate NameAnonPT
  3. pip install -r requirements.txt
  4. jupyter lab

How to use: Airflow Version

Note: this version requires Docker and Docker Compose.

  1. Go to the Airflow folder
  2. docker-compose up
  3. Access localhost:8080
  4. Run the cartasAnonPT DAG

Some screenshots are shown below:

DAG tree
