This repository presents a study on applying Named Entity Recognition (NER) to detect proper nouns in Portuguese.
The study was carried out on the CE-DOHS corpus (Corpus Eletrônico de Documentos Históricos do Sertão).
CE-DOHS was preprocessed and then annotated using Label Studio. The study aimed to measure the NER F-score obtained with a BiLSTM-CRF deep learning model.
The whole process can be found in the Full Pipeline notebook.
The F-score obtained was 0.97.
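For readers unfamiliar with the metric, here is a minimal sketch of how an entity-level F-score can be computed from BIO-tagged sequences. It is an illustration only, not the code used in the notebook: the BIO tagging scheme, the `PER` label, the function names `extract_spans` and `f_score`, and the example tags are all assumptions, and the notebook may compute the score differently (for example at token level or with a library).

```python
def extract_spans(tags):
    """Return the set of (start, end, label) entity spans in a BIO tag sequence."""
    spans = set()
    start, label = None, None
    # A sentinel "O" at the end closes any span still open on the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        # The open span continues only on an "I-" tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        # Open a new span on "B-", or on a stray "I-" with no span open.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def f_score(gold, pred):
    """Entity-level F1: a predicted span counts only if boundaries and label match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two gold names, the model finds only the first one.
gold = ["B-PER", "I-PER", "O", "B-PER", "O"]
pred = ["B-PER", "I-PER", "O", "O", "O"]
print(round(f_score(gold, pred), 2))
```

In this example precision is 1.0 (the single predicted span is correct) and recall is 0.5 (one of two gold spans was found), so the F-score is the harmonic mean of the two.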
- conda create -n NameAnonPT
- conda activate NameAnonPT
- pip install -r requirements.txt
- jupyter lab
Note: this version requires Docker and Docker Compose.
- Go to the Airflow folder
- docker-compose up
- Access localhost:8080
- Run the cartasAnonPT DAG
Below are some screenshots: