TuPy Data Engineering

This repository contains the processes used to build TuPy, a Portuguese hate speech dataset. TuPy is an annotated corpus designed to support the development of hate speech detection models using machine learning (ML) and natural language processing (NLP) techniques. It combines datasets annotated by Fortuna et al. (2019), Leite et al. (2021), and Vargas et al. (2020) with 10 thousand previously unpublished annotated documents collected in 2023.

This repository is organized as follows:

root.
    ├── datasets 
    ├── figures
    ├── notebooks
    ├── models
    ├── src
    ├── LICENSE
    └── README.md

Quick start

Run the following command from the repository root:

bash INIT.sh

Alternatively, install Miniconda 3 and then run the following commands in order:

conda create -n tupi-env python=3.10
conda activate tupi-env
pip install poetry
poetry install
poetry run python -m nltk.downloader stopwords
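After the last command, it can be useful to confirm that the stopword lists are actually visible from inside the environment. The following is a minimal sketch, not part of the repository: the helper name `stopwords_available` is hypothetical, and the choice of `"portuguese"` as the language is an assumption based on the corpus being Portuguese.

```python
import importlib.util


def stopwords_available(language: str = "portuguese") -> bool:
    """Return True if NLTK is installed and its stopword list for `language` loads."""
    # Hypothetical helper for illustration; "portuguese" is an assumed default.
    if importlib.util.find_spec("nltk") is None:
        return False  # NLTK itself is not installed in this environment
    from nltk.corpus import stopwords

    try:
        return len(stopwords.words(language)) > 0
    except (LookupError, OSError):
        # LookupError: the stopwords corpus has not been downloaded.
        # OSError: the corpus exists but has no list for this language.
        return False


if __name__ == "__main__":
    print("stopwords ready:", stopwords_available())
```

If this prints `stopwords ready: False`, rerunning the `nltk.downloader` command inside the activated environment is the first thing to try.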

Acknowledgements

The TuPy project grew out of Felipe Oliveira's thesis and the work of several collaborators. It is funded by the Federal University of Rio de Janeiro (UFRJ) and the Alberto Luiz Coimbra Institute for Postgraduate Studies and Research in Engineering (COPPE).
