Better tagger models (more heavyweight) #623

Open
dgarijo opened this issue Jan 25, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

dgarijo (Collaborator) commented Jan 25, 2024

Once the corpus has improved a bit, we should move towards language models. Training a BERT model to tag the text will probably give better results than the current classifiers. The model would be stored on Hugging Face and downloaded locally.

This would replace the current binary classifiers (at least the text taggers).
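
A minimal sketch of how such a tagger could be loaded and applied, assuming a fine-tuned BERT checkpoint has been published on the Hugging Face Hub. The model id `example-org/text-tagger-bert`, the label names, and the 0.5 threshold are placeholders for illustration, not existing artifacts. The `pipeline` helper from `transformers` downloads the checkpoint on first use and caches it locally.

```python
from typing import List

# Hypothetical model id: a placeholder, no such checkpoint is known to exist.
MODEL_ID = "example-org/text-tagger-bert"


def load_tagger(model_id: str = MODEL_ID):
    """Download (on first use) and load the classifier from the Hugging Face Hub."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline

    # top_k=None asks the pipeline to return a score for every label.
    return pipeline("text-classification", model=model_id, top_k=None)


def select_tags(scores: List[dict], threshold: float = 0.5) -> List[str]:
    """Keep the labels whose score reaches the threshold, highest score first."""
    kept = [s for s in scores if s["score"] >= threshold]
    kept.sort(key=lambda s: s["score"], reverse=True)
    return [s["label"] for s in kept]


def tag_text(tagger, text: str, threshold: float = 0.5) -> List[str]:
    """Tag one excerpt of text with zero or more labels."""
    # Passing a one-element list yields one list of {"label", "score"} dicts per input.
    scores = tagger([text])[0]
    return select_tags(scores, threshold)
```

Usage would be along the lines of `tag_text(load_tagger(), "Run pip install to get started")`. A single multi-label model like this could stand in for several binary classifiers, but whether it beats them depends on the corpus.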

dgarijo added the enhancement (New feature or request) label Jan 25, 2024

dgarijo (Collaborator, Author) commented Feb 29, 2024

Building some taggers with Ollama may be a good idea. However, the models it runs are heavy for now. Maybe not worth it?

dgarijo (Collaborator, Author) commented Apr 23, 2024

Especially with models like Llama 3 out, LLMs seem very promising for addressing some of these problems. See https://github.com/ollama/ollama, which lists the Llama 3 8B model at 4.7 GB. It will probably be much slower, though!
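
A rough sketch of what an Ollama-based tagger might look like, assuming an Ollama server is running locally on its default port and the `llama3` model has already been pulled. The label list and the prompt wording are hypothetical, chosen only for illustration. The request uses Ollama's `/api/generate` endpoint with streaming disabled, so the whole reply arrives as one JSON object.

```python
import json
import urllib.request
from typing import List, Optional

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical label set, for illustration only.
LABELS = ["description", "installation", "usage", "citation", "other"]


def build_prompt(text: str, labels: List[str] = LABELS) -> str:
    """Build a zero-shot classification prompt for one excerpt."""
    return (
        "Classify the following text excerpt into exactly one of these "
        f"categories: {', '.join(labels)}.\n"
        "Answer with the category name only.\n\n"
        f"Excerpt:\n{text}"
    )


def parse_label(reply: str, labels: List[str] = LABELS) -> Optional[str]:
    """Return the known label mentioned earliest in the reply, or None."""
    lowered = reply.lower()
    hits = [(lowered.find(label), label) for label in labels if label in lowered]
    return min(hits)[1] if hits else None


def tag_with_ollama(text: str, model: str = "llama3", timeout: float = 300.0) -> Optional[str]:
    """Ask a locally served model to tag one excerpt (needs a running Ollama server)."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(text), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the generated text is in the "response" field.
    return parse_label(body.get("response", ""))
```

Each excerpt costs one full generation call, which is where the expected slowdown compared with the current classifiers would come from. The generous timeout is a guess to allow for slow generation on CPU.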
