topic-autolabel

Given text data, topic-autolabel generates labels that classify each row into a set number of topics, with no labelled training data required.

Example usage:

First, install the package with pip: pip install topic_autolabel

# Labelling with supplied labels
from topic_autolabel import process_file
import pandas as pd

df = pd.read_csv('path/to/file')
candidate_labels = ["positive", "negative"]

# labelling column "review" with "positive" or "negative"
new_df = process_file(
    df=df,
    text_column="review",
    candidate_labels=candidate_labels,
    model_name="meta-llama/Llama-3.1-8B-Instruct",  # default model, pulled from the Hugging Face Hub
)
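
To try the call above without a dataset at hand, a small CSV with a "review" column is enough. The sketch below writes one using only Python's standard library. The file name reviews.csv, the sample rows, and the helper names are made up for illustration; none of them are part of topic_autolabel.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample rows. The only requirement taken from the example
# above is a text column named "review".
SAMPLE_REVIEWS = [
    "Great battery life, and the screen is sharp.",
    "Stopped working after two days. Very disappointed.",
    "Does the job, nothing special.",
]


def write_sample_csv(path):
    """Write the sample reviews to `path` as a one-column CSV with a header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["review"])
        writer.writeheader()
        for text in SAMPLE_REVIEWS:
            writer.writerow({"review": text})
    return path


def read_reviews(path):
    """Read the "review" column back, to confirm the file round-trips."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row["review"] for row in csv.DictReader(f)]


# Illustrative location: a file in the system temp directory.
csv_path = write_sample_csv(Path(tempfile.gettempdir()) / "reviews.csv")
```

The resulting file can be loaded with pd.read_csv(csv_path) and passed to process_file with text_column="review", as in the example above.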

Alternatively, you can label text completely unsupervised by omitting the candidate_labels argument:

from topic_autolabel import process_file
import pandas as pd

df = pd.read_csv('path/to/file')

# labelling column "review" with open-ended labels (works best when the dataset covers many topics)
new_df = process_file(
    df=df,
    text_column="review",
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    num_labels=5 # generate up to 5 labels for each of the rows
)

Ollama integration:

If you have an Ollama server running, you can pass the tag of the model you want to use for label generation as model_name.

from topic_autolabel import process_file
import pandas as pd

df = pd.read_csv('path/to/file')

# labelling column "review" with open-ended labels, using llama3.1 hosted with Ollama
# (llama3.1 must be running; run `ollama ps` to verify)
new_df = process_file(
    df=df,
    text_column="review",
    model_name="llama3.1",
    num_labels=5 # generate up to 5 labels for each of the rows
)
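
The `ollama ps` check mentioned above can also be done from Python before calling process_file. The sketch below queries Ollama's HTTP API, assuming the server listens on its default address http://localhost:11434 and that /api/ps returns the currently loaded models. The helper names are illustrative and not part of topic_autolabel, and how the package itself resolves model tags is not specified here.

```python
import json
import urllib.error
import urllib.request

# Assumption: Ollama is listening on its default host and port.
OLLAMA_URL = "http://localhost:11434"


def model_names(payload):
    """Extract model names from an Ollama model-list response."""
    return [m.get("name", "") for m in payload.get("models", [])]


def has_model(payload, tag):
    """True if `tag` matches a listed model exactly, or matches its name
    once the version suffix (the part after ":") is dropped."""
    for name in model_names(payload):
        if name == tag or name.split(":", 1)[0] == tag:
            return True
    return False


def fetch_running_models(base_url=OLLAMA_URL, timeout=2.0):
    """Return the parsed /api/ps response, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    payload = fetch_running_models()
    if payload is None:
        print("Ollama server not reachable")
    elif has_model(payload, "llama3.1"):
        print("llama3.1 is running")
    else:
        print("llama3.1 is not loaded; running models:", model_names(payload))
```

Keeping the response parsing separate from the network call means the matching logic can be checked without a live server.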
