
New Modular Classifier #270

Open
wants to merge 7 commits into
base: master



@laerdon laerdon commented Feb 1, 2025

ConvoKit currently does not let clients use their own models, which matters most when those models are fine-tuned on the datasets they use with ConvoKit. The classifiers powering features such as politeness analysis and hypergraph representation are built on scikit-learn models, which are generally older and less robust than those available through the HuggingFace Transformers library. We aim to update ConvoKit to a more modular design that gives users a broader selection of models, so that they can bring their own models while keeping the ease of navigating conversational corpora that ConvoKit provides.

As of now, the Classifier class contains all functionality, including methods such as fit() and transform(). We aim to delegate that functionality to a ClassifierModel abstract class, which will be the type of the internal classification model, classifier_model.
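To make the intended delegation concrete, here is a minimal sketch of the pattern described above. Only the names Classifier, ClassifierModel, classifier_model, fit() and transform() come from this PR description; the method signatures, the simplified wrapper, and the toy MajorityClassModel backend are assumptions for illustration and may not match the actual code in this PR.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any, List, Sequence


class ClassifierModel(ABC):
    """Hypothetical interface; the PR's real signatures may differ."""

    @abstractmethod
    def fit(self, features: Sequence[Any], labels: Sequence[Any]) -> None:
        """Train the underlying model."""

    @abstractmethod
    def transform(self, features: Sequence[Any]) -> List[Any]:
        """Return one prediction per input item."""


class MajorityClassModel(ClassifierModel):
    """Toy backend (assumption): always predicts the most frequent training label."""

    def __init__(self) -> None:
        self.majority_label = None

    def fit(self, features, labels):
        # Remember the single most common label seen during training.
        self.majority_label = Counter(labels).most_common(1)[0][0]

    def transform(self, features):
        if self.majority_label is None:
            raise RuntimeError("fit() must be called before transform()")
        return [self.majority_label for _ in features]


class Classifier:
    """Simplified stand-in for ConvoKit's Classifier: it only delegates."""

    def __init__(self, classifier_model: ClassifierModel) -> None:
        self.classifier_model = classifier_model

    def fit(self, features, labels):
        self.classifier_model.fit(features, labels)
        return self

    def transform(self, features):
        return self.classifier_model.transform(features)
```

With this shape, swapping in a different backend (for example, one wrapping a fine-tuned Transformers model) would only require another ClassifierModel subclass; the wrapper itself stays unchanged.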

Tested on a local machine: fit() and transform() run successfully. More testing may be needed in a GPU-enabled environment.
An example is provided in convokit/examples/classifier/modular-classifier-example.ipynb.
This change deprecates pred_feats, an attribute of Classifier. Users are now expected to produce their own torch Dataset containing this information. The change also deprecates the evaluate_with_cv and evaluate_with_train_test_split methods.
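As a rough illustration of what replacing pred_feats with a user-built Dataset could look like, the sketch below wraps a list of per-utterance records in a map-style dataset. The class name UtteranceFeatureDataset, the record layout, and the feature and label names are all hypothetical; the format this PR actually expects is shown in the example notebook, not here. The import fallback exists only so the sketch runs without PyTorch installed.

```python
from typing import Dict, List, Sequence, Tuple

try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch runs without PyTorch installed
    Dataset = object


class UtteranceFeatureDataset(Dataset):
    """Hypothetical map-style dataset yielding (feature vector, label) pairs."""

    def __init__(
        self,
        records: Sequence[Dict[str, float]],
        feature_names: Sequence[str],
        label_name: str,
    ) -> None:
        self.records = list(records)
        # These names play the role pred_feats used to play (assumption).
        self.feature_names = list(feature_names)
        self.label_name = label_name

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> Tuple[List[float], int]:
        record = self.records[idx]
        features = [float(record[name]) for name in self.feature_names]
        return features, int(record[self.label_name])


# Made-up records standing in for utterance metadata.
records = [
    {"has_please": 1, "has_thanks": 0, "is_polite": 1},
    {"has_please": 0, "has_thanks": 0, "is_polite": 0},
]
dataset = UtteranceFeatureDataset(records, ["has_please", "has_thanks"], "is_polite")
```

Any object with `__len__` and `__getitem__` works as a map-style dataset for `torch.utils.data.DataLoader`, so the same class can be batched once PyTorch is available.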
