ConvoKit currently does not let users plug in their own models. This is a significant limitation when those models have been fine-tuned on the datasets the users analyze with ConvoKit. The classifiers behind features such as politeness analysis and hypergraph representation are built on scikit-learn models, which are generally older and less robust than those available through the HuggingFace Transformers library. This change moves ConvoKit toward a more modular design that gives users a broader selection of models, so they can bring their own while keeping the ease of navigating conversational corpora that ConvoKit provides.

At present, the Classifier class contains all functionality, including methods like `fit()` and `transform()`. We aim to delegate that functionality to a ClassifierModel abstract class, which will be the type of the internal classification model, `classifier_model`.

Tested on a local machine: `fit` and `transform` run successfully. More testing may be needed in a GPU-enabled environment.
An example is provided in `convokit/examples/classifier/modular-classifier-example.ipynb`.

This change deprecates `pred_feats`, an attribute of Classifier. Users are now expected to produce their own torch Dataset containing this information. It also deprecates the `evaluate_with_cv` and `evaluate_with_train_test_split` methods.
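
To make the delegation concrete, here is a minimal, dependency-free sketch of the pattern, not the actual implementation in this PR. Only the names `Classifier`, `ClassifierModel`, `classifier_model`, `fit`, and `transform` come from the description above; the method signatures, the plain-list inputs, and the toy `KeywordModel` are assumptions made for illustration, and the real classes operate on ConvoKit corpora and torch Datasets rather than lists of strings.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class ClassifierModel(ABC):
    """Hypothetical interface; the real ConvoKit signatures may differ."""

    @abstractmethod
    def fit(self, texts: Sequence[str], labels: Sequence[int]) -> None:
        ...

    @abstractmethod
    def transform(self, texts: Sequence[str]) -> List[int]:
        ...


class KeywordModel(ClassifierModel):
    """Toy stand-in for a user-supplied model (e.g. a fine-tuned transformer)."""

    def __init__(self) -> None:
        self.positive_words = set()

    def fit(self, texts, labels):
        # Remember every word seen in a positively labeled text.
        for text, label in zip(texts, labels):
            if label == 1:
                self.positive_words.update(text.lower().split())

    def transform(self, texts):
        # Predict 1 when a text shares any word with the positive vocabulary.
        return [
            int(bool(self.positive_words & set(text.lower().split())))
            for text in texts
        ]


class Classifier:
    """Simplified wrapper: holds a model and forwards fit/transform to it."""

    def __init__(self, classifier_model: ClassifierModel) -> None:
        self.classifier_model = classifier_model

    def fit(self, texts, labels):
        self.classifier_model.fit(texts, labels)
        return self

    def transform(self, texts):
        return self.classifier_model.transform(texts)


# Any ClassifierModel subclass can be swapped in without touching Classifier.
clf = Classifier(KeywordModel()).fit(["thanks so much", "go away"], [1, 0])
predictions = clf.transform(["thanks again", "leave now"])
```

With this shape, supporting a new backend means writing one `ClassifierModel` subclass; the wrapper and the code that calls it stay unchanged.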