Stop word removal with vectorizer_model #12
Closed
zmarkofsky started this conversation in Ideas
Replies: 2 comments 2 replies
-
@zmarkofsky Great question. I'm happy to get into the design considerations I factored into TopicTuner, but my short answer would be: get the clustering you want, export to BERTopic, and do the stop word removal there. Am I missing something?
-
Closed due to inactivity.
-
Hey, so while using the package I noticed there appears to be no way to remove stop words from the topic model being trained. This causes problems: if the initial tuning is done with stop words included, and the stop words are later removed from the final BERTopic model via the .update_topics() method, the results end up being very different.
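To make the mismatch concrete, here is a toy sketch of how the top words for one cluster shift once stop words are filtered out. It ranks words by plain counts, which is only a stand-in for how BERTopic scores topic words, and the `STOP_WORDS` set, the `top_words` helper, and the sample documents are all made up for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop word list (illustration only).
STOP_WORDS = frozenset({"the", "a", "of", "and", "to", "is", "in"})


def top_words(docs, n=3, stop_words=frozenset()):
    """Return the n most frequent words across docs, skipping stop_words."""
    counts = Counter(
        word
        for doc in docs
        for word in doc.lower().split()
        if word not in stop_words
    )
    return [word for word, _ in counts.most_common(n)]


# Made-up documents standing in for a single cluster.
cluster_docs = [
    "the price of the stock is rising",
    "the stock market and the price of oil",
    "the price of oil is falling in the market",
]

with_stop_words = top_words(cluster_docs)
without_stop_words = top_words(cluster_docs, stop_words=STOP_WORDS)

print("with stop words:   ", with_stop_words)
print("without stop words:", without_stop_words)
```

With the filter off, "the" dominates the ranking. With it on, the list is made up of content words only, so any judgment made on the unfiltered words does not carry over.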
I'm unsure where the best place to fit this in would be, but if there were somewhere to pass in a vectorizer_model, similar to how it's done in the BERTopic documentation, I believe it would allow for much more robust topic generation using only the most meaningful words in the docs.
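For reference, this is roughly the BERTopic-side pattern I mean. It is a minimal sketch that assumes `bertopic` and `scikit-learn` are installed. The function names `build_topic_model` and `refresh_topics` are hypothetical helpers, and nothing here assumes that TopicTuner itself accepts a `vectorizer_model`.

```python
def build_topic_model(docs):
    """Fit a BERTopic model whose topic words exclude English stop words."""
    # Imported inside the function so the sketch can be loaded even when
    # the optional dependencies are not installed.
    from bertopic import BERTopic
    from sklearn.feature_extraction.text import CountVectorizer

    # CountVectorizer drops scikit-learn's built-in English stop words
    # before BERTopic computes the topic representations.
    vectorizer_model = CountVectorizer(stop_words="english")
    topic_model = BERTopic(vectorizer_model=vectorizer_model)
    topics, _probs = topic_model.fit_transform(docs)
    return topic_model, topics


def refresh_topics(topic_model, docs):
    """Recompute topic words on an already fitted model without stop words."""
    from sklearn.feature_extraction.text import CountVectorizer

    # update_topics only recalculates the topic representations;
    # docs must be the same documents the model was fitted on.
    topic_model.update_topics(
        docs, vectorizer_model=CountVectorizer(stop_words="english")
    )
    return topic_model
```

The first function removes stop words at training time, which is what I'd like to be able to do during tuning. The second is the after-the-fact .update_topics() route I'm using now.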