-
Thank you for the kind words! The issue you are referencing describes a different feature from the one you are using. The correct documentation for your use case is guided topic modeling. I would highly advise checking that out, as it explains how the seed words affect the underlying embeddings.
-
Thanks for this great package! I am trying to use seeded BERTopic to find certain climate topics of interest in documents. The seed words have been compiled with knowledge from domain experts and have gone through several iterations. However, I am noticing that, keeping everything else the same, even relatively small changes to the seed words make the topic outputs look fairly different. For example, "renewable energy" might be the 3rd topic in one iteration and the 7th topic on the next run (with slightly different seed words), even though the change in seed words had nothing to do with "renewable energy". I have ~60 seed topics and I am getting 500-600 topics in the output.
Do you have insight into why this may be, or how to evaluate how changing the seed words directly changes the output? From my understanding, seeded topic modeling only affects the topic representation and not topic assignment. So I would imagine at least the top ~15 topics would not change, with perhaps different words describing the topics. But I am seeing the order of the top 15 topics change and some topics appear/disappear when modifying the seed words.
To put it another way, how do I know whether "renewable energy" is the 3rd topic because it is the 3rd most discussed, versus being 3rd because there are more seed words for "renewable energy" and the topic is oversampled?
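One way to make "how much did the output change" concrete is to diff the ranked topic lists of two runs directly. The helper below is a minimal sketch (not part of BERTopic — `rank_shift` and the example topic labels are made up for illustration): it reports which top-N topics appeared, which disappeared, and how many ranks each shared topic moved between runs.

```python
# Hypothetical helper for quantifying how the topic ordering shifts
# between two seeded BERTopic runs. Inputs are topic labels ordered
# by topic size, as you might extract from get_topic_info().

def rank_shift(run_a, run_b, top_n=15):
    """Compare the top-N topics of two runs.

    Returns (appeared, disappeared, displacement), where displacement
    maps each topic present in both top-N lists to the absolute number
    of ranks it moved.
    """
    top_a = run_a[:top_n]
    top_b = run_b[:top_n]
    appeared = [t for t in top_b if t not in top_a]      # new in run B
    disappeared = [t for t in top_a if t not in top_b]   # gone from run B
    displacement = {
        t: abs(top_a.index(t) - top_b.index(t))
        for t in top_a if t in top_b
    }
    return appeared, disappeared, displacement

# Made-up topic orderings from two runs with slightly different seed words.
run1 = ["transport", "agriculture", "renewable energy", "flooding", "policy"]
run2 = ["transport", "renewable energy", "agriculture", "wildfires", "policy"]

appeared, disappeared, moved = rank_shift(run1, run2, top_n=5)
print(appeared)                    # ['wildfires']
print(disappeared)                 # ['flooding']
print(moved["renewable energy"])   # 1
```

Running this once per seed-word edit (against a fixed baseline run) would show whether a change localized to one seed topic is perturbing unrelated parts of the ranking.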
Here is the code:
I realize answering this question exactly without seeing the data is difficult, but some insight into understanding the behavior would be very helpful. Thanks for your help, and please let me know if more information would be useful!