Make the posterior() stats available #79

aourednik · 2024-06-01T08:04:13Z

The initially wrapped package topicmodels made it possible to explore the topic mixture of every document in more detail with topicmodels::posterior(my_lda)$topics. Could this be made available for the result of seededlda::textmodel_lda()?

Given the probabilistic nature of topic-document associations, it would be nice to make students and the public aware that the topic assigned to a text is only the most prominent one in it, not the only one.

Example:

library(quanteda)  # convert()
library(stringr)   # str_replace(), fixed()
library(ggplot2)

# Fit a 6-topic LDA and extract the per-document topic probabilities
lda_model2 <- topicmodels::LDA(convert(my_dfm, to = "topicmodels"), k = 6)
doc_topics <- topicmodels::posterior(lda_model2)$topics
# data.frame() prefixes the numeric topic column names with "X"
df <- data.frame(doc_id = str_replace(row.names(doc_topics), fixed(".txt"), ""), doc_topics)
df_long <- tidyr::pivot_longer(df, cols = tidyr::starts_with("X"), names_to = "topic", values_to = "importance")
# One stacked bar per document, one segment per topic
ggplot(df_long, aes(x = importance, y = doc_id, fill = factor(topic))) +
	geom_col() +
	labs(x = "Topic Importance", y = "Document ID", fill = "Topic") +
	theme_minimal() +
	theme(axis.text.y = element_text(angle = 0, hjust = 1))
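
To make the point about "most present, not the only one" concrete, here is a minimal base-R sketch. The matrix is made up for illustration and is not the output of any fitted model. It only assumes a document-topic matrix of the shape returned by posterior()$topics, with one row per document and rows summing to 1.

```r
# Hypothetical document-topic matrix (illustrative values, not model output).
# Rows are documents, columns are topics; each row sums to 1.
doc_topics <- matrix(
  c(0.55, 0.30, 0.15,
    0.40, 0.35, 0.25),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc_a", "doc_b"), c("topic1", "topic2", "topic3"))
)

# The single most probable topic per document: all that one label conveys.
top_topic <- colnames(doc_topics)[max.col(doc_topics, ties.method = "first")]

# Share of each document that belongs to topics other than the top one.
remainder <- 1 - apply(doc_topics, 1, max)

data.frame(doc_id = rownames(doc_topics), top_topic, remainder)
```

In this toy case both documents get the same top topic, yet roughly half of each document is attributed to other topics. That is the information a single label hides.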
