Implemented BERTopic Model for Accurate Topic Segmentation in Agriculture Dataset (issue #291) #310

Open: wants to merge 1 commit into base: restructure

pmukesh31 commented Apr 14, 2024

Fixes #291

Aim:
Get an accurate list of topics (around 20 at most) for an agriculture dataset of roughly 20k unique queries, using BERTopic.

Description:

  • Implemented a BERTopic model to accurately segment the agriculture dataset into 20 distinct topics.
  • Used the 'questioninEnglish' column, which contains approximately 20,000 unique queries, for topic analysis.
  • Successfully generated 20 topics with BERTopic, which clusters contextual BERT-based embeddings of the queries.
  • Used HDBSCAN as BERTopic's clustering model.
  • Plotted an intertopic distance map alongside various other output graphs and bar charts.
  • Open to suggestions for improving topic cluster evaluation and enhancing the clustering process.
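For reference, a minimal sketch of how BERTopic can be given an explicit HDBSCAN clustering model and asked to reduce its output to roughly 20 topics. The helper name `build_topic_model` and the values `nr_topics=20` and `min_cluster_size=50` are illustrative assumptions, not the settings used in this PR; the sketch assumes the `bertopic` and `hdbscan` packages are installed.

```python
def build_topic_model(nr_topics=20, min_cluster_size=50):
    """Return an unfitted BERTopic model that clusters with HDBSCAN.

    Hypothetical helper; the default values are illustrative assumptions.
    Imports happen inside the function so the module can be imported even
    where the heavy dependencies are not installed.
    """
    from bertopic import BERTopic
    from hdbscan import HDBSCAN

    # HDBSCAN picks the initial number of clusters from embedding density;
    # a larger min_cluster_size tends to give fewer, broader clusters.
    hdbscan_model = HDBSCAN(
        min_cluster_size=min_cluster_size,
        metric="euclidean",
        cluster_selection_method="eom",
        prediction_data=True,  # needed to assign new documents after fitting
    )
    return BERTopic(
        hdbscan_model=hdbscan_model,
        nr_topics=nr_topics,  # merge similar topics down to about this many
    )
```

Usage would then be `topics, probs = build_topic_model().fit_transform(docs)` on the list of query strings. Tuning `min_cluster_size` controls how many clusters HDBSCAN finds before any reduction, while `nr_topics` merges them afterwards, so both are levers for hitting the "around 20 topics" target.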

Steps:

1) Read the CSV file and select the 'queryInEnglish' column.
2) Preprocess the data by removing stop words and commas.
3) Train BERTopic.
4) Visualize the results.
5) Save the model.
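The five steps can be sketched end to end as follows. This is a minimal, hypothetical outline, not the code in this PR: the helper names (`unique_queries`, `load_queries`, `preprocess`, `run_pipeline`), the small stop-word list, the output file names, and `nr_topics=20` are assumptions. Steps 1 and 2 use only the standard library; steps 3 to 5 assume `bertopic` is installed.

```python
import csv
from pathlib import Path

# Small illustrative stop-word list (an assumption, not the PR's list).
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "for", "how", "in", "is",
    "my", "of", "on", "the", "to", "what", "which",
})


def unique_queries(rows, column="queryInEnglish"):
    """Step 1 (core): keep unique, non-empty values of one column, in order."""
    seen = set()
    queries = []
    for row in rows:
        text = (row.get(column) or "").strip()
        if text and text not in seen:
            seen.add(text)
            queries.append(text)
    return queries


def load_queries(csv_path, column="queryInEnglish"):
    """Step 1: read the CSV file and collect the unique queries."""
    # utf-8-sig tolerates a byte-order mark at the start of the file.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return unique_queries(csv.DictReader(f), column)


def preprocess(text, stop_words=STOP_WORDS):
    """Step 2: lowercase, drop commas, and remove stop words."""
    tokens = text.replace(",", " ").lower().split()
    return " ".join(t for t in tokens if t not in stop_words)


def run_pipeline(csv_path, output_dir="bertopic_output", column="queryInEnglish"):
    """Steps 3-5: train BERTopic, write plots, and save the fitted model."""
    from bertopic import BERTopic  # heavy dependency, imported only when needed

    # Drop queries that become empty after stop-word removal.
    docs = [d for d in (preprocess(q) for q in load_queries(csv_path, column)) if d]

    # Step 3: BERTopic clusters with HDBSCAN by default; nr_topics=20 asks it
    # to merge similar topics down to about 20.
    topic_model = BERTopic(nr_topics=20)
    topics, _probs = topic_model.fit_transform(docs)

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Step 4: both methods return Plotly figures, saved here as HTML.
    topic_model.visualize_topics().write_html(str(out / "intertopic_distance_map.html"))
    topic_model.visualize_barchart(top_n_topics=20).write_html(str(out / "topic_barchart.html"))
    # Step 5: persist the fitted model.
    topic_model.save(str(out / "bertopic_model"))
    return topic_model, topics
```

Calling `run_pipeline("queries.csv")` (a hypothetical file name) would write the two HTML plots and the saved model into `bertopic_output/`. One design note: BERTopic's documentation generally suggests keeping stop words in the text that gets embedded and removing them only from the topic representations via `CountVectorizer(stop_words=...)`, since sentence-embedding models benefit from full context; step 2 is kept here to mirror the PR.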


pmukesh31 closed and reopened this pull request twice on Apr 14, 2024.
Gautam-Rajeev (Collaborator) left a comment

You can raise this separately, as its own repo, and link it here. This is not in the required ai-tools format.
