Below is the architecture diagram of the NLP-based drug review classification system:
The Streamlit web application for the NLP-based drug review classification system is available at the following link:
Patients Condition Classification and Drugs Recommendation
Below is a screenshot of the project’s website:
In this project, we developed a natural language processing (NLP) system to classify patient conditions based on drug reviews. The goal is to analyze patient reviews to predict their conditions and recommend suitable drugs. The primary conditions targeted are Birth Control, Depression, High Blood Pressure, and Diabetes.
- Dataset: Drug Review Dataset
- Contains patient reviews categorized into various conditions.
- Selected Conditions: Birth Control, Depression, High Blood Pressure, Diabetes.
- Bag of Words Model:
- Achieved an initial accuracy of 97% using the Bag of Words model with a Multinomial Naive Bayes classifier.
- TF-IDF Vectorization:
- Accuracy decreased at first after switching to TF-IDF vectorization.
- Model: Passive Aggressive Classifier.
- Accuracy: Increased to 98.20% once the TF-IDF features were paired with the Passive Aggressive Classifier.
- TF-IDF with Bigrams:
- Incorporated bigrams (pairs of adjacent words) to enhance feature extraction.
- Accuracy: Achieved 98.6%, a gain of 0.4 percentage points over TF-IDF without bigrams.
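The three configurations above can be sketched with scikit-learn pipelines. This is a minimal illustration, not the project's actual training code: the reviews below are invented stand-ins for the dataset, the pipeline names are hypothetical, and all hyperparameters are library defaults apart from `ngram_range` and an assumed `random_state`.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy reviews (assumption: stand-ins for the Drug Review Dataset).
reviews = [
    "this pill prevented pregnancy but caused spotting",
    "my birth control pill made my periods lighter",
    "my mood lifted and the sadness and anxiety eased",
    "still feeling hopeless and sad after six weeks",
    "my blood pressure dropped to normal readings",
    "blood pressure readings are lower but i feel dizzy",
    "my blood sugar and a1c are finally under control",
    "blood sugar levels stabilized after meals",
]
conditions = [
    "Birth Control", "Birth Control",
    "Depression", "Depression",
    "High Blood Pressure", "High Blood Pressure",
    "Diabetes", "Diabetes",
]

pipelines = {
    # 1. Bag of Words counts + Multinomial Naive Bayes
    "bow_nb": make_pipeline(CountVectorizer(), MultinomialNB()),
    # 2. TF-IDF (single words only) + Passive Aggressive Classifier
    "tfidf_pac": make_pipeline(
        TfidfVectorizer(),
        PassiveAggressiveClassifier(random_state=0),
    ),
    # 3. TF-IDF over single words and bigrams + Passive Aggressive Classifier
    "tfidf_bigram_pac": make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        PassiveAggressiveClassifier(random_state=0),
    ),
}

for pipe in pipelines.values():
    pipe.fit(reviews, conditions)

# With ngram_range=(1, 2), two-word features such as "blood pressure"
# become part of the vocabulary alongside the single words.
unigram_vocab = pipelines["tfidf_pac"].named_steps["tfidfvectorizer"].vocabulary_
bigram_vocab = pipelines["tfidf_bigram_pac"].named_steps["tfidfvectorizer"].vocabulary_
print(len(unigram_vocab), len(bigram_vocab))

prediction = pipelines["tfidf_bigram_pac"].predict(["my blood pressure is lower now"])
print(prediction[0])
```

On the real dataset, each pipeline would be fit on a training split and scored on a held-out split to obtain accuracy figures comparable to those reported above.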
- The Passive Aggressive Classifier with TF-IDF and bigrams achieved the highest accuracy, 98.6%.
- Confusion Matrix Analysis:
- Birth Control had the highest number of predictions, reflecting its prevalence in the dataset.
- Misclassifications were observed, such as some cases of Depression being predicted as Birth Control.
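To make the confusion matrix reading concrete, here is a small dependency-free sketch of how such a matrix is tallied. The labels and predictions are invented for illustration and do not reproduce the project's results; `confusion_matrix_counts` is a hypothetical helper (scikit-learn's `confusion_matrix` yields the same counts).

```python
from collections import Counter

LABELS = ["Birth Control", "Depression", "High Blood Pressure", "Diabetes"]


def confusion_matrix_counts(y_true, y_pred, labels=LABELS):
    """Entry [i][j] counts reviews whose true label is labels[i]
    and whose predicted label is labels[j]."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]


# Invented example: one Depression review is predicted as Birth Control.
y_true = ["Birth Control", "Birth Control", "Birth Control",
          "Depression", "Depression", "High Blood Pressure", "Diabetes"]
y_pred = ["Birth Control", "Birth Control", "Birth Control",
          "Birth Control", "Depression", "High Blood Pressure", "Diabetes"]

matrix = confusion_matrix_counts(y_true, y_pred)
for label, row in zip(LABELS, matrix):
    print(f"{label:>20}: {row}")
```

Rows are true conditions and columns are predicted conditions, so off-diagonal entries are misclassifications. A Depression review predicted as Birth Control appears in the Depression row, Birth Control column.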
- Streamlit: Deployed the model using Streamlit to provide an interactive, accessible user interface.
- Model Serialization:
- Joblib: Serialized the model and vectorizer for efficient deployment and integration across platforms, including web and mobile applications.
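The serialization step can be sketched as follows. This is an illustrative round trip under assumptions: the file names `vectorizer.joblib` and `model.joblib`, the toy training texts, and the `random_state` are all hypothetical and not taken from the project.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Invented toy data (assumption: stand-ins for the real reviews).
texts = [
    "my birth control pill made my periods lighter",
    "still feeling hopeless and sad after six weeks",
    "my blood pressure dropped to normal readings",
    "blood sugar levels stabilized after meals",
]
labels = ["Birth Control", "Depression", "High Blood Pressure", "Diabetes"]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
model = PassiveAggressiveClassifier(random_state=0)
model.fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp:
    vec_path = os.path.join(tmp, "vectorizer.joblib")  # hypothetical file name
    model_path = os.path.join(tmp, "model.joblib")     # hypothetical file name

    # Save both objects: the model is only usable with the fitted vectorizer.
    joblib.dump(vectorizer, vec_path)
    joblib.dump(model, model_path)

    # A deployed app would load both files once at startup.
    loaded_vectorizer = joblib.load(vec_path)
    loaded_model = joblib.load(model_path)

sample = ["my blood sugar is stable"]
original_pred = model.predict(vectorizer.transform(sample))
restored_pred = loaded_model.predict(loaded_vectorizer.transform(sample))
print(restored_pred[0])
```

The vectorizer has to be saved alongside the model because the classifier's weights are tied to the vocabulary and IDF values learned during fitting. Refitting a new vectorizer at load time would produce features that no longer line up with those weights.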
The project demonstrated that TF-IDF features with bigram analysis can classify patient conditions from drug reviews with high accuracy (98.6%). Combined with the Streamlit deployment, the model provides an interactive way to recommend suitable drugs based on patient reviews.
- Explore additional feature extraction methods.
- Improve model generalization to handle more diverse datasets.
- Enhance deployment capabilities for broader accessibility and integration.