Natural language processing project for classifying patient conditions from drug reviews using TF-IDF and advanced machine learning models. Achieved 98.6% accuracy with bigrams and deployed the solution with Streamlit for easy access.

saurabhchavan7/Patients-Condition-Classification-and-Drugs-Recommendation

Patient Condition Classification from Drug Reviews

Architecture Diagram

Below is the architecture diagram of the NLP-based drug review classification system:

Architecture Diagram

Streamlit Application

The Streamlit web application for the NLP-based drug review classification system is available at the following link:

Patients Condition Classification and Drugs Recommendation

Website Screenshot

Below is a screenshot of the project’s website:

Website Screenshot

Project Overview

In this project, we developed a natural language processing (NLP) system to classify patient conditions based on drug reviews. The goal is to analyze patient reviews to predict their conditions and recommend suitable drugs. The primary conditions targeted are Birth Control, Depression, High Blood Pressure, and Diabetes.

Dataset

Methodology

Data Collection

  • Selected Conditions: Birth Control, Depression, High Blood Pressure, Diabetes.

Text Vectorization

  1. Bag of Words Model:

    • Achieved an initial accuracy of 97% using the Bag of Words model with a Multinomial Naive Bayes classifier.
  2. TF-IDF Vectorization:

    • Switching from Bag of Words to TF-IDF vectorization initially lowered accuracy.
    • Model: Passive Aggressive Classifier.
    • Accuracy: Rose to 98.20% once TF-IDF features were paired with the Passive Aggressive Classifier.
  3. TF-IDF with Bigrams:

    • Incorporated bigram analysis (two-word pairs) to enhance feature extraction.
    • Accuracy: Achieved 98.6%, a gain of 0.4 percentage points over the 98.20% obtained with TF-IDF alone.
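
To make the effect of bigrams concrete, the sketch below computes TF-IDF weights over unigrams and bigrams in plain Python. It is an illustration only, not the project's code: the three reviews are made up, `ngram_features` and `tfidf` are hypothetical helper names, and the weights use the textbook formula `tf * ln(N / df)`. A library vectorizer (for example scikit-learn's `TfidfVectorizer` with `ngram_range=(1, 2)`) would normally do this work and typically applies smoothing and normalization on top.

```python
import math
from collections import Counter


def ngram_features(text, max_n=2):
    """Return all word n-grams of length 1..max_n (unigrams first, then bigrams)."""
    tokens = text.lower().split()
    features = []
    for n in range(1, max_n + 1):
        features += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return features


def tfidf(docs, max_n=2):
    """Textbook TF-IDF: raw term count times ln(N / document frequency)."""
    per_doc_counts = [Counter(ngram_features(doc, max_n)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each feature occurs.
    doc_freq = Counter(feat for counts in per_doc_counts for feat in counts)
    idf = {feat: math.log(n_docs / df) for feat, df in doc_freq.items()}
    return [{feat: tf * idf[feat] for feat, tf in counts.items()}
            for counts in per_doc_counts]


# Made-up reviews, for illustration only.
reviews = [
    "no blood pressure problems on this pill",
    "my blood sugar dropped fast",
    "blood pressure is lower now",
]
weights = tfidf(reviews)

# "blood" occurs in every review, so it carries no weight on its own,
# while the bigram "blood sugar" is specific to the second review.
print(weights[1]["blood"], weights[1]["blood sugar"])
```

The unigram "blood" cannot separate a High Blood Pressure review from a Diabetes review, but the bigrams "blood pressure" and "blood sugar" can, which is one plausible reason bigram features help on this task.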

Model Performance

  • The Passive Aggressive Classifier with TF-IDF and bigrams achieved the highest accuracy, 98.6%.
  • Confusion Matrix Analysis:
    • Birth Control had the highest number of predictions, reflecting its prevalence in the dataset.
    • Misclassifications were observed, such as some cases of Depression being predicted as Birth Control.
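
The confusion matrix analysis above can be sketched as follows. This is a minimal illustration with made-up labels, not the project's evaluation code or results; `confusion_matrix` here is a hypothetical pure-Python helper in which rows are true conditions and columns are predicted conditions.

```python
from collections import Counter

LABELS = ["Birth Control", "Depression", "High Blood Pressure", "Diabetes"]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Made-up labels, for illustration only: one Depression review is
# misclassified as Birth Control.
y_true = ["Birth Control", "Birth Control", "Depression",
          "Depression", "Diabetes", "High Blood Pressure"]
y_pred = ["Birth Control", "Birth Control", "Birth Control",
          "Depression", "Diabetes", "High Blood Pressure"]

matrix = confusion_matrix(y_true, y_pred, LABELS)

# Accuracy is the diagonal (correct predictions) over all predictions.
accuracy = sum(matrix[i][i] for i in range(len(LABELS))) / len(y_true)

for label, row in zip(LABELS, matrix):
    print(f"{label:>20}: {row}")
print(f"accuracy = {accuracy:.3f}")
```

Off-diagonal cells show which conditions are confused with each other; in this toy example the cell in the Depression row and Birth Control column holds the single misclassification.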

Deployment

  • Streamlit: Deployed the model with Streamlit to provide an interactive, easily accessible user interface.
  • Model Serialization:
    • Joblib: Serialized the model and vectorizer for efficient deployment and integration across platforms, including web and mobile applications.
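
A minimal sketch of the serialization step, assuming scikit-learn and joblib are installed. The training reviews, the file names `vectorizer.joblib` and `model.joblib`, and the helper `predict_condition` are assumptions made for illustration; the repository's actual artifacts and code may differ.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Tiny made-up training set, for illustration only (not the project's data).
reviews = [
    "this pill stopped my periods and prevented pregnancy",
    "the pill gave me spotting but no pregnancy",
    "my mood lifted and the sadness faded after a month",
    "less anxiety and sadness since starting this antidepressant",
]
labels = ["Birth Control", "Birth Control", "Depression", "Depression"]

# Fit a unigram + bigram TF-IDF vectorizer and a Passive Aggressive Classifier.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
model = PassiveAggressiveClassifier(random_state=0)
model.fit(vectorizer.fit_transform(reviews), labels)

# Serialize both objects; the file names are hypothetical.
out_dir = tempfile.mkdtemp()
vec_path = os.path.join(out_dir, "vectorizer.joblib")
model_path = os.path.join(out_dir, "model.joblib")
joblib.dump(vectorizer, vec_path)
joblib.dump(model, model_path)

# At serving time, load both and reuse the fitted vocabulary.
loaded_vectorizer = joblib.load(vec_path)
loaded_model = joblib.load(model_path)


def predict_condition(review: str) -> str:
    """Vectorize one review with the loaded vectorizer and classify it."""
    features = loaded_vectorizer.transform([review])
    return str(loaded_model.predict(features)[0])


print(predict_condition("no pregnancy while on this pill"))
```

The vectorizer has to be saved alongside the model because the classifier only understands the feature columns learned during fitting. In a Streamlit app, a function like `predict_condition` could be called on the text a user enters and its result displayed on the page.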

Conclusion

The project shows that TF-IDF features with bigrams, paired with a Passive Aggressive Classifier, can classify patient conditions from drug reviews with 98.6% accuracy. Deploying the model with Streamlit makes it accessible as a basis for recommending suitable drugs from patient reviews.

Future Work

  • Explore additional feature extraction methods.
  • Improve model generalization to handle more diverse datasets.
  • Enhance deployment capabilities for broader accessibility and integration.
