Docify classifies PDFs into ten categories: Legal, Medical, Finance, Education, Business, News, Technical, Creative, Scientific, and Government. It uses PyTesseract to extract text from PDFs and a Naive Bayes classifier built with scikit-learn (sklearn) to assign each document a category. This makes documents easier to organize and retrieve, and supports decision-making and workflow automation across industries.
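Naive Bayes scores each category by combining how common that category is in the training data with how often each word of the document appears in that category's training text. The sketch below is a minimal pure-Python version of multinomial Naive Bayes with add-one (Laplace) smoothing, written only to illustrate the technique. It is not Docify's code: the class name `MultinomialNaiveBayes`, the `tokenize` helper, and the three toy training sentences are hypothetical. In scikit-learn the same kind of model is usually built from `CountVectorizer` and `MultinomialNB`.

```python
# Hypothetical illustration of multinomial Naive Bayes; not Docify's implementation.
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNaiveBayes:
    """Bag-of-words Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per category
        self.word_counts = defaultdict(Counter)      # category -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_scores(self, text):
        """Return log P(category) + sum of log P(word | category) for each category."""
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for category, n_cat in self.class_counts.items():
            score = math.log(n_cat / n_docs)  # log prior
            denom = self.total_words[category] + self.alpha * vocab_size
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[category][token]
                    score += math.log((count + self.alpha) / denom)
            scores[category] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


train_texts = [
    "the court ruled on the contract dispute and the plaintiff appealed",
    "patient diagnosis and treatment plan from the hospital clinic",
    "quarterly revenue profit and stock market investment report",
]
train_labels = ["Legal", "Medical", "Finance"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
print(model.predict("the plaintiff filed a contract appeal in court"))
```

Scores are kept as log-probabilities so that multiplying many small per-word probabilities does not underflow. A real classifier would be trained on many labeled documents per category rather than one sentence each.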
Technologies used in the project include Python, scikit-learn (sklearn), and PyTesseract.
Here are some of the project's key features:

Feature | Description |
---|---|
Text Classification 📚 | The model classifies text into predefined categories, such as Legal, Medical, and Finance, based on its content. |
PDF to Text Conversion 📄➡️📝 | The application can convert PDF files uploaded by users into text format, allowing the model to analyze the content. |
Custom Category Order 🧩 | The model uses a custom category order defined by the user, allowing for flexibility in how different categories are prioritized and displayed. |
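PyTesseract works on images, so a scanned PDF has to be rendered page by page before OCR can read it. The README does not say how Docify performs this step; the sketch below assumes the `pdf2image` package (which needs the Poppler utilities) for rendering and `pytesseract.image_to_string` for OCR, which in turn needs the Tesseract engine installed on the system. The function name `pdf_to_text` is hypothetical, and both converters can be passed in so the page-joining logic can be exercised without either dependency.

```python
# Hypothetical sketch of PDF-to-text conversion via OCR; not Docify's code.
from typing import Any, Callable, Iterable, Optional


def pdf_to_text(
    pdf_path: str,
    to_images: Optional[Callable[[str], Iterable[Any]]] = None,
    ocr: Optional[Callable[[Any], str]] = None,
) -> str:
    """Render each PDF page to an image, OCR it, and join the page texts."""
    if to_images is None:
        # Assumption: pdf2image (plus Poppler) is used to rasterize pages.
        from pdf2image import convert_from_path
        to_images = convert_from_path
    if ocr is None:
        # PyTesseract wraps the Tesseract OCR engine, installed separately.
        import pytesseract
        ocr = pytesseract.image_to_string
    # One OCR call per page; page texts are separated by a newline.
    return "\n".join(ocr(page) for page in to_images(pdf_path))
```

With the dependencies installed, `pdf_to_text("contract.pdf")` would return a single string containing the OCR text of every page, which can then be passed to the classifier.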

The categories and their emojis:

Category | Emoji |
---|---|
Legal | ⚖️ |
Medical | 🏥 |
Finance | 💰 |
Education | 📚 |
Business | 🏢 |
News | 📰 |
Technical | 💻 |
Creative | 🎨 |
Scientific | 🧪 |
Government | 🏛️ |
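A user-defined category order can be applied when presenting results. The sketch below assumes the classifier produces a score per category (for example, a probability) and lists those scores in the order the user chose, each with its emoji from the table above. `CATEGORY_EMOJI` and `display_in_order` are hypothetical names, not taken from the Docify source.

```python
# Hypothetical illustration of a custom category order; names are not from Docify.
CATEGORY_EMOJI = {
    "Legal": "⚖️",
    "Medical": "🏥",
    "Finance": "💰",
    "Education": "📚",
    "Business": "🏢",
    "News": "📰",
    "Technical": "💻",
    "Creative": "🎨",
    "Scientific": "🧪",
    "Government": "🏛️",
}


def display_in_order(scores, order):
    """Format category scores following the user's order; categories without a score are skipped."""
    lines = []
    for category in order:
        if category in scores:
            emoji = CATEGORY_EMOJI.get(category, "")
            lines.append(f"{emoji} {category}: {scores[category]:.2f}")
    return lines


scores = {"Legal": 0.7, "Finance": 0.2, "Medical": 0.1}
for line in display_in_order(scores, ["Finance", "Legal", "Medical"]):
    print(line)
```

Here the order controls only presentation; categories that appear in `scores` but not in `order` are left out, which is one possible design choice among several.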
1. Clone the repository and move into it:

   ```bash
   git clone https://github.com/Shobhit141141/Docify.git
   cd Docify
   ```

2. Install the required libraries:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the project:

   ```bash
   python app.py
   ```