The Star Boy Is Llama 3.1 8b.
The Facebook Recently just like one or two or three days from todays date on which Repo was created (26 Th July 2024)
Just Download the .ipynb mentioned in the repo and then run on kaggle or collab or collab pro or collab pro plus or any other powerfull gpu. It is not recommended to run code on local enviroment untill and unless you have powerfull gpu for training.
Video Link:-) https://drive.google.com/file/d/1mtp7KsLF514EHg04jtZ__P0COeZ7q1ZW/view?usp=share_link
This repository contains a script for extracting text from PDF documents, indexing it hierarchically, and performing information retrieval and generation using NLP techniques. The script leverages several libraries including Hugging Face's Transformers, SpaCy, and BM25 for text processing and query generation.
The script performs the following tasks:
- Text Extraction from PDF: Extracts text from a PDF document.
- Hierarchical Tree-based Indexing: Processes and indexes the extracted text into a hierarchical structure based on chapters, sections, and subsections.
- Retrieval Techniques: Utilizes BM25 and Dense Passage Retrieval (DPR) models to find relevant documents based on a query.
- Retrieval Augmented Generation (RAG): Generates a response to a user query using the relevant documents as context.
To run the script, you'll need to install several Python packages. Follow the instructions below to set up your environment.
- Python 3.x
pip
(Python package installer)
Run the following commands to install the required libraries:
pip install torch
pip install --upgrade transformers
pip install accelerate huggingface_hub
pip install pymupdf
pip install spacy
python3 -m spacy download en_core_web_trf
pip install rank_bm25
pip install nltk
python3 -m nltk.downloader wordnet
pip install faiss-cpu
pip install --upgrade pip setuptools wheel
pip install bertopic --no-cache-dir
pip uninstall hdbscan -y
pip install hdbscan --no-cache-dir --no-binary :all: --no-build-isolation
### Hugging Face Authentication ###
from huggingface_hub import login
# Replace 'your_token_here' with your actual Hugging Face token
token = 'your_token_here'
login(token)