This project is an AI-powered web scraper that leverages LangChain, GroqCloud, and the Llama 3.1 model to scrape, parse, and analyze content from web pages. The app is built with Streamlit, containerized with Docker, and deployed on Vercel for scalability and ease of access.
- Uses LangChain for conversational and generative AI capabilities.
- Integrates with GroqCloud and the Llama 3.1 model for powerful language understanding.
- Scrapes and parses web content based on user input on Streamlit.
- Deployed using Docker for containerized execution on Vercel.
```
.
├── Dockerfile          # Docker setup for the app
├── main.py             # Main entry point of the Streamlit app
├── parse.py            # Logic for handling user input and parsing web content
├── requirements.txt    # Python dependencies
├── vercel.json         # Configuration for deployment on Vercel
└── .env                # Environment variables (API keys, etc.)
```
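The exact implementation of parse.py is not shown here; the project lists beautifulsoup4 and lxml in its requirements for this job. As a rough, dependency-free illustration of the clean-and-chunk step such a parser typically performs before handing text to the LLM, here is a stdlib sketch (the names `extract_text` and `split_into_batches` are hypothetical, not taken from the repository):

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside skipped tags
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Strip markup and return the page's visible text, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)


def split_into_batches(text: str, max_chars: int = 6000) -> list[str]:
    """Split cleaned text so each batch fits in the model's context window."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

In the real app, BeautifulSoup (with the lxml or html5lib backend) would replace the hand-rolled parser, and each batch would be passed to the Llama 3.1 model via langchain_groq.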
To run this project, you need the following dependencies, listed in `requirements.txt`:
```
groq
streamlit
langchain
langchain_core
langchain_groq
selenium
beautifulsoup4
lxml
html5lib
python-dotenv
```
The project uses environment variables stored in a `.env` file. Ensure you add your keys for GroqCloud and other services as needed.

Example `.env` file:
```
GROQ_API_KEY=your_groq_api_key
# or
HUGGINGFACEHUB_API_TOKEN=your_HuggingFace_API_TOKEN
```
- Clone the repository:

  ```
  git clone https://github.com/your-repo/ai-web-scraper.git
  cd ai-web-scraper
  ```
- Set up the environment:
  - Create a `.env` file with your API keys as shown above.
- Install the dependencies:

  ```
  pip install -r requirements.txt
  ```
- Run the Streamlit app:

  ```
  streamlit run main.py
  ```
To containerize the app, a Dockerfile is provided. This Dockerfile installs all necessary dependencies, sets up ChromeDriver for scraping, and runs the Streamlit app.
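The Dockerfile itself is not reproduced here, but based on the description above (Python dependencies, ChromeDriver for Selenium, Streamlit entry point) it likely resembles the following sketch; the base image and Debian package names are assumptions, and the actual file may differ:

```dockerfile
FROM python:3.11-slim

# Chromium and a matching driver for Selenium-based scraping
RUN apt-get update && apt-get install -y --no-install-recommends \
        chromium chromium-driver \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]
```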
```
docker build -t ai-web-scraper .
docker run -p 8501:8501 ai-web-scraper
```
Once the container is running, the app will be accessible at http://localhost:8501 on the host, since the `-p 8501:8501` flag publishes the container's port.
For production, the project is deployed to Vercel using the Docker setup above; see `vercel.json` for the deployment configuration.