Bhavanish19/Ai_Web_Scraper

Ai_Web_Scraper

Provide a link, and the bot scrapes the page's data and answers any queries you pass about it.

Project Overview

The AI Web Scraper extracts data from web pages. Given a URL, it retrieves the page's content, parses it, and answers user queries about the extracted data. The project combines HTML parsing with a language model so that queries are answered interactively from the scraped content.
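The flow described above (retrieve, parse, answer) can be sketched as a small pipeline. This is a minimal illustration, not the repository's actual code: `split_into_chunks`, `answer_query`, and the `fetch`, `parse`, and `ask` callables are hypothetical names, and splitting the text into chunks is an assumption made here because language models accept a limited amount of input at once.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 6000) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def answer_query(
    url: str,
    query: str,
    fetch: Callable[[str], str],   # hypothetical: URL -> raw HTML
    parse: Callable[[str], str],   # hypothetical: raw HTML -> readable text
    ask: Callable[[str], str],     # hypothetical: prompt -> model reply
    max_chars: int = 6000,
) -> str:
    """Run fetch -> parse -> ask and join the non-empty per-chunk replies."""
    html = fetch(url)
    text = parse(html)
    answers = []
    for chunk in split_into_chunks(text, max_chars):
        # The prompt wording is illustrative only.
        prompt = (
            "Answer the question using only the page content below.\n\n"
            f"Page content:\n{chunk}\n\n"
            f"Question: {query}"
        )
        reply = ask(prompt).strip()
        if reply:
            answers.append(reply)
    return "\n".join(answers)
```

Passing the three stages in as callables keeps the sketch independent of any particular browser driver or model; in a setup like the one this README describes, `ask` might wrap a LangChain chain's `invoke` call.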

Features:

Data Extraction: Automates the process of extracting raw HTML from any webpage.

Data Parsing: Utilizes BeautifulSoup4 and lxml to parse the HTML content into a manageable format.

Query Handling: Uses Llama 3.1 (run locally through Ollama) and LangChain to answer queries based on the parsed data.

User Interface: Streamlit-based front end for easy interaction with the tool.
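The parsing feature listed above might look roughly like the following sketch, which assumes `beautifulsoup4` is installed. The function name `extract_visible_text` is hypothetical, and removing `script` and `style` elements is an assumption about what "manageable format" means here. The built-in `html.parser` backend is used so the example runs without extra packages; passing `"lxml"` as the `parser` argument selects the lxml backend named in this README.

```python
from bs4 import BeautifulSoup


def extract_visible_text(html: str, parser: str = "html.parser") -> str:
    """Return the readable text of an HTML document, one non-empty line per row."""
    soup = BeautifulSoup(html, parser)

    # Drop elements whose contents are code or styling rather than page text.
    for tag in soup(["script", "style"]):
        tag.decompose()

    # Put each text node on its own line, then trim and discard blank lines.
    text = soup.get_text(separator="\n")
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

For example, calling `extract_visible_text` on a page containing a heading and a paragraph returns those two pieces of text on separate lines, with any inline JavaScript and CSS removed.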

Technologies Used

Selenium: For automating web browser interaction to scrape data.

BeautifulSoup4 and lxml: For parsing HTML and XML documents.

Llama 3.1 (via Ollama): For processing and answering natural-language queries.

LangChain: To integrate AI and language processing capabilities.

Streamlit: For creating the front end, making it interactive and user-friendly.

python-dotenv: To manage environment variables.

html5lib: A standards-compliant library for parsing and serializing HTML documents.

ChromeDriver: Lets Selenium control Google Chrome.
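As an illustration of how Selenium and ChromeDriver fit together, the sketch below loads a page in Chrome and returns its HTML. It is an assumption-laden example rather than the project's own code: the function name `fetch_page_html` and the headless default are hypothetical, and it assumes Chrome is installed and that Selenium can find a matching driver (recent Selenium 4 releases can usually locate one themselves; older setups typically pass an explicit ChromeDriver path).

```python
def fetch_page_html(url: str, headless: bool = True) -> str:
    """Open url in Chrome via Selenium and return the rendered page source."""
    # Imported here so this module can be loaded even where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    if headless:
        # Run Chrome without opening a visible window.
        options.add_argument("--headless=new")

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

The `try`/`finally` block matters in practice: without it, a failed page load would leave a Chrome process running in the background.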

Results

Screenshot 2024-09-12 at 9 08 52 PM (2)

Screenshot 2024-09-12 at 9 09 16 PM

Screenshot 2024-09-12 at 9 09 45 PM
