Web Crawler with Selenium and BeautifulSoup

This Python script uses Selenium and BeautifulSoup to crawl a website. It saves the HTML content of each page it visits and collects the links on that page for further crawling.
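The per-page work described above (render, save, collect links) could be sketched roughly as follows. This is a minimal illustration and not the script's actual code: the helper names page_filename, save_page, extract_links, and main, the file-naming scheme, the pages output directory, and the example.com URL are all assumptions made for the example.

```python
from pathlib import Path
from urllib.parse import urljoin, urlparse


def page_filename(url: str) -> str:
    """Map a URL to a flat file name (hypothetical naming scheme)."""
    parsed = urlparse(url)
    path = parsed.path.strip("/").replace("/", "_") or "index"
    return f"{parsed.netloc}_{path}.html"


def save_page(driver, url: str, out_dir: Path) -> str:
    """Load url in the browser, write the rendered HTML to disk, return it."""
    # get() returns once the page has loaded; content fetched later by
    # JavaScript may additionally need an explicit wait.
    driver.get(url)
    html = driver.page_source  # the DOM as serialized by the browser
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / page_filename(url)).write_text(html, encoding="utf-8")
    return html


def extract_links(html: str, base_url: str) -> list[str]:
    """Return an absolute URL for every <a href=...> found in html."""
    # Imported here so the helpers above can be used without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


def main() -> None:
    """Example run against a placeholder URL; needs Chrome, selenium, and bs4."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        url = "https://example.com/"  # placeholder start page
        html = save_page(driver, url, Path("pages"))
        for link in extract_links(html, url):
            print(link)
    finally:
        driver.quit()  # always release the browser process
```

Nothing runs on import. Calling main() opens a headless Chrome session, writes one file under pages, and prints the links found on the page.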

Key Features

Dynamic content handling: pages are loaded in a real browser, so content that requires JavaScript to render is included in the crawl.
Automated browser interactions using Selenium.
Extraction and preservation of each page's HTML content.
Collection of hyperlinks from web pages for recursive crawling.
HTML parsing with BeautifulSoup.
Crawl management that avoids visiting the same page more than once.
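The recursive crawling and redundancy avoidance listed above can be illustrated with a short sketch. The README does not say how the script orders or limits its crawl, so the breadth-first order, the same-host restriction, the max_pages cap, and the names normalize and crawl are assumptions. The fetch and extract_links arguments are caller-supplied callables, for example thin wrappers around Selenium and BeautifulSoup.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def normalize(url, base):
    """Resolve url against base and drop any #fragment."""
    absolute, _fragment = urldefrag(urljoin(base, url))
    return absolute


def crawl(start_url, fetch, extract_links, max_pages=100):
    """Breadth-first crawl restricted to the host of start_url.

    fetch(url) returns the page HTML; extract_links(html) returns the raw
    href values found in it. Returns a dict mapping each visited URL to
    its HTML, in visiting order.
    """
    host = urlparse(start_url).netloc
    start = normalize(start_url, start_url)
    queue = deque([start])
    seen = {start}  # every URL ever queued, so nothing is queued twice
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        for href in extract_links(html):
            link = normalize(href, url)
            # Skip other hosts and non-web links such as mailto:, which
            # have an empty network location.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Recording a URL in seen when it is queued, rather than when it is fetched, keeps a page that is linked from many places from entering the queue more than once. Stripping fragments means /page#top and /page count as the same page.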

Prerequisites

Make sure you have the following installed on your system:

Python 3.x
Pipenv (install it with pip install pipenv if it is not already installed)

Installation

Clone the repository: git clone
Navigate to the cloned project's directory: cd
Use Pipenv to install the dependencies and create a virtual environment:

pipenv install

Activate the Pipenv shell:

pipenv shell

Usage

Run the crawler using the following command within the Pipenv shell:

python main.py

braisdev/dynamic-full-web-crawler
