A handy tool to extract URLs from a page and search within the crawled content.
- URL Queue Management
- Depth Limit
- HTML Parsing
- URL Extraction
- Content Search Module
- Robots.txt Compliance
- Error Handling
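As a rough illustration of how the queue management, depth limit, HTML parsing, URL extraction, and error handling features above can fit together, here is a minimal Python sketch. It is not the project's PHP code: the names `LinkExtractor`, `extract_urls`, `crawl`, and the `fetch` callable are all hypothetical, and robots.txt checks and database storage are left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (HTML parsing step)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_urls(base_url, html):
    """Return absolute http(s) URLs found in html, with fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    urls = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            urls.append(absolute)
    return urls


def crawl(seed_url, max_depth, fetch):
    """Breadth-first crawl from seed_url, following links up to max_depth.

    fetch is a hypothetical callable that takes a URL and returns its HTML.
    If it raises, the page is skipped.
    """
    queue = deque([(seed_url, 0)])  # URL queue of (url, depth) pairs
    seen = {seed_url}               # avoids enqueueing a URL twice
    pages = {}
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # error handling: skip pages that cannot be fetched
        pages[url] = html
        if depth >= max_depth:
            continue  # depth limit: do not follow links from this page
        for link in extract_urls(url, html):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; a dictionary lookup is enough to try it out without network access.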
XAMPP or any PHP + MySQL hosting service
- Clone the repository into the `htdocs` directory of XAMPP
- Go to phpMyAdmin, open an SQL script, and paste in the SQL from `crawlingData.sql`
- Start `Apache` and `MySQL` on XAMPP
- Go to `http://localhost/WebSpider/FrontEnd/index.html`
- Provide the seed URL and the depth for crawling
- Provide a search string to search the crawled content; make sure the webpage has already been crawled first
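The search step only works on pages that were crawled beforehand, because it looks through stored content rather than fetching anything new. A minimal Python sketch of that idea follows. It assumes the crawled pages are available as a dictionary mapping URL to HTML; the project itself uses MySQL, and the names `TextExtractor`, `page_text`, and `search` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def page_text(html):
    """Return the text of an HTML page with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def search(pages, query):
    """Return the URLs whose crawled text contains query, ignoring case.

    pages is assumed to map each crawled URL to its stored HTML.
    """
    needle = query.casefold()
    if not needle:
        return []  # an empty search string matches nothing
    return [url for url, html in pages.items()
            if needle in page_text(html).casefold()]
```

A URL that was never crawled is simply absent from `pages`, so it can never appear in the results.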