This package provides a class to crawl links on a website.
Spatie is a webdesign agency in Antwerp, Belgium. You'll find an overview of all our open source projects on our website.
This package has been modified to return an array that includes the Url, the Response and the Parent Url (the webpage that contains the hyperlink to the current url) once the current url has been crawled. It also has an option that allows external urls to be crawled each time they are encountered, if they are linked to multiple times.
This package can be installed via Composer:
composer require spatie/crawler
The crawler can be instantiated like this:
Crawler::create()
    ->setCrawlObserver(<implementation of \Spatie\Crawler\CrawlObserver>)
    ->startCrawling($url);
The argument passed to setCrawlObserver must be an object that implements the \Spatie\Crawler\CrawlObserver interface:
/**
 * Called when the crawler will crawl the given url.
 *
 * @param \Spatie\Crawler\Url $url
 */
public function willCrawl(Url $url);

/**
 * Called when the crawler has crawled the given url.
 *
 * @param \Spatie\Crawler\Url $url
 * @param \Psr\Http\Message\ResponseInterface|null $response
 * @param \Spatie\Crawler\Url|string $parentUrl
 */
public function hasBeenCrawled(Url $url, $response, $parentUrl);

/**
 * Called when the crawl has ended.
 */
public function finishedCrawling();
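As an illustration of the interface above, here is a minimal sketch of an observer that records one row per crawled url. The class name CollectingObserver and its $results property are hypothetical. The sketch assumes the package is installed via Composer and autoloaded, and that a Url object can be cast to a string; neither assumption is confirmed by this README.

```php
<?php

// Assumes Composer's autoloader is available at this path.
require __DIR__ . '/vendor/autoload.php';

use Spatie\Crawler\CrawlObserver;
use Spatie\Crawler\Url;

// Hypothetical observer: collects the url, status code and parent url of each crawled page.
class CollectingObserver implements CrawlObserver
{
    /** @var array[] Rows of ['url' => ..., 'status' => ..., 'parent' => ...]. */
    public $results = [];

    public function willCrawl(Url $url)
    {
        // Nothing to do before the request in this sketch.
    }

    public function hasBeenCrawled(Url $url, $response, $parentUrl)
    {
        $this->results[] = [
            // Assumption: Url can be cast to a string.
            'url' => (string) $url,
            // $response may be null according to the docblock, so guard the call.
            'status' => $response ? $response->getStatusCode() : null,
            // $parentUrl is documented as a Url or a string; the cast covers both.
            'parent' => (string) $parentUrl,
        ];
    }

    public function finishedCrawling()
    {
        echo count($this->results) . " urls crawled\n";
    }
}
```

An instance of this class could then be passed to setCrawlObserver in place of the placeholder shown earlier, for example `Crawler::create()->setCrawlObserver(new CollectingObserver())->startCrawling($url);`.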
You can tell the crawler not to visit certain urls by passing a crawl profile to the setCrawlProfile function. That function expects an object that implements the Spatie\Crawler\CrawlProfile interface:
/**
 * Determine if the given url should be crawled.
 *
 * @param \Spatie\Crawler\Url $url
 *
 * @return bool
 */
public function shouldCrawl(Url $url);
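To make the crawl profile idea concrete, the following sketch restricts crawling to a single host. The class name SameHostProfile is hypothetical. As with any code that uses this package, it assumes Composer's autoloader is available; it also assumes a Url object can be cast to a string, which this README does not confirm. The host is extracted with PHP's built-in parse_url.

```php
<?php

// Assumes Composer's autoloader is available at this path.
require __DIR__ . '/vendor/autoload.php';

use Spatie\Crawler\CrawlProfile;
use Spatie\Crawler\Url;

// Hypothetical profile: only urls on one given host are crawled.
class SameHostProfile implements CrawlProfile
{
    /** @var string Lower-cased host name that the crawl is restricted to. */
    private $host;

    public function __construct($host)
    {
        $this->host = strtolower($host);
    }

    public function shouldCrawl(Url $url)
    {
        // Assumption: Url can be cast to a string.
        $host = parse_url((string) $url, PHP_URL_HOST);

        // parse_url yields null or false when no host can be extracted.
        return is_string($host) && strtolower($host) === $this->host;
    }
}
```

A profile like this could be registered with `->setCrawlProfile(new SameHostProfile('example.com'))` before calling startCrawling, where example.com stands in for the host you want to stay on.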
Please see CHANGELOG for more information on what has changed recently.
Please see CONTRIBUTING for details.
If you discover any security-related issues, please email [email protected] instead of using the issue tracker.
The MIT License (MIT). Please see License File for more information.