Loading step generic loader type
Add phpdoc template tags for a generic loader type to the `LoadingStep`
trait. Wherever the trait is used, the user can now narrow down the loader
type, which improves static analysis and IDE autocompletion.
otsch committed Oct 13, 2024
1 parent 753086f commit 81aac99
Showing 6 changed files with 40 additions and 5 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -14,7 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
* __BREAKING__: Removed the `result` and `addLaterToResult` properties from `Io` objects (`Input` and `Output`). These properties were part of the `addToResult` feature and are now removed. Instead, use the `keep` property where kept data is added.
* __BREAKING__: The signature of the `Crawler::addStep()` method has changed. You can no longer provide a result key as the first parameter. Previously, this key was passed to the `Step::addToResult()` method internally. Now, please handle this call yourself.
* __BREAKING__: The return type of the `Crawler::loader()` method no longer allows `array`. This means it's no longer possible to provide multiple loaders from the crawler. Instead, use the new functionality to directly provide a custom loader to a step described below. As part of this change, the `UnknownLoaderKeyException` was also removed as it is now obsolete. If you have any references to this class, please make sure to remove them.
-* __BREAKING__: Refactored the abstract `LoadingStep` class to a trait and removed the `LoadingStepInterface`. Loading steps should now extend the `Step` class and use the trait. As multiple loaders are no longer supported, the `addLoader` method was renamed to `setLoader`. Similarly, the methods `useLoader()` and `usesLoader()` for selecting loaders by key are removed. Now, you can directly provide a different loader to a single step using the trait's new `withLoader()` method (e.g., `Http::get()->withLoader($loader)`).
+* __BREAKING__: Refactored the abstract `LoadingStep` class to a trait and removed the `LoadingStepInterface`. Loading steps should now extend the `Step` class and use the trait. As multiple loaders are no longer supported, the `addLoader` method was renamed to `setLoader`. Similarly, the methods `useLoader()` and `usesLoader()` for selecting loaders by key are removed. Now, you can directly provide a different loader to a single step using the trait's new `withLoader()` method (e.g., `Http::get()->withLoader($loader)`). The trait now also uses phpdoc template tags for a generic loader type. You can define the loader type by putting `/** @use LoadingStep<MyLoader> */` above `use LoadingStep;` in your step class. Then your IDE and static analysis (if supported) will know what type of loader the trait methods return and accept.
* __BREAKING__: Removed the `PaginatorInterface` to allow for better extensibility. The old `Crwlr\Crawler\Steps\Loading\Http\Paginators\AbstractPaginator` class has also been removed. Please use the newer, improved version `Crwlr\Crawler\Steps\Loading\Http\AbstractPaginator`. This newer version has also changed: the first argument `UriInterface $url` is removed from the `processLoaded()` method, as the URL also is part of the request (`Psr\Http\Message\RequestInterface`) which is now the first argument. Additionally, the default implementation of the `getNextRequest()` method is removed. Child implementations must define this method themselves. If your custom paginator still has a `getNextUrl()` method, note that it is no longer needed by the library and will not be called. The `getNextRequest()` method now fulfills its original purpose.
* __BREAKING__: Removed methods from `HttpLoader`:
* `$loader->setHeadlessBrowserOptions()` => use `$loader->browser()->setOptions()` instead
11 changes: 7 additions & 4 deletions src/Steps/Loading/GetSitemapsFromRobotsTxt.php
@@ -13,6 +13,9 @@

 class GetSitemapsFromRobotsTxt extends Step
 {
+    /**
+     * @use LoadingStep<HttpLoader>
+     */
     use LoadingStep;

     public function outputType(): StepOutputType
@@ -25,17 +28,17 @@ public function outputType(): StepOutputType
      */
     protected function invoke(mixed $input): Generator
     {
-        if (!method_exists($this->loader, 'robotsTxt')) {
+        $loader = $this->getLoader();
+
+        if (!method_exists($loader, 'robotsTxt')) {
             throw new Exception('The Loader doesn\'t expose the RobotsTxtHandler.');
         }

-        $loader = $this->getLoader();
-
         if (!$loader instanceof HttpLoader) {
             throw new Exception('The GetSitemapsFromRobotsTxt step needs an HttpLoader as loader instance.');
         }

-        $robotsTxtHandler = $loader->robotsTxt();
+        $robotsTxtHandler = $this->getLoader()->robotsTxt();

         foreach ($robotsTxtHandler->getSitemaps($input) as $sitemapUrl) {
             yield $sitemapUrl;
4 changes: 4 additions & 0 deletions src/Steps/Loading/HttpBase.php
@@ -3,6 +3,7 @@
 namespace Crwlr\Crawler\Steps\Loading;

 use Crwlr\Crawler\Loader\Http\Exceptions\LoadingException;
+use Crwlr\Crawler\Loader\Http\HttpLoader;
 use Crwlr\Crawler\Loader\Http\Messages\RespondedRequest;
 use Crwlr\Crawler\Steps\Step;
 use Crwlr\Crawler\Utils\HttpHeaders;
@@ -14,6 +15,9 @@

 abstract class HttpBase extends Step
 {
+    /**
+     * @use LoadingStep<HttpLoader>
+     */
     use LoadingStep;

     protected bool $stopOnErrorResponse = false;
19 changes: 19 additions & 0 deletions src/Steps/Loading/LoadingStep.php
@@ -4,26 +4,45 @@

 use Crwlr\Crawler\Loader\LoaderInterface;

+/**
+ * @template T of LoaderInterface
+ */
+
 trait LoadingStep
 {
+    /**
+     * @var T $loader
+     */
     private LoaderInterface $loader;

+    /**
+     * @var ?T $customLoader
+     */
     private ?LoaderInterface $customLoader = null;

+    /**
+     * @param T $loader
+     */
     public function setLoader(LoaderInterface $loader): static
     {
         $this->loader = $loader;

         return $this;
     }

+    /**
+     * @param T $loader
+     */
     public function withLoader(LoaderInterface $loader): static
     {
         $this->customLoader = $loader;

         return $this;
     }

+    /**
+     * @return T
+     */
     protected function getLoader(): LoaderInterface
     {
         return $this->customLoader ?? $this->loader;
3 changes: 3 additions & 0 deletions tests/Pest.php
@@ -158,6 +158,9 @@ protected function invoke(mixed $input): Generator
 function helper_getLoadingStep(): Step
 {
     return new class extends Step {
+        /**
+         * @use LoadingStep<LoaderInterface>
+         */
         use LoadingStep;

         protected function invoke(mixed $input): Generator
6 changes: 6 additions & 0 deletions tests/Steps/Loading/LoadingStepTest.php
@@ -15,6 +15,9 @@

 test('you can add a loader', function () {
     $step = new class extends Step {
+        /**
+         * @use LoadingStep<HttpLoader>
+         */
         use LoadingStep;

         protected function invoke(mixed $input): Generator
@@ -47,6 +50,9 @@ function () {
         $loaderTwo->shouldReceive('load')->once()->andReturn('Hi');

         $step = new class extends Step {
+            /**
+             * @use LoadingStep<Loader>
+             */
             use LoadingStep;

             protected function invoke(mixed $input): Generator
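
The pattern this commit introduces (a `@template` tag on the trait, narrowed by a `@use` tag in the consuming class) can be tried in isolation with the minimal sketch below. It assumes PHP 8.0 or newer. `DemoHttpLoader` and `DemoStep` are hypothetical stand-ins invented for illustration, and the `LoaderInterface` and `LoadingStep` declared here are simplified local copies, not the library's own classes. The phpdoc tags have no runtime effect; they only tell static analysers and IDEs which concrete loader type `getLoader()` returns.

```php
<?php

declare(strict_types=1);

// Simplified local stand-in for the library's loader interface (assumption).
interface LoaderInterface
{
    public function load(string $subject): string;
}

// Hypothetical concrete loader, invented for this example.
final class DemoHttpLoader implements LoaderInterface
{
    public function load(string $subject): string
    {
        return 'loaded ' . $subject;
    }

    // Exists only on the concrete loader, not on the interface.
    public function userAgent(): string
    {
        return 'demo-agent';
    }
}

/**
 * Reduced copy of the trait from the diff, with the same template tags.
 *
 * @template T of LoaderInterface
 */
trait LoadingStep
{
    /** @var T */
    private LoaderInterface $loader;

    /** @var ?T */
    private ?LoaderInterface $customLoader = null;

    /** @param T $loader */
    public function setLoader(LoaderInterface $loader): static
    {
        $this->loader = $loader;

        return $this;
    }

    /** @param T $loader */
    public function withLoader(LoaderInterface $loader): static
    {
        $this->customLoader = $loader;

        return $this;
    }

    /** @return T */
    protected function getLoader(): LoaderInterface
    {
        // A loader passed via withLoader() takes precedence.
        return $this->customLoader ?? $this->loader;
    }
}

// Hypothetical step class that narrows T to DemoHttpLoader.
final class DemoStep
{
    /**
     * @use LoadingStep<DemoHttpLoader>
     */
    use LoadingStep;

    public function describe(string $url): string
    {
        // Static analysis now treats $loader as DemoHttpLoader,
        // so userAgent() is known without an instanceof check.
        $loader = $this->getLoader();

        return $loader->userAgent() . ': ' . $loader->load($url);
    }
}

$step = (new DemoStep())->setLoader(new DemoHttpLoader());

echo $step->describe('https://example.com'), PHP_EOL;
// prints "demo-agent: loaded https://example.com"
```

Without the `@use` annotation, a static analyser would only know that `getLoader()` returns a `LoaderInterface` and would likely flag the `userAgent()` call as undefined. The runtime behaviour is the same either way.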
