Commit

Undeprecate XPath Functionality
I found that I was wrong about XPath not being available for usage with
the new DOM API in PHP 8.4. As it is still available, remove the
deprecations introduced in 3.0.0.
otsch committed Dec 9, 2024
1 parent 4f271e1 commit c46278d
Showing 15 changed files with 89 additions and 55 deletions.
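
The commit message's point, that XPath queries remain usable with the new DOM API in PHP 8.4, can be illustrated with a minimal sketch. It assumes the `dom` extension is loaded. The helper name `xpathTexts()` is hypothetical and not part of this library, and the fallback branch uses the legacy `DOMDocument`/`DOMXPath` classes directly as a simplification, whereas the library itself goes through the Symfony DomCrawler component on PHP < 8.4.

```php
<?php

declare(strict_types=1);

/**
 * Returns the text content of all nodes matching an XPath query.
 * Hypothetical helper for illustration only; not part of the library.
 *
 * @return string[]
 */
function xpathTexts(string $html, string $query): array
{
    if (PHP_VERSION_ID >= 80400) {
        // PHP >= 8.4: new DOM API. HTML_NO_DEFAULT_NS keeps elements out of
        // the XHTML namespace, so unprefixed names such as "li" can match.
        $document = \Dom\HTMLDocument::createFromString(
            $html,
            \Dom\HTML_NO_DEFAULT_NS | LIBXML_NOERROR,
        );
        $nodes = (new \Dom\XPath($document))->query($query);
    } else {
        // PHP < 8.4: legacy DOM classes (simplified stand-in for DomCrawler).
        $document = new \DOMDocument();
        $document->loadHTML($html, LIBXML_NOERROR);
        $nodes = (new \DOMXPath($document))->query($query);
    }

    $texts = [];

    foreach ($nodes as $node) {
        $texts[] = $node->textContent;
    }

    return $texts;
}

$html = '<!doctype html><html><head></head><body><ul><li>foo</li><li>bar</li></ul></body></html>';

print_r(xpathTexts($html, '//li'));
```

Both branches accept the same query string, which is the property that makes it possible to keep `Dom::xPath()` and `Node::queryXPath()` behind one abstraction.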
6 changes: 4 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -6,13 +6,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

+ ## [3.0.1] - 2024-12-10
+ ### Undeprecated
+ * Removed deprecations for all XPath functionality (`Dom::xPath()`, `XPathQuery` class and `Node::queryXPath()`), because it's still available with the new DOM API in PHP 8.4.
+
## [3.0.0] - 2024-12-08
The primary change in version 3.0.0 is that the library now leverages PHP 8.4’s new DOM API when used in an environment with PHP >= 8.4. To maintain compatibility with PHP < 8.4, an abstraction layer has been implemented. This layer dynamically uses either the Symfony DomCrawler component or the new DOM API, depending on the PHP version.

Since no direct interaction with an instance of the Symfony DomCrawler library was required at the step level provided by the library, it is highly likely that you won’t need to make any changes to your code to upgrade to v3. To ensure a smooth transition, please review the points under “Changed.”

- If you're using XPath queries for data extraction, please try to switch to using CSS selectors instead, because XPath is no longer supported by the new DOM API. Therefore, XPath-related functionality was deprecated in this version of the library and will probably be removed in the next major version.
-
### Changed
* __BREAKING__: The `DomQuery::innerText()` method (a.k.a. `Dom::cssSelector('...')->innerText()`) has been removed. `innerText` exists only in the Symfony DomCrawler component, and its usefulness is questionable. If you still require this variant of the DOM element text, please let us know or create a pull request yourself. Thank you!
* __BREAKING__: The `DomQueryInterface` was removed. As the `DomQuery` class offers a lot more functionality than the interface defines, the purpose of the interface was questionable. Please use the abstract `DomQuery` class instead. This also means that some method signatures, type hinting the interface, have changed. Look for occurrences of `DomQueryInterface` and replace them.
1 change: 1 addition & 0 deletions phpstan.neon
@@ -21,3 +21,4 @@ parameters:
  - "#^Call to an undefined (static )?method Dom\\\\.+::.+\\(\\)\\.#"
  - "#^Access to an undefined property Dom\\\\.+::\\$.+\\.#"
  - "#^Function .+ has invalid return type Dom\\\\.+\\.#"
+ - "#^(?:Used )?(?:C|c)onstant DOM\\\\.+ not found\\.#"
2 changes: 0 additions & 2 deletions src/Steps/Dom.php
@@ -91,8 +91,6 @@ public static function cssSelector(string $selector): CssSelector

/**
* @throws InvalidDomQueryException
- * @deprecated As the usage of XPath queries is no longer an option with the new DOM API introduced in
- * PHP 8.4, please switch to using CSS selectors instead!
*/
public static function xPath(string $query): XPathQuery
{
3 changes: 2 additions & 1 deletion src/Steps/Dom/HtmlDocument.php
@@ -5,6 +5,7 @@
use Crwlr\Utils\PhpVersion;
use DOMNode;
use Symfony\Component\DomCrawler\Crawler;
+ use const DOM\HTML_NO_DEFAULT_NS;

/**
* @method HtmlElement|null querySelector(string $selector)
@@ -47,7 +48,7 @@ protected function makeChildNodeInstance(object $node): Node
protected function makeDocumentInstance(string $source): object
{
if (PhpVersion::isAtLeast(8, 4)) {
- return \Dom\HTMLDocument::createFromString($source, LIBXML_NOERROR);
+ return \Dom\HTMLDocument::createFromString($source, HTML_NO_DEFAULT_NS | LIBXML_NOERROR);
}

return new Crawler($source);
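
Why the `HTML_NO_DEFAULT_NS` option matters for XPath can be sketched as follows. By default, `\Dom\HTMLDocument` places parsed elements in the XHTML namespace, and an unprefixed XPath name test such as `//li` only matches elements without a namespace, so the query is expected to find nothing unless the option is set or a namespace prefix is registered. The helper `countXPathMatches()` is a hypothetical name used for illustration only. It returns `null` on PHP < 8.4, where the new DOM classes do not exist.

```php
<?php

declare(strict_types=1);

/**
 * Counts the nodes an XPath query matches in an HTML string parsed with the
 * given parser options. Hypothetical helper; returns null on PHP < 8.4,
 * where \Dom\HTMLDocument does not exist.
 */
function countXPathMatches(string $html, string $query, int $options): ?int
{
    if (PHP_VERSION_ID < 80400) {
        return null;
    }

    $document = \Dom\HTMLDocument::createFromString($html, $options);

    return (new \Dom\XPath($document))->query($query)->length;
}

$html = '<!doctype html><html><head></head><body><ul><li>foo</li><li>bar</li></ul></body></html>';

// Default parsing: elements are in the XHTML namespace, so the unprefixed
// query is expected to match nothing.
$default = countXPathMatches($html, '//li', LIBXML_NOERROR);

// With HTML_NO_DEFAULT_NS: elements have no namespace, so both <li> match.
// The constant only exists on PHP >= 8.4, hence the guard.
$noDefaultNs = PHP_VERSION_ID >= 80400
    ? countXPathMatches($html, '//li', \Dom\HTML_NO_DEFAULT_NS | LIBXML_NOERROR)
    : null;

var_dump($default, $noDefaultNs);
```

Setting the option keeps existing unprefixed XPath queries working on both code paths, at the cost of elements no longer being in the HTML namespace that the DOM specification assumes.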
24 changes: 16 additions & 8 deletions src/Steps/Dom/Node.php
@@ -47,10 +47,6 @@ public function querySelectorAll(string $selector): NodeList
return $this->makeNodeListInstance($this->node->querySelectorAll($selector));
}

- /**
- * @deprecated As the usage of XPath queries is no longer an option with the new DOM API introduced in
- * PHP 8.4, please switch to using CSS selectors instead!
- */
public function queryXPath(string $query): NodeList
{
$node = $this->node;
@@ -107,12 +103,24 @@ protected function outerSource(): string
return $this->node->outerHtml();
}

+ if ($this->node instanceof Document) {
+     $node = $this->node->documentElement;
+
+     if ($this->node instanceof \Dom\HTMLDocument) {
+         return $this->node->saveHTML($node);
+     } elseif ($this->node instanceof \Dom\XMLDocument) {
+         return $this->node->saveXML($node);
+     }
+ }
+
$parentDocument = $this->getParentDocumentOfNode($this->node);

- if ($parentDocument instanceof \Dom\HTMLDocument) {
-     return $parentDocument->saveHTML($this->node);
- } elseif ($parentDocument instanceof \Dom\XMLDocument) {
-     return $parentDocument->saveXML($this->node);
+ if ($parentDocument) {
+     if ($parentDocument instanceof \Dom\HTMLDocument) {
+         return $parentDocument->saveHTML($this->node);
+     } elseif ($parentDocument instanceof \Dom\XMLDocument) {
+         return $parentDocument->saveXML($this->node);
+     }
}

return $this->node->innerHTML;
2 changes: 1 addition & 1 deletion src/Steps/Dom/XmlDocument.php
@@ -33,7 +33,7 @@ protected function makeChildNodeInstance(object $node): Node
protected function makeDocumentInstance(string $source): object
{
if (PhpVersion::isAtLeast(8, 4)) {
- return \Dom\XMLDocument::createFromString($source, LIBXML_NOERROR);
+ return \Dom\XMLDocument::createFromString($source, LIBXML_NOERROR | LIBXML_NONET);
}

return new Crawler($source);
2 changes: 1 addition & 1 deletion src/Steps/Html/CssSelector.php
@@ -27,7 +27,7 @@ public function __construct(string $query)
}
} else {
try {
- (new HtmlDocument('<p></p>'))->querySelector($query);
+ (new HtmlDocument('<!doctype html><html></html>'))->querySelector($query);
} catch (DOMException $exception) {
throw InvalidDomQueryException::fromDomException($query, $exception);
}
5 changes: 0 additions & 5 deletions src/Steps/Html/XPathQuery.php
@@ -8,11 +8,6 @@
use DOMDocument;
use DOMXPath;

- /**
- * @deprecated As the usage of XPath queries is no longer an option with the new DOM API introduced in
- * PHP 8.4, please switch to using CSS selectors instead!
- */
-
class XPathQuery extends DomQuery
{
/**
25 changes: 13 additions & 12 deletions tests/Steps/BaseStepTest.php
@@ -830,7 +830,7 @@ function (bool $callKeep, bool $callKeepAs, bool $callKeepFromInput, bool $callK

$results = helper_invokeStepWithInput($step, [
'foo' => 'hey',
- 'bar' => '<html><head></head><body><h1>Hello World!</h1></body>',
+ 'bar' => '<!doctype html><html><head></head><body><h1>Hello World!</h1></body>',
]);

expect($results)->toHaveCount(1)
@@ -847,14 +847,15 @@ function (bool $callKeep, bool $callKeepAs, bool $callKeepFromInput, bool $callK
});

$html = <<<HTML
- <html>
- <head></head>
- <body>
- <div class="item"><h3>one</h3></div>
- <div class="item"><h3>two</h3></div>
- <div class="item"><h3>three</h3></div>
- </body>
- HTML;
+ <!doctype html>
+ <html>
+ <head></head>
+ <body>
+ <div class="item"><h3>one</h3></div>
+ <div class="item"><h3>two</h3></div>
+ <div class="item"><h3>three</h3></div>
+ </body>
+ HTML;

$results = helper_invokeStepWithInput($step, ['foo' => 'hey', 'bar' => $html, 'baz' => 'yo']);

@@ -883,9 +884,9 @@ function (bool $callKeep, bool $callKeepAs, bool $callKeepFromInput, bool $callK
$results = helper_invokeStepWithInput($step, [
'foo' => 'hey',
'bar' => [
- '<html><head></head><body><h1>No. 1</h1></body>',
- '<html><head></head><body><h1>No. 2</h1></body>',
- '<html><head></head><body><h1>No. 3</h1></body>',
+ '<!doctype html><html><head></head><body><h1>No. 1</h1></body>',
+ '<!doctype html><html><head></head><body><h1>No. 2</h1></body>',
+ '<!doctype html><html><head></head><body><h1>No. 3</h1></body>',
],
'baz' => 'yo',
]);
11 changes: 6 additions & 5 deletions tests/Steps/Dom/HtmlDocumentTest.php
@@ -7,7 +7,7 @@
use Crwlr\Crawler\Steps\Dom\NodeList;

it('gets the href of a base tag in the document', function () {
- $html = '<html><head><title>foo</title><base href="/foo/bar" /></head><body>hello</body></html>';
+ $html = '<!doctype html><html><head><title>foo</title><base href="/foo/bar" /></head><body>hello</body></html>';

$document = new HtmlDocument($html);

@@ -16,6 +16,7 @@

it('gets the href of the first base tag in the document', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head>
<title>foo</title>
@@ -32,23 +33,23 @@
});

test('getBaseHref() returns null if the document does not contain a base tag', function () {
- $html = '<html><head><title>foo</title></head><body>hey</body></html>';
+ $html = '<!doctype html><html><head><title>foo</title></head><body>hey</body></html>';

$document = new HtmlDocument($html);

expect($document->getBaseHref())->toBeNull();
});

test('the querySelector() method returns an HtmlElement object', function () {
- $html = '<html><head><title>foo</title></head><body><div class="element">hello</div></body></html>';
+ $html = '<!doctype html><html><head><title>foo</title></head><body><div class="element">hello</div></body></html>';

$document = new HtmlDocument($html);

expect($document->querySelector('.element'))->toBeInstanceOf(HtmlElement::class);
});

test('the querySelectorAll() method returns a NodeList of HtmlElement objects', function () {
- $html = '<html><head><title>foo</title></head><body><ul><li>foo</li><li>bar</li></ul></body></html>';
+ $html = '<!doctype html><html><head><title>foo</title></head><body><ul><li>foo</li><li>bar</li></ul></body></html>';

$document = new HtmlDocument($html);

@@ -68,7 +69,7 @@
});

test('the queryXPath() method returns a NodeList of HtmlElement objects', function () {
- $html = '<html><head><title>foo</title></head><body><ul><li>foo</li><li>bar</li></ul></body></html>';
+ $html = '<!doctype html><html><head><title>foo</title></head><body><ul><li>foo</li><li>bar</li></ul></body></html>';

$document = new HtmlDocument($html);

8 changes: 8 additions & 0 deletions tests/Steps/Dom/HtmlElementTest.php
@@ -8,6 +8,7 @@

test('child nodes selected via querySelector() are HtmlElement instances', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>
@@ -26,6 +27,7 @@

test('child nodes selected via querySelectorAll() are HtmlElement instances', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>
@@ -53,6 +55,7 @@

test('child nodes selected via queryXPath() are HtmlElement instances', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>
@@ -82,6 +85,7 @@

it('gets the node name', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>
@@ -100,6 +104,7 @@

it('gets the text of a node', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>
@@ -119,6 +124,7 @@

it('gets the outer HTML of a node', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>
@@ -142,6 +148,7 @@

it('gets the inner HTML of a node', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>
@@ -164,6 +171,7 @@

it('gets an attribute from a node', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>
8 changes: 8 additions & 0 deletions tests/Steps/Dom/NodeListTest.php
@@ -11,6 +11,7 @@

it('can be constructed from a symfony Crawler instance', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>
@@ -40,6 +41,7 @@ function (object $node): HtmlElement {

it('can be constructed from a \Dom\NodeList instance', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>
@@ -67,6 +69,7 @@ function (object $node): HtmlElement {

it('can be instantiated from an array of Nodes (object instances from this library)', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>
@@ -95,6 +98,7 @@ function (object $node): HtmlElement {

it('gets the count of the node list', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head>
<title>Foo</title>
@@ -112,6 +116,7 @@ function (object $node): HtmlElement {

it('can be iterated and the elements are instances of Crwlr\Crawler\Steps\Dom\Node', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head>
<title>Foo</title>
@@ -139,6 +144,7 @@ function (object $node): HtmlElement {
'can be iterated with the each() method and return values are returned as an array from the each() call',
function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>
@@ -169,6 +175,7 @@ function () {

test('an empty NodeList can be iterated', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head>
<title>Foo</title>
@@ -192,6 +199,7 @@ function () {

it('returns the first, last and nth element of the NodeList', function () {
$html = <<<HTML
+ <!doctype html>
<html>
<head></head>
<body>