Undeprecate XPath Functionality #177

Merged
merged 2 commits into from
Dec 9, 2024
6 changes: 4 additions & 2 deletions CHANGELOG.md
@@ -6,13 +6,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [3.0.1] - 2024-12-10
+### Undeprecated
+* Removed deprecations for all XPath functionality (`Dom::xPath()`, `XPathQuery` class and `Node::queryXPath()`), because it's still available with the new DOM API in PHP 8.4.
+
 ## [3.0.0] - 2024-12-08
 The primary change in version 3.0.0 is that the library now leverages PHP 8.4’s new DOM API when used in an environment with PHP >= 8.4. To maintain compatibility with PHP < 8.4, an abstraction layer has been implemented. This layer dynamically uses either the Symfony DomCrawler component or the new DOM API, depending on the PHP version.

 Since no direct interaction with an instance of the Symfony DomCrawler library was required at the step level provided by the library, it is highly likely that you won’t need to make any changes to your code to upgrade to v3. To ensure a smooth transition, please review the points under “Changed.”

-If you're using XPath queries for data extraction, please try to switch to using CSS selectors instead, because XPath is no longer supported by the new DOM API. Therefore XPath related functionality was deprecated in this version of the library and will probably be removed in the next major version.
-
 ### Changed
 * __BREAKING__: The `DomQuery::innerText()` method (a.k.a. `Dom::cssSelector('...')->innerText()`) has been removed. `innerText` exists only in the Symfony DomCrawler component, and its usefulness is questionable. If you still require this variant of the DOM element text, please let us know or create a pull request yourself. Thank you!
 * __BREAKING__: The `DomQueryInterface` was removed. As the `DomQuery` class offers a lot more functionality than the interface defines, the purpose of the interface was questionable. Please use the abstract `DomQuery` class instead. This also means that some method signatures, type hinting the interface, have changed. Look for occurrences of `DomQueryInterface` and replace them.
1 change: 1 addition & 0 deletions phpstan.neon
@@ -21,3 +21,4 @@ parameters:
         - "#^Call to an undefined (static )?method Dom\\\\.+::.+\\(\\)\\.#"
         - "#^Access to an undefined property Dom\\\\.+::\\$.+\\.#"
         - "#^Function .+ has invalid return type Dom\\\\.+\\.#"
+        - "#^(?:Used )?(?:C|c)onstant DOM\\\\.+ not found\\.#"
2 changes: 0 additions & 2 deletions src/Steps/Dom.php
@@ -91,8 +91,6 @@ public static function cssSelector(string $selector): CssSelector

     /**
      * @throws InvalidDomQueryException
-     * @deprecated As the usage of XPath queries is no longer an option with the new DOM API introduced in
-     * PHP 8.4, please switch to using CSS selectors instead!
      */
     public static function xPath(string $query): XPathQuery
     {
4 changes: 3 additions & 1 deletion src/Steps/Dom/HtmlDocument.php
@@ -6,6 +6,8 @@
 use DOMNode;
 use Symfony\Component\DomCrawler\Crawler;

+use const DOM\HTML_NO_DEFAULT_NS;
+
 /**
  * @method HtmlElement|null querySelector(string $selector)
  * @method NodeList<int, HtmlElement> querySelectorAll(string $selector)
@@ -47,7 +49,7 @@ protected function makeChildNodeInstance(object $node): Node
     protected function makeDocumentInstance(string $source): object
     {
         if (PhpVersion::isAtLeast(8, 4)) {
-            return \Dom\HTMLDocument::createFromString($source, LIBXML_NOERROR);
+            return \Dom\HTMLDocument::createFromString($source, HTML_NO_DEFAULT_NS | LIBXML_NOERROR);
         }

         return new Crawler($source);
24 changes: 16 additions & 8 deletions src/Steps/Dom/Node.php
@@ -47,10 +47,6 @@ public function querySelectorAll(string $selector): NodeList
         return $this->makeNodeListInstance($this->node->querySelectorAll($selector));
     }

-    /**
-     * @deprecated As the usage of XPath queries is no longer an option with the new DOM API introduced in
-     * PHP 8.4, please switch to using CSS selectors instead!
-     */
     public function queryXPath(string $query): NodeList
     {
         $node = $this->node;
@@ -107,12 +103,24 @@ protected function outerSource(): string
             return $this->node->outerHtml();
         }

+        if ($this->node instanceof Document) {
+            $node = $this->node->documentElement;
+
+            if ($this->node instanceof \Dom\HTMLDocument) {
+                return $this->node->saveHTML($node);
+            } elseif ($this->node instanceof \Dom\XMLDocument) {
+                return $this->node->saveXML($node);
+            }
+        }
+
         $parentDocument = $this->getParentDocumentOfNode($this->node);

-        if ($parentDocument instanceof \Dom\HTMLDocument) {
-            return $parentDocument->saveHTML($this->node);
-        } elseif ($parentDocument instanceof \Dom\XMLDocument) {
-            return $parentDocument->saveXML($this->node);
+        if ($parentDocument) {
+            if ($parentDocument instanceof \Dom\HTMLDocument) {
+                return $parentDocument->saveHTML($this->node);
+            } elseif ($parentDocument instanceof \Dom\XMLDocument) {
+                return $parentDocument->saveXML($this->node);
+            }
         }

         return $this->node->innerHTML;
2 changes: 1 addition & 1 deletion src/Steps/Dom/XmlDocument.php
@@ -33,7 +33,7 @@ protected function makeChildNodeInstance(object $node): Node
     protected function makeDocumentInstance(string $source): object
     {
         if (PhpVersion::isAtLeast(8, 4)) {
-            return \Dom\XMLDocument::createFromString($source, LIBXML_NOERROR);
+            return \Dom\XMLDocument::createFromString($source, LIBXML_NOERROR | LIBXML_NONET);
         }

         return new Crawler($source);
2 changes: 1 addition & 1 deletion src/Steps/Html/CssSelector.php
@@ -27,7 +27,7 @@ public function __construct(string $query)
             }
         } else {
             try {
-                (new HtmlDocument('<p></p>'))->querySelector($query);
+                (new HtmlDocument('<!doctype html><html></html>'))->querySelector($query);
             } catch (DOMException $exception) {
                 throw InvalidDomQueryException::fromDomException($query, $exception);
             }
5 changes: 0 additions & 5 deletions src/Steps/Html/XPathQuery.php
@@ -8,11 +8,6 @@
 use DOMDocument;
 use DOMXPath;

-/**
- * @deprecated As the usage of XPath queries is no longer an option with the new DOM API introduced in
- * PHP 8.4, please switch to using CSS selectors instead!
- */
-
 class XPathQuery extends DomQuery
 {
     /**
25 changes: 13 additions & 12 deletions tests/Steps/BaseStepTest.php
@@ -830,7 +830,7 @@ function (bool $callKeep, bool $callKeepAs, bool $callKeepFromInput, bool $callK

         $results = helper_invokeStepWithInput($step, [
             'foo' => 'hey',
-            'bar' => '<html><head></head><body><h1>Hello World!</h1></body>',
+            'bar' => '<!doctype html><html><head></head><body><h1>Hello World!</h1></body>',
         ]);

         expect($results)->toHaveCount(1)
@@ -847,14 +847,15 @@ function (bool $callKeep, bool $callKeepAs, bool $callKeepFromInput, bool $callK
         });

         $html = <<<HTML
-            <html>
-            <head></head>
-            <body>
-            <div class="item"><h3>one</h3></div>
-            <div class="item"><h3>two</h3></div>
-            <div class="item"><h3>three</h3></div>
-            </body>
-            HTML;
+            <!doctype html>
+            <html>
+            <head></head>
+            <body>
+            <div class="item"><h3>one</h3></div>
+            <div class="item"><h3>two</h3></div>
+            <div class="item"><h3>three</h3></div>
+            </body>
+            HTML;

         $results = helper_invokeStepWithInput($step, ['foo' => 'hey', 'bar' => $html, 'baz' => 'yo']);

@@ -883,9 +884,9 @@ function (bool $callKeep, bool $callKeepAs, bool $callKeepFromInput, bool $callK
         $results = helper_invokeStepWithInput($step, [
             'foo' => 'hey',
             'bar' => [
-                '<html><head></head><body><h1>No. 1</h1></body>',
-                '<html><head></head><body><h1>No. 2</h1></body>',
-                '<html><head></head><body><h1>No. 3</h1></body>',
+                '<!doctype html><html><head></head><body><h1>No. 1</h1></body>',
+                '<!doctype html><html><head></head><body><h1>No. 2</h1></body>',
+                '<!doctype html><html><head></head><body><h1>No. 3</h1></body>',
             ],
             'baz' => 'yo',
         ]);
11 changes: 6 additions & 5 deletions tests/Steps/Dom/HtmlDocumentTest.php
@@ -7,7 +7,7 @@
 use Crwlr\Crawler\Steps\Dom\NodeList;

 it('gets the href of a base tag in the document', function () {
-    $html = '<html><head><title>foo</title><base href="/foo/bar" /></head><body>hello</body></html>';
+    $html = '<!doctype html><html><head><title>foo</title><base href="/foo/bar" /></head><body>hello</body></html>';

     $document = new HtmlDocument($html);

@@ -16,6 +16,7 @@

 it('gets the href of the first base tag in the document', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head>
         <title>foo</title>
@@ -32,23 +33,23 @@
 });

 test('getBaseHref() returns null if the document does not contain a base tag', function () {
-    $html = '<html><head><title>foo</title></head><body>hey</body></html>';
+    $html = '<!doctype html><html><head><title>foo</title></head><body>hey</body></html>';

     $document = new HtmlDocument($html);

     expect($document->getBaseHref())->toBeNull();
 });

 test('the querySelector() method returns an HtmlElement object', function () {
-    $html = '<html><head><title>foo</title></head><body><div class="element">hello</div></body></html>';
+    $html = '<!doctype html><html><head><title>foo</title></head><body><div class="element">hello</div></body></html>';

     $document = new HtmlDocument($html);

     expect($document->querySelector('.element'))->toBeInstanceOf(HtmlElement::class);
 });

 test('the querySelectorAll() method returns a NodeList of HtmlElement objects', function () {
-    $html = '<html><head><title>foo</title></head><body><ul><li>foo</li><li>bar</li></ul></body></html>';
+    $html = '<!doctype html><html><head><title>foo</title></head><body><ul><li>foo</li><li>bar</li></ul></body></html>';

     $document = new HtmlDocument($html);

@@ -68,7 +69,7 @@
 });

 test('the queryXPath() method returns a NodeList of HtmlElement objects', function () {
-    $html = '<html><head><title>foo</title></head><body><ul><li>foo</li><li>bar</li></ul></body></html>';
+    $html = '<!doctype html><html><head><title>foo</title></head><body><ul><li>foo</li><li>bar</li></ul></body></html>';

     $document = new HtmlDocument($html);

8 changes: 8 additions & 0 deletions tests/Steps/Dom/HtmlElementTest.php
@@ -8,6 +8,7 @@

 test('child nodes selected via querySelector() are HtmlElement instances', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head></head>
         <body>
@@ -26,6 +27,7 @@

 test('child nodes selected via querySelectorAll() are HtmlElement instances', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head></head>
         <body>
@@ -53,6 +55,7 @@

 test('child nodes selected via queryXPath() are HtmlElement instances', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head></head>
         <body>
@@ -82,6 +85,7 @@

 it('gets the node name', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head></head>
         <body>
@@ -100,6 +104,7 @@

 it('gets the text of a node', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head></head>
         <body>
@@ -119,6 +124,7 @@

 it('gets the outer HTML of a node', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head></head>
         <body>
@@ -142,6 +148,7 @@

 it('gets the inner HTML of a node', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head></head>
         <body>
@@ -164,6 +171,7 @@

 it('gets an attribute from a node', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head></head>
         <body>
8 changes: 8 additions & 0 deletions tests/Steps/Dom/NodeListTest.php
@@ -11,6 +11,7 @@

 it('can be constructed from a symfony Crawler instance', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head></head>
         <body>
@@ -40,6 +41,7 @@ function (object $node): HtmlElement {

 it('can be constructed from a \Dom\NodeList instance', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head></head>
         <body>
@@ -67,6 +69,7 @@ function (object $node): HtmlElement {

 it('can be instantiated from an array of Nodes (object instances from this library)', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head></head>
         <body>
@@ -95,6 +98,7 @@ function (object $node): HtmlElement {

 it('gets the count of the node list', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head>
         <title>Foo</title>
@@ -112,6 +116,7 @@ function (object $node): HtmlElement {

 it('can be iterated and the elements are instances of Crwlr\Crawler\Steps\Dom\Node', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head>
         <title>Foo</title>
@@ -139,6 +144,7 @@ function (object $node): HtmlElement {
     'can be iterated with the each() method and return values are returned as an array from the each() call',
     function () {
         $html = <<<HTML
+            <!doctype html>
             <html>
             <head></head>
             <body>
@@ -169,6 +175,7 @@ function () {

 test('an empty NodeList can be iterated', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head>
         <title>Foo</title>
@@ -192,6 +199,7 @@ function () {

 it('returns the first, last and nth element of the NodeList', function () {
     $html = <<<HTML
+        <!doctype html>
         <html>
         <head></head>
         <body>