Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.11] Add XPathSelector::quote #575

Closed
wants to merge 5 commits into from

Conversation

divinity76
Copy link
Contributor

@divinity76 divinity76 commented Dec 13, 2023

method to quote strings in XPath, similar to PDO::quote() and mysqli::real_escape_string()

Quoting strings in XPath is surprisingly difficult to do in the edge case where the string contains both double-quotes and single-quotes. Got the algorithm from https://stackoverflow.com/a/1352556/1067003

divinity76 and others added 3 commits December 13, 2023 11:37
method to quote strings in XPath, similar to PDO::quote()

Quoting strings in XPath is surprisingly difficult to do, in the edge case where the string contains both double-quotes and single-quotes. Got the algorithm from https://stackoverflow.com/a/1352556/1067003
@divinity76 divinity76 changed the title XPathSelector::quote(string $str): string XPathSelector::quote(string $string): string Dec 13, 2023
@GrahamCampbell GrahamCampbell changed the base branch from 1.10 to 1.11 December 13, 2023 12:43
of my own design. test code:

<?php
class C
{
    /**
     * Quotes a string for use in an XPath expression.
     *
     * Example: new XPathSelector("//span[contains(text()," . XPathSelector::quote($string) . ")]")
     *
     * @param string $string
     *
     * @return string
     */
    public static function quote2(string $string): string
    {
        if (false === \strpos($string, '"')) {
            return '"' . $string . '"';
        }
        if (false === \strpos($string, '\'')) {
            return '\'' . $string . '\'';
        }
        // if the string contains both single and double quotes, construct an
        // expression that concatenates all non-double-quote substrings with
        // the quotes, e.g.:
        //    concat("'foo'", '"', "bar")
        $sb = [];
        while (\strlen($string) > 0) {
            $bytesUntilSingleQuote = \strcspn($string, '\'');
            $bytesUntilDoubleQuote = \strcspn($string, '"');
            $quoteMethod = ($bytesUntilSingleQuote > $bytesUntilDoubleQuote) ? "'" : '"';
            $bytesUntilQuote = max($bytesUntilSingleQuote, $bytesUntilDoubleQuote);
            $sb[] = $quoteMethod . \substr($string, 0, $bytesUntilQuote) . $quoteMethod;
            $string = \substr($string, $bytesUntilQuote);
        }
        $sb = \implode(',', $sb);
        return 'concat(' . $sb . ')';
    }

    /**
     * Quotes a string for use in an XPath expression.
     *
     * Example: new XPathSelector("//span[contains(text()," . XPathSelector::quote($string) . ")]")
     *
     * @author  Robert Rossney ( https://stackoverflow.com/users/19403/robert-rossney )
     *
     * @param string $string
     *
     * @return string
     */
    public static function quote(string $string): string
    {
        if (false === \strpos($string, '"')) {
            return '"' . $string . '"';
        }
        if (false === \strpos($string, '\'')) {
            return '\'' . $string . '\'';
        }
        // if the string contains both single and double quotes, construct an
        // expression that concatenates all non-double-quote substrings with
        // the quotes, e.g.:
        //    concat("'foo'", '"', "bar")
        $sb = 'concat(';
        $substrings = \explode('"', $string);
        for ($i = 0; $i < \count($substrings); ++$i) {
            $needComma = ($i > 0);
            if ('' !== $substrings[$i]) {
                if ($i > 0) {
                    $sb .= ', ';
                }
                $sb .= '"' . $substrings[$i] . '"';
                $needComma = true;
            }
            if ($i < (\count($substrings) - 1)) {
                if ($needComma) {
                    $sb .= ', ';
                }
                $sb .= "'\"'";
            }
        }
        $sb .= ')';

        return $sb;
    }
}
$tests = array(
    "foo",              // no quotes
    "\"foo",            // double quotes only
    "'foo",             // single quotes only
    "'foo\"bar",        // both; double quotes in mid-string
    "'foo\"bar\"baz",   // multiple double quotes in mid-string
    "'foo\"",           // string ends with double quotes
    "'foo\"\"",         // string ends with run of double quotes
    "\"'foo",           // string begins with double quotes
    "\"\"'foo",         // string begins with run of double quotes
    "'foo\"\"bar"       // run of double quotes in mid-string
);
foreach ($tests as $test) {
    $quoted = C::quote($test);
    $quoted2 = C::quote2($test);
    var_dump([
        'test' => $test,
        'quoted' => $quoted,
        'quoted2' => $quoted2,
        // 'eval' => eval("return $quoted;"),
    ]);
}
?>


=> 

array(3) {
  ["test"]=>
  string(3) "foo"
  ["quoted"]=>
  string(5) ""foo""
  ["quoted2"]=>
  string(5) ""foo""
}
array(3) {
  ["test"]=>
  string(4) ""foo"
  ["quoted"]=>
  string(6) "'"foo'"
  ["quoted2"]=>
  string(6) "'"foo'"
}
array(3) {
  ["test"]=>
  string(4) "'foo"
  ["quoted"]=>
  string(6) ""'foo""
  ["quoted2"]=>
  string(6) ""'foo""
}
array(3) {
  ["test"]=>
  string(8) "'foo"bar"
  ["quoted"]=>
  string(26) "concat("'foo", '"', "bar")"
  ["quoted2"]=>
  string(21) "concat("'foo",'"bar')"
}
array(3) {
  ["test"]=>
  string(12) "'foo"bar"baz"
  ["quoted"]=>
  string(38) "concat("'foo", '"', "bar", '"', "baz")"
  ["quoted2"]=>
  string(25) "concat("'foo",'"bar"baz')"
}
array(3) {
  ["test"]=>
  string(5) "'foo""
  ["quoted"]=>
  string(19) "concat("'foo", '"')"
  ["quoted2"]=>
  string(18) "concat("'foo",'"')"
}
array(3) {
  ["test"]=>
  string(6) "'foo"""
  ["quoted"]=>
  string(24) "concat("'foo", '"', '"')"
  ["quoted2"]=>
  string(19) "concat("'foo",'""')"
}
array(3) {
  ["test"]=>
  string(5) ""'foo"
  ["quoted"]=>
  string(19) "concat('"', "'foo")"
  ["quoted2"]=>
  string(18) "concat('"',"'foo")"
}
array(3) {
  ["test"]=>
  string(6) """'foo"
  ["quoted"]=>
  string(24) "concat('"', '"', "'foo")"
  ["quoted2"]=>
  string(19) "concat('""',"'foo")"
}
array(3) {
  ["test"]=>
  string(9) "'foo""bar"
  ["quoted"]=>
  string(31) "concat("'foo", '"', '"', "bar")"
  ["quoted2"]=>
  string(22) "concat("'foo",'""bar')"
}
@divinity76
Copy link
Contributor Author

divinity76 commented Dec 15, 2023

replaced the algorithm with one of my own design, creating smaller, more readable XPath expressions:

for example, with the input
'foo"bar"baz
the original algorithm generated
concat("'foo", '"', "bar", '"', "baz")
and the new algorithm generates
concat("'foo",'"bar"baz')

here is my test code:

<?php
class C
{
    /**
     * Quotes a string for use in an XPath expression.
     *
     * Example: new XPathSelector("//span[contains(text()," . XPathSelector::quote($string) . ")]")
     *
     * @param string $string
     *
     * @return string
     */
    public static function quote2(string $string): string
    {
        if (false === \strpos($string, '"')) {
            return '"' . $string . '"';
        }
        if (false === \strpos($string, '\'')) {
            return '\'' . $string . '\'';
        }
        // if the string contains both single and double quotes, construct an
        // expression that concatenates all non-double-quote substrings with
        // the quotes, e.g.:
        //    concat("'foo'", '"', "bar")
        $sb = [];
        while (\strlen($string) > 0) {
            $bytesUntilSingleQuote = \strcspn($string, '\'');
            $bytesUntilDoubleQuote = \strcspn($string, '"');
            $quoteMethod = ($bytesUntilSingleQuote > $bytesUntilDoubleQuote) ? "'" : '"';
            $bytesUntilQuote = max($bytesUntilSingleQuote, $bytesUntilDoubleQuote);
            $sb[] = $quoteMethod . \substr($string, 0, $bytesUntilQuote) . $quoteMethod;
            $string = \substr($string, $bytesUntilQuote);
        }
        $sb = \implode(',', $sb);
        return 'concat(' . $sb . ')';
    }

    /**
     * Quotes a string for use in an XPath expression.
     *
     * Example: new XPathSelector("//span[contains(text()," . XPathSelector::quote($string) . ")]")
     *
     * @author  Robert Rossney ( https://stackoverflow.com/users/19403/robert-rossney )
     *
     * @param string $string
     *
     * @return string
     */
    public static function quote(string $string): string
    {
        if (false === \strpos($string, '"')) {
            return '"' . $string . '"';
        }
        if (false === \strpos($string, '\'')) {
            return '\'' . $string . '\'';
        }
        // if the string contains both single and double quotes, construct an
        // expression that concatenates all non-double-quote substrings with
        // the quotes, e.g.:
        //    concat("'foo'", '"', "bar")
        $sb = 'concat(';
        $substrings = \explode('"', $string);
        for ($i = 0; $i < \count($substrings); ++$i) {
            $needComma = ($i > 0);
            if ('' !== $substrings[$i]) {
                if ($i > 0) {
                    $sb .= ', ';
                }
                $sb .= '"' . $substrings[$i] . '"';
                $needComma = true;
            }
            if ($i < (\count($substrings) - 1)) {
                if ($needComma) {
                    $sb .= ', ';
                }
                $sb .= "'\"'";
            }
        }
        $sb .= ')';

        return $sb;
    }
}
$tests = array(
    "foo",              // no quotes
    "\"foo",            // double quotes only
    "'foo",             // single quotes only
    "'foo\"bar",        // both; double quotes in mid-string
    "'foo\"bar\"baz",   // multiple double quotes in mid-string
    "'foo\"",           // string ends with double quotes
    "'foo\"\"",         // string ends with run of double quotes
    "\"'foo",           // string begins with double quotes
    "\"\"'foo",         // string begins with run of double quotes
    "'foo\"\"bar"       // run of double quotes in mid-string
);
foreach ($tests as $test) {
    $quoted = C::quote($test);
    $quoted2 = C::quote2($test);
    var_dump([
        'test' => $test,
        'quoted' => $quoted,
        'quoted2' => $quoted2,
        // 'eval' => eval("return $quoted;"),
    ]);
}

outputs

$ php wut.php
array(3) {
  ["test"]=>
  string(3) "foo"
  ["quoted"]=>
  string(5) ""foo""
  ["quoted2"]=>
  string(5) ""foo""
}
array(3) {
  ["test"]=>
  string(4) ""foo"
  ["quoted"]=>
  string(6) "'"foo'"
  ["quoted2"]=>
  string(6) "'"foo'"
}
array(3) {
  ["test"]=>
  string(4) "'foo"
  ["quoted"]=>
  string(6) ""'foo""
  ["quoted2"]=>
  string(6) ""'foo""
}
array(3) {
  ["test"]=>
  string(8) "'foo"bar"
  ["quoted"]=>
  string(26) "concat("'foo", '"', "bar")"
  ["quoted2"]=>
  string(21) "concat("'foo",'"bar')"
}
array(3) {
  ["test"]=>
  string(12) "'foo"bar"baz"
  ["quoted"]=>
  string(38) "concat("'foo", '"', "bar", '"', "baz")"
  ["quoted2"]=>
  string(25) "concat("'foo",'"bar"baz')"
}
array(3) {
  ["test"]=>
  string(5) "'foo""
  ["quoted"]=>
  string(19) "concat("'foo", '"')"
  ["quoted2"]=>
  string(18) "concat("'foo",'"')"
}
array(3) {
  ["test"]=>
  string(6) "'foo"""
  ["quoted"]=>
  string(24) "concat("'foo", '"', '"')"
  ["quoted2"]=>
  string(19) "concat("'foo",'""')"
}
array(3) {
  ["test"]=>
  string(5) ""'foo"
  ["quoted"]=>
  string(19) "concat('"', "'foo")"
  ["quoted2"]=>
  string(18) "concat('"',"'foo")"
}
array(3) {
  ["test"]=>
  string(6) """'foo"
  ["quoted"]=>
  string(24) "concat('"', '"', "'foo")"
  ["quoted2"]=>
  string(19) "concat('""',"'foo")"
}
array(3) {
  ["test"]=>
  string(9) "'foo""bar"
  ["quoted"]=>
  string(31) "concat("'foo", '"', '"', "bar")"
  ["quoted2"]=>
  string(22) "concat("'foo",'""bar')"
}

@enricodias
Copy link
Member

Can you add those tests in a unit test?

@GrahamCampbell GrahamCampbell changed the title XPathSelector::quote(string $string): string [1.11] Add XPathSelector::quote Dec 29, 2023
@divinity76
Copy link
Contributor Author

divinity76 commented Dec 31, 2023

should that be a dedicated XPathTests file or should it join the existing XPath tests in https://github.com/chrome-php/chrome/blob/1.11/tests/PageTest.php ? if a dedicated file, should it just be in the tests folder, or a subfolder?

btw the main reason i re-wrote it was because the original algo was very difficult to read/follow, and I needed the testcases to really believe that the original algo worked - the new algo is much easier to read, to the point where I don't even need testcases to feel confident about it. (nevertheless, this method is affected by a encoding bug fixed in #576 )

@enricodias
Copy link
Member

I was going to suggest creating a new file in the /Dom/Selector path but I just saw that the Dom tests themselves are not in the /Dom path 😅. But anyways, a new file in either path would work. We can reorganize the tests in the future without breaking backwards compatibility.

The test in the PageTest.php is testing the usage of XPathSelector in the real browser, so it's ok to be there. The test cases you used here could be tested without the browser.

I don't even need testcases to feel confident about it

A test ensures that the method will behave the same way for those cases even if someone changes it in the future, intentionally or not. It's also way easier to develop a feature running tests than running a standalone php file and checking the output of a var_dump manually.

I should have asked for better tests when this class was introduced in the first place. It may have avoided bugs like the one in #576

divinity76 added a commit to divinity76/php-src that referenced this pull request Feb 21, 2024
method to quote strings in XPath,
similar to PDO::quote() / mysqli::real_escape_string

sample usage: $xp->query("//span[contains(text()," . $xp->quote($string) . ")]")

the algorithm is derived from Robert Rossney's research into XPath quoting published at https://stackoverflow.com/a/1352556/1067003
(but using an improved implementation I wrote myself, originally for chrome-php/chrome#575 )
divinity76 added a commit to divinity76/php-src that referenced this pull request Feb 21, 2024
method to quote strings in XPath,
similar to PDO::quote() / mysqli::real_escape_string

sample usage: $xp->query("//span[contains(text()," . $xp->quote($string) . ")]")

the algorithm is derived from Robert Rossney's research into XPath quoting published at https://stackoverflow.com/a/1352556/1067003
(but using an improved implementation I wrote myself, originally for chrome-php/chrome#575 )
divinity76 added a commit to divinity76/php-src that referenced this pull request Feb 21, 2024
method to quote strings in XPath,
similar to PDO::quote() / mysqli::real_escape_string

sample usage: $xp->query("//span[contains(text()," . $xp->quote($string) . ")]")

the algorithm is derived from Robert Rossney's research into XPath quoting published at https://stackoverflow.com/a/1352556/1067003
(but using an improved implementation I wrote myself, originally for chrome-php/chrome#575 )
divinity76 added a commit to divinity76/php-src that referenced this pull request Feb 21, 2024
method to quote strings in XPath,
similar to PDO::quote() / mysqli::real_escape_string

sample usage: $xp->query("//span[contains(text()," . $xp->quote($string) . ")]")

the algorithm is derived from Robert Rossney's research into XPath quoting published at https://stackoverflow.com/a/1352556/1067003
(but using an improved implementation I wrote myself, originally for chrome-php/chrome#575 )
@divinity76
Copy link
Contributor Author

made a php builtin version for DOMXPath::quote, if php/php-src#13456 is accepted then there is no need for one in XPathSelector (or the implementation can just be a wrapper for the core one)

nielsdos pushed a commit to php/php-src that referenced this pull request Feb 22, 2024
Method to quote strings in XPath, similar to PDO::quote() / mysqli::real_escape_string.

Sample usage: $xp->query("//span[contains(text()," . $xp->quote($string) . ")]")

The algorithm is derived from Robert Rossney's research into XPath quoting published at https://stackoverflow.com/a/1352556/1067003
But using an improved implementation I wrote myself, originally for chrome-php/chrome#575
@divinity76
Copy link
Contributor Author

Native support has been accepted upstream and will be part of PHP8.4.0 release, so there should be no need for chrome-php to have it, I guess. Sample usage:

new XPathSelector("//span[contains(text()," . DOMXPath::quote($string) . ")]")

closing.

@divinity76 divinity76 closed this Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants