Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tag Processor] Merge the test files into a single file #44593

Closed
wants to merge 3 commits into from

Conversation

adamziel
Copy link
Contributor

@adamziel adamziel commented Sep 30, 2022

Part of #44410

What?

Follows up on #44447 (comment):

ultimately the CI runs are what matters. in fact, if we're going to create a new test file and special instructions just to run with WordPress as in 8cf66b0 I'd rather just remove the special-casing to run the test locally.

the comment at the top of the file is nice, but would be redundant if we added it everywhere. if someone is like me and prefers to run this locally then I think they/I can temporarily modify the file. if someone wants to be able to test this project locally then we can hopefully catch all files in one project to make the testing configuration simpler and more flexible.

This sentiment makes sense, so in this PR I merge the two test files into a single file.

Test instructions

Confirm the CI is green.

cc @dmsnell

@adamziel adamziel changed the title Replace two WP_HTML_Tag_Processor test files with a single one [Tag Processor] Merge two test files into a single file Sep 30, 2022
@adamziel adamziel self-assigned this Sep 30, 2022
@adamziel adamziel added the Developer Experience Ideas about improving block and theme developer experience label Sep 30, 2022
array(
'" onclick="alert(\'1\');"><span onclick=""></span><script>alert("1")</script>',
'&quot; onclick=&quot;alert(&#039;1&#039;);&quot;&gt;&lt;span onclick=&quot;&quot;&gt;&lt;/span&gt;&lt;script&gt;alert(&quot;1&quot;)&lt;/script&gt;',
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still uncomfortable with re-encoding the tests of esc_attr() in our test here. this adds a deep coupling that can come back and frustrate people trying to make unrelated changes.

for instance, in our actual code and discussion of the tag processor we have essentially delegated responsibility for attribute encoding and sanitization to esc_attr(). we don't do additional work and we don't depend on any particular behavior other than its own guarantees.

why should we then extend additional constraints in the tests that we never wanted to make in the code or in conversation? we're not actually testing if this prevents XSS anyway - we're testing that it runs through esc_attr(), and I'd rather leave the XSS prevention testing to that function.

ultimately this isn't the biggest problem and our code is littered with deeply-coupled integration tests like it, but at the same token our test suite is already in a place where it's frustrating how often tests fail probabilistically and where it can be hard to understand or figure out what's wrong when seemingly-unrelated tests fail.

if we're going to take responsibility for XSS protection and incorporate that code ourselves (which I don't think we'll do) then I think it would be more appropriate to test here. if we're going to delegate to esc_attr() then we should either lean on its tests to cover the scenarios, or if it doesn't have them, lean on the history and reporting that has happened over the years to vet it.

" -> &quot;, for example, is neither an XSS attack nor evidence that we're protected. it's a good demonstration that we encode the HTML entities and that we won't break out of the syntax; &quot; -> &quot; does neither and really appears to assert an internal behavior of esc_attr() and nothing else.

it would be nice if it were easier and natively-available in PHP to create test spies, which is what I was emulating by comparing against esc_attr( $value ), but every time I've tried that it involved user-space libraries with particular restrictions.

if we want to retain some of these tests then our naming is confusing. we could name it test_set_attribute_properly_escapes_html_entities() and then give explanations for the constraints we are setting on it, which would free someone to swap out esc_attr() while maintaining our requirements. I don't think our requirements are as strict as shown here.

array(
    'escapes double-quote to avoid breaking out of HTML' => array( '"', '&quot;' ),
    'avoids double-encoding existing HTML entities' => array( '&quot;', '&quot;' ),
);

I'm not sure if any of the other constraints is really there. the example with onclick seems to be covered by the assertion on quotes alone.

lots of thoughts 😆

Copy link
Contributor Author

@adamziel adamziel Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmsnell I agree – I only did that to have an automated check in place to confirm that esc_attr is indeed used in all the different cases. Ideally we'd have a spy there, but it won't work if we load WordPress. Or maybe it would? We could register the esc_attr call and augment the return value to something like ESC_ATTR( $original_input ) using a filter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well blimey, we can do this with the attribute_escape filter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in 80452d8
should be easy to revert if we need to.
I removed the comment at the top about running this in isolation since we can't anymore; not if we want to assert that we called esc_attr() and also verify behaviors of esc_attr(). this is fine.

Copy link
Contributor Author

@adamziel adamziel Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Looks good to me, too. This PR needs a rebase but should be good otherwise – I'll look into that

@adamziel
Copy link
Contributor Author

Redoing this PR on thunk will be easier than rebasing. Here's the raw diff for reference:

diff --git a/phpunit/html/wp-html-tag-processor-standalone-test.php b/phpunit/html/wp-html-tag-processor-test.php
similarity index 97%
rename from phpunit/html/wp-html-tag-processor-standalone-test.php
rename to phpunit/html/wp-html-tag-processor-test.php
index cef9b45f7c5d5..dd6e193715133 100644
--- a/phpunit/html/wp-html-tag-processor-standalone-test.php
+++ b/phpunit/html/wp-html-tag-processor-test.php
@@ -1,26 +1,11 @@
 <?php
 /**
  * Unit tests covering WP_HTML_Tag_Processor functionality.
- * This file takes about 100ms to run because it does not load
- * any WordPress libraries:
- *
- * ```
- * ./vendor/bin/phpunit --no-configuration ./phpunit/html/wp-html-tag-processor-test.php
- * ```
- *
- * Put all new WP_HTML_Tag_Processor tests here, and only add new cases to
- * wp-html-tag-processor-test-wp.php when they cannot run without WordPress.
  *
  * @package WordPress
  * @subpackage HTML
  */
 
-if ( ! function_exists( 'esc_attr' ) ) {
-	function esc_attr( $string ) {
-		return htmlspecialchars( $string, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'utf-8', false );
-	}
-}
-
 if ( ! class_exists( 'WP_UnitTestCase' ) ) {
 	abstract class WP_UnitTestCase extends \PHPUnit\Framework\TestCase {}
 }
@@ -256,20 +241,20 @@ public function test_set_attribute_on_a_non_existing_tag_does_not_change_the_mar
 	}
 
 	/**
-	 * Passing a double quote inside of an attribute values could lead to an XSS attack as follows:
+	 * Passing unescaped attribute values could lead to XSS attacks and malformed HTML.
+	 * This test ensures that we properly escape the values to avoid these problems.
 	 *
+	 * Example:
 	 * <code>
 	 *     $p = new WP_HTML_Tag_Processor( '<div class="header"></div>' );
 	 *     $p->next_tag();
-	 *     $p->set_attribute('class', '" onclick="alert');
-	 *     echo $p;
-	 *     // <div class="" onclick="alert"></div>
-	 * </code>
+	 *     $p->set_attribute( 'class', '" onclick="alert' );
 	 *
-	 * To prevent it, `set_attribute` calls `esc_attr()` on its given values.
+	 *     // Not this!
+	 *     '<div class="" onclick="alert"></div>'
 	 *
-	 * <code>
-	 *    <div class="&quot; onclick=&quot;alert"></div>
+	 *     // This instead…
+	 *     '<div class="&quot; onclick=&quot;alert"></div>'
 	 * </code>
 	 *
 	 * @ticket 56299
@@ -277,10 +262,20 @@ public function test_set_attribute_on_a_non_existing_tag_does_not_change_the_mar
 	 * @dataProvider data_set_attribute_escapable_values
 	 * @covers set_attribute
 	 */
-	public function test_set_attribute_prevents_xss( $attribute_value ) {
+	public function test_set_attribute_escapes_value($value ) {
+		$spy = function ( $safe_text, $text ) use ( $value ) {
+			$this->assertEquals( $value, $text );
+
+			return $safe_text;
+		};
+
+		add_filter( 'attribute_escape', $spy );
+
 		$p = new WP_HTML_Tag_Processor( '<div></div>' );
 		$p->next_tag();
-		$p->set_attribute( 'test', $attribute_value );
+		$p->set_attribute( 'test', $value );
+
+		remove_filter( 'attribute_escape', $spy );
 
 		/*
 		 * Testing the escaping is hard using tools that properly parse
@@ -297,7 +292,7 @@ public function test_set_attribute_prevents_xss( $attribute_value ) {
 		preg_match( '~^<div test=(.*)></div>$~', (string) $p, $match );
 		list( , $actual_value ) = $match;
 
-		$this->assertEquals( $actual_value, '"' . esc_attr( $attribute_value ) . '"' );
+		$this->assertEquals( $actual_value, '"' . esc_attr( $value ) . '"' );
 	}
 
 	/**
diff --git a/phpunit/html/wp-html-tag-processor-wp-test.php b/phpunit/html/wp-html-tag-processor-wp-test.php
deleted file mode 100644
index 6c3a30340b267..0000000000000
--- a/phpunit/html/wp-html-tag-processor-wp-test.php
+++ /dev/null
@@ -1,91 +0,0 @@
-<?php
-/**
- * Unit tests covering WP_HTML_Tag_Processor functionality.
- * This file is only for tests that require WordPress libraries.
- * Put all other tests in wp-html-tag-processor-test-standalone.php.
- *
- * Run these tests locally as follows:
- *
- * ```
- * npx wp-env run phpunit "phpunit -c /var/www/html/wp-content/plugins/gutenberg/phpunit.xml.dist --filter WP_HTML_Tag_Processor_Test_WP"
- * ```
- *
- * @package WordPress
- * @subpackage HTML
- */
-
-require_once __DIR__ . '/../../lib/experimental/html/index.php';
-
-/**
- * @group html
- *
- * @coversDefaultClass WP_HTML_Tag_Processor
- */
-class WP_HTML_Tag_Processor_Test_WP extends WP_UnitTestCase {
-
-	/**
-	 * Passing a double quote inside of an attribute values could lead to an XSS attack as follows:
-	 *
-	 * <code>
-	 *     $p = new WP_HTML_Tag_Processor( '<div class="header"></div>' );
-	 *     $p->next_tag();
-	 *     $p->set_attribute('class', '" onclick="alert');
-	 *     echo $p;
-	 *     // <div class="" onclick="alert"></div>
-	 * </code>
-	 *
-	 * To prevent it, `set_attribute` calls `esc_attr()` on its given values.
-	 *
-	 * <code>
-	 *    <div class="&quot; onclick=&quot;alert"></div>
-	 * </code>
-	 *
-	 * @ticket 56299
-	 *
-	 * @dataProvider data_set_attribute_escapable_values
-	 * @covers set_attribute
-	 */
-	public function test_set_attribute_prevents_xss( $value_to_set, $expected_result ) {
-		$p = new WP_HTML_Tag_Processor( '<div></div>' );
-		$p->next_tag();
-		$p->set_attribute( 'test', $value_to_set );
-
-		/*
-		 * Testing the escaping is hard using tools that properly parse
-		 * HTML because they might interpret the escaped values. It's hard
-		 * with tools that don't understand HTML because they might get
-		 * confused by improperly-escaped values.
-		 *
-		 * For this test, since we control the input HTML we're going to
-		 * do what looks like the opposite of what we want to be doing with
-		 * this library but are only doing so because we have full control
-		 * over the content and because we want to look at the raw values.
-		 */
-		$match = null;
-		preg_match( '~^<div test=(.*)></div>$~', (string) $p, $match );
-		list( , $actual_value ) = $match;
-
-		$this->assertEquals( $actual_value, '"' . $expected_result . '"' );
-	}
-
-	/**
-	 * Data provider with HTML attribute values that might need escaping.
-	 */
-	public function data_set_attribute_escapable_values() {
-		return array(
-			array( '"', '&quot;' ),
-			array( '&quot;', '&quot;' ),
-			array( '&', '&amp;' ),
-			array( '&amp;', '&amp;' ),
-			array( '&euro;', '&euro;' ),
-			array( "'", '&#039;' ),
-			array( '<>', '&lt;&gt;' ),
-			array( '&quot";', '&amp;quot&quot;;' ),
-			array(
-				'" onclick="alert(\'1\');"><span onclick=""></span><script>alert("1")</script>',
-				'&quot; onclick=&quot;alert(&#039;1&#039;);&quot;&gt;&lt;span onclick=&quot;&quot;&gt;&lt;/span&gt;&lt;script&gt;alert(&quot;1&quot;)&lt;/script&gt;',
-			),
-		);
-	}
-
-}

@adamziel adamziel changed the title [Tag Processor] Merge two test files into a single file [Tag Processor] Merge the test files into a single file Oct 19, 2022
@dmsnell
Copy link
Contributor

dmsnell commented Nov 17, 2022

The merge of #45762 covers the goal of this PR, so I'm closing it. Thanks @adamziel for the first iterations on this work!

@dmsnell dmsnell closed this Nov 17, 2022
@dmsnell dmsnell deleted the update/merge-unit-tests-files branch November 17, 2022 05:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Developer Experience Ideas about improving block and theme developer experience
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants