For testing: WP_HTML_Tag_Processor with usage #3971

felixarntz · 2023-02-02T22:08:30Z

Not intended for commit.

This PR is solely for QA and performance testing of #3920. It is based on the same code, but also includes initial usage of the new API, taken from #3914 and dmsnell#1.

8d7ebfb has all the changes that are additional to #3920, the rest is the same code.

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

This commit pulls in the HTML Tag Processor from the Gutenberg repository. The Tag Processor attempts to be an HTML5-spec-compliant parser that provides the ability in PHP to find specific HTML tags and then add, remove, or update attributes on that tag. It provides a safe and reliable way to modify the attributes on HTML tags.

```php
// Add missing `rel` attribute to links.
$p = new WP_HTML_Tag_Processor( $block_content );
if ( $p->next_tag( 'A' ) && empty( $p->get_attribute( 'rel' ) ) ) {
	$p->set_attribute( 'rel', 'noopener nofollow' );
}
return $p->get_updated_html();
```

Introduced originally in WordPress/gutenberg#42485 and developed within the Gutenberg repository, this HTML parsing system was built to address a persistent need: properly modifying HTML tag attributes. It was motivated by a sequence of block editor defects that stemmed from mismatches between actual HTML code and expectations for HTML input running through existing naive string-search-based solutions.

The Tag Processor is intended to operate fast enough to avoid being an obstacle on page render while using as little memory overhead as possible. It is practically a zero-memory-overhead system, and only allocates memory as changes to the input HTML document are enqueued, releasing that memory when flushing those changes to the document, moving on to find the next tag, or flushing its entire output via `get_updated_html()`.

Care has been taken to ensure that the Tag Processor will not be confused by unexpected or non-normative HTML input, including issues arising from quoting, from different syntax rules within `<title>`, `<textarea>`, and `<script>` tags, from the appearance of rare but legitimate comment and XML-like regions, and from a variety of syntax abnormalities such as unbalanced tags, incomplete syntax, and overlapping tags.

The Tag Processor is constrained to parsing an HTML document as a stream of tokens. It will not build an HTML tree or generate a DOM representation of a document. It is designed to start at the beginning of an HTML document and linearly scan through it, potentially modifying that document as it scans. It has no access to the markup inside or around tags and it has no ability to determine which tag openers and tag closers belong to each other, or determine the nesting depth of a given tag.

It includes a primitive bookmarking system to remember tags it has previously visited. These bookmarks refer to specific tags, not to string offsets, and continue to point to the same place in the document as edits are applied. By asking the Tag Processor to seek to a given bookmark it's possible to back up and reprocess content that has already been traversed.

Attribute values are sanitized with `esc_attr()` and rendered as double-quoted attributes. On read they are unescaped and unquoted. Authors wishing to rely on the Tag Processor are therefore free to pass around data as normal strings.

Convenience methods for adding and removing CSS class names exist in order to remove the need to process the `class` attribute.

```php
// Update heading block class names.
$p = new WP_HTML_Tag_Processor( $html );
while ( $p->next_tag() ) {
	switch ( $p->get_tag() ) {
		case 'H1':
		case 'H2':
		case 'H3':
		case 'H4':
		case 'H5':
		case 'H6':
			$p->remove_class( 'wp-heading' );
			$p->add_class( 'wp-block-heading' );
			break;
	}
}
return $p->get_updated_html();
```

The Tag Processor is intended to be a reliable low-level library for traversing HTML documents and higher-level APIs are to be built upon it. Immediately, in core Gutenberg blocks, it is meant to replace HTML modification that currently relies on RegExp patterns and simpler string replacements.

See the following for examples of such replacement:

WordPress/gutenberg@1315784
https://github.com/WordPress/gutenberg/pull/45469/files#diff-dcd9e1f9b87ca63efe9f1e834b4d3048778d3eca41aa39c636f8b16a5bb452d2L46
WordPress/gutenberg#46625

Co-Authored-By: Adam Zielinski <[email protected]>
Co-Authored-By: Bernie Reiter <[email protected]>
Co-Authored-By: Grzegorz Ziolkowski <[email protected]>
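The bookmarking behavior described in the commit message can be easier to follow with a small example. The following is a minimal sketch, not code from this PR: the function name `example_flag_list_with_current_item`, the bookmark name `list`, and the class names are hypothetical, and it assumes WordPress is loaded with `WP_HTML_Tag_Processor` available and that `set_bookmark()`, `seek()`, and `release_bookmark()` behave as described above.

```php
/**
 * Hypothetical helper, for illustration only: adds a class to the first UL
 * tag when a later LI tag carries the `current` class.
 */
function example_flag_list_with_current_item( $html ) {
	$p = new WP_HTML_Tag_Processor( $html );

	// Find the first UL opener and remember it.
	if ( ! $p->next_tag( 'UL' ) ) {
		return $html;
	}
	$p->set_bookmark( 'list' );

	// Scan forward. The processor does not track nesting, so this matches
	// the next LI with the class anywhere later in the document, whether
	// or not it sits inside the bookmarked UL.
	if ( $p->next_tag( array( 'tag_name' => 'LI', 'class_name' => 'current' ) ) ) {
		// Back up to the remembered UL and modify it.
		$p->seek( 'list' );
		$p->add_class( 'has-current' );
	}

	// Bookmarks hold resources, so release them when done.
	$p->release_bookmark( 'list' );

	return $p->get_updated_html();
}
```

Called with markup such as `<ul><li>One</li><li class="current">Two</li></ul>`, the sketch should return the same markup with the class added to the opening `ul` tag. Input without a matching list item is returned unchanged.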

* Rename data providers to match test per coding standard.
* Restructure data provider datasets into a single array form for consistency.
* Add `WP_HTML_Tag_Processor::` to `@covers` methods per coding standard.
* Add empty line between set up and assertion groupings.
* Move well-formed HTML into separate test of updating attributes.
* Replace assertEquals() with assertSame().


dmsnell and others added 30 commits January 26, 2023 15:48

Move class_exists calls to wp-html

40e1cb3

Mark helper classes final

8b507e5

Updates from review feedback, mostly docs

561acff

Load API files directly from wp-settings.php

b708c6b

Tests: remove loading API files

521a500

Renames test classes to coding standard

8bdfae4

Tests_{APIorGroup}_className.

Renames test filenames to coding standard

bc17086

Cleans HEADS from merge conflict from test file

334e415

Reword explanation of lexical updates

def4ed4

docblock and consistency updates, addressing some PR feedback

b924e03

Move HTML processing modules into new html directory

91dc772

Documentation wording updates.

1fd0d7d

Props @hellofromtonya

Rename library to "HTML-API" instead of "HTML"

361710d

Un-finalize helper classes

2d1411a

Replace throwing with trigger_error( E_USER_WARNING )

d8fdf41

Add test to check for bug when encountering unexpected </SCRIPT> closer

1465218

Update tests: fix data provider and remove Exception expectation

5c1a5d5

Lint issue

c50ffee

Fix broken tests

9a5ccf0

Remove some TODOs, most were done already

13dd7d7

Expand design and limitations discussion

b31cca4

Loosen assertion on warning

1a9bec0

Rename some properties to clarify their purpose and expand comments.

28e9bf3

Also: - Change visibility of some properties to `protected` to aid with in-progress expansion of the HTML API. - Refactor short-circuit checks in `get_updated_html()` for clarity.

Linter: yoda condition

1e2ef09

Typos in comments

243dc7c

Rework @Covers attributes

8152988

Was doing it wrong w.r.t. doing_it_wrong

a5f2d96

Add additional type check to avoid throwing _doing_it_wrong error where none was thrown before.

5b1d47e

dmsnell and others added 8 commits February 1, 2023 17:12

Lada la di

3f9b274

Remove checks that _doing_it_wrong throws a notice

1b8c75c

Set expected incorrect usage in tests.

aad5310

Docblock updates

4a43850

Shorten function summary

fbbf382

Ensure test assertions have a message parameter

5b4ad8f

Incorporate WP_HTML_Tag_Processor usage from WordPress#3914.

8d7ebfb

Merge branch 'trunk' into test/html-tag-processor-with-usage

a0a24f0

This was referenced Feb 2, 2023

Try/html api with packages update and layout.php back ports #3955

Closed

Editor: Introduce HTML Tag Processor #3920

Closed

felixarntz closed this Feb 2, 2023
