Throw syntax errors for invalid EndTags #73

camerondubas · 2019-11-02T21:27:44Z

This PR is in response to this comment on PR glimmerjs/glimmer-vm#982.

Currently, Simple HTML Tokenizer allows leading whitespace, as well as attributes, to be defined in End Tags. The comment linked above suggests that we throw syntax errors in these cases, as they are ultimately invalid End Tags.

The HTML Spec seems to be a bit inconsistent when it comes to attributes in End Tags, as the 12.1.2.2 End tags section says:

...
4. After the tag name, there may be one or more ASCII whitespace.
5. Finally, end tags must be closed by a U+003E GREATER-THAN SIGN character (>).

However, 12.2.5.7 End tag open state says to enter the 12.2.5.8 Tag name state when the first ASCII alpha character is encountered. The "tag name state" is not specific to End Tags, and allows for entering the "before attribute name state" and the "self-closing start tag state", neither of which is really valid in End Tags.

This PR assumes that we want to prevent entering these invalid states and adds syntax errors in the following scenarios:

Leading whitespace before the tag name in an EndTag, e.g. </ div>
Attributes after the EndTag tag name, e.g. </div foo="bar">
Self-closing EndTags, e.g. </div/>
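
To make the three scenarios concrete, here is a minimal standalone sketch that classifies a raw end tag string. It is not the tokenizer's implementation: `classifyEndTag` and the `EndTagProblem` labels are hypothetical names invented for illustration, and the sketch follows this first version of the PR, in which trailing whitespace after the tag name is still accepted.

```typescript
// Hypothetical helper for illustration only; not part of simple-html-tokenizer.
type EndTagProblem =
  | 'malformed'
  | 'leading-whitespace'
  | 'self-closing'
  | 'attributes';

function classifyEndTag(raw: string): EndTagProblem | null {
  // Only inputs of the form "</...>" are considered here.
  if (!raw.startsWith('</') || !raw.endsWith('>')) return 'malformed';
  const inner = raw.slice(2, -1);

  // ASCII whitespace per the HTML spec: tab, LF, FF, CR, space.
  if (/^[\t\n\f\r ]/.test(inner)) return 'leading-whitespace';

  // A tag name has to start with an ASCII alpha character.
  if (!/^[A-Za-z]/.test(inner)) return 'malformed';

  // A slash right before the closing ">" (optionally followed by whitespace).
  if (/\/[\t\n\f\r ]*$/.test(inner)) return 'self-closing';

  // Whitespace followed by anything else means attribute-like content.
  if (/[\t\n\f\r ]+[^\t\n\f\r ]/.test(inner)) return 'attributes';

  // Plain tag name, possibly with trailing whitespace.
  return null;
}

console.log(classifyEndTag('</ div>'));
console.log(classifyEndTag('</div foo="bar">'));
console.log(classifyEndTag('</div/>'));
console.log(classifyEndTag('</div >'));
```

In the tokenizer itself these cases are detected character by character inside the state machine rather than with regular expressions; the sketch only shows which inputs fall into which category.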

rwjblue

Thanks for picking this up!

rwjblue · 2019-11-03T01:14:08Z

src/evented-tokenizer.ts

      } else if (char === '>') {
        this.delegate.finishTag();
        this.transitionTo(TokenizerState.beforeData);
        this.tagNameBuffer = '';
      } else {
-        this.appendToTagName(char);
+        if (!this.delegate.current().syntaxError && !isSpace(char)) {


I don’t fully understand this conditional (reviewing on mobile so forgive me if I’ve missed something obvious).

Why do we check if .current().syntaxError?

camerondubas · 2019-11-04

You're right that this is confusing, and I feel like there may be a better way to do this.

This check is required since we no longer enter the beforeAttributeName or selfClosingStartTag states, which would reset the tagNameBuffer. Without this check I was getting invalid tag names that include the whitespace and/or attributes. i.e {tagname: 'div foo="bar"'}.

The !isSpace(char) is there to not include whitespace in the tagname, since I had made the decision to allow trailing whitespace in the closing tag. More on that decision in my reply to your other comment.
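
The effect of that guard can be sketched outside the tokenizer. The function below is a simplified stand-in, not the code from src/evented-tokenizer.ts: `scanEndTagName`, its `guarded` flag, and the error message are assumptions made for illustration. It scans the characters after `</` and shows how the tag name picks up whitespace and attribute text when the guard is absent.

```typescript
// Simplified stand-in for the end-tag-name state; all names are hypothetical.
function isSpace(char: string): boolean {
  return /[\t\n\f\r ]/.test(char);
}

function isAlpha(char: string): boolean {
  return /[A-Za-z]/.test(char);
}

interface EndTagResult {
  tagName: string;
  syntaxError?: string;
}

function scanEndTagName(afterSlash: string, guarded: boolean): EndTagResult {
  const result: EndTagResult = { tagName: '' };

  for (let i = 0; i < afterSlash.length; i++) {
    const char = afterSlash.charAt(i);
    const next = afterSlash.charAt(i + 1); // '' when out of range

    if (char === '>') break;

    // Whitespace followed by an alpha character looks like the start of an attribute.
    if (isSpace(char) && isAlpha(next)) {
      result.syntaxError = 'closing tag cannot contain attributes'; // assumed message
    }

    // With the guard, nothing is appended once an error was recorded,
    // and whitespace never becomes part of the tag name.
    if (!guarded || (result.syntaxError === undefined && !isSpace(char))) {
      result.tagName += char;
    }
  }

  return result;
}

console.log(scanEndTagName('div foo="bar">', false).tagName);
console.log(scanEndTagName('div foo="bar">', true).tagName);
```

Without the guard, the first call keeps accumulating into the tag name, which mirrors the invalid `div foo="bar"` value described above; with it, the name stays `div` and the error is recorded separately.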

rwjblue · 2019-11-03T01:18:15Z

src/evented-tokenizer.ts

-      if (isSpace(char)) {
-        this.transitionTo(TokenizerState.beforeAttributeName);
-        this.tagNameBuffer = '';
+      if (isSpace(char) && isAlpha(this.peek())) {


Why do we check if the next char is alpha here? I think the goal is to issue an error if there is white space as the first thing after the </, right?

If so, maybe something like:

      if (isSpace(char)) {
        if (this.tagNameBuffer === '') {
          this.delegate.reportSyntaxError('closing tag must only contain tagname');
        }
      }

What do you think?

camerondubas · 2019-11-04

I added the check for whitespace after </ on lines 490-493 of this file:

      ...
      } else {
        this.transitionTo(TokenizerState.endTagName);
        this.delegate.beginEndTag();
        this.delegate.reportSyntaxError('closing tag cannot contain whitespace before tagname');
      }

The check on line 269 is specifically looking for attributes after the EndTag's tag name. I made the decision to allow whitespace after the tag name because the HTML spec allows for this:

After the tag name, there may be one or more ASCII whitespace.

So I'm specifically looking for whitespace followed by an ASCII alpha character (the start of an attribute), which is invalid syntax.

If you'd prefer to completely disallow any whitespace in closing tags, I'd be happy to update this PR to check for that!

rwjblue · 2019-12-23

"If you'd prefer to completely disallow any whitespace in closing tags, I'd be happy to update this PR to check for that!"

Ya, let's do that.

lifeart · 2019-12-21T10:14:56Z

up

camerondubas · 2020-01-14T03:20:18Z

Ok, after some fighting with git/EOL characters on Windows, I've got an updated working version.

This will now throw the error 'closing tag must only contain tagname' whenever a closing tag contains trailing or leading whitespace, which also covers attributes in closing tags, since they would need to be preceded by a whitespace character.
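
The stricter rule described above can be summarized in a few lines. This is a sketch under assumptions, not the tokenizer's code: `closingTagError` is a hypothetical helper that inspects a whole end tag string at once, whereas the tokenizer works character by character. Only the error message is taken from the comment above, and inputs such as `</div/>` are deliberately left out because the comment does not say how they are handled.

```typescript
// Hypothetical helper illustrating the "no whitespace at all" rule.
const CLOSING_TAG_ERROR = 'closing tag must only contain tagname';

function closingTagError(raw: string): string | null {
  if (!raw.startsWith('</') || !raw.endsWith('>')) {
    throw new Error('expected input of the form </...>');
  }
  const inner = raw.slice(2, -1);

  // Any ASCII whitespace between "</" and ">" is rejected, whether it is
  // leading, trailing, or separating the tag name from an attribute.
  return /[\t\n\f\r ]/.test(inner) ? CLOSING_TAG_ERROR : null;
}

for (const tag of ['</div>', '</ div>', '</div >', '</div foo="bar">']) {
  console.log(tag, '->', closingTagError(tag));
}
```

Because an attribute can only follow the tag name after whitespace, rejecting whitespace is enough to reject attributes as well, which is the simplification this version of the PR relies on.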

camerondubas · 2020-06-04T22:14:50Z

Bumping this. Hoping to get a review.

I'm happy to close this PR if it isn't relevant anymore. Thanks!

rwjblue · 2020-06-05T00:24:07Z

Eeck, sorry @camerondubas, will try to review tomorrow 😩

jfdnc · 2021-10-11T17:27:38Z

Just noting here that this appears to address emberjs/ember.js#19703 and glimmerjs/glimmer-vm#1309.

Edit: I don't know what the state is here since it's been sitting around for a bit, but if there are things to address I'm happy to help out.

camerondubas · 2021-10-20T23:12:09Z

I believe this is still pending a review. @rwjblue any chance this could get looked at again?

camerondubas added 2 commits November 2, 2019 13:52

Throw syntax errors for invalid EndTags

7f24f68

Allow trailing whitespace in EngTags

0d213ee

camerondubas mentioned this pull request Nov 2, 2019

Throw error if closing tag contains Muststache Statement glimmerjs/glimmer-vm#982

Open

rwjblue reviewed Nov 3, 2019

camerondubas added 8 commits January 13, 2020 17:28

Merge master

4a756e9

Only allow tag name in closing tags

ca74152

Revert "Only allow tag name in closing tags"

7b27974

This reverts commit ca74152.

Revert "Merge master"

c212800

This reverts commit 4a756e9.

Re-apply "Only allow tag name in closing tags"

aecb392

Revert "Re-apply "Only allow tag name in closing tags""

2b44185

This reverts commit aecb392.

Fix whitespace, and Re-apply "Only allow tag name in closing tags"

2d27f35

Fix typo for "contain"

f3c022c

camerondubas requested a review from rwjblue January 14, 2020 03:20

jfdnc mentioned this pull request Oct 11, 2021

[Bug] Ember silently allows malformed HTML 🐻 emberjs/ember.js#19703

Closed
