Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw syntax errors for invalid EndTags #73

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

camerondubas
Copy link

This PR is in response to this comment PR glimmerjs/glimmer-vm#982 (comment).

Currently Simple HTML Tokenizer allows leading whitespace as well as attributes to be defined in End Tags. The comment linked above suggests that we throw syntax errors in these cases as they are ultimately invalid End Tags

The HTML Spec seems to be a bit inconsistent when it comes to attributes in End Tags, as the 12.1.2.2 End tags section says;

...
4. After the tag name, there may be one or more ASCII whitespace.
5. Finally, end tags must be closed by a U+003E GREATER-THAN SIGN character (>).

However 12.2.5.7 End tag open state says to enter the 12.2.5.8 Tag name state when an the first ASCII Alpha character is encountered. The "tag name state" is not specific to End Tags, and allows for entering the "before attribute name state" and the "self-closing start tag state", both of which aren't really valid in End Tags.


This PR assumes that we want to prevent entering these invalid states and adds syntax errors in the following scenarios:

  • Leading whitespace before the tagname in an EndTag. i.e </ div>
  • Attributes after EndTag tagname. i.e </div foo="bar">
  • Self closing EndTags. i.e </div/>

Copy link
Collaborator

@rwjblue rwjblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up!

} else if (char === '>') {
this.delegate.finishTag();
this.transitionTo(TokenizerState.beforeData);
this.tagNameBuffer = '';
} else {
this.appendToTagName(char);
if (!this.delegate.current().syntaxError && !isSpace(char)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t fully understand this conditional (reviewing on mobile so forgive me if I’ve missed something obvious).

Why do we check if .current().syntaxError?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that this is confusing, and I feel like there may be a better way to do this.

This check is required since we no longer enter the beforeAttributeName or selfClosingStartTag states, which would reset the tagNameBuffer. Without this check I was getting invalid tag names that include the whitespace and/or attributes. i.e {tagname: 'div foo="bar"'}.

The !isSpace(char) is there to not include whitespace in the tagname, since I had made the decision to allow trailing whitespace in the closing tag. More on that decision in my reply to your other comment.

if (isSpace(char)) {
this.transitionTo(TokenizerState.beforeAttributeName);
this.tagNameBuffer = '';
if (isSpace(char) && isAlpha(this.peek())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we check if the next char is alpha here? I think the goal is to issue an error if there is white space as the first thing after the </, right?

If so, maybe something like:

if (isSpace(char)) {
  if (this.tagNameBuffer === '') {
    this.delegate.reportSyntaxError('closing tag must only contain tagname');
  }
}

What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the check for whitespace after </ on lines 490-493 of this file:

...
} else {
    this.transitionTo(TokenizerState.endTagName);
    this.delegate.beginEndTag();
    this.delegate.reportSyntaxError('closing tag cannot contain whitespace before tagname');
}

The check on line 269 is specifically looking for attributes after the EndTag's tagname. I made the decision to allow whitespace after the tagname because the HTML spec allows for this

  1. After the tag name, there may be one or more ASCII whitespace.

So I'm specifically looking for whitespace followed by and ASCII alpha character (the start of an attribute), which is invalid syntax.

I've you'd prefer to completely disallow any whitespace in closing tags, I'd be happy to update this PR to check for that!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've you'd prefer to completely disallow any whitespace in closing tags, I'd be happy to update this PR to check for that!

Ya, let's do that.

@lifeart
Copy link

lifeart commented Dec 21, 2019

up

@camerondubas
Copy link
Author

Ok, after some fighting with git/eol characters on Windows I've got a updated working version.

This will now throw the error 'closing tag must only contain tagname' whenever a closing tag contains trailing or leading whitespace, which also covers attributes in closing tags since they would need to be prepended with a whitespace character.

@camerondubas
Copy link
Author

Bumping this. Hoping to get a review.

I'm happy to close this PR if it isn't relevant anymore. Thanks!

@rwjblue
Copy link
Collaborator

rwjblue commented Jun 5, 2020

Eeck, sorry @camerondubas, will try to review tomorrow 😩

@jfdnc
Copy link

jfdnc commented Oct 11, 2021

Just noting here that this appears to address emberjs/ember.js#19703 and glimmerjs/glimmer-vm#1309.

Edit: don't know what the state is here since its been sitting around for a bit, but if there are things to address I'm happy to help out.

@camerondubas
Copy link
Author

I believe this is still pending a review. @rwjblue any chance this could get looked at again?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants