-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Throw syntax errors for invalid EndTags #73
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking this up!
} else if (char === '>') { | ||
this.delegate.finishTag(); | ||
this.transitionTo(TokenizerState.beforeData); | ||
this.tagNameBuffer = ''; | ||
} else { | ||
this.appendToTagName(char); | ||
if (!this.delegate.current().syntaxError && !isSpace(char)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t fully understand this conditional (reviewing on mobile so forgive me if I’ve missed something obvious).
Why do we check if .current().syntaxError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that this is confusing, and I feel like there may be a better way to do this.
This check is required since we no longer enter the beforeAttributeName
or selfClosingStartTag
states, which would reset the tagNameBuffer
. Without this check I was getting invalid tag names that include the whitespace and/or attributes. i.e {tagname: 'div foo="bar"'}
.
The !isSpace(char)
is there to not include whitespace in the tagname, since I had made the decision to allow trailing whitespace in the closing tag. More on that decision in my reply to your other comment.
src/evented-tokenizer.ts
Outdated
if (isSpace(char)) { | ||
this.transitionTo(TokenizerState.beforeAttributeName); | ||
this.tagNameBuffer = ''; | ||
if (isSpace(char) && isAlpha(this.peek())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we check if the next char is alpha here? I think the goal is to issue an error if there is white space as the first thing after the </
, right?
If so, maybe something like:
if (isSpace(char)) {
if (this.tagNameBuffer === '') {
this.delegate.reportSyntaxError('closing tag must only contain tagname');
}
}
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the check for whitespace after </
on lines 490-493 of this file:
...
} else {
this.transitionTo(TokenizerState.endTagName);
this.delegate.beginEndTag();
this.delegate.reportSyntaxError('closing tag cannot contain whitespace before tagname');
}
The check on line 269 is specifically looking for attributes after the EndTag's tagname. I made the decision to allow whitespace after the tagname because the HTML spec allows for this
- After the tag name, there may be one or more ASCII whitespace.
So I'm specifically looking for whitespace followed by and ASCII alpha character (the start of an attribute), which is invalid syntax.
I've you'd prefer to completely disallow any whitespace in closing tags, I'd be happy to update this PR to check for that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've you'd prefer to completely disallow any whitespace in closing tags, I'd be happy to update this PR to check for that!
Ya, let's do that.
up |
Ok, after some fighting with git/eol characters on Windows I've got a updated working version. This will now throw the error |
Bumping this. Hoping to get a review. I'm happy to close this PR if it isn't relevant anymore. Thanks! |
Eeck, sorry @camerondubas, will try to review tomorrow 😩 |
Just noting here that this appears to address emberjs/ember.js#19703 and glimmerjs/glimmer-vm#1309. Edit: don't know what the state is here since its been sitting around for a bit, but if there are things to address I'm happy to help out. |
I believe this is still pending a review. @rwjblue any chance this could get looked at again? |
This PR is in response to this comment PR glimmerjs/glimmer-vm#982 (comment).
Currently Simple HTML Tokenizer allows leading whitespace as well as attributes to be defined in End Tags. The comment linked above suggests that we throw syntax errors in these cases as they are ultimately invalid End Tags
The HTML Spec seems to be a bit inconsistent when it comes to attributes in End Tags, as the 12.1.2.2 End tags section says;
However 12.2.5.7 End tag open state says to enter the 12.2.5.8 Tag name state when an the first ASCII Alpha character is encountered. The "tag name state" is not specific to End Tags, and allows for entering the "before attribute name state" and the "self-closing start tag state", both of which aren't really valid in End Tags.
This PR assumes that we want to prevent entering these invalid states and adds syntax errors in the following scenarios:
</ div>
</div foo="bar">
</div/>