-
-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: track whitespace #1135
feat: track whitespace #1135
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling this issue. However, I am concerned about false positives (i.e. diagnostics on correct code). See my comments marked 'must fix'.
Char8 c = input[0]; | ||
unsigned char c0 = static_cast<unsigned char>(input[0]); | ||
unsigned char c1 = static_cast<unsigned char>(input[1]); | ||
unsigned char c2 = static_cast<unsigned char>(input[2]); | ||
if (c == ' ' || c == '\t' || c == '\f' || c == '\v') { | ||
this->indent_level_ += this->increasing_indent_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like tracking indent level like this for a few reasons:
- I don't like introducing another addition to this hot loop. This is possibly a performance problem.
- It is bug prone. For example, your patch does not treat non-ASCII spaces (such as Ogham Space Mark on line 1882) as indentation.
- This does not account for differences between tabs and spaces.
I think a better approach would be to store a pointer to the beginning of the line. reset_indent_level
would then be the only thing doing new work in the lexer. However, to determine the indentation level, we would need a more sophisticated algorithm; it would re-scan the indentation and track tabs and spaces separately. The parser would invoke this algorithm rarely, so the complexities of the indentation algorithm shouldn't hurt performance much.
Perhaps my idea is bad. I haven't tried it. But storing the beginning of the line feels right to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not tried to implement different ways to track whitespace. I figured the addition would be fairly cheap even in a hot loop, but I didn't measure. I also increment this->indent_level
in the Ogham Space Mark branch of skip_whitespace
now -- thanks for catching that.
As for the difference between tabs and spaces, I assume we should let the programmer configure this? I guess we can try to autodetect this too. I don't mix tabs and spaces myself though, so I don't understand the intent of doing so.
this->diag_reporter_->report(Diag_Mismatched_Curly_Indentation{ | ||
.left_curly = left_curly.span(), | ||
.right_curly = this->peek().span(), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must fix: I am very skeptical of this diagnostic. Wonky indentation is not necessarily a problem. For example, quick-lint-js should not complain about the following code:
if (x) {
y();
} else {
z();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this diagnostic completely in this new set of patches. I replaced it with Diag_Unclosed_Code_Block_V2
, which I think matches your intention in #282 much more closely.
It will not report any warnings for the snippet of code above. But it will report an error for the snippet of code in #282 as well as others obviously. See the tests for more examples.
Note, this new diagnostic does not catch something like this, where more code trails an expected but missing closing curly:
function f() {
z;
if (false) {
z;
if (1) {
z;
}
if (true) {
}
}
It will report that the curly brace after if (false)
is missing its matching curly. However, it will suggest the matching rightly curly go at the end of the file instead of before if (true)
.
if (!allow_several_statements && | ||
this->peek().indent_level > keyword_token.indent_level) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised that the logic for this diagnostic is so simple! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit more complicated now to allow a snippet like this, where else
is more indented than if
.
if (x)
y();
else
z();
@@ -4468,6 +4482,14 @@ void Parser::parse_and_visit_if(Parse_Visitor_Base &v) { | |||
|
|||
parse_maybe_else: | |||
if (this->peek().type == Token_Type::kw_else) { | |||
Token else_token = this->peek(); | |||
if (this->peek().indent_level != if_token.indent_level) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must fix: I think this logic is too strict. The following code is fine, for example, but your code would report annoying diagnostics:
if (x) {
y();
} else {
z();
}
if (x)
y();
else
z();
I think we should only trigger a warning about else
if there's an nested if
prior (or something). We shouldn't warn about all else
s.
[[qljs::diag("E0443", Diagnostic_Severity::warning)]] // | ||
[[qljs::message("misleading indentation after 'if' body", | ||
ARG(if_or_else_span))]] // | ||
Source_Code_Span if_or_else_span; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the wrong spot. We should point at the following statement or at the indentation before the following statement. The location of the if
or else
is supplementary information.
if (cond)
f();
g();
//^^^^ warning: misleading indentation after 'if' body
u8"if(x)\n\tif (y)\n\t\tz();\nelse\n\tw();"_sv, // | ||
u8" ^^ Diag_Misleading_Braceless_If_Else_Indentation.if_span\n"_diag // | ||
u8" ^^^^ .else_span"_diag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, though I find \t
a bit hard to read. Perhaps two spaces is a good substitute:
u8"if(x)\n\tif (y)\n\t\tz();\nelse\n\tw();"_sv, // | |
u8" ^^ Diag_Misleading_Braceless_If_Else_Indentation.if_span\n"_diag // | |
u8" ^^^^ .else_span"_diag); | |
u8"if(x)\n if (y)\n z();\nelse\n w();"_sv, // | |
u8" ^^ Diag_Misleading_Braceless_If_Else_Indentation.if_span\n"_diag // | |
u8" ^^^^ .else_span"_diag); |
Eh, maybe not. We don't have great testing for diagnostics with multi-line code samples...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept \t
for now. I agree it's difficult to read, but I don't find spaces easier. For now, I added an unraveled version of the string above each test.
0f41268
to
055fcbf
Compare
I addressed your "must fix" comments. The diagnostics are more flexible now. They do not necessarily report wonky indentation, but they still use indentation to try to track where to place a closing right curly brace. They also still report misleading indentation for I have not tried your suggestion about changing the way we track indentation in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should focus on only one diagnostic for now (probably either Diag_Unclosed_Code_Block
or Diag_Misleading_If_Or_Else_Body_Indentation
). You can add the other diagnostics later in a separate pull request.
Must fix: The following code reports a false positive ("indentation of 'if' does not match 'else' [E0442]"):
if (true) {
if (true)
console.log();
// console.log();
else
console.log();
}
I think the comment is significant. Deleting the comment fixes the false positive. (Indentation level of the comment doesn't matter.)
Similarly, there's a false negative in the following code (expected "misleading indentation after 'if' body [E0443]", but got nothing):
if (true) {
if (true)
console.log();
// console.log();
//else
console.log();
}
[[qljs::message("indentation of '{0}' does not match '{1}'", ARG(if_span), | ||
ARG(else_span))]] // | ||
[[qljs::message("indentation of '{0}' does not match '{1}'", ARG(else_span), | ||
ARG(if_span))]] // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the user, this will look like two different diagnostics.
Nit: I would inline 'else' and 'if' to make the messages more legible to maintainers.
I also think that it's more likely that else
is misindented than if
, so we should point to the else
first.
I think this reads better:
[[qljs::message("indentation of '{0}' does not match '{1}'", ARG(if_span), | |
ARG(else_span))]] // | |
[[qljs::message("indentation of '{0}' does not match '{1}'", ARG(else_span), | |
ARG(if_span))]] // | |
[[qljs::message("indentation of 'else' does not match indentation of 'if'", ARG(else_span))]] // | |
[[qljs::message("matching 'if' here", ARG(if_span))]] // |
} | ||
this->diag_reporter_->report(Diag_Unclosed_Code_Block_V2{ | ||
.block_open = block_open.span(), | ||
.expected_block_close = Source_Code_Span::unit(block_close.begin), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the inaccuracy of expected_block_close
in common situations such as the following:
function f() {
if (true) { // <-- matching '{' here [E0441]
body();
foo();
} // <-- missing '}' [E0441]
I want quick-lint-js to not mislead.
I think we should omit it and only mention the opening {
. In other words, lets use Diag_Unclosed_Code_Block
until we have a smart enough algorithm to point to the correct place for }
.
.block_open = left_curly_span, | ||
case Token_Type::end_of_file: { | ||
Token block_open, block_close; | ||
if (this->mismatched_curly_braces_.has_value()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must fix: Because we don't reset mismatched_curly_braces_
, we might end up with duplicate diagnostics. For example:
function a() {
function f() {
if (true) { // <-- matching '{' here [E0441]; matching '{' here [E0441]
body();
foo();
}
/*
sandbox/hello3.mjs:6:1: error: missing '}' [E0441]
sandbox/hello3.mjs:3:15: note: matching '{' here [E0441]
sandbox/hello3.mjs:6:1: error: missing '}' [E0441]
sandbox/hello3.mjs:3:15: note: matching '{' here [E0441]
*/
Adding this->mismatched_curly_braces_.reset();
after reporting the diagnostic fixes this issue somewhat:
function a() { // <-- matching '{' here [E0441]
function f() {
if (true) { // <-- matching '{' here [E0441]
body();
foo();
}
/*
sandbox/hello3.mjs:6:1: error: missing '}' [E0441]
sandbox/hello3.mjs:3:15: note: matching '{' here [E0441]
sandbox/hello3.mjs:7:1: error: missing '}' [E0441]
sandbox/hello3.mjs:1:14: note: matching '{' here [E0441]
*/
I think a better solution is to maintain a stack of curlies instead of either zero or one pairs of curlies.
bool previous_in_if_statement = this->in_if_statement_; | ||
this->in_if_statement_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must fix: Use Bool_Guard
. (It's a bit of boilerplate, but the safety is worth it.) See Switch_Guard
for example usage.
this->peek().indent_level > keyword_token.indent_level) { | ||
this->diag_reporter_->report(Diag_Misleading_If_Or_Else_Body_Indentation{ | ||
.if_or_else_span = keyword_token.span()}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code reports diagnostics for the following code:
// misleading indentation after 'if' body [E0443]
if (true) console.log();
console.log();
function f() {
if (true) console.log();
// misleading indentation after 'if' body [E0443]
else console.log()
console.log();
}
Was this intentional? If so, I think it deserves a test to codify the behavior.
@@ -4468,6 +4496,15 @@ void Parser::parse_and_visit_if(Parse_Visitor_Base &v) { | |||
|
|||
parse_maybe_else: | |||
if (this->peek().type == Token_Type::kw_else) { | |||
Token else_token = this->peek(); | |||
if (previous_in_if_statement && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must fix: This detection of whether we're in a braceless if
or not is faulty. For example, the following code triggers the diagnostic, but if you remove the outer if
(first and last lines) there is no diagnostic:
if (true) {
// indentation of 'if' does not match 'else' [E0442]
if (true) {}
else console.log()
}
Another example of faulty if
detection:
if (true) {
a.map((o) => {
if (true) return 1;
else if (true) return 2;
else return 3;
});
}
// if (x) | ||
// if (y) | ||
// z(); | ||
// else | ||
// w(); | ||
test_parse_and_visit_statement( | ||
u8"if(x)\n\tif (y)\n\t\tz();\n\telse\n\t\tw();"_sv, no_diags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the no_diag tests, we can split the string onto multiple source lines:
// if (x) | |
// if (y) | |
// z(); | |
// else | |
// w(); | |
test_parse_and_visit_statement( | |
u8"if(x)\n\tif (y)\n\t\tz();\n\telse\n\t\tw();"_sv, no_diags); | |
test_parse_and_visit_statement( | |
u8"if(x)\n"_sv | |
u8"\tif (y)\n"_sv | |
u8"\t\tz();\n"_sv | |
u8"\telse\n"_sv | |
u8"\t\tw();"_sv, | |
no_diags); |
I'm not able to figure out a heuristic and an implementation for it that doesn't feel complex and hacky. I'm going to close this PR for now. Thank you for your suggestions and help nonetheless. |
This set of commits addresses issues #282 and #1113 by tracking whitespace and adding a few new diagnostics. quick-lint will print warnings about mismatched curly brace indentation,
if/else
indentation, and misleading indentation after a bracelessif
.Commit 2379dea modifies a couple existing tests by adding whitespace to them to avoid triggering
Diag_Mismatched_Curly_Indentation
.