feat: track whitespace #1135

arieldon · 2023-12-26T16:37:28Z

This set of commits addresses issues #282 and #1113 by tracking whitespace and adding a few new diagnostics. quick-lint will print warnings about mismatched curly brace indentation, if/else indentation, and misleading indentation after a braceless if.

Commit 2379dea modifies a couple existing tests by adding whitespace to them to avoid triggering Diag_Mismatched_Curly_Indentation.

strager

Thanks for tackling this issue. However, I am concerned about false positives (i.e. diagnostics on correct code). See my comments marked 'must fix'.

strager · 2023-12-31T22:39:24Z

src/quick-lint-js/fe/lex.cpp

  Char8 c = input[0];
  unsigned char c0 = static_cast<unsigned char>(input[0]);
  unsigned char c1 = static_cast<unsigned char>(input[1]);
  unsigned char c2 = static_cast<unsigned char>(input[2]);
  if (c == ' ' || c == '\t' || c == '\f' || c == '\v') {
+    this->indent_level_ += this->increasing_indent_;


I don't like tracking indent level like this for a few reasons:

I don't like introducing another addition to this hot loop. This is possibly a performance problem.

It is bug prone. For example, your patch does not treat non-ASCII spaces (such as Ogham Space Mark on line 1882) as indentation.

This does not account for differences between tabs and spaces.

I think a better approach would be to store a pointer to the beginning of the line. reset_indent_level would then be the only thing doing new work in the lexer. However, to determine the indentation level, we would need a more sophisticated algorithm; it would re-scan the indentation and track tabs and spaces separately. The parser would invoke this algorithm rarely, so the complexities of the indentation algorithm shouldn't hurt performance much.

Perhaps my idea is bad. I haven't tried it. But storing the beginning of the line feels right to me.

I have not tried to implement different ways to track whitespace. I figured the addition would be fairly cheap even in a hot loop, but I didn't measure. I also increment this->indent_level in the Ogham Space Mark branch of skip_whitespace now -- thanks for catching that.

As for the difference between tabs and spaces, I assume we should let the programmer configure this? I guess we can try to autodetect this too. I don't mix tabs and spaces myself though, so I don't understand the intent of doing so.

src/quick-lint-js/fe/lex.cpp

strager · 2023-12-31T22:46:25Z

src/quick-lint-js/fe/parse-statement.cpp

+          this->diag_reporter_->report(Diag_Mismatched_Curly_Indentation{
+              .left_curly = left_curly.span(),
+              .right_curly = this->peek().span(),
+          });


Must fix: I am very skeptical of this diagnostic. Wonky indentation is not necessarily a problem. For example, quick-lint-js should not complain about the following code:

if (x) { y(); } else { z(); }

I removed this diagnostic completely in this new set of patches. I replaced it with Diag_Unclosed_Code_Block_V2, which I think matches your intention in #282 much more closely.

It will not report any warnings for the snippet of code above. But it will report an error for the snippet of code in #282 as well as others obviously. See the tests for more examples.

Note, this new diagnostic does not catch something like this, where more code trails an expected but missing closing curly:

function f() { z; if (false) { z; if (1) { z; } if (true) { } }

It will report that the curly brace after if (false) is missing its matching curly. However, it will suggest the matching rightly curly go at the end of the file instead of before if (true).

strager · 2023-12-31T22:52:32Z

src/quick-lint-js/fe/parse-statement.cpp

+    if (!allow_several_statements &&
+        this->peek().indent_level > keyword_token.indent_level) {


I'm surprised that the logic for this diagnostic is so simple! 👍

It's a bit more complicated now to allow a snippet like this, where else is more indented than if.

if (x) y(); else z();

strager · 2023-12-31T22:59:10Z

src/quick-lint-js/fe/parse-statement.cpp

@@ -4468,6 +4482,14 @@ void Parser::parse_and_visit_if(Parse_Visitor_Base &v) {

 parse_maybe_else:
  if (this->peek().type == Token_Type::kw_else) {
+    Token else_token = this->peek();
+    if (this->peek().indent_level != if_token.indent_level) {


Must fix: I think this logic is too strict. The following code is fine, for example, but your code would report annoying diagnostics:

if (x) { y(); } else { z(); } if (x) y(); else z();

I think we should only trigger a warning about else if there's an nested if prior (or something). We shouldn't warn about all elses.

test/test-parse-function.cpp

strager · 2023-12-31T23:06:47Z

src/quick-lint-js/diag/diagnostic-types-2.h

+  [[qljs::diag("E0443", Diagnostic_Severity::warning)]]  //
+  [[qljs::message("misleading indentation after 'if' body",
+                  ARG(if_or_else_span))]]  //
+  Source_Code_Span if_or_else_span;


I think this is the wrong spot. We should point at the following statement or at the indentation before the following statement. The location of the if or else is supplementary information.

if (cond) f(); g(); //^^^^ warning: misleading indentation after 'if' body

src/quick-lint-js/diag/diagnostic-types-2.h

test/test-parse-statement.cpp

strager · 2023-12-31T23:10:15Z

test/test-parse-statement.cpp

+        u8"if(x)\n\tif (y)\n\t\tz();\nelse\n\tw();"_sv,  //
+        u8"         ^^ Diag_Misleading_Braceless_If_Else_Indentation.if_span\n"_diag  //
+        u8"                           ^^^^ .else_span"_diag);


This looks good, though I find \t a bit hard to read. Perhaps two spaces is a good substitute:

Suggested change

u8"if(x)\n\tif (y)\n\t\tz();\nelse\n\tw();"_sv, //

u8" ^^ Diag_Misleading_Braceless_If_Else_Indentation.if_span\n"_diag //

u8" ^^^^ .else_span"_diag);

u8"if(x)\n if (y)\n z();\nelse\n w();"_sv, //

u8" ^^ Diag_Misleading_Braceless_If_Else_Indentation.if_span\n"_diag //

u8" ^^^^ .else_span"_diag);

Eh, maybe not. We don't have great testing for diagnostics with multi-line code samples...

I kept \t for now. I agree it's difficult to read, but I don't find spaces easier. For now, I added an unraveled version of the string above each test.

arieldon · 2024-01-03T16:09:06Z

I addressed your "must fix" comments. The diagnostics are more flexible now. They do not necessarily report wonky indentation, but they still use indentation to try to track where to place a closing right curly brace. They also still report misleading indentation for if and else statements.

I have not tried your suggestion about changing the way we track indentation in Lexer yet. Diag_Misleading_If_Or_Else_Body_Indentation also still reports the if and not the actual misleading indentation.

strager

I think you should focus on only one diagnostic for now (probably either Diag_Unclosed_Code_Block or Diag_Misleading_If_Or_Else_Body_Indentation). You can add the other diagnostics later in a separate pull request.

Must fix: The following code reports a false positive ("indentation of 'if' does not match 'else' [E0442]"):

if (true) {
  if (true)
    console.log();
//      console.log();
  else
    console.log();
}

I think the comment is significant. Deleting the comment fixes the false positive. (Indentation level of the comment doesn't matter.)

Similarly, there's a false negative in the following code (expected "misleading indentation after 'if' body [E0443]", but got nothing):

if (true) {
  if (true)
    console.log();
  //      console.log();
  //else
    console.log();
}

strager · 2024-01-04T03:32:19Z

src/quick-lint-js/diag/diagnostic-types-2.h

+  [[qljs::message("indentation of '{0}' does not match '{1}'", ARG(if_span),
+                  ARG(else_span))]]  //
+  [[qljs::message("indentation of '{0}' does not match '{1}'", ARG(else_span),
+                  ARG(if_span))]]  //


To the user, this will look like two different diagnostics.

Nit: I would inline 'else' and 'if' to make the messages more legible to maintainers.

I also think that it's more likely that else is misindented than if, so we should point to the else first.

I think this reads better:

Suggested change

[[qljs::message("indentation of '{0}' does not match '{1}'", ARG(if_span),

ARG(else_span))]] //

[[qljs::message("indentation of '{0}' does not match '{1}'", ARG(else_span),

ARG(if_span))]] //

[[qljs::message("indentation of 'else' does not match indentation of 'if'", ARG(else_span))]] //

[[qljs::message("matching 'if' here", ARG(if_span))]] //

strager · 2024-01-04T21:47:29Z

src/quick-lint-js/fe/parse-statement.cpp

+        }
+        this->diag_reporter_->report(Diag_Unclosed_Code_Block_V2{
+            .block_open = block_open.span(),
+            .expected_block_close = Source_Code_Span::unit(block_close.begin),


I don't like the inaccuracy of expected_block_close in common situations such as the following:

function f() { if (true) { // <-- matching '{' here [E0441] body(); foo(); } // <-- missing '}' [E0441]

I want quick-lint-js to not mislead.

I think we should omit it and only mention the opening {. In other words, lets use Diag_Unclosed_Code_Block until we have a smart enough algorithm to point to the correct place for }.

strager · 2024-01-04T21:56:28Z

src/quick-lint-js/fe/parse-statement.cpp

-            .block_open = left_curly_span,
+      case Token_Type::end_of_file: {
+        Token block_open, block_close;
+        if (this->mismatched_curly_braces_.has_value()) {


Must fix: Because we don't reset mismatched_curly_braces_, we might end up with duplicate diagnostics. For example:

function a() { function f() { if (true) { // <-- matching '{' here [E0441]; matching '{' here [E0441] body(); foo(); } /* sandbox/hello3.mjs:6:1: error: missing '}' [E0441] sandbox/hello3.mjs:3:15: note: matching '{' here [E0441] sandbox/hello3.mjs:6:1: error: missing '}' [E0441] sandbox/hello3.mjs:3:15: note: matching '{' here [E0441] */

Adding this->mismatched_curly_braces_.reset(); after reporting the diagnostic fixes this issue somewhat:

function a() { // <-- matching '{' here [E0441] function f() { if (true) { // <-- matching '{' here [E0441] body(); foo(); } /* sandbox/hello3.mjs:6:1: error: missing '}' [E0441] sandbox/hello3.mjs:3:15: note: matching '{' here [E0441] sandbox/hello3.mjs:7:1: error: missing '}' [E0441] sandbox/hello3.mjs:1:14: note: matching '{' here [E0441] */

I think a better solution is to maintain a stack of curlies instead of either zero or one pairs of curlies.

strager · 2024-01-04T22:20:00Z

src/quick-lint-js/fe/parse-statement.cpp

+  bool previous_in_if_statement = this->in_if_statement_;
+  this->in_if_statement_ = true;


Must fix: Use Bool_Guard. (It's a bit of boilerplate, but the safety is worth it.) See Switch_Guard for example usage.

strager · 2024-01-04T22:35:22Z

src/quick-lint-js/fe/parse-statement.cpp

+        this->peek().indent_level > keyword_token.indent_level) {
+      this->diag_reporter_->report(Diag_Misleading_If_Or_Else_Body_Indentation{
+          .if_or_else_span = keyword_token.span()});
+    }


This code reports diagnostics for the following code:

// misleading indentation after 'if' body [E0443] if (true) console.log(); console.log(); function f() { if (true) console.log(); // misleading indentation after 'if' body [E0443] else console.log() console.log(); }

Was this intentional? If so, I think it deserves a test to codify the behavior.

strager · 2024-01-04T22:46:17Z

src/quick-lint-js/fe/parse-statement.cpp

@@ -4468,6 +4496,15 @@ void Parser::parse_and_visit_if(Parse_Visitor_Base &v) {

 parse_maybe_else:
  if (this->peek().type == Token_Type::kw_else) {
+    Token else_token = this->peek();
+    if (previous_in_if_statement &&


Must fix: This detection of whether we're in a braceless if or not is faulty. For example, the following code triggers the diagnostic, but if you remove the outer if (first and last lines) there is no diagnostic:

if (true) { // indentation of 'if' does not match 'else' [E0442] if (true) {} else console.log() }

Another example of faulty if detection:

if (true) { a.map((o) => { if (true) return 1; else if (true) return 2; else return 3; }); }

strager · 2024-01-04T22:47:51Z

test/test-parse-statement.cpp

+    // if (x)
+    //   if (y)
+    //     z();
+    //   else
+    //     w();
+    test_parse_and_visit_statement(
+        u8"if(x)\n\tif (y)\n\t\tz();\n\telse\n\t\tw();"_sv, no_diags);


For the no_diag tests, we can split the string onto multiple source lines:

Suggested change

// if (x)

// if (y)

// z();

// else

// w();

test_parse_and_visit_statement(

u8"if(x)\n\tif (y)\n\t\tz();\n\telse\n\t\tw();"_sv, no_diags);

test_parse_and_visit_statement(

u8"if(x)\n"_sv

u8"\tif (y)\n"_sv

u8"\t\tz();\n"_sv

u8"\telse\n"_sv

u8"\t\tw();"_sv,

no_diags);

arieldon · 2024-01-08T19:45:26Z

I'm not able to figure out a heuristic and an implementation for it that doesn't feel complex and hacky. I'm going to close this PR for now. Thank you for your suggestions and help nonetheless.

arieldon changed the title ~~Track Whitespace~~ feat: track whitespace Dec 26, 2023

strager requested changes Dec 31, 2023

View reviewed changes

arieldon added 4 commits January 3, 2024 09:33

feat(fe): track indentation within type Token

accfe20

feat(fe): report unclosed code block based on indentation

25717c1

feat(fe): warn about mismatched indentation of if/else

07779d6

feat(fe): warn about misleading indentation after braceless if/else

055fcbf

arieldon force-pushed the track_whitespace branch from 0f41268 to 055fcbf Compare January 3, 2024 15:34

arieldon requested a review from strager January 3, 2024 16:10

strager requested changes Jan 4, 2024

View reviewed changes

arieldon closed this Jan 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: track whitespace #1135

feat: track whitespace #1135

arieldon commented Dec 26, 2023

strager left a comment

strager Dec 31, 2023 •

edited

Loading

arieldon Jan 3, 2024

strager Dec 31, 2023

arieldon Jan 3, 2024

strager Dec 31, 2023

arieldon Jan 3, 2024

strager Dec 31, 2023

strager Dec 31, 2023

strager Dec 31, 2023

arieldon Jan 3, 2024

arieldon commented Jan 3, 2024

strager left a comment

strager Jan 4, 2024

strager Jan 4, 2024

strager Jan 4, 2024

strager Jan 4, 2024

strager Jan 4, 2024

strager Jan 4, 2024

strager Jan 4, 2024

arieldon commented Jan 8, 2024

		if (!allow_several_statements &&
		this->peek().indent_level > keyword_token.indent_level) {

		bool previous_in_if_statement = this->in_if_statement_;
		this->in_if_statement_ = true;

feat: track whitespace #1135

feat: track whitespace #1135

Conversation

arieldon commented Dec 26, 2023

strager left a comment

Choose a reason for hiding this comment

strager Dec 31, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arieldon commented Jan 3, 2024

strager left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arieldon commented Jan 8, 2024

strager Dec 31, 2023 •

edited

Loading