Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retaining significant whitespace within MacroExpressions within MacroVerbatim #15307

Open
Blacksmoke16 opened this issue Dec 21, 2024 · 2 comments · May be fixed by #15305
Open

Retaining significant whitespace within MacroExpressions within MacroVerbatim #15307

Blacksmoke16 opened this issue Dec 21, 2024 · 2 comments · May be fixed by #15305

Comments

@Blacksmoke16
Copy link
Member

Blacksmoke16 commented Dec 21, 2024

Discussion

Background

Currently if you have code like:

{%
  10

  # Foo

  20
  30

  # Bar

  40
%}
{%
  50
  60
%}

The parser is able to properly attribute location information to each number, taking into account the newlines and comments:

{value: "10", line_number: 2}
{value: "20", line_number: 6}
{value: "30", line_number: 7}
{value: "40", line_number: 11}
{value: "50", line_number: 14}
{value: "60", line_number: 15}

However, when used as part of a MacroVerbatim node like:

macro finished
  {% verbatim do %}
    {%
      10

      # Foo

      20
      30

      # Bar

      40
    %}
    {%
      50
      60
    %}
  {% end %}
end

NOTE: The macro finished isn't really needed to reproduce this problem, but is included for demonstration purposes to make generating the output easier.

The results are now not totally accurate:

{value: "10", line_number: 2}
{value: "20", line_number: 3}
{value: "30", line_number: 4}
{value: "40", line_number: 5}
{value: "50", line_number: 7}
{value: "60", line_number: 8}

This is because the line numbers for each node are now based on the to_s representation of the MacroExpressions within the MacroVerbatim versus the actual source code. E.g.

# There's a newline here
    {% 10
20
30
40
 %}
    {% 50
60
 %}

The Problem

#15305 makes things a little bit more accurate by retaining the newline after the {% is present. E.g.

{value: "10", line_number: 3}
{value: "20", line_number: 4}
{value: "30", line_number: 5}
{value: "40", line_number: 6}
{value: "50", line_number: 9}
{value: "60", line_number: 10}
# There's a newline here
    {%
  10
  20
  30
  40
%}
    {%
  50
  60
%}

However things are still not quite 100% correct. We can make things a bit better by summing the node's location with the location of the MacroVerbatim node itself (removing the 0 || on like 592 in interpreter.cr in the appendix diff):

{value: "10", line_number: 4}
{value: "20", line_number: 5}
{value: "30", line_number: 6}
{value: "40", line_number: 7}
{value: "50", line_number: 10}
{value: "60", line_number: 11}

The number 10 now at least has the proper location, but all the others are still not correct. The gist of why the other numbers are off is because notice in the stringified output, all the whitespace and comments in the macro expression have been stripped. Thus they are not taken into consideration when the code is re-parsed as part of the macro expansion logic.

Not having the nodes have the proper location information in this context makes #14880 much harder as it relies on having proper line numbers to map to the source code when generating the coverage report.

Proposed Solution

So far the most robust solution I can think of is to have the parser include MacroLiteral nodes when parsing macro expressions that represent comments and extra newlines. This would ensure that the stringified macro expression includes these newlines so the re-parsed code is able to map to the line numbers of the source.

WDYT?

EDIT: Is this maybe something we could leverage location pragmas for?

Appendix

Diff used to generate the outputs
diff --git a/src/compiler/crystal/macros/interpreter.cr b/src/compiler/crystal/macros/interpreter.cr
index 978c57470..d7541a8af 100644
--- a/src/compiler/crystal/macros/interpreter.cr
+++ b/src/compiler/crystal/macros/interpreter.cr
@@ -82,6 +82,10 @@ module Crystal
       @last = Nop.new
     end
 
+    def is_test_file?
+      @location.try &.original_filename.as?(String).try &.ends_with? "test.cr"
+    end
+
     def define_var(name : String, value : ASTNode) : Nil
       @vars[name] = value
     end
@@ -584,6 +588,10 @@ module Crystal
     end
 
     def visit(node : Nop | NilLiteral | BoolLiteral | NumberLiteral | CharLiteral | StringLiteral | SymbolLiteral | RangeLiteral | RegexLiteral | MacroId | TypeNode | Def)
+      if self.is_test_file?
+        pp({value: node.to_s, line_number: (node.location.try(&.line_number) || 0) + (0 || node.location.try(&.macro_location.try(&.line_number)) || 0)})
+      end
+
       @last = node.clone_without_location
       false
     end
diff --git a/src/compiler/crystal/macros/macros.cr b/src/compiler/crystal/macros/macros.cr
index a5d5714f1..1b68e3b53 100644
--- a/src/compiler/crystal/macros/macros.cr
+++ b/src/compiler/crystal/macros/macros.cr
@@ -27,7 +27,18 @@ class Crystal::Program
     interpreter = MacroInterpreter.new self, scope, path_lookup || scope, node.location, def: a_def, in_macro: false
     interpreter.free_vars = free_vars
     node.accept interpreter
-    {interpreter.to_s, interpreter.macro_expansion_pragmas}
+    output_str = interpreter.to_s
+
+    if interpreter.is_test_file?
+      puts
+      puts
+
+      puts output_str
+
+      puts
+      puts
+    end
+    {output_str, interpreter.macro_expansion_pragmas}
   end
 
   def parse_macro_source(generated_source, macro_expansion_pragmas, the_macro, node, vars, current_def = nil, inside_type = false, inside_exp = false, mode : Parser::ParseMode = :normal, visibility : Visibility = :public)
@straight-shoota
Copy link
Member

I'm wondering if there could be a rather trivial solution by making verbatim retain the original source code as a string instead of parsing the content into an AST and then stringifing that again.

@Blacksmoke16
Copy link
Member Author

Blacksmoke16 commented Dec 28, 2024

@straight-shoota Hmm that would work, but not be 100% robust mainly these specs would still fail due to them not being in a verbatim block:

expect_to_s "{%\n  1\n  2\n  3\n%}"
expect_to_s "{%\n  1\n%}"
expect_to_s "{%\n  2 + 2\n%}"
expect_to_s "{%\n  a = 1 %}", "{%\n  a = 1\n%}"

So there would still need to be some tweaks to ToSVisitor. At which point, I'm not sure if it makes sense to have special parsing logic just to handle verbatim, while still needing to have everything else in ToSVisitor anyway for when these nodes are outside of verbatim.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants