This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

semantic: string literal escape sequence handing #111

Closed
bzz opened this issue Mar 21, 2019 · 3 comments · Fixed by #115

@bzz
Contributor

bzz commented Mar 21, 2019

Discovered as part of bblfsh/bblfshd#268 (comment).

The driver fails to parse, e.g., this file, with the error: check: key "escapedValue": invalid syntax ("\0")

Normalization mapping for semantic mode needs to be updated.

@bzz bzz added the bug label Mar 21, 2019
@bzz bzz added this to the v2.6.2 milestone Mar 21, 2019
@bzz
Contributor Author

bzz commented Mar 21, 2019

There are 57 similar-looking invalid syntax cases in total in the bblfsh/bblfshd#268 logs.

 grep "language=java$" bblfshd.log | grep "error" | grep -c "invalid syntax"
57

All of them need to be double-checked and taken care of under this issue, as they most probably share the same root cause (string decoding, similar to bblfsh/javascript-driver#62).

@bzz
Contributor Author

bzz commented Apr 14, 2019

Java, in comparison to other languages, seems to have a well-defined and limited number of special escapes for character and string literals.

So only the following cases seem to differ from Go:

  • single-digit octal escape \0, which Java has for C compatibility (but Go does not)
  • two-digit octal escape \01 (Go always requires exactly three octal digits)
  • there are no hexadecimal escape sequences in Java
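The differences listed above can be checked against Go's standard library. The following is a minimal sketch, assuming the Go side decodes literals with something like `strconv.Unquote` (the thread does not say which call the driver actually uses); `tryUnquote` is a hypothetical helper introduced only for this illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// tryUnquote is a hypothetical helper: it reports whether Go's
// strconv.Unquote accepts the given double-quoted literal, and
// returns the decoded value when it does.
func tryUnquote(lit string) (string, bool) {
	s, err := strconv.Unquote(lit)
	return s, err == nil
}

func main() {
	// Raw string literals keep the backslashes exactly as they would
	// appear in the source file being parsed.
	for _, lit := range []string{
		`"\0"`,   // valid in Java, rejected by Go (needs 3 octal digits)
		`"\01"`,  // valid in Java, rejected by Go (needs 3 octal digits)
		`"\001"`, // valid in both
		`"\x41"`, // hexadecimal escape: valid in Go, not in Java
	} {
		s, ok := tryUnquote(lit)
		fmt.Printf("%-8s ok=%-5v value=%q\n", lit, ok, s)
	}
}
```

When `strconv.Unquote` rejects a literal it returns `strconv.ErrSyntax`, whose message is "invalid syntax", which matches the wording of the error reported in this issue.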

(Does not matter, but interesting fact: 3-digit octal in Java can only start with 0 to 3 🤷‍♂️)

All this means that a simple solution may actually be plausible here.
But using a different AST node, as in bblfsh/javascript-driver#81 (comment), will be investigated as well.
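One way such a simple solution might look is sketched below, under stated assumptions: the literal arrives on the Go side as a double-quoted Java string, and the only incompatibility to repair is the short octal forms listed above. `padJavaOctal` and `unquoteJava` are hypothetical names, not functions from the driver, and any other difference between the two languages would need separate handling.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

func isOctal(c byte) bool { return c >= '0' && c <= '7' }

// padJavaOctal (hypothetical) rewrites Java octal escapes of one or two
// digits (\0, \01) into the three-digit form Go requires (\000, \001).
// All other escapes are copied through unchanged.
func padJavaOctal(lit string) string {
	var b strings.Builder
	for i := 0; i < len(lit); i++ {
		c := lit[i]
		if c != '\\' || i+1 >= len(lit) {
			b.WriteByte(c)
			continue
		}
		next := lit[i+1]
		if !isOctal(next) {
			// Copy the backslash together with the escaped character,
			// so that `\\0` is not mistaken for an octal escape.
			b.WriteByte(c)
			b.WriteByte(next)
			i++
			continue
		}
		// Java allows three digits only when the first one is 0-3.
		limit := 2
		if next <= '3' {
			limit = 3
		}
		j := i + 1
		for j < len(lit) && j-i-1 < limit && isOctal(lit[j]) {
			j++
		}
		digits := lit[i+1 : j]
		b.WriteByte('\\')
		b.WriteString(strings.Repeat("0", 3-len(digits)))
		b.WriteString(digits)
		i = j - 1
	}
	return b.String()
}

// unquoteJava (hypothetical) decodes a Java string literal by padding
// its octal escapes and then delegating to strconv.Unquote.
func unquoteJava(lit string) (string, error) {
	return strconv.Unquote(padJavaOctal(lit))
}

func main() {
	for _, lit := range []string{`"\0"`, `"\01"`, `"a\\0"`} {
		s, err := unquoteJava(lit)
		fmt.Printf("%s -> %q (err: %v)\n", lit, s, err)
	}
}
```

The scan is done by hand rather than with a regular expression so that an escaped backslash followed by a digit is left alone.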

@creachadair
Contributor

> (Does not matter, but interesting fact: 3-digit octal in Java can only start with 0 to 3 🤷‍♂️)

That's because Java defines "byte" as an octet. In C, where a char may or may not be an octet, the leading digit is allowed to have the full range (but, of course, the behaviour is implementation-defined if you write a sequence that is out of range for the concrete type).

@bzz bzz changed the title check: key "escapedValue": invalid syntax ("\0")" semantic: string literal escape sequence handing Apr 15, 2019
bzz added a commit to bzz/java-driver that referenced this issue Apr 16, 2019
Now native Java AST contains both
 - escapedValue
 - unescapedValue
obtained from JDT parser.

That is similar to what the JavaScript driver does and
avoids language-specific escape sequence handling
on the Go side, where the rules can differ from
Java's.

See bblfsh#111 (comment)
for details.

Signed-off-by: Alexander Bezzubov <[email protected]>
@bzz bzz closed this as completed in #115 Apr 17, 2019