Add support for boxes and templates #22

shigma · 2018-11-04T07:33:44Z

Support built-in characters and character encoding in strings.

  "\[alpha]"
(* ^^^^^^^^ constant.character.built-in.wolfram *)

  "\:123456"
(* ^^^^^^ constant.character.encoding.wolfram *)

No longer treat a lone `\` as an escape; special sequences like `\n` are still recognized as escapes as usual.

  "This is\n\a string. (* not a comment *)"
(*^ punctuation.definition.string.begin *)
(*        ^^ constant.character.escape *)
(*          ^^ string.quoted *)

Support string templates.

  StringTemplate["Value `a`: <* Range[#n] *>."][<|"a" -> 1234, "n" -> 3|>]
(*                      ^^^ variable.parameter *)
(*                           ^^ keyword.operator.template-expression *)

Support string representation of boxes.

  "box 1: \!\(x\^2\); box 2: \(y\^3\) "
(*        ^ keyword.operator.string-box *)
(*             ^^ keyword.operator.x-scriptBox *)
(*                           ^^^^^^^^ string.quoted *)

  \( \)
(*^^ punctuation.section.box.begin.wolfram *)
(*^^^^^ meta.box.wolfram *)
(*   ^^ punctuation.section.box.end.wolfram *)

batracos · 2018-11-04T19:32:39Z

WolframLanguage.sublime-syntax

@@ -105,14 +104,38 @@ contexts:
        - match: \"


I would move this line to the end to avoid popping when an escaped " occurs.

If the escape characters include ", there is no need to change anything here.

There are still some issues though. Compare

"\!\(\*FractionBox[1, 2]\)" "\!\(\*FractionBox[\"1\", 2]\)"

This sounds like a difficult problem to me. I haven't worked out a solution.

Perhaps with_prototype would work, but every time I used this key Sublime crashed.

It's very much a corner case. I suggest opening an issue to track it and not spending too much time on it now.

chere005 · 2018-11-08T19:19:34Z

At first glance it looked like the escape \ fix was correct, but now I'm not so sure:

chere005 · 2018-11-08T19:24:28Z

I'm not sure I see a difference in:

  "box 1: \!\(x\^2\); box 2: \(y\^3\) "
(*        ^ keyword.operator.string-box *)
(*             ^^ keyword.operator.x-scriptBox *)
(*                           ^^^^^^^^ string.quoted *)

  \( \)
(*^^ punctuation.section.box.begin.wolfram *)
(*^^^^^ meta.box.wolfram *)
(*   ^^ punctuation.section.box.end.wolfram *)

Should I?

chere005 · 2018-11-09T04:28:53Z

This breaks strings like "FOO`BAR`BAZ`" where this represents a context, which is a big regression that must be fixed before this is merged

shigma · 2018-11-09T06:02:32Z

> This breaks strings like "FOO`BAR`BAZ`" where this represents a context, which is a big regression that must be fixed before this is merged

Emmm, it does. How do you plan to solve it?

shigma · 2018-11-09T06:04:05Z

> I'm not sure I see a difference.

What do you mean by "difference"?

shigma · 2018-11-09T06:07:08Z

> At first glance it looked like the escape \ fix was correct, but now I'm not so sure:

It looks good to me.

chere005 · 2018-11-09T06:18:50Z

> > I'm not sure I see a difference.
>
> What do you mean by "difference"?

When I pasted these code snippets and applied your changes, the highlighting didn't change at all.

chere005 · 2018-11-09T06:24:44Z

> > This breaks strings like "FOO`BAR`BAZ`" where this represents a context, which is a big regression that must be fixed before this is merged
>
> Emmm, it does. How do you plan to solve it?

I'm not sure, this is a much bigger issue than not supporting StringTemplate. I discussed this with @batracos at one point back in May actually and we couldn't find a good solution. The closest thing I came up with was something along these lines for L140:

(?<=[^a-zA-Z0-9$]|[\"\`])\`\w*\`|\`\w*\`(?=[^a-zA-Z0-9$"])

It seems to work but @batracos didn't like it. @shigma and @batracos can we improve this, use it, or come up with something better? If not, we should hold off on StringExpression backtick support
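
A quick way to see what the lookaround pattern above accepts is to run it outside Sublime. The sketch below uses Python's `re` module as a stand-in for Sublime's regex engine (an assumption; the two engines differ in places), and `find_slots` is a hypothetical helper name, not something from this repository.

```python
import re

# The pattern from the comment above; the backslashes before " and ` are
# dropped because both are literal characters either way.
SLOT = re.compile(
    r'(?<=[^a-zA-Z0-9$]|["`])`\w*`'   # slot preceded by a non-symbol character
    r'|`\w*`(?=[^a-zA-Z0-9$"])'       # or followed by one (but not a closing quote)
)

def find_slots(text):
    """Return every substring the pattern would treat as a template slot."""
    return SLOT.findall(text)

print(find_slots('"Value `a`: done"'))  # slot set off by a space and punctuation
print(find_slots('"FOO`BAR`BAZ`"'))     # context path
```

With these inputs the spaced slot is found and the context path yields no match, which is the trade-off discussed here: contexts are never miscoloured, but a slot placed directly between letters is missed.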

chere005 · 2018-11-09T06:25:55Z

> > At first glance it looked like the escape \ fix was correct, but now I'm not so sure:
>
> It looks good to me.

Clearly a backslash on its own isn't supported, judging by the second output, and when Mathematica doesn't complain it's a bug. We should always highlight the character after \ in a string to indicate that an escape is happening; this is crucial for visually flagging bugs.

shigma · 2018-11-09T07:39:40Z

> > > I'm not sure I see a difference.
> >
> > What do you mean by "difference"?
>
> When I pasted these code snippets and applied your changes, the highlighting didn't change at all.

That's weird. The syntax rules before this PR should not have been able to handle this syntax. Can you tell me which commit you tested?

shigma · 2018-11-09T08:04:03Z

> I'm not sure, this is a much bigger issue than not supporting StringTemplate. I discussed this with @batracos at one point back in May actually and we couldn't find a good solution. The closest thing I came up with was something along these lines for L140:
>
> (?<=[^a-zA-Z0-9$]|[\"\`])\`\w*\`|\`\w*\`(?=[^a-zA-Z0-9$"])
>
> It seems to work but @batracos didn't like it. @shigma and @batracos can we improve this, use it, or come up with something better? If not, we should hold off on StringExpression backtick support

I'm not sure I get the point of (?<=[^a-zA-Z0-9$]|[\"\`]), but the thing is that slots in a StringTemplate can be placed directly between letters:

In[11]:= StringTemplate["FOO`BAR`BAZ`"][<|"BAR" -> "-"|>]
Out[11]= "FOO-BAZ`"

The Python syntax didn't work out a solution for string templates either:

str = 'I just want a {plain} brace'
                   # ^^^^^^^ constant.other.placeholder.python

Nothing is being formatted here, yet it is highlighted as if it were.

If you really want to distinguish these behaviours, maybe we can match on the function outside the string, for example by treating strings inside a Begin call as "plain" to some extent. I think giving different functions different inner contexts is a good solution to all problems of this kind, and functions like Block already use this approach.

I'm interested in this, but I suggest not doing it in this PR and opening a new one instead, because this PR already does too many things.
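
To make the `In[11]` example above concrete, here is a minimal Python emulation of backtick slot filling. It is only a sketch of the observed input and output, not how StringTemplate is implemented; `fill_slots` is a hypothetical helper, and leaving unknown slots untouched is an assumption of this sketch.

```python
import re

def fill_slots(template, values):
    """Replace each `name` slot with its value (unknown slots are left as-is)."""
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return re.sub(r'`(\w+)`', substitute, template)

print(fill_slots('FOO`BAR`BAZ`', {'BAR': '-'}))
```

The same string is a perfectly ordinary context path when passed to `Begin`, so the string contents alone cannot tell a highlighter which reading is intended.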

shigma · 2018-11-09T08:18:08Z

> Clearly a backslash on its own isn't supported, judging by the second output, and when Mathematica doesn't complain it's a bug. We should always highlight the character after \ in a string to indicate that an escape is happening; this is crucial for visually flagging bugs.

Mathematica is not that strict:

In[21]:= "\a" // InputForm
(* Syntax::stresc: Unknown string escape \a. *)
Out[21]//InputForm= "\\a"

In[31]:= "\a"
Out[31]= "\\a"

Should our program be that strict? I just don't know.

If yes, maybe we can preserve the current rule and append the following rule:

match: \\[\s\S]
scope: invalid.character.escape.wolfram

chere005 · 2018-11-09T20:33:54Z

> > Clearly a backslash on its own isn't supported, judging by the second output, and when Mathematica doesn't complain it's a bug. We should always highlight the character after \ in a string to indicate that an escape is happening; this is crucial for visually flagging bugs.
>
> Mathematica is not that strict:
>
> In[21]:= "\a" // InputForm
> (* Syntax::stresc: Unknown string escape \a. *)
> Out[21]//InputForm= "\\a"
>
> In[31]:= "\a"
> Out[31]= "\\a"
>
> Should our program be that strict? I just don't know.
>
> If yes, maybe we can preserve the current rule and append the following rule:
>
> match: \\[\s\S]
> scope: invalid.character.escape.wolfram

I think you missed my point. Mathematica IS that strict. If the message isn't appearing, it is a bug that it isn't appearing. That syntax is never actually valid, and we must indicate that \ followed by any char means it's an attempted escape. I'm fine with, and even encourage, giving these different colors (valid vs. invalid escapes), but \ on its own should never be left unformatted.

chere005 · 2018-11-09T20:36:22Z

> > I'm not sure, this is a much bigger issue than not supporting StringTemplate. I discussed this with @batracos at one point back in May actually and we couldn't find a good solution. The closest thing I came up with was something along these lines for L140:
> >
> > (?<=[^a-zA-Z0-9$]|[\"\`])\`\w*\`|\`\w*\`(?=[^a-zA-Z0-9$"])
> >
> > It seems to work but @batracos didn't like it. @shigma and @batracos can we improve this, use it, or come up with something better? If not, we should hold off on StringExpression backtick support
>
> I'm not sure I get the point of (?<=[^a-zA-Z0-9$]|[\"\`]), but the thing is that slots in a StringTemplate can be placed directly between letters:
>
> In[11]:= StringTemplate["FOO`BAR`BAZ`"][<|"BAR" -> "-"|>]
> Out[11]= "FOO-BAZ`"
>
> The Python syntax didn't work out a solution for string templates either:
>
> str = 'I just want a {plain} brace'
>                    # ^^^^^^^ constant.other.placeholder.python
>
> Nothing is being formatted here, yet it is highlighted as if it were.
>
> If you really want to distinguish these behaviours, maybe we can match on the function outside the string, for example by treating strings inside a Begin call as "plain" to some extent. I think giving different functions different inner contexts is a good solution to all problems of this kind, and functions like Block already use this approach.
>
> I'm interested in this, but I suggest not doing it in this PR and opening a new one instead, because this PR already does too many things.

Contexts show up in many other strings. We absolutely cannot break this for the sake of StringTemplate strings. All strings can contain normal backticks. Only in the case of StringTemplate are the backticks special. This means we should assume by default that a backtick in a string is a normal backtick, unless we can properly distinguish the two cases. I was suggesting a hacky solution so that StringTemplate would sometimes be right while contexts would always be right, but I'm less convinced by this now. I 100% think we should not support StringTemplate backtick syntax; please remove it from this PR unless there is a much, much better solution.

shigma · 2018-11-10T04:21:56Z

> Contexts show up in many other strings. We absolutely cannot break this for the sake of StringTemplate strings. All strings can contain normal backticks. Only in the case of StringTemplate are the backticks special. This means we should assume by default that a backtick in a string is a normal backtick, unless we can properly distinguish the two cases. I was suggesting a hacky solution so that StringTemplate would sometimes be right while contexts would always be right, but I'm less convinced by this now. I 100% think we should not support StringTemplate backtick syntax; please remove it from this PR unless there is a much, much better solution.

I don't entirely agree with you, but given that the colorization of string templates is a breaking change, I decided to remove all the related rules from general string recognition and keep them only for some functions (specifically StringTemplate and TemplateApply) and for usage assignments (which automatically use string templates when displayed in a message).

Is this OK?
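
The whitelist idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real change lives in sublime-syntax contexts rather than in a regex scan, usage assignments are not covered, and `TEMPLATE_FUNCTIONS`, `CALL`, `SLOT` and `template_slots` are hypothetical names.

```python
import re

# Functions whose first string argument is treated as a template (from the comment above).
TEMPLATE_FUNCTIONS = ("StringTemplate", "TemplateApply")

# A whitelisted head, an opening bracket, then a double-quoted string
# (escaped characters inside the string are skipped over).
CALL = re.compile(
    r'\b(?:' + '|'.join(TEMPLATE_FUNCTIONS) + r')\[\s*"((?:[^"\\]|\\.)*)"'
)
SLOT = re.compile(r'`\w*`')

def template_slots(source):
    """Collect backtick slots, but only from strings passed to whitelisted functions."""
    slots = []
    for call in CALL.finditer(source):
        slots.extend(SLOT.findall(call.group(1)))
    return slots

code = 'Begin["FOO`BAR`"]; StringTemplate["Value `a`"][<|"a" -> 1|>]'
print(template_slots(code))
```

Only the string inside `StringTemplate[...]` contributes slots; the context string passed to `Begin` is left alone.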

shigma · 2018-11-10T04:35:17Z

> I think you missed my point. Mathematica IS that strict. If the message isn't appearing, it is a bug that it isn't appearing. That syntax is never actually valid, and we must indicate that \ followed by any char means it's an attempted escape. I'm fine with, and even encourage, giving these different colors (valid vs. invalid escapes), but \ on its own should never be left unformatted.

I see. Hope the following rules are satisfactory:

Sublime-WolframLanguage/WolframLanguage.sublime-syntax

Lines 137 to 161 in 4fa3e82

    # escape characters
    - match: \\[-"nrtbf()!^%&+_*@`/\\]
      scope: constant.character.escape.wolfram
    - match: |-
        (?x)(
          \\[0-7]{3}|
          \\\.[0-9A-Fa-f]{2}|
          \\:[0-9A-Fa-f]{4}
        )
      scope: constant.character.encoding.wolfram
    - match: \\\[({{named_characters}})\]
      scope: constant.character.built-in.wolfram
    # invalid characters
    - match: |-
        (?x)(
          \\[0-7]{1,2}(?=[^0-7])|
          \\\.[0-9A-Fa-f]?(?=[^0-9A-Fa-f])|
          \\:[0-9A-Fa-f]{0,3}(?=[^0-9A-Fa-f])
        )
      scope: invalid.character.encoding.wolfram
    - match: \\\[\w+\]
      scope: invalid.character.built-in.wolfram
    - match: \\[a-zA-Z]
      scope: invalid.character.escape.wolfram

I found that only [a-zA-Z] other than [nrtbf] results in an error, and that [()"!^%&+_*@`/\\] after a backslash is recognized as a special character. In all other cases, the character after a backslash is treated as an ordinary character:

In[69]:= "\8" // Characters
Out[69]= {"\\", "8"}
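
The rule list quoted in this comment relies on ordering: the first rule that matches wins. A minimal sketch of that behaviour, using Python's `re` as a stand-in for Sublime's engine (an assumption), is shown below. `NAMED_CHARACTERS` is a hypothetical three-name placeholder for the `{{named_characters}}` variable, and `classify` is a hypothetical helper.

```python
import re

# Hypothetical stand-in for the {{named_characters}} variable of the syntax file.
NAMED_CHARACTERS = "Alpha|Beta|Gamma"

# (scope, pattern) pairs in the same order as the quoted rules; first match wins.
RULES = [
    ("constant.character.escape.wolfram",
     r'\\[-"nrtbf()!^%&+_*@`/\\]'),
    ("constant.character.encoding.wolfram",
     r'\\[0-7]{3}|\\\.[0-9A-Fa-f]{2}|\\:[0-9A-Fa-f]{4}'),
    ("constant.character.built-in.wolfram",
     r'\\\[(?:' + NAMED_CHARACTERS + r')\]'),
    ("invalid.character.encoding.wolfram",
     r'\\[0-7]{1,2}(?=[^0-7])'
     r'|\\\.[0-9A-Fa-f]?(?=[^0-9A-Fa-f])'
     r'|\\:[0-9A-Fa-f]{0,3}(?=[^0-9A-Fa-f])'),
    ("invalid.character.built-in.wolfram", r'\\\[\w+\]'),
    ("invalid.character.escape.wolfram", r'\\[a-zA-Z]'),
]
COMPILED = [(scope, re.compile(pattern)) for scope, pattern in RULES]

def classify(text):
    """Return the scope of the first rule matching at the start of text, or None."""
    for scope, pattern in COMPILED:
        if pattern.match(text):
            return scope
    return None

for sample in [r'\n', r'\:03b1', r'\:12 ', r'\[Alpha]', r'\[Foo]', r'\a', r'\8']:
    print(sample, '->', classify(sample))
```

In this sketch `\8` falls through every rule and stays unscoped, which mirrors the `In[69]` observation above.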

chere005 · 2018-11-10T07:27:19Z

> > Contexts show up in many other strings. We absolutely cannot break this for the sake of StringTemplate strings. All strings can contain normal backticks. Only in the case of StringTemplate are the backticks special. This means we should assume by default that a backtick in a string is a normal backtick, unless we can properly distinguish the two cases. I was suggesting a hacky solution so that StringTemplate would sometimes be right while contexts would always be right, but I'm less convinced by this now. I 100% think we should not support StringTemplate backtick syntax; please remove it from this PR unless there is a much, much better solution.
>
> I don't entirely agree with you, but given that the colorization of string templates is a breaking change, I decided to remove all the related rules from general string recognition and keep them only for some functions (specifically StringTemplate and TemplateApply) and for usage assignments (which automatically use string templates when displayed in a message).
>
> Is this OK?

This is a great idea, I'll test it soon.

chere005 · 2018-11-10T07:51:10Z

> > Contexts show up in many other strings. We absolutely cannot break this for the sake of StringTemplate strings. All strings can contain normal backticks. Only in the case of StringTemplate are the backticks special. This means we should assume by default that a backtick in a string is a normal backtick, unless we can properly distinguish the two cases. I was suggesting a hacky solution so that StringTemplate would sometimes be right while contexts would always be right, but I'm less convinced by this now. I 100% think we should not support StringTemplate backtick syntax; please remove it from this PR unless there is a much, much better solution.
>
> I don't entirely agree with you, but given that the colorization of string templates is a breaking change, I decided to remove all the related rules from general string recognition and keep them only for some functions (specifically StringTemplate and TemplateApply) and for usage assignments (which automatically use string templates when displayed in a message).
>
> Is this OK?

I want to bring the idea of checking for spaces around the backticks back to the table. While it won't fix every other StringTemplate, it would fix most of the ones that I use. There would just be some missed instances (when the backticks were next to other characters but it was actually a StringTemplate and not inside the functions we whitelisted, so very rare). I also suggest we add Success and Failure to the list of functions which commonly use StringTemplate and always turn on the syntax highlighting for these.

chere005 · 2018-11-10T07:52:08Z

> > I think you missed my point. Mathematica IS that strict. If the message isn't appearing, it is a bug that it isn't appearing. That syntax is never actually valid, and we must indicate that \ followed by any char means it's an attempted escape. I'm fine with, and even encourage, giving these different colors (valid vs. invalid escapes), but \ on its own should never be left unformatted.
>
> I see. Hope the following rules are satisfactory:
>
> Sublime-WolframLanguage/WolframLanguage.sublime-syntax
>
> Lines 137 to 161 in 4fa3e82
>
>     # escape characters
>     - match: \\[-"nrtbf()!^%&+_*@`/\\]
>       scope: constant.character.escape.wolfram
>     - match: |-
>         (?x)(
>           \\[0-7]{3}|
>           \\\.[0-9A-Fa-f]{2}|
>           \\:[0-9A-Fa-f]{4}
>         )
>       scope: constant.character.encoding.wolfram
>     - match: \\\[({{named_characters}})\]
>       scope: constant.character.built-in.wolfram
>     # invalid characters
>     - match: |-
>         (?x)(
>           \\[0-7]{1,2}(?=[^0-7])|
>           \\\.[0-9A-Fa-f]?(?=[^0-9A-Fa-f])|
>           \\:[0-9A-Fa-f]{0,3}(?=[^0-9A-Fa-f])
>         )
>       scope: invalid.character.encoding.wolfram
>     - match: \\\[\w+\]
>       scope: invalid.character.built-in.wolfram
>     - match: \\[a-zA-Z]
>       scope: invalid.character.escape.wolfram
>
> I found that only [a-zA-Z] other than [nrtbf] results in an error, and that [()"!^%&+_*@`/\\] after a backslash is recognized as a special character. In all other cases, the character after a backslash is treated as an ordinary character:
>
> In[69]:= "\8" // Characters
> Out[69]= {"\\", "8"}

At first glance this seems to work, but it does look like it can be cleaned up. Perhaps another thing to open an issue for, but I'll take another look next week.

chere005 · 2018-11-10T07:54:50Z

To be honest I never work with boxes so I'm just not sure if this is expected:

Current Release:

This PR's latest commit:

chere005 · 2018-11-10T07:59:21Z

It looks a bit to me like some of the escape character rules are leaking here. I wasn't really sure why this was the list of escape characters, in particular ()%+_@/. Could you give some insight into everything past f in the list?

[nrtbf()"!^%&+_*@`/\\]

This is the best resource I've found: https://reference.wolfram.com/language/tutorial/InputSyntax.html (apparently \000 is a thing too).

chere005 · 2018-11-10T08:00:19Z

I tried to respond to most questions tonight, I won't be back on over the weekend. I plan to leave your most recent commits in my local sublime for a few days at work next week before approving so I have time to see if I notice anything with my normal workflows.

shigma · 2018-11-10T08:18:07Z

> To be honest I never work with boxes so I'm just not sure if this is expected:
>
> Current Release:
>
> This PR's latest commit:

Maybe I worked with boxes before and just forgot all about it later on ...

shigma · 2018-11-10T08:19:42Z

> It looks a bit to me like some of the escape character rules are leaking here. I wasn't really sure why this was the list of escape characters, in particular ()%+_@/. Could you give some insight into everything past f in the list?
>
> [nrtbf()"!^%&+_*@`/\\]
>
> This is the best resource I've found: https://reference.wolfram.com/language/tutorial/InputSyntax.html (apparently \000 is a thing too).

See this:

In[153]:= Characters /@ {"\+", "\-", "\>"}
Out[153]= {{"\+"}, {"\\", "-"}, {}}

shigma · 2018-11-10T08:38:48Z

@chere005 The following code may better depict the escaping behavior:

Quiet @ Last @ Reap[
    Scan[
        Sow[#, Check[Length @ Characters @ ToExpression["\"\\" <> # <> "\""], -1]] &,
        CharacterRange[33, 126]
    ],
    _,
    #1 -> StringJoin[#2] &
]

With the following result:

errored: .01234567:ABCDEFGHIJKLMNOPQRSTUVWXYZ[acdeghijklmopqsuvwxyz
escaped: !"%&()*+/@\^_`bfnrt
non-escaped: #$',-89;=?]{|}~
disappeared: <>
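
As a sanity check on the table above, the four categories can be compared against the character class of the escape rule quoted earlier in this thread (the one from commit 4fa3e82). The sketch below is plain Python set arithmetic; all variable names are illustrative.

```python
# Categories reported in the experiment above, copied verbatim.
ERRORED = '.01234567:ABCDEFGHIJKLMNOPQRSTUVWXYZ[acdeghijklmopqsuvwxyz'
ESCAPED = '!"%&()*+/@\\^_`bfnrt'
NON_ESCAPED = "#$',-89;=?]{|}~"
DISAPPEARED = '<>'

# Character class of the escape rule quoted earlier (commit 4fa3e82).
RULE_CLASS = '-"nrtbf()!^%&+_*@`/\\'

printable = {chr(code) for code in range(33, 127)}
categories = [set(ERRORED), set(ESCAPED), set(NON_ESCAPED), set(DISAPPEARED)]

# The four categories should cover every tested character exactly once.
covers_all = set().union(*categories) == printable
disjoint = sum(len(c) for c in categories) == len(printable)

# Differences between the observed escapes and the rule's character class.
missing_from_rule = set(ESCAPED) - set(RULE_CLASS)
extra_in_rule = set(RULE_CLASS) - set(ESCAPED)
print(covers_all, disjoint, missing_from_rule, extra_in_rule)
```

The categories partition all 94 tested characters, and the only discrepancy with that rule is `-`, which the rule treats as an escape while the experiment lists it as non-escaped.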

shigma · 2018-11-15T17:28:29Z

Any questions @batracos ?

batracos · 2019-03-23T16:59:57Z

syntax_test_wolfram_language.wl

-(*      ^^^^^^^^^^^^^^ variable.function*)
-(*                        ^^ keyword.operator*)
+(*^ entity.name.function *)
+(* ^ variable.parameter *)


variable.parameter is not aligned with x_.
The same issue occurs at lines 173 and 177.

batracos · 2019-03-23T17:04:38Z

Sorry for the long time off the project. I simply did not have the bandwidth to spare.
Except for those three tests not passing this seems good to go.

chere005 · 2019-03-23T17:28:16Z

Hold off on this until I can take another look..

chere005 · 2019-03-29T17:05:35Z

> @chere005 The following code may better depict the escaping behavior:
>
> Quiet @ Last @ Reap[
>     Scan[
>         Sow[#, Check[Length @ Characters @ ToExpression["\"\\" <> # <> "\""], -1]] &,
>         CharacterRange[33, 126]
>     ],
>     _,
>     #1 -> StringJoin[#2] &
> ]
>
> With the following result:
>
> * **errored:** `.01234567:ABCDEFGHIJKLMNOPQRSTUVWXYZ[acdeghijklmopqsuvwxyz`
>
> * **escaped:** `` !"%&()*+/@\^_`bfnrt ``
>
> * **non-escaped:** `#$',-89;=?]{|}~`
>
> * **disappeared:** `<>`

@batracos and I discussed this: in a string, the only characters that should show up as valid escape characters are nrtbf. Things like "\*", which don't produce a complaint, end up as the string "\\*", which means the * was not actually escaped. Please make this change and fix the failing tests that @batracos mentioned before I make another pass at reviewing this.

chere005 · 2019-06-27T04:12:57Z

@shigma any plans to update this with the changes we mentioned?

shigma added 3 commits November 4, 2018 15:13

add support for boxes and templates

3efb03a

add some adjustments

3f64690

built-in and encoding characters

2c157e9

shigma requested a review from batracos November 4, 2018 07:34

shigma added the feature label Nov 4, 2018

batracos reviewed Nov 4, 2018

batracos and others added 3 commits November 5, 2018 11:03

optimize encoding pattern

9b19fec

Co-Authored-By: Shigma <[email protected]>

update string syntax

0269833

fix a typo

e185450

Co-Authored-By: Shigma <[email protected]>

batracos reviewed Nov 5, 2018

add some adjustments

cdb5632

chere005 self-requested a review November 8, 2018 19:13

add newline prototype

236141f

remove template recognize and optimize escaping behaviour

4fa3e82

1. Now string templates will only be colored in assignment of usages and inside StringTemplate and TemplateApply. 2. Support 3-octal encoding. 3. Better detection for invalid strings.

tiny fix

255e6ee

shigma added 2 commits November 10, 2018 16:42

optimize escape pattern

5c14b08

support for regular expression

212906c

batracos reviewed Mar 23, 2019

batracos closed this Mar 23, 2019

batracos reopened this Mar 23, 2019

batracos self-requested a review March 23, 2019 17:06
