Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for boxes and templates #22

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

Add support for boxes and templates #22

wants to merge 12 commits into from

Conversation

shigma
Copy link
Collaborator

@shigma shigma commented Nov 4, 2018

Support built-in characters and character encoding in strings.

  "\[alpha]"
(* ^^^^^^^^ constant.character.built-in.wolfram *)

  "\:123456"
(* ^^^^^^ constant.character.encoding.wolfram *)

Not treat \ as escape anymore, but special characters like \n will be recognized as escapes as usual.

  "This is\n\a string. (* not a comment *)"
(*^ punctualation.defination.string.begin *)
(*        ^^ constant.character.escape *)
(*          ^^ string.quoted *)

Support string templates.

  StringTemplate["Value `a`: <* Range[#n] *>."][<|"a" -> 1234, "n" -> 3|>]
(*                      ^^^ variable.parameter *)
(*                           ^^ keyword.operator.template-expression *)

Support string representation of boxes.

  "box 1: \!\(x\^2\); box 2: \(y\^3\) "
(*        ^ keyword.operator.string-box *)
(*             ^^ keyword.operator.x-scriptBox *)
(*                           ^^^^^^^^ string.quoted *)

  \( \)
(*^^ punctuation.section.box.begin.wolfram *)
(*^^^^^ meta.box.wolfram *)
(*   ^^ punctuation.section.box.end.wolfram *)

@shigma shigma requested a review from batracos November 4, 2018 07:34
@shigma shigma added the feature label Nov 4, 2018
WolframLanguage.sublime-syntax Outdated Show resolved Hide resolved
WolframLanguage.sublime-syntax Outdated Show resolved Hide resolved
@@ -105,14 +104,38 @@ contexts:
- match: \"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this line at the and to avoid popping when an escaped " occurs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If escape characters include ", there is no need to change here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still some issues though. Compare

"\!\(\*FractionBox[1, 2]\)"
"\!\(\*FractionBox[\"1\", 2]\)"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds a difficult problem for me. I haven't worked out a solution.

Perhaps with_prototype will work but every time I used this key my sublime crashed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very corner case. I suggest to make an issue for tracking purpose and not lose too much time now.

WolframLanguage.sublime-syntax Outdated Show resolved Hide resolved
WolframLanguage.sublime-syntax Outdated Show resolved Hide resolved
WolframLanguage.sublime-syntax Outdated Show resolved Hide resolved
syntax_test_wolfram_language.wl Outdated Show resolved Hide resolved
syntax_test_wolfram_language.wl Outdated Show resolved Hide resolved
syntax_test_wolfram_language.wl Outdated Show resolved Hide resolved
syntax_test_wolfram_language.wl Outdated Show resolved Hide resolved
syntax_test_wolfram_language.wl Outdated Show resolved Hide resolved
syntax_test_wolfram_language.wl Show resolved Hide resolved
@chere005 chere005 self-requested a review November 8, 2018 19:13
@chere005
Copy link
Collaborator

chere005 commented Nov 8, 2018

At first glance it looked like the escape \ fix was correct, but now I'm not so sure:

screen shot 2018-11-08 at 1 17 59 pm

@chere005
Copy link
Collaborator

chere005 commented Nov 8, 2018

I'm not sure I see a difference in:

  "box 1: \!\(x\^2\); box 2: \(y\^3\) "
(*        ^ keyword.operator.string-box *)
(*             ^^ keyword.operator.x-scriptBox *)
(*                           ^^^^^^^^ string.quoted *)

  \( \)
(*^^ punctuation.section.box.begin.wolfram *)
(*^^^^^ meta.box.wolfram *)
(*   ^^ punctuation.section.box.end.wolfram *)

Should I?

@chere005
Copy link
Collaborator

chere005 commented Nov 9, 2018

This breaks strings like "FOO`BAR`BAZ`" where this represents a context, which is a big regression that must be fixed before this is merged

@shigma
Copy link
Collaborator Author

shigma commented Nov 9, 2018

This breaks strings like "FOO`BAR`BAZ`" where this represents a context, which is a big regression that must be fixed before this is merged

Emmm, it does. How do you plan to solve it?

@shigma
Copy link
Collaborator Author

shigma commented Nov 9, 2018

I'm not sure I see a difference.

What do you mean by "difference"?

@shigma
Copy link
Collaborator Author

shigma commented Nov 9, 2018

At first glance it looked like the escape \ fix was correct, but now I'm not so sure:

screen shot 2018-11-08 at 1 17 59 pm

It looks good to me.

@chere005
Copy link
Collaborator

chere005 commented Nov 9, 2018

I'm not sure I see a difference.

What do you mean by "difference"?

When I pasted these code snippets and applied your changes, the highlighting didn't change at all.

@chere005
Copy link
Collaborator

chere005 commented Nov 9, 2018

This breaks strings like "FOO`BAR`BAZ`" where this represents a context, which is a big regression that must be fixed before this is merged

Emmm, it does. How do you plan to solve it?

I'm not sure, this is a much bigger issue than not supporting StringTemplate. I discussed this with @batracos at one point back in May actually and we couldn't find a good solution.. The closest thing I came up with was something along these lines for L140:

(?<=[^a-zA-Z0-9$]|[\"\`])\`\w*\`|\`\w*\`(?=[^a-zA-Z0-9$"])

It seems to work but @batracos didn't like it. @shigma and @batracos can we improve this, use it, or come up with something better? If not, we should hold off on StringExpression backtick support

@chere005
Copy link
Collaborator

chere005 commented Nov 9, 2018

At first glance it looked like the escape \ fix was correct, but now I'm not so sure:
screen shot 2018-11-08 at 1 17 59 pm

It looks good to me.

Clearly back slash on its own isn't supported from the second output, and when mathematica doesn't complain it's a bug. We should always highlight the character after \ in a string to indicate there is an escape happening, this is crucial to help visually indicate bugs.

@shigma
Copy link
Collaborator Author

shigma commented Nov 9, 2018

I'm not sure I see a difference.

What do you mean by "difference"?

When I pasted these code snippets and applied your changes, the highlighting didn't change at all.

It's wield. Syntax rules before this PR should not have been able to handle these syntaxes. Can you tell me which commit?

@shigma
Copy link
Collaborator Author

shigma commented Nov 9, 2018

I'm not sure, this is a much bigger issue than not supporting StringTemplate. I discussed this with @batracos at one point back in May actually and we couldn't find a good solution.. The closest thing I came up with was something along these lines for L140:

(?<=[^a-zA-Z0-9$]|[\"\`])\`\w*\`|\`\w*\`(?=[^a-zA-Z0-9$"])

It seems to work but @batracos didn't like it. @shigma and @batracos can we improve this, use it, or come up with something better? If not, we should hold off on StringExpression backtick support

I'm not sure I get your point by (?<=[^a-zA-Z0-9$]|[\"\`]) but the thing is slots in a StringTemplate can be placed just between letters:

In[11]:= StringTemplate["FOO`BAR`BAZ`"][<|"BAR" -> "-"|>]
Out[11]= "FOO-BAZ`"

The syntax for python didn't work out a solution for string templates, either:

str = 'I just want a {plain} brace'
                   # ^^^^^^^ constant.other.placeholder.python

We don't show an attempt to format it and it appears as if we do.


If you really want to distinguish those behaviour, maybe we can just match the function outside the string, for example, treat strings inside a Begin function as "plain" to some extent. I think provide different functions with different inner context is a good solution to all the problems like this, and functions like Block has applied this solution before.

I'm interested in it but suggest not to do such things in this PR but open a new one instead because this PR has already does too many things.

@shigma
Copy link
Collaborator Author

shigma commented Nov 9, 2018

Clearly back slash on its own isn't supported from the second output, and when mathematica doesn't complain it's a bug. We should always highlight the character after \ in a string to indicate there is an escape happening, this is crucial to help visually indicate bugs.

Mathematica is not that strict:

In[21]:= "\a" // InputForm
(* Syntax::stresc: Unknown string escape \a. *)
Out[21]//InputForm= "\\a"

In[31]:= "\a"
Out[31]= "\\a"

Should out program be that strict? I just don't know.

If yes, maybe we can preserve the current rule and append the following rule:

match: \\[\s\S]
scope: invalid.character.escape.wolfram

@chere005
Copy link
Collaborator

chere005 commented Nov 9, 2018

Clearly back slash on its own isn't supported from the second output, and when mathematica doesn't complain it's a bug. We should always highlight the character after \ in a string to indicate there is an escape happening, this is crucial to help visually indicate bugs.

Mathematica is not that strict:

In[21]:= "\a" // InputForm
(* Syntax::stresc: Unknown string escape \a. *)
Out[21]//InputForm= "\\a"

In[31]:= "\a"
Out[31]= "\\a"

Should out program be that strict? I just don't know.

If yes, maybe we can preserve the current rule and append the following rule:

match: \\[\s\S]
scope: invalid.character.escape.wolfram

I think you missed my point. Mathematica IS that strict. If the message isn't appearing, it is a bug that it isn't appearing. That syntax is never actually valid, and we must indicate that \ followed by any char means it's an attempted escape. I'm fine, and even encourage giving these a different different colors (valid vs invalid escapes) but \ on its own should never be unformatted

@chere005
Copy link
Collaborator

chere005 commented Nov 9, 2018

I'm not sure, this is a much bigger issue than not supporting StringTemplate. I discussed this with @batracos at one point back in May actually and we couldn't find a good solution.. The closest thing I came up with was something along these lines for L140:

(?<=[^a-zA-Z0-9$]|[\"\`])\`\w*\`|\`\w*\`(?=[^a-zA-Z0-9$"])

It seems to work but @batracos didn't like it. @shigma and @batracos can we improve this, use it, or come up with something better? If not, we should hold off on StringExpression backtick support

I'm not sure I get your point by (?<=[^a-zA-Z0-9$]|[\"\`]) but the thing is slots in a StringTemplate can be placed just between letters:

In[11]:= StringTemplate["FOO`BAR`BAZ`"][<|"BAR" -> "-"|>]
Out[11]= "FOO-BAZ`"

The syntax for python didn't work out a solution for string templates, either:

str = 'I just want a {plain} brace'
                   # ^^^^^^^ constant.other.placeholder.python

We don't show an attempt to format it and it appears as if we do.

If you really want to distinguish those behaviour, maybe we can just match the function outside the string, for example, treat strings inside a Begin function as "plain" to some extent. I think provide different functions with different inner context is a good solution to all the problems like this, and functions like Block has applied this solution before.

I'm interested in it but suggest not to do such things in this PR but open a new one instead because this PR has already does too many things.

Contexts show up in many other strings. We absolutely cannot break this for StringTemplated strings. All strings can contain normal backticks. Only in the case for StringTemplate are the backticks special. What this means is assume and default to the backtick being a normal backtick in a string, unless we can properly distinguish. I was suggesting a hacky solution so that sometimes StringTemplate would be right but that contexts would always be right, but I'm less convinced by this. I 100% think we should not support StringTemplate backtick syntax, please remove it from this PR unless there is a much, much better solution.

1. Now string templates will only be colored in assignment of usages and inside StringTemplate and TemplateApply.
2. Support 3-octal encoding.
3. Better detection for invalid strings.
@shigma
Copy link
Collaborator Author

shigma commented Nov 10, 2018

Contexts show up in many other strings. We absolutely cannot break this for StringTemplated strings. All strings can contain normal backticks. Only in the case for StringTemplate are the backticks special. What this means is assume and default to the backtick being a normal backtick in a string, unless we can properly distinguish. I was suggesting a hacky solution so that sometimes StringTemplate would be right but that contexts would always be right, but I'm less convinced by this. I 100% think we should not support StringTemplate backtick syntax, please remove it from this PR unless there is a much, much better solution.

I don't entirely agree with you but given that the colorization for string template is a break change I decide to remove all the related rules from general string recognition but preserve them for some functions (specifically StringTemplate and TemplateApply) and assignment for usages (which will automatically use string template when displayed in a message).

Is this OK?

@shigma
Copy link
Collaborator Author

shigma commented Nov 10, 2018

I think you missed my point. Mathematica IS that strict. If the message isn't appearing, it is a bug that it isn't appearing. That syntax is never actually valid, and we must indicate that \ followed by any char means it's an attempted escape. I'm fine, and even encourage giving these a different different colors (valid vs invalid escapes) but \ on its own should never be unformatted

I see. Hope the following rules are satisfactory:

# escape characters
- match: \\[-"nrtbf()!^%&+_*@`/\\]
scope: constant.character.escape.wolfram
- match: |-
(?x)(
\\[0-7]{3}|
\\\.[0-9A-Fa-f]{2}|
\\:[0-9A-Fa-f]{4}
)
scope: constant.character.encoding.wolfram
- match: \\\[({{named_characters}})\]
scope: constant.character.built-in.wolfram
# invalid characters
- match: |-
(?x)(
\\[0-7]{1,2}(?=[^0-7])|
\\\.[0-9A-Fa-f]?(?=[^0-9A-Fa-f])|
\\:[0-9A-Fa-f]{0,3}(?=[^0-9A-Fa-f])
)
scope: invalid.character.encoding.wolfram
- match: \\\[\w+\]
scope: invalid.character.built-in.wolfram
- match: \\[a-zA-Z]
scope: invalid.character.escape.wolfram

I found only [a-zA-Z] except [nrtbf] can result in a error and [()"!^%&+_*@`/\\] after a back-slant will be recognized as a special character. Under other circumstances, any character after a back-slant will be recognized as usual:

In[69]:= "\8" // Characters
Out[69]= {"\\", "8"}

@chere005
Copy link
Collaborator

Contexts show up in many other strings. We absolutely cannot break this for StringTemplated strings. All strings can contain normal backticks. Only in the case for StringTemplate are the backticks special. What this means is assume and default to the backtick being a normal backtick in a string, unless we can properly distinguish. I was suggesting a hacky solution so that sometimes StringTemplate would be right but that contexts would always be right, but I'm less convinced by this. I 100% think we should not support StringTemplate backtick syntax, please remove it from this PR unless there is a much, much better solution.

I don't entirely agree with you but given that the colorization for string template is a break change I decide to remove all the related rules from general string recognition but preserve them for some functions (specifically StringTemplate and TemplateApply) and assignment for usages (which will automatically use string template when displayed in a message).

Is this OK?

This is a great idea, I'll test it soon.

@chere005
Copy link
Collaborator

Contexts show up in many other strings. We absolutely cannot break this for StringTemplated strings. All strings can contain normal backticks. Only in the case for StringTemplate are the backticks special. What this means is assume and default to the backtick being a normal backtick in a string, unless we can properly distinguish. I was suggesting a hacky solution so that sometimes StringTemplate would be right but that contexts would always be right, but I'm less convinced by this. I 100% think we should not support StringTemplate backtick syntax, please remove it from this PR unless there is a much, much better solution.

I don't entirely agree with you but given that the colorization for string template is a break change I decide to remove all the related rules from general string recognition but preserve them for some functions (specifically StringTemplate and TemplateApply) and assignment for usages (which will automatically use string template when displayed in a message).

Is this OK?

I want to bring the idea of if there are spaces around the backticks back on the table. While it won't fix every other StringTemplate, it would fix most of the ones that I use. There would just be some instances (when the backticks where next to other characters but it was actually a StringTemplate and not inside the functions we whitelisted.. so very rare). I also suggest we add Success and Failure to the list of functions which commonly use StringTemplate and always turn on the syntax highlighting for this.

@chere005
Copy link
Collaborator

I think you missed my point. Mathematica IS that strict. If the message isn't appearing, it is a bug that it isn't appearing. That syntax is never actually valid, and we must indicate that \ followed by any char means it's an attempted escape. I'm fine, and even encourage giving these a different different colors (valid vs invalid escapes) but \ on its own should never be unformatted

I see. Hope the following rules are satisfactory:

Sublime-WolframLanguage/WolframLanguage.sublime-syntax

Lines 137 to 161 in 4fa3e82
# escape characters
- match: \[-"nrtbf()!^%&+_*@`/\]
scope: constant.character.escape.wolfram
- match: |-
(?x)(
\[0-7]{3}|
\.[0-9A-Fa-f]{2}|
\:[0-9A-Fa-f]{4}
)
scope: constant.character.encoding.wolfram
- match: \[({{named_characters}})]
scope: constant.character.built-in.wolfram

 # invalid characters 
 - match: |- 
     (?x)( 
       \\[0-7]{1,2}(?=[^0-7])| 
       \\\.[0-9A-Fa-f]?(?=[^0-9A-Fa-f])| 
       \\:[0-9A-Fa-f]{0,3}(?=[^0-9A-Fa-f]) 
     ) 
   scope: invalid.character.encoding.wolfram 
 - match: \\\[\w+\] 
   scope: invalid.character.built-in.wolfram 
 - match: \\[a-zA-Z] 
   scope: invalid.character.escape.wolfram 

I found only [a-zA-Z] except [nrtbf] can result in a error and [()"!^%&+_*@`/\\] after a back-slant will be recognized as a special character. Under other circumstances, any character after a back-slant will be recognized as usual:

In[69]:= "\8" // Characters
Out[69]= {"\\", "8"}

At first glance this seems to work, but it does look like it can be cleaned up. Perhaps another thing to put an issue for, but I'll take another look next week

@chere005
Copy link
Collaborator

To be honest I never work with boxes so I'm just not sure if this is expected:

Current Release:
currentrelease

This PR's latest commit:
withchanges

@chere005
Copy link
Collaborator

It looks a bit to me like some of the escape character rules are leaking here. I wasn't really sure why this was the list of escape characters, in particular ()%+_@/ could you give some insight on everything past f in the list perhaps?

[nrtbf()"!^%&+_*@`/\\]

This is the best resource I've found: https://reference.wolfram.com/language/tutorial/InputSyntax.html apparently \000 is a thing too..

@chere005
Copy link
Collaborator

I tried to respond to most questions tonight, I won't be back on over the weekend. I plan to leave your most recent commits in my local sublime for a few days at work next week before approving so I have time to see if I notice anything with my normal workflows.

@shigma
Copy link
Collaborator Author

shigma commented Nov 10, 2018

To be honest I never work with boxes so I'm just not sure if this is expected:

Current Release:
currentrelease

This PR's latest commit:
withchanges

Maybe I worked with boxes before and just forgot all about it later on ...

@shigma
Copy link
Collaborator Author

shigma commented Nov 10, 2018

It looks a bit to me like some of the escape character rules are leaking here. I wasn't really sure why this was the list of escape characters, in particular ()%+_@/ could you give some insight on everything past f in the list perhaps?

[nrtbf()"!^%&+_*@`/\\]

This is the best resource I've found: https://reference.wolfram.com/language/tutorial/InputSyntax.html apparently \000 is a thing too..

See this:

In[153]:= Characters /@ {"\+", "\-", "\>"}
Out[153]= {{"\+"}, {"\\", "-"}, {}}

@shigma
Copy link
Collaborator Author

shigma commented Nov 10, 2018

@chere005 The following code may better depict the escaping behavior:

Quiet @ Last @ Reap[
    Scan[
        Sow[#, Check[Length @ Characters @ ToExpression["\"\\" <> # <> "\""], -1]] &,
        CharacterRange[33, 126]
    ],
    _,
    #1 -> StringJoin[#2] &
]

With the following result:

  • errored: .01234567:ABCDEFGHIJKLMNOPQRSTUVWXYZ[acdeghijklmopqsuvwxyz
  • escaped: !"%&()*+/@\^_`bfnrt
  • non-escaped: #$',-89;=?]{|}~
  • disappeared: <>

@shigma
Copy link
Collaborator Author

shigma commented Nov 15, 2018

Any questions @batracos ?

(* ^^^^^^^^^^^^^^ variable.function*)
(* ^^ keyword.operator*)
(*^ entity.name.function *)
(* ^ variable.parameter *)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variable.parameter is not aligned with x_.
Same issue at line 173 and 177

@batracos
Copy link
Collaborator

Sorry for the long time off the project. I simply did not have the bandwidth to spare.
Except for those three tests not passing this seems good to go.

@batracos batracos closed this Mar 23, 2019
@batracos batracos reopened this Mar 23, 2019
@batracos batracos self-requested a review March 23, 2019 17:06
@chere005
Copy link
Collaborator

chere005 commented Mar 23, 2019 via email

@chere005
Copy link
Collaborator

@chere005 The following code may better depict the escaping behavior:

Quiet @ Last @ Reap[
    Scan[
        Sow[#, Check[Length @ Characters @ ToExpression["\"\\" <> # <> "\""], -1]] &,
        CharacterRange[33, 126]
    ],
    _,
    #1 -> StringJoin[#2] &
]

With the following result:

* **errored:** `.01234567:ABCDEFGHIJKLMNOPQRSTUVWXYZ[acdeghijklmopqsuvwxyz`

* **escaped:** `` !"%&()*+/@\^_`bfnrt ``

* **non-escaped:** `#$',-89;=?]{|}~`

* **disappeared:** `<>`

@batracos and I discussed this, in a string the only characters that should show up as valid escape characters is nrtbf Things like "\*" which don't complain end up with a string "\\*" which means the * was not actually escaped. Please make this change and fix the tests that @batracos mentioned are failing before I make another pass at reviewing this.

@chere005
Copy link
Collaborator

@shigma any plans to update this with the changes we mentioned?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants