Parsing and operators #802

rljacobson · 2023-02-26T06:00:30Z

You might already know about the WLTools / Wolfram Language Spec information I have: https://wltools.github.io/LanguageSpec/. It is a brain dump of most of what I know about the syntax and semantics of Wolfram Language. The most important thing there is the table of operator data I have collected. (I wrote an article about finding the data.) It was with this data in mind that I wrote about my opinions on the design of Pratt parsers. Life events kept me from writing Part II of that article, but I have implemented the algorithm in Rust, and it would be easy to implement it in Python. The Rust source code is heavily commented and should be accessible to anyone familiar with a major programming language.

The quantity and accuracy of my operator data far exceed those of Mathics' present implementation. If there is interest, I can work on a rough draft, a sort of proof-of-concept parser for Mathics that incorporates everything I've learned. On the other hand, if the team is already happy with Mathics' scanner and parser implementations and isn't interested, my feelings won't be hurt.
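
For readers who have not seen a table-driven operator parser, here is a minimal Python sketch of the general idea: a Pratt-style (precedence-climbing) loop that takes all of its operator knowledge from a data table. It is not the Rust implementation mentioned above and not Mathics code. The `OPERATORS` table, its precedence numbers, and the `tokenize` and `parse` helpers are assumptions made up for illustration.

```python
import re

# Hypothetical operator table: token -> (precedence, associativity).
# The numbers are illustrative only, not taken from any real data set.
OPERATORS = {
    "+": (310, "left"),
    "*": (400, "left"),
    "^": (590, "right"),
}

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z]\w*|[+*^()])")


def tokenize(text):
    """Split text into numbers, identifiers, operators and parentheses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens, min_prec=0):
    """Consume tokens from the front and return a nested-tuple tree."""
    token = tokens.pop(0)
    if token == "(":
        left = parse(tokens, 0)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
    else:
        left = token
    # Keep absorbing infix operators that bind at least as tightly as min_prec.
    while tokens and tokens[0] in OPERATORS:
        prec, assoc = OPERATORS[tokens[0]]
        if prec < min_prec:
            break
        op = tokens.pop(0)
        # A left-associative operator must not re-absorb an operator of
        # equal precedence on its right; a right-associative one may.
        next_min = prec + 1 if assoc == "left" else prec
        right = parse(tokens, next_min)
        left = (op, left, right)
    return left


if __name__ == "__main__":
    print(parse(tokenize("a + b * c ^ d ^ e")))
```

Because the loop only consults `OPERATORS`, correcting or extending the operator data changes parsing behavior without touching the algorithm.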

mmatera · 2023-02-26T12:46:26Z

mmatera
Feb 26, 2023
Maintainer

@rljacobson, this is my opinion: if you have the will to work on this (or something else), you are welcome. In the particular case of the parser, we have reached the point where, if you write a complete replacement for mathics-scanner that respects the interface, you could test it with small changes on the mathics-core side and see whether there is an advantage.

On the other hand, so far I haven't hit any issue in mathics-scanner that suggests the need for a big refactor or reimplementation. Also, since the inputs to be parsed are typically not very long, the performance impact of mathics-scanner on mathics-core is usually negligible compared with the cost of pattern matching, conversions, and evaluations over the parsed expressions.

So, if you are up for it, what I think would be more useful right now would be to check, improve and complete the operator tables in mathics_scanner.


rljacobson · 2023-02-26T18:29:14Z

rljacobson
Feb 26, 2023
Author

Having skimmed through the relevant Mathics source code, I agree that, barring the absence of some necessary capability, rewriting the parsing algorithm itself is unlikely to give a significant advantage. Our parsing strategies are similar, and Mathics already handles the obnoxious edge cases (the Span operator, ;;, which is cursed, and implicit multiplication, which is considered harmful [Fateman, 7.4]).

I suspect an impedance mismatch between my data set and Mathics' data ingestion mechanism, which would require some changes, but these changes lie outside of the parsing algorithm. My point is that I do think code changes are required, as opposed to changes only to the operator tables, even if they are pedestrian ones.

2 replies

mmatera Feb 26, 2023
Maintainer

Actually, I found a few issues that can be solved using the existing API and logic, with only small changes to the code.

mmatera Feb 26, 2023
Maintainer

See for instance Mathics3/mathics-scanner#15 (comment)

rocky · 2023-02-26T21:10:51Z

rocky
Feb 26, 2023
Maintainer

> The quantity and accuracy of my operator data far exceed those of Mathics' present implementation. If there is interest, I can work on a rough draft, a sort of proof-of-concept parser for Mathics that incorporates everything I've learned.

As everyone has observed, the parser for Mathics3 is one of the few areas where we were left in good shape. The parser was redone by the second person to pick up the project. This is mentioned in the "history" section of the manual.

However, the operator information is useful, and I have thoughts on how we could get this into Mathics3 and make it more widely available for other Wolfram Language related projects.

First, let me give some Mathics3 background on operator information as it is now.

We use YAML tables for input symbols, and for 6.0.0 more attributes were added to include some operator information, for those input symbols which are also operators. From the YAML, a Python program produces custom JSON dictionaries for the particular needs of a particular application, such as the parser for mathics-core. So starting in 6.0.0, mathics-core then gets operator precedence basically from this YAML table via this custom Python program that does the JSON extraction.
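
As a rough sketch of that extraction step (not the actual mathics-scanner build script), the following shows a table of input symbols being filtered down to a JSON dictionary of operator precedences. A plain Python dict stands in for the parsed YAML. The entry names, the field names `operator-name` and `precedence`, and the numeric values are assumptions chosen for illustration.

```python
import json

# Stand-in for the parsed YAML table (what something like PyYAML's
# yaml.safe_load would return). Entry names and field names are
# hypothetical, and the precedence values are only illustrative.
INPUT_SYMBOLS = {
    "Alpha": {"unicode-equivalent": "\u03b1"},
    "Times": {
        "unicode-equivalent": "\u00d7",
        "operator-name": "Times",
        "precedence": 400,
    },
    "Element": {
        "unicode-equivalent": "\u2208",
        "operator-name": "Element",
        "precedence": 250,
    },
}


def extract_operator_precedence(symbols):
    """Keep only the operator entries, mapping operator name -> precedence."""
    return {
        entry["operator-name"]: entry["precedence"]
        for entry in symbols.values()
        if "operator-name" in entry and "precedence" in entry
    }


def dump_operator_json(symbols):
    """Serialize the extracted dictionary, as a build step might."""
    return json.dumps(extract_operator_precedence(symbols), indent=2, sort_keys=True)


if __name__ == "__main__":
    print(dump_operator_json(INPUT_SYMBOLS))
```

Input symbols that are not operators, such as the hypothetical `Alpha` entry, simply drop out of the generated JSON.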

However, in adding "operators", I had the feeling that this wasn't quite right. And in looking at what you've written and at the HTML operator tables, I realize this should be separate and based on, or better, directly use, the operator information you've taken such care and pains to extract.

So if you are interested in moving forward and getting the information that you put together into a form that Mathics3 (and presumably other applications) can use in a more automated fashion, the suggestion here would be to provide it in YAML form.

In the YAML, comments can be more expansive about what the various fields mean. The YAML, though, would be what gets edited as corrections are made.

Some things I note: in the "tokens" fields, the input symbol name should be listed instead of one character representation (of potentially several).

The only field that is included in the YAML input symbols but not in your tables is an AMSLaTeX name for an operator, when that is relevant.
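
To make the "tokens" suggestion concrete, here is a small sketch under stated assumptions: an operator record stores input symbol names, and the concrete spellings are looked up in a separate named-character table. The `OperatorRecord` class, the `NAMED_CHARACTERS` table, the `spellings` helper, and the precedence value are all hypothetical and do not reflect the layout of either data set.

```python
from dataclasses import dataclass

# Hypothetical named-character table: input symbol name -> every way
# that symbol can be written in source text.
NAMED_CHARACTERS = {
    "Times": ("\\[Times]", "\u00d7"),
    "Element": ("\\[Element]", "\u2208"),
}


@dataclass(frozen=True)
class OperatorRecord:
    name: str
    precedence: int  # illustrative value only
    tokens: tuple    # input symbol names, not literal characters


def spellings(record, named_characters=NAMED_CHARACTERS):
    """Expand a record's symbol names into every concrete spelling."""
    result = []
    for symbol_name in record.tokens:
        result.extend(named_characters[symbol_name])
    return result


TIMES = OperatorRecord(name="Times", precedence=400, tokens=("Times",))

if __name__ == "__main__":
    print(spellings(TIMES))
```

Storing the name rather than one character keeps the operator table independent of how many representations a symbol has.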

> On the other hand, until now, I didn't hit any issue in mathics-scanner that suggests the need for a big refactor / reimplementation.

Note that although operator information is in mathics-scanner (which is kind of a mishmash of a couple of things), parsing is in mathics-core.

