Parsing and operators #802
-
@rljacobson, this is my opinion: if you have the will to work on this (or something else), you are welcome. In the particular case of the parser, we have reached the point where, if you write a complete replacement for mathics-scanner that respects the interface, you could test it with small changes on the mathics-core side and see whether there is an advantage. On the other hand, until now, I didn't hit any issue in So, if you are up for it, what I think would be more useful right now would be to check, improve, and complete the operator tables in mathics_scanner.
-
Having skimmed through the relevant Mathics source code, I agree that, barring the absence of some necessary capability, rewriting the parsing algorithm itself is unlikely to give a significant advantage. Our parsing strategies are similar, and Mathics already handles the obnoxious edge cases (the I suspect an impedance mismatch between my data set and Mathics' data ingestion mechanism, which would require some changes, but these changes lie outside of the parsing algorithm. My point is that I do think code changes are required, as opposed to changes only to the operator tables, even if they are pedestrian ones.
-
As everyone has observed, the parser for Mathics3 is one of the few areas where we were left in good shape. The parser was redone by the second person to pick up the project. This is mentioned in the "history" section of the manual. However, the operator data you have is useful, and I have thoughts on how we could get it into Mathics3 and make it more widely available to other Wolfram Language related projects.
First, let me give some Mathics3 background on operator information as it is now. We use YAML tables for input symbols, and for 6.0.0 more attributes were added to include some operator information for those input symbols which are also operators. From the YAML, a Python program produces custom JSON dictionaries for the particular needs of a particular application, such as the parser for mathics-core. So, starting in 6.0.0, mathics-core gets operator precedence basically from this YAML table, via this custom Python program that does the JSON extraction.
However, in adding "operators", I had the feeling that this wasn't quite right. Looking at what you've written and at the HTML operator tables, I realize that the operator information should be separate and based on, or better, directly use, the operator information you've taken care and pains to extract.
So if you are interested in moving forward and getting the information that you put together into a form that Mathics3 (and presumably other applications) can use in a more automated fashion, the suggestion here would be to provide it in YAML form. Comments in the YAML can then be more expansive about what the various fields mean. The YAML, though, would be what gets edited as corrections get made.
Some things I note: in the "tokens" fields, the input symbol name should be listed instead of one character representation (of potentially several). The only field that is included in the YAML input symbols but not in your tables is an AMSLaTeX name for an operator, when that is relevant.
Note that although operator information is in mathics-scanner (which is kind of a mishmash of a couple of things), parsing is in mathics-core.
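To make the YAML-to-JSON step described above concrete, here is a minimal sketch of that kind of extraction. It is not the actual Mathics3 build program: the field names (`precedence`, `associativity`, `is-letter-like`), the function names, and the precedence values are all assumptions for illustration, and a plain Python dictionary stands in for the parsed YAML so that the example needs only the standard library.

```python
import json

# Stand-in for a parsed YAML table of input symbols. The schema and the
# numbers below are illustrative assumptions, not the real table.
SYMBOL_TABLE = {
    "Plus": {"precedence": 310, "associativity": "left"},
    "Times": {"precedence": 400, "associativity": "left"},
    "Power": {"precedence": 590, "associativity": "right"},
    # An input symbol that is not an operator carries no precedence field.
    "Alpha": {"is-letter-like": True},
}


def extract_precedences(table):
    """Return {name: precedence} for the entries that are operators."""
    return {
        name: entry["precedence"]
        for name, entry in table.items()
        if "precedence" in entry
    }


def to_json(table):
    """Serialize the extracted operator data the way a parser might load it."""
    return json.dumps(extract_precedences(table), indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json(SYMBOL_TABLE))
```

Keeping the extraction this small is the point of the design: the YAML stays the single place that gets edited, and each consumer derives only the slice it needs.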
-
You might already know about the WLTools / Wolfram Language Spec information I have: https://wltools.github.io/LanguageSpec/. It is a brain dump of most of what I know about the syntax and semantics of Wolfram Language. The most important thing there is the table of operator data I have collected. (I wrote an article about finding the data.) It was with this data in mind that I wrote about my opinions on the design of Pratt parsers. Life events kept me from writing Part II of that article, but I have implemented the algorithm in Rust, and it would be easy to implement it in Python. The Rust source code is heavily commented and should be accessible to anyone familiar with a major programming language.
The quantity and accuracy of my operator data far exceed those of Mathics' present implementation. If there is interest, I can work on a rough draft, a sort of proof-of-concept parser for Mathics that incorporates everything I've learned. On the other hand, if the team is already happy with Mathics' scanner and parser implementations and isn't interested, my feelings won't be hurt.
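For readers unfamiliar with the technique, here is a minimal sketch of a table-driven Pratt parser in Python. It is not the Rust implementation mentioned above and not Mathics' parser; the table contents, the binding-power numbers, and the function names are assumptions chosen for illustration. It handles only three binary infix operators and parentheses, and it builds nested binary nodes, whereas Wolfram Language treats `Plus` and `Times` as flat n-ary heads.

```python
import re

# Hypothetical operator table: token -> (left bp, right bp, head).
# A right binding power above the left one gives left associativity;
# one below it gives right associativity.
INFIX = {
    "+": (310, 311, "Plus"),
    "*": (400, 401, "Times"),
    "^": (590, 589, "Power"),
}

TOKEN = re.compile(r"\s*(\d+|[A-Za-z]\w*|[+*^()])")


def tokenize(text):
    """Split the input into numbers, identifiers, operators, and parentheses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens, min_bp=0):
    """Parse an expression whose operators all bind at least as tightly as min_bp."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        left = parse(tokens, 0)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
    elif token in INFIX or token == ")":
        raise SyntaxError(f"unexpected token {token!r}")
    else:
        left = token  # a number or identifier
    # Keep absorbing infix operators while they bind tightly enough.
    while tokens and tokens[0] in INFIX:
        left_bp, right_bp, head = INFIX[tokens[0]]
        if left_bp < min_bp:
            break
        tokens.pop(0)
        right = parse(tokens, right_bp)
        left = (head, left, right)
    return left


if __name__ == "__main__":
    print(parse(tokenize("a + b * c ^ 2")))
```

Because all of the operator knowledge lives in the `INFIX` table, a parser of this shape can take better operator data without changes to the algorithm itself.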