Use textmate grammar instead of pygments #244

Closed
watermarkhu opened this issue Mar 11, 2024 · 18 comments

Comments

@watermarkhu

watermarkhu commented Mar 11, 2024

Hi @joeced, great work on maintaining this repo.

A year ago, I wanted to contribute support for argument blocks. However, I found the logic in mat_types.py, which is based on Pygments tokens, very hard to work with and a bit unstable.

Following MathWorks' support for VSCode, I started working on a Python parser based on TextMate grammars, the format VSCode uses for syntax highlighting. MathWorks now also maintains the MATLAB grammar.

The package is available at https://github.com/watermarkhu/textmate-grammar-python. If you are interested, I think this can be a good replacement for matlabdomain's current in-house parsing. The benefits of using a TextMate grammar are that 1) due to its nested nature, the output is already a syntax tree and 2) parsing is now officially supported by MathWorks and the contributors of the VSCode extension.

On a different topic, due to some requirements, I will need to have an auto-documenter that is compatible with markdown docstrings. To this end, I've already started work on a new extension that is dependent on the myst-parser and based on autodoc2. I would love to get in touch with you to understand the matlabdomain better to see what I can re-use.

@joeced
Collaborator

joeced commented Mar 12, 2024

Hi @watermarkhu

This looks really interesting. I have just started tackling #44 and #222, and the Pygments token output is a mess to parse. I'll give it a shot and see if it can replace Pygments and then improve the functionality.

Regarding starting up a new auto-documenter, I can only tell you how this domain was started. The original author basically built the documenter directly upon autodoc for Python. This gave them a good start and basis.
However, the code is not the easiest to work with, in my opinion. We still run into features that need to be reimplemented, for instance #180. Even after maintaining the package for many years now, I still struggle with the Sphinx internals of Documenter and Directives 😵.

A different approach for autodoc is done in https://github.com/mozilla/sphinx-js. I hope this helps.

@watermarkhu
Author

Good to hear!

I'm currently struggling mostly with setting up roles in a new domain so that cross-referencing eventually becomes possible. Could we set up a call?

@joeced
Collaborator

joeced commented Mar 12, 2024

I tried textmate-grammar-python and the tokenization looks way nicer (see #222 (comment)). It definitely makes it easier to deduce whether something is a method definition in an abstract class. Further, I really like the nested dictionaries, where I can just skip the body of a function once I have collected what I need.
It will require a lot of rewriting, but I'm quite tempted by it.
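To illustrate the point about nested output, here is a minimal sketch of walking a nested token tree and returning early once a function's name has been collected, so the body is never visited. The dictionary shape (`token`, `content`, `children`) and the token names are assumptions made for this example, not the actual output of textmate-grammar-python or the MATLAB grammar.

```python
from typing import Iterator

# Hypothetical node shape: a dict with a "token" name, optional "content",
# and optional "children". Token names are illustrative only.
Node = dict


def function_names(node: Node) -> Iterator[str]:
    """Yield function names without descending into function bodies."""
    if node.get("token") == "function":
        for child in node.get("children", []):
            if child.get("token") == "function.name":
                yield child["content"]
        # Everything needed is in the direct children, so skip the body.
        return
    for child in node.get("children", []):
        yield from function_names(child)


tree = {
    "token": "source",
    "children": [
        {
            "token": "function",
            "children": [
                {"token": "function.name", "content": "outer"},
                {
                    "token": "function.body",
                    "children": [
                        # Nested definition inside the body: never visited.
                        {
                            "token": "function",
                            "children": [
                                {"token": "function.name", "content": "inner"}
                            ],
                        },
                    ],
                },
            ],
        },
        {
            "token": "function",
            "children": [{"token": "function.name", "content": "second"}],
        },
    ],
}

names = list(function_names(tree))
```

With a flat token stream the same task would need explicit tracking of matching `end` keywords; with a tree, skipping a body is just not recursing into it.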

We can set up a call, but be warned that I am by no means an expert in cross-referencing. You can contact me at jorgen at cederberg dot be.

@joeced
Collaborator

joeced commented Mar 12, 2024

@watermarkhu Two comments on https://github.com/watermarkhu/textmate-grammar-python:

  • Requirement for Python 3.11 is hard for sphinxcontrib-matlabdomain. I still support version 3.9.
  • Requirement for PyYAML >=7 is in conflict with conan version requirement of PyYAML>=6.0, <7.0.

Do you want me to add them as issues?

@watermarkhu
Author

Good to see! Adding the issues would be great.

Let's discuss 3.9 support on the PR that you submitted.

@apozharski
Collaborator

Hello, not to step on any toes here, but I would like to know if this effort has stalled (understandably, time is always a valuable commodity). The MATLAB library I am a maintainer of is currently going through a major documentation pass, and to that end I have allocated some time to working on tooling. As such, I think this would be a good place to start, as it will help in closing #52, #54, #212, and #222.

Those four issues are currently my target to get done (perhaps in one fell swoop along with this one), as they would be very useful for our documentation. I have started an attempt to get classdef parsing working here, and it seems like it should be doable to replace the current parsing code with something (at least marginally) better. Let me know if I have misread the situation.

@joeced
Collaborator

joeced commented Jul 16, 2024

Hi. It was definitely stalled. I have had zero time to work on this project unfortunately. This week, I'll give it a shot. The most difficult issue to solve is still #222.

@apozharski
Collaborator

If you want a starting point re: classes, I have now gotten most of a classdef parser written here https://github.com/apozharski/matlabdomain/blob/only-enums/sphinxcontrib/textmate_parser.py including argument blocks.

I am happy to continue work on it and submit a pr or you can pull out whatever is useful.

@apozharski
Collaborator

As an aside, there are definitely some bugs in the textmate parser (watermarkhu/textmate-grammar-python#66 (comment), for example), though after digging I suspect they are in the underlying grammar maintained by MathWorks. I am currently looking at a possible fix, though @watermarkhu may have the inside track on understanding the grammar format.

@joeced
Collaborator

joeced commented Jul 16, 2024

> If you want a starting point re: classes, I have now gotten most of a classdef parser written here https://github.com/apozharski/matlabdomain/blob/only-enums/sphinxcontrib/textmate_parser.py including argument blocks.
>
> I am happy to continue work on it and submit a pr or you can pull out whatever is useful.

Thanks. I will take this as a starting point. It looks very useful already. If you have any PRs, I'll be working on the development branch dev-textmate-grammar-for-parsing.

@joeced
Collaborator

joeced commented Jul 16, 2024

@apozharski regarding priority of docstrings, they are as follows:

  • properties, enums, (events): Comments before the property have higher precedence than a trailing comment. However, there cannot be empty lines before the property.
  • functions and classes: always after the function or classdef line.
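The property rule above can be sketched as follows. This is a simplified illustration that works on raw source lines, not the parser used in matlabdomain; the function name `property_docstring` is hypothetical, and the sketch ignores edge cases such as a `%` inside a string literal.

```python
def property_docstring(lines: list, idx: int) -> str:
    """Pick the docstring for the property declared on lines[idx].

    Assumed rule: a contiguous block of comment lines directly before the
    property wins over a trailing comment; an empty line breaks the block.
    """
    leading = []
    i = idx - 1
    # Walk upwards while the lines are comments; stop at anything else,
    # including an empty line.
    while i >= 0 and lines[i].strip().startswith("%"):
        leading.append(lines[i].strip().lstrip("%").strip())
        i -= 1
    if leading:
        return "\n".join(reversed(leading))
    # Fall back to a trailing comment on the property line itself.
    _, sep, trailing = lines[idx].partition("%")
    return trailing.strip() if sep else ""


source = [
    "properties",
    "    % Leading comment for a",
    "    a = 1 % trailing comment for a",
    "",
    "    b = 2 % trailing comment for b",
    "    % Detached comment",
    "",
    "    c = 3",
    "end",
]

doc_a = property_docstring(source, 2)
doc_b = property_docstring(source, 4)
doc_c = property_docstring(source, 7)
```

Here `a` takes its leading comment, `b` falls back to its trailing comment, and `c` gets no docstring because an empty line separates it from the comment above.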

@joeced
Collaborator

joeced commented Jul 17, 2024

@apozharski I ran into an issue with class attributes and created an issue watermarkhu/textmate-grammar-python#67.

In the current parsing of classdef / method / property attributes I reuse the same method:

def attributes(self, idx, attr_types):
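The body of that method is not shown in this thread. As a rough, hypothetical illustration of why one routine can serve classdef, method, and property attribute lists alike, here is a standalone sketch that parses an attribute list from text and validates names against a caller-supplied set. The name `parse_attributes` and its signature are assumptions for this example and do not mirror the `attributes(self, idx, attr_types)` method above.

```python
from typing import Dict, Set, Union


def parse_attributes(text: str, allowed: Set[str]) -> Dict[str, Union[str, bool]]:
    """Parse a MATLAB attribute list such as "(Abstract, Access = private)"."""
    inner = text.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]

    # Split on commas that are not nested inside {...},
    # e.g. Access = {?A, ?B} must stay in one piece.
    parts, depth, current = [], 0, ""
    for ch in inner:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)

    attrs: Dict[str, Union[str, bool]] = {}
    for part in parts:
        name, sep, value = part.partition("=")
        name = name.strip()
        if not name:
            continue
        if name not in allowed:
            raise ValueError(f"unexpected attribute: {name}")
        # A bare attribute name (no "=") is treated as true.
        attrs[name] = value.strip() if sep else True
    return attrs


method_attrs = parse_attributes("(Abstract, Access = private)", {"Abstract", "Access"})
prop_attrs = parse_attributes("(Access = {?A, ?B}, Hidden)", {"Access", "Hidden"})
```

Only the set of allowed names differs between the three contexts, which is what makes sharing one routine attractive.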

@apozharski
Collaborator

> @apozharski regarding priority of docstrings, they are as follows:
>
> * properties, enums, (events): Comments before the property have higher precedence, than a trailing comment. However, there cannot be empty lines before the property.
>
> * functions and classes: always after the function or classdef line.

Yep, that is what I thought was the case. Thanks for clarifying. I will do some cleanup, get the routines to check for non-consecutive comments, and submit a PR to your dev branch.

@apozharski
Collaborator

@joeced After spending a few too many hours trying to fix the MathWorks-provided TextMate grammar, I am convinced that it is not worth continuing to force a square peg (a parsing system primarily designed for syntax highlighting) into the round hole of extracting structure. After some research I found a better alternative: https://github.com/acristoffers/tree-sitter-matlab, a MATLAB grammar for tree-sitter, which uses a "proper" LR parser and produces a much more usable AST. It also does not seem to have the performance downsides: watermarkhu/textmate-grammar-python#68.

Over the last couple of days I quickly threw together a working prototype with support for, I believe, the full suite of MATLAB syntax (argument blocks, enumeration blocks, events blocks, etc.): https://github.com/apozharski/matlabdomain/blob/tree-sitter-dev/sphinxcontrib/mat_tree_sitter_parser.py

I think this is the direction this project should go in, as it does not require us to fix yet more bugs in MATLAB-textmate-grammar, and it already supports the full feature set we need. The one concern is that while tree-sitter is available on PyPI, tree-sitter-matlab is not yet. I have reached out to the developer via issue acristoffers/tree-sitter-matlab#12 and they seem receptive to packaging it for PyPI.

@joeced
Collaborator

joeced commented Jul 27, 2024

Hi @apozharski, thank you very much for taking the time and effort to look into this. Much appreciated! I'm away from a computer at the moment, but will get back to you in 2 weeks.

@joeced
Collaborator

joeced commented Aug 9, 2024

hi @apozharski - I'm back now!

Did you have any time to work on using tree-sitter-matlab and would you try to make a pull request?
The most important thing for me is not what parsing library is used, but that it is better than the existing, where better equals:

Again - thanks for looking into this!

@apozharski
Collaborator

Hello @joeced,
yes, I was sidelined on this last week due to some urgent work.
I have a branch with a 90-95% working parser and I will open a separate PR with it. In general I think it has simplified the code significantly; however, I would love to hear your feedback.

The latest work I have done on this is slowly beginning to fix things to get the tests back in working order (and I have found a bug in tree-sitter-matlab, for which I now have a PR open there).

Absolutely no problem!

@apozharski
Collaborator

This is resolved by the move to the tree-sitter backend #261.
