Document members with a larger range of unicode characters (e.g. σ) #13084

satr-cowi · 2024-10-30T06:19:28Z

Is your feature request related to a problem? Please describe.

I have functions/classes with the Unicode character σ in their names. I am using autosummary to produce my docs, but the .rst files for these members are not being generated.

I can see there are regular expressions (in a few places, I think: the autosummary parsing and the toctree generation), e.g.

sphinx/sphinx/ext/autosummary/generate.py

Line 687 in 116a430

autosummary_item_re = re.compile(r'^\s+(~?[_a-zA-Z][a-zA-Z0-9_.]*)\s*.*?')
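To illustrate the behaviour, here is a minimal sketch comparing the pattern quoted above with a Unicode-aware variant. The variant (`unicode_item_re`), the helper `item_name`, and the sample entries are assumptions made up for this example; they are not part of Sphinx, and `\w` only approximates Python's identifier rules.

```python
import re

# Pattern quoted above from sphinx/ext/autosummary/generate.py (ASCII-only names).
ascii_item_re = re.compile(r'^\s+(~?[_a-zA-Z][a-zA-Z0-9_.]*)\s*.*?')

# Hypothetical Unicode-aware variant (an assumption, not Sphinx code):
# [^\W\d] means "a word character that is not a digit", and \w matches
# Unicode letters for str patterns in Python 3.
unicode_item_re = re.compile(r'^\s+(~?[^\W\d][\w.]*)\s*.*?')


def item_name(pattern, line):
    """Return the captured entry name, or None if the line does not match."""
    m = pattern.match(line)
    return m.group(1) if m else None


# Sample autosummary entries (made up for illustration).
leading = item_name(ascii_item_re, "   σ_model")         # name starts with σ
embedded = item_name(ascii_item_re, "   stats.σ_model")  # σ after a dotted prefix
fixed = item_name(unicode_item_re, "   stats.σ_model")   # same entry, wider pattern
```

With the original pattern, an entry whose name starts with σ does not match at all, and a dotted entry is captured only up to the first non-ASCII character (`stats.` here). The wider pattern captures the full `stats.σ_model`.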

Describe the solution you'd like
A larger range of allowable characters in methods/functions/attributes etc.

Perhaps we could disallow problematic characters rather than only allow a limited range.
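For Python names specifically, one way to avoid maintaining any character list would be to defer to the interpreter's own rule via `str.isidentifier()`. The sketch below is only an illustration of that idea; the function name `is_valid_dotted_name` is hypothetical and nothing here reflects how Sphinx currently validates names.

```python
def is_valid_dotted_name(name: str) -> bool:
    """Check each dot-separated part against Python's identifier rules."""
    # Drop the optional leading '~' that the quoted regex also allows.
    if name.startswith("~"):
        name = name[1:]
    # str.isidentifier() accepts Unicode identifiers such as 'σ_model'
    # and rejects empty parts, spaces, and names starting with a digit.
    return all(part.isidentifier() for part in name.split("."))
```

This rejects anything Python itself would reject as a name, so "problematic" characters are excluded without enumerating them.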

Describe alternatives you've considered

If there is a good reason to hardcode this in (e.g. different OS having issues), perhaps there could be a configuration option to try and allow extra characters for your own build.

If it is decided not to change this and to stick with the current set, the documentation could be improved to state the allowable characters and the fact that anything else will not be documented.

jayaddison · 2024-11-01T12:24:27Z

> A larger range of allowable characters in methods/functions/attributes etc.

Thank you for the suggestion @satr-cowi. Could you confirm what programming languages you're referring to? I'm guessing Python, based on the reference to autosummary - but perhaps also some of the other supported programming languages of Sphinx? (note that the acceptable symbols for names in each programming language may differ, which is why I ask)

satr-cowi · 2024-11-03T06:32:58Z

Ah yes, I was only referring to Python (and hadn't really thought about the others as I've never used Sphinx with them!).

Maybe options in config could be a good way to help deal with this? I'm unsure exactly what format would be best, but currently I couldn't find an easy way round it without a deep dive into all the regular expressions in the code.

jayaddison · 2024-11-05T02:00:35Z

Thanks @satr-cowi - yep, sometimes configuration options can help here; however, too much configurability can also create problems, so I'd note that attribute inheritance (in this case, from subclasses of Domain) might be another way to achieve the same.
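As a rough sketch of the inheritance idea: a base class could hold a conservative name pattern as a class attribute, and a language-specific subclass could widen it by overriding that one attribute. Both classes and the `name_re` attribute below are hypothetical stand-ins written for this example; they are not Sphinx's `Domain` API.

```python
import re


class BaseDomain:
    """Hypothetical stand-in for a domain base class (not Sphinx's Domain)."""

    # Conservative default: ASCII-only names.
    name_re = re.compile(r'[_a-zA-Z][a-zA-Z0-9_]*')

    def accepts(self, name: str) -> bool:
        """Return True if the whole name matches this domain's pattern."""
        return self.name_re.fullmatch(name) is not None


class UnicodePythonDomain(BaseDomain):
    """Hypothetical subclass that widens the rule by overriding one attribute."""

    # A non-digit word character followed by any word characters;
    # \w covers Unicode letters for str patterns in Python 3.
    name_re = re.compile(r'[^\W\d]\w*')
```

Each language would then carry its own rule without a user-facing configuration option.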

What would be really nice would be to perform some kind of comparative analysis and produce a per-programming-language table of support for Unicode characters in various API objects. I'm not yet familiar enough to know what the common-denominator objects are across all domains (functions? classes? ...), but some kind of support status per language might (in my opinion) shed some light on how to design/implement this.

jayaddison · 2024-11-05T02:01:06Z

(I would like to volunteer for that, but at the moment I think I need to step back from a few threads/tasks - so please don't hold your breath)

satr-cowi added the type:proposal a feature suggestion label Oct 30, 2024

jayaddison added the internals:other label Nov 1, 2024
