Italic text is no longer recognized in some cases #28

Open
triska opened this issue Jun 26, 2017 · 11 comments

@triska
Member

triska commented Jun 26, 2017

The predicate crypto_n_random_bytes/2 contains in its PlDoc description the following fragment:

%   With this definition, you can generate a random 256-bit integer
%   _from_ a list of 32 random _bytes_:

Expectation: We expect the following rendering, as was also the case in earlier versions:

With this definition, you can generate a random 256-bit integer from a list of 32 random bytes:

Result: However, as of the most recent git version, this is unexpectedly rendered as:

With this definition, you can generate a random 256-bit integer _from_ a list of 32 random _bytes_:

This means that the italics are no longer recognized in this paragraph.
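To make the expected behaviour concrete, here is a small Python sketch. It is not PlDoc's code, which is written in Prolog. It assumes a simple rule: an underscore-delimited run of letters and digits that is not part of a longer word is rendered in italics. `ITALIC` and `render_italics` are hypothetical names. Under that rule the fragment above renders as expected, and the predicate name crypto_n_random_bytes/2 is left untouched.

```python
import re

# Assumed rule (illustration only): _Word_ becomes italic when Word consists of
# letters/digits and the underscores are not glued to surrounding word characters.
ITALIC = re.compile(r"(?<!\w)_([A-Za-z0-9]+)_(?!\w)")


def render_italics(text: str) -> str:
    """Replace each _Word_ with <i>Word</i>; leave everything else unchanged."""
    return ITALIC.sub(r"<i>\1</i>", text)


fragment = ("With this definition, you can generate a random 256-bit integer "
            "_from_ a list of 32 random _bytes_:")
print(render_italics(fragment))
print(render_italics("crypto_n_random_bytes/2"))  # underscores inside a name: unchanged
```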

The more I proceed by trial and error to somehow get the intended effects out of PlDoc, the more I admire Richard O'Keefe, who accurately foretold this struggle years ago as a consequence of not using a formally specified markup language...

@wouterbeek
Contributor

@triska What's a good markup language that takes little effort to type but still has a formal specification? (Many modern markup languages, like Markdown, seem unable to interpret even common strings in the language unambiguously.)

@triska
Member Author

triska commented Jun 26, 2017

HTML satisfies exactly two of your 3 criteria. That's already one more than Markdown. If it were an option, I would simply use LaTeX or one of its newer dialects for such descriptions throughout.

@wouterbeek
Contributor

wouterbeek commented Jun 26, 2017

HTML and LaTeX both seem fine IMO. You would type a bit more in terms of markup, but that's about it. With LaTeX we would even allow future versions of the web site to show in-line formulas.

Out of curiosity, would you envision typing the paragraph tags explicitly (assuming HTML), as below, or would they be inferred?

% <p>With this definition, you can generate a random 256-bit integer
% <i>from</i> a list of 32 random <i>bytes</i>:</p>

The PlDoc/Markdown discussion was a bit before my time, before I came to Prolog. Could you summarize the argument?

@triska
Member Author

triska commented Jun 26, 2017

As I said, I would prefer to use LaTeX or some other language that is at least minimally programmable, so I could write this for example as:

% \par With this definition, you can generate a random 256-bit~integer
% \i{from} a list of 32 random~\i{bytes}:\par

Please note the ~ to avoid bad line breaks!

The PlDoc discussion is:

http://swi-prolog.iai.uni-bonn.narkive.com/Ko4IHNJm/pldoc-version-2-online-and-in-git

The argument that concluded the discussion at that time was:

I agree that you point out desirable properties of markup languages. I believe that, although desirable, they are not necessary properties and I think there are more useful things I can do with my time than `fixing' these issues that doesn't seem to bother that many people and do far less harm than many other issues with SWI-Prolog.

Of course, I'm happy to clarify some of PlDoc's logic and support improvements of the documentation.

So I keep filing these issues, with at least some success thanks to Jan's improvements. However, I likewise prefer to do more useful things than continuously wrestle with these issues, where every other day the semantics of PlDoc change and what previously rendered as intended is now rendered differently (#26, #27, #28).

I remember that at the time of the discussion I did not see the full extent of the argument Richard was making. Although I considered PlDoc far from ideal back then as well (due to too many missing features), it appeared to me to be "good enough" for a system that does not strive for excellence in all respects. Now, after repeatedly being among the few people who actually encounter all these issues, I understand the argument more fully and agree with what Richard was already saying back then.

In my view, the ideal state of documentation for Prolog predicates would also involve a declarative description of what the predicate means, or at least a statement of properties such as "Which exceptions can arise?" in a form that can be parsed and analyzed automatically. We are still very far from that, not only with PlDoc but also in the ISO standard and other documents. Still, since you are interested in semantics, why not aim high?

@wouterbeek
Contributor

I think it's a little sad that the Markdown grammar is still causing so many issues several years later. Since Jan is an extremely good programmer, this seems to indicate that it is indeed impossible to implement Markdown unambiguously. I would not be opposed to a LaTeX-inspired parser for PlDoc; I would probably use it myself.

In your mind, would LaTeX commands be interspersed with Markdown commands? That may lead to even more trouble... Would it be necessary to delimit LaTeX comments? If so, how would that be done?

It seems reasonable to assume that each comment is either in Markdown or in LaTeX. If a comment contains at least one \{...} command, it would be parsed as LaTeX. Adding something like this to PlDoc would not break any existing functionality AFAICS.
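A minimal sketch of that switch, in Python for illustration only. It assumes that "\{...}" means a backslash command with a braced argument, such as \i{from} in the example above. `LATEX_COMMAND` and `comment_dialect` are hypothetical names.

```python
import re

# Hypothetical switch: a comment containing a backslash command with a braced
# argument (e.g. \i{from}) is treated as LaTeX; anything else stays Markdown.
LATEX_COMMAND = re.compile(r"\\[A-Za-z]+\{")


def comment_dialect(comment: str) -> str:
    return "latex" if LATEX_COMMAND.search(comment) else "markdown"


print(comment_dialect(r"\i{from} a list of 32 random~\i{bytes}:\par"))  # latex
print(comment_dialect("_from_ a list of 32 random _bytes_:"))           # markdown
```

One weakness of such an implicit switch is that a Markdown comment that merely mentions a LaTeX-like command, for example inside a code span, would be misclassified. An explicit delimiter would avoid that, which bears on the question of whether LaTeX comments need to be delimited.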

@JanWielemaker
Member

Most of the issue is not so much that Markdown cannot be parsed, but that the syntax started with TWiki and then borrowed aspects from Wikimedia, JavaDoc, Doxygen and finally GitHub. Then Prolog has its own handy shortcuts like name/arity, and its own term syntax, which should not interfere too much with the wiki/Markdown syntax. Moreover, whereas originally very few of these constructs could be nested, people started to ask for font changes inside links, nested font changes, etc. This quickly causes ambiguities.

All of this was added more or less ad hoc to the existing TWiki parser, and, lacking a proper test suite and well-described interactions, it became a mess.

As the dust around Markdown and its dialects has mostly settled, we should define our position in that field and decide what we support: define what can be combined with what, establish a test suite, and rewrite in particular the last part, which recognises fonts, links and other objects in running text.

And no, LaTeX makes it even worse. For one thing, you need to convert it into HTML, and that is very hard to do correctly, in particular if you want more or less readable HTML that can be searched and indexed properly. LaTeX is simply far too open-ended. HTML is too much typing and escaping. Partial HTML support like GitHub's doesn't make it much better either.

@wouterbeek
Contributor

So IIUC we should (1) properly define our own Markdown + Prolog terms grammar, (2) implement the parser in such a way that it allows the vast majority of strings typed in dialect (1) to be parsed correctly, and (3) update the places where legacy syntax (Wikimedia, Doxygen) is used in the comments.

Sounds like a summer vacation project?

@JanWielemaker
Member

So IIUC we should (1) properly define our own Markdown + Prolog terms grammar, (2) implement the parser in such a way that it allows the vast majority of strings typed in dialect (1) to be parsed correctly, and (3) update the places where legacy syntax (Wikimedia, Doxygen) is used in the comments.

I think so. I think we can keep old markup as long as it is sufficiently unique that it is very unlikely to cause issues. Otherwise you need everybody to update their sources, and that is not much appreciated. The main issue is with the basic font switches like italic and bold (which it used to be, but no longer is on GitHub) and their ambiguity, because these symbols are quite common in Prolog descriptions. So you need to define what happens with _x*y_ _a*b_. These things are currently pretty much undefined. Originally, only alphanumerical text was allowed between __, **, etc., and otherwise the much stronger _|a*b|_ was required.
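The original conservative rule can be sketched as a single regular expression. This is a Python illustration under assumed, simplified semantics, not the doc_wiki.pl implementation: _Word_ is italic only when its content is purely alphanumeric, and anything else needs the _|...|_ form. With this rule, _x*y_ _a*b_ is simply left literal instead of being ambiguous. `EMPHASIS` and `italicize` are hypothetical names.

```python
import re

# Assumed, simplified rule (not the doc_wiki.pl implementation):
#   _|Text|_  italic; Text may contain anything (e.g. Prolog terms like a*b)
#   _Word_    italic only if Word is purely alphanumeric and not glued to a word
EMPHASIS = re.compile(r"_\|(.+?)\|_|(?<!\w)_([A-Za-z0-9]+)_(?!\w)")


def italicize(text: str) -> str:
    # Both alternatives are matched in one pass, so the content of _|...|_
    # is never rescanned by the simpler _Word_ rule.
    return EMPHASIS.sub(lambda m: f"<i>{m.group(1) or m.group(2)}</i>", text)


print(italicize("_x*y_ _a*b_"))         # prints: _x*y_ _a*b_
print(italicize("_from_ and _|a*b|_"))  # prints: <i>from</i> and <i>a*b</i>
```

The design trade-off is that users must opt into the heavier _|...|_ syntax for anything non-alphanumeric, but in exchange the light syntax never misfires on Prolog operators.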

Sounds like a summer vacation project?

But who wants to do it ...

@wouterbeek
Contributor

@JanWielemaker if you can point me to the relevant parts of the code, I volunteer to assemble a unified PlDoc grammar that allows Prolog expressions to be typed and that also implements a Markdown-inspired form of markup. TWISI the outcome should be an EBNF grammar that is only a couple of lines long and covers 95-98% of current PlDoc usage. Once we publish the EBNF, it will be clear which kinds of nesting are supported and which are not.

@triska are you joining forces with me for this?

@triska
Copy link
Member Author

triska commented Jun 27, 2017

Yes, that would be great! Please also take a look at already filed issues like #16, #17 and #18 to see what the format should be able to handle, among other features.

@JanWielemaker
Member

The wiki parser consists of two layers. The first recognises the structure (headers, lists, code, etc.) and the second adds font changes and links to running text. I think the second layer is the one we must deal with first, as it is the most ambiguous. The input of the second layer is a list of: words (alphanumerical sequences), represented as w(Word); white space, always represented as a single space; and all remaining characters, represented by themselves. This parser is implemented using wiki_faces/3 in doc_wiki.pl.
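The token list fed to the second layer can be mimicked in Python to experiment with candidate grammars. This is an assumed rendering, not the actual Prolog code: the tuple ("w", Word) stands in for w(Word), a single " " stands for any run of white space, and every other character is kept as a one-character string. `TOKEN` and `tokenize` are hypothetical names.

```python
import re
from typing import List, Tuple, Union

# ("w", Word) mirrors w(Word); " " is collapsed white space; else one character.
Token = Union[Tuple[str, str], str]

# Alternatives: alphanumeric run | white-space run | any other single character.
TOKEN = re.compile(r"([A-Za-z0-9]+)|(\s+)|(.)", re.DOTALL)


def tokenize(text: str) -> List[Token]:
    tokens: List[Token] = []
    for word, space, other in TOKEN.findall(text):
        if word:
            tokens.append(("w", word))
        elif space:
            tokens.append(" ")  # any amount of white space becomes one space
        else:
            tokens.append(other)
    return tokens


print(tokenize("_from_ a"))  # ['_', ('w', 'from'), '_', ' ', ('w', 'a')]
```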

An important feature of the parser must be that it never fully rejects any text. So, a grammar is nice, but we also need some way to handle and describe the resolution of dubious input. Given its prevalence, I think GitHub Markdown should be the primary input, but we need to take care that the large amount of Prolog notation does not conflict with it and is preferably recognised and rendered appropriately without user intervention.
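One way to meet the requirement of never rejecting input is to make every rule optional: a pass over the token list rewrites only the patterns it recognises and copies every other token through literally. The Python sketch below assumes a token representation of ("w", Word) for words, " " for white space and single characters otherwise, and uses a hypothetical ("i", Word) node for italics. It handles only _Word_, with word-boundary checks so that names like crypto_n_random_bytes are not split. `is_word` and `faces` are hypothetical names.

```python
from typing import List, Tuple, Union

# Assumed token representation: ("w", Word) | " " | single character.
# ("i", Word) is a hypothetical node for an italic word.
Token = Union[Tuple[str, str], str]


def is_word(tok: Token) -> bool:
    """True for a ("w", Word) token."""
    return isinstance(tok, tuple) and tok[0] == "w"


def faces(tokens: List[Token]) -> List[Token]:
    """Rewrite _Word_ to ("i", Word); copy every other token unchanged,
    so no input is ever rejected."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if (tok == "_"
                and not (out and is_word(out[-1]))  # no word glued on the left
                and i + 2 < len(tokens)
                and is_word(tokens[i + 1])
                and tokens[i + 2] == "_"
                and not (i + 3 < len(tokens) and is_word(tokens[i + 3]))):  # nor on the right
            out.append(("i", tokens[i + 1][1]))
            i += 3
        else:
            out.append(tok)  # fallback: emit the token literally
            i += 1
    return out


print(faces(["_", ("w", "from"), "_", " ", ("w", "a")]))
```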


3 participants