Translate Erlang/OTP XML into EEP 48 docs chunk #3

josevalim · 2019-11-13T08:59:02Z

This is a thread to track the progress and any pending tasks that may arise during this effort.

The current idea is that Erlang/OTP XML will be converted to an HTML tree data-structure. This is better than storing HTML as text as we don't have to parse it again.

The tree will have the following format:

Ast :: [Node]
Node :: binary() | {Tag, Attributes, Ast} | {Tag, Ast}
Tag :: binary()
Attributes :: [{binary(), binary()}]

Note: We may remove {Tag, Ast} as it can be represented as {Tag, [], Ast}. The question is if the size will increase considerably or not by doing so.

Note: Currently, EEP 48 requires documentation to be stored as binaries. This means that the HTML tree would have to be converted as term_to_binary and then the whole chunk is converted to term_to_binary again. This is most likely OK, converting a binary with term_to_binary is a fast operation with minimum space increase, but worth noting. The mime type should be something "application/erlang-abstract-format+html".

You can track the current progress here.

TODO

Don't do

Forbid multiple <funcs> tags in the source xml
Wrap all <li> contents with a <p> (don't think I will do this as it will change the meaning of things)

The text was updated successfully, but these errors were encountered:

wojtekmach · 2019-11-15T12:28:17Z

I'd like to propose changing:

Attributes :: [{binary(), binary()}]

to:

Attributes :: #{binary() => binary()}

I think this would be more friendly to tooling as we we could easily pattern match on subset of attributes. I believe the order of the attributes shouldn't matter and duplicates shouldn't be allowed (e.g. xmerl does not allow duplicates)

KennethL · 2019-11-15T14:32:35Z

@wojtekmach yes I agree that a map would be more efficient for tooling here and I don't think it will increase the size of the doc chunk.

marianoguerra · 2019-11-15T17:44:42Z

I already have a poc for this, I'm on vacations this week but the next one I can put together a status report and next steps.

…

On Fri, 15 Nov 2019, 16:32 Kenneth Lundin, ***@***.***> wrote: @wojtekmach <https://github.com/wojtekmach> yes I agree that a map would be more efficient for tooling here and I don't think it will increase the size of the doc chunk. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#3>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAQW3YEFI2PK5XH3J6SYF3QT2XIHANCNFSM4JMWEBSQ> .

marianoguerra · 2019-11-15T17:46:53Z

To be more specific: I have a project that reads OTP xml docs and converts it to a tree data structure that then is translated to html, markdown and restructured text On Fri, 15 Nov 2019, 19:44 Mariano Guerra, <[email protected]> wrote:

…

I already have a poc for this, I'm on vacations this week but the next one I can put together a status report and next steps. On Fri, 15 Nov 2019, 16:32 Kenneth Lundin, ***@***.***> wrote: > @wojtekmach <https://github.com/wojtekmach> yes I agree that a map would > be more efficient for tooling here and I don't think it will increase the > size of the doc chunk. > > — > You are receiving this because you are subscribed to this thread. > Reply to this email directly, view it on GitHub > <#3>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/AAAQW3YEFI2PK5XH3J6SYF3QT2XIHANCNFSM4JMWEBSQ> > . >

KennethL · 2020-01-17T12:35:20Z

I have been thinking about the format of the doc chunk for Erlang for some time and during that I have also written a program which translate the OTP XML into the doc chunk.
As stated before I propose that the doc chunk have a structured term format and not a binary as stated in the EEP 48. Of course it is possible to convert it to a binary with term_to_binary but I don't see the point in not allowing the term as is. The format is anyway tagged with something like "application/erlang-doc-format+html".
The tricky part is to decide the exact format and I think it must be an iterative process during the development of the translator and the consumers of the format.
I see the consumers as:

a rendering function for producing a text printout in the shell (help M F A)
Exdoc generating html and epub

@josevalim wrote
The tree will have the following format:

Ast :: [Node]
Node :: binary() | {Tag, Attributes, Ast} | {Tag, Ast}
Tag :: binary()
Attributes :: [{binary(), binary()}]

But during my work and the comment from @wojtekmach I think I would like to do a minor change to this:

Ast :: [Node]
Node :: binary() | {Tag, Attributes, Ast} | {Tag, Ast}
Tag :: atom()
Attributes :: #{atom() => term()}]

I don't really see a problem with having the Tag and the key for attributes as an atom() vs. a binary() but this is a detail. The more important part is what tags there are. I have worked with the idea of having the well known html names when possible.
A complete example will come soon.

My idea is that we extend the OTP build with producing the doc chunks in addition to the current html, man and pdf so that anyone can build OTP with doc chunks (optional). That version should also contain API functions for fetching the doc chunks and presenting them in the shell. I hope we can have this already in OTP 23 (May) at least as an experimental feature as all the formats and API's might not be 100% settled

erszcz · 2020-01-17T13:15:00Z

@KennethL That's great news!

Are pull requests to EDoc based on the work done so far welcome?

josevalim · 2020-01-17T14:27:48Z

As stated before I propose that the doc chunk have a structured term format and not a binary as stated in the EEP 48.

Agreed. I will send a PR to update EEP 48. I propose, however, that the format must then end with +erlang, so it is relatively easy for tools to detect they have an Erlang term in there. Is that ok?

I don't really see a problem with having the Tag and the key for attributes as an atom() vs. a binary() but this is a detail.

Thanks for the updates! Our current plan is to change ExDoc to work on HTML ASTs. So for Elixir, ExDoc will convert the Markdown to a HTML tree and then traverse it. For Erlang, we will work directly with the tree stored in the chunk. It may be that the tools (markdown processor and Erlang) do not agree on the format of the tree, so we will need some pre-processing. Furthermore, we want the simplest possible tree while processing, binary() | {Tag, Attributes, Ast}, but in the interest of storage, having a {Tag, Ast} is nice, especially if Attributes is a map instead of a list.

Using a map instead of list for attributes also means we lose ordering. Once again, this is probably not a problem for Erlang, but a Markdown processor may want to keep the user ordering, and therefore they are forced to use a list.

In other words, I think it is fine to go with your proposal. Realistically speaking, it is most likely that ExDoc will have to normalize those trees. Even if they have the same AST type, I wouldn't be surprised if certain representation are sometimes different.

My idea is that we extend the OTP build with producing the doc chunks in addition to the current html, man and pdf so that anyone can build OTP with doc chunks (optional).

Should we make the building of docs chunk enabled by default? I think this would be the simplest to have the docs accessible to everyone.

KennethL · 2020-01-20T08:25:40Z

@KennethL That's great news!

Are pull requests to EDoc based on the work done so far welcome?

Yes puul request for Edoc are welcome, but try to make them compatible so that current usage is not broken, I assume even the not so nice layout it produces today will have to be kept for a while. But for the output there are support for plugins which can produce whatever format. These plugins don't even need to be part of Edoc, but in this case I think we want it to be part of Edoc.

KennethL · 2020-01-20T09:43:47Z

As stated before I propose that the doc chunk have a structured term format and not a binary as stated in the EEP 48.

Agreed. I will send a PR to update EEP 48. I propose, however, that the format must then end with +erlang, so it is relatively easy for tools to detect they have an Erlang term in there. Is that ok?

Thinking again about if format (in EEP-48) should be a binary or a structured term, a binary with the term in external format might have its advantages when using online help where the doc for only one function at a time is requested. It would then be less computation to fetch the whole chunk and then convert only the doc for one function with binary_to_term. What do you think?

I don't really see a problem with having the Tag and the key for attributes as an atom() vs. a binary() but this is a detail.

Thanks for the updates! Our current plan is to change ExDoc to work on HTML ASTs. So for Elixir, ExDoc will convert the Markdown to a HTML tree and then traverse it. For Erlang, we will work directly with the tree stored in the chunk. It may be that the tools (markdown processor and Erlang) do not agree on the format of the tree, so we will need some pre-processing. Furthermore, we want the simplest possible tree while processing, binary() | {Tag, Attributes, Ast}, but in the interest of storage, having a {Tag, Ast} is nice, especially if Attributes is a map instead of a list.

Using a map instead of list for attributes also means we lose ordering. Once again, this is probably not a problem for Erlang, but a Markdown processor may want to keep the user ordering, and therefore they are forced to use a list.

Having a map for the attributes instead of a list is not super important for me. If the order sometimes is important we can keep it as a list.

In other words, I think it is fine to go with your proposal. Realistically speaking, it is most likely that ExDoc will have to normalize those trees. Even if they have the same AST type, I wouldn't be surprised if certain representation are sometimes different.

My idea is that we extend the OTP build with producing the doc chunks in addition to the current html, man and pdf so that anyone can build OTP with doc chunks (optional).

Should we make the building of docs chunk enabled by default? I think this would be the simplest to have the docs accessible to everyone.

Yes I think building docs shunks could be default, but we have to decide if the chunk goes into the .beam file or if it goes into a separate file per module. If it goes into the .beam file I think we have to integrate is more with the ´erlc´ command and the compiler or of course it could be an extra separate pass where docs chunks are added to the .beam files. I am not sure what I think yet, we have to discuss it more in the OTP team (will probably happen this week). Do you have any input on that?

wojtekmach · 2020-01-20T10:58:00Z

Having a map for the attributes instead of a list is not super important for me. If the order sometimes is important we can keep it as a list.

I was the one proposing the attributes map but I agree that sticking with a list might be better after all.

Yes I think building docs chunks could be default, but we have to decide if the chunk goes into the .beam file or if it goes into a separate file per module.

My understanding is for modules that are part of erts there's no beam file so in order to eventually do: erl> h(erlang)., erl> h(atomics). etc we'd need the chunk somewhere else, e.g. in doc/chunks/erlang.chunk per EEP 48.

Perhaps OTP stores erts docs in .chunk but other apps docs in .beam?

If it goes into the .beam file I think we have to integrate is more with the ´erlc´

I think this is the most appealing option: when working in rebar3/mix/etc projects when using dependencies they would be compiled with docs with no extra work (maybe just a compiler flag to enable chunks generation) and thus the documentation would be immediately available. Otherwise, if it's part of edoc, then we'd need to run edoc for all deps (which could of course be automated somehow).

josevalim · 2020-01-20T11:06:39Z

Thinking again about if format (in EEP-48) should be a binary or a structured term, a binary with the term in external format might have its advantages when using online help where the doc for only one function at a time is requested. It would then be less computation to fetch the whole chunk and then convert only the doc for one function with binary_to_term. What do you think?

I think both options are ok. Having it as a binary is definitely more consistent and it doesn't require changing EEP-48. If it has actual performance benefits in some situations, then even better!

If it goes into the .beam file I think we have to integrate is more with the ´erlc´

My initial question is which docs would be integrated into erlc. We could integrate edoc, but it would require extra-passes, and we are not sure we want to stick with edoc anyway. erl_docgen is mostly specific to Erlang/OTP so maybe it is not worth integrating it either. So maybe it is best to stick with separate .chunk files for now, due to the reasons given by Wojtek, and we can worry about tightly integrating with erlc in the future?

erszcz · 2020-01-20T12:11:00Z

maybe it is best to stick with separate .chunk files for now, due to the reasons given by Wojtek, and we can worry about tightly integrating with erlc in the future?

I was also going to suggest that a separate pass might make sense when the feature is still experimental, given the pass is different for OTP and non-OTP modules. For building project releases that include OTP libs some care has to be taken not to unintentionally skip the separate .chunk files.

garazdawi · 2020-02-07T14:24:37Z

Hello,

For the last week or so I've been working on making this a reality. You can track my progress here: https://github.com/garazdawi/otp/tree/lukas/kernel/code-chunk-lookup

So far I've implemented help for all functions and types in all of Erlang/OTP.

Some notes:

The documentation is stored in .chunk files for all modules.
It is generated when the documentation is built, not when the modules are compiled. So to build you need to do make && make docs
There is a function called code:get_doc/1 that fetches the documentation from either a beam chunk or the .chunk file.

Here is an example of what is in the chunk:

22> rp(shell_docs:get_doc(erlang,timestamp,0)).
[{{function,timestamp,0},
  0,
  [<<"timestamp/0">>],
  #{<<"en">> =>
        [{p,[],[<<"Current Erlang System time.">>]},
         {type,[],[{v,[{name,"timestamp"}],[]}]},
         {p,[],
            [<<"Returns current ">>,<<"Erlang system time">>,
             <<" on the format ">>,
             {c,[],[<<"{MegaSecs, Secs, MicroSecs}">>]},
             <<". This format is the same as ">>,<<>>,
             {c,[],[<<"os:timestamp/0">>]},
             <<" and the deprecated ">>,<<>>,
             {c,[],[<<"erlang:now/0">>]},
             <<" use. The reason for the existence of ">>,
             {c,[],[<<"erlang:timestamp()">>]},
             <<" is purely to simplify use for existing code that assumes this time stamp format. Current Erlang system time can more efficiently be retrieved in the time unit of your choice using ">>,
             <<>>,
             {c,[],[<<"erlang:system_time/1">>]},
             <<".">>]},
         {p,[],
            [<<"The ">>,
             {c,[],[<<"erlang:timestamp()">>]},
             <<" BIF is equivalent to:">>]},
         {code,
             [{type,"none"}],
             [<<"timestamp() ->\n    ErlangSystemTime = erlang:system_time(microsecond),\n    MegaSecs = ErlangSystemTime div 1000000000000,\n    Secs = ErlangSystemTime div 1000000 - MegaSecs*1000000,\n    MicroSecs = ErlangSystemTime rem 1000000,\n    {MegaSecs, Secs, MicroSecs}.">>]},
         {p,[],
            [<<"It, however, uses a native implementation that does not build garbage on the heap and with slightly better performance.">>]},
         {note,[],
             [<<>>,
              {p,[],
                 [<<"This time is ">>,
                  {em,[],[<<"not">>]},
                  <<" a monotonically increasing time in the general case. For more information, see the documentation of ">>,
                  <<"time warp modes">>,<<" in the User's Guide.">>]},
              <<>>]}]},
  #{spec =>
        [{attribute,1549,spec,
             {{erlang,timestamp,0},
              [{type,1549,bounded_fun,
                   [{type,1549,'fun',
                        [{type,1549,product,[]},{var,1549,'Timestamp'}]},
                    [{type,1550,constraint,
                         [{atom,1550,is_subtype},
                          [{var,1550,'Timestamp'},
                           {user_type,1550,timestamp,[]}]]}]]}]}}]}}]

And here it is when rendered:

20> h(erlang,timestamp,0).

-spec erlang:timestamp() -> Timestamp when Timestamp :: timestamp().


Current Erlang System time.

Types:
  -type timestamp() ::
          {MegaSecs :: non_neg_integer(),
           Secs :: non_neg_integer(),
           MicroSecs :: non_neg_integer()}.


Returns current Erlang system time on the format {MegaSecs, Secs, 
MicroSecs}. This format is the same as os:timestamp/0 and the 
deprecated erlang:now/0 use. The reason for the existence of 
erlang:timestamp() is purely to simplify use for existing code that 
assumes this time stamp format. Current Erlang system time can more 
efficiently be retrieved in the time unit of your choice using 
erlang:system_time/1.

The erlang:timestamp() BIF is equivalent to:

  timestamp() ->
      ErlangSystemTime = erlang:system_time(microsecond),
      MegaSecs = ErlangSystemTime div 1000000000000,
      Secs = ErlangSystemTime div 1000000 - MegaSecs*1000000,
      MicroSecs = ErlangSystemTime rem 1000000,
      {MegaSecs, Secs, MicroSecs}.

It, however, uses a native implementation that does not build garbage 
on the heap and with slightly better performance.

Note:
  This time is not a monotonically increasing time in the general 
  case. For more information, see the documentation of time warp modes 
  in the User's Guide.

Remaining issues:

The rendering is not perfect yet, most significantly there are whitespaces that should not be there that are emitted.
At the moment there are duplicate entries in the docs list for some functions. For instance erlang:system_info/1 has 8 entries, one for each <func> tag in the source xml. I don't have any good idea about how to store this type of documentation in the EEP-48 chunks. The best I can some up with is to modify {{Kind,Name,Arity},Anno,Sig,Doc,Meta} to {{Kind,Name,Arity},[{Anno,Sig,Doc,Meta}].
Since is not implemented because I don't know what to do about functions like this where separate clauses have different sinces. Same for deprecations...
Links and cross-references are not really done as the shell rendered does not need them.
Some of the tags need to be converted to better/other names. You can view the current names here: https://github.com/garazdawi/otp/blob/ae7c40923550d4021e31c70e6fa0e32f9b3bd64b/lib/stdlib/src/shell_docs.erl#L35. Specifically v and d, but maybe also note, warning and others.
Add a version to the mimetype so that we know what we are working with when rendering.

josevalim · 2020-02-07T14:36:38Z

@garazdawi fantastic!

The best I can some up with is to modify {{Kind,Name,Arity},Anno,Sig,Doc,Meta} to {{Kind,Name,Arity},[{Anno,Sig,Doc,Meta}].

I agree. We may need to bump the version of the format.

Some of the tags need to be converted to better/other names

You probably need to decide if you are going to emit HTML or a HTML-like tree. Have you had any thoughts about this?

garazdawi · 2020-02-07T14:50:45Z

You probably need to decide if you are going to emit HTML or a HTML-like tree. Have you had any thoughts about this?

I think it will have to be HTML-like. Some of the things I do now could be translated. Not sure how easy it would be to do with the <type> tag though... maybe it is worth an attempt to do and the just add some extra attributes to the HTML tags for the renderer to use.

What I'm trying to avoid is for different renderers to have to do very complex parsing loops in order to figure out what to print.

josevalim · 2020-02-07T14:55:17Z

@garazdawi is that the tag that processes the spec metadata? So it seems it can be added anywhere in the text? Would you a lot if it always appears at the top, before any of the paragraphs? I assume there was a reason why it was originally done as so.

garazdawi · 2020-02-07T15:04:16Z

@garazdawi is that the tag that processes the spec metadata?

No. It is the tag that either described what a type the function uses looks like (e.g. https://github.com/erlang/otp/blob/master/lib/asn1/doc/src/asn1ct.xml#L78-L89) or the name of the type if it is part of the code (e.g. https://github.com/erlang/otp/blob/master/lib/stdlib/doc/src/calendar.xml#L135-L138).

The spec metadata is rendered instead of the signature when present. I decided to not put the rendered version of the spec in the signature as the spec format is more flexible and easier to create links etc from.

garazdawi · 2020-02-07T15:08:45Z

Would you a lot if it always appears at the top, before any of the paragraphs? I assume there was a reason why it was originally done as so.

It always appears at the top at the moment. Or rather just after the <fsummary>, which is translated to the first <p> in each function. Thinking some more about it maybe I could translate the types into an <ul> at the top.

josevalim · 2020-02-07T15:13:45Z

The spec metadata is rendered instead of the signature when present. I decided to not put the rendered version of the spec in the signature as the spec format is more flexible and easier to create links etc from.

And to clarify, the spec follows the same Erlang AST format, right? We should probably document this in EEP 48 too.

Thinking some more about it maybe I could translate the types into an
at the top.

Sounds like a good call to me. <ul> with each entry using the proper code quotation.

garazdawi · 2020-02-07T15:26:19Z

And to clarify, the spec follows the same Erlang AST format, right?

Yes.

erszcz · 2020-02-10T09:31:48Z

The best I can some up with is to modify {{Kind,Name,Arity},Anno,Sig,Doc,Meta} to {{Kind,Name,Arity},[{Anno,Sig,Doc,Meta}].

I agree. We may need to bump the version of the format.

Just an idea looking at the problem from a different angle - since functions in Erlang are actually uniquely identified by their M:F/A this should not be a frequent problem. Maybe the few places where this occurs could be rewritten to fit into the current {{Kind,Name,Arity},Anno,Sig,Doc,Meta} format? I'm not sure if that's feasible, so please treat this just as food for thought.

garazdawi · 2020-02-10T09:42:28Z

There are 58 places (unless my grep skills fail me) in the docs where this is feature is used. So it could be done. However, in my opinion, the (HTML) docs become easier to navigate by using the method and I don't really see an alternative notation that would work as well.

I'll leave that part to rest for a while and focus on other parts of the doc chunks first and then come back to it. Maybe we'll have an epiphany somewhere along the way.

garazdawi · 2020-02-10T14:05:27Z

Pushed an update where types are like this:

{ul,[{class,"types"}],[{li,[{name,"timestamp"}],[]}]},

for types that are derived from the metadata. And like this:

{ul,[{class,"types"}],
     [{li,[{class,"type"}],[<<"MatchDesc = [MatchNum]">>]},
      {li,[{class,"type"}],
          [<<"MatchNum = {matched, node(), integer()} | {matched, node(), 0, RPCError}">>]},
      {li,[{class,"type"}],[<<"RPCError = term()">>]}]},

for inline type desciptions.

Also I changed note,warning,doi,dont to be a p with a class. e.g.

{p,[{class,"note"}],
    [<<>>,
     {p,[],
         [<<"This time is ">>,
          {em,[],[<<"not">>]},
          <<" a monotonically increasing time in the general case. For more information, see the documentation of ">>,
          <<"time warp modes">>,<<" in the User's Guide.">>]},
     <<>>]}]}

That leaves type_desc as the only non-html tag.

garazdawi · 2020-02-18T17:09:08Z

I plan to open a PR for my changes tomorrow. There are still some issues that are unresolved, but they may have to be fixed later as OTP 23-rc1 is set to be released soon and I want this to be part of that.

josevalim · 2020-02-18T17:10:43Z

Fantastic! If there is anything I can do or if you want to jump on a call to discuss the unresolved issues, please let me know!

garazdawi · 2020-02-18T17:19:02Z

The main unresolved issue is still what to do with the multiple function definitions. I'm starting to lean towards actually just re-writing the docs to not use that feature any more...

One thing that is left is to use a second renderer from the chunk sources. Right now I've only tested with my own shell_docs renderer, but as far as I understand there is a vision to also use LSP and ExDoc to generate docs from these chunks. It would be great to know if there is anything that needs to be adapted in the format in order to make that easier.

josevalim · 2020-02-18T18:38:54Z

The main unresolved issue is still what to do with the multiple function definitions. I'm starting to lean towards actually just re-writing the docs to not use that feature any more...

My current thinking is that EEP 48 will become more robust if we change it to support multiple entries. For Erlang and Elixir it boils down to an implementation detail but I can imagine a language in the future that may want to document each of them individually, especially statically typed languages. In the worst case scenario, if we don't want to use the feature, it just ends up being a one item list. :)

But if you think it is best to save this fight for later when the need arises, then that's ok by me too!

Once the PR is available, we will start looking into giving it a try on ExDoc and give you some feedback.

KennethL · 2020-02-18T18:49:52Z

As discussed earlier we should also have a plugin for edoc to produce the same format as we now are producing from the OTP xml. Is there anyone working on that part? I would like to have it as a PR to include in OTP and as a pure extension, leaving the old functionality of edoc compatible (as much as possible). Kenneth

…

On Tue, Feb 18, 2020, 19:38 José Valim ***@***.***> wrote: The main unresolved issue is still what to do with the multiple function definitions. I'm starting to lean towards actually just re-writing the docs to not use that feature any more... My current thinking is that EEP 48 will become more robust if we change it to support multiple entries. For Erlang and Elixir it boils down to an implementation detail but I can imagine a language in the future that may want to document each of them individually, especially statically typed language. In the worst case scenario, if we don't want to use the feature, it just ends up being a one item list. :) But if you think it is best to save this fight for later when the need arises, then that's ok by me too! Once the PR is available, we will start looking into giving it a try on ExDoc and give you some feedback. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABFWSFZ2T6RRYDWJXS4A3LRDQTL7ANCNFSM4JMWEBSQ> .

garazdawi · 2020-03-18T10:10:32Z

No. Even if it is named "type-" it is not actually a documented type, it is just a link to a "<a name>" marker.

garazdawi · 2020-03-18T10:13:26Z

hmm, on second thought... maybe it should be... need to dig more.

garazdawi · 2020-03-18T16:48:58Z

So, type links are apparently a mess... I will try to make something better...

essen · 2020-03-20T08:35:07Z

I seem to remember having read somewhere a suggestion of how links should be represented in erlang+html, but can't find it now.

My current implementation has it like this:
<seemfa marker="stdlib:lists#foldl/3">lists:foldl/3</seemfa>
  becomes
{a,[{href,"stdlib:lists#foldl/3"},
    {rel,"https://erlang.org/doc/link/seemfa"}],
   [<<"lists:foldl/3">>]}]}
If anyone has any better suggestions I'm all ears.

Probably erlang/otp#2545 (comment)

Your links look good now!

garazdawi · 2020-03-20T08:55:25Z

@wojtekmach in my branch I've now made it so that has to point to a documented type that is accessible through a doc chunk. I've also written a validator that makes sure that points to documented functions and points to documented types. Next is to extend it to also check that <see*> points to actual markers in the docs and not to generated header markers. Maybe I'll have to introduce a <seetitle...>.

Now I have to focus on the OTP-23 rc2 though as it is about to be released.

wojtekmach · 2020-03-23T09:36:10Z

@garazdawi Great! If you'd like me to test your changes, which branch is it? But happy to wait for RCs to test them out, so no pressure.

garazdawi · 2020-03-23T10:04:41Z

There is a link to the latest changes in the top post of this issue.

wojtekmach · 2020-03-23T10:06:47Z

Got it, thanks.

…

On 23 Mar 2020, at 11:04, Lukas Larsson ***@***.***> wrote: There is a link to the latest changes in the top post of this issue. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAASSJ6MO5TN7YESHPY2PW3RI4X4XANCNFSM4JMWEBSQ>.

garazdawi · 2020-03-23T13:54:32Z

erlang/otp#2573 opened a pr with the link changes.

garazdawi · 2020-03-24T16:10:37Z

Yes, I'm aware of this. My idea at the moment was to mark different function signatures with equal docs to be duplicate entries somehow in the metadata.

So we have two options:

Repeat the docs and use metadata to exclude. This makes the chunk larger. When doing h(gen_server, call) and tools like ExDoc now need to filter out duplicate content based on the metadata. Looking up by h(gen_server, call, 2) is simple though.

Annotate that one function represents other arities, skipping the doc for other arities. The chunk is smaller. h(gen_server, call) just works, h(gen_server, call, 2) though is slightly more complicated.

We have 2 in Elixir. What I have implemented for 2 is that when looking up for gen_server:call/2, we first lookup for gen_server:call, and then get the first entry with equal arity or with the represented extra_arities/default in the metadata.

Once we pick an approach, let's document it in the EEP 48 too. :)

So, what I did for this is that I introduced a metadata tag för each doc entry that is called equiv. If it is set, then this function should use the documentation from the entry it is pointing to, but it should still use the local signature. Example:

98> shell_docs:get_doc(ssl,transport_accept,2).
[{{function,transport_accept,2},
  [{file,"ssl.erl"},{location,597}],
  [<<"transport_accept/2">>],
  #{},
  #{edit_url =>
        <<"https://github.com/erlang/otp/edit/maint/lib/ssl/doc/src/ssl.xml#L1808">>,
    equiv => {function,transport_accept,1},
    signature =>
        [{attribute,597,spec,
                    {{transport_accept,2},
                     [{type,597,bounded_fun,...}]}}]}].

josevalim · 2020-03-28T09:33:46Z

Hi @garazdawi! I have just given RC2 a try and all of my previous comments have been addressed. Thank you for that! I have just two tiny remarks left:

I don't see the equiv metadata on RC2, I assume I didn't make the cut? :)
Do we need to include the signature in metadata when they are just typespecs? Almost all tools already read the typespecs if necessary, so we can skip them here if you want to.

Have a good weekend!

garazdawi · 2020-03-28T11:35:13Z

no, it did not make it. It will be part of rc3, and it should make it into master next week.
I thought about that when doing it the first time... but decided against it for reason I can't remember right now.

Edit: just remembered. I did it because I didn't want to deal with the beam file and the doc chunk becoming out of sync.

garazdawi · 2020-04-03T08:33:42Z

Opened a new PR with more changes: erlang/otp#2583

garazdawi · 2020-04-17T08:33:14Z

Changes merged to master.

wojtekmach · 2020-04-26T18:25:12Z

I noticed a problem with some types in the stdlib app, they the have Doc part of the chunk as #{}. It should either be #{<<"en">> => ...} or an atom none or hidden.

erl> {ok, {docs_v1, _, _, _, _, _, Docs}} = code:get_doc(array),
  [{KindNameArity, Doc} || {KindNameArity, _, _, Doc, _} <- Docs, Doc == #{}].
[{{type,array,0},#{}},
 {{type,array_indx,0},#{}},
 {{type,array_opts,0},#{}},
 {{type,array_opt,0},#{}},
 {{type,indx_pairs,1},#{}},
 {{type,indx_pair,1},#{}}]

the remaining chunks look good. I'm running OTP 23.0-rc3.

garazdawi · 2020-04-27T06:08:40Z

So that is when a type is documented, but there is no text describing it. i.e. http://erlang.org/doc/man/array.html#type-array_indx

I think the correct thing to do is to put #{ <<"en">> => [] }. What do you think?

garazdawi · 2020-04-27T06:18:08Z

Actually, why can't it be an empty map which would mean the same as #{ <<"en">> => [] } but for all languages, not just en?

wojtekmach · 2020-04-27T07:20:51Z

When a thing should show up in docs but doesnt have docs text, thats exactly what the atom none was supposed to mean. Otherwise, there is no use case for it and shouldnt be in eep48. I don’t remember if there was a reason to pick none over an empty map.

garazdawi · 2020-04-27T09:12:47Z

In the OTP implementation of EEP-48 I make a distinction between "the user has not yet written any docs for this" and "the user has decided that there should be no docs for this, but it is still a public type". For the first I emit none and for the latter I emit and empty map.

When none is given, I print "there is no documentation for this type". When #{} is given I don't print anything, which is the same as if #{ <<"en">> -> <<>> } has been given.

I'm not against changing this so that I always emit none and print that string, I just wanted to explain why I differentiate between the two.

josevalim · 2020-04-27T09:44:21Z

In Elixir, we are doing the following:

:nonemeans "no docs written (yet) but the function is public"
:hidden means "docs do not exist because function should not be invoked externally"

So our usage of :none is consistent. In Elixir, we don't have a public API for saying "i want these docs to be empty", you just literally write an empty string, which means the empty string will be there in the chunk but I can see the empty map being potentially a better API for this, if your documentation system allows absence to be expressed in such ways.

Therefore I propose to not change Erlang/OTP. @garazdawi @wojtekmach do you think this makes sense?

wojtekmach · 2020-04-27T09:51:04Z

I'm fine with keeping things as is, just a note that we will have to update our tools:

iex(2)> t :array.array
@type array() :: array(term())

** (FunctionClauseError) no function clause matching in IEx.Introspection.translate_doc/1

(same for ExDoc)

erszcz · 2020-05-06T09:33:19Z

When running ExDoc on EDoc generated chunks I'm getting quite a few warnings like this (also about forms/0:

warning: documentation references type :erl_syntax.syntaxTree/0 but it is undefined or private
  src/edoc.erl:0: t:edoc.syntaxTree/0

Initially I was thinking it's an ExDoc/EDoc integration issue, but it doesn't seem to be the case. On OTP 23 rc3:

> ht(erl_syntax, syntaxTree, 0).
{error,type_missing}

Yet, at the same time I can see the relevant -export_type attribute in erl_syntax.erl.

Moreover, the types are listed in the types metadata entry:

> {ok, Doc} = code:get_doc(erl_syntax), maps:get(types, element(6, Doc)).
...
  {syntaxTree,0} =>
      {attribute,452,type,
                 {syntaxTree,{type,452,union,
                                   [{user_type,452,tree,[]},
                                    {user_type,452,wrapper,[]},
                                    {user_type,452,erl_parse,[]}]},
                             []}},
...

But not in #docs_v1.docs - actually, there are no types at all there:

13> {ok, D} = code:get_doc(erl_syntax), lists:usort( [element(1, element(1, Doc)) || Doc <- D#docs_v1.docs] ).
[function]

garazdawi · 2020-05-06T09:55:42Z

The documentation for erl_syntax is written using Edoc which means that erl_docgen first has to convert it to the internal xml format. In that translation some information is lost, one such thing is which types are do be documented for a module.

So any module documented by Edoc in the Erlang/OTP source tree will (for now) not have any type documentation.

wojtekmach · 2020-05-06T09:58:05Z

but I can see the empty map being potentially a better API for this (...) Therefore I propose to not change Erlang/OTP

Just a quick note about that, I believe we need to update EEP 48 too since it currently has:

Doc :: #{DocLanguage := DocString} | none | hidden,

erszcz · 2020-05-07T10:18:13Z

I've spotted two small things. The first one is formatting with ht / hcb:

4> ht(proplists).
   proplists

These types are documented in this module:

  -type property() :: atom() | tuple().

  -type proplist() :: [property()].
ok
5> hcb(proplists).
   proplists

There are no callbacks in this module
ok
6>

There's no leading empty line before the module name.

The second one is about the signature fields. For functions it's Name/Arity. For types, though, it seems to be a synthesized signature like -type Name(Arg1, ..., ArgN) :: term(). - i.e. the definition is always term(). For example:

{{type,indx_pairs,1},
 [{file,"array.erl"},{location,181}],
 [<<"-type indx_pairs(Arg1) :: term().">>],
 #{},
 #{signature =>
       [{attribute,181,type,
                   {indx_pairs,{type,181,list,
                                     [{user_type,181,indx_pair,[{var,181,'Type'}]}]},
                               [{var,181,'Type'}]}}]}},
{{type,indx_pair,1},
 [{file,"array.erl"},{location,180}],
 [<<"-type indx_pair(Arg1) :: term().">>],
 #{},
 #{signature =>
       [{attribute,180,type,
                   {indx_pair,{type,180,tuple,
                                    [{ann_type,180,
                                               [{var,180,'Index'},{user_type,180,array_indx,[]}]},
                                     {var,180,'Type'}]},
                              [{var,180,'Type'}]}}]}}]}}

For indx_pairs/1 the pretty-printed type would be:

4> IndxPairs = {attribute,181,type,
4>                                  {indx_pairs,{type,181,list,
4>                                                    [{user_type,181,indx_pair,[{var,181,'Type'}]}]},
4>                                              [{var,181,'Type'}]}}.
{attribute,181,type,
           {indx_pairs,{type,181,list,
                             [{user_type,181,indx_pair,[{var,181,'Type'}]}]},
                       [{var,181,'Type'}]}}
6> iolist_to_binary(erl_pp:form(IndxPairs)).
<<"-type indx_pairs(Type) :: [indx_pair(Type)].\n">>

I'm proposing to standardise the signature for types to the following:

a pretty-printed signature corresponding 1-to-1 to the spec metadata for tools that don't understand the spec
a fallback Name/Arity if pretty printing when creating the chunk is not possible
but not a synthetic pretty-printed spec which does not correspond to the spec metadata

Tools, on the other hand, should prefer the use of the spec metadata if they can and it's present (as shell_docs does), while use signature in other cases. What do you think?

garazdawi · 2020-05-07T14:12:49Z

I'm for option #2 as it would look the most as what the functions look like. It's not pretty, but the user should never have to see it anyways.

josevalim added the Erlang/OTP label Nov 13, 2019

garazdawi mentioned this issue Feb 11, 2020

EEP 48: Provide a distinction between code source and documentation source #1

Closed

garazdawi mentioned this issue Mar 20, 2020

Updates of EEP-48 support for OTP-23 rc2 erlang/otp#2568

Merged

wojtekmach mentioned this issue Aug 11, 2020

Details about the HTML AST elixir-lang/ex_doc#1242

Closed

erszcz mentioned this issue Nov 13, 2020

Generate docs for Erlang projects elixir-lang/ex_doc#1291

Closed

erszcz mentioned this issue Sep 7, 2023

Update EEP statuses erlang/eep#52

Merged

Translate Erlang/OTP XML into EEP 48 docs chunk #3

Translate Erlang/OTP XML into EEP 48 docs chunk #3

Comments

josevalim commented Nov 13, 2019 • edited by garazdawi Loading

TODO

Don't do

wojtekmach commented Nov 15, 2019

KennethL commented Nov 15, 2019

marianoguerra commented Nov 15, 2019 via email

marianoguerra commented Nov 15, 2019 via email

KennethL commented Jan 17, 2020

erszcz commented Jan 17, 2020

josevalim commented Jan 17, 2020

KennethL commented Jan 20, 2020

KennethL commented Jan 20, 2020

wojtekmach commented Jan 20, 2020 • edited Loading

josevalim commented Jan 20, 2020

erszcz commented Jan 20, 2020

garazdawi commented Feb 7, 2020

josevalim commented Feb 7, 2020

garazdawi commented Feb 7, 2020

josevalim commented Feb 7, 2020

garazdawi commented Feb 7, 2020

garazdawi commented Feb 7, 2020

josevalim commented Feb 7, 2020 • edited Loading

garazdawi commented Feb 7, 2020

erszcz commented Feb 10, 2020

garazdawi commented Feb 10, 2020

garazdawi commented Feb 10, 2020

garazdawi commented Feb 18, 2020

josevalim commented Feb 18, 2020

garazdawi commented Feb 18, 2020

josevalim commented Feb 18, 2020 • edited Loading

KennethL commented Feb 18, 2020 via email

garazdawi commented Mar 18, 2020 • edited Loading

garazdawi commented Mar 18, 2020

garazdawi commented Mar 18, 2020

essen commented Mar 20, 2020

garazdawi commented Mar 20, 2020

wojtekmach commented Mar 23, 2020 • edited Loading

garazdawi commented Mar 23, 2020

wojtekmach commented Mar 23, 2020 via email

garazdawi commented Mar 23, 2020

garazdawi commented Mar 24, 2020

josevalim commented Mar 28, 2020

garazdawi commented Mar 28, 2020 • edited Loading

garazdawi commented Apr 3, 2020

garazdawi commented Apr 17, 2020

wojtekmach commented Apr 26, 2020 • edited Loading

garazdawi commented Apr 27, 2020

garazdawi commented Apr 27, 2020

wojtekmach commented Apr 27, 2020

garazdawi commented Apr 27, 2020

josevalim commented Apr 27, 2020

wojtekmach commented Apr 27, 2020

erszcz commented May 6, 2020 • edited Loading

garazdawi commented May 6, 2020

wojtekmach commented May 6, 2020

erszcz commented May 7, 2020

garazdawi commented May 7, 2020 • edited Loading

josevalim commented Nov 13, 2019 •

edited by garazdawi

Loading

wojtekmach commented Jan 20, 2020 •

edited

Loading

josevalim commented Feb 7, 2020 •

edited

Loading

josevalim commented Feb 18, 2020 •

edited

Loading

garazdawi commented Mar 18, 2020 •

edited

Loading

wojtekmach commented Mar 23, 2020 •

edited

Loading

garazdawi commented Mar 28, 2020 •

edited

Loading

wojtekmach commented Apr 26, 2020 •

edited

Loading

erszcz commented May 6, 2020 •

edited

Loading

garazdawi commented May 7, 2020 •

edited

Loading