Skip to content
This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

Translate Erlang/OTP XML into EEP 48 docs chunk #3

Open
14 of 19 tasks
josevalim opened this issue Nov 13, 2019 · 72 comments
Open
14 of 19 tasks

Translate Erlang/OTP XML into EEP 48 docs chunk #3

josevalim opened this issue Nov 13, 2019 · 72 comments

Comments

@josevalim
Copy link
Collaborator

josevalim commented Nov 13, 2019

This is a thread to track the progress and any pending tasks that may arise during this effort.

The current idea is that Erlang/OTP XML will be converted to an HTML tree data-structure. This is better than storing HTML as text as we don't have to parse it again.

The tree will have the following format:

Ast :: [Node]
Node :: binary() | {Tag, Attributes, Ast} | {Tag, Ast}
Tag :: binary()
Attributes :: [{binary(), binary()}]

Note: We may remove {Tag, Ast} as it can be represented as {Tag, [], Ast}. The question is if the size will increase considerably or not by doing so.

Note: Currently, EEP 48 requires documentation to be stored as binaries. This means that the HTML tree would have to be converted as term_to_binary and then the whole chunk is converted to term_to_binary again. This is most likely OK, converting a binary with term_to_binary is a fast operation with minimum space increase, but worth noting. The mime type should be something "application/erlang-abstract-format+html".

You can track the current progress here.

TODO

  • Merge multipe for the same arity.
  • Have a look at specs and types to see if we can do something better with them
  • Check what has/can be done to ease lsp integration.
  • Skip wild HTML elements when rendering
  • Fix all links to work properly. This will include changes to the source xml as right now we cannot see any difference between a link to a User's Guide and to a Reference Manual.
  • Remove nested <p>
  • Make all attributes binaries
  • Add pagianation for h/1 and friends, see c:i/0 for details
  • Add link type checker
  • Move <section> in the refman xml to only be allowed inside <description>
  • Add callbacks to the chunks
  • Fix duplicate function documentation when doing h(ttb,tpl)
  • Add deprecation attributes to the meta-data for functions
  • Fix , and . alone on new-line when after a tagged string.
  • Document the application/erlang+html format and the EEP-48 format.
  • Expand the shell_docs:validate/1 function to follow a stricter dtd.
  • Add proper tests for shell_docs:render/* by using the dtd.

Don't do

  • Forbid multiple <funcs> tags in the source xml
  • Wrap all <li> contents with a <p> (don't think I will do this as it will change the meaning of things)
@wojtekmach
Copy link

I'd like to propose changing:

Attributes :: [{binary(), binary()}]

to:

Attributes :: #{binary() => binary()}

I think this would be more friendly to tooling as we we could easily pattern match on subset of attributes. I believe the order of the attributes shouldn't matter and duplicates shouldn't be allowed (e.g. xmerl does not allow duplicates)

@KennethL
Copy link

@wojtekmach yes I agree that a map would be more efficient for tooling here and I don't think it will increase the size of the doc chunk.

@marianoguerra
Copy link

marianoguerra commented Nov 15, 2019 via email

@marianoguerra
Copy link

marianoguerra commented Nov 15, 2019 via email

@KennethL
Copy link

I have been thinking about the format of the doc chunk for Erlang for some time and during that I have also written a program which translate the OTP XML into the doc chunk.
As stated before I propose that the doc chunk have a structured term format and not a binary as stated in the EEP 48. Of course it is possible to convert it to a binary with term_to_binary but I don't see the point in not allowing the term as is. The format is anyway tagged with something like "application/erlang-doc-format+html".
The tricky part is to decide the exact format and I think it must be an iterative process during the development of the translator and the consumers of the format.
I see the consumers as:

  • a rendering function for producing a text printout in the shell (help M F A)
  • Exdoc generating html and epub

@josevalim wrote
The tree will have the following format:

Ast :: [Node]
Node :: binary() | {Tag, Attributes, Ast} | {Tag, Ast}
Tag :: binary()
Attributes :: [{binary(), binary()}]

But during my work and the comment from @wojtekmach I think I would like to do a minor change to this:

Ast :: [Node]
Node :: binary() | {Tag, Attributes, Ast} | {Tag, Ast}
Tag :: atom()
Attributes :: #{atom() => term()}]

I don't really see a problem with having the Tag and the key for attributes as an atom() vs. a binary() but this is a detail. The more important part is what tags there are. I have worked with the idea of having the well known html names when possible.
A complete example will come soon.

My idea is that we extend the OTP build with producing the doc chunks in addition to the current html, man and pdf so that anyone can build OTP with doc chunks (optional). That version should also contain API functions for fetching the doc chunks and presenting them in the shell. I hope we can have this already in OTP 23 (May) at least as an experimental feature as all the formats and API's might not be 100% settled

@erszcz
Copy link
Collaborator

erszcz commented Jan 17, 2020

@KennethL That's great news!

Are pull requests to EDoc based on the work done so far welcome?

@josevalim
Copy link
Collaborator Author

As stated before I propose that the doc chunk have a structured term format and not a binary as stated in the EEP 48.

Agreed. I will send a PR to update EEP 48. I propose, however, that the format must then end with +erlang, so it is relatively easy for tools to detect they have an Erlang term in there. Is that ok?

I don't really see a problem with having the Tag and the key for attributes as an atom() vs. a binary() but this is a detail.

Thanks for the updates! Our current plan is to change ExDoc to work on HTML ASTs. So for Elixir, ExDoc will convert the Markdown to a HTML tree and then traverse it. For Erlang, we will work directly with the tree stored in the chunk. It may be that the tools (markdown processor and Erlang) do not agree on the format of the tree, so we will need some pre-processing. Furthermore, we want the simplest possible tree while processing, binary() | {Tag, Attributes, Ast}, but in the interest of storage, having a {Tag, Ast} is nice, especially if Attributes is a map instead of a list.

Using a map instead of list for attributes also means we lose ordering. Once again, this is probably not a problem for Erlang, but a Markdown processor may want to keep the user ordering, and therefore they are forced to use a list.

In other words, I think it is fine to go with your proposal. Realistically speaking, it is most likely that ExDoc will have to normalize those trees. Even if they have the same AST type, I wouldn't be surprised if certain representation are sometimes different.

My idea is that we extend the OTP build with producing the doc chunks in addition to the current html, man and pdf so that anyone can build OTP with doc chunks (optional).

Should we make the building of docs chunk enabled by default? I think this would be the simplest to have the docs accessible to everyone.

@KennethL
Copy link

@KennethL That's great news!

Are pull requests to EDoc based on the work done so far welcome?

Yes puul request for Edoc are welcome, but try to make them compatible so that current usage is not broken, I assume even the not so nice layout it produces today will have to be kept for a while. But for the output there are support for plugins which can produce whatever format. These plugins don't even need to be part of Edoc, but in this case I think we want it to be part of Edoc.

@KennethL
Copy link

As stated before I propose that the doc chunk have a structured term format and not a binary as stated in the EEP 48.

Agreed. I will send a PR to update EEP 48. I propose, however, that the format must then end with +erlang, so it is relatively easy for tools to detect they have an Erlang term in there. Is that ok?

Thinking again about if format (in EEP-48) should be a binary or a structured term, a binary with the term in external format might have its advantages when using online help where the doc for only one function at a time is requested. It would then be less computation to fetch the whole chunk and then convert only the doc for one function with binary_to_term. What do you think?

I don't really see a problem with having the Tag and the key for attributes as an atom() vs. a binary() but this is a detail.

Thanks for the updates! Our current plan is to change ExDoc to work on HTML ASTs. So for Elixir, ExDoc will convert the Markdown to a HTML tree and then traverse it. For Erlang, we will work directly with the tree stored in the chunk. It may be that the tools (markdown processor and Erlang) do not agree on the format of the tree, so we will need some pre-processing. Furthermore, we want the simplest possible tree while processing, binary() | {Tag, Attributes, Ast}, but in the interest of storage, having a {Tag, Ast} is nice, especially if Attributes is a map instead of a list.

Using a map instead of list for attributes also means we lose ordering. Once again, this is probably not a problem for Erlang, but a Markdown processor may want to keep the user ordering, and therefore they are forced to use a list.

Having a map for the attributes instead of a list is not super important for me. If the order sometimes is important we can keep it as a list.

In other words, I think it is fine to go with your proposal. Realistically speaking, it is most likely that ExDoc will have to normalize those trees. Even if they have the same AST type, I wouldn't be surprised if certain representation are sometimes different.

My idea is that we extend the OTP build with producing the doc chunks in addition to the current html, man and pdf so that anyone can build OTP with doc chunks (optional).

Should we make the building of docs chunk enabled by default? I think this would be the simplest to have the docs accessible to everyone.

Yes I think building docs shunks could be default, but we have to decide if the chunk goes into the .beam file or if it goes into a separate file per module. If it goes into the .beam file I think we have to integrate is more with the ´erlc´ command and the compiler or of course it could be an extra separate pass where docs chunks are added to the .beam files. I am not sure what I think yet, we have to discuss it more in the OTP team (will probably happen this week). Do you have any input on that?

@wojtekmach
Copy link

wojtekmach commented Jan 20, 2020

Having a map for the attributes instead of a list is not super important for me. If the order sometimes is important we can keep it as a list.

I was the one proposing the attributes map but I agree that sticking with a list might be better after all.

Yes I think building docs chunks could be default, but we have to decide if the chunk goes into the .beam file or if it goes into a separate file per module.

My understanding is for modules that are part of erts there's no beam file so in order to eventually do: erl> h(erlang)., erl> h(atomics). etc we'd need the chunk somewhere else, e.g. in doc/chunks/erlang.chunk per EEP 48.

Perhaps OTP stores erts docs in .chunk but other apps docs in .beam?

If it goes into the .beam file I think we have to integrate is more with the ´erlc´

I think this is the most appealing option: when working in rebar3/mix/etc projects when using dependencies they would be compiled with docs with no extra work (maybe just a compiler flag to enable chunks generation) and thus the documentation would be immediately available. Otherwise, if it's part of edoc, then we'd need to run edoc for all deps (which could of course be automated somehow).

@josevalim
Copy link
Collaborator Author

Thinking again about if format (in EEP-48) should be a binary or a structured term, a binary with the term in external format might have its advantages when using online help where the doc for only one function at a time is requested. It would then be less computation to fetch the whole chunk and then convert only the doc for one function with binary_to_term. What do you think?

I think both options are ok. Having it as a binary is definitely more consistent and it doesn't require changing EEP-48. If it has actual performance benefits in some situations, then even better!

If it goes into the .beam file I think we have to integrate is more with the ´erlc´

My initial question is which docs would be integrated into erlc. We could integrate edoc, but it would require extra-passes, and we are not sure we want to stick with edoc anyway. erl_docgen is mostly specific to Erlang/OTP so maybe it is not worth integrating it either. So maybe it is best to stick with separate .chunk files for now, due to the reasons given by Wojtek, and we can worry about tightly integrating with erlc in the future?

@erszcz
Copy link
Collaborator

erszcz commented Jan 20, 2020

maybe it is best to stick with separate .chunk files for now, due to the reasons given by Wojtek, and we can worry about tightly integrating with erlc in the future?

I was also going to suggest that a separate pass might make sense when the feature is still experimental, given the pass is different for OTP and non-OTP modules. For building project releases that include OTP libs some care has to be taken not to unintentionally skip the separate .chunk files.

@garazdawi
Copy link

Hello,

For the last week or so I've been working on making this a reality. You can track my progress here: https://github.com/garazdawi/otp/tree/lukas/kernel/code-chunk-lookup

So far I've implemented help for all functions and types in all of Erlang/OTP.

Some notes:

  • The documentation is stored in .chunk files for all modules.
  • It is generated when the documentation is built, not when the modules are compiled. So to build you need to do make && make docs
  • There is a function called code:get_doc/1 that fetches the documentation from either a beam chunk or the .chunk file.

Here is an example of what is in the chunk:

22> rp(shell_docs:get_doc(erlang,timestamp,0)).
[{{function,timestamp,0},
  0,
  [<<"timestamp/0">>],
  #{<<"en">> =>
        [{p,[],[<<"Current Erlang System time.">>]},
         {type,[],[{v,[{name,"timestamp"}],[]}]},
         {p,[],
            [<<"Returns current ">>,<<"Erlang system time">>,
             <<" on the format ">>,
             {c,[],[<<"{MegaSecs, Secs, MicroSecs}">>]},
             <<". This format is the same as ">>,<<>>,
             {c,[],[<<"os:timestamp/0">>]},
             <<" and the deprecated ">>,<<>>,
             {c,[],[<<"erlang:now/0">>]},
             <<" use. The reason for the existence of ">>,
             {c,[],[<<"erlang:timestamp()">>]},
             <<" is purely to simplify use for existing code that assumes this time stamp format. Current Erlang system time can more efficiently be retrieved in the time unit of your choice using ">>,
             <<>>,
             {c,[],[<<"erlang:system_time/1">>]},
             <<".">>]},
         {p,[],
            [<<"The ">>,
             {c,[],[<<"erlang:timestamp()">>]},
             <<" BIF is equivalent to:">>]},
         {code,
             [{type,"none"}],
             [<<"timestamp() ->\n    ErlangSystemTime = erlang:system_time(microsecond),\n    MegaSecs = ErlangSystemTime div 1000000000000,\n    Secs = ErlangSystemTime div 1000000 - MegaSecs*1000000,\n    MicroSecs = ErlangSystemTime rem 1000000,\n    {MegaSecs, Secs, MicroSecs}.">>]},
         {p,[],
            [<<"It, however, uses a native implementation that does not build garbage on the heap and with slightly better performance.">>]},
         {note,[],
             [<<>>,
              {p,[],
                 [<<"This time is ">>,
                  {em,[],[<<"not">>]},
                  <<" a monotonically increasing time in the general case. For more information, see the documentation of ">>,
                  <<"time warp modes">>,<<" in the User's Guide.">>]},
              <<>>]}]},
  #{spec =>
        [{attribute,1549,spec,
             {{erlang,timestamp,0},
              [{type,1549,bounded_fun,
                   [{type,1549,'fun',
                        [{type,1549,product,[]},{var,1549,'Timestamp'}]},
                    [{type,1550,constraint,
                         [{atom,1550,is_subtype},
                          [{var,1550,'Timestamp'},
                           {user_type,1550,timestamp,[]}]]}]]}]}}]}}]

And here it is when rendered:

20> h(erlang,timestamp,0).

-spec erlang:timestamp() -> Timestamp when Timestamp :: timestamp().


Current Erlang System time.

Types:
  -type timestamp() ::
          {MegaSecs :: non_neg_integer(),
           Secs :: non_neg_integer(),
           MicroSecs :: non_neg_integer()}.


Returns current Erlang system time on the format {MegaSecs, Secs, 
MicroSecs}. This format is the same as os:timestamp/0 and the 
deprecated erlang:now/0 use. The reason for the existence of 
erlang:timestamp() is purely to simplify use for existing code that 
assumes this time stamp format. Current Erlang system time can more 
efficiently be retrieved in the time unit of your choice using 
erlang:system_time/1.

The erlang:timestamp() BIF is equivalent to:

  timestamp() ->
      ErlangSystemTime = erlang:system_time(microsecond),
      MegaSecs = ErlangSystemTime div 1000000000000,
      Secs = ErlangSystemTime div 1000000 - MegaSecs*1000000,
      MicroSecs = ErlangSystemTime rem 1000000,
      {MegaSecs, Secs, MicroSecs}.

It, however, uses a native implementation that does not build garbage 
on the heap and with slightly better performance.

Note:
  This time is not a monotonically increasing time in the general 
  case. For more information, see the documentation of time warp modes 
  in the User's Guide.

Remaining issues:

  • The rendering is not perfect yet, most significantly there are whitespaces that should not be there that are emitted.
  • At the moment there are duplicate entries in the docs list for some functions. For instance erlang:system_info/1 has 8 entries, one for each <func> tag in the source xml. I don't have any good idea about how to store this type of documentation in the EEP-48 chunks. The best I can some up with is to modify {{Kind,Name,Arity},Anno,Sig,Doc,Meta} to {{Kind,Name,Arity},[{Anno,Sig,Doc,Meta}].
  • Since is not implemented because I don't know what to do about functions like this where separate clauses have different sinces. Same for deprecations...
  • Links and cross-references are not really done as the shell rendered does not need them.
  • Some of the tags need to be converted to better/other names. You can view the current names here: https://github.com/garazdawi/otp/blob/ae7c40923550d4021e31c70e6fa0e32f9b3bd64b/lib/stdlib/src/shell_docs.erl#L35. Specifically v and d, but maybe also note, warning and others.
  • Add a version to the mimetype so that we know what we are working with when rendering.

@josevalim
Copy link
Collaborator Author

@garazdawi fantastic!

The best I can some up with is to modify {{Kind,Name,Arity},Anno,Sig,Doc,Meta} to {{Kind,Name,Arity},[{Anno,Sig,Doc,Meta}].

I agree. We may need to bump the version of the format.

Some of the tags need to be converted to better/other names

You probably need to decide if you are going to emit HTML or a HTML-like tree. Have you had any thoughts about this?

@garazdawi
Copy link

You probably need to decide if you are going to emit HTML or a HTML-like tree. Have you had any thoughts about this?

I think it will have to be HTML-like. Some of the things I do now could be translated. Not sure how easy it would be to do with the <type> tag though... maybe it is worth an attempt to do and the just add some extra attributes to the HTML tags for the renderer to use.

What I'm trying to avoid is for different renderers to have to do very complex parsing loops in order to figure out what to print.

@josevalim
Copy link
Collaborator Author

@garazdawi is that the tag that processes the spec metadata? So it seems it can be added anywhere in the text? Would you a lot if it always appears at the top, before any of the paragraphs? I assume there was a reason why it was originally done as so.

@garazdawi
Copy link

@garazdawi is that the tag that processes the spec metadata?

No. It is the tag that either described what a type the function uses looks like (e.g. https://github.com/erlang/otp/blob/master/lib/asn1/doc/src/asn1ct.xml#L78-L89) or the name of the type if it is part of the code (e.g. https://github.com/erlang/otp/blob/master/lib/stdlib/doc/src/calendar.xml#L135-L138).

The spec metadata is rendered instead of the signature when present. I decided to not put the rendered version of the spec in the signature as the spec format is more flexible and easier to create links etc from.

@garazdawi
Copy link

Would you a lot if it always appears at the top, before any of the paragraphs? I assume there was a reason why it was originally done as so.

It always appears at the top at the moment. Or rather just after the <fsummary>, which is translated to the first <p> in each function. Thinking some more about it maybe I could translate the types into an <ul> at the top.

@josevalim
Copy link
Collaborator Author

josevalim commented Feb 7, 2020

The spec metadata is rendered instead of the signature when present. I decided to not put the rendered version of the spec in the signature as the spec format is more flexible and easier to create links etc from.

And to clarify, the spec follows the same Erlang AST format, right? We should probably document this in EEP 48 too.

Thinking some more about it maybe I could translate the types into an

    at the top.

Sounds like a good call to me. <ul> with each entry using the proper code quotation.

@garazdawi
Copy link

And to clarify, the spec follows the same Erlang AST format, right?

Yes.

@erszcz
Copy link
Collaborator

erszcz commented Feb 10, 2020

The best I can some up with is to modify {{Kind,Name,Arity},Anno,Sig,Doc,Meta} to {{Kind,Name,Arity},[{Anno,Sig,Doc,Meta}].

I agree. We may need to bump the version of the format.

Just an idea looking at the problem from a different angle - since functions in Erlang are actually uniquely identified by their M:F/A this should not be a frequent problem. Maybe the few places where this occurs could be rewritten to fit into the current {{Kind,Name,Arity},Anno,Sig,Doc,Meta} format? I'm not sure if that's feasible, so please treat this just as food for thought.

@garazdawi
Copy link

There are 58 places (unless my grep skills fail me) in the docs where this is feature is used. So it could be done. However, in my opinion, the (HTML) docs become easier to navigate by using the method and I don't really see an alternative notation that would work as well.

I'll leave that part to rest for a while and focus on other parts of the doc chunks first and then come back to it. Maybe we'll have an epiphany somewhere along the way.

@garazdawi
Copy link

Pushed an update where types are like this:

{ul,[{class,"types"}],[{li,[{name,"timestamp"}],[]}]},

for types that are derived from the metadata. And like this:

{ul,[{class,"types"}],
     [{li,[{class,"type"}],[<<"MatchDesc = [MatchNum]">>]},
      {li,[{class,"type"}],
          [<<"MatchNum = {matched, node(), integer()} | {matched, node(), 0, RPCError}">>]},
      {li,[{class,"type"}],[<<"RPCError = term()">>]}]},

for inline type desciptions.

Also I changed note,warning,doi,dont to be a p with a class. e.g.

{p,[{class,"note"}],
    [<<>>,
     {p,[],
         [<<"This time is ">>,
          {em,[],[<<"not">>]},
          <<" a monotonically increasing time in the general case. For more information, see the documentation of ">>,
          <<"time warp modes">>,<<" in the User's Guide.">>]},
     <<>>]}]}

That leaves type_desc as the only non-html tag.

@garazdawi
Copy link

I plan to open a PR for my changes tomorrow. There are still some issues that are unresolved, but they may have to be fixed later as OTP 23-rc1 is set to be released soon and I want this to be part of that.

@josevalim
Copy link
Collaborator Author

Fantastic! If there is anything I can do or if you want to jump on a call to discuss the unresolved issues, please let me know!

@garazdawi
Copy link

The main unresolved issue is still what to do with the multiple function definitions. I'm starting to lean towards actually just re-writing the docs to not use that feature any more...

One thing that is left is to use a second renderer from the chunk sources. Right now I've only tested with my own shell_docs renderer, but as far as I understand there is a vision to also use LSP and ExDoc to generate docs from these chunks. It would be great to know if there is anything that needs to be adapted in the format in order to make that easier.

@josevalim
Copy link
Collaborator Author

josevalim commented Feb 18, 2020

The main unresolved issue is still what to do with the multiple function definitions. I'm starting to lean towards actually just re-writing the docs to not use that feature any more...

My current thinking is that EEP 48 will become more robust if we change it to support multiple entries. For Erlang and Elixir it boils down to an implementation detail but I can imagine a language in the future that may want to document each of them individually, especially statically typed languages. In the worst case scenario, if we don't want to use the feature, it just ends up being a one item list. :)

But if you think it is best to save this fight for later when the need arises, then that's ok by me too!

Once the PR is available, we will start looking into giving it a try on ExDoc and give you some feedback.

@KennethL
Copy link

KennethL commented Feb 18, 2020 via email

@garazdawi
Copy link

garazdawi commented Mar 18, 2020

No. Even if it is named "type-" it is not actually a documented type, it is just a link to a "<a name>" marker.

@garazdawi
Copy link

hmm, on second thought... maybe it should be... need to dig more.

@garazdawi
Copy link

So, type links are apparently a mess... I will try to make something better...

@essen
Copy link

essen commented Mar 20, 2020

I seem to remember having read somewhere a suggestion of how links should be represented in erlang+html, but can't find it now.

My current implementation has it like this:

<seemfa marker="stdlib:lists#foldl/3">lists:foldl/3</seemfa>
  becomes
{a,[{href,"stdlib:lists#foldl/3"},
    {rel,"https://erlang.org/doc/link/seemfa"}],
   [<<"lists:foldl/3">>]}]}

If anyone has any better suggestions I'm all ears.

Probably erlang/otp#2545 (comment)

Your links look good now!

@garazdawi
Copy link

@wojtekmach in my branch I've now made it so that has to point to a documented type that is accessible through a doc chunk. I've also written a validator that makes sure that points to documented functions and points to documented types. Next is to extend it to also check that <see*> points to actual markers in the docs and not to generated header markers. Maybe I'll have to introduce a <seetitle...>.

Now I have to focus on the OTP-23 rc2 though as it is about to be released.

@wojtekmach
Copy link

wojtekmach commented Mar 23, 2020

@garazdawi Great! If you'd like me to test your changes, which branch is it? But happy to wait for RCs to test them out, so no pressure.

@garazdawi
Copy link

There is a link to the latest changes in the top post of this issue.

@wojtekmach
Copy link

wojtekmach commented Mar 23, 2020 via email

@garazdawi
Copy link

erlang/otp#2573 opened a pr with the link changes.

@garazdawi
Copy link

Yes, I'm aware of this. My idea at the moment was to mark different function signatures with equal docs to be duplicate entries somehow in the metadata.

So we have two options:

  1. Repeat the docs and use metadata to exclude. This makes the chunk larger. When doing h(gen_server, call) and tools like ExDoc now need to filter out duplicate content based on the metadata. Looking up by h(gen_server, call, 2) is simple though.
  2. Annotate that one function represents other arities, skipping the doc for other arities. The chunk is smaller. h(gen_server, call) just works, h(gen_server, call, 2) though is slightly more complicated.

We have 2 in Elixir. What I have implemented for 2 is that when looking up for gen_server:call/2, we first lookup for gen_server:call, and then get the first entry with equal arity or with the represented extra_arities/default in the metadata.

Once we pick an approach, let's document it in the EEP 48 too. :)

So, what I did for this is that I introduced a metadata tag för each doc entry that is called equiv. If it is set, then this function should use the documentation from the entry it is pointing to, but it should still use the local signature. Example:

98> shell_docs:get_doc(ssl,transport_accept,2).
[{{function,transport_accept,2},
  [{file,"ssl.erl"},{location,597}],
  [<<"transport_accept/2">>],
  #{},
  #{edit_url =>
        <<"https://github.com/erlang/otp/edit/maint/lib/ssl/doc/src/ssl.xml#L1808">>,
    equiv => {function,transport_accept,1},
    signature =>
        [{attribute,597,spec,
                    {{transport_accept,2},
                     [{type,597,bounded_fun,...}]}}]}].

@josevalim
Copy link
Collaborator Author

Hi @garazdawi! I have just given RC2 a try and all of my previous comments have been addressed. Thank you for that! I have just two tiny remarks left:

  1. I don't see the equiv metadata on RC2, I assume I didn't make the cut? :)

  2. Do we need to include the signature in metadata when they are just typespecs? Almost all tools already read the typespecs if necessary, so we can skip them here if you want to.

Have a good weekend!

@garazdawi
Copy link

garazdawi commented Mar 28, 2020

  1. no, it did not make it. It will be part of rc3, and it should make it into master next week.
  2. I thought about that when doing it the first time... but decided against it for reason I can't remember right now.

Edit: just remembered. I did it because I didn't want to deal with the beam file and the doc chunk becoming out of sync.

@garazdawi
Copy link

Opened a new PR with more changes: erlang/otp#2583

@garazdawi
Copy link

Changes merged to master.

@wojtekmach
Copy link

wojtekmach commented Apr 26, 2020

I noticed a problem with some types in the stdlib app, they the have Doc part of the chunk as #{}. It should either be #{<<"en">> => ...} or an atom none or hidden.

erl> {ok, {docs_v1, _, _, _, _, _, Docs}} = code:get_doc(array),
  [{KindNameArity, Doc} || {KindNameArity, _, _, Doc, _} <- Docs, Doc == #{}].
[{{type,array,0},#{}},
 {{type,array_indx,0},#{}},
 {{type,array_opts,0},#{}},
 {{type,array_opt,0},#{}},
 {{type,indx_pairs,1},#{}},
 {{type,indx_pair,1},#{}}]

the remaining chunks look good. I'm running OTP 23.0-rc3.

@garazdawi
Copy link

So that is when a type is documented, but there is no text describing it. i.e. http://erlang.org/doc/man/array.html#type-array_indx

I think the correct thing to do is to put #{ <<"en">> => [] }. What do you think?

@garazdawi
Copy link

Actually, why can't it be an empty map which would mean the same as #{ <<"en">> => [] } but for all languages, not just en?

@wojtekmach
Copy link

When a thing should show up in docs but doesnt have docs text, thats exactly what the atom none was supposed to mean. Otherwise, there is no use case for it and shouldnt be in eep48. I don’t remember if there was a reason to pick none over an empty map.

@garazdawi
Copy link

In the OTP implementation of EEP-48 I make a distinction between "the user has not yet written any docs for this" and "the user has decided that there should be no docs for this, but it is still a public type". For the first I emit none and for the latter I emit and empty map.

When none is given, I print "there is no documentation for this type". When #{} is given I don't print anything, which is the same as if #{ <<"en">> -> <<>> } has been given.

I'm not against changing this so that I always emit none and print that string, I just wanted to explain why I differentiate between the two.

@josevalim
Copy link
Collaborator Author

In Elixir, we are doing the following:

  • :nonemeans "no docs written (yet) but the function is public"
  • :hidden means "docs do not exist because function should not be invoked externally"

So our usage of :none is consistent. In Elixir, we don't have a public API for saying "i want these docs to be empty", you just literally write an empty string, which means the empty string will be there in the chunk but I can see the empty map being potentially a better API for this, if your documentation system allows absence to be expressed in such ways.

Therefore I propose to not change Erlang/OTP. @garazdawi @wojtekmach do you think this makes sense?

@wojtekmach
Copy link

I'm fine with keeping things as is, just a note that we will have to update our tools:

iex(2)> t :array.array
@type array() :: array(term())

** (FunctionClauseError) no function clause matching in IEx.Introspection.translate_doc/1

(same for ExDoc)

@erszcz
Copy link
Collaborator

erszcz commented May 6, 2020

When running ExDoc on EDoc generated chunks I'm getting quite a few warnings like this (also about forms/0:

warning: documentation references type :erl_syntax.syntaxTree/0 but it is undefined or private
  src/edoc.erl:0: t:edoc.syntaxTree/0

Initially I was thinking it's an ExDoc/EDoc integration issue, but it doesn't seem to be the case. On OTP 23 rc3:

> ht(erl_syntax, syntaxTree, 0).
{error,type_missing}

Yet, at the same time I can see the relevant -export_type attribute in erl_syntax.erl.

Moreover, the types are listed in the types metadata entry:

> {ok, Doc} = code:get_doc(erl_syntax), maps:get(types, element(6, Doc)).
...
  {syntaxTree,0} =>
      {attribute,452,type,
                 {syntaxTree,{type,452,union,
                                   [{user_type,452,tree,[]},
                                    {user_type,452,wrapper,[]},
                                    {user_type,452,erl_parse,[]}]},
                             []}},
...

But not in #docs_v1.docs - actually, there are no types at all there:

13> {ok, D} = code:get_doc(erl_syntax), lists:usort( [element(1, element(1, Doc)) || Doc <- D#docs_v1.docs] ).
[function]

@garazdawi
Copy link

The documentation for erl_syntax is written using Edoc which means that erl_docgen first has to convert it to the internal xml format. In that translation some information is lost, one such thing is which types are do be documented for a module.

So any module documented by Edoc in the Erlang/OTP source tree will (for now) not have any type documentation.

@wojtekmach
Copy link

but I can see the empty map being potentially a better API for this (...) Therefore I propose to not change Erlang/OTP

Just a quick note about that, I believe we need to update EEP 48 too since it currently has:

Doc :: #{DocLanguage := DocString} | none | hidden,

@erszcz
Copy link
Collaborator

erszcz commented May 7, 2020

I've spotted two small things. The first one is formatting with ht / hcb:

4> ht(proplists).
   proplists

These types are documented in this module:

  -type property() :: atom() | tuple().

  -type proplist() :: [property()].
ok
5> hcb(proplists).
   proplists

There are no callbacks in this module
ok
6>

There's no leading empty line before the module name.

The second one is about the signature fields. For functions it's Name/Arity. For types, though, it seems to be a synthesized signature like -type Name(Arg1, ..., ArgN) :: term(). - i.e. the definition is always term(). For example:

{{type,indx_pairs,1},
 [{file,"array.erl"},{location,181}],
 [<<"-type indx_pairs(Arg1) :: term().">>],
 #{},
 #{signature =>
       [{attribute,181,type,
                   {indx_pairs,{type,181,list,
                                     [{user_type,181,indx_pair,[{var,181,'Type'}]}]},
                               [{var,181,'Type'}]}}]}},
{{type,indx_pair,1},
 [{file,"array.erl"},{location,180}],
 [<<"-type indx_pair(Arg1) :: term().">>],
 #{},
 #{signature =>
       [{attribute,180,type,
                   {indx_pair,{type,180,tuple,
                                    [{ann_type,180,
                                               [{var,180,'Index'},{user_type,180,array_indx,[]}]},
                                     {var,180,'Type'}]},
                              [{var,180,'Type'}]}}]}}]}}

For indx_pairs/1 the pretty-printed type would be:

4> IndxPairs = {attribute,181,type,
4>                                  {indx_pairs,{type,181,list,
4>                                                    [{user_type,181,indx_pair,[{var,181,'Type'}]}]},
4>                                              [{var,181,'Type'}]}}.
{attribute,181,type,
           {indx_pairs,{type,181,list,
                             [{user_type,181,indx_pair,[{var,181,'Type'}]}]},
                       [{var,181,'Type'}]}}
6> iolist_to_binary(erl_pp:form(IndxPairs)).
<<"-type indx_pairs(Type) :: [indx_pair(Type)].\n">>

I'm proposing to standardise the signature for types to the following:

  • a pretty-printed signature corresponding 1-to-1 to the spec metadata for tools that don't understand the spec
  • a fallback Name/Arity if pretty printing when creating the chunk is not possible
  • but not a synthetic pretty-printed spec which does not correspond to the spec metadata

Tools, on the other hand, should prefer the use of the spec metadata if they can and it's present (as shell_docs does), while use signature in other cases. What do you think?

@garazdawi
Copy link

garazdawi commented May 7, 2020

I'm for option #​2 as it would look the most as what the functions look like. It's not pretty, but the user should never have to see it anyways.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants