-
Notifications
You must be signed in to change notification settings - Fork 94
Outline of the model rewrite project
The model needs some fairly invasive changes to get it to a state where it can reliably handle complex libraries such as `Core`. This document outlines some of the changes that are required.
The current implementation of the path types uses GADTs with polymorphic variants for
indices. This allows us to give nice types to some functions (e.g.
'a Reference.Resolved.t -> 'a Identifier
), and avoids boxing when moving between
different kinds of path. However, OCaml is still not great at working with these types
making them very hard to use sometimes. It also obfuscates the underlying constraints
on paths, which somewhat defeats the point of having such precise types.
This should be replaced by just using multiple types for each sort of path. For
example, instead of representing resolved paths by a single type with indices like
[`Module]
and [`Module_type]
, we should have one type for resolved module paths
and another type for resolved module type paths.
Currently we just use strings for all sub-parts of paths and identifiers. This
occasionally allows stupid bugs to get through. Instead we should use separate abstract
types (e.g. ModuleName.t
, TypeName.t
).
Currently, identifiers that point into the body of paths just apply ordinary module projections to the functor itself. Meanwhile identifiers in the parameters of functors use a special argument projection which takes the name and index of the parameter. For example, given:
module F (X : ...) (Y : sig type param end) : sig type res end
We refer to res
using an identifier like Type(Module(root, "F"), "res")
and refer
to param
using an identifier like Type(Argument(Module(root, "F"), "Y", 1), "param")
.
These identifiers turn out to be quite awkward to work with. Instead we should have an
explicit Result
projection that goes down the right of the functor arrow, and a
simple Parameter
projection that goes down the left of the functor arrow.
Currently, the SubstAlias
constructor is used for both module alias substitutions and
functor argument substitutions. However, these two kinds of substitutions should really
be treated differently.
Firstly, a module alias substitution should be performed on the substituted module
itself, whilst a functor argument substitution should only be performed on
projection. For example, if we have SubstAlias(M, N)
(i.e. replace M
with N
) it
should refer to the identifier of N
if its for a module alias, but still refer to M
if it is functor argument. Whereas a path like Type(SubstAlias(M, N)), t)
should
refer to the identifier of N.t
for both module aliases and functor arguments.
Secondly, we don’t want to perform a module alias substitution if the right-hand side of the substitution is hidden, whereas for a functor argument substitution we must perform this substitution regardless. Currently, this is handled by trying to only insert the substitution for module aliases if the right-hand side is not hidden, however hiddenness cannot be reliably determined until the “linking” phase, so this is fragile and unreliable.
Currently we do not try to eliminate hidden paths. Adding an optional replacement string will make it easy to eliminate them in many cases.
Currently the right-hand side of a `Canonical` constructor – which is the canonical path that should be used as long as it can be resolved – is a `Path.t`. Really it should be a `Reference.t` since it comes from a documentation comment.
Currently we don’t provide support for cross-package references (other than to compilation dependencies). Such references should require explicitly naming the package that they refer to. This will require adding a reference syntax for explicitly naming the package.
Note that we’ve already extended the syntax of references to allow disambiguating each part of the path we’re referring too. The new syntax has been discussed in these two issues:
- https://github.com/ocaml-doc/odoc/issues/61#issuecomment-319974223
- https://github.com/ocaml-doc/odoc/issues/90
Hopefully we can keep a consistent syntax to disambiguate the “package” part.
The current implementations of path resolution, reference resolution and expansion are fundamentally flawed. In particular, the components data structure which drives resolution is not fit for purpose and cannot even be used for expansion. Fixing the underlying issues with the components data structure should greatly simplify resolution and allow us to use it for expansion – deleting the current broken, unfinished and extremely inefficient implementation of expansion.
The best approach here is probably to start from scratch – the existing implementation is hard to follow and most of its complexity shouldn’t appear in the new version.
The current implementation of the components data structure is designed so that given:
module M = struct
module type S = sig ... end
module N : S
end
module O : M.S
the same value can be used unchanged for the components of M.S
, M.N
and O
. In
other words, we do not distinguish identifier projection from path projection and the
renaming operation is a no-op.
This turns out to be a terrible design decision. It causes, at least, the following serious issues:
- It dramatically complicates the implementation of functor parameter substitution.
- It means that the components value cannot calculate the resolved path or reference to a definition. Instead we must build the resulting path or reference in parallel with calculating the components value.
- It means that the components value cannot calculate the expansion of the represented module or module type. Instead we are forced to calculate expansions without using components at all.
- It doesn’t allow us to correctly track properties like “hidden” or “canonical”. For
example, if
S
is marked as hidden above thenN
andO
will also be treated as hidden.
Other issues with the current implementation include:
- It keeps separate tables in signatures for the different kinds of reference. This adds a lot of additional code for no real gain.
- It implements module extension and destructive substitution too naively. It does
not handle cases like:
module type A = sig module M : sig module type S end module N : M.S end module B : sig module type S = sig type t end end module C : A with module M = B type t = C.N.t (* This path is not currently resolved *)
Currently, the different stages of path resolution, reference resolution and expansion are run multiple times across both the compiling and linking phases. This should be tidied up so that the different stages happen at the appropriate time.
The current implementation doesn’t have a clear order in which different operations should happen. It mixes together operations which should happen at different phases, and can perform some operations too early leading to incorrect results.
- There is only a single “path and reference resolution” operation, even though forward paths and references should not be resolved until the link phase.
- Module aliases to hidden paths are replaced by their expansion by the expansion operation, even though this replacement must happen during the link phase.
- Paths involving module aliases are represented either with or without
SubstAlias
, depending on whether the aliased path is hidden. This decision is made during path resolution, however the hiddenness of a module may not be correctly available until the link phase.
Currently consumers of the model have to use the representations of paths directly. There are helper functions (e.g. to convert paths to identifiers) but it still requires an understanding of the various book-keeping constructors in the path representations. Instead we would like a more abstract interface to hide these details.