Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Void type #32

Open
eduardoleon opened this issue Mar 17, 2017 · 16 comments
Open

Void type #32

eduardoleon opened this issue Mar 17, 2017 · 16 comments

Comments

@eduardoleon
Copy link

eduardoleon commented Mar 17, 2017

If a type t is void (has no values), any control flow path whose context contains a variable of type t is automatically unreachable. This is useful information for a compiler writer! Sadly, in Standard ML, the only way to construct void types is using recursion:

datatype void = V of void

So a compiler writer has to perform extra work just to establish that a type is void. Why not provide a built-in void type that everyone can tell has no values without doing any work?

signature VOID =
sig
  type void
  val absurd : void -> 'a (* pure, total function *)
end
@RobertHarper
Copy link

RobertHarper commented Mar 17, 2017 via email

@JohnReppy
Copy link
Contributor

How do void types get introduced? Does raise exn have type void?

@eduardoleon
Copy link
Author

raise exn should have type 'a, for 'a a fresh type variable, as usual.

@JohnReppy
Copy link
Contributor

So how do void types get introduced?

@MatthewFluet
Copy link
Contributor

They don't. That's the point -- they are an uninhabited type.

@JohnReppy
Copy link
Contributor

I understand that the type is uninhabited, but there is no point in introducing it if there are no places where it is going to be used. If it is not use for expressions that do not return to their continuation, where does it get used?

@MatthewFluet
Copy link
Contributor

In polymorphic datatypes, to indicate the absence of values.

For example:

datatype ('a, 'b, 'c) t = A of 'a | B of 'b | C of 'c

fun f (x : (void, int, void) t) =
  case x of
     B i => i
   | A v | C v => absurd v

fun g (x : (int, int, int) t) : (void, int, void) t =
  case x of
     A i => B (2 * i)
   | B i => B i
   | C i => B (3 * i)

Now you have the static guarantee that g can't ever return an A or C variant and f can make use of the assumption that it won't be given an A or C variant.

@JohnReppy
Copy link
Contributor

Thanks for the example. Sort of a poor-man's refinement type.

Would it be a common enough pattern that compilers would actually do the analysis to exploit the information?

@RobertHarper
Copy link

RobertHarper commented May 4, 2017 via email

@MatthewFluet
Copy link
Contributor

I think the point of the proposal is that the void type would be provided by the compiler (and exported through the Basis Library); therefore, the compiler wouldn't need to do any analysis --- it "knows" that void is the uninhabited type and absurd is unreachable code. For example, in the f function, the compiler could omit exception handling overhead (e.g., passing a handler continuation).

I agree that a nullary case analysis makes sense, but it would change the grammar.

@RobertHarper
Copy link

RobertHarper commented May 4, 2017 via email

@eduardoleon
Copy link
Author

While, in principle, I agree that void could be treated as a nullary datatype, I think there are concrete benefits to treating it as a hardcoded built-in instead:

  • There would be no need to change the existing syntax and semantics of datatypes.
  • It would be easier for compilers to detect that a specific type is void - it has to be literally equal to void. OTOH, if we allow nullary datatypes, compilers will have to perform a little bit of analysis to determine that a datatype is empty. It's only a little complication, but these complications add up in the long run.

@RobertHarper
Copy link

RobertHarper commented May 4, 2017 via email

@nilern
Copy link

nilern commented Jul 10, 2019

One can also do void without datatypes:

structure Void :> VOID = struct
    type void = unit
    fun absurd () = raise Fail "unreachable"
end

@RobertHarper
Copy link

RobertHarper commented Jul 10, 2019 via email

@YawarRaza7349
Copy link

It's good that @JohnReppy (and @dmacqueen elsewhere) brought up the overlap between void and fresh type variables because it's important to not overuse void, even if there are places where void is needed. For example, @MatthewFluet gave these two examples earlier to motivate void:

datatype ('a, 'b, 'c) t = A of 'a | B of 'b | C of 'c

fun f (x : (void, int, void) t) =
  case x of
     B i => i
   | A v | C v => absurd v

fun g (x : (int, int, int) t) : (void, int, void) t =
  case x of
     A i => B (2 * i)
   | B i => B i
   | C i => B (3 * i)

However, only f is a valid motivation. g should instead have the type (int, int, int) t -> ('a, int, 'b) t (which is indeed its inferred type without an annotation). This allows you to self-compose it directly (e.g. g (g (A 1))), while the original type of g requires "glue" between the self-composes (g (B (f (g (A 1))))).

Similarly, tau -> void is the correct type for when a continuation is a parameter to a function, for example. But when a continuation is returned from a function (and yeah, I know you wouldn't really do that in CPS; it's just an example), it should have type tau -> 'a (which can, of course, be used as a tau -> void simply through automatic type instantiation, while the reverse direction needs absurd as "glue"). Also, the void here isn't because of "the limitations of type inference", but because SML doesn't have rank-2 polymorphism; Haskell has the same type inference limitations, but supports rank-2 polymorphism via an explicit type annotation, so can use the tau -> 'a form as a parameter as well.

I actually don't think the optimization rationale really works as a reason for a built-in void (though a built-in void is still good for other reasons). Thinking about MLton, for a compiler that optimizes as much as it already does, the algorithm to see whether a type is inhabited seems relatively straightforward: start by assuming the types being introduced are uninhabited, then dig through the structure of the type to see if it's inhabited (are any components of a sum inhabited, are all components of a product inhabited, does a function have an inhabited return type or uninhabited parameter type, etc), then propagate (you'd have a visited boolean on each node of the graph to prevent cycles). So MLton should have such a check regardless (and I think it does). MLton aside, the datatype void = Void of void idiom already exists in codebases, so less completionist compilers might still want to check for at least this specific form, for practical reasons.

I don't think it would change the grammar so as to invalidate existing code, would it?

I think the concern is what the concrete syntax looks like. A naïve generalization of the existing syntax to zero cases gets you, for example, case x of. That's it, no branches; just case x of. Doesn't exactly roll off the tongue. The other places look even worse. datatype x =? Even if its ugliness doesn't deter you, consider:

datatype x =
datatype y = Y

Which the compiler would first try to parse as:

datatype x = datatype y
= Y

I don't know much about parsing, but I doubt SML currently requires this level of backtracking...

Despite this, void should still be implemented as a nilary data type for one important reason: pattern matching makes it so you don't even need to write absurd, those branches just don't need to be written at all. This is especially useful in types where void is nested within another type. Imagine a type like (int * void * void, void * int * void, void * void * int) t; using absurd is a pain and a mess, whereas nilary pattern matching needs no branches at all. (Though I suppose you never actually need to use absurd, since raise does the same thing, as mentioned above.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants