Skip to content
Adrian Wong edited this page Jul 31, 2020 · 3 revisions

Till now, we have assumed that variables may be instantiated or not, but there is more to it.

Instantiatedness trees

An "instantiatedness tree" has two types of nodes, and approximates the instantiation state of a term if, for every node:

  • If a node in the instantiatedness tree is "free" then the corresponding node in the term tree is a free variable that does not share with any other variable.

  • If a node in the instantiatedness tree is "bound" then the corresponding node in the term tree is a function symbol, possibly one of several alternatives. The instantiatedness tree does not tell us which function symbol the term is bound to. Instead, for each possible function symbol it tells us exactly which arguments of the function symbol will be free and which will be bound. The same principle applies recursively to these bound arguments.

The mode system provides free and ground as names for instantiatedness trees all of whose nodes are free and bound respectively (with exceptions for solver type values, which we will not get into).

Examples of instantiatedness trees, or "insts", in concrete Mercury syntax:

  • ground approximates 1, "foo", [a, b, c] and any other non-unique term without free variables

  • bound(1) approximates the term 1

  • bound(1 ; 2 ; 3) approximates the terms 1, 2 and 3

  • bound(no ; yes(free)) approximates no and yes(V) where V is a free variable

The last example shows that insts were designed to be capable of expressing partially instantiated data structures (data structures with holes). However, partial instantiation is not supported by the current Mercury compiler.

Naming insts

You can declare names for instantiatedness trees:

:- inst dog_or_cat == bound(dog ; cat).

:- inst list == bound([] ; [ground | list]).

A more readable syntax exists for bound instantiatedness trees. The list example could also be written as:

:- inst list
    --->    []
    ;       [ground | list].

This mirrors the type definition so you can see the corresponding nodes in both trees easily:

:- type list(T)
    --->    []
    ;       [T | list(T)].

The inst list is not useful because is equivalent to ground. We can omit function symbols to make an inst more precise, restricting the set of terms that it approximates. A non-empty list is not bound to [] at the top-level so an inst for non-empty lists is:

:- inst non_empty_list
    --->    [ground | ground].

Inst definitions may be parametric. If we have:

:- type pet
    --->    dog
    ;       cat
    ;       rock.

:- inst dog_or_cat
    --->    dog
    ;       cat.

:- inst list(Inst)
    --->    []
    ;       [Inst | list(Inst)].

then a list with inst list(dog_or_cat) can contain dogs and cats, but not rocks.

Mode mappings

A mode mapping is a mapping from an initial instantiatedness tree to a final instantiatedness tree. We have used a couple of mode mappings already without explaining them as such. The in and out modes are defined in the builtin module:

:- mode in == ground >> ground.
:- mode out == free >> ground.

On the left and right hand sides of the >> symbol are the initial and final insts. The mode in says that the term will be approximated by ground before the goal and after the goal. The mode out says that the term must be approximated by free before the goal, but ground after the goal.

Mode definitions may also be parametric. The builtin module defines these modes:

:- mode in(Inst)  == Inst >> Inst.
:- mode out(Inst) == free >> Inst.

You can define your own mode mappings, though you usually won't need to.

Tracking instantiation states

The compiler tracks the instantiation state of variables in a procedure using insts. Consider this silly example:

:- type pet
    --->    dog
    ;       cat
    ;       rock.

:- pred choose(int, pet, string).
:- mode choose(in, out, out) is semidet.

choose(Num, Pet, Name) :-
    % program point 1
    (
        Num = 1,
        Pet = dog
        % program point 2
    ;
        Num = 2,
        Pet = cat
        % program point 3
    ),
    % program point 4
    (
        Pet = dog,
        Name = "Rex"
    ;
        Pet = cat,
        Name = "Tiger"
    ),
    % program point 5
    true.

At the start of the procedure, program point 1, the instantiation states of Num, Pet and Name are approximated by ground, free and free respectively. This comes directly from the definitions of modes in and out.

At program point 2, Num = 1 must have succeeded, and Pet will be bound. The insts approximating their instantiation states will become bound(1) and bound(dog) respectively.

On the other hand, the insts for Num and Pet at program point 3 will be bound(2) and bound(cat).

At program point 4, ground would approximate the instantiation states of Num and Pet, but that is unnecessarily imprecise. Instead, the insts from program point 2 and program point 3 are merged: the inst for Num becomes bound(1 ; 2), and the inst for Pet becomes bound(dog ; cat). This retains the information that Num can be bound to either 1 or 2, and Pet can be bound to either dog or cat, but does not capture the fact that if Num is bound to 1 then Pet must be bound to dog, or if Num is bound to 2 then Pet must be bound to cat.

After program point 4, we have a switch on Pet. The inst for Pet tells us that Pet can be bound to one of two function symbols and, since both alternatives are covered, the switch is complete. If the inst for Pet had been ground then the switch would be inferred to be incomplete as it does not cover the function symbol rock.

At program point 5, the inst for Pet remains bound(dog ; cat), and the inst for Name becomes bound("Rex" ; "Tiger").

At the end of the procedure, the compiler checks that the final instantiation states of the terms in the arguments are indeed approximated by the corresponding final insts declared for each argument. bound(1 ; 2) is approximated by ground (the final inst of in) and bound(dog ; cat) and bound("Rex" ; "Tiger") approximated by ground (the final inst of out), so all is well.

Insts in mode declarations

The head/2 predicate returns the first element of a list:

:- pred head(list(T), T).
:- mode head(in, out) is semidet.

head([X | _], X).

The declared mode is semidet as head/2 must fail on an empty list. We can declare another mode that restricts the first argument to non-empty lists, using the non_empty_list inst and the in(Inst) mode presented previously.

:- mode head(in(non_empty_list), out) is det.

which is the same as:

:- mode head(
            non_empty_list >> non_empty_list,
            free >> ground
            ) is det.

which further expands to:

:- mode head(
            bound([ground | ground]) >> bound([ground | ground]),
            free >> ground
            ) is det.

This mode can only be called when it is known that the first argument is a non-empty list. As it will never fail, its determinism can be tightened to det, a nice improvement.

We can improve things a little more (but this is going beyond what most Mercury code does). The output argument of head/2 has final inst ground so even if the caller had a more precise inst for the first argument, when head/2 returns, the caller will get back an element with inst ground.

To retain the inst information from the first to the second argument, we can make use of inst parameters. First, we need to provide a parameterised definition of non_empty_list.

:- inst non_empty_list(Inst)
    --->    [Inst | list(Inst)].

Then we can use that inst in the mode declaration of head/2.

:- mode head(in(non_empty_list(I)), out(I)) is det.

The inst variable I approximates the instantiation state of every element of the input list. head/2 binds the second argument to an element of the input list so, when it returns, the instantiation state of the second argument can also be approximated by I. There is also an implicit constraint that I must be compatible with ground, e.g. not free or unique. See "Constrained polymorphic modes" in the reference manual for more information.

Insts in practice

In practice, it is hard to maintain precise insts for too long. Most Mercury predicates and functions, including the standard library, only use the simple modes in and out so you very often end up with ground insts. At this time, it is probably impractical to use polymorphic modes everywhere.

A somewhat common pattern that employs named insts is when breaking up a large predicate into smaller predicates. The top predicate switches on a variable, then each switch arm calls a helper predicate to handle the subset of function symbols that the variable could be bound to in that branch.

:- type foobar
    --->    foo(a, b, c, d, e)
    ;       bar(f, g).

:- inst foo
    --->    foo(ground, ground, ground, ground, ground).

:- pred do_foobar(foobar, result).
:- mode do_foobar(in, out) is det.

do_foobar(FooBar, Result) :-
    (
        FooBar = foo(_, _, _, _, _),
        do_foo(FooBar, Result)
    ;
        FooBar = bar(_, _),
        do_bar(FooBar, Result)
    ).

:- pred do_foo(foobar, result).
:- mode do_foo(in(foo), out) is det.

do_foo(Foo, Result) :-
    Foo = foo(A, B, C, D, E),
    ...

Including the more precise inst in the mode declaration allows a helper predicate like do_foo/2 to consider only a subset of the function symbols of the argument's type. An alternative would be to pass each of the foo/5 arguments separately to do_foo/2.

Insts will also show up in the discussion of our next topic, higher-order programming in Mercury.

Sample code

:- module insts.
:- interface.

:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

:- import_module list.

/*
:- type list(T)
    --->    []
    ;       [T | list(T)].
*/

    % Already defined in the list module.
:- inst list(Inst)
    --->    []
    ;       [Inst | insts.list(Inst)].

    % Already defined in the list module.
:- inst non_empty_list
    --->    [ground | ground].

:- inst non_empty_list(Inst)
    --->    [Inst | insts.list(Inst)].

:- pred head(list(T), T).
:- mode head(in, out) is semidet.
% :- mode head(in(non_empty_list(I)), out) is det.
:- mode head(in(non_empty_list(I)), out(I)) is det.

head([X | _], X).

:- type pet
    --->    dog
    ;       cat
    ;       rock.

:- inst dog_or_cat
    --->    dog
    ;       cat.

main(!IO) :-
    % Try this:
    %   Pets = [],

    % Try this:
    %   Pets = [dog, cat, rock],

    Pets = [dog, cat],

    head(Pets, Pet),
    (
        Pet = dog,
        write_string("My first pet is a dog.\n", !IO)
    ;
        Pet = cat,
        write_string("My first pet is a cat.\n", !IO)
    ).