\chapter{Design space}
% ``technology transfer'' from PLT to TC
This chapter will attempt to locate technical computing within the large
design space of programming languages.
It will discuss how the priorities introduced in section~\ref{sec:tcproblem}
relate to type systems and dispatch systems.
% concludes with an intro to julia etc.
Our lofty goal is to preserve the flexibility and ease of popular high level
languages, while adding the expressiveness needed for performant generic
programming.
% it is important to bear in mind the many meanings of the word ``type''
% often people use it just when they want a different ``kind of thing'',
% and have not yet thought about how that relates to the means of
% expression available.
\section{A binding time dilemma}
\label{sec:bindingtimedilemma}
Consider the following fragment of a function called \texttt{factorize},
which numerically analyzes a matrix to discover structure that
can be exploited to solve problems efficiently:
\begin{singlespace}
\begin{lstlisting}[language=julia]
if istril(A)
    istriu(A) && return Diagonal(A)
    utri1 = true
    for j = 1:n-1, i = j+1:m
        if utri1 && A[i,j] != 0
            utri1 = i == j + 1
        end
    end
    utri1 && return Bidiagonal(diag(A), diag(A, -1), false)
    return LowerTriangular(A)
end
\end{lstlisting}
\end{singlespace}
\noindent
The code returns different structured matrix types based on the input
data.
% TODO a bit more about how it's used
This sort of pattern can be implemented in object-oriented languages
by defining an interface or base class (perhaps called \texttt{StructuredMatrix}),
and declaring \texttt{Diagonal}, \texttt{Bidiagonal}, and so on as
subclasses.
Unfortunately, there are several problems with this solution.
Class-based OO ties together variation and dynamic dispatch: to allow
a value's representation to vary, each use of it needs to perform an
indirect call.
A major reason to use structured matrix types like \texttt{Bidiagonal}
is performance, so we'd rather avoid the overhead of indirect calls.
%for operations like element access.
A second problem is that most code operating on \texttt{Bidiagonal} would
be implementing algorithms that exploit its representation.
These algorithms would have to be implemented as methods of
\texttt{Bidiagonal}.
However, this is not natural, since one cannot expect every function
that might be applied to bidiagonal matrices to be defined in
one place.
This kind of computing is function-oriented.
% other examples: CSC vs. CSR
Let's try a different abstraction mechanism: templates and overloading.
That would allow us to write code like this:
\begin{singlespace}
\begin{lstlisting}[language=julia]
if diagonal
    solve(Diagonal(M), x)
elseif bidiagonal
    solve(Bidiagonal(M), x)
end
\end{lstlisting}
\end{singlespace}
\noindent
Here, structured matrix types are used to immediately select an
implementation of \texttt{solve}.
But the part that selects a matrix type can't be abstracted away.
We would like to be able to write \texttt{solve(factorize(M), x)}.
% TODO: or tagged unions, but then you have to handle every case
%... no wonder people use prepackaged version of this functionality
Given these constraints, the traditional design of technical
computing systems makes sense: pre-generate a large number of
optimized routines for various cases, and be able to dispatch
to them at run time.
It's clear what's going on here: mathematics is dependently typed.
If a compiler could prove that a certain matrix were bidiagonal, or
symmetric, there would be no problem.
But knowing what a given object \emph{is} in a relevant sense can
require arbitrary proofs, and in research computing these proofs
might refer to objects that are not yet well understood.
% In numerical computing, if \emph{anybody} were able to prove the
% relevant property, there would be no point in running the program.
Most languages offer a strict divide between early and late binding,
separated by what the type system can statically prove.
The less the type system models the domain of interest, the more it
gets in the way.
% talk about the premature optimization of assuming
% (generate code for) = (known at compile time)
The compiler's ability to generate specialized code is also coupled
to the static type system, preventing it from being reused for domains
the type system does not handle well.
In general, of course, lifting this restriction implies a need to
generate code at run time, which is not always acceptable.
However, run time code generation might not be necessary in every case.
A compiler could, for example, automatically generate a range of
specialized routines and select among them at run time, falling back
to slower generic code if necessary.
Consider the trade-off this entails.
The programmer is liberated from worrying about the binding time of
different language constructs.
With a JIT compiler, performance will probably be good.
But without JIT, performance will be unpredictable, with no clear
syntactic rules for what is expected to be fast.
So it is not surprising that software engineering languages shun this
trade-off.
However, we believe it is the trade-off technical computing
users want by default, and it has not generally been available
in any language.
Designing for this trade-off requires a change of perspective
within the compiler.
Most compilers focus on, and make decisions according to, the
language's source-level type system.
Secondarily, they will perform flow analysis to discover
properties not captured by the type system.
Given our goals, we need to invert this relationship.
The focus needs to be on understanding program behavior
\emph{as thoroughly as possible}, since no one set of rules
will cover all cases.
Usually changing a language's type system is a momentous event,
but in the analysis-focused perspective the compiler can evolve
rapidly, with ``as thoroughly as possible'' handling more cases
all the time.
This approach is still compatible with static typing, although we will not
discuss it much.
We could pick a well-defined subset of the system,
or a separate type system, to give errors about.
The notion of ``as thorough as possible'' analysis is formalized
by domain theory.
%Instead of a split between static and dynamic resolution, we can
%instead focus on program \emph{analysis}.
%The goal is to understand a program as well as possible.
%2 problems?
% 1 - flexibility implies late binding, slowness. e.g. CSR/CSC as classes
% 2 - given a simple type system, programs will not match the lattice
% and analysis diverges
% issues:
% - how programs make choices
% - how compiler-understandable are those choices
% - when do you get specialized code
\section{Domain theory}
In the late 1960s Dana Scott and Christopher Strachey asked how to assign
meanings to programs, which otherwise just appear to be lists of symbols
\cite{scott1971toward}.
For example, given a program computing the factorial function, we
want a process by which we can assign the meaning ``factorial'' to it.
This led to the invention of domain theory, which can be interpreted
as modeling the behavior of a program without running it.
A ``domain'' is essentially a partial order of sets of values that a
program might manipulate.
The model works as follows: a program starts with no
information, the lowest point in the partial order (``bottom'').
Computation steps accumulate information, gradually moving higher through
the order.
This model provides a way to think about the meaning of a program without
running the program.
Even the ``result'' of a non-terminating program has a representation
(the bottom element).
%Other elements of the partial order might refer to intermediate results.
The idea of ``running a program without running it'' is of great interest
in compiler implementation.
A compiler would like to discover as much information as possible about a
program without running it, since running it might take a long time, or
produce side effects that the programmer does not want yet, or, of course,
not terminate.
The general recipe for doing this is to design a partial order (lattice)
that captures program properties of interest, and then describe all
primitive operations in the language based on how they transform
elements of this lattice.
Then an input program can be executed, in an approximate sense, until
it converges to a fixed point.
%For example, given a program that outputs an integer, we might decide
%that we only care whether this integer is even or odd.
%Then our posets are the even and odd integers, and we will classify operations
%in the program according to whether they are evenness-preserving,
%evenness-inverting, always even, always odd, of uncertain evenness,
%etc.
Most modern compilers use a technique like this
(sometimes called abstract interpretation \cite{abstractinterp})
to semi-decide
interesting properties like whether a variable is a constant, whether
a variable might be used before it is initialized, whether a variable's
value is never used, and so on.
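To make this recipe concrete, here is a minimal sketch (illustrative only,
not drawn from any particular compiler) of an abstract domain for integer
parity, using the lattice $\bot < \{\texttt{even},\texttt{odd}\} < \top$ and
a transfer function describing how addition acts on abstract values:
\begin{singlespace}
\begin{lstlisting}[language=julia]
# Abstract values: :bottom (no information), :even, :odd, :top (unknown).
function lub(a, b)              # least upper bound (lattice join)
    a == b       && return a
    a == :bottom && return b
    b == :bottom && return a
    return :top
end

# Transfer function: how `+` acts on abstract values.
function abstract_add(a, b)
    (a == :bottom || b == :bottom) && return :bottom
    (a == :top    || b == :top)    && return :top
    return a == b ? :even : :odd   # even+even = odd+odd = even
end

# "Run" a few steps abstractly, starting from a value known to be odd:
x = :odd
y = abstract_add(x, x)     # :even
z = abstract_add(y, :odd)  # :odd
w = lub(z, :even)          # :top -- joining the results of two branches
\end{lstlisting}
\end{singlespace}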
%Domain theory gave rise to the study of denotational semantics and the
%design of type systems. However, the original theory is quite general
%and invites us to invent any domains we wish for any sort of language.
%Abstract interpretation \cite{abstractinterp} is an especially
%elegant and general implementation of this idea.
%It should be clear that this sort of analysis, while clearly related
%to type systems, is fairly different from what most programmers
%think of as type \emph{checking}.
Given the general undecidability of questions about programs, analyses
like these are \emph{conservative}.
If our goal is to warn programmers about use of uninitialized
variables, we want to be ``better safe than sorry'' and print a warning
if such a use \emph{might} occur.
Such a use corresponds to any lattice element greater than or equal to
the element representing ``uninitialized''.
Such conservative analyses are the essence of compiler transformations
for performance (optimizations): we only want to perform an optimization
if the program property it relies on holds for sure.
% and not if there is any uncertainty.
The generality of this approach
allows us to discover a large variety of program properties as long as we are
willing to accept some uncertainty.
%Of course, many programmers and language designers prefer to maximize
%safety, leading to different approaches that trade away some precision
%(e.g. syntactic type systems such as those in the ML language
%family \cite{hindley1969principal, MLtypeinf}).
Even if static guarantees are not the priority, or if a language considers
itself unconcerned with ``types'', the domain-theoretic model is
still our type system in some sense, whether we like it or not
(as implied by \cite{scott1976data}).
%- make it easier to ``follow the lattice''
% analyses don't work as well with programs not written to be
% ``type conscious''.
% the solution is just to make it easier to write type conscious programs.
% this doesn't require any restrictions, just a stylistic change.
% you would think if types are also just data values there would be no
% gain, but there is.
% there are run time things, and just flow-sensitive things
\subsection{Programs and domains}
\label{sec:programsanddomains}
% - checking isbidiag() vs. having a type
% - checking every array elt for integer - maybe for ^ function
Program analyses of this kind have been applied to high-level
languages many times.
A common pitfall is that the analysis can easily \emph{diverge}
for programs that make distinctions not modeled by the lattice.
Consider the following program that repeatedly applies elementwise
exponentiation (\texttt{.\^}) to an array:
% TODO: what should `f` be?
% maybe A = [sum(A.^A)]
\begin{singlespace}
\begin{lstlisting}[language=julia]
A = [n]
for i = 1:niter
    A = f(A .^ A)
end
return A
\end{lstlisting}
\end{singlespace}
\noindent
We will try to analyze this program using the lattice in figure~\ref{fig:arraylattice}.
% TODO maybe add Union(Array{Int},Array{BigInt}) to this
\begin{figure}[!t]
\begin{center}
\begin{tikzpicture}[node distance=2cm]
\node(top) {$\top$};
\node(Array) [below=0.5cm of top] {\texttt{Array}};
\node(ArrayInteger) [below=0.5cm of Array] {$\exists\ T<:\texttt{Integer}\ \texttt{Array\{T\}}$};
\node(ArrayInt) [below left=1cm of ArrayInteger] {\texttt{Array\{Int\}}};
\node(ArrayBigInt) [below right=1cm of ArrayInteger] {\texttt{Array\{BigInt\}}};
\node(ArrayFloat) [right=1.25cm of ArrayBigInt] {\texttt{Array\{Float\}}};
\node(bot) [below=4.5cm of top] {$\bot$};
\draw(top) -- (Array);
\draw(Array) -- (ArrayInteger);
\draw(ArrayInteger) -- (ArrayInt);
\draw(ArrayInteger) -- (ArrayBigInt);
\draw(Array) -- (ArrayFloat);
\draw(ArrayInt) -- (bot);
\draw(ArrayBigInt) -- (bot);
\draw(ArrayFloat) -- (bot);
\end{tikzpicture}
\end{center}
\caption{
A lattice of array types
}
\label{fig:arraylattice}
\end{figure}
The result depends strongly on how the \texttt{.\^} and \texttt{\^} functions
are defined (assume that \texttt{.\^} calls \texttt{\^} on every element of an array).
The code might look like this:
\vspace{-4ex}
\begin{singlespace}
\begin{multicols}{2}
\begin{lstlisting}[language=julia]
function ^(x, y)
    if trunc(y) == y
        if overflows(x, y)
            x = widen(x)
        end
        # use repeated squaring
    else
        # floating point algorithm
    end
end
function f(a)
    if all(trunc(a) .== a)
        # integer case
    else
        # general case
    end
end
\end{lstlisting}
\end{multicols}
\end{singlespace}
\noindent
This code implements the user-friendly behavior of automatically switching to the
\texttt{BigInt} type on overflow, by calling \texttt{widen}.
Assume we initially know that \texttt{n} is an \texttt{Int}, and therefore
that \texttt{A} is an \texttt{Array\{Int\}}.
Although \texttt{A} really does have this type, the code does not mention
types anywhere.
Next, the type of an element taken from \texttt{A} (\texttt{Int}) will
flow to the \texttt{y} argument of \texttt{\^}.
The function's behavior crucially depends on the test \texttt{trunc(y) == y}.
It is always true for integer arguments, but it is unlikely that the
analysis can determine this.
Therefore \texttt{\^} might return an \texttt{Int}, \texttt{BigInt}, or
\texttt{Float}, and we can only conclude that \texttt{A.\^{}A} is an
\texttt{Array}.
When function \texttt{f} is called, it performs a fairly expensive test
to see if every element is an integer.
Only through human reasoning about the whole program do we see that
this test is unnecessary.
However, when the analysis considers \texttt{f} applied to type
\texttt{Array}, our type information is likely to diverge further to
\texttt{$\top$}, since we don't know anything about the array's elements.
The problem is that the program is not written in terms of the
underlying value domain, even though it could be.
We might have written \texttt{isa(y,Integer)} instead of \texttt{trunc(y) == y}.
However, there are reasons that code like this might exist.
The programmer might not be aware of the type system, or the conditional
might refer to a property that was not originally reflected in the type system
but is later, as the language and libraries evolve.
Adding type declarations is not an ideal solution since it can restrict
polymorphism, and the programmer might not know the type of \texttt{A.\^{}A}
or even the type of \texttt{n}.
\iffalse
How can we fix this?
One solution is to add type annotations.
But the author of the original code does not know the type of
\texttt{A.\^{}A}, and might not even know the type of \texttt{n}.
%%%%% is this accurate?
Another solution is to improve the analysis.
But we will never finish adding cases to the compiler.
Perhaps we can handle \texttt{trunc(y) == y}, but will we also be
able to understand \texttt{trunc(y) == 1*y}?
\fi
Our solution is to give some simple tools to the library developer
(who, of course, is not necessarily a different person).
In Julia, one can implement \texttt{\^} and \texttt{f} as follows:
\begin{singlespace}
\begin{lstlisting}[language=julia]
function ^(x, y)
    # original code
end
function ^(x, y::Integer)
    if overflows(x, y)
        x = widen(x)
    end
    # ...
end
function f(a)
    # original code
end
function f{T<:Integer}(a::Array{T})
    # integer case
end
\end{lstlisting}
\end{singlespace}
\noindent
The syntax \texttt{y::Integer} for a function argument is a dispatch specification.
Now, from the same source program, we can conclude that only
\texttt{\^{}(x, y::Integer)} is applicable, so we know that the
result of \texttt{A.\^{}A} is some integer array.
The library writer can intercept this case with the definition
\texttt{f\{T<:Integer\}(a::Array\{T\})}\footnote{
Although the signature of this method happens to match one of the
lattice elements in figure~\ref{fig:arraylattice}, this is a coincidence.
Method applicability is determined dynamically.
}, avoiding the check \texttt{all(trunc(a) .== a)}.
Since we started with an integer array, the whole program behaves
exactly as before.
We have neither added restrictions, nor asked for redundant type annotations.
All we did is add a dispatch system, and encourage people to use it.
Using dispatch in this way is optional, but comes with benefits for
structuring programs.
For example, the function \texttt{f} above might perform much better
on integer arrays.
We can implement this case separately from the checks and extra logic
that might be needed for other data types, leading to cleaner code
and also making the core definition of \texttt{f} smaller and therefore
more likely to be inlined.
% TODO something about why we use an extensibility mechanism for collecting
% type info.
% 1 - the extra cases can be added by somebody else
% 2 - a library can pick a type to return to user code, and later
% ``intercept'' operations on it
The original version of this code can also be written in our system,
though it probably will not perform as well.
However, it might still perform better than one would expect, since the
functions that it calls are in turn defined using the same dispatch
mechanism.
This mechanism is expressive enough to be used down to the lowest levels of
the system, providing leverage one would not get from an object system
or performance add-on.
% reverse flow
Having type information attached to user or library definitions
increases the value of \emph{reverse} data flow analysis.
One way to describe why analyses of highly dynamic code diverge
is that everything in the program monotonically increases the
number of cases we need to consider; there is no way to
narrow possibilities.
But if we know that a function is only defined on type \texttt{T},
every value that passes through its argument list is narrowed to
\texttt{T}.
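For example (a minimal sketch, with \texttt{halve} and \texttt{g} as
hypothetical names), suppose the only method of \texttt{halve} accepts an
\texttt{Int}:
\begin{singlespace}
\begin{lstlisting}[language=julia]
halve(x::Int) = x >> 1   # the only definition of halve

function g(v)
    y = halve(v)   # if execution continues past this call, v was an Int,
                   # so v can be narrowed to Int in the code below
    return v + y
end
\end{lstlisting}
\end{singlespace}
\noindent
Even with no prior information about \texttt{v}, an analysis that propagates
information backwards from the call to \texttt{halve} can conclude that the
final addition operates on two \texttt{Int}s.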
% there seems to be some psychology to this: ``write method definitions''
% is somehow an easier performance model than ``use conditions only involving
% type checks''
% instead of requiring that the compiler be able to resolve things, we
% just want to make it more likely.
% isa(x,Int) is simple enough, but for more complicated checks it becomes
% much harder to predict. e.g. imagine testing for everything being the
% same type.
% there has to be a split between what is specialized on and what is not.
% in c++ this requires switching to templates.
% next we will consider how this dispatch system fits in
\section{Dispatch systems}
It would be unpleasant if every piece of every program we wrote were forced
to do only one specific task.
Every time we wanted to do something slightly different, we'd have to write
a different program.
But if a language allows the same program element to do different things at
different times, we can write whole classes of programs at once.
This kind of capability is one of the main reasons object-oriented programming
is popular: it provides a way to automatically select different behaviors
according to some structured criteria
(we use the non-standard term ``criteria'' deliberately, in order
to clarify our point of view, which is independent of any particular
object system).
However, in class-based OO there is essentially \emph{no way} to add a new
operation that dispatches on existing types without modifying their definitions.
This has been called ``the expression problem''~\cite{wadler1998expression}.
While many kinds of object-oriented programs can ignore or work around
this problem, technical programs cannot.
In this domain most programs deal with the same
few types (e.g.\ numbers and arrays), and might sensibly want to write new
operations that dispatch on them.
% The loss of encapsulation due to multimethods weighed in \cite{binarymethods}
% is less of a problem for technical computing, and in some cases even
% advantageous.
%Somewhat unfortunately, the term \emph{object-oriented} has many
%connotations, and the \emph{object-oriented} methodology tries to address
%multiple software engineering problems --- for example modularity,
%encapsulation, implementation hiding, and code reuse. These issues are
%important, but it may be because of them that over time too little
%emphasis has been placed on expressive power.
\subsection{A hierarchy of dispatch}
%The sophistication of the available ``selection criteria'' account for a
%large part of the perceived ``power'' or leverage provided by a language.
It is possible to illustrate a hierarchy of such mechanisms.
As an example, consider a simple simulation, and how it can be written
under a series of increasingly powerful paradigms. First, written-out
imperative code:
\vspace{-3ex}
\begin{singlespace}
\begin{verbatim}
while running
    for a in animals
        b = nearby_animal(a)
        if a isa Rabbit
            if b isa Wolf then run(a)
            if b isa Rabbit then mate(a,b)
        else if a isa Wolf
            if b isa Rabbit then eat(a,b)
            if b isa Wolf then follow(a,b)
        end
    end
end
\end{verbatim}
\end{singlespace}
We can see how this would get tedious as we add more kinds of animals
and more behaviors.
Another problem is that the animal behavior is
implemented directly inside the control loop, so it is hard to see
what parts are simulation control logic and what parts are animal
behavior.
Adding a simple object system leads to a nicer implementation\footnote{A
perennial problem with simple examples is that better implementations often
make the code longer.}:
\vspace{-3ex}
\begin{singlespace}
\begin{verbatim}
class Rabbit
    method encounter(b)
        if b isa Wolf then run()
        if b isa Rabbit then mate(b)
    end
end
class Wolf
    method encounter(b)
        if b isa Rabbit then eat(b)
        if b isa Wolf then follow(b)
    end
end
while running
    for a in animals
        b = nearby_animal(a)
        a.encounter(b)
    end
end
\end{verbatim}
\end{singlespace}
Here all of the simulation's animal behavior has been
compressed into a single program point: \texttt{a.encounter(b)}
leads to all of the behavior by selecting an implementation based
on the first argument, \texttt{a}.
This kind of criterion is essentially indexed lookup; we can imagine
that \texttt{a} could be an integer index into a table of operations.
The next enhancement to ``selection criteria'' adds a hierarchy
of behaviors, to provide further opportunities to avoid repetition.
Here \texttt{A<:B} is used to declare a subclass relationship; it
says that an \texttt{A} is a kind of \texttt{B}:
\vspace{-3ex}
\begin{singlespace}
\begin{multicols}{2}
\begin{verbatim}
abstract class Animal
    method nearby()
        # search within some radius
    end
end
class Rabbit <: Animal
    method encounter(b::Animal)
        if b isa Wolf then run()
        if b isa Rabbit then mate(b)
    end
end
class Wolf <: Animal
    method encounter(b::Animal)
        if b isa Rabbit then eat(b)
        if b isa Wolf then follow(b)
    end
end
while running
    for a in animals
        b = a.nearby()
        a.encounter(b)
    end
end
\end{verbatim}
\end{multicols}
\end{singlespace}
We are still essentially doing table lookup, but the tables have
more structure: every \texttt{Animal} has the \texttt{nearby}
method, and can inherit a general purpose implementation.
This brings us roughly to the level of most popular object-oriented
languages.
But still more can be done.
Notice that in the first transformation we replaced one level of \texttt{if}
statements with method lookup.
However, inside of these methods a structured set of \texttt{if} statements
remains.
We can replace these by adding another level of dispatch.
\vspace{-3ex}
\begin{singlespace}
\begin{verbatim}
class Rabbit <: Animal
    method encounter(b::Wolf) = run()
    method encounter(b::Rabbit) = mate(b)
end
class Wolf <: Animal
    method encounter(b::Rabbit) = eat(b)
    method encounter(b::Wolf) = follow(b)
end
\end{verbatim}
\end{singlespace}
We now have a \emph{double dispatch} system, where a method call
uses two lookups, first on the first argument and then on the
second argument.
This syntax might be considered a bit nicer, but the design
raises a question: why is $n=2$ special?
It isn't, and we could consider even more method arguments as part of
dispatch.
But at that point, why is the first argument special?
Why separate methods in a special way based on the first argument?
It seems arbitrary, and indeed we can remove the special treatment:
\vspace{-3ex}
\begin{singlespace}
\begin{verbatim}
abstract class Animal
end
class Rabbit <: Animal
end
class Wolf <: Animal
end
nearby(a::Animal) = # search
encounter(a::Rabbit, b::Wolf) = run(a)
encounter(a::Rabbit, b::Rabbit) = mate(a,b)
encounter(a::Wolf, b::Rabbit) = eat(a, b)
encounter(a::Wolf, b::Wolf) = follow(a, b)
while running
    for a in animals
        b = nearby(a)
        encounter(a, b)
    end
end
\end{verbatim}
\end{singlespace}
Here we made two major changes: the methods have been moved ``outside''
of any classes, and all arguments are listed explicitly.
This is sometimes called \emph{external dispatch}.
This change has significant implications.
Since methods no longer need to be ``inside'' classes, there is no syntactic
limit to where definitions may appear.
Now it is easier to add new methods after a class has been defined.
Methods also now naturally operate on combinations of objects, not single objects.
%There may be software engineering reasons to want ``ownership'' of methods
%by objects, but strictly speaking this coupling does not seem correct.
%It ought to be possible to define function behaviors independently of
%data hiding concerns.
The shift to thinking about combinations of objects is fairly revolutionary.
Many interesting properties only apply to combinations of objects, and not
individuals.
We are also now free to think of more exotic kinds of combinations.
We can define a method for \emph{any number} of objects:
\begin{verbatim}
encounter(ws::Wolf...) = pack(ws)
\end{verbatim}
\noindent
We can also abstract over more subtle properties, like whether the
arguments are two animals of the same type:
\begin{verbatim}
encounter{T<:Animal}(a::T, b::T) = mate(a, b)
\end{verbatim}
\noindent
Some systems push dispatch expressiveness even further.
% TODO more
\subsection{Predicate dispatch}
%Patterns are very powerful, but the tradeoff is that there is not
%necessarily a useful relationship between what your program does and
%what a static analysis (based on a finite-height partial order over
%patterns) can discover. Maybe julia could be considered a sweet spot
%somewhere in between.
Predicate dispatch is a powerful object-oriented mechanism that allows
methods to be selected based on arbitrary predicates \cite{ErnstKC98}.
It is, in some sense, the most powerful \emph{possible} dispatch system,
since any computation may be done as part of method selection.
Since a predicate denotes a set (the set of values for which it is true),
it also denotes a set-theoretic type.
Some type systems of this kind, notably that of Common
Lisp~\cite{steele1990common:types}, have actually included predicates as types.
However, such a type system is obviously undecidable, since it
requires computing the predicates themselves or, even worse, computing
predicate implication.\footnote{
Many type systems involving bounded quantification, such as system $F_{<:}$,
are already undecidable \cite{Pierce1994131}.
However, they seem to terminate for most practical programs, and also admit
minor variations that yield decidable systems \cite{Castagna:1994:DBQ:174675.177844}.
It is fair to say they are ``just barely'' undecidable, while predicates
are ``very'' undecidable.
}
For a language that is willing to do run time type checks anyway, the
undecidability of predicate dispatch is not a problem.
Interestingly, it can also pose no problem for \emph{static} type systems
that wish to prove that every call site has an applicable method.
Even without evaluating predicates, one can prove that the available methods
are exhaustive (e.g.\ methods for both $p$ and $\neg p$ exist).
In contrast, and most relevant to this thesis, predicate types \emph{are} a
problem for code \emph{specialization}.
Static method lookup would require evaluating the predicates, and optimal code
generation would require understanding something about what the predicates mean.
One approach would be to include a list of satisfied predicates in a type.
However, to the extent such a system is decidable, it is essentially equivalent
to multiple inheritance.
Another approach would be to separate predicates into a second ``level'' of the
type system.
The compiler would combine methods with the same ``first level'' type, and then
generate branches to evaluate the predicates.
Such a system would be useful, and could be
combined with a language like Julia or, indeed, most object-oriented
languages (this has been done for Java~\cite{Millstein:2009:EMP:1462166.1462168}).
However, this comes at the expense of making predicates second-class
citizens of the type system.
In considering the problems of predicate dispatch for code specialization,
we seem to be up against a fundamental obstacle: some sets of values are
simply more robust under evaluation than others.
Programs that map integers to integers abound, but programs that map, say,
even integers to even integers are rare to the point of irrelevance.
With predicate dispatch, the first version of the code in
section~\ref{sec:programsanddomains} could have been rearranged to use
dispatch instead of \texttt{if} statements.
This might have advantages for readability and extensibility, but not for
performance.
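For concreteness, such a rearrangement might look like the following
(hypothetical syntax, not valid Julia; the guard after \texttt{when} is an
arbitrary predicate evaluated during method selection):
\vspace{-3ex}
\begin{singlespace}
\begin{verbatim}
^(x, y) when trunc(y) == y       = # repeated squaring, widening on overflow
^(x, y)                          = # floating point algorithm
f(a)    when all(trunc(a) .== a) = # integer case
f(a)                             = # general case
\end{verbatim}
\end{singlespace}
Each branch of the original \texttt{if} becomes a separate method, but
selecting among these methods still requires evaluating the same predicates
at run time, so nothing is gained for specialization.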
\subsection{Symbolic programming}
Systems based on symbolic rewrite rules arguably occupy a further tier of
dispatch sophistication.
In these systems, you can dispatch on essentially anything, including arbitrary
values and structures.
Depending on details, their power is roughly equal to that of predicate
dispatch.
%These systems are typically powerful enough to concisely define the kinds of
%behaviors we are interested in.
However, symbolic programming lacks data abstraction: the concrete
representations of values are exposed to the dispatch system.
In such a system, there is no difference between being a list and being
something \emph{represented} as a list.
If the representation of a value changes, the value can be inadvertently
``captured'' by a dispatch rule that was not intended to apply to it,
violating abstraction.
% we use 2-part values instead; symbolic part and data part effectively
There has always been a divide between ``numeric'' and ``symbolic''
languages in the world of technical computing.
To many people the distinction is fundamental, and we should happily live
with both kinds of languages.
But if one insists on an answer as to which approach is the right one,
then the answer is: symbolic.
Programs are ultimately symbolic artifacts.
Computation is limited only by our ability to describe it, and
symbolic tools are needed to generate, manipulate, and query these
descriptions.
For example in numerical computing, a successful approach has been to
create high-performance kernels for important problems.
From there, the limiting factor is knowing \emph{when} to use each
kernel, which can depend on many factors from problem structure to
data size.
Symbolic approaches can be used to automate this.
We will see some examples of this in chapter~\ref{chap:casestudies}.
\subsection{Choices, choices}
\label{sec:choices}
\newcommand{\chk}{{\Large \checkmark}}
\begin{table}[!t]
\begin{center}
\begin{tabular}{|c||c|c|c|c|c|c|}
\hline
& Domain & Dynamic & Spec. & Pattern & S.T.S. & S.C. \\
\hline
\hline
Methods & $O(1)$ & & & & \chk & \chk \\
\hline
Virtual methods & $O(1)$ & \chk & & & \chk & \chk \\
\hline
Overloading & $O(n)$ & & & & \chk & \chk \\
\hline
Templates & $O(n)$ & & \chk & & \chk & \\
\hline
Closures & $O(1)$ & \chk & & & \chk & \chk \\
\hline
Duck typing & $O(1)$ & \chk & & & & \chk \\
\hline
Multimethods & $O(n)$ & \chk & & & ? & ? \\
\hline
Predicate dispatch & $O(n m)$ & \chk & & \small{1} & ? & ? \\
\hline
Typeclasses & $O(m)$ & \small{2} & \small{3} & & \chk & \chk \\
\hline
Term rewriting & $O(n m)$ & \chk & & \chk & & \\
\hline
Julia & $O(n m)$ & \chk & \chk & & ? & \\
\hline
\end{tabular}
\end{center}
\begin{singlespace}
\caption[Attributes of code selection features]{
\small{
Attributes of several code selection features.
Spec.\ stands for specialization.
S.T.S.\ stands for statically type safe.
S.C.\ stands for separate compilation.
1.\ Depending on design details, 2.\ When combined with existential types,
3.\ Optionally, with declarations.
% no, because it can only pattern match on types
%4.\ With the \texttt{FlexibleInstances} option.
}
}
\label{table:dispatch}
\end{singlespace}
\end{table}
% message of the table: you have to know a really large amount to
% pick which one of these to use. there is no best one.
% ``domain'' languages are all about avoiding knowledge of this table.
Table~\ref{table:dispatch} compares 11 language features.
Each provides some sort of control flow indirection, packaged into a
structure designed to facilitate reasoning (ideally human reasoning,
but often the compiler's reasoning is prioritized).
The ``domain'' column describes the amount of information considered
by the dispatch system, where $n$ is the number of arguments and
$m$ is the amount of relevant information per argument.
$O(1)$ means each use of the feature considers basically the same
amount of information.
$O(n)$ extends the process to every argument.
$O(m)$ means that one value is considered, but its structure is
taken into account.
Squares with question marks are either not fully understood, or too
sensitive to design details to answer definitively.
%% it may be that the ``power'' of a language is measured by the complexity
%% of the criteria used by the language's run time dispatch mechanisms.
This table illustrates how many combinations of dispatch semantics have
been tried.
Keeping track of these distinctions is a distraction when one is focused
on a non-programming problem.
Including more than one row of this table makes a language especially
complex and difficult to learn.
% compare to julia tradeoffs
\section{Subtyping}
\label{sec:chap3subtyping}
So far we have seen some reasons why dispatch contributes significantly
to flexibility and performance.
However we have only actually dispatched on fairly simple properties like
whether a value is an integer.
How powerful should dispatch be, exactly?
Each method signature describes some set of values to which it applies.
Some signatures might be more specific than others.
The theory governing such properties is subtyping.
% TODO
% informal convexity property. ``any # of integers'' is ok, but
% ``any # of integers except 3'' is not.
%% reflect on level of power: this dispatch system is both more and less
%% powerful than previous ones in various ways.
% normally this is used for type safety
% in our case it is used to form an ``analyzable subset'' of the language
It turns out that a lot of relevant work on this has been done in the
context of type systems for XML~\cite{hosoya2000xduce, BCF03}.
%Something about semantic subtyping and type systems for processing XML.
XML at first seems unrelated to numerical computing, and indeed it
was quite a while before we discovered these papers and noticed the
overlap.
However if one substitutes ``symbolic expression'' for ``XML document'',
the similarity becomes clearer.
In XML processing, programs match documents against patterns in order
to extract information of interest or validate document structure.
These patterns resemble regular expressions, and so also denote sets.
%and admit a subset (subtype) relation.
In our context, some argument lists and the properties of some data
structures are sufficiently complex to warrant such a treatment.
For example, consider a \texttt{SubArray} type that describes a
selection or ``view'' of part of another array.
Here is part of its definition in Julia's standard library:
\begin{singlespace}
\begin{lstlisting}[language=julia]
const ViewIndex = Union(Int, Colon, Range{Int}, UnitRange{Int},
                        Array{Int,1})
immutable SubArray{T, N, P<:AbstractArray,
                   I<:Tuple{ViewIndex...}} <: AbstractArray{T,N}
\end{lstlisting}
\end{singlespace}
\noindent
The properties of the \texttt{SubArray} type are declared within curly
braces.
A \texttt{SubArray} has an element type, number of dimensions,
underlying storage type, and a tuple of indexes (one index per dimension).
Limiting indexes to the types specified by \texttt{ViewIndex}
documents what kind of indexes can be supported efficiently.
Different index tuples can yield drastically different performance
characteristics.
Without the ability to describe these properties at the type level,
it would be difficult to implement an efficient \texttt{SubArray}.
In section~\ref{sec:programsanddomains} we only needed to test fairly
simple conditions, but the checks here would involve looping over
indexes to determine which dimensions to drop, or to determine whether
stride-1 algorithms can be used, and so on.
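With the index tuple reflected in the type, such questions can instead be
answered by dispatch.
The following is a minimal sketch (a hypothetical helper, not part of the
standard library) written against the \texttt{SubArray} definition above:
\begin{singlespace}
\begin{lstlisting}[language=julia]
# Sketch only: a view whose first index is a unit range and whose remaining
# indexes are scalars is contiguous along its first dimension (assuming a
# dense parent), so a stride-1 algorithm applies.
isstride1{T,N,P,I<:Tuple{UnitRange{Int},Vararg{Int}}}(
    v::SubArray{T,N,P,I}) = true
isstride1(v::SubArray) = false
\end{lstlisting}
\end{singlespace}
\noindent
An optimized method can be written against the first signature and a generic
fallback against the second, with no run time loop over the indexes needed to
choose between them.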
% TODO more
% our language is in many ways dual to ML. that family shuns subtyping,
% but in the lattice theoretic model it's inescapable.
%\cite{hindley1969principal, MLtypeinf}
\section{Specialization}
\subsection{Parametric vs.\ ad hoc polymorphism}
The term \emph{polymorphism} refers generally to reuse of code or data
structures in different contexts.
A further distinction is often made between \emph{parametric} polymorphism
and \emph{ad hoc} polymorphism.
Parametric polymorphism refers to code that is reusable for many types
because its behavior does not depend on argument types (for example,
the identity function).
%reusing the \emph{same} code for different purposes, while
Ad hoc polymorphism refers to selecting
\emph{different} code for different circumstances.
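For example (a minimal sketch; the function names are illustrative only):
\begin{singlespace}
\begin{lstlisting}[language=julia]
# Parametric polymorphism: one definition whose behavior does not
# depend on the type of its argument.
ident(x) = x

# Ad hoc polymorphism: different code is selected for different
# argument types.
describe(x::Integer) = "an integer"
describe(x::Real)    = "some other kind of real number"
describe(x)          = "something else entirely"
\end{lstlisting}
\end{singlespace}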