Generator Enhancements
As of late October 2016 (https://github.com/halide/Halide/pull/1523), Halide Generators have been enhanced to:
- Improve readability and flexibility of Generators
- Provide machine-generated Stubs that make it easier for one Generator to use another
- Make integration with the Autoscheduler easier and more reliable
Note that none of these changes break existing Generators: all existing Generators should continue to work as-is for the foreseeable future.
This document is meant to capture the nature of the changes and describe how to "upgrade" a Generator to use the new enhancements.
Param<> continues to exist, but Generators can now use a new class, Input<>, instead. For scalar types, these can be considered essentially identical to Param<>, but have a different name for reasons of code clarity, as we'll see later.
Similarly, ImageParam continues to exist, but Generators can instead use an Input<Func>. This is essentially like an ImageParam, the main difference being that it may or may not be backed by an actual buffer, and thus has no defined extents.
Input<Func> input{"input", Float(32), 2};
The equivalent of an ImageParam backed by an actual buffer can be created with an Input<Buffer<T>>, like this:
ImageParam input{UInt(8), 2, "input"};
becomes:
Input<Buffer<uint8_t>> input{"input", 2};
This allows you (in contrast to an Input<Func>) to access the width and height of the buffer through input.dim(0).extent() and input.dim(1).extent().
It is an error for a Generator to declare both Input<> and Param<> or ImageParam (i.e., if you use Input<>, you may not use the previous syntax).
Note that Input<> is intended only for use with Generator, and is not intended for use in other Halide code; in particular, it is not intended to replace Param<>, except inside Generators.
Example:
class SumColumns : public Generator<SumColumns> {
public:
    ImageParam input{Float(32), 2, "input"};
    Func build() {
        RDom r(0, input.width());
        Func f;
        Var y;
        f(y) = 0.f;
        f(y) += input(r.x, y);
        return f;
    }
};
becomes
class SumColumns : public Generator<SumColumns> {
public:
    Input<Func> input{"input", Float(32), 2};
    Input<int32_t> width{"width"};
    Func build() {
        RDom r(0, width);
        Func f;
        Var y;
        f(y) = 0.f;
        f(y) += input(r.x, y);
        return f;
    }
};
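Because an Input<Func> carries no extents, the reduction bound has to arrive as its own input. When checking such a Generator's output, a plain-C++ reference of the same computation can serve as a test oracle. The sketch below is not Halide code. The function name `sum_columns_reference` and the row-major `input[y][x]` layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain-C++ reference for what SumColumns computes (not Halide code).
// input(x, y) is stored as input[y][x]; `width` plays the role of the
// Input<int32_t> width, since an Input<Func> has no extents of its own.
std::vector<float> sum_columns_reference(const std::vector<std::vector<float>>& input,
                                         int width) {
    std::vector<float> f(input.size(), 0.f);                  // f(y) = 0.f;
    for (std::size_t y = 0; y < input.size(); ++y) {
        assert(static_cast<int>(input[y].size()) >= width);   // RDom must stay in bounds
        for (int rx = 0; rx < width; ++rx) {                   // RDom r(0, width)
            f[y] += input[y][rx];                              // f(y) += input(r.x, y);
        }
    }
    return f;
}
```

Passing a smaller `width` sums only a prefix of each row, just as a smaller RDom would.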
You can optionally leave the type and/or dimensions of Input<Func> unspecified, in which case they are simply inferred from the actual Funcs passed in. Of course, if you specify an explicit Type or Dimension, the input Func is still required to match, or a compilation error results.
Input<Func> input{ "input", 3 }; // require 3-dimensional Func,
// but leave Type unspecified
When a Generator using Input<Func> is compiled directly (e.g., using GenGen), the Input<Func> must be concretely specified; if Type and/or Dimensions are unspecified, you can specify them using implicit GeneratorParams with names derived from the Input or Output. (In the example above, input has an implicit GeneratorParam named "input.type" and an implicit GeneratorParam named "input.dim".)
All of a Generator's inputs can be determined by introspecting its members, but information about its outputs could previously only be determined by calling its build() method and examining the return value (which may be a Func or a Pipeline).
With this change, a Generator can instead explicitly declare its output(s) as member variables, and provide a generate() method instead of a build() method. (These are equivalent aside from the fact that generate() does not return a value.)
Example:
class SumColumns : public Generator<SumColumns> {
public:
    Input<Func> input{"input", Float(32), 2};
    Input<int32_t> width{"width"};
    Func build() {
        RDom r(0, width);
        Func f;
        Var y;
        f(y) = 0.f;
        f(y) += input(r.x, y);
        return f;
    }
};
becomes
class SumColumns : public Generator<SumColumns> {
public:
    Input<Func> input{"input", Float(32), 2};
    Input<int32_t> width{"width"};
    Output<Func> sum_cols{"sum_cols", Float(32), 1};
    void generate() {
        RDom r(0, width);
        Var y;
        sum_cols(y) = 0.f;
        sum_cols(y) += input(r, y);
    }
};
As with Input<Func>, you can optionally leave the type and/or dimensions of an Output<Func> unspecified; any unspecified types must be resolved via an implicit GeneratorParam in order to use top-level compilation.
Note that Output<> is intended only for use with Generator, and is not intended for use in other Halide code.
The Generator infrastructure will verify (after calling generate()) that all outputs are defined, and have definitions that match the declaration.
You can specify an output that returns a Tuple by specifying a list of Types:
class Tupler : public Generator<Tupler> {
public:
    Input<Func> input{"input", Int(32), 2};
    Output<Func> output{"output", {Float(32), UInt(8)}, 2};
    void generate() {
        Var x, y;
        output(x, y) = Tuple(cast<float>(input(x, y)), cast<uint8_t>(input(x, y)));
    }
};
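A plain-C++ reference for the Tupler output makes the per-element semantics concrete. This sketch assumes that Halide's narrowing cast from int32 to uint8 keeps the low 8 bits, as a C++ conversion to an unsigned type does. `tupler_reference` and `TupleElem` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Reference for Tupler (not Halide code): each (x, y) yields a pair of
// float and uint8_t views of the same int32 value. Conversion to uint8_t
// keeps the value modulo 256, which is well-defined in C++.
using TupleElem = std::pair<float, std::uint8_t>;

std::vector<std::vector<TupleElem>> tupler_reference(
        const std::vector<std::vector<std::int32_t>>& input) {
    std::vector<std::vector<TupleElem>> out;
    for (const auto& row : input) {
        std::vector<TupleElem> out_row;
        for (std::int32_t v : row) {
            out_row.emplace_back(static_cast<float>(v), static_cast<std::uint8_t>(v));
        }
        out.push_back(out_row);
    }
    return out;
}
```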
A Generator can define multiple outputs (which is quietly implemented as a Pipeline under the hood):
class SumRowsAndColumns : public Generator<SumRowsAndColumns> {
public:
    Input<Func> input{"input", Float(32), 2};
    Input<int32_t> width{"width"};
    Input<int32_t> height{"height"};
    Output<Func> sum_rows{"sum_rows", Float(32), 1};
    Output<Func> sum_cols{"sum_cols", Float(32), 1};
    void generate() {
        RDom rc(0, height);
        Var x;
        sum_rows(x) = 0.f;
        sum_rows(x) += input(x, rc);
        RDom rr(0, width);
        Var y;
        sum_cols(y) = 0.f;
        sum_cols(y) += input(rr, y);
    }
};
We also allow you to specify Output for any scalar type (except for Handle types); this is merely syntactic sugar on top of a zero-dimensional Func, but can be quite handy, especially when used with multiple outputs:
class Sum : public Generator<Sum> {
public:
    Input<Func> input{"input", Float(32), 2};
    Input<int32_t> width{"width"};
    Input<int32_t> height{"height"};
    Output<Func> sum_rows{"sum_rows", Float(32), 1};
    Output<Func> sum_cols{"sum_cols", Float(32), 1};
    Output<float> sum{"sum"};
    void generate() {
        RDom rc(0, height);
        Var x;
        sum_rows(x) = 0.f;
        sum_rows(x) += input(x, rc);
        RDom rr(0, width);
        Var y;
        sum_cols(y) = 0.f;
        sum_cols(y) += input(rr, y);
        RDom r(0, width, 0, height);
        sum() = 0.f;
        sum() += input(r.x, r.y);
    }
};
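Each output of Sum is an independent reduction over the same input, and the scalar output is just a single value. A plain-C++ reference for all three outputs (hypothetical names, row-major `input[y][x]` storage assumed) makes it easy to check that they agree with each other:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference for the three outputs of Sum (not Halide code). input(x, y) is
// stored as input[y][x]; sum_rows(x) reduces over y, sum_cols(y) reduces
// over x, and the scalar sum() reduces over both.
struct SumOutputs {
    std::vector<float> sum_rows;  // one value per x
    std::vector<float> sum_cols;  // one value per y
    float sum = 0.f;              // the zero-dimensional Output<float>
};

SumOutputs sum_reference(const std::vector<std::vector<float>>& input,
                         int width, int height) {
    assert(static_cast<int>(input.size()) >= height);
    SumOutputs out;
    out.sum_rows.assign(static_cast<std::size_t>(width), 0.f);
    out.sum_cols.assign(static_cast<std::size_t>(height), 0.f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float v = input[y][x];
            out.sum_rows[x] += v;
            out.sum_cols[y] += v;
            out.sum += v;
        }
    }
    return out;
}
```

The entries of `sum_rows` and of `sum_cols` should each add up to `sum`, up to floating-point rounding. This is a cheap consistency check when testing a multi-output Generator.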
Note that it is an error to define both a build() and a generate() method.
You can also use the new syntax to declare an array of Input or Output, by using an array type as the type parameter:
// Takes exactly 3 images and outputs exactly 3 (scalar) sums.
class SumRowsAndColumns : public Generator<SumRowsAndColumns> {
public:
    Input<Func[3]> inputs{"inputs", Float(32), 2};
    Input<int32_t[2]> extents{"extents"};
    // Each sum is a zero-dimensional Func, so the declared dimension is 0.
    Output<Func[3]> sums{"sums", Float(32), 0};
    void generate() {
        assert(inputs.size() == sums.size());  // requires <cassert>
        // Assume all inputs have the same extent.
        Expr width = extents[0];
        Expr height = extents[1];
        for (size_t i = 0; i < inputs.size(); ++i) {
            RDom r(0, width, 0, height);
            sums[i]() = 0.f;
            sums[i]() += inputs[i](r.x, r.y);
        }
    }
};
You can also leave array size unspecified, with some caveats:
- For ahead-of-time compilation, Inputs must have a concrete size specified via a GeneratorParam at build time (e.g., pyramid.size=3)
- For JIT compilation via a Stub, Input array sizes will be inferred from the vector passed.
- For ahead-of-time compilation, Outputs may specify a concrete size via a GeneratorParam at build time (e.g., pyramid.size=3), or the size can be specified via a resize() method.
class Pyramid : public Generator<Pyramid> {
public:
    GeneratorParam<int32_t> levels{"levels", 10};
    Input<Func> input{"input", Float(32), 2};
    Output<Func[]> pyramid{"pyramid", Float(32), 2};
    void generate() {
        pyramid.resize(levels);
        pyramid[0](x, y) = input(x, y);
        for (size_t i = 1; i < pyramid.size(); i++) {
            pyramid[i](x, y) = (pyramid[i-1](2*x, 2*y) +
                                pyramid[i-1](2*x+1, 2*y) +
                                pyramid[i-1](2*x, 2*y+1) +
                                pyramid[i-1](2*x+1, 2*y+1)) / 4;
        }
    }
    Var x, y;
};
An array Input/Output with unspecified size must be resolved to a concrete size for top-level compilation; there are now implicit GeneratorParam<size_t> that allow you to set this, based on the name ("pyramid.size" in the example above).
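The pyramid recurrence is easier to check against a small reference. The sketch below is plain C++, not Halide, and `pyramid_reference` and `Image` are hypothetical names. It assumes each level is cropped to half the previous size (rounded down). The Halide Funcs, by contrast, are defined over an unbounded domain, and the caller chooses the region to compute.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Image = std::vector<std::vector<float>>;  // stored as img[y][x]

// Reference for Pyramid (not Halide code): level 0 is the input, and each
// further level averages 2x2 blocks of the previous level. Only pixels whose
// 2x2 source block lies inside the previous level are computed.
std::vector<Image> pyramid_reference(const Image& input, std::size_t levels) {
    assert(levels >= 1);
    std::vector<Image> pyramid;
    pyramid.push_back(input);
    for (std::size_t i = 1; i < levels; ++i) {
        const Image& prev = pyramid[i - 1];
        std::size_t h = prev.size() / 2;
        std::size_t w = prev.empty() ? 0 : prev[0].size() / 2;
        Image next(h, std::vector<float>(w));
        for (std::size_t y = 0; y < h; ++y) {
            for (std::size_t x = 0; x < w; ++x) {
                next[y][x] = (prev[2 * y][2 * x] + prev[2 * y][2 * x + 1] +
                              prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4;
            }
        }
        pyramid.push_back(std::move(next));  // `prev` is not used after this point
    }
    return pyramid;
}
```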
Note that both Input and Output arrays support a limited subset of the methods from std::vector<>:
- operator[]
- size()
- begin()
- end()
- resize() (Output only)
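To make the restricted interface concrete, here is a hypothetical C++ class that exposes only the operations listed above. It is not Halide's actual implementation. `resize()` is available on outputs only, so calling it on an input fails at C++ compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in (not Halide's class) for the vector-like subset
// exposed by Input<T[]> and Output<T[]>.
template <typename T, bool IsOutput>
class ArraySubset {
public:
    T& operator[](std::size_t i) { assert(i < items_.size()); return items_[i]; }
    std::size_t size() const { return items_.size(); }
    typename std::vector<T>::iterator begin() { return items_.begin(); }
    typename std::vector<T>::iterator end() { return items_.end(); }

    // Only participates in overload resolution when IsOutput is true;
    // calling it on an input array is a compile-time error.
    template <bool B = IsOutput, typename = std::enable_if_t<B>>
    void resize(std::size_t n) { items_.resize(n); }

private:
    std::vector<T> items_;
};

template <typename T> using InputArray = ArraySubset<T, false>;
template <typename T> using OutputArray = ArraySubset<T, true>;
```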
A Generator can now split the existing build() method into two methods:
void generate() { ... }
void schedule() { ... }
Such a Generator must move all scheduling code for intermediate Func into the schedule() method. Note that this means that schedulable Func, Var, etc. will need to be stored as member variables of the Generator. (Since Output<> are required to be declared as member variables, these are simple enough, but intermediate Func that need scheduling may have to be moved from local variables into members.)
Example:
class Example : public Generator<Example> {
public:
    Output<Func> output{"output", Float(32), 2};
    void generate() {
        Var x, y;
        Func intermediate;
        intermediate(x, y) = SomeExpr(x, y);
        output(x, y) = intermediate(x, y);
        intermediate.compute_at(output, y);
    }
};
becomes
class Example : public Generator<Example> {
public:
    Output<Func> output{"output", Float(32), 2};
    void generate() {
        intermediate(x, y) = SomeExpr(x, y);
        output(x, y) = intermediate(x, y);
    }
    void schedule() {
        intermediate.compute_at(output, y);
    }
    Func intermediate;
    Var x, y;
};
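The contract behind the generate()/schedule() split can be modeled in a few lines of plain C++. This is a hypothetical CRTP sketch, not Halide's implementation. The driver calls generate() and then schedule(), so any state they share (here, a string standing in for a Func) has to live in a member variable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical driver: always runs generate() before schedule().
template <typename Derived>
class TwoPhase {
public:
    void build_pipeline() {
        Derived& self = static_cast<Derived&>(*this);
        self.generate();   // define the algorithm
        self.schedule();   // then schedule what generate() defined
    }
};

class Example : public TwoPhase<Example> {
public:
    std::vector<std::string> events;
    std::string intermediate;  // stands in for the member `Func intermediate;`

    void generate() {
        intermediate = "intermediate";  // "define" the intermediate stage
        events.push_back("generate");
    }
    void schedule() {
        // Only possible because `intermediate` outlives generate() as a member.
        assert(!intermediate.empty());
        events.push_back("schedule " + intermediate);
    }
};
```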
Note that the output Func doesn't have a scheduling directive for compute_at() or store_at() in either case: it is either implicitly compute_root() (when being compiled directly into a filter), or explicitly scheduled by its caller (when being used as a subcomponent, as we'll see later).
Even if the intermediate Halide code doesn't need any scheduling (e.g., it's all inline), you should still provide an empty schedule() method to make this fact obvious and clear.
Example:
class ExampleInline : public Generator<ExampleInline> {
public:
    Output<Func> output{"output", Float(32), 2};
    void generate() {
        Var x, y;
        output(x, y) = SomeExpr(x, y);
    }
};
becomes
class ExampleInline : public Generator<ExampleInline> {
public:
    Output<Func> output{"output", Float(32), 2};
    void generate() {
        output(x, y) = SomeExpr(x, y);
    }
    void schedule() {
        // empty
    }
    Var x, y;
};
GeneratorParam is now augmented by the new ScheduleParam type. All generator params that are intended to be used by the schedule() method should be declared as ScheduleParam rather than GeneratorParam. This has two purposes:
- It allows a declarative way to enumerate and communicate scheduling information between arbitrary Generators (as we'll see later).
- It makes clear which GeneratorParams are used for scheduling, which will aid future Autoscheduler work.
Note that there are common GeneratorParam conventions that already act as ScheduleParam (most notably, vectorize and parallelize); this merely formalizes the previous convention.
GeneratorParam and ScheduleParam continue to live inside a single namespace (i.e., it is an error to declare a GeneratorParam and a ScheduleParam with the same name).
While a GeneratorParam can be used from anywhere inside a Generator (either the generate() or schedule() method), a ScheduleParam should be accessed only within the schedule() method. (We'd like to make this a compile-time error in the future.)
Note that while GeneratorParam continues to be serializable to and from strings, some ScheduleParam values are not serializable, as they may reference runtime-only Halide structures (most notably, LoopLevel, which cannot be reliably specified by name in the general case). Attempting to set such a ScheduleParam from GenGen will cause a compile-time error.
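The two rules above can be sketched as a small registry: GeneratorParam and ScheduleParam share one namespace, and non-serializable values cannot be set from strings. Everything here is hypothetical. `ParamRegistry` is not a Halide class, and it reports violations by throwing at run time, whereas Halide reports them when the Generator is compiled.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model: one namespace for all params, and params that are not
// string-serializable (e.g. a LoopLevel referring to a specific Func and Var)
// reject GenGen-style "name=value" assignment.
class ParamRegistry {
public:
    void declare(const std::string& name, bool serializable) {
        if (!params_.emplace(name, Param{serializable, ""}).second)
            throw std::invalid_argument("duplicate param name: " + name);
    }
    void set_from_string(const std::string& name, const std::string& value) {
        auto it = params_.find(name);
        if (it == params_.end())
            throw std::invalid_argument("unknown param: " + name);
        if (!it->second.serializable)
            throw std::invalid_argument("param cannot be set from a string: " + name);
        it->second.value = value;
    }
    std::string get(const std::string& name) const { return params_.at(name).value; }

private:
    struct Param { bool serializable; std::string value; };
    std::map<std::string, Param> params_;
};
```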
Example:
class Example : public Generator<Example> {
public:
    GeneratorParam<int32_t> iters{"iters", 10};
    GeneratorParam<bool> vectorize{"vectorize", true};
    Func build() {
        Var x, y;
        std::vector<Func> intermediates;
        for (int i = 0; i < iters; ++i) {
            Func g;
            // The first stage computes SomeExpr; each later stage applies
            // SomeExpr2 to the previous stage (a Func can't refer to itself here).
            g(x, y) = (i == 0) ? SomeExpr(x, y) : SomeExpr2(intermediates.back()(x, y));
            intermediates.push_back(g);
        }
        Func f;
        f(x, y) = intermediates.back()(x, y);
        // Schedule
        for (auto fi : intermediates) {
            fi.compute_at(f, y);
            if (vectorize) fi.vectorize(x, natural_vector_size<float>());
        }
        return f;
    }
};
becomes
class Example : public Generator<Example> {
public:
    GeneratorParam<int32_t> iters{"iters", 10};
    ScheduleParam<bool> vectorize{"vectorize", true};
    Output<Func> output{"output", Float(32), 2};
    void generate() {
        for (int i = 0; i < iters; ++i) {
            Func g;
            g(x, y) = (i == 0) ? SomeExpr(x, y) : SomeExpr2(intermediates.back()(x, y));
            intermediates.push_back(g);
        }
        output(x, y) = intermediates.back()(x, y);
    }
    void schedule() {
        for (auto fi : intermediates) {
            fi.compute_at(output, y);
            if (vectorize) fi.vectorize(x, natural_vector_size<float>());
        }
    }
    Var x, y;
    std::vector<Func> intermediates;
};
Note that ScheduleParam can have other interesting values too, most notably LoopLevel:
class Example : public Generator<Example> {
public:
    // Specify a LoopLevel at which we want intermediate Func(s)
    // to be computed and/or stored. (Each param needs a unique name.)
    ScheduleParam<LoopLevel> intermediate_compute_level{"intermediate_compute_level", "undefined"};
    ScheduleParam<LoopLevel> intermediate_store_level{"intermediate_store_level", "root"};
    Output<Func> output{"output", Float(32), 2};
    void generate() {
        intermediate(x, y) = SomeExpr(x, y);
        output(x, y) = intermediate(x, y);
    }
    void schedule() {
        intermediate
            // If intermediate_compute_level is undefined,
            // default to computing at output's rows
            .compute_at(intermediate_compute_level.defined() ?
                        intermediate_compute_level :
                        LoopLevel(output, y))
            .store_at(intermediate_store_level);
    }
    Func intermediate;
    Var x, y;
};
Note that ScheduleParam<LoopLevel> can default to "root", "inline", or "undefined"; all other values (e.g., Func-and-Var) must be specified in actual code. (It is explicitly not possible to specify LoopLevel(Func, Var) by name, e.g. "func.var"; although Halide uses such a convention internally, it is not currently possible to guarantee unique Func names across an arbitrary set of Generators.)
Note that it is an error to use an undefined LoopLevel for scheduling.
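The LoopLevel rules above can be modeled with a small hypothetical type, which is not Halide's LoopLevel. Only three names parse, a Func/Var level is constructed in code, and an undefined level is replaced by a fallback before use. This mirrors the `defined() ? ... : LoopLevel(output, y)` pattern in the example.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for LoopLevel.
struct Level {
    enum Kind { Undefined, Inline, Root, FuncVar };
    Kind kind = Undefined;
    std::string func, var;  // only meaningful when kind == FuncVar

    bool defined() const { return kind != Undefined; }
    // A Func/Var level can only be built in code, never parsed from a name.
    static Level at(std::string f, std::string v) {
        Level l;
        l.kind = FuncVar;
        l.func = std::move(f);
        l.var = std::move(v);
        return l;
    }
};

// Accepts only the three names usable as defaults; "output.y" etc. is rejected.
bool level_from_name(const std::string& name, Level* out) {
    assert(out != nullptr);
    Level l;
    if (name == "undefined") l.kind = Level::Undefined;
    else if (name == "inline") l.kind = Level::Inline;
    else if (name == "root") l.kind = Level::Root;
    else return false;
    *out = l;
    return true;
}

// Never schedule with an undefined level: substitute a fallback first.
Level or_default(const Level& level, const Level& fallback) {
    return level.defined() ? level : fallback;
}
```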
Previously, you'd register a Generator by explicitly instantiating a RegisterGenerator at global scope:
Halide::RegisterGenerator<MyGen> register_my_gen{"my_gen"};
This still works, but we're introducing a simpler registration macro:
HALIDE_REGISTER_GENERATOR(MyGen, my_gen) // no semicolon at end
If you want to generate a Stub for your Generator, you must use the new-style registration macro, and add that information to the declaration:
// We must forward-declare the name we want for the stub,
// inside the proper namespace(s). None of the namespace(s)
// may be anonymous (if they are, failures will occur at Halide
// compilation time).
namespace SomeNamespace { class MyGenStub; }
HALIDE_REGISTER_GENERATOR(MyGen, my_gen, SomeNamespace::MyGenStub)
If the fully-qualified stub name specified for the third argument hasn't been declared properly, a compile error will result. The fully-qualified name must have at least one namespace (i.e., a name at global scope is not acceptable).
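The naming rule for the stub can be expressed as a simple string check. This is a hypothetical sketch, not Halide's code. It checks only structure (at least one named namespace, no empty components) and does not validate identifier characters. Anonymous namespaces cannot be named in a qualified name, so they can never satisfy it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical check: "SomeNamespace::MyGenStub" passes, while "MyGenStub"
// (global scope) and "::MyGenStub" (empty leading component) do not.
bool is_valid_stub_name(const std::string& name) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = name.find("::", start);
        parts.push_back(name.substr(start, pos - start));  // npos: take the rest
        if (pos == std::string::npos) break;
        start = pos + 2;
    }
    if (parts.size() < 2) return false;  // needs at least one namespace
    for (const auto& p : parts) {
        if (p.empty()) return false;     // no empty components
    }
    return true;
}
```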
Let's start with an example of usage, then work backwards to explain what's going on. Say we have an RGB-to-YCbCr component we want to re-use:
class RgbToYCbCr : public Generator<RgbToYCbCr> {
public:
    Input<Func> input{"input", Float(32), 3};
    Output<Func> output{"output", Float(32), 3};
    void generate() { ... conversion code here ... }
    void schedule() { ... scheduling code here ... }
};
RegisterGenerator<RgbToYCbCr> register_me{"rgb_to_ycbcr"};
GenGen can now produce a "Func-like" stub class around a generator, which (by convention) is emitted in a file with the extension ".stub.h". It looks something like:
/path/to/rgb_to_rcbcr.stub.h:
// MACHINE-GENERATED
class RgbToYCbCr : public GeneratorStub {
public:
    struct Inputs {
        // All the Input<>s declared in the Generator are listed here,
        // as either Func or Expr
        Func input;
    };
    struct GeneratorParams { ... };
    struct ScheduleParams { ... };
    // ctor, with required inputs, and (optional) GeneratorParams.
    RgbToYCbCr(GeneratorContext* context,
               const Inputs& inputs,
               const GeneratorParams& = {}) { ... }
    // Output(s)
    Func output;
    // Overloads for first output
    operator Func() const { return output; }
    Expr operator()(Expr x, Expr y, Expr z) const { return output(x, y, z); }
    Expr operator()(std::vector<Expr> args) const { return output(args); }
    Expr operator()(std::vector<Var> args) const { return output(args); }
    void schedule(const ScheduleParams &params = {});
};
Note that this is a "header-only" class; all methods are inlined (or template-multilinked, etc.), so there is no associated .cpp to incorporate. Also note that this is a "by-value", internally handle-based class, like most other types in Halide (e.g., Func, Expr, etc.).
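A "by-value, handle-based" class is cheap to copy because it wraps a reference-counted pointer, so all copies observe the same underlying object. The hypothetical `Handle` class below illustrates that general pattern with std::shared_ptr. It is not the stub's or Func's actual implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical handle class: copying copies the pointer, not the state.
class Handle {
public:
    Handle() : state_(std::make_shared<State>()) {}
    void set_name(const std::string& n) { state_->name = n; }
    const std::string& name() const { return state_->name; }
    bool same_object(const Handle& other) const { return state_ == other.state_; }

private:
    struct State { std::string name; };
    std::shared_ptr<State> state_;
};
```

This is why the consumer below can store a stub as a member, assign to it in generate(), and schedule the same underlying object later in schedule().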
We'd consume this downstream like so:
#include "/path/to/rgb_to_rcbcr.stub.h"
class AwesomeFilter : public Generator<AwesomeFilter> {
public:
    Input<Func> input{"input", Float(32), 3};
    Output<Func> output{"output", Float(32), 3};
    void generate() {
        // Snap image into buckets while still in RGB.
        quantized(x, y, c) = Quantize(input(x, y, c));
        // Convert to YCbCr.
        rgb_to_ycbcr = RgbToYCbCr(this, {quantized});
        // Do something awesome with it. Note that rgb_to_ycbcr autoconverts to a Func.
        output(x, y, c) = SomethingAwesome(rgb_to_ycbcr(x, y, c));
    }
    void schedule() {
        // Explicitly schedule the intermediate Funcs we used
        // (including any reusable Generators).
        quantized
            .vectorize(x, natural_vector_size<float>())
            .compute_at(rgb_to_ycbcr, y);
        rgb_to_ycbcr
            .vectorize(x, natural_vector_size<float>())
            .compute_at(output, y);
        // *Also* call the schedule method for all reusable Generators we used,
        // so that they can schedule their own intermediate results as needed.
        // (Note that we may have to pass them appropriate values for ScheduleParam,
        // which vary from Generator to Generator; since RgbToYCbCr has none,
        // we don't need to pass any.)
        rgb_to_ycbcr.schedule();
    }
private:
    Var x, y, c;
    Func quantized;
    RgbToYCbCr rgb_to_ycbcr;
    Expr Quantize(Expr e) { ... }
    Expr SomethingAwesome(Expr e) { ... }
};
It's worth pointing out that all inputs to the subcomponent must be explicitly provided when the subcomponent is created (as arguments to its ctor); the caller is responsible for providing these. (There is no concept of automatic input forwarding from the caller to a subcomponent.)
What if RgbToYCbCr has array inputs or outputs? For instance:
class RgbToYCbCrMulti : public Generator<RgbToYCbCrMulti> {
public:
    Input<Func[3]> inputs{"inputs", Float(32), 3};
    Input<float> coefficients{"coefficients", 1.f};
    Output<Func[3]> outputs{"outputs", Float(32), 3};
    ...
};
In that case, the generated RgbToYCbCrMulti class requires vector-of-Func (or vector-of-Expr) for array inputs, and provides vector-of-Func as output members:
class RgbToYCbCrMulti : public GeneratorStub {
public:
    struct Inputs {
        std::vector<Func> inputs;
        Expr coefficients;  // scalar Input<float>, so a single Expr
    };
    RgbToYCbCrMulti(GeneratorContext* context,
                    const Inputs& inputs,
                    const GeneratorParams& = {}) { ... }
    ...
    std::vector<Func> outputs;
};
What if RgbToYCbCr has multiple outputs? For instance:
class RgbToYCbCrMulti : public Generator<RgbToYCbCrMulti> {
public:
    Input<Func> input{"input", Float(32), 3};
    Output<Func> output{"output", Float(32), 3};
    Output<Func> mask{"mask", UInt(8), 2};
    Output<float> score{"score"};
    ...
};
In that case, the generated RgbToYCbCrMulti class has all outputs as struct members, with names that match the declared names in the Generator:
struct RgbToYCbCrMulti {
...
Func output;
Func mask;
Func score;
};
Note that scalar outputs are still represented as (zero-dimensional) functions, for consistency. (Also note that "output" isn't a magic name; it just happens to be the name of the first output of this Generator.)
Note also that the first output is always represented both in an "is-a" relationship and a "has-a" relationship: RgbToYCbCrMulti overloads the necessary operators so that accessing it as a Func is the same as accessing its "output" field, i.e.:
struct RgbToYCbCrMulti {
...
Func output;
operator Func() const { return output; }
Expr operator()(Expr x, Expr y, Expr z) const { return output(x, y, z); }
Expr operator()(std::vector<Expr> args) const { return output(args); }
Expr operator()(std::vector<Var> args) const { return output(args); }
...
};
This is (admittedly) redundant, but is deliberate: it allows convenience for the most common case (a single output), but also orthogonality in the multi-output case.
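The "is-a" plus "has-a" arrangement can be sketched in plain C++. Here `FuncLike` is a hypothetical stand-in for Func (a callable over three integer coordinates), and `MultiOutputStub` mimics the generated struct. Outputs are named members, while the conversion and call operators forward to the first output.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for Func.
struct FuncLike {
    std::function<float(int, int, int)> body;
    float operator()(int x, int y, int c) const { return body(x, y, c); }
};

struct MultiOutputStub {
    FuncLike output;                       // first declared output ("has-a")
    std::function<float(int, int)> mask;   // further outputs: reachable by name only

    // "is-a": the stub converts to, and can be called like, its first output.
    operator FuncLike() const { return output; }
    float operator()(int x, int y, int c) const { return output(x, y, c); }
};
```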
The consumer might use this like so:
#include "/path/to/rgb_to_rcbcr_multi.stub.h"
class AwesomeFilter : public Generator<AwesomeFilter> {
    ...
    void generate() {
        rgb_to_ycbcr_multi = RgbToYCbCrMulti(this, {input});
        output(x, y, c) = SomethingAwesome(rgb_to_ycbcr_multi.output(x, y, c),
                                           rgb_to_ycbcr_multi.mask(x, y),
                                           rgb_to_ycbcr_multi.score());
    }
    void schedule() {
        rgb_to_ycbcr_multi.output
            .vectorize(x, natural_vector_size<float>())
            .compute_at(output, y);
        // mask is UInt(8), so use the natural vector size for uint8_t.
        rgb_to_ycbcr_multi.mask
            .vectorize(x, natural_vector_size<uint8_t>())
            .compute_at(output, y);
        rgb_to_ycbcr_multi.score
            .compute_root();
        // Don't forget to call the schedule() function.
        rgb_to_ycbcr_multi.schedule();
    }
};
What if there were GeneratorParams we wanted to set in RgbToYCbCr, to configure code generation? In that case, we'd pass a value for the optional generator_params argument when calling its constructor:
class RgbToYCbCr : public Generator<RgbToYCbCr> {
public:
    GeneratorParam<Type> input_type{"input_type", UInt(8)};
    GeneratorParam<bool> fast_but_less_accurate{"fast_but_less_accurate", false};
    ...
};
This would produce a different (generated) definition of GeneratorParams, with a field for each GeneratorParam, initialized to the proper default:
struct GeneratorParams {
Halide::Type input_type{UInt(8)};
bool fast_but_less_accurate{false};
};
We could then fill this in manually:
class AwesomeFilter : public Generator<AwesomeFilter> {
    void generate() {
        ...
        RgbToYCbCr::GeneratorParams generator_params;
        generator_params.input_type = Float(32);
        generator_params.fast_but_less_accurate = true;
        rgb_to_ycbcr = RgbToYCbCr(this, {input}, generator_params);
        ...
    }
};
Alternatively, if we know the types at C++ compilation time, we can use a templated construction method that is terser:
class AwesomeFilter : public Generator<AwesomeFilter> {
    void generate() {
        ...
        rgb_to_ycbcr = RgbToYCbCr::make<float, true>(this, input);
        ...
    }
};
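Both configuration styles produce the same GeneratorParams; the templated form simply derives the fields from compile-time arguments. In this hypothetical sketch, which is not the generated stub code, `ElemType` stands in for Halide::Type and `make_params` for the stub's `make<>()`.

```cpp
#include <cassert>
#include <cstdint>

enum class ElemType { UInt8, Float32 };  // hypothetical stand-in for Halide::Type

// Defaults mirror the GeneratorParam defaults declared in the Generator.
struct GeneratorParams {
    ElemType input_type{ElemType::UInt8};
    bool fast_but_less_accurate{false};
};

// Maps a C++ element type to its ElemType at compile time.
template <typename T> constexpr ElemType elem_type_of();
template <> constexpr ElemType elem_type_of<std::uint8_t>() { return ElemType::UInt8; }
template <> constexpr ElemType elem_type_of<float>() { return ElemType::Float32; }

// Builds the same struct that would otherwise be filled in by hand.
template <typename InputT, bool Fast>
GeneratorParams make_params() {
    GeneratorParams p;
    p.input_type = elem_type_of<InputT>();
    p.fast_but_less_accurate = Fast;
    return p;
}
```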
What if there are ScheduleParams in RgbToYCbCr?
class RgbToYCbCr : public Generator<RgbToYCbCr> {
public:
    ScheduleParam<LoopLevel> level{"level"};
    ScheduleParam<bool> vectorize{"vectorize"};
    void generate() {
        intermediate(x, y) = SomeExpr(x, y);
        output(x, y) = intermediate(x, y);
    }
    void schedule() {
        intermediate.compute_at(level);
        if (vectorize) intermediate.vectorize(x, natural_vector_size<float>());
    }
    Var x, y;
    Func intermediate;
};
In that case, the generated stub code would have a different declaration for ScheduleParams:
struct ScheduleParams {
LoopLevel level{"undefined"};
bool vectorize{false};
};
And we might call it like so:
class AwesomeFilter : public Generator<AwesomeFilter> {
    ...
    void schedule() {
        rgb_to_ycbcr
            .vectorize(x, natural_vector_size<float>())
            .compute_at(output, y);
        rgb_to_ycbcr.schedule({
            // We want any intermediate products also at compute_at(output, y)
            LoopLevel(output, y),
            // vectorization: yes please
            true
        });
    }
    ...
};
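The positional call `schedule({LoopLevel(output, y), true})` relies on C++ aggregate initialization. Fields are filled in declaration order, and omitted trailing fields keep their default member initializers (C++14 or later). A hypothetical stand-in for the generated struct shows this; `LevelStub` and `describe` are illustrative names only.

```cpp
#include <cassert>
#include <string>

struct LevelStub { std::string where; };  // hypothetical stand-in for LoopLevel

// Mirrors the generated ScheduleParams: an aggregate with default values.
struct ScheduleParams {
    LevelStub level{"undefined"};
    bool vectorize{false};
};

// Hypothetical schedule(): reports which values it received.
std::string describe(const ScheduleParams& params = {}) {
    return params.level.where + (params.vectorize ? ", vectorized" : ", scalar");
}
```

Calling `describe()` with no arguments uses every default, which corresponds to `rgb_to_ycbcr.schedule()` in the earlier example.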