WIP: a new serialization format for optimizing & executing lowered IR #309
Co-authored-by: "Kristoffer Carlsson" <[email protected]>
This is a draft of a new tokenization of lowered IR. The main goal is to split off from the representation currently used in Base so that we are free to perform more significant transformations of the IR to enable more performance optimizations. This tackles the first step, serializing lowered code to the new tokenized format. Currently the only things supported are serializing and printing; execution is in draft form but never tested.
Really cool. Should we already now swap out the |
Quite likely. If you're interested, feel free to make changes to the struct. The goal is to make things as easy as possible. One thought would also be to organize these differently: for a call with two arguments, rather than having a list like this

    Tuple{Int,Int} => meth1
    Tuple{Int16,Int} => meth1
    Tuple{Float64,Int} => meth2
    Tuple{Int,Float64} => meth3

one might instead consider a tree:
It's not obvious that this is a good idea, since I bet >90% of all call sites inside loops (which is basically all we care about) dispatch to a single method, and it seems likely to be hard to beat the list in that case. In general, writing this up was a good exercise in re-thinking "where should this bit of information go?" An example includes the Other good ideas that could potentially be "stolen" from this and applied to our current infrastructure include checking whether something is a builtin/intrinsic at the time of framecode construction, rather than having to call |
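To make the "list" option concrete, here is a minimal sketch of a per-call-site method list scanned linearly. Everything in it is an assumption for illustration: `MethodList`, `lookup`, and the `:meth1`..`:meth3` symbols are hypothetical stand-ins, not the struct used in this PR.

```julia
# Hypothetical local method table for one call site: an ordered list of
# signature => target pairs. The targets are plain Symbols standing in for
# whatever the real table would store.
const MethodList = Vector{Pair{DataType,Symbol}}

# Linear scan: return the first target whose signature matches the runtime
# argument types, or `nothing` if no entry matches.
function lookup(list::MethodList, args...)
    argtypes = Tuple{map(typeof, args)...}
    for (sig, meth) in list
        argtypes <: sig && return meth
    end
    return nothing   # miss: the caller would fall back to full dispatch
end

methlist = MethodList([
    Tuple{Int,Int}     => :meth1,
    Tuple{Int16,Int}   => :meth1,
    Tuple{Float64,Int} => :meth2,
    Tuple{Int,Float64} => :meth3,
])

lookup(methlist, 1, 2)      # matches the first entry
lookup(methlist, 1.0, 2)    # matches the third entry
lookup(methlist, "a", 2)    # no match
```

If most call sites in loops really do hit a single method, the scan exits on the first comparison, which is the case a tree would have to beat.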
Should we define some "calling convention" for this serialized IR? So that the function that gets called knows how to look up the arguments without having to put them into a vector, pass it to the next frame and push them into A bit related to the comment at #308 (comment).
Is the reason for storing the ssastores explicitly that some ssavalues are not used (is_used)? Otherwise, they could potentially just be implicit, since every codeinfo statement stores to its corresponding ssaslot.
Yes, I think if we do it the way outlined in that comment, that basically becomes the calling convention.
One option here is to renumber them so that you only have as many SSAValues as you use. See JuliaInterpreter.jl/src/interpret.jl Line 6 in f9ebc94
JuliaInterpreter.jl/src/interpret.jl Lines 537 to 542 in f9ebc94
It's probably a slowdown to check this for every single statement. It's also a slowdown to save results to |
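For reference, the renumbering idea could look roughly like the sketch below. It assumes a per-statement `used` flag is already available (how it is computed is not shown); `renumber_ssa` is a hypothetical helper, not a function in the package.

```julia
# `used[i]` says whether the result of statement i is ever read.
# Returns a map from old SSA index to new slot index (0 = result is discarded)
# together with the number of slots actually needed.
function renumber_ssa(used::AbstractVector{Bool})
    newid = zeros(Int, length(used))
    n = 0
    for i in eachindex(used)
        if used[i]
            n += 1
            newid[i] = n
        end
    end
    return newid, n
end

newid, nslots = renumber_ssa([true, false, false, true, true])
```

With a map like this computed once at framecode construction, the executor would never need an `is_used` check per statement: unused results simply have no store token.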
This is a draft of a new tokenization of lowered IR. The main goal is to split off from the representation currently used in Base so that we are free to perform more significant transformations of the IR to enable more performance optimizations.
This PR supports serialization, printing, and very limited execution. Lots is manual, and lots is broken, but I wanted to post it to collect early feedback. (This is built on top of #307 so this PR contains lots of irrelevant changes. To start with you should focus just on two files, `src/serializer.jl` and the demo in `test/serialization.jl`, or just look at the commits individually.)

A demo
In `test/` there's a demo script, `serialization.jl`. If you run it you get this output:

The `99` is a consequence of the last few lines, stepping forward until you get to the line that begins `99: callbuiltin...` (builtins are not yet supported). This comes from the lowered code for `summer`, which looks like this:

From this you should be able to learn a lot about the serialization format I've designed, but for the benefit of all I've reproduced below the extended comments that appear at the beginning of `src/serializer.jl`:

A brief description of the serialization format
This uses a simple format, conceptually implementing a machine with the following
properties:
- [...] `ser`
- [...] `ans`
- [...] in `ser`. Executing these operations may consume (nondestructively) future tokens.

Operations are conceptually of 4 categories:
- [...] `ans` from a variety of sources), encoded by `load*` or literal tokens
- [...] `ans` somewhere more permanent), encoded by `store*` tokens
- [...] `ans`)

The implementation of calls is allowed to store data to named local variables or lists, thus increasing the temporary storage beyond `ans`.

The serialization of the lowered IR
might look something like this on the tape:
where
- `call` is an instruction token signaling that the next operation is a function call (in reality, there are multiple call-type tokens for intrinsics, builtins, generics via the interpreter, generics via ordinary compiled dispatch, `invokelatest`, `Core._apply`, etc.)
- `atan_idx` is a token representing `atan`. It is encoded as an integer index into a table of functions (the table is maintained by the serializer)
- `methlist` is a pointer to a local method table, a performance optimization used for avoiding full-blown dispatch (this also stores whether the method should be called via the interpreter or the compiled path)
- `fixedargs 2` is an indication that this call should use the path optimized for a particular (small) number of arguments, which in this case is 2. An alternative is `listargs args`, which packs an arbitrary number of arguments into a literally-encoded `args::Vector{Any}` stored (via its pointer) in `ser`. Contrary to this (fictitious) example, currently only builtins exploit `fixedargs` since they are the only ones that will typically need runtime dispatch.
- `loadslot 3` indicates that the next argument (first argument) is to be loaded from the slots at index 3
- `float64 2.4` indicates that the next argument (second argument) is a literal value of type `Float64` encoded in `ser` itself.
- [...] stored in `ans`
- `storessa 4` indicates that `ans` should be placed in `%4`.
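The token-by-token description above can be sketched as a toy machine. This is a simplified model under stated assumptions, not the code in `src/serializer.jl`: the tape here is a `Vector{Any}` of `Symbol` tokens with inline operands rather than a byte-level encoding, the method-list pointer is replaced by `nothing`, and `run!` and `load_operand` are hypothetical names. The tape contents are inferred from the descriptions above for a statement of the form `%4 = atan(<slot 3>, 2.4)`.

```julia
# Function table maintained by the (hypothetical) serializer.
ftable = Any[atan]
atan_idx = 1

# call atan_idx methlist fixedargs 2 loadslot 3 float64 2.4 storessa 4
ser = Any[:call, atan_idx, nothing, :fixedargs, 2,
          :loadslot, 3, :float64, 2.4, :storessa, 4]

# Load one value starting at tape position i; returns (value, next position).
function load_operand(ser, i, slots)
    tok = ser[i]
    if tok === :loadslot
        return slots[ser[i+1]], i + 2
    elseif tok === :float64
        return ser[i+1]::Float64, i + 2
    else
        error("unsupported load token $tok")
    end
end

function run!(ser, ftable, slots, ssavalues)
    ans = nothing                      # the single register
    i = 1
    while i <= length(ser)
        tok = ser[i]
        if tok === :call
            f = ftable[ser[i+1]]
            # ser[i+2] is the method list; it is ignored in this sketch
            ser[i+3] === :fixedargs || error("only fixedargs is sketched")
            nargs = ser[i+4]
            i += 5
            # A real fixedargs path would avoid this allocation.
            args = Vector{Any}(undef, nargs)
            for k in 1:nargs
                args[k], i = load_operand(ser, i, slots)
            end
            ans = f(args...)           # a call sets `ans`
        elseif tok === :storessa
            ssavalues[ser[i+1]] = ans  # a store puts `ans` somewhere more permanent
            i += 2
        else
            error("unsupported token $tok")
        end
    end
    return ans
end

slots = Any[nothing, nothing, 1.0]     # slot 3 holds the first argument
ssavalues = Vector{Any}(undef, 4)
run!(ser, ftable, slots, ssavalues)
```

After the run, `ssavalues[4]` holds the same value as `atan(1.0, 2.4)`. Note how the `:call` branch consumes the tokens that follow it, which is the "may consume future tokens" behaviour described above.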
Call sites that use `listargs` currently have their own private `args` vector sized appropriately for that particular call site. Having one per site is almost certainly overkill. A better format would be to have `framecode` construction figure out what sizes the method needs and then have the `framedata` allocate a pool of different sizes. This would support multithreaded interpretation while also decreasing the total storage size. This would probably be one of the most urgent changes to make.

Potential performance benefits
Don't even think about timing things yet, since it's not finished enough to be meaningful. We need to support builtins/intrinsics and implement recursive calls.
But the serialization format already has some potential performance benefits. For example, one of our hottest methods is `maybe_evaluate_builtin`, which gets called even when `f` is not a builtin. With the new format, we decide "at compile time" (framecode construction time) whether `f` is a builtin or something else and then "dispatch" (a big if/then block) to the appropriate method.

In the longer run, as mentioned above we may be able to perform optimizations that would be incompatible with the tools we rely on for handling lowered IR. For example, the section
might be simplified using a couple of new tokens as something like the following:
and we might be able to implement an optimized method that avoids having to create new frames in the default cases.
I don't know exactly where we want to head with this, but this might at least illustrate some possibilities.
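As a rough illustration of deciding the call kind at framecode construction time, as described earlier in this section, the classification might look like the sketch below. The function `call_token` and the token names other than `callbuiltin` are assumptions; the PR's actual tokens may differ.

```julia
# Pick the call-type token once, when the call site is serialized, so the
# executor never has to ask "is this a builtin?" at run time.
function call_token(f)
    f isa Core.IntrinsicFunction && return :callintrinsic
    f isa Core.Builtin && return :callbuiltin
    return :callgeneric   # would be split further (interpreter vs. compiled path)
end

call_token(getfield)                  # a builtin
call_token(Core.Intrinsics.add_int)   # an intrinsic
call_token(atan)                      # an ordinary generic function
```

The executor's big if/then block then branches on the stored token, which is a comparison against a constant rather than a call into something like `maybe_evaluate_builtin`.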
The future
Just getting this far required that I put slightly more time into this than I can afford, so consider this post to be an invitation for others to run with it if interested. I've named the branch `serialization` rather than `teh/serialization` to explicitly disavow ownership. I'm well aware others will have their own agenda too, so if no one grabs it then it can just wait until I have time for it again. But that could be a while.

It's also worth noting that JeffB mentioned at JuliaCon that he had been wondering about doing something similar, so at some point (once this has been developed a bit further) we should probably ping him and see what he thinks of the format. It might be nice to share one format (at least in non-optimized form) between base & this package.
EDITS
I'd now change several things about this, esp. not storing pointers inline with the code. Just use a single long "stack" for all frames and pass in an offset when you enter into a new frame.
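A minimal sketch of the single-stack idea, assuming each frame owns a contiguous range of one shared `Vector{Any}` and is identified by its offset. `FrameView`, `enter_frame!`, `getslot`, and `setslot!` are hypothetical names for illustration only.

```julia
# Each frame owns stack[offset+1 : offset+nslots] of one shared stack.
struct FrameView
    offset::Int
    nslots::Int
end

# Reserve `nslots` entries above `top`; returns the new frame and the new top.
function enter_frame!(stack::Vector{Any}, top::Int, nslots::Int)
    needed = top + nslots
    length(stack) < needed && resize!(stack, needed)
    return FrameView(top, nslots), needed
end

getslot(stack, fr::FrameView, i) = stack[fr.offset + i]
setslot!(stack, fr::FrameView, i, v) = (stack[fr.offset + i] = v)

stack = Vector{Any}(undef, 0)
caller, top = enter_frame!(stack, 0, 3)
setslot!(stack, caller, 3, 1.0)
callee, top = enter_frame!(stack, top, 2)   # callee's slots start right after the caller's
setslot!(stack, callee, 1, getslot(stack, caller, 3))   # pass an argument by copying one slot
```

With this layout the serialized code only needs small integer slot indices, and entering a frame amounts to passing a new offset instead of storing pointers inline with the code.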