Replies: 13 comments 19 replies
-
In this task, I continue using my bril-py interpreter and build a tracing-based JIT on top of it. The main part of my JIT can be found here. It is also very easy to add speculative-execution support to my interpreter: it only needs to store the original data frame of the program and restore it when the guard condition fails. It takes less than 10 lines of code to implement.

Tracing-Based JIT
I simply start tracing from the beginning of the program and add instructions to the trace based on which operation the JIT meets:
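A simplified sketch of that per-instruction dispatch might look like the following (the helper and field names here are made up for illustration, not taken from my actual interpreter):

```python
# Simplified sketch of a tracing step (illustrative names only).
# Each executed instruction is appended to the trace; branches become
# guards that record the direction actually taken; jumps disappear.

def trace_step(instr, env, trace):
    op = instr.get("op")
    if op == "br":
        taken = env[instr["args"][0]]
        # Replace the branch with a guard on the condition's runtime value.
        trace.append({"op": "guard", "args": instr["args"][:1], "taken": taken})
    elif op == "jmp":
        pass  # jumps vanish: the trace is straight-line code
    elif op != "label":
        trace.append(instr)

trace = []
env = {"cond": True, "x": 3}
trace_step({"op": "const", "dest": "x", "value": 3}, env, trace)
trace_step({"op": "br", "args": ["cond"], "labels": ["then", "else"]}, env, trace)
trace_step({"op": "jmp", "labels": ["end"]}, env, trace)
```

Jumps vanish because a trace is straight-line; branches become guards that remember which way execution actually went.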
After we obtain the trace, we can call the LVN and DCE passes to optimize it. I then provide a transform.py script to insert the trace back into the original program and add speculative markers to it. Since I trace the whole program, the whole trace can be placed directly at the front of the program. Finally, we can take the program with the optimized trace and re-execute it.

Testing
I again took several test programs from previous lessons, JIT-executed them, and observed their performance. For demonstration, I only use two test cases here. The first case is the example from class, which involves function calls (interprocedural optimization).
After the trace was generated, I found it very tricky to optimize with LVN. Our previous implementation of LVN actually cannot handle this case, since not all the arguments in the instructions are constants; we cannot simply apply constant folding or constant propagation here. Some symbolic equivalence testing is needed. As a simple workaround, I extend the LVN pass to check the previously referenced instruction and see whether it is a complementary instruction (e.g.,
Finally, we obtain the program with optimized trace.
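The complementary-instruction check could be sketched roughly like this (a toy LVN-style table with an invented representation, not my actual pass): when a `not` is applied to a value that is itself a `not`, the double negation collapses back to the original value number.

```python
# Sketch: tracking complementary (negated) values in an LVN-style table.
# Value numbers are keyed by (op, argument value numbers); a "not" of a
# "not" collapses back to the original value number.

def value_number(table, op, args):
    if op == "not":
        prev = table.get(args[0])
        if prev is not None and prev[0] == "not":
            return prev[1][0]  # not (not x) == x
    key = (op, tuple(args))
    # Reuse an existing number for an identical expression.
    for num, expr in table.items():
        if expr == key:
            return num
    num = len(table)
    table[num] = key
    return num

table = {}
a = value_number(table, "lt", (0, 1))    # a = x < y
na = value_number(table, "not", (a,))    # na = !a
back = value_number(table, "not", (na,)) # !!a folds back to a
```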
The result is shown below. We can see that the two programs indeed generate the same result, and our traced program executes fewer instructions than the original one.

```
> python3 bvm.py -f test/demo.json 42
42
# of instructions: 11
> python3 bvm.py -f test/demo.opt.json 42
42
# of instructions: 7
```

But tracing the whole program incurs overhead. If we switch to another input value no smaller than 100, we need to speculatively execute the traced code first and then roll back to the beginning of the program and execute again. So in this case, the number of executed instructions increases.

```
> python3 bvm.py -f test/demo.json 42
42
# of instructions: 11
> python3 bvm.py -f test/demo.opt.json 100
102
# of instructions: 15
```

The second case involves a loop. The test program can be found here. The traced program is exactly the same as loop unrolling, so no loop overhead is introduced after tracing. The number of instructions is greatly reduced, as shown below, which demonstrates the effectiveness of my JIT compiler. The downside is that the size of the generated program can be large, but this can be tackled by changing the starting point of the trace to the beginning of the loop.

```
> python3 bvm.py -f test/loopcond.json
1984
# of instructions: 117
> python3 bvm.py -f test/loopcond.opt.json
1984
# of instructions: 79
```
-
Summary
Tracing-Stitching-Code and Tracing-Recording-Code. I implemented an ahead-of-time tracing optimizer. I go to the first non-main function in the program and trace until I hit the end of the function, e.g. a return, a call to another function, a call to print, or a memory operation. I then take this trace, optimize it using LVN, and stitch it back into the program. To stitch, I simply add a speculate command, replace all branches with guards as appropriate, remove jumps, and finally add a commit. After that, I add a jump to the correct location in the program, as required.

To test, I used the given benchmarks and checked whether the results were the same before and after running the traced program. I had to fix a few bugs, such as how my LVN failed to handle floating-point operations. This forced me to bail on optimizing floating-point operations. It would likely be a quick fix to handle some of these floating-point operations in LVN, modulo some algebraic simplifications that might cause accuracy problems. Another bug came up in how I used guard expressions. Since the guard only fails when the condition is false, I had to track the runtime value of branches and manually negate the condition (when the false branch is taken during tracing) so that the guard works properly when stitched into the original program.

Results
Results were not great. The trace optimizer performs worse on every single benchmark. This is likely because my traces are very short, based on the conditions listed above. The short length of the trace does not allow LVN to do much optimization. Other optimizations besides LVN, such as DCE, could likely be used as well, but I did not add this; perhaps trying them would lead to more fruitful results. I also created some contrived test cases, which show the same behavior: the trace optimizer performs worse, even on long traces.
I believe this may be an issue with how I am doing LVN, and the next thing I would fix to improve the trace optimizer is to identify opportunities to remove redundancies and figure out why LVN is not removing them. Furthermore, my tracing is not interprocedural. I made this decision because I was not sure how to make guards jump to the right label in the correct function if the guard failed. If tracing were interprocedural, the inlined code would allow for longer regions of code that could be optimized using LVN; it is likely that more code would be found redundant or removed entirely with longer traces. Finally, tracing only occurs in one function, for simplicity of implementation. It is likely that the first function I choose to optimise does not present many chances for optimisation to occur. It would be good to trace in as many functions as possible, then optimise all of these traces and stitch them into the original program. This would give as many chances as possible to optimize traces.
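The guard-negation fix described above can be sketched as follows (the names are invented for illustration; a guard bails out when its condition is false, so a trace that followed the false side of a branch must guard on the negated condition):

```python
# Sketch of the branch-to-guard rewrite (invented names). If tracing
# followed the false side of a branch, insert a negation first so the
# guard passes exactly when execution would follow the traced path.

def branch_to_guard(instr, taken, bail_label, fresh):
    """Rewrite a recorded `br` into its guard instruction(s)."""
    cond = instr["args"][0]
    out = []
    if not taken:  # false branch was traced: guard on !cond
        neg = fresh()
        out.append({"op": "not", "dest": neg, "type": "bool", "args": [cond]})
        cond = neg
    out.append({"op": "guard", "args": [cond], "labels": [bail_label]})
    return out

counter = [0]
def fresh():
    counter[0] += 1
    return f"_t{counter[0]}"

g = branch_to_guard({"op": "br", "args": ["c"], "labels": ["a", "b"]},
                    taken=False, bail_label="bail", fresh=fresh)
```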
-
Code here.

what happened
took it easy on this one as i had a lot of other stuff to do, but still was a good time :) step 1 was making the traces, which wasn't as trivial as i thought. i initially was tracing the whole execution, planning to have my compiler throw out unused stuff, but it turns out this was like 70 megs for the ackermann test, so that's a bad idea. eventually i settled on only tracing the main function, and stopping tracing at calls and back edges. i kept track of labels so that we know where to jump to when speculation finishes. reading in the trace, i stopped early at prints as well, and figured out which label represented the "end" of the trace (sometimes i needed to round down and lose some traced instructions, as i didn't fancy inserting new labels in the middle of blocks). then i got rid of jumps and labels internal to the trace, and fixed up branches. this required an extra temp for when the branch test was false, but it was relatively straightforward to turn them into guards. then i translated the main function to start with a speculate, have the repaired trace, then a commit followed by a jump to the label representing the end of the trace. after this, i have a label for all of the guards to fail to, and then the original code.

results
so it did work, it just made everything worse. this makes sense though, since i didn't optimize the trace and only traced main. the fact that none of the programs broke really showcases how little i've accomplished, as i did nothing to deal with memory.
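roughly, spotting a back edge can be sketched like this (a toy static version with made-up names; the real check happens while interpreting, but the idea is the same: a jump target we've already passed is a back edge):

```python
# toy sketch: a label we've already seen is "behind" us, so any jump or
# branch targeting it is a back edge -- the point where tracing stops.

def is_back_edge(target_label, seen_labels):
    return target_label in seen_labels

seen = []
back_edges = []
program = [
    {"label": "loop"},
    {"op": "add", "dest": "i", "args": ["i", "one"]},
    {"op": "br", "args": ["c"], "labels": ["loop", "done"]},
    {"label": "done"},
]
for instr in program:
    if "label" in instr:
        seen.append(instr["label"])
    for lbl in instr.get("labels", []):
        if is_back_edge(lbl, seen):
            back_edges.append(lbl)
```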
-
My implementation is here.

Implementation
I implemented the inter-procedural version of tracing and tested it on the benchmarks in the bril project.

Details
Results and Limitations

Limitations
I didn't handle recursive function calls and pointers. Simply copying the function arguments and return values doesn't work for recursive function calls, as values in that function will be overwritten; I imagine there should be some stack data structure to keep the values in each recursive call.

Results
I tested the tracing JIT on 23 benchmarks in the bril project with turnt. The results are all correct, and I got a similar dynamic instruction count compared with the baseline. This is as expected, since I did not perform optimizations on the trace.

Discussion
Turning
-
@5hubh4m, @anshumanmohan, and @ayakayorihiro worked together on this assignment.

Implementation
We implemented a simple trace-based speculative optimizer for Bril, following the recipe from the task documentation.

Modifying the Reference Interpreter
We modified the interpreter to trace an entire function. We run tracing only if the function was called a certain number of times. When we encounter the next invocation of the function after it becomes hot, we trace all instructions that were executed. For all
Optimizer
Our optimizer reads in the original program from
Testing
We verified the correctness of our work on all of the Bril benchmarks that do not use the memory extension.
The most interesting benchmark was
Challenges
As before, we continued to face challenges in dealing with some of the intricacies of the TypeScript language. One implementation challenge was dealing with recursive function calls within our tracing mechanism. Another was implementing the guards, which led to the realization that we had to record the result of the conditional for every branch instruction. Furthermore, we initially implemented our tracing mechanism at the interprocedural level, but found the interaction between multiple function calls a large challenge (namely, the return of function subroutines caused interference in determining when to start and stop tracing). After grappling with this for a while, we decided to perform global tracing rather than interprocedural tracing. Our implementation has a fundamental limitation, which is that we simply trace the path taken by the
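The hot-function trigger described above could be sketched like this (the threshold and names are invented for illustration; the real threshold is a tuning knob):

```python
# Sketch of a hot-function trigger: count invocations per function and
# switch tracing on for invocations after the function becomes hot.

HOT_THRESHOLD = 10  # made-up value for illustration

class HotCounter:
    def __init__(self):
        self.calls = {}

    def on_call(self, fname):
        """Return True if this invocation should be traced."""
        n = self.calls.get(fname, 0) + 1
        self.calls[fname] = n
        return n > HOT_THRESHOLD  # trace invocations after it turns hot

hc = HotCounter()
decisions = [hc.on_call("main") for _ in range(12)]
```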
-
My code is here. I implemented a very basic JIT compiler which works on all benchmarks except

Implementation and Challenges
At a high level, my modified bril interpreter starts tracing whenever it traverses a backedge, then continues until it loops back to itself or hits something that it needs to bail out for (

Running python from typescript
The most fun part of my implementation is the callPython function. Basically, I wanted to reuse as much of my pre-existing code as possible, and so I wanted to be able to call python code. While typescript/python interoperability in general is complicated, the UNIX philosophy of having programs communicate primarily through text streams helped out a ton here. Since most of the python written in class used

Getting this all to work did also force me to interact with Javascript's

I only wound up using this function to use my old code for computing the dominators of a given basic block, which I think was still easier than reimplementing all of that in typescript.

Backedges
I went with the suggestion to just start tracing every time you hit a backedge. I think that this is a super reasonable starting point for a tracing interpreter, since the main benefit of using JIT compilation is to improve code that gets used and reused a lot. The easiest way for that to happen is just for the code to be in a loop. Since the most basic loop is going to run through one path many times and then exit once, tracing through a backedge means that you're going to find that one path.

Tracking traces
I managed to keep track of all of my traces just in a javascript array. If I were doing longer traces then maybe writing them out would have been necessary, but as is, my traces were never that long anyway.
I was worried that I would have to do more work to avoid trying to trace through the code that I just generated, but it was super easy to avoid, because all it required was bailing out of tracing whenever I saw a

Inserting code
Another tricky part was inserting code into the program while the interpreter is still running it. At first this seemed suspiciously like not a major problem, because mostly I was just tracing loops, then finalizing the code right when I saw that I had returned to the beginning. This meant that I was only ever inserting code just after what the interpreter was trying to run. However, once I started running the benchmarks, I saw that while the "stop because we've looped now" case was working well, finalizing my trace at any other point was failing, because I would insert code which would change the

Measurement
I measured my code by running

It is correct on all benchmarks except for

On most programs it has no impact, I think because it just never finds a backedge. Whenever it does something, it makes it worse. This is just because it has to execute speculative instructions which it might not use, but its reinserted code isn't actually doing anything better in any way, so it can't make up for these unused speculations.
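The text-stream interop pattern described above might look roughly like this on the Python side (a simplified stand-in: the real tool computed dominators, and the names here are placeholders):

```python
# Sketch of the stdin/stdout contract on the Python side: the caller
# (TypeScript, in my case) spawns the script, pipes a Bril JSON program
# to stdin, and parses whatever JSON comes back on stdout.
import json
import sys

def function_names(prog):
    """Stand-in analysis; the real code computed dominators."""
    return [f["name"] for f in prog.get("functions", [])]

def run(stream_in=sys.stdin, stream_out=sys.stdout):
    prog = json.load(stream_in)
    json.dump(function_names(prog), stream_out)

# In a real pipeline: bril2json < prog.bril | python3 tool.py
```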
-
My implementation of the modified brili interpreter with tracing.

Tracing
Tracing is implemented starting from the main function to the first

Some specifications of my
Results
The results were not impressive; if anything, the opposite, as nothing really got any performance improvement. However, given how tracing is implemented here (only from the main function, with no optimizations performed on the trace), this makes sense. Here are the results from the benchmarks; thankfully nothing crashed and the same results were produced. So it's a very slight success in that I am adding code and getting the same results. I thought it was interesting that the pow benchmark gained 25% of its original instruction count.
-
Sorry for the late post, here is my code.

Implementation
My code is implemented in two parts: a modified version of brili (in TypeScript) that logs a trace for the entire program, and a Rust trace-stitching program. The modified version of brili logs a trace for the entire run of the program to a file in /tmp, including each instruction and its line number in the main function. The trace-stitching program then reads in the original program and the trace. The trace is cut off at the first use of print, store, call, alloc, or free, to prevent the trace from invoking unsupported functionality. Call is unsupported only because I ran out of time to implement the copying of function arguments and return values in the trace. I also remove any jumps from the trace and replace branch instructions with guards. To ensure the generated guards are correct, the modified version of brili adds an extra not instruction to the trace before a branch that is not taken. When traces are stitched into the original program, they are placed at the start of the program and, if they successfully complete, jump to the location of the last instruction in the trace (this is why it's important to know the line numbers of instructions in the original trace). If speculative execution fails, it jumps back to the start of the normal program.

Testing
I used turnt to test against a subset of the bril benchmarks. While the resulting programs work correctly, they are almost always worse than the original code because of the lack of optimization on traces. These optimizations should be implementable within the stitching framework with relative ease, given a bit more time.

Issues
I ran into some issues when trying to implement the copying of arguments and return values within a trace. I realized that, because of nested function calls, we need to be careful about matching function calls with returns in the trace.
Due to scheduling conflicts this week, I ran out of time to get this working and to support function calls.
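The call/return matching issue can be sketched with a simple stack (the trace format here is invented for illustration): each `ret` pairs with the most recent unmatched `call`, so nesting is handled naturally.

```python
# Sketch: pair each `call` in a trace with its own `ret` using a stack,
# so argument/return-value copying targets the right invocation even
# when calls are nested.

def match_calls(trace):
    stack, pairs = [], []
    for i, instr in enumerate(trace):
        if instr["op"] == "call":
            stack.append(i)
        elif instr["op"] == "ret":
            pairs.append((stack.pop(), i))
    return pairs

trace = [
    {"op": "call"},  # 0: outer call
    {"op": "call"},  # 1: nested call
    {"op": "ret"},   # 2: returns from the nested call
    {"op": "ret"},   # 3: returns from the outer call
]
```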
-
The modified bril interpreter is here. The code that transforms a program given its trace is here.

Implementation

Bril Interpreter
I modified the bril interpreter to produce traces. I added some new properties to the state type:

Transforming the Program
The program transformer takes in a JSON of a bril program and also reads a trace from standard input. It inserts all traced instructions at the start of the main function, except in the case of guard instructions; in that case, we replace the label argument with a fresh label that represents the start of the original program. This means that if a guard ever fails, we bail out and jump to the original start of the program. Here is an example of the
Transformed:
Evaluation
I checked the benchmarks using turnt. All of them passed except

Discussion
I think I could have made the program transformer produce more efficient code. I did what was easiest with the guard labels, so any time a guard fails, the program jumps all the way to the beginning. This is pretty bad, especially if the program has passed many guards and only fails on the last one. It would be better if the transformer picked a guard label that minimizes the number of instructions executed.
-
Hi, sorry for the late reply. My implementations are here:
How it works
To run programs faster with JIT, I first collected traces with the extended version of brili.ts (I just added a tracing class to it which records traces from the beginning of

(1) Includes:
(2) Includes:
(3) Includes:
Results
Example original program:
Example of the JIT-prepended program when recording the trace for the input
The above JIT code is not optimal, due to the limitations of the LVN/DCE optimizations that I got from the first assignments. However, it does show some improvements:
while the original execution:
Limitations
Currently there is no support for inter-procedural JIT; I just decided to prioritize optimizations over it.
-
Sorry for the late reply. Most of my work is in these two places: the modified brili.ts and the Python script bril-jit.py.

Implementation

Producing the trace
I first modified the TypeScript interpreter to make it produce the trace through standard output. I basically added a field

When I encounter branch instructions on some

Note that when I terminate tracing (this always happens in

Stitching
Then, I wrote a Python script

I adopted the categorical dual of @atucker's approach by flipping the arrows the other way (/s), i.e. I generated the trace string by calling the TypeScript interpreter as a subprocess from my Python script and collecting its
here

Testing
I tested using

Difficulties
TypeScript was not hard to learn. However, I initially had trouble understanding how it is possible to make the trace and the original program communicate "dynamically". But then I realized that we really are just doing mostly syntactic manipulations in this assignment.
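The subprocess plumbing is roughly the following (a sketch: the actual brili command line is replaced here by a tiny self-contained stand-in "interpreter" so the example runs on its own):

```python
# Sketch: run the interpreter as a subprocess, feed the program on
# stdin, and collect its stdout lines as the trace.
import subprocess
import sys

def collect_trace(cmd, program_json):
    """Feed the program text on stdin; return the stdout lines."""
    result = subprocess.run(cmd, input=program_json, text=True,
                            capture_output=True, check=True)
    return result.stdout.splitlines()

# Stand-in interpreter: echoes one fake trace line per input line.
# In reality this would be something like the modified brili command.
fake_interp = [sys.executable, "-c",
               "import sys; [print('>', l.strip()) for l in sys.stdin]"]
lines = collect_trace(fake_interp, "const\nbr\n")
```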
-
Sorry for the late post. My implementation is here.

Trace-based Speculative Optimizer for Bril
The task is to implement a trace-based speculative optimizer for Bril: the same concept as in a tracing JIT, but in a profile-guided AOT setting, where profiling, transformation, and execution are distinct phases. The idea is to implement the "heavy lifting" for a trace-based JIT without needing all the scaffolding that a complete JIT requires, such as on-stack replacement.

Test
Here, for simplicity, I only test on two examples:

The output is correct, and the dynamic instrs of the original program and the tracing program are nearly the same. That's because I haven't implemented optimizations here. The difference is that the tracing program adds some
The output answer remains the same. The original program has dynamic instrs = 26; the tracing program has dynamic instrs = 25.
The output answer remains the same. The original program has dynamic instrs = 5; the tracing program has dynamic instrs = 7.

Implementation

Modify the reference interpreter to produce traces
A ">" prefix is added to distinguish the instruction trace from the normal print output.

Transform tracing instructions to the required straight-line code
So a workaround is that we change the condition of the guard instruction based on the profiling result (take the cond or not): we want the whole trace to be executed. This is not what

Stitch the trace back into the program
As we trace the whole program, we add

After that, the original program block is added as the fallback of the
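Consuming the ">"-prefixed trace lines described earlier might look like this (a sketch with an invented output format): the stitcher splits the interpreter's mixed stdout into trace lines and ordinary program output.

```python
# Sketch: split mixed interpreter output into trace lines vs. program
# output, using the ">" prefix convention for trace lines.

def split_output(stdout_text):
    trace, printed = [], []
    for line in stdout_text.splitlines():
        if line.startswith(">"):
            trace.append(line[1:].strip())
        else:
            printed.append(line)
    return trace, printed

mixed = "> const v0 5\n5\n> br cond\n"
trace, printed = split_output(mixed)
```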
-
Super-late submission. My code can be found here. I implemented a tracer as part of

Implementation

Tracer
The tracer code can be found here. There is a

Injector
The injector is responsible for parsing the trace (

Simple Branch
Consider the following program
After parsing the trace, the injector changes the
So in case the guard fails, it will jump to the

Inlining
Something similar happens when a function call occurs inside a branch. Let's consider the following example:
Now, function
We can see that the whole

Experiments
Although I am not very confident about this implementation, in the two examples above it looks like it saves a couple of instructions. I ran one experiment for each of those two benchmarks:
The validity of the results looks good, as they produce the same output. However, it definitely requires more test cases to make sure that everything works fine.
-
Here's the thread for the dynamic compilers task, which involves doing some speculative transformations on Bril IR!