Statically scheduled theta conversion in rhls #552
@haved I added you as a reviewer as you should get familiar with most of the jlm code base, and this is a larger contribution to it.
@sjalander I agree, I will look at it. I don't quite understand what program/circuit the figures shown in the PR represent. I assume they are the result of conversion, but what did the theta node look like before transformation? Is static HLS written about somewhere? @ProgrammingLouis
@sjalander @ProgrammingLouis I spent some time looking at this now and I have quite a hard time understanding what is going on. To the best of my understanding, here are some high-level comments to start with:
@phate @haved @ProgrammingLouis @davidmetz In our case, we have a bit more freedom as the architecture is not predefined. So, we can choose how many operations can be performed in a single cycle, i.e., the number of adds/subs, multiplications, memory operations, etc. We can think of this as a parallel ALU/MEM/BRANCH unit that in a conventional pipeline represents the execute stage. This part is represented by the “compute subregion” of the static_loop node, as shown in the figure. The next step is to connect the outputs/results of our “execute stage”/“compute subregion” to registers, so that we can store temporary variables, and to multiplexers, so that we can feed the “compute subregion” with operands for the next compute cycle. Finally, we need a finite state machine (FSM) that, for each cycle of the loop, controls which registers should be written and which registers/arguments should be fed to the “compute subregion”. The FSM in its simplest form can be viewed as a straight chain of states, with each state containing the control signals for the multiplexers and registers. This is modeled with a gamma node, with each region representing one state in the FSM.
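The scheme described in this comment can be illustrated with a small software model. This is only a sketch under assumptions, not jlm code: the compute subregion is reduced to a single adder, and the names ControlWord and run_schedule are hypothetical. Each FSM state is one control word holding the two mux selects and the per-register store signals, and the FSM is a straight chain of such states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model (not part of jlm): one FSM state holds the control
// signals that are active during one cycle of the loop body.
struct ControlWord {
    std::size_t lhs_select;   // mux select: register feeding the adder's left operand
    std::size_t rhs_select;   // mux select: register feeding the adder's right operand
    std::vector<bool> store;  // per-register store signal (true = write the result)
};

// Steps through a straight chain of FSM states. The "compute subregion" is
// assumed to be a single adder; its result is written to every register
// whose store signal is set in the current state.
std::vector<std::int64_t> run_schedule(std::vector<std::int64_t> regs,
                                       const std::vector<ControlWord>& fsm) {
    for (const ControlWord& state : fsm) {
        assert(state.store.size() == regs.size());
        assert(state.lhs_select < regs.size() && state.rhs_select < regs.size());

        // Muxes pick the operands, the compute unit produces one result.
        const std::int64_t result = regs[state.lhs_select] + regs[state.rhs_select];

        // Registers latch the result only when their store input is high.
        for (std::size_t r = 0; r < regs.size(); ++r) {
            if (state.store[r]) {
                regs[r] = result;
            }
        }
    }
    return regs;
}
```

With registers {1, 2, 0}, a first state that adds r0 and r1 into r2 and a second state that adds r2 to itself into r0 would leave the registers at {6, 2, 3}. A real schedule would allow several operations per cycle, as the comment describes.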
@sjalander sounds cool! I like the explicit register node, and I assume its second input (red) is either a reset or set control wire? My only question about the scheme for now is what kind of input the fsm takes, beyond the current state. Does it come from the last "evaluation" of the computation region, being implicitly latched, or does it have to come from one of the register nodes?
In the current implementation, the fsm takes the state as input which is connected to a region argument. Each register has a red input which corresponds to the store input.
For 1. I just added a small test that runs the conversion.
@ProgrammingLouis that makes sense. I'm trying to understand the fsm structure more generally. Does it always take the previous state + a single predicate, or is that just a side effect of scheduling a theta? (Could it ever take more inputs?) Is the idea with this scheme eventually to automatically create a "microcode-interpreter" and corresponding "microcode-program" for the theta? Does it make sense to use this scheme for things that are not theta nodes? (I'm misusing the word "microcode", but the scheme you present here reminds me of the way microinstructions were presented in TDT4160. Using "VLIW" like @sjalander did is more precise)
Statically scheduled theta conversion in rhls
THIS IS STILL A WORK IN PROGRESS. The goal of this PR is to present the basic mechanisms and implementation and to get feedback.
The final goal is to have a statically scheduled version of rhls to compare with the existing dynamic version. Furthermore, one could propose mixed statically/dynamically scheduled hardware that benefits from the performance of dynamic hls and from the small circuit area of static hls.
Theta nodes are converted into a new jlm::static_hls::loop_node. The jlm::static_hls::loop_node is a structural node that contains 2 subregions:
The implemented scheduling algorithm is made to be as simple as possible (and is really not efficient) at this point.
Every node in the original theta subregion goes through the loop_node::add_node method, which either
An fsm_state is also created for each original node.
The fsm.hpp file contains 3 main classes for representing the finite state machine:
- fsm_node_temp: When building the fsm, this structural node is created by the fsm_builder, and structural outputs are incrementally added to it to connect the registers' and muxes' control inputs.
- fsm_state: fsm_states are regions that represent a state of the fsm. They contain control constants to set the registers' store inputs and the muxes' control inputs. They have their results connected to the structural outputs of the fsm_node_temp but are not subregions of the fsm_node_temp because they are incrementally added. This is a workaround to be able to incrementally add states to the fsm as well as to add new connected registers and muxes.
- fsm_builder: This is the main class for building the fsm. When the building is complete, the fsm_node_temp is deleted and converted to a gamma (see fsm_node_builder::generate_gamma() and loop_node::finalize()).
This PR does not contain conversion to FIRRTL for now.
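The incremental construction described for fsm.hpp can be sketched as follows. This is a simplified illustration under assumptions, not the code in this PR: FsmBuilderSketch and its methods are hypothetical names, control signals are modeled as plain integers, and the final gamma is modeled as a table with one row per state and one column per control output. The point it shows is why states are kept outside the node while building: adding a new output later has to widen every existing state.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the builder pattern described above.
class FsmBuilderSketch {
public:
    // Adds a control output (e.g. a register store input or a mux select).
    // States that already exist are widened with the default value.
    std::size_t add_output(int default_value = 0) {
        defaults_.push_back(default_value);
        for (std::vector<int>& state : states_) {
            state.push_back(default_value);
        }
        return defaults_.size() - 1;
    }

    // Appends a new state whose control constants all start at their defaults.
    std::size_t add_state() {
        states_.push_back(defaults_);
        return states_.size() - 1;
    }

    // Sets one control constant in one state.
    void set(std::size_t state, std::size_t output, int value) {
        assert(state < states_.size());
        assert(output < defaults_.size());
        states_[state][output] = value;
    }

    // Plays the role of the final conversion step: the temporary
    // representation is consumed and a fixed table is returned.
    std::vector<std::vector<int>> finalize() {
        std::vector<std::vector<int>> table = std::move(states_);
        states_.clear();
        defaults_.clear();
        return table;
    }

private:
    std::vector<int> defaults_;             // default value of each control output
    std::vector<std::vector<int>> states_;  // one row of control constants per state
};
```

In this model, outputs and states can be added in any order, and every row of the finalized table has the same width, which is the property a gamma with one region per state needs.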
This PR is here to gather feedback on this implementation, but it still lacks many things that will be added soon in new commits.
Things that I'm planning on adding as soon as possible:
@phate