hw: Add native bootrom #168
If I understand correctly, the motivation for this PR is to increase the probability that the Snitch cores boot successfully at test time, as booting from an internal boot ROM decouples the boot process from the reliability of external components, e.g. the system-level interconnect.
While in Occamy the boot ROM was unfortunately located in a different clock domain, past many interconnect adapters, other defenses could be put in place. For instance, in a system such as FlooOccamy, the boot ROM could occupy just one of the many tiles in the system-level NoC. Unreliability in the components on this path would most likely kill the deployment of any application, regardless of the reliability of the boot process. Thus, I still struggle to see the usefulness of this PR.
I think before proceeding with this PR, we need to clearly sort out and weigh its actual advantages and disadvantages. I still don't see any advantage, while I see the following disadvantages:
- Hardware cost. What is the real cost of adding another port to the wide interconnect? In the thesis, the cost was estimated with a 64-bit wide interconnect configuration. The reported +60 GE sounds unrealistic to me even considering this interconnect, together with the `axi_to_mem` adapter, (small) boot ROM and additional scratch registers.
- Increased code complexity and maintenance effort.
- Increased configuration complexity.
If we then choose to go on with this PR, I suggest making the internal boot feature parametrizable so that the associated hardware cost can be fully removed at configuration time. More comments follow in the review. In any case, we can merge the scratch registers.
hw/snitch_cluster/src/snitch_cluster_peripheral/snitch_cluster_peripheral_reg.hjson
The main reason to have a native cluster bootrom is that you no longer have a single point of failure that is far away (potentially in a different clock domain). High contention from multiple clusters on a single bootrom is also not really desirable. In addition, it is much easier to verify the bootrom at the cluster level (e.g. with post-layout simulation), which is almost impossible at the top level.
For me, the advantage is that the bootrom might actually work 😉
This we can evaluate again for a more realistic 512-bit configuration. I think the reported 60 GE covers only the bootrom itself and does not account for the increase in Xbar size and the additional scratch registers. But my feeling is that the overhead will not be too significant, with tool optimization.
I don't think the code is that complex and I don't really see a big maintenance overhead. The bootrom in this repository should not really change, since it needs to be adapted/overwritten anyway for systems that integrate the snitch cluster.
The current way it is configured has room for improvement, yes.
This makes sense to me, and could be done. Maybe @paulsc96 and @thommythomaso can also give their two cents on it, since they have the most experience with malfunctioning bootroms and initially proposed this thesis.
I think @fischeti summarized the advantages quite well. Adding this internal boot ROM turns the cluster into a much more autonomous IP, greatly simplifying integration and testing and avoiding fatal integration mistakes. It also speeds up initialization in multi-cluster systems dramatically (which may be critical for useful large-scale simulation) and saves integrators lots of valuable time in integration and system-level testing. Finally, it provides a standard interface for interacting with clusters, and the ability to freely repoint execution is a versatile escape hatch in case integration mistakes are made or other problems arise. We actually relied on such an escape hatch through an incidental TLB in an extremely critical chip very recently, so it is a wise choice to integrate such a mechanism straight into the cluster. EDIT: Regarding the area cost, while this may add some area (at most 1% realistically?) there are numerous inefficiencies in the cluster I would tackle to reduce area long before this would become a concern.
I thought about this, but you're gonna run into the same slowdown right after you terminate the first few boot ROM instructions, so I don't think the speedup will be dramatic.
Well, this is only true if the boot ROM is actually reused, and the integrating systems don't just implement their own (in contrast with what @fischeti said). But in that case the same can be obtained even if we don't instantiate it within each cluster. Couldn't we just provide the boot ROM code and utilities, and test it by integrating it within the testbench, while still not instantiating it in every cluster?
I changed my mind regarding this. As Paul said, I think it would make more sense to provide the snitch cluster as a standalone IP with a proper default bootrom that makes it easy to integrate into an actual system without the need to implement your own custom one (which would still be possible if needed). So the default bootrom should already implement the trampoline functionality with the scratch register. I don't really see how integrating the generated bootrom into the testbench would improve the situation. The goal in the end is that we can verify the cluster boot procedure standalone, which is only possible if you have a block that combines the cluster and bootrom. Of course, you could write a wrapper around the snitch cluster that includes a bootrom, but this only shifts the complexity and makes integration unnecessarily complex. I can do a synthesis run to compare the increase in complexity in the Xbar, but some increase is justified in my opinion given that this increases the flexibility and safety of the boot process.
Setup toolchain for generating bootrom. Created scratch registers. Begun setup for instantiating bootrom.
Bootrom fully instantiated. Fetches Bootrom instructions in simulation.
…d set Performance Counters to track Retired Instructions during Boot Procedure.
`Bootrom` is not a legal index if `IntBootromEnable` is unset
Starting the performance counter takes very long for some reason, and DMA transfer starts before counter starts tracking
This reverts commit a63105a.
This reverts commit e3d68fe.
This is required to run it on IIS machines since verilator currently can only be run in the oseda environment
Adds a native bootrom to the cluster instead of fetching from an external one. Also adds additional scratch registers to the peripherals, which can be used to write the entry point of the binary.
The currently implemented bootrom works as follows. The cores enable the interrupts of the cluster-internal CLINT as well as the software interrupts. After an interrupt, the cores start fetching from the address held in the `scratch1` register, which needs to be written with the entry address of the binary.
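The boot flow described above can be sketched as a small behavioural model. This is only an illustration under assumptions, not the RTL or the actual bootrom code: the class `ClusterModel` and its method names are hypothetical, and a core "parked in the bootrom" is represented simply by a program counter of `None`.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterModel:
    """Toy model of the boot flow (hypothetical names, not the real design)."""
    num_cores: int
    scratch1: int = 0  # entry address, written from outside the cluster
    irq_pending: list = field(default_factory=list)
    pc: list = field(default_factory=list)

    def __post_init__(self):
        # After reset every core waits in the bootrom with interrupts enabled.
        self.irq_pending = [False] * self.num_cores
        self.pc = [None] * self.num_cores  # None means "parked in the bootrom"

    def host_write_entry(self, addr):
        """Host writes the entry address of the binary to scratch1."""
        self.scratch1 = addr

    def host_raise_interrupt(self, core):
        """Host triggers an interrupt for one core."""
        self.irq_pending[core] = True

    def step(self):
        """Parked cores with a pending interrupt jump to the address in scratch1."""
        for core in range(self.num_cores):
            if self.pc[core] is None and self.irq_pending[core]:
                self.irq_pending[core] = False
                self.pc[core] = self.scratch1


cluster = ClusterModel(num_cores=2)
cluster.host_write_entry(0x8000_0000)  # hypothetical entry address
cluster.host_raise_interrupt(0)
cluster.step()  # core 0 leaves the bootrom, core 1 stays parked
```

The model makes the ordering requirement visible: the entry address has to be written before the interrupt is raised, otherwise a core would jump to whatever value the register held at that moment.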
To make the jump to the scratch register independent of any configuration, we decided to fix the bootrom size to 4kB, which is placed after the TCDM and before the peripherals in the address map.
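A minimal sketch of why a fixed bootrom size helps, assuming the three regions are contiguous, the TCDM starts at the cluster base address, and 4 kB means 4096 bytes. The function name, the dictionary keys and the example addresses are hypothetical and not taken from the repository.

```python
BOOTROM_SIZE = 4 * 1024  # fixed size from the description, assumed to be 4096 bytes


def cluster_address_map(cluster_base, tcdm_size):
    """Return the start addresses of the assumed TCDM / bootrom / peripheral layout."""
    tcdm_start = cluster_base
    bootrom_start = tcdm_start + tcdm_size          # bootrom follows the TCDM
    periph_start = bootrom_start + BOOTROM_SIZE     # peripherals follow the bootrom
    return {
        "tcdm_start": tcdm_start,
        "bootrom_start": bootrom_start,
        "periph_start": periph_start,
    }


# Two hypothetical configurations with different TCDM sizes.
small = cluster_address_map(0x1000_0000, 128 * 1024)
large = cluster_address_map(0x1000_0000, 256 * 1024)

# The distance from the bootrom to the peripherals does not depend on the TCDM size.
offset_small = small["periph_start"] - small["bootrom_start"]
offset_large = large["periph_start"] - large["bootrom_start"]
```

Under these assumptions the bootrom code can reach the peripheral region with a constant offset from its own location, so the same bootrom image works for every configuration.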
By default, the internal bootrom is now enabled, but it can be disabled in the configuration with the `int_bootrom_enable` flag. The native bootrom can also be enabled with or without the alias feature. If both `AliasRegionEnable` and `IntBootromEnable` are set, the cores will start fetching from `BootRomAliasStart`. Otherwise, the boot address needs to be provided with `BootAddr` as before.
To support writing the entry address of the binary to the scratch register, as well as triggering an interrupt, I created a VIP module (inspired by Cheshire), which combines all tasks necessary to write to the cluster from outside.
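The choice between `BootRomAliasStart` and `BootAddr` described above can be written down as a small selection function. This is a sketch of the stated rule only; the Python function and argument names mirror the parameters mentioned in the description but are otherwise hypothetical, and the addresses in the example are made up.

```python
def select_boot_address(alias_region_enable, int_bootrom_enable,
                        bootrom_alias_start, boot_addr):
    """Pick the address the cores start fetching from after reset."""
    if alias_region_enable and int_bootrom_enable:
        # Internal bootrom with alias region: boot from the aliased bootrom.
        return bootrom_alias_start
    # In every other case the boot address is supplied through BootAddr.
    return boot_addr


# Hypothetical addresses, for illustration only.
with_alias = select_boot_address(True, True, 0x0002_0000, 0x0100_0000)
without_alias = select_boot_address(False, True, 0x0002_0000, 0x0100_0000)
```

Note that, following the description, enabling the internal bootrom without the alias region still requires the integrator to supply a boot address.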
TODO
- `rtl` target and prerequisites