Selected as one of IEEE Micro's Top Picks (2023)
- MICRO2023 paper
- Top Picks paper
- Up-to-date version of ReRoCC interface repo
- Top level FireSim (artifact): repo, zenodo
- Submodules (Zenodo): chipyard, accelerator HW, SW
AuRORA is a novel full-stack accelerator integration methodology that enables scalable multi-accelerator deployment for multi-tenant workloads. AuRORA supports virtualized accelerator orchestration by co-designing the hardware and software stack so that workloads can be adaptively bound to accelerators. AuRORA consists of ReRoCC (remote RoCC), a virtualized and disaggregated accelerator interface for many-accelerator integration, and a runtime system for adaptive accelerator management. Much as virtual memory abstracts physical memory, AuRORA provides an abstraction between the user's view of an accelerator and the physical accelerator instances. This virtualized interface allows workloads to be flexibly and dynamically orchestrated onto available accelerators based on their latency requirements, regardless of where the physical accelerator instances are located. The goal of this full-stack HW/SW co-design is scalable performance for multi-accelerator systems.
From bottom to top, the AuRORA full stack includes:
- A low-overhead shim microarchitecture to interface between cores and accelerators.
- A hardware messaging protocol between cores and accelerators to enable scalable and virtualized accelerator deployment on an SoC.
- An ISA extension to allow user threads to interact with AuRORA hardware in a programmable fashion.
- A lightweight software runtime to dynamically reallocate resources for multi-tenant workloads.
Please refer to our paper for details.
The AuRORA microarchitecture consists of two components, Client and Manager.
Client integrates with the host general-purpose cores. It allows communication to and from disaggregated accelerators and provides the illusion of tight coupling.
Manager wraps an existing accelerator. It includes a PTW and an L2 TLB that are compliant with the accelerator MMU, and it implements a shadow copy of the architectural CSRs used by the accelerator MMU.
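The Client/Manager split can be illustrated with a small software model. This is only a sketch under stated assumptions: the class layout, the message kinds (`csr_write`, `command`), and the use of `satp` as the shadowed CSR are all hypothetical choices for illustration. The README does not list which CSRs are shadowed or how messages are encoded, and the real components are hardware, not Python objects.

```python
class Manager:
    """Toy model: wraps one accelerator and keeps shadow copies of MMU-related CSRs."""

    def __init__(self, accelerator):
        # Any callable standing in for the wrapped accelerator (hypothetical).
        self.accelerator = accelerator
        self.shadow_csrs = {}

    def handle(self, msg):
        kind, payload = msg
        if kind == "csr_write":
            # Update the local shadow copy so the accelerator-side MMU
            # does not have to query the core on every translation.
            name, value = payload
            self.shadow_csrs[name] = value
            return None
        if kind == "command":
            # Run the command on the wrapped accelerator with the shadowed state.
            return self.accelerator(payload, self.shadow_csrs)
        raise ValueError(f"unknown message kind: {kind}")


class Client:
    """Toy model: core-side shim that forwards everything to a remote Manager."""

    def __init__(self, manager):
        self.manager = manager

    def write_csr(self, name, value):
        self.manager.handle(("csr_write", (name, value)))

    def execute(self, operand):
        # From the caller's point of view this looks like a local accelerator call.
        return self.manager.handle(("command", operand))


def doubling_accelerator(operand, csrs):
    """Stand-in accelerator: doubles its operand and reports the satp it saw."""
    return operand * 2, csrs.get("satp")


client = Client(Manager(doubling_accelerator))
client.write_csr("satp", 0x1000)
result = client.execute(21)
```

In this model the caller only ever talks to `Client`, which is the sense in which a disaggregated accelerator can still appear tightly coupled.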
AuRORA's ISA extension adds five instructions: rerocc_acquire and rerocc_release to acquire and release an accelerator, rerocc_assign to map an acquired accelerator to an available opcode, rerocc_fence to fence memory between the core and an acquired accelerator if needed, and rerocc_memrate for memory rate partitioning.
This file contains the instruction definitions used.
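The acquire / assign / release flow described above can be sketched as a toy software model. The class name, method signatures, and return values below are assumptions made for illustration; the actual operand encodings and semantics are defined in the ISA file and the paper. rerocc_fence and rerocc_memrate are left out of this sketch.

```python
class ToyReRoCC:
    """Toy model of accelerator ownership and opcode binding (not the real hardware)."""

    def __init__(self, num_accelerators, num_opcodes):
        self.owner = [None] * num_accelerators  # physical accelerator -> owning thread
        self.num_opcodes = num_opcodes
        self.opcode_map = {}                    # (thread, opcode) -> physical accelerator

    def acquire(self, thread, accel_id):
        """Try to claim a physical accelerator; return True on success."""
        if self.owner[accel_id] is not None:
            return False
        self.owner[accel_id] = thread
        return True

    def assign(self, thread, opcode, accel_id):
        """Bind one of the thread's opcodes to an accelerator it already owns."""
        if self.owner[accel_id] != thread:
            raise PermissionError("accelerator not acquired by this thread")
        if not 0 <= opcode < self.num_opcodes:
            raise ValueError("opcode out of range")
        self.opcode_map[(thread, opcode)] = accel_id

    def issue(self, thread, opcode):
        """Return the physical accelerator that an instruction on this opcode reaches."""
        return self.opcode_map[(thread, opcode)]

    def release(self, thread, accel_id):
        """Give the accelerator back and drop any opcode bindings that pointed at it."""
        if self.owner[accel_id] != thread:
            raise PermissionError("accelerator not acquired by this thread")
        self.owner[accel_id] = None
        stale = [key for key, acc in self.opcode_map.items()
                 if key[0] == thread and acc == accel_id]
        for key in stale:
            del self.opcode_map[key]


model = ToyReRoCC(num_accelerators=2, num_opcodes=4)
got_it = model.acquire("t0", 1)        # t0 claims physical accelerator 1
contended = model.acquire("t1", 1)     # t1 cannot claim it while t0 holds it
model.assign("t0", 0, 1)               # t0's opcode 0 now targets accelerator 1
target = model.issue("t0", 0)
model.release("t0", 1)
```

The point of the sketch is the indirection: user code issues instructions on an opcode, and the binding decides which physical accelerator receives them.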
AuRORA supports both crossbar and NoC integration for protocol transport. The transport can share the on-chip memory interconnect or be configured as a separate interconnect. Please refer to the SoC Configs to see how we configured the NoC and crossbar SoCs.
The AuRORA runtime is implemented in the Gemmini tests for convenience, since we use the Gemmini DNN accelerator generator for evaluation.
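To give a feel for what a latency-driven allocation decision might look like, here is a minimal sketch of one possible policy. It is not the policy implemented in the AuRORA runtime: the function name, the task description as `(work, target)` pairs, and the ideal linear-speedup assumption are all hypothetical simplifications.

```python
import math


def allocate(tasks, num_accelerators):
    """Split accelerators among tasks by latency target.

    tasks maps a task name to (work, target), where work is the estimated
    runtime on one accelerator and target is the latency requirement.
    Assumes ideal linear speedup: latency = work / accelerator count.
    """
    # Smallest count that meets each target under the linear-speedup assumption.
    need = {name: max(1, math.ceil(work / target))
            for name, (work, target) in tasks.items()}
    alloc = {name: 0 for name in tasks}
    free = num_accelerators
    # Serve the tasks with the tightest latency targets first.
    for name in sorted(tasks, key=lambda n: tasks[n][1]):
        grant = min(need[name], free)
        alloc[name] = grant
        free -= grant
    return alloc


tasks = {"a": (8.0, 4.0), "b": (9.0, 3.0)}
scarce = allocate(tasks, 4)   # not enough accelerators for both targets
ample = allocate(tasks, 8)    # both targets can be met
```

With four accelerators the tighter-deadline task `b` gets the three it needs and `a` gets the remaining one; with eight, both receive their full requirement. A real runtime would re-run such a decision as tenants arrive and leave.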
If AuRORA helps you in your research, you are encouraged to cite our papers. Here are example BibTeX entries:
@inproceedings{
aurora,
title={AuRORA: Virtualized Accelerator Orchestration for Multi-Tenant Workloads},
author={Seah Kim and Jerry Zhao and Krste Asanovic and Borivoje Nikolic and Yakun Sophia Shao},
booktitle={IEEE/ACM International Symposium on Microarchitecture (MICRO)},
year={2023}
}
@article{
aurora_top_picks,
title={AuRORA: A Full-Stack Solution for Scalable and Virtualized Accelerator Integration},
author={Seah Kim and Jerry Zhao and Krste Asanovic and Borivoje Nikolic and Yakun Sophia Shao},
journal={IEEE Micro},
year={2024},
volume={44}
}
To learn about using Chipyard, see the Chipyard documentation site: https://chipyard.readthedocs.io/
To learn about using FireSim, you can find the documentation and getting-started guide at docs.fires.im.
To learn about using Gemmini, visit the Gemmini repository.