Zfh #13

Open
wants to merge 48 commits into
base: master

YunhaoLan

Initial commit of half-precision FPU. This is a copy of stage3 branch.

cole-nelson and others added 30 commits October 8, 2020 20:33
Pulling SPARCE into master as basis for further development.
external halt signal. Tests all pass IF:
1. RESET_PC is changed to 0x8000_0000
2. In the sparce SASA Table, the constraint on the skipping
   (pc[31:18] == '0) is removed to allow PC 0x8000_0000+ to skip.
Merging Enes' priv unit updates. Works for all existing ASM tests (made to work with it).
Updated AHB Master with support for waited transfers
Implementation of a 3-stage pipelined multiplier and a radix-4 restoring divider. Both pass all RV32M tests and are synthesizable.

Co-authored-by: Jing Yin See <[email protected]>
Co-authored-by: Yuqing Fan
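
For readers unfamiliar with the term, a radix-4 restoring divider retires two quotient bits per step by trying the multiples 3d, 2d and d of the divisor and never letting the partial remainder go negative. The following is a minimal behavioral sketch in Python, not the RTL from this commit; the function name, the unsigned-only handling, and the divide-by-zero result (taken from the RISC-V DIVU/REMU convention) are assumptions made for illustration.

```python
def radix4_restoring_divide(dividend: int, divisor: int, width: int = 32):
    """Unsigned radix-4 restoring division (hypothetical reference model).

    Returns (quotient, remainder). Two quotient bits are produced per step,
    so a 32-bit divide takes 16 steps.
    """
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        # Assumed RISC-V DIVU/REMU behavior: all-ones quotient, remainder = dividend.
        return mask, dividend

    quotient = 0
    remainder = 0
    for step in range(width // 2 - 1, -1, -1):
        # Shift the next two dividend bits into the partial remainder.
        remainder = (remainder << 2) | ((dividend >> (2 * step)) & 0b11)
        # Pick the largest multiple of the divisor that still fits.
        digit = 0
        for candidate in (3, 2, 1):
            if remainder >= candidate * divisor:
                digit = candidate
                break
        remainder -= digit * divisor
        quotient = (quotient << 2) | digit
    return quotient, remainder
```

Signed DIV/REM would typically wrap this with sign handling before and after the loop; that part is omitted here.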
* adjustments to priv control to account for interrupt mie issues

* Added commentary on the PRIV_CONTROL fix

* Fixing prior commit -- added the wrong files. Reverting changes & adding correct files

* Fixed memory controller bug where interrupts caused FSM to lock up. Passes all RVB tests, and works in AFTx06 integration.

* Added halt on infinite loop as microarch parameter, fixed minor memory controller bug

* Fixed priv unit issue where mepc was captured during interrupt handler

Co-authored-by: Christopher Chiminski <[email protected]>
Initial implementation of RV32C.

Contains instruction decompression logic and fetch buffer to allow for free mixing of 16- and 32-bit instructions. Also includes self-tests for all currently-implemented RV32C instructions and updates to config_core.py to allow RV32C to be enabled/disabled.

Author: Jing Yin See
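
As an illustration of what instruction decompression involves, the sketch below expands a single compressed instruction, C.ADDI, into its 32-bit ADDI equivalent using the field layout from the RISC-V C extension. It is a Python model written for this note, not the decompressor in the commit, and the function name is hypothetical.

```python
def decompress_c_addi(insn16: int) -> int:
    """Expand a 16-bit C.ADDI into the equivalent 32-bit ADDI rd, rd, imm."""
    opcode = insn16 & 0b11
    funct3 = (insn16 >> 13) & 0b111
    if opcode != 0b01 or funct3 != 0b000:
        raise ValueError("not a C.ADDI encoding")

    rd = (insn16 >> 7) & 0x1F                      # rd and rs1 share one field
    imm = (((insn16 >> 12) & 1) << 5) | ((insn16 >> 2) & 0x1F)
    if imm & 0x20:                                 # sign-extend the 6-bit immediate
        imm -= 0x40
    imm12 = imm & 0xFFF                            # ADDI carries a 12-bit immediate

    # I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode (0x13 = OP-IMM)
    return (imm12 << 20) | (rd << 15) | (0b000 << 12) | (rd << 7) | 0x13
```

For example, `decompress_c_addi(0x0505)` (c.addi a0, 1) yields `0x00150513` (addi a0, a0, 1), and C.NOP (`0x0001`) expands to the canonical NOP `0x00000013`.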
* Initialize RV32E and DEV branch

* update .gitignore

* add RV32E support

* Reverted changes to TESTNUM

Co-authored-by: Jiahao Xu (socet94) <[email protected]>
Co-authored-by: Cole Nelson <[email protected]>
* WFI detection for clock_manager

Co-authored-by: Project47 <Raghuraman Kannan>
Co-authored-by: Cole Nelson <[email protected]>
Initial support for Verilator in RISCVBusiness.

Adds a Makefile supporting Verilator along with small fixes to ensure compilation. Adds an updated "run_tests_verilator" based on @ngildenhuys's improved run_tests script.

Co-authored-by: Mitch <[email protected]>
Co-authored-by: Hadi Ahmed <[email protected]>
Occasionally, when RV32C is able to provide an instruction early
(i.e. buffered compressed instruction), execution could be skipped
due to some validity flags relying on waiting for I-fetch. This provides
a small fix to the hazard unit that detects the "early finish" condition
and allows the same updates as if instruction fetch had completed.
Implementation of complete M-Mode Priv 1.12 spec.

Co-authored-by: Hadi Ahmed Project17 SOC <[email protected]>
Co-authored-by: Cole Nelson <[email protected]>
* Initial support for FuseSoC in RISCVBusiness

Co-authored-by: Mitch <[email protected]>
Co-authored-by: Hadi Ahmed <[email protected]>
RISC-V PMA implementation for Privileged Spec 1.12.
Implementation of Priv 1.12 PMP
Implementation of User Mode, Priv 1.12
For implementing pipelines with >2 stages, it is necessary for rd to be
supplied by a later stage instead of the current instruction. This changes
the control unit to output its own rd signal instead of feeding the register
file directly. For tspp, the control unit's rd is assigned directly to the
register file's rd, but for stage3 it will be passed to the next stage, and
the next stage's rd will be fed to the register file for performing writeback.
This adds a 3-stage pipeline, defined as follows:

 Fetch | Decode/Execute | Memory/Writeback

There is a forwarding path from the second latch into the decode/execute
stage. However, memory references cannot be forwarded due to critical
path issues, so load->use and csr->use hazards will incur 1 extra cycle
of stalling (beyond usual delay of load) for dependent instructions.
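
The forwarding rule described above can be sketched as a small decision function: an operand is taken from the register file unless the instruction in the Memory/Writeback stage writes the same register, in which case it is forwarded, or stalled if the producer is a load or CSR read. This is a simplified Python model under assumed names (`MemStage`, `resolve_operand`, the returned strings); the actual hazard and forwarding units are not shown in this log.

```python
from dataclasses import dataclass


@dataclass
class MemStage:
    """Hypothetical view of the instruction held in the second pipeline latch."""
    valid: bool            # False for a pipeline bubble
    rd: int                # destination register index
    writes_reg: bool       # instruction writes back to rd
    is_load_or_csr: bool   # result is not available for forwarding


def resolve_operand(rs: int, mem: MemStage) -> str:
    """Decide where the decode/execute stage gets source register rs from."""
    depends = mem.valid and mem.writes_reg and mem.rd != 0 and mem.rd == rs
    if not depends:
        return "regfile"
    if mem.is_load_or_csr:
        # Memory and CSR results are not forwarded (critical path), so stall.
        return "stall"
    return "forward"
```

Writes to x0 are excluded because x0 is hardwired to zero and never needs forwarding.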

Some rough estimates of frequency (standalone) are:
- 70MHz max on FPGA
- 200MHz+ feasible in ASIC
This implies that 50MHz FPGA speed and 100MHz ASIC speed should be attainable,
with more room to push the latter depending on the rest of the system.

All the RV32I tests pass. Remaining TODOs are:
- Alter the TB to allow testing of forwarding. Currently, memory references take
so long that there are never back-to-back instructions in the pipeline. While
this is somewhat realistic (AFT will have these delays), the addition of a cache
(and prefetching effect of RV32C) will allow single-cycle instruction hits.
- Implement exceptions/interrupts. This is currently untested.
- Allow RV32C. Theoretically, this shouldn't cause any problems, since
the parts of the pipeline that touch the decompressor were not altered.
- RV32M. There needs to be a decision about whether RISC-MGMT should be
supported for this pipeline. Ideally it should be added, but this is also
a chance to remove the standard extensions from RISC-MGMT, and change it to
be only for custom instructions if that is desired.
This commit adds debug-only signals to the pipeline for full CPU tracker
support. It also allows the TB to dump the waveform on Ctrl-C (SIGINT)
instead of leaving a corrupted waveform trace, for better debugging of
infinite loops.
All existing synchronous exception tests pass (ecall, pma). Additionally,
2 more tests for illegal instructions and pma i-fetch faults were added
to test exceptions originating in all 3 stages, which also pass.
This changes the logic for stalling/flushing so that on receiving an
interrupt, the currently-executing memory instruction is allowed to finish
(if such an instruction exists), then the oldest PC in the pipeline is taken.

Since this is asynchronous, there is no guarantee the M-stage has a valid PC,
so priority logic is needed to select from the 3 PCs in the pipeline. This is
simpler than latching the next PC of the last valid instruction since it doesn't
require instruction-specific knowledge (e.g. control flow target, compressed, etc.).

Signals were added to the pipeline to track when an instruction is valid as well, since
insn == 0 cannot be assumed to be a pipeline bubble instead of a nop instruction.
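
A minimal sketch of the priority selection described above, assuming each stage exposes a (valid, pc) pair and that the fetch-stage PC is always meaningful and serves as the fallback. The function name and tuple layout are illustrative and are not taken from the RTL.

```python
from typing import Tuple

Stage = Tuple[bool, int]  # (valid, pc) for one pipeline stage


def select_interrupt_pc(mem: Stage, ex: Stage, fetch_pc: int) -> int:
    """Return the PC of the oldest valid instruction in the pipeline.

    Stages are checked oldest first (Memory/Writeback, then Decode/Execute).
    If both are bubbles, the fetch PC is used (assumed always valid).
    """
    for valid, pc in (mem, ex):
        if valid:
            return pc
    return fetch_pc
```

Using the explicit valid flag rather than testing for an all-zero instruction matches the note above that insn == 0 cannot be treated as a bubble.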
Adds minor fixes for stalling logic to allow RV32C to work. Includes
fixes to forwarding logic that were encountered due to back-to-back
execution being possible with RV32C buffer.

All RV32I and RV32C tests pass with RV32C enabled and compiling with
compression, with the exception of RV32I fence.i, jal, jalr. These
are expected failures as they all implicitly assume instructions are
aligned to 4 instead of 2 or 4.
This adds a second version of RISCVBusiness for testing the core that does not
have a memory controller. This allows direct access to the buses for testing with
different latencies. Currently, it just does immediate latencies to mimic having
perfect caches.

Additional work would be to add the ability to simulate hits/misses by having
a random chance of getting bad latency, independently for I/D streams.
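
One way the proposed hit/miss simulation could look is sketched below as a Python model: each stream gets its own seeded random source so that instruction and data latencies are independent and reproducible. The class name, cycle counts, and miss probability are placeholders chosen for illustration, not values from the testbench.

```python
import random


class LatencyModel:
    """Hypothetical per-stream latency generator (one instance per I/D stream)."""

    def __init__(self, hit_cycles=1, miss_cycles=20, miss_chance=0.1, seed=0):
        self.hit_cycles = hit_cycles
        self.miss_cycles = miss_cycles
        self.miss_chance = miss_chance
        self._rng = random.Random(seed)  # private RNG keeps streams independent

    def next_latency(self) -> int:
        """Latency in cycles for the next request on this stream."""
        if self._rng.random() < self.miss_chance:
            return self.miss_cycles
        return self.hit_cycles


# Separate instances (and seeds) for the instruction and data streams.
i_stream = LatencyModel(miss_chance=0.05, seed=1)
d_stream = LatencyModel(miss_chance=0.20, seed=2)
```

Setting `miss_chance=0.0` reproduces the current "perfect cache" behavior.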
This fixes bugs related to hazards occurring only in the case of back-to-back
execution of instructions. The fixes were:
- Fix forwarding unit assignments so that forwarding conditions are detected correctly
- Force ifence to flush the pipeline and re-fetch in-flight instructions that may
no longer be valid
- Fix RV32C to obey the pipeline control signals. Previously, it ignored the "pc_en" signal,
which led to cases where instructions would be skipped if the first pipeline latch was stalled
while RV32C wanted to advance.

Remaining items to test/implement:
- Flushing for selected CSR writes. Need a list of such instructions, but at minimum this should
include the PMP/PMA configuration registers.
- Testing variable latencies, instead of only fixed slow/fast latencies
This commit adds RV32M to the stage3 pipeline. This does not use
RISC-MGMT, instead opting for a wrapper with enable/disable like
RV32C. In discussions with the team, this seems more manageable than
trying to fit more complex extensions (that include state) into RISC-MGMT.

In the future, RISC-MGMT should be integrated to allow custom instructions.

All tests for RV32IMC pass.
hadiahmed098 and others added 18 commits December 11, 2022 16:00
Fixes a bug where I-fetch after a PC redirect could read the wrong
instruction if the prior in-progress request became ready after the
PC changed.
Changes are:
- Suppress iren whenever PC is redirected
- Do not sample EPC from mem stage on interrupt (fixes repeated load/store
instruction to non-idempotent region, but still permits load/store faults)
- Expand memory controller ability to abort transactions when iren is suppressed
This adds an extra state to the APB requester module to permit correct
handling of back-to-back transactions. The new request state takes the
same latched signals as the data state, so spurious input changes
cannot break the request.
Fixes an issue where misaligned addresses can appear on APB,
causing completers to signal an error. Fix forces address alignment
by tying lower bits to '0', and relying on strobe for writes.
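
The alignment fix can be illustrated with a short Python model: the low address bits are cleared and the byte lanes being written are expressed through the strobe instead. The sketch assumes a 32-bit data bus (so two low bits are tied off) and accesses that do not cross a word boundary; the function name is hypothetical.

```python
def apb_align(addr: int, size_bytes: int):
    """Return (aligned_address, write_strobe) for a 1-, 2- or 4-byte access.

    Assumes a 32-bit bus: address bits [1:0] are forced to zero and the
    4-bit strobe marks which byte lanes carry valid write data.
    """
    if size_bytes not in (1, 2, 4):
        raise ValueError("unsupported access size")
    offset = addr & 0b11
    if offset + size_bytes > 4:
        raise ValueError("access crosses a word boundary")
    aligned = addr & 0xFFFFFFFC
    strobe = (((1 << size_bytes) - 1) << offset) & 0xF
    return aligned, strobe
```

For example, a halfword write to `0x1002` becomes address `0x1000` with strobe `0b1100`.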
This commit fixes up the SystemVerilog self-test testbench,
and changes the simulated ram model to use binary files to match
the behavior of the Verilator testbench.
Only basic testing has been done.
Connect some unconnected signals (masked by Verilator 'x' handling)
and fix some parsing differences between Verilator and Xcelium/Modelsim.
The signal "valid_e" was unassigned, causing an unknown value in
simulation with Xcelium
L1 Cache integration

---------

Co-authored-by: Jimmy <[email protected]>
Co-authored-by: Cole Nelson <[email protected]>
This fixes a bug where high-latency operations being cancelled could
cause incorrect execution. The strategy is to hold the memory controller
in a state where all buses are "busy" until all outstanding bus requests
complete.
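
The drain strategy described above can be sketched as a small bookkeeping model: once a request is cancelled while others are still in flight, the controller reports busy and refuses new requests until the outstanding count returns to zero. This is an illustrative Python model with invented names (`CancelDrain`, `issue`, `cancel`, `complete`), not the memory controller's actual state machine.

```python
class CancelDrain:
    """Hypothetical model of 'hold busy until outstanding requests complete'."""

    def __init__(self):
        self.outstanding = 0   # bus requests issued but not yet completed
        self.draining = False  # True while waiting out cancelled requests

    @property
    def busy(self) -> bool:
        """All buses appear busy to the core while draining."""
        return self.draining

    def issue(self) -> bool:
        """Try to start a bus request; refused while draining."""
        if self.draining:
            return False
        self.outstanding += 1
        return True

    def cancel(self) -> None:
        """Core abandons in-flight work (e.g. on a PC redirect)."""
        if self.outstanding > 0:
            self.draining = True

    def complete(self) -> None:
        """A bus response arrived for one outstanding request."""
        if self.outstanding == 0:
            raise RuntimeError("completion with no outstanding request")
        self.outstanding -= 1
        if self.outstanding == 0:
            self.draining = False
```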
* Priv Unit: Fix handling of PMA/PMP faults

* generic_bus_if: Add "error" signal, propagate

* Bus Fault: Fix I-fault case

* L1: Make pass_through respect wen/ren, revert cache state when request
deasserted.
1. Label generate blocks
2. Fix inferred latch in priv unit (typo)
10 participants