diff --git a/CHANGELOG.md b/CHANGELOG.md
index 872896d94..f3fb1ce38 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12
| Date | Version | Comment | Ticket |
|:----:|:-------:|:--------|:------:|
+| 03.02.2025 | 1.11.0.8 | :sparkles: add explicit memory ordering/coherence support; :warning: remove WDT "halt-on-debug" and "halt-on-sleep" options; :bug: rework cache module fixing several (minor?) design flaws | [#1176](https://github.com/stnolting/neorv32/pull/1176) |
| 03.02.2025 | 1.11.0.7 | :bug: add missing CFS clock gen enable signal | [#1177](https://github.com/stnolting/neorv32/pull/1177) |
| 01.02.2025 | 1.11.0.6 | :warning: remove XIP module | [#1175](https://github.com/stnolting/neorv32/pull/1175) |
| 01.02.2025 | 1.11.0.5 | minor rtl optimizations and cleanups; :warning: remove DMA "fence" feature | [#1174](https://github.com/stnolting/neorv32/pull/1174) |
diff --git a/docs/datasheet/cpu.adoc b/docs/datasheet/cpu.adoc
index 3fe4529f5..213afc025 100644
--- a/docs/datasheet/cpu.adoc
+++ b/docs/datasheet/cpu.adoc
@@ -1,3 +1,4 @@
+<<<
:sectnums:
== NEORV32 Central Processing Unit (CPU)
@@ -66,7 +67,7 @@ direction as seen from the CPU.
[options="header", grid="rows"]
|=======================
| Signal | Width/Type | Dir | Description
-4+^| **Global Signals**
+4+^| **Clock and reset**
| `clk_i` | 1 | in | Global clock line, all registers triggering on rising edge.
| `rstn_i` | 1 | in | Global reset, low-active.
4+^| **Interrupts (<<_traps_exceptions_and_interrupts>>)**
@@ -75,20 +76,17 @@ direction as seen from the CPU.
| `mti_i` | 1 | in | RISC-V machine timer interrupt.
| `firq_i` | 16 | in | Custom fast interrupt request signals.
| `dbi_i` | 1 | in | Request CPU to halt and enter debug mode (RISC-V <<_on_chip_debugger_ocd>>).
+4+^| **<<_inter_core_communication_icc>> links**
+| `icc_tx_o` | `icc_t` | out | TX link
+| `icc_rx_i` | `icc_t` | in | RX link
4+^| **Instruction <<_bus_interface>>**
| `ibus_req_o` | `bus_req_t` | out | Instruction fetch bus request.
| `ibus_rsp_i` | `bus_rsp_t` | in | Instruction fetch bus response.
4+^| **Data <<_bus_interface>>**
| `dbus_req_o` | `bus_req_t` | out | Data access (load/store) bus request.
| `dbus_rsp_i` | `bus_rsp_t` | in | Data access (load/store) bus response.
-4+^| **<<_inter_core_communication_icc>> TX links**
-| `icc_tx_rdy_o` | 2 | out | Data available for cores `0..1`.
-| `icc_tx_ack_i` | 2 | in | Read-enable from cores `0..1`.
-| `icc_tx_dat_o` | 2*32 | out | Data for cores `0..1`.
-4+^| **<<_inter_core_communication_icc>> RX links**
-| `icc_rx_rdy_i` | 2 | in | Data available from cores `0..1`.
-| `icc_rx_ack_o` | 2 | out | Read-enable for cores `0..1`.
-| `icc_rx_dat_i` | 2*32 | in | Data from cores `0..1`.
+4+^| **<<_memory_coherence>> status**
+| `mem_sync_i` | 1 | in | Requested coherence established when set (single-shot)
|=======================
.Bus Interface Protocol
@@ -424,12 +422,11 @@ always valid when set.
| `rw` | 1 | Access direction (`0` = read, `1` = write)
| `src` | 1 | Access source (`0` = instruction fetch, `1` = load/store)
| `priv` | 1 | Set if privileged (M-mode) access
+| `debug` | 1 | Set if debug mode access
| `amo` | 1 | Set if current access is an atomic memory operation (<<_atomic_memory_access>>)
| `amoop` | 4 | Type of atomic memory operation (<<_atomic_memory_access>>)
3+^| **Out-Of-Band Signals**
-| `fence` | 1 | Data/instruction fence request; single-shot
-| `sleep` | 1 | Set if ALL upstream devices are in <<_sleep_mode>>
-| `debug` | 1 | Set if the upstream device is in debug-mode
+| `fence` | 1 | Data (load/store; `fence`) or instruction (instruction-fetch; `fence.i`) fence request; single-shot; see <<_memory_coherence>>
|=======================
.Bus Interface - Response Bus (`bus_rsp_t`)
@@ -463,7 +460,7 @@ The figure below shows three exemplary bus accesses:
. A write access to address `B_addr` writing `wdata` (fastest response; `ACK` arrives right in the next cycle).
. A failing read access to address `C_addr` (slow response; `ERR` arrives after several cycles).
-.Three Exemplary Bus Transactions (showing only in-band signals)
+.Three Exemplary Bus Transactions (showing only in-band signals; privileged non-debug non-atomic accesses)
image::bus_interface.png[700]
.Adding Register Stages
@@ -501,8 +498,8 @@ operation:
.Cache Coherency
[IMPORTANT]
-Atomic operations **always bypass** the CPU caches using direct/uncached accesses. Care must be taken
-to maintain data <<_cache_coherency>>.
+Atomic operations **always bypass** the (CPU) caches using direct/uncached accesses. Care must be taken
+to maintain data synchronization. See section <<_memory_coherence>> for more information.
<<<
@@ -632,7 +629,7 @@ The `I` ISA extensions is the base RISC-V integer ISA that is always enabled.
| Jump/call | `jal[r]` | 6
| Load/store | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 5
| System | `ecall` `ebreak` | 3
-| Data fence | `fence` | 5
+| Data fence | `fence` | depends on the memory system
| System | `wfi` | 3
| System | `mret` | 5
| Illegal inst. | - | 3
@@ -641,10 +638,10 @@ The `I` ISA extensions is the base RISC-V integer ISA that is always enabled.
.`fence` Instruction
[NOTE]
Analogous to the `fence.i` instruction (<<_zifencei_isa_extension>>) the `fence` instruction triggers
-a data cache synchronization operation. See section <<_cache_coherency>> for more information.
-Furthermore, the `fence` instruction word's _predecessor_ and _successor_ bits (used for memory ordering)
-are not evaluated by the hardware at all.
-
+a load/store memory synchronization operation. The CPU will stall until the requested coherence is
+established (`mem_sync_i` goes high). See section <<_memory_coherence>> for more information.
+NEORV32 ignores the predecessor and successor fields and always executes a conservative fence on all
+operations.
.`wfi` Instruction
[NOTE]
@@ -716,16 +713,16 @@ The instruction word's `aq` and `lr` memory ordering bits are not evaluated by t
==== `Zifencei` ISA Extension
The `Zifencei` CPU extension allows manual synchronization of the instruction stream. This extension is always enabled.
-
-Analogous to the `fence` instruction the `fence.i` instruction triggers an instruction cache synchronization operation.
-See section <<_cache_coherency>> for more information.
+The `fence.i` instruction is the only standard mechanism to ensure that stores visible to a hart will also be visible to its
+instruction fetches. The CPU will stall until the requested coherence is established (`mem_sync_i` goes high).
+See section <<_memory_coherence>> for more information.
.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
-| Instruction fence | `fence.i` | 5
+| Instruction fence | `fence.i` | depends on the memory system
|=======================
diff --git a/docs/datasheet/on_chip_debugger.adoc b/docs/datasheet/on_chip_debugger.adoc
index bd2def5e1..05aec8d7c 100644
--- a/docs/datasheet/on_chip_debugger.adoc
+++ b/docs/datasheet/on_chip_debugger.adoc
@@ -667,7 +667,7 @@ Debug-mode is entered on any of the following events:
. A hardware trigger from the <<_trigger_module>> fires (`exe` and `action` in <<_tdata1>> / `mcontrol` are set).
[NOTE]
-From a hardware point of view these debug-mode-entry conditions are special traps (synchronous exceptions or
+From a hardware point of view these debug-mode-entry conditions are normal traps (synchronous exceptions or
asynchronous interrupts) that are handled transparently by the control logic.
**Whenever the CPU enters debug-mode it performs the following operations:**
@@ -684,6 +684,8 @@ asynchronous interrupts) that are handled transparently by the control logic.
**When the CPU is in debug-mode:**
* while in debug mode, the CPU executes the parking loop and - if requested by the DM - the program buffer
+* all **caches are bypassed** when in debug-mode; hence, <<_memory_coherence>> has to be re-established when entering debug-mode
+and when leaving debug-mode
* effective CPU privilege level is `machine` mode; any active physical memory protection (PMP) configuration is bypassed
* the `wfi` instruction acts as a `nop` (also during single-stepping)
* if an exception occurs while being in debug mode:
diff --git a/docs/datasheet/overview.adoc b/docs/datasheet/overview.adoc
index ace607d80..b292055bf 100644
--- a/docs/datasheet/overview.adoc
+++ b/docs/datasheet/overview.adoc
@@ -1,3 +1,4 @@
+<<<
:sectnums:
== Overview
diff --git a/docs/datasheet/rationale.adoc b/docs/datasheet/rationale.adoc
index d98dc790b..2560c1adb 100644
--- a/docs/datasheet/rationale.adoc
+++ b/docs/datasheet/rationale.adoc
@@ -1,3 +1,4 @@
+<<<
:sectnums:
=== Rationale
diff --git a/docs/datasheet/soc.adoc b/docs/datasheet/soc.adoc
index 30808c046..dbe2a0e6a 100644
--- a/docs/datasheet/soc.adoc
+++ b/docs/datasheet/soc.adoc
@@ -1,5 +1,4 @@
-
-// ####################################################################################################################
+<<<
:sectnums:
== NEORV32 Processor (SoC)
@@ -595,7 +594,7 @@ content of the addresses memory cell) is sent back to the requesting CPU.
.Direct Access
[IMPORTANT]
Atomic operations **always bypass** the CPU's <<_processor_internal_data_cache_dcache, data cache>>
-using direct/uncached accesses. Care must be taken to maintain data <<_cache_coherency>>.
+using direct/uncached accesses. Care must be taken to maintain data <<_memory_coherence>>.
.Physical Memory Attributes
[NOTE]
@@ -610,43 +609,50 @@ cannot be interrupted. Hence, they execute in an atomic way.
:sectnums:
-==== Cache Coherency
+==== Memory Coherence
-In total the NEORV32 Processor provides up to three optional caches organized in two levels. Level-1
-caches are closer to the CPU while level-2 caches are closer to main memory (however, this highly depends
-on the the actual cache configurations).
+Depending on the configuration, the NEORV32 processor provides several _layers_ of memory consisting
+of caches, buffers and storage.
+* The CPU instruction prefetch buffer ("level-0")
* The <<_processor_internal_data_cache_dcache>> (level-1)
* The <<_processor_internal_instruction_cache_icache>> (level-1)
* The cache of the <<_processor_external_bus_interface_xbus>> (level-2)
+* Internal and external memories
-As all caches operate transparently for the software, special attention must therefore be paid to coherence.
-Note that coherence and cache _synchronization_ is **not** performed by the hardware itself (there is no
-snooping implemented).
+All caches and buffers operate transparently for the software. Hence, special attention must be paid
+to maintaining coherence. Note that coherence and cache _synchronization_ are **not** automatically performed
+by the hardware itself as there is no snooping implemented.
-The NEORV32 uses two instructions for manual cache synchronization (both instructions are always available
-regardless of the actual CPU/ISA configuration):
+NEORV32 uses two instructions for manual memory synchronization which are always available
+regardless of the actual CPU/ISA configuration:
* `fence` (<<_i_isa_extension>> / <<_e_isa_extension>>)
* `fence.i` (<<_zifencei_isa_extension>>)
-By executing the "data" `fence` instruction the CPU's data cache is synchronized in four steps:
+By executing the "data" `fence` instruction the CPU's load/store operations are ordered
+and synchronized across the entire system:
[start=1]
-. The CPU data cache is flushed: all local modifications are copied to the next higher memory level;
-this can be the XBUS cache or main memory.
-. The CPU data cache is cleared invalidating all local entries.
-. The synchronization request is sent to the next-higher memory level (for example to the XBUS cache
-so it can perform the same synchronization steps).
-. The CPU data cache is reloaded with up-to-date data from the next higher memory level.
+. The CPU data cache (if enabled) is flushed and invalidated: all local modifications are copied to
+the next higher memory level (for example the internal DMEM or the XBUS-cache).
+. The CPU data cache is cleared, invalidating all local entries, so the next load/store access will cause a cache miss
+that will fetch up-to-date data from the memory system.
+. The synchronization request is forwarded to the next-higher memory level. If the XBUS cache is implemented
+it will also be flushed and invalidated.
-By executing the "instruction" `fence.i` instruction the CPU's instruction cache is synchronized in three steps:
+By executing the "instruction" `fence.i` instruction the CPU's instruction fetches are ordered
+and synchronized across the entire system:
[start=1]
-. The synchronization request is sent to the next-higher memory level (for example to the XBUS cache
-so it can perform the same synchronization steps).
-. The CPU instruction cache is cleared invalidating all local entries.
-. The CPU instruction cache is reloaded with up-to-date data from the next higher memory level.
+. Perform all the steps that are performed by the `fence` instruction.
+. The CPU instruction cache is cleared invalidating all local entries so the next instruction fetch access
+will cause a cache miss that will fetch up-to-date data from the memory system.
+
+.CPU Stall While Synchronizing
+[IMPORTANT]
+Executing any fence instruction will stall the CPU until all the requested ordering/synchronization
+steps are completed.
<<<
diff --git a/docs/datasheet/soc_dcache.adoc b/docs/datasheet/soc_dcache.adoc
index 8d76c92bc..163fec950 100644
--- a/docs/datasheet/soc_dcache.adoc
+++ b/docs/datasheet/soc_dcache.adoc
@@ -1,4 +1,5 @@
<<<
+<<<
:sectnums:
==== Processor-Internal Data Cache (dCACHE)
@@ -6,11 +7,11 @@
[grid="none"]
|=======================
| Hardware source files: | neorv32_cache.vhd | Generic cache module
-| Software driver files: | none | _implicitly used_
+| Software driver files: | none |
| Top entity ports: | none |
| Configuration generics: | `DCACHE_EN` | implement processor-internal data cache when `true`
-| | `DCACHE_NUM_BLOCKS` | number of cache blocks (pages/lines)
-| | `DCACHE_BLOCK_SIZE` | size of a cache block in bytes
+| | `DCACHE_NUM_BLOCKS` | number of cache blocks (pages or lines); has to be a power of two
+| | `DCACHE_BLOCK_SIZE` | size of a cache block in bytes; has to be a power of two
| CPU interrupts: | none |
|=======================
@@ -21,24 +22,17 @@ The processor features an optional data cache to improve performance when using
access latency. The cache is connected directly to the CPU's data access interface and provides
full-transparent accesses. The cache is direct-mapped and uses "write-allocate" and "write-back" strategies.
-.Cached/Uncached Accesses
+.Uncached Accesses
[NOTE]
The data cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO (like the
processor-internal IO/peripheral modules). All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
-will not be cached at all (see section <<_address_space>>). Direct/uncached accesses have **lower** priority than
-cache block operations to allow continuous burst transfer and also to maintain logical instruction forward
-progress / data coherency. Furthermore, the atomic memory operations of the <<_zaamo_isa_extension>> will
-always **bypass** the cache.
-
-.Caching Internal Memories
-[NOTE]
-The data cache is intended to accelerate data access to **processor-external** memories.
-The CPU cache(s) should not be implemented when using only processor-internal data and instruction memories.
+will not be cached at all (see section <<_address_space>>). Furthermore, the atomic memory operations
+of the <<_zaamo_isa_extension>> will always **bypass** the cache.
-.Manual Cache Flush/Clear/Reload
+.Manual Cache Flush/Clear/Reload and Memory Coherence
[NOTE]
By executing the `fence` instruction the data cache is flushed, cleared and reloaded.
-See section <<_cache_coherency>> for more information.
+See section <<_memory_coherence>> for more information.
.Retrieve Cache Configuration from Software
[TIP]
@@ -46,8 +40,6 @@ Software can retrieve the cache configuration/layout from the <<_sysinfo_cache_c
.Bus Access Fault Handling
[NOTE]
-The cache always loads a complete cache block (aligned to the block size) every time a
-cache miss is detected. Each cached word from this block provides a single status bit that indicates if the
-according bus access was successful or caused a bus error. Hence, the whole cache block remains valid even
-if certain addresses inside caused a bus error. If the CPU accesses any of the faulty cache words, a
-data bus error exception is raised.
+If the cache encounters a bus error when uploading a modified block to the next memory level or when
+downloading a new block from the next memory level, the entire block is invalidated and a bus access
+error exception is raised.
diff --git a/docs/datasheet/soc_icache.adoc b/docs/datasheet/soc_icache.adoc
index 8f77eb8e3..765d10e01 100644
--- a/docs/datasheet/soc_icache.adoc
+++ b/docs/datasheet/soc_icache.adoc
@@ -1,4 +1,5 @@
<<<
+<<<
:sectnums:
==== Processor-Internal Instruction Cache (iCACHE)
@@ -6,11 +7,11 @@
[grid="none"]
|=======================
| Hardware source files: | neorv32_cache.vhd | Generic cache module
-| Software driver files: | none | _implicitly used_
+| Software driver files: | none |
| Top entity ports: | none |
| Configuration generics: | `ICACHE_EN` | implement processor-internal instruction cache when `true`
-| | `ICACHE_NUM_BLOCKS` | number of cache blocks (pages/lines)
-| | `ICACHE_BLOCK_SIZE` | size of a cache block in bytes
+| | `ICACHE_NUM_BLOCKS` | number of cache blocks (pages or lines); has to be a power of two
+| | `ICACHE_BLOCK_SIZE` | size of a cache block in bytes; has to be a power of two
| CPU interrupts: | none |
|=======================
@@ -21,24 +22,17 @@ The processor features an optional instruction cache to improve performance when
access latency. The cache is connected directly to the CPU's instruction fetch interface and provides
full-transparent accesses. The cache is direct-mapped and read-only.
-.Cached/Uncached Accesses
+.Uncached Accesses
[NOTE]
The data cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO (like the
processor-internal IO/peripheral modules). All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
-will not be cached at all (see section <<_address_space>>). Direct/uncached accesses have **lower** priority than
-cache block operations to allow continuous burst transfer and also to maintain logical instruction forward
-progress / data coherency. Furthermore, the atomic memory operations of the <<_zaamo_isa_extension>> will
-always **bypass** the cache.
-
-.Caching Internal Memories
-[NOTE]
-The data cache is intended to accelerate data access to **processor-external** memories.
-The CPU cache(s) should not be implemented when using only processor-internal data and instruction memories.
+will not be cached at all (see section <<_address_space>>). Furthermore, the atomic memory operations
+of the <<_zaamo_isa_extension>> will always **bypass** the cache.
-.Manual Cache Clear/Reload
+.Manual Cache Flush/Clear/Reload and Memory Coherence
[NOTE]
By executing the `fence.i` instruction the instruction cache is cleared and reloaded.
-See section <<_cache_coherency>> for more information.
+See section <<_memory_coherence>> for more information.
.Retrieve Cache Configuration from Software
[TIP]
@@ -46,8 +40,6 @@ Software can retrieve the cache configuration/layout from the <<_sysinfo_cache_c
.Bus Access Fault Handling
[NOTE]
-The cache always loads a complete cache block (aligned to the block size) every time a
-cache miss is detected. Each cached word from this block provides a single status bit that indicates if the
-according bus access was successful or caused a bus error. Hence, the whole cache block remains valid even
-if certain addresses inside caused a bus error. If the CPU accesses any of the faulty cache words, an
-instruction bus error exception is raised.
+If the cache encounters a bus error when uploading a modified block to the next memory level or when
+downloading a new block from the next memory level, the entire block is invalidated and a bus access
+error exception is raised.
diff --git a/docs/datasheet/soc_wdt.adoc b/docs/datasheet/soc_wdt.adoc
index 5337c16af..009c4ba60 100644
--- a/docs/datasheet/soc_wdt.adoc
+++ b/docs/datasheet/soc_wdt.adoc
@@ -33,17 +33,9 @@ hardware reset is triggered.
The watchdog's timeout counter is reset ("feeding the watchdog") by writing the reset **PASSWORD** to the `RESET` register.
The password is hardwired to hexadecimal `0x709D1AB3`.
-.Watchdog Operation during Debugging
[IMPORTANT]
-By default, the watchdog stops operation when the CPU enters debug mode and will resume normal operation after
-the CPU has left debug mode again. This will prevent an unintended watchdog timeout during a debug session. However,
-the watchdog can also be configured to keep operating even when the CPU is in debug mode by setting the control
-register's `WDT_CTRL_DBEN` bit.
-
-.Watchdog Operation during CPU Sleep
-[IMPORTANT]
-By default, the watchdog stops operating when the CPU enters sleep mode. However, the watchdog can also be configured
-to keep operating even when the CPU is in sleep mode by setting the control register's `WDT_CTRL_SEN` bit.
+Once enabled, the watchdog keeps operating even if the CPU is in <<_sleep_mode>> or if the processor is being
+debugged via the <<_on_chip_debugger_ocd>>.
**Configuration Lock**
@@ -91,12 +83,10 @@ processor's main reset signal is active (even if the watchdog is deactivated or
[options="header",grid="all"]
|=======================
| Address | Name [C] | Bit(s), Name [C] | R/W | Reset value | Writable if locked | Function
-.8+<| `0xfffb0000` .8+<| `CTRL` <|`0` `WDT_CTRL_EN` ^| r/w ^| `0` ^| no <| watchdog enable
+.6+<| `0xfffb0000` .6+<| `CTRL` <|`0` `WDT_CTRL_EN` ^| r/w ^| `0` ^| no <| watchdog enable
<|`1` `WDT_CTRL_LOCK` ^| r/w ^| `0` ^| no <| lock configuration when set, clears only on system reset, can only be set if enable bit is set already
- <|`2` `WDT_CTRL_DBEN` ^| r/w ^| `0` ^| no <| set to allow WDT to continue operation even when CPU is in debug mode
- <|`3` `WDT_CTRL_SEN` ^| r/w ^| `0` ^| no <| set to allow WDT to continue operation even when CPU is in sleep mode
- <|`4` `WDT_CTRL_STRICT` ^| r/w ^| `0` ^| no <| set to enable strict mode (force hardware reset if reset password is incorrect or if write access to locked CTRL register)
- <|`6:5` `WDT_CTRL_RCAUSE_HI : WDT_CTRL_RCAUSE_LO` ^| r/- ^| `0` ^| - <| cause of last system reset; 0=external reset, 1=ocd-reset, 2=watchdog reset
+ <|`2` `WDT_CTRL_STRICT` ^| r/w ^| `0` ^| no <| set to enable strict mode (force hardware reset if reset password is incorrect or if write access to locked CTRL register)
+ <|`4:3` `WDT_CTRL_RCAUSE_HI : WDT_CTRL_RCAUSE_LO` ^| r/- ^| `0` ^| - <| cause of last system reset; 0=external reset, 1=ocd-reset, 2=watchdog reset
<|`7` - ^| r/- ^| - ^| - <| _reserved_, reads as zero
<|`31:8` `WDT_CTRL_TIMEOUT_MSB : WDT_CTRL_TIMEOUT_LSB` ^| r/w ^| 0 ^| no <| 24-bit watchdog timeout value
| `0xfffb0004` | `RESET` |`31:0` | -/w | - | yes | Write _PASSWORD_ to reset WDT timeout counter
diff --git a/docs/datasheet/soc_xbus.adoc b/docs/datasheet/soc_xbus.adoc
index 52ce6ee9d..05d1457dd 100644
--- a/docs/datasheet/soc_xbus.adoc
+++ b/docs/datasheet/soc_xbus.adoc
@@ -7,30 +7,30 @@
|=======================
| Hardware source files: | neorv32_xbus.vhd | External bus gateway
| | neorv32_cache.vhd | Generic cache module
-| Software driver files: | none | _implicitly used_
+| Software driver files: | none |
| Top entity ports: | `xbus_adr_o` | address output (32-bit)
+| | `xbus_dat_i` | data input (32-bit)
| | `xbus_dat_o` | data output (32-bit)
| | `xbus_tag_o` | access tag (3-bit)
| | `xbus_we_o` | write enable (1-bit)
| | `xbus_sel_o` | byte enable (4-bit)
| | `xbus_stb_o` | bus strobe (1-bit)
| | `xbus_cyc_o` | valid cycle (1-bit)
-| | `xbus_dat_i` | data input (32-bit)
| | `xbus_ack_i` | acknowledge (1-bit)
| | `xbus_err_i` | bus error (1-bit)
| Configuration generics: | `XBUS_EN` | enable external bus interface when `true`
| | `XBUS_TIMEOUT` | number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled)
| | `XBUS_REGSTAGE_EN` | implement XBUS register stages
-| | `XBUS_CACHE_EN` | implement the external bus cache
-| | `XBUS_CACHE_NUM_BLOCKS` | number of blocks ("lines"), has to be a power of two.
-| | `XBUS_CACHE_BLOCK_SIZE` | size in bytes of each block, has to be a power of two.
+| | `XBUS_CACHE_EN` | implement the external bus cache when `true`
+| | `XBUS_CACHE_NUM_BLOCKS` | number of cache blocks (pages or lines); has to be a power of two
+| | `XBUS_CACHE_BLOCK_SIZE` | size of a cache block in bytes; has to be a power of two
| CPU interrupts: | none |
|=======================
**Overview**
-The external bus interface provides a **Wishbone b4**-compatible on-chip bus interface that is
+The external bus interface provides a **Wishbone b4**-compatible on-chip bus interface that gets
implemented if the `XBUS_EN` generic is `true`. This bus interface can be used to attach processor-external
modules like memories, custom hardware accelerators or additional peripheral devices.
An optional cache module ("XCACHE") can be enabled to improve memory access latency.
@@ -76,12 +76,8 @@ device's / bus system's `cyc` and `stb` signals (omitting the processor's `xbus_
.Atomic Memory Accesses
[NOTE]
-<<_Atomic_Memory_Access>> keep the `cyc` signal active to perform a back-to-back bus access consisting of
-two `stb` strobes (one for the load/read operation and another one for the store/write operation).
-
-.Endianness
-[NOTE]
-Just like the processor itself the XBUS interface uses **little-endian** byte order.
+<<_atomic_memory_access>> operations keep the `cyc` signal active to perform a back-to-back bus access
+consisting of two `stb` strobes (one for the load/read operation and another one for the store/write operation).
.Wishbone Specs.
[TIP]
@@ -123,36 +119,28 @@ It compatible to the the AXI4 `ARPROT` and `AWPROT` signals.
The XBUS interface provides an optional internal cache that can be used to buffer processor-external accesses.
The x-cache is enabled via the `XBUS_CACHE_EN` generic. The total size of the cache is split into the number of
cache lines or cache blocks (`XBUS_CACHE_NUM_BLOCKS` generic) and the line or block size in bytes
-(`XBUS_CACHE_BLOCK_SIZE` generic).
-
-.Simplified X-Cache Architecture
-[source,asciiart]
----------------------------------------
- Direct Access +----------+
- /|------------------------->| Register |------------------------>|\
- | | +----------+ | |
-Core --->| | | |---> XBUS
- | | +--------------+ +--------------+ +-------------+ | |
- \|--->| Host Arbiter |--->| Cache Memory |<---| Bus Arbiter |--->|/
- +--------------+ +--------------+ +-------------+
----------------------------------------
-
-The cache uses a direct-mapped architecture that implements "write-allocate" and "write-back" strategies.
-The **write-allocate** strategy will fetch the entire referenced block from main memory when encountering
-a cache write-miss. The **write-back** strategy will gather all writes locally inside the cache until the according
-cache block is about to be replaced. In this case, the entire modified cache block is written back to main memory.
-
-.Manual Cache Flush/Clear/Reload
+(`XBUS_CACHE_BLOCK_SIZE` generic). The cache uses a direct-mapped architecture that implements "write-allocate"
+and "write-back" strategies.
+
+.Uncached Accesses
+[NOTE]
+The XBUS cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO.
+All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
+will not be cached at all (see section <<_address_space>>). Furthermore, the atomic memory operations
+of the <<_zaamo_isa_extension>> will always **bypass** the cache.
+
+.Manual Cache Flush/Clear/Reload and Memory Coherence
[NOTE]
By executing a `fence` **or** `fence.i` instruction the XBUS cache is flushed (local modifications are send back to
main memory), cleared (all cache entries are invalidated) and a reloaded (fetching new data from main memory).
-See section <<_cache_coherency>> for more information.
+See section <<_memory_coherence>> for more information.
+
+.Retrieve Cache Configuration from Software
+[TIP]
+Software can retrieve the cache configuration/layout from the <<_sysinfo_cache_configuration>> register.
-.Cached/Uncached Accesses
+.Bus Access Fault Handling
[NOTE]
-The data cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO.
-All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
-will not be cached at all (see section <<_address_space>>). Direct/uncached accesses have **lower** priority than
-cache block operations to allow continuous burst transfer and also to maintain logical instruction forward
-progress / data coherency. Furthermore, the atomic memory operations of the <<_zaamo_isa_extension>> will
-always **bypass** the cache.
+If the cache encounters a bus error when uploading a modified block to the next memory level or when
+downloading a new block from the next memory level, the entire block is invalidated and a bus access
+error exception is raised.
diff --git a/docs/datasheet/software.adoc b/docs/datasheet/software.adoc
index 47c4e9485..cd4af117b 100644
--- a/docs/datasheet/software.adoc
+++ b/docs/datasheet/software.adoc
@@ -1,3 +1,4 @@
+<<<
:sectnums:
== Software Framework
diff --git a/docs/datasheet/software_bootloader.adoc b/docs/datasheet/software_bootloader.adoc
index b20ec6708..db65d39db 100644
--- a/docs/datasheet/software_bootloader.adoc
+++ b/docs/datasheet/software_bootloader.adoc
@@ -1,3 +1,4 @@
+<<<
:sectnums:
=== Bootloader
diff --git a/docs/datasheet/software_rte.adoc b/docs/datasheet/software_rte.adoc
index 6b32fa6ee..d379e98ab 100644
--- a/docs/datasheet/software_rte.adoc
+++ b/docs/datasheet/software_rte.adoc
@@ -1,3 +1,4 @@
+<<<
:sectnums:
=== NEORV32 Runtime Environment
diff --git a/docs/figures/bus_interface.png b/docs/figures/bus_interface.png
index 13a8b03bd..131ee5f01 100644
Binary files a/docs/figures/bus_interface.png and b/docs/figures/bus_interface.png differ
diff --git a/docs/sources/bus_interface.json b/docs/sources/bus_interface.json
index a1ca92605..3fcc0244a 100644
--- a/docs/sources/bus_interface.json
+++ b/docs/sources/bus_interface.json
@@ -6,9 +6,12 @@
{name: 'data', wave: 'x..|..4.x..|..', data: ['wdata']},
{name: 'ben', wave: 'x..|..4.x..|..', data: ['ben']},
{name: 'stb', wave: '010|..10.10|..', node: '.a....d..f....'},
- {name: 'rw', wave: '0..|..1..0.|..', node: '..............'},
- {name: 'src', wave: 'x0.|.x0.x..|..'},
- {name: 'priv', wave: 'x0.|.x0.x..|..'},
+ {name: 'rw', wave: 'x0.|.x1.x0.|..', node: '..............'},
+ {name: 'src', wave: 'x0.|.x0.x0.|.x'},
+ {name: 'priv', wave: 'x1.|.x1.x1.|.x'},
+ {name: 'debug', wave: 'x0.|.x0.x0.|.x'},
+ {name: 'amo', wave: 'x0.|.x0.x0.|.x'},
+ {name: 'amoop', wave: 'x0.|.x0.x0.|.x'},
],
{},
[
diff --git a/rtl/core/neorv32_bus.vhd b/rtl/core/neorv32_bus.vhd
index 29eb7cc5f..a9f54b28f 100644
--- a/rtl/core/neorv32_bus.vhd
+++ b/rtl/core/neorv32_bus.vhd
@@ -21,15 +21,14 @@ entity neorv32_bus_switch is
PORT_B_READ_ONLY : boolean := false -- set if port B is read-only
);
port (
- clk_i : in std_ulogic; -- global clock, rising edge
- rstn_i : in std_ulogic; -- global reset, low-active, async
- a_lock_i : in std_ulogic; -- exclusive access for port A while set
- a_req_i : in bus_req_t; -- host port A request bus
- a_rsp_o : out bus_rsp_t; -- host port A response bus
- b_req_i : in bus_req_t; -- host port B request bus
- b_rsp_o : out bus_rsp_t; -- host port B response bus
- x_req_o : out bus_req_t; -- device port request bus
- x_rsp_i : in bus_rsp_t -- device port response bus
+ clk_i : in std_ulogic; -- global clock, rising edge
+ rstn_i : in std_ulogic; -- global reset, low-active, async
+ a_req_i : in bus_req_t; -- host port A request bus
+ a_rsp_o : out bus_rsp_t; -- host port A response bus
+ b_req_i : in bus_req_t; -- host port B request bus
+ b_rsp_o : out bus_rsp_t; -- host port B response bus
+ x_req_o : out bus_req_t; -- device port request bus
+ x_rsp_i : in bus_rsp_t -- device port response bus
);
end neorv32_bus_switch;
@@ -71,7 +70,7 @@ begin
-- -------------------------------------------------------------------------------------------
arbiter_prioritized:
if not ROUND_ROBIN_EN generate
- arbiter_fsm: process(state, a_req, b_req, a_lock_i, a_req_i, b_req_i, x_rsp_i)
+ arbiter_fsm: process(state, a_req, b_req, a_req_i, b_req_i, x_rsp_i)
begin
-- defaults --
state_nxt <= state;
@@ -101,7 +100,7 @@ begin
sel <= '0';
stb <= '1';
state_nxt <= S_BUSY_A;
- elsif ((b_req_i.stb = '1') or (b_req = '1')) and (a_lock_i = '0') then -- request from port B?
+ elsif (b_req_i.stb = '1') or (b_req = '1') then -- request from port B?
sel <= '1';
stb <= '1';
state_nxt <= S_BUSY_B;
@@ -175,11 +174,10 @@ begin
x_req_o.amo <= a_req_i.amo when (sel = '0') else b_req_i.amo;
x_req_o.amoop <= a_req_i.amoop when (sel = '0') else b_req_i.amoop;
x_req_o.priv <= a_req_i.priv when (sel = '0') else b_req_i.priv;
+ x_req_o.debug <= a_req_i.debug when (sel = '0') else b_req_i.debug;
x_req_o.src <= a_req_i.src when (sel = '0') else b_req_i.src;
x_req_o.rw <= a_req_i.rw when (sel = '0') else b_req_i.rw;
- x_req_o.fence <= a_req_i.fence or b_req_i.fence; -- propagate any fence request
- x_req_o.sleep <= a_req_i.sleep and b_req_i.sleep; -- set if ALL upstream devices are in sleep mode
- x_req_o.debug <= a_req_i.debug when (sel = '0') else b_req_i.debug;
+ x_req_o.fence <= a_req_i.fence or b_req_i.fence;
x_req_o.data <= b_req_i.data when PORT_A_READ_ONLY else
a_req_i.data when PORT_B_READ_ONLY else
@@ -855,11 +853,10 @@ begin
sys_req_o.rw <= '1' when (arbiter.state = S_WRITE) or (arbiter.state = S_WRITE_WAIT) else core_req_i.rw;
sys_req_o.src <= core_req_i.src;
sys_req_o.priv <= core_req_i.priv;
+ sys_req_o.debug <= core_req_i.debug;
sys_req_o.amo <= core_req_i.amo; -- set during the entire read-modify-write operation
sys_req_o.amoop <= (others => '0'); -- the specific AMO type should not matter after this point
sys_req_o.fence <= core_req_i.fence;
- sys_req_o.sleep <= core_req_i.sleep;
- sys_req_o.debug <= core_req_i.debug;
-- response switch --
core_rsp_o.data <= sys_rsp_i.data when (arbiter.state = S_IDLE) else arbiter.rdata;
diff --git a/rtl/core/neorv32_cache.vhd b/rtl/core/neorv32_cache.vhd
index 9d61dfa67..478949441 100644
--- a/rtl/core/neorv32_cache.vhd
+++ b/rtl/core/neorv32_cache.vhd
@@ -4,20 +4,11 @@
-- Configurable generic cache module. The cache is direct-mapped and implements --
-- "write-back" and "write-allocate" strategies. --
-- --
--- All requests targeting the "uncached address space page" (or higher), defined by --
--- the 4 most significant address bits, well as all atomic (reservation set) --
--- operations will always **bypass** the cache resulting in "direct accesses". --
--- --
--- Simplified cache architecture ("-->" = direction of access requests): --
--- --
--- Direct Access +----------+ --
--- /|----------------------->| Register |---------------------->|\ --
--- | | +----------+ | | --
--- Host -->| | | |--> Bus --
--- | | +--------------+ +--------------+ +-------------+ | | --
--- \|-->| Host Arbiter |-->| Cache Memory |<--| Bus Arbiter |-->|/ --
--- +--------------+ +--------------+ +-------------+ --
--- --
+-- Uncached / direct accesses: Several bus transaction types will bypass the cache: --
+-- * atomic memory operations --
+-- * accesses within debug-mode (on-chip debugger) --
+-- * accesses to the explicit "uncached address space page" (or higher); defined by --
+-- the 4 most significant address bits (UC_BEGIN) --
-- -------------------------------------------------------------------------------- --
-- The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 --
-- Copyright (c) NEORV32 contributors. --
@@ -38,12 +29,12 @@ entity neorv32_cache is
NUM_BLOCKS : natural range 2 to 1024; -- number of cache blocks (min 2), has to be a power of 2
BLOCK_SIZE : natural range 4 to 32768; -- cache block size in bytes (min 4), has to be a power of 2
UC_BEGIN : std_ulogic_vector(3 downto 0); -- begin of uncached address space (page number / 4 MSBs of address)
- UC_ENABLE : boolean; -- enable direct/uncached accesses
READ_ONLY : boolean -- read-only accesses for host
);
port (
clk_i : in std_ulogic; -- global clock, rising edge
rstn_i : in std_ulogic; -- global reset, low-active, async
+ clean_o : out std_ulogic; -- cache is clean
host_req_i : in bus_req_t; -- host request
host_rsp_o : out bus_rsp_t; -- host response
bus_req_o : out bus_req_t; -- bus request
@@ -53,30 +44,14 @@ end neorv32_cache;
architecture neorv32_cache_rtl of neorv32_cache is
- -- host access arbiter (handle CPU accesses to cache) --
- component neorv32_cache_host
- generic (
- READ_ONLY : boolean
- );
- port (
- rstn_i : in std_ulogic;
- clk_i : in std_ulogic;
- req_i : in bus_req_t;
- rsp_o : out bus_rsp_t;
- bus_sync_o : out std_ulogic;
- bus_miss_o : out std_ulogic;
- bus_busy_i : in std_ulogic;
- dirty_o : out std_ulogic;
- hit_i : in std_ulogic;
- addr_o : out std_ulogic_vector(31 downto 0);
- we_o : out std_ulogic_vector(3 downto 0);
- swe_o : out std_ulogic;
- wdata_o : out std_ulogic_vector(31 downto 0);
- wstat_o : out std_ulogic;
- rdata_i : in std_ulogic_vector(31 downto 0);
- rstat_i : in std_ulogic
- );
- end component;
+ -- make sure cache sizes are a power of two --
+ constant block_num_c : natural := 2**index_size_f(NUM_BLOCKS);
+ constant block_size_c : natural := 2**index_size_f(BLOCK_SIZE);
+
+ -- cache layout --
+ constant offset_size_c : natural := index_size_f(block_size_c/4); -- WORD offset!
+ constant index_size_c : natural := index_size_f(block_num_c);
+ constant tag_size_c : natural := 32 - (offset_size_c + index_size_c + 2);
-- cache memory core (cache memory and management) --
component neorv32_cache_memory
@@ -86,337 +61,66 @@ architecture neorv32_cache_rtl of neorv32_cache is
READ_ONLY : boolean
);
port (
- rstn_i : in std_ulogic;
- clk_i : in std_ulogic;
- inval_i : in std_ulogic;
- new_i : in std_ulogic;
- dirty_i : in std_ulogic;
- hit_o : out std_ulogic;
- dirty_o : out std_ulogic;
- base_o : out std_ulogic_vector(31 downto 0);
- addr_i : in std_ulogic_vector(31 downto 0);
- we_i : in std_ulogic_vector(3 downto 0);
- swe_i : in std_ulogic;
- wdata_i : in std_ulogic_vector(31 downto 0);
- wstat_i : in std_ulogic;
- rdata_o : out std_ulogic_vector(31 downto 0);
- rstat_o : out std_ulogic
+ rstn_i : in std_ulogic;
+ clk_i : in std_ulogic;
+ inval_i : in std_ulogic;
+ new_i : in std_ulogic;
+ dirty_i : in std_ulogic;
+ hit_o : out std_ulogic;
+ dirty_o : out std_ulogic;
+ tag_o : out std_ulogic_vector(31 downto 0);
+ clean_o : out std_ulogic;
+ addr_i : in std_ulogic_vector(31 downto 0);
+ we_i : in std_ulogic_vector(3 downto 0);
+ wdata_i : in std_ulogic_vector(31 downto 0);
+ rdata_o : out std_ulogic_vector(31 downto 0)
);
end component;
- -- bus access arbiter (handle cache misses) --
- component neorv32_cache_bus
- generic (
- NUM_BLOCKS : natural;
- BLOCK_SIZE : natural;
- READ_ONLY : boolean
- );
- port (
- rstn_i : in std_ulogic;
- clk_i : in std_ulogic;
- host_req_i : in bus_req_t;
- bus_req_o : out bus_req_t;
- bus_rsp_i : in bus_rsp_t;
- cmd_sync_i : in std_ulogic;
- cmd_miss_i : in std_ulogic;
- cmd_busy_o : out std_ulogic;
- inval_o : out std_ulogic;
- new_o : out std_ulogic;
- dirty_i : in std_ulogic;
- base_i : in std_ulogic_vector(31 downto 0);
- addr_o : out std_ulogic_vector(31 downto 0);
- we_o : out std_ulogic_vector(3 downto 0);
- swe_o : out std_ulogic;
- wdata_o : out std_ulogic_vector(31 downto 0);
- wstat_o : out std_ulogic;
- rdata_i : in std_ulogic_vector(31 downto 0)
- );
- end component;
-
- -- make sure cache sizes are a power of two --
- constant block_num_c : natural := 2**index_size_f(NUM_BLOCKS);
- constant block_size_c : natural := 2**index_size_f(BLOCK_SIZE);
-
- -- bus de-mux control for direct/uncached or caches access --
- signal dir_acc_d, dir_acc_q : std_ulogic;
-
- -- internal bus system --
- signal bus_req, dir_req_d, dir_req_q, cache_req : bus_req_t;
- signal bus_rsp, dir_rsp_d, dir_rsp_q, cache_rsp : bus_rsp_t;
-
- -- cache memory module interface --
- type cache_in_t is record
- addr : std_ulogic_vector(31 downto 0);
- we : std_ulogic_vector(3 downto 0);
- swe : std_ulogic;
- wdata : std_ulogic_vector(31 downto 0);
- wstat : std_ulogic;
+ -- control -> cache interface --
+ type cache_o_t is record
+ cmd_inv : std_ulogic;
+ cmd_new : std_ulogic;
+ cmd_dir : std_ulogic;
+ addr : std_ulogic_vector(31 downto 0);
+ data : std_ulogic_vector(31 downto 0);
+ we : std_ulogic_vector(3 downto 0);
end record;
- signal cache_in_host, cache_in_bus, cache_in : cache_in_t;
- --
- type cache_out_t is record
- rdata : std_ulogic_vector(31 downto 0);
- rstat : std_ulogic;
+ signal cache_o : cache_o_t;
+
+ -- cache -> control interface --
+ type cache_i_t is record
+ sta_hit : std_ulogic;
+ sta_dir : std_ulogic;
+ sta_cln : std_ulogic;
+ sta_tag : std_ulogic_vector(31 downto 0);
+ data : std_ulogic_vector(31 downto 0);
end record;
- signal cache_out : cache_out_t;
-
- -- cache status --
- signal cache_stat_dirty, cache_stat_hit : std_ulogic;
- signal cache_stat_base : std_ulogic_vector(31 downto 0);
-
- -- operation commands --
- signal cache_cmd_inval, cache_cmd_new, cache_cmd_dirty, bus_cmd_sync, bus_cmd_miss, bus_cmd_busy : std_ulogic;
-
-begin
-
- -- Check if Direct/Uncached Access --------------------------------------------------------
- -- -------------------------------------------------------------------------------------------
- dir_acc_d <= '1' when UC_ENABLE and -- direct accesses implemented
- ((unsigned(host_req_i.addr(31 downto 28)) >= unsigned(UC_BEGIN)) or -- uncached memory page
- (host_req_i.amo = '1')) else '0'; -- atomic memory operation
-
- -- request splitter: cached or direct access --
- req_splitter: process(host_req_i, dir_acc_d)
- begin
- -- default: pass-through all bus signals --
- cache_req <= host_req_i;
- dir_req_d <= host_req_i;
- -- direct access --
- dir_req_d.stb <= host_req_i.stb and dir_acc_d;
- dir_req_d.fence <= '0'; -- no fence requests from this side
- -- cached access --
- cache_req.stb <= host_req_i.stb and (not dir_acc_d);
- end process req_splitter;
-
- -- direct/uncached access path pipeline stage --
- direct_acc_enable:
- if UC_ENABLE generate
- bus_buffer: process(rstn_i, clk_i)
- begin
- if (rstn_i = '0') then
- dir_acc_q <= '0';
- dir_req_q <= req_terminate_c;
- dir_rsp_q <= rsp_terminate_c;
- elsif rising_edge(clk_i) then
- dir_acc_q <= dir_acc_d;
- if READ_ONLY then -- do not propagate STB on write access, issue ERR instead
- dir_req_q <= dir_req_d;
- dir_req_q.stb <= dir_req_d.stb and (not dir_req_d.rw); -- read accesses only
- dir_rsp_q <= dir_rsp_d;
- dir_rsp_q.err <= dir_rsp_d.err or (dir_req_d.stb and dir_req_d.rw); -- error on write access
- else
- dir_req_q <= dir_req_d;
- dir_rsp_q <= dir_rsp_d;
- end if;
- end if;
- end process bus_buffer;
-
- -- internal response switch --
- host_rsp_o <= cache_rsp when (dir_acc_q = '0') else dir_rsp_q;
- end generate;
-
- -- direct accesses not implemented --
- direct_acc_disable:
- if not UC_ENABLE generate
- dir_req_q <= req_terminate_c;
- host_rsp_o <= cache_rsp;
- end generate;
-
-
- -- Host Access Arbiter (Handle *Cached* CPU Bus Requests) ---------------------------------
- -- -------------------------------------------------------------------------------------------
- neorv32_cache_host_inst: neorv32_cache_host
- generic map (
- READ_ONLY => READ_ONLY -- host accesses are read-only
- )
- port map (
- -- global control --
- rstn_i => rstn_i, -- global reset, async, low-active
- clk_i => clk_i, -- global clock, rising edge
- -- host access port --
- req_i => cache_req, -- request
- rsp_o => cache_rsp, -- response
- -- bus unit interface --
- bus_sync_o => bus_cmd_sync, -- sync cache and main memory
- bus_miss_o => bus_cmd_miss, -- cache miss
- bus_busy_i => bus_cmd_busy, -- bus operation in progress
- -- cache status interface --
- dirty_o => cache_cmd_dirty, -- make accessed block dirty
- hit_i => cache_stat_hit, -- cache hit
- -- cache data interface --
- addr_o => cache_in_host.addr, -- access address
- we_o => cache_in_host.we, -- byte-wide data write enable
- swe_o => cache_in_host.swe, -- status write enable
- wdata_o => cache_in_host.wdata, -- write data
- wstat_o => cache_in_host.wstat, -- write status
- rdata_i => cache_out.rdata, -- read data
- rstat_i => cache_out.rstat -- read status
- );
-
-
- -- Cache Memory Core (Cache Storage and Status Management) --------------------------------
- -- -------------------------------------------------------------------------------------------
- neorv32_cache_memory_inst: neorv32_cache_memory
- generic map (
- NUM_BLOCKS => block_num_c, -- number of blocks (min 2), has to be a power of 2
- BLOCK_SIZE => block_size_c, -- block size in bytes (min 4), has to be a power of 2
- READ_ONLY => READ_ONLY -- cache is read-only (for host)
- )
- port map (
- -- global control --
- rstn_i => rstn_i, -- global reset, async, low-active
- clk_i => clk_i, -- global clock, rising edge
- -- management --
- inval_i => cache_cmd_inval, -- make accessed block invalid
- new_i => cache_cmd_new, -- make accessed block valid, clean and set tag
- dirty_i => cache_cmd_dirty, -- make accessed block dirty
- -- status --
- hit_o => cache_stat_hit, -- cache hit
- dirty_o => cache_stat_dirty, -- accessed block is dirty
- base_o => cache_stat_base, -- base address of current block
- -- cache access --
- addr_i => cache_in.addr, -- access address
- we_i => cache_in.we, -- byte-wide data write enable
- swe_i => cache_in.swe, -- status write enable
- wdata_i => cache_in.wdata, -- write data
- wstat_i => cache_in.wstat, -- write status
- rdata_o => cache_out.rdata, -- read data
- rstat_o => cache_out.rstat -- read status
- );
-
- -- cache access switch --
- cache_in <= cache_in_host when (bus_cmd_busy = '0') else cache_in_bus;
-
-
- -- Bus Access Arbiter (Handle Cache Miss and Flush/Reload) --------------------------------
- -- -------------------------------------------------------------------------------------------
- neorv32_cache_bus_inst: neorv32_cache_bus
- generic map (
- NUM_BLOCKS => block_num_c, -- number of blocks (min 2), has to be a power of 2
- BLOCK_SIZE => block_size_c, -- block size in bytes (min 4), has to be a power of 2
- READ_ONLY => READ_ONLY -- read-only bus accesses
- )
- port map (
- -- global control --
- rstn_i => rstn_i, -- global reset, async, low-active
- clk_i => clk_i, -- global clock, rising edge
- -- host access port --
- host_req_i => host_req_i, -- request
- -- bus access port --
- bus_req_o => bus_req, -- request
- bus_rsp_i => bus_rsp, -- response
- -- operation interface --
- cmd_sync_i => bus_cmd_sync, -- sync cache and main memory
- cmd_miss_i => bus_cmd_miss, -- cache miss
- cmd_busy_o => bus_cmd_busy, -- bus operation in progress
- -- cache status interface --
- inval_o => cache_cmd_inval, -- invalidate accessed block
- new_o => cache_cmd_new, -- set new cache entry
- dirty_i => cache_stat_dirty, -- accessed block is dirty
- base_i => cache_stat_base, -- base address of accessed block
- -- cache data interface --
- addr_o => cache_in_bus.addr, -- access address
- we_o => cache_in_bus.we, -- byte-wide data write enable
- swe_o => cache_in_bus.swe, -- status write enable
- wdata_o => cache_in_bus.wdata, -- write data
- wstat_o => cache_in_bus.wstat, -- write status
- rdata_i => cache_out.rdata -- read data
- );
-
-
- -- Bus Access Switch ----------------------------------------------------------------------
- -- -------------------------------------------------------------------------------------------
- bus_switch_enable:
- if UC_ENABLE generate
- -- Use a real switch here to buffer direct access requests during
- -- out-of-band cache operations (downstream memory synchronization).
- neorv32_cache_bus_switch: entity neorv32.neorv32_bus_switch
- generic map (
- PORT_A_READ_ONLY => READ_ONLY,
- PORT_B_READ_ONLY => READ_ONLY
- )
- port map (
- clk_i => clk_i,
- rstn_i => rstn_i,
- a_lock_i => bus_cmd_busy, -- cache accesses have exclusive access
- a_req_i => bus_req,
- a_rsp_o => bus_rsp,
- b_req_i => dir_req_q,
- b_rsp_o => dir_rsp_d,
- x_req_o => bus_req_o,
- x_rsp_i => bus_rsp_i
- );
- end generate;
-
- bus_switch_disable:
- if not UC_ENABLE generate
- bus_req_o <= bus_req;
- bus_rsp <= bus_rsp_i;
- end generate;
-
+ signal cache_i : cache_i_t;
-end neorv32_cache_rtl;
-
-
--- ================================================================================ --
--- NEORV32 CPU - Generic Cache: Host Access Controller --
--- -------------------------------------------------------------------------------- --
--- Handle host accesses to the cache (check for hit/miss) or bypass cache if --
--- direct/uncached access. If a cache miss occurs or a fence request is received an --
--- according command is sent to the bus interface unit. --
--- -------------------------------------------------------------------------------- --
--- The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 --
--- Copyright (c) NEORV32 contributors. --
--- Copyright (c) 2020 - 2025 Stephan Nolting. All rights reserved. --
--- Licensed under the BSD-3-Clause license, see LICENSE for details. --
--- SPDX-License-Identifier: BSD-3-Clause --
--- ================================================================================ --
-
-library ieee;
-use ieee.std_logic_1164.all;
-
-library neorv32;
-use neorv32.neorv32_package.all;
-
-entity neorv32_cache_host is
- generic (
- READ_ONLY : boolean -- host accesses are read-only
- );
- port (
- -- global control --
- rstn_i : in std_ulogic; -- global reset, async, low-active
- clk_i : in std_ulogic; -- global clock, rising edge
- -- host access port --
- req_i : in bus_req_t; -- request
- rsp_o : out bus_rsp_t; -- response
- -- bus unit interface --
- bus_sync_o : out std_ulogic; -- sync cache and main memory
- bus_miss_o : out std_ulogic; -- cache miss
- bus_busy_i : in std_ulogic; -- bus operation in progress
- -- cache status interface --
- dirty_o : out std_ulogic; -- make accessed block dirty
- hit_i : in std_ulogic; -- cache hit
- -- cache data interface --
- addr_o : out std_ulogic_vector(31 downto 0); -- access address
- we_o : out std_ulogic_vector(3 downto 0); -- byte-wide data write enable
- swe_o : out std_ulogic; -- status write enable
- wdata_o : out std_ulogic_vector(31 downto 0); -- write data
- wstat_o : out std_ulogic; -- write status
- rdata_i : in std_ulogic_vector(31 downto 0); -- read data
- rstat_i : in std_ulogic -- read status
+ -- control fsm --
+ type state_t is (
+ S_IDLE, S_CHECK, S_MISS, S_DIRECT_REQ, S_DIRECT_RSP,
+ S_DOWNLOAD_REQ, S_DOWNLOAD_RSP, S_DOWNLOAD_DONE, S_DOWNLOAD_ERR,
+ S_UPLOAD_GET, S_UPLOAD_REQ, S_UPLOAD_RSP,
+ S_FLUSH_START, S_FLUSH_READ, S_FLUSH_CHECK, S_FLUSH_DONE,
+ S_ERROR
);
-end neorv32_cache_host;
-
-architecture neorv32_cache_host_rtl of neorv32_cache_host is
-
- -- control engine --
- type ctrl_state_t is (S_IDLE, S_CHECK, S_WAIT_MISS, S_WAIT_SYNC, S_ERROR);
type ctrl_t is record
- state, state_nxt : ctrl_state_t; -- FSM state
- req_buf, req_buf_nxt : std_ulogic; -- access request buffer
- sync_buf, sync_buf_nxt : std_ulogic; -- flush/reload (sync with main memory) request buffer
+ state : state_t;
+ upret : state_t;
+ buf_req : std_ulogic;
+ buf_sync : std_ulogic;
end record;
- signal ctrl : ctrl_t;
+ signal ctrl, ctrl_nxt : ctrl_t;
+
+ -- address generator --
+ type addr_t is record
+ tag : std_ulogic_vector(tag_size_c-1 downto 0);
+ idx : std_ulogic_vector(index_size_c-1 downto 0);
+ ofs : std_ulogic_vector(offset_size_c-1 downto 0); -- word offset
+ end record;
+ signal addr, addr_nxt : addr_t;
begin
@@ -426,100 +130,290 @@ begin
begin
if (rstn_i = '0') then
ctrl.state <= S_IDLE;
- ctrl.req_buf <= '0';
- ctrl.sync_buf <= '0';
+ ctrl.upret <= S_IDLE;
+ ctrl.buf_req <= '0';
+ ctrl.buf_sync <= '0';
+ addr.tag <= (others => '0');
+ addr.idx <= (others => '0');
+ addr.ofs <= (others => '0');
+ clean_o <= '0';
elsif rising_edge(clk_i) then
- ctrl.state <= ctrl.state_nxt;
- ctrl.req_buf <= ctrl.req_buf_nxt;
- ctrl.sync_buf <= ctrl.sync_buf_nxt;
+ ctrl.state <= ctrl_nxt.state;
+ ctrl.upret <= ctrl_nxt.upret;
+ ctrl.buf_req <= ctrl_nxt.buf_req;
+ ctrl.buf_sync <= ctrl_nxt.buf_sync;
+ addr <= addr_nxt;
+ -- cache clean (sync with downstream memory)? --
+ if (cache_i.sta_cln = '1') and (ctrl.state = S_IDLE) then
+ clean_o <= '1';
+ else
+ clean_o <= '0';
+ end if;
end if;
end process ctrl_engine_sync;
-- Control Engine FSM Comb ----------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
- ctrl_engine_comb: process(ctrl, req_i, hit_i, rdata_i, rstat_i, bus_busy_i)
+ ctrl_engine_comb: process(ctrl, addr, host_req_i, bus_rsp_i, cache_i)
begin
- -- control defaults --
- ctrl.state_nxt <= ctrl.state;
- ctrl.req_buf_nxt <= ctrl.req_buf or req_i.stb;
- ctrl.sync_buf_nxt <= ctrl.sync_buf or req_i.fence;
+ -- control engine defaults --
+ ctrl_nxt.state <= ctrl.state;
+ ctrl_nxt.upret <= ctrl.upret;
+ ctrl_nxt.buf_req <= ctrl.buf_req or host_req_i.stb;
+ ctrl_nxt.buf_sync <= ctrl.buf_sync or host_req_i.fence;
+ addr_nxt <= addr;
-- cache access defaults --
- dirty_o <= '0';
- addr_o <= req_i.addr;
- we_o <= (others => '0');
- swe_o <= '0'; -- host cannot alter status bits
- wdata_o <= req_i.data;
- wstat_o <= '0'; -- host cannot alter status bits
+ cache_o.cmd_inv <= '0';
+ cache_o.cmd_new <= '0';
+ cache_o.cmd_dir <= '0';
+ cache_o.addr <= host_req_i.addr;
+ cache_o.we <= (others => '0');
+ cache_o.data <= host_req_i.data;
- -- bus unit command defaults --
- bus_sync_o <= '0';
- bus_miss_o <= '0';
+ -- host response defaults --
+ host_rsp_o <= rsp_terminate_c;
- -- host interface defaults --
- rsp_o <= rsp_terminate_c;
+ -- bus interface defaults --
+ bus_req_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; -- always word-aligned
+ bus_req_o.data <= cache_i.data;
+ bus_req_o.ben <= (others => '1'); -- full-word writes only
+ bus_req_o.stb <= '0'; -- no request by default
+ bus_req_o.rw <= '0';
+ bus_req_o.src <= host_req_i.src; -- pass-through
+ bus_req_o.priv <= host_req_i.priv; -- pass-through
+ bus_req_o.debug <= host_req_i.debug; -- pass-through
+ bus_req_o.amo <= '0'; -- cache accesses cannot be atomic
+ bus_req_o.amoop <= (others => '0'); -- cache accesses cannot be atomic
+ bus_req_o.fence <= '0'; -- no fence by default
-- fsm --
case ctrl.state is
- when S_IDLE => -- wait for host request
+ when S_IDLE => -- wait for request
-- ------------------------------------------------------------
- if (ctrl.sync_buf = '1') then -- flush and reload cache (sync with main memory)
- bus_sync_o <= '1'; -- trigger bus unit: sync operation
- ctrl.state_nxt <= S_WAIT_SYNC;
- elsif (req_i.stb = '1') or (ctrl.req_buf = '1') then -- (pending) access request
- if (req_i.rw = '1') and READ_ONLY then -- invalid write access?
- ctrl.state_nxt <= S_ERROR;
+ if (host_req_i.fence = '1') or (ctrl.buf_sync = '1') then -- (pending) sync request
+ ctrl_nxt.state <= S_FLUSH_START;
+ elsif (host_req_i.stb = '1') or (ctrl.buf_req = '1') then -- (pending) access request
+ if (host_req_i.rw = '1') and (READ_ONLY = true) then -- invalid write access
+ ctrl_nxt.state <= S_ERROR;
+ elsif (unsigned(host_req_i.addr(31 downto 28)) >= unsigned(UC_BEGIN)) or
+ (host_req_i.amo = '1') or (host_req_i.debug = '1') then
+ ctrl_nxt.state <= S_DIRECT_REQ;
else
- ctrl.state_nxt <= S_CHECK;
+ ctrl_nxt.state <= S_CHECK;
end if;
end if;
+
+ when S_DIRECT_REQ => -- direct (uncached) access request
+ -- ------------------------------------------------------------
+ bus_req_o <= host_req_i;
+ bus_req_o.stb <= '1';
+ ctrl_nxt.state <= S_DIRECT_RSP;
+
+ when S_DIRECT_RSP => -- wait for direct (uncached) access response
+ -- ------------------------------------------------------------
+ bus_req_o <= host_req_i;
+ bus_req_o.stb <= '0';
+ host_rsp_o <= bus_rsp_i;
+ ctrl_nxt.buf_req <= '0'; -- access (about to be) completed
+ if (bus_rsp_i.ack = '1') or (bus_rsp_i.err = '1') then
+ ctrl_nxt.state <= S_IDLE;
+ end if;
+
+
when S_CHECK => -- check if cache hit
-- ------------------------------------------------------------
- rsp_o.data <= rdata_i; -- output read data
- ctrl.req_buf_nxt <= '0'; -- access request completed
- if (hit_i = '1') then
- if (req_i.rw = '1') and (not READ_ONLY) then -- write access
- dirty_o <= '1'; -- cache block is dirty now
- we_o <= req_i.ben; -- finalize write access
+ ctrl_nxt.buf_req <= '0'; -- access (about to be) completed
+ host_rsp_o.data <= cache_i.data;
+ if (cache_i.sta_hit = '1') then
+ if (host_req_i.rw = '0') then -- read access
+ host_rsp_o.ack <= '1';
+ else -- write access
+ cache_o.cmd_dir <= '1'; -- cache block is dirty now
+ cache_o.we <= host_req_i.ben; -- finalize write access
+ host_rsp_o.ack <= '1';
end if;
- rsp_o.ack <= not rstat_i; -- data word fine?
- rsp_o.err <= rstat_i; -- data word faulty?
- ctrl.state_nxt <= S_IDLE;
+ ctrl_nxt.state <= S_IDLE;
else -- cache miss
- bus_miss_o <= '1'; -- trigger bus unit: cache miss
- ctrl.state_nxt <= S_WAIT_MISS;
+ ctrl_nxt.state <= S_MISS;
+ end if;
+
+ when S_MISS => -- check if accessed block is dirty (cache address is still driven by the host request!)
+ -- ------------------------------------------------------------
+ ctrl_nxt.buf_req <= '0'; -- access (about to be) completed
+ addr_nxt.ofs <= (others => '0'); -- align block base address for upload/download
+ addr_nxt.idx <= host_req_i.addr((offset_size_c+2+index_size_c)-1 downto offset_size_c+2); -- index of referenced block
+ ctrl_nxt.upret <= S_MISS; -- come back here after UPLOAD
+ --
+ if (cache_i.sta_dir = '1') and (READ_ONLY = false) then -- block is dirty, upload first
+ addr_nxt.tag <= cache_i.sta_tag(31 downto 32-tag_size_c); -- tag of accessed block
+ ctrl_nxt.state <= S_UPLOAD_GET;
+ else -- block is clean, replace by new block
+ addr_nxt.tag <= host_req_i.addr(31 downto 32-tag_size_c); -- tag of referenced block
+ ctrl_nxt.state <= S_DOWNLOAD_REQ;
+ end if;
+
+
+ when S_DOWNLOAD_REQ => -- download new cache block: request new word
+ -- ------------------------------------------------------------
+ cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00";
+ cache_o.data <= bus_rsp_i.data;
+ bus_req_o.rw <= '0'; -- read access
+ bus_req_o.stb <= '1'; -- request new transfer
+ ctrl_nxt.state <= S_DOWNLOAD_RSP;
+
+ when S_DOWNLOAD_RSP => -- download new cache block: wait for bus response
+ -- ------------------------------------------------------------
+ cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00";
+ cache_o.data <= bus_rsp_i.data;
+ cache_o.cmd_new <= '1'; -- set new block (set tag, make valid, make clean)
+ bus_req_o.rw <= '0'; -- read access
+ if (bus_rsp_i.err = '1') then -- bus error
+ ctrl_nxt.state <= S_DOWNLOAD_ERR;
+ elsif (bus_rsp_i.ack = '1') then
+ cache_o.we <= (others => '1'); -- cache: full-word write
+ addr_nxt.ofs <= std_ulogic_vector(unsigned(addr.ofs) + 1);
+ if (and_reduce_f(addr.ofs) = '1') then -- block completed
+ ctrl_nxt.state <= S_DOWNLOAD_DONE;
+ else -- get next word
+ ctrl_nxt.state <= S_DOWNLOAD_REQ;
+ end if;
+ end if;
+
+ when S_DOWNLOAD_DONE => -- delay cycle for update of cache status
+ -- ------------------------------------------------------------
+ ctrl_nxt.state <= S_CHECK;
+
+ when S_DOWNLOAD_ERR => -- error during block download
+ -- ------------------------------------------------------------
+ cache_o.cmd_inv <= '1'; -- this block is broken
+ ctrl_nxt.state <= S_ERROR;
+
+
+ when S_UPLOAD_GET => -- upload dirty cache block: read word from cache
+ -- ------------------------------------------------------------
+ if (READ_ONLY = true) then
+ ctrl_nxt.state <= S_IDLE;
+ else
+ cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00";
+ bus_req_o.rw <= '1'; -- write access
+ ctrl_nxt.state <= S_UPLOAD_REQ;
+ end if;
+
+ when S_UPLOAD_REQ => -- upload dirty cache block: request bus write
+ -- ------------------------------------------------------------
+ if (READ_ONLY = true) then
+ ctrl_nxt.state <= S_IDLE;
+ else
+ cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00";
+ bus_req_o.rw <= '1'; -- write access
+ bus_req_o.stb <= '1'; -- request new transfer
+ ctrl_nxt.state <= S_UPLOAD_RSP;
end if;
- when S_WAIT_SYNC => -- wait for bus engine to handle cache sync
+ when S_UPLOAD_RSP => -- upload dirty cache block: wait for bus response
+ -- ------------------------------------------------------------
+ if (READ_ONLY = true) then
+ ctrl_nxt.state <= S_IDLE;
+ else
+ cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00";
+ bus_req_o.rw <= '1'; -- write access
+ cache_o.cmd_new <= '1'; -- set new block (set tag, make valid, make clean)
+ if (bus_rsp_i.err = '1') then -- bus error (this is really bad...)
+ ctrl_nxt.state <= S_ERROR;
+ elsif (bus_rsp_i.ack = '1') then
+ addr_nxt.ofs <= std_ulogic_vector(unsigned(addr.ofs) + 1);
+ if (and_reduce_f(addr.ofs) = '1') then -- block completed
+ ctrl_nxt.state <= ctrl.upret; -- go back to "upload-done return state"
+ else -- get next word
+ ctrl_nxt.state <= S_UPLOAD_GET;
+ end if;
+ end if;
+ end if;
+
+
+ when S_FLUSH_START => -- start checking for dirty blocks
+ -- ------------------------------------------------------------
+ cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00";
+ addr_nxt.idx <= (others => '0'); -- start with index 0
+ ctrl_nxt.upret <= S_FLUSH_READ; -- come back to S_FLUSH_READ after block UPLOAD
+ ctrl_nxt.state <= S_FLUSH_READ;
+
+ when S_FLUSH_READ => -- cache read access latency cycle
-- ------------------------------------------------------------
- ctrl.sync_buf_nxt <= '0'; -- sync operation has been issued
- if (bus_busy_i = '0') then
- ctrl.state_nxt <= S_IDLE;
+ cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00";
+ ctrl_nxt.state <= S_FLUSH_CHECK;
+
+ when S_FLUSH_CHECK => -- check if currently indexed block is dirty
+ -- ------------------------------------------------------------
+ cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00";
+ addr_nxt.tag <= cache_i.sta_tag(31 downto 32-tag_size_c); -- tag of currently indexed block
+ cache_o.cmd_inv <= '1'; -- invalidate currently indexed block
+ if (cache_i.sta_dir = '1') and (READ_ONLY = false) then -- block dirty?
+ ctrl_nxt.state <= S_UPLOAD_GET;
+ else -- move on to next block
+ addr_nxt.idx <= std_ulogic_vector(unsigned(addr.idx) + 1);
+ if (and_reduce_f(addr.idx) = '1') then -- all blocks done
+ ctrl_nxt.state <= S_FLUSH_DONE;
+ else -- go to next block
+ ctrl_nxt.state <= S_FLUSH_READ;
+ end if;
end if;
- when S_WAIT_MISS => -- wait for bus engine to handle cache miss
+ when S_FLUSH_DONE => -- flush completed
-- ------------------------------------------------------------
- if (bus_busy_i = '0') then
- ctrl.state_nxt <= S_CHECK; -- redo cache access
+ if not READ_ONLY then
+ bus_req_o.fence <= '1'; -- forward fence request
end if;
+ ctrl_nxt.buf_sync <= '0'; -- sync completed
+ ctrl_nxt.state <= S_IDLE;
- when S_ERROR => -- access error
+
+ when S_ERROR => -- error
-- ------------------------------------------------------------
- rsp_o.err <= '1';
- ctrl.state_nxt <= S_IDLE;
+ host_rsp_o.err <= '1';
+ ctrl_nxt.state <= S_IDLE;
when others => -- undefined
-- ------------------------------------------------------------
- ctrl.state_nxt <= S_IDLE;
+ ctrl_nxt.state <= S_IDLE;
end case;
end process ctrl_engine_comb;
-end neorv32_cache_host_rtl;
+ -- Cache Memory Core (Cache Storage and Status Management) --------------------------------
+ -- -------------------------------------------------------------------------------------------
+ neorv32_cache_memory_inst: neorv32_cache_memory
+ generic map (
+ NUM_BLOCKS => block_num_c, -- number of blocks (min 2), has to be a power of 2
+ BLOCK_SIZE => block_size_c, -- block size in bytes (min 4), has to be a power of 2
+ READ_ONLY => READ_ONLY -- cache is read-only (for host)
+ )
+ port map (
+ -- global control --
+ rstn_i => rstn_i, -- global reset, async, low-active
+ clk_i => clk_i, -- global clock, rising edge
+ -- management --
+ inval_i => cache_o.cmd_inv, -- make accessed block invalid
+ new_i => cache_o.cmd_new, -- make accessed block valid, clean and set tag
+ dirty_i => cache_o.cmd_dir, -- make accessed block dirty
+ -- status --
+ hit_o => cache_i.sta_hit, -- cache hit
+ dirty_o => cache_i.sta_dir, -- accessed block is dirty
+ tag_o => cache_i.sta_tag, -- tag of current block (MSB-aligned)
+ clean_o => cache_i.sta_cln, -- cache is clean (global status)
+ -- cache access --
+ addr_i => cache_o.addr, -- access address
+ we_i => cache_o.we, -- byte-wide data write enable
+ wdata_i => cache_o.data, -- write data
+ rdata_o => cache_i.data -- read data
+ );
+
+end neorv32_cache_rtl;
-- ================================================================================ --
@@ -547,24 +441,22 @@ entity neorv32_cache_memory is
);
port (
-- global control --
- rstn_i : in std_ulogic; -- global reset, async, low-active
- clk_i : in std_ulogic; -- global clock, rising edge
+ rstn_i : in std_ulogic; -- global reset, async, low-active
+ clk_i : in std_ulogic; -- global clock, rising edge
-- management --
- inval_i : in std_ulogic; -- make accessed block invalid
- new_i : in std_ulogic; -- make accessed block valid, clean and set tag
- dirty_i : in std_ulogic; -- make accessed block dirty
+ inval_i : in std_ulogic; -- make accessed block invalid
+ new_i : in std_ulogic; -- make accessed block valid, clean and set tag
+ dirty_i : in std_ulogic; -- make accessed block dirty
-- status --
- hit_o : out std_ulogic; -- cache hit
- dirty_o : out std_ulogic; -- accessed block is dirty
- base_o : out std_ulogic_vector(31 downto 0); -- base address of current block
+ hit_o : out std_ulogic; -- cache hit
+ dirty_o : out std_ulogic; -- accessed block is dirty
+ tag_o : out std_ulogic_vector(31 downto 0); -- tag of current block (MSB-aligned)
+ clean_o : out std_ulogic; -- cache is clean (global status)
-- cache access --
- addr_i : in std_ulogic_vector(31 downto 0); -- access address
- we_i : in std_ulogic_vector(3 downto 0); -- byte-wide data write enable
- swe_i : in std_ulogic; -- status write enable
- wdata_i : in std_ulogic_vector(31 downto 0); -- write data
- wstat_i : in std_ulogic; -- write status
- rdata_o : out std_ulogic_vector(31 downto 0); -- read data
- rstat_o : out std_ulogic -- read status
+ addr_i : in std_ulogic_vector(31 downto 0); -- access address
+ we_i : in std_ulogic_vector(3 downto 0); -- byte-wide data write enable
+ wdata_i : in std_ulogic_vector(31 downto 0); -- write data
+ rdata_o : out std_ulogic_vector(31 downto 0) -- read data
);
end neorv32_cache_memory;
@@ -576,26 +468,21 @@ architecture neorv32_cache_memory_rtl of neorv32_cache_memory is
constant tag_size_c : natural := 32 - (offset_size_c + index_size_c + 2); -- 2 additional bits for byte offset
-- status flag memory --
- signal valid_mem, dirty_mem : std_ulogic_vector(NUM_BLOCKS-1 downto 0);
+ signal valid_mem, dirty_mem : std_ulogic_vector(NUM_BLOCKS-1 downto 0);
signal valid_mem_rd, dirty_mem_rd : std_ulogic;
-- tag memory --
type tag_mem_t is array (0 to NUM_BLOCKS-1) of std_ulogic_vector(tag_size_c-1 downto 0);
- signal tag_mem : tag_mem_t;
+ signal tag_mem : tag_mem_t;
signal tag_mem_rd : std_ulogic_vector(tag_size_c-1 downto 0);
-- cache data memory --
type data_mem_t is array (0 to (NUM_BLOCKS * (BLOCK_SIZE/4))-1) of std_ulogic_vector(7 downto 0);
signal data_mem_b0, data_mem_b1, data_mem_b2, data_mem_b3 : data_mem_t; -- byte-wide sub-memories
- signal data_mem_rd : std_ulogic_vector(31 downto 0);
-
- -- cache data status memory (used for the bus error response - just mark individual words as faults and not the entire block) --
- signal stat_mem : std_ulogic_vector((NUM_BLOCKS * (BLOCK_SIZE/4))-1 downto 0);
- signal stat_mem_rd : std_ulogic;
-- access address decomposition --
- signal acc_tag, acc_tag_ff : std_ulogic_vector(tag_size_c-1 downto 0);
- signal acc_idx, acc_idx_ff : std_ulogic_vector(index_size_c-1 downto 0);
+ signal acc_tag : std_ulogic_vector(tag_size_c-1 downto 0);
+ signal acc_idx : std_ulogic_vector(index_size_c-1 downto 0);
signal acc_off : std_ulogic_vector(offset_size_c-1 downto 0);
signal acc_adr : std_ulogic_vector((index_size_c+offset_size_c)-1 downto 0);
@@ -608,26 +495,16 @@ begin
acc_off <= addr_i(2+(offset_size_c-1) downto 2);
acc_adr <= acc_idx & acc_off;
- -- access buffer (tag + index) --
- access_buffer: process(rstn_i, clk_i)
- begin
- if (rstn_i = '0') then
- acc_tag_ff <= (others => '0');
- acc_idx_ff <= (others => '0');
- elsif rising_edge(clk_i) then
- acc_tag_ff <= acc_tag;
- acc_idx_ff <= acc_idx;
- end if;
- end process access_buffer;
-
-- Status Flag Memory ---------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
status_memory: process(rstn_i, clk_i)
begin
if (rstn_i = '0') then
- valid_mem <= (others => '0');
- dirty_mem <= (others => '0');
+ valid_mem <= (others => '0');
+ dirty_mem <= (others => '0');
+ valid_mem_rd <= '0';
+ dirty_mem_rd <= '0';
elsif rising_edge(clk_i) then
if (new_i = '1') then -- set new block
valid_mem(to_integer(unsigned(acc_idx))) <= '1'; -- valid
@@ -636,7 +513,7 @@ begin
if (inval_i = '1') then -- invalidate current block
valid_mem(to_integer(unsigned(acc_idx))) <= '0';
end if;
- if (dirty_i = '1') then -- make current block dirty
+ if (dirty_i = '1') and (READ_ONLY = false) then -- make current block dirty
dirty_mem(to_integer(unsigned(acc_idx))) <= '1';
end if;
end if;
@@ -659,16 +536,27 @@ begin
end if;
end process tag_memory;
+ -- tag of accessed block --
+ tag_o(31 downto 31-(tag_size_c-1)) <= tag_mem_rd;
+ tag_o(31-tag_size_c downto 0) <= (others => '0');
+
-- Access Status (1 Cycle Latency) --------------------------------------------------------
-- -------------------------------------------------------------------------------------------
- hit_o <= '1' when (valid_mem_rd = '1') and (tag_mem_rd = acc_tag_ff) else '0'; -- cache access hit
- dirty_o <= '1' when (valid_mem_rd = '1') and (dirty_mem_rd = '1') and (not READ_ONLY) else '0'; -- accessed block is dirty
+ hit_o <= '1' when (valid_mem_rd = '1') and (tag_mem_rd = acc_tag) else '0'; -- cache access hit
+ dirty_o <= '1' when (valid_mem_rd = '1') and (dirty_mem_rd = '1') and (READ_ONLY = false) else '0'; -- block is dirty
- -- base address of accessed block --
- base_o(31 downto 31-(tag_size_c-1)) <= tag_mem_rd;
- base_o(31-tag_size_c downto 2+offset_size_c) <= acc_idx_ff;
- base_o(2+(offset_size_c-1) downto 0) <= (others => '0');
+ -- cache is clean if all blocks are invalid --
+ clean_read_only:
+ if READ_ONLY generate
+ clean_o <= '1' when (or_reduce_f(valid_mem) = '0') else '0';
+ end generate;
+
+ -- cache is clean if all valid blocks are clean --
+ clean_read_write:
+ if not READ_ONLY generate
+ clean_o <= '1' when (or_reduce_f(valid_mem and dirty_mem) = '0') else '0';
+ end generate;
-- Cache Data Memory ----------------------------------------------------------------------
@@ -689,287 +577,13 @@ begin
if (we_i(3) = '1') then
data_mem_b3(to_integer(unsigned(acc_adr))) <= wdata_i(31 downto 24);
end if;
- if (swe_i = '1') then
- stat_mem(to_integer(unsigned(acc_adr))) <= wstat_i;
- end if;
-- read access --
- data_mem_rd(07 downto 00) <= data_mem_b0(to_integer(unsigned(acc_adr)));
- data_mem_rd(15 downto 08) <= data_mem_b1(to_integer(unsigned(acc_adr)));
- data_mem_rd(23 downto 16) <= data_mem_b2(to_integer(unsigned(acc_adr)));
- data_mem_rd(31 downto 24) <= data_mem_b3(to_integer(unsigned(acc_adr)));
- stat_mem_rd <= stat_mem(to_integer(unsigned(acc_adr)));
+ rdata_o(7 downto 0) <= data_mem_b0(to_integer(unsigned(acc_adr)));
+ rdata_o(15 downto 8) <= data_mem_b1(to_integer(unsigned(acc_adr)));
+ rdata_o(23 downto 16) <= data_mem_b2(to_integer(unsigned(acc_adr)));
+ rdata_o(31 downto 24) <= data_mem_b3(to_integer(unsigned(acc_adr)));
end if;
end process cache_mem_access;
- -- read-data + status --
- rdata_o <= data_mem_rd;
- rstat_o <= stat_mem_rd and valid_mem_rd;
-
end neorv32_cache_memory_rtl;
-
-
--- ================================================================================ --
--- NEORV32 CPU - Generic Cache: Bus Interface Unit --
--- -------------------------------------------------------------------------------- --
--- The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 --
--- Copyright (c) NEORV32 contributors. --
--- Copyright (c) 2020 - 2025 Stephan Nolting. All rights reserved. --
--- Licensed under the BSD-3-Clause license, see LICENSE for details. --
--- SPDX-License-Identifier: BSD-3-Clause --
--- ================================================================================ --
-
-library ieee;
-use ieee.std_logic_1164.all;
-use ieee.numeric_std.all;
-
-library neorv32;
-use neorv32.neorv32_package.all;
-
-entity neorv32_cache_bus is
- generic (
- NUM_BLOCKS : natural; -- number of blocks (min 2), has to be a power of 2
- BLOCK_SIZE : natural; -- block size in bytes (min 4), has to be a power of 2
- READ_ONLY : boolean -- read-only bus accesses
- );
- port (
- -- global control --
- rstn_i : in std_ulogic; -- global reset, async, low-active
- clk_i : in std_ulogic; -- global clock, rising edge
- -- host access port --
- host_req_i : in bus_req_t; -- request
- -- bus access port --
- bus_req_o : out bus_req_t; -- request
- bus_rsp_i : in bus_rsp_t; -- response
- -- operation interface --
- cmd_sync_i : in std_ulogic; -- sync cache and main memory
- cmd_miss_i : in std_ulogic; -- cache miss
- cmd_busy_o : out std_ulogic; -- bus operation in progress
- -- cache status interface --
- inval_o : out std_ulogic; -- invalidate accessed block
- new_o : out std_ulogic; -- set new cache entry
- dirty_i : in std_ulogic; -- accessed block is dirty
- base_i : in std_ulogic_vector(31 downto 0); -- base address of accessed block
- -- cache data interface --
- addr_o : out std_ulogic_vector(31 downto 0); -- access address
- we_o : out std_ulogic_vector(3 downto 0); -- byte-wide data write enable
- swe_o : out std_ulogic; -- status write enable
- wdata_o : out std_ulogic_vector(31 downto 0); -- write data
- wstat_o : out std_ulogic; -- write status
- rdata_i : in std_ulogic_vector(31 downto 0) -- read data
- );
-end neorv32_cache_bus;
-
-architecture neorv32_cache_bus_rtl of neorv32_cache_bus is
-
- -- cache layout --
- constant offset_size_c : natural := index_size_f(BLOCK_SIZE/4); -- WORD offset!
- constant index_size_c : natural := index_size_f(NUM_BLOCKS);
- constant tag_size_c : natural := 32 - (offset_size_c + index_size_c + 2);
-
- -- control fsm --
- type state_t is (S_IDLE, S_CHECK, S_DOWNLOAD_REQ, S_DOWNLOAD_RSP, S_UPLOAD_GET,
- S_UPLOAD_REQ, S_UPLOAD_RSP, S_FLUSH_START, S_FLUSH_READ, S_FLUSH_CHECK);
- signal state, upret, state_nxt, upret_nxt: state_t;
-
- -- address generator --
- type addr_t is record
- tag : std_ulogic_vector(tag_size_c-1 downto 0);
- idx : std_ulogic_vector(index_size_c-1 downto 0);
- ofs : std_ulogic_vector(offset_size_c-1 downto 0); -- WORD offset!
- end record;
- signal haddr, baddr, addr, addr_nxt : addr_t;
-
-begin
-
- -- Address Decomposition ------------------------------------------------------------------
- -- -------------------------------------------------------------------------------------------
- -- base address of original host access --
- haddr.tag <= host_req_i.addr(31 downto (32-tag_size_c));
- haddr.idx <= (others => '0'); -- unused
- haddr.ofs <= (others => '0'); -- unused
-
- -- base address of indexed cache block --
- baddr.tag <= base_i(31 downto (32-tag_size_c));
- baddr.idx <= base_i((offset_size_c+2+index_size_c)-1 downto offset_size_c+2);
- baddr.ofs <= (others => '0'); -- unused
-
-
- -- Control Engine FSM Sync ----------------------------------------------------------------
- -- -------------------------------------------------------------------------------------------
- ctrl_engine_sync: process(rstn_i, clk_i)
- begin
- if (rstn_i = '0') then
- state <= S_IDLE;
- upret <= S_IDLE;
- addr.tag <= (others => '0');
- addr.idx <= (others => '0');
- addr.ofs <= (others => '0');
- elsif rising_edge(clk_i) then
- state <= state_nxt;
- upret <= upret_nxt;
- addr <= addr_nxt;
- end if;
- end process ctrl_engine_sync;
-
-
- -- Control Engine FSM Comb ----------------------------------------------------------------
- -- -------------------------------------------------------------------------------------------
- ctrl_engine_comb: process(state, upret, addr, haddr, baddr, host_req_i, bus_rsp_i, cmd_sync_i, cmd_miss_i, rdata_i, dirty_i)
- begin
- -- control engine defaults --
- state_nxt <= state;
- upret_nxt <= upret;
- addr_nxt <= addr;
-
- -- cache access defaults --
- addr_o <= addr.tag & addr.idx & addr.ofs & "00"; -- always word-aligned
- we_o <= (others => '0');
- swe_o <= '0';
- wdata_o <= bus_rsp_i.data;
- wstat_o <= bus_rsp_i.err;
-
- -- cache command defaults --
- inval_o <= '0';
- new_o <= '0';
-
- -- bus interface defaults --
- bus_req_o <= req_terminate_c; -- all-zero
- bus_req_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; -- always word-aligned
- bus_req_o.data <= rdata_i;
- bus_req_o.ben <= (others => '1'); -- full-word writes only
- bus_req_o.src <= '0'; -- cache accesses are always data accesses
- bus_req_o.priv <= '0'; -- cache accesses are always "unprivileged" accesses
- bus_req_o.amo <= '0'; -- cache accesses can never be an atomic memory operation
- bus_req_o.amoop <= (others => '0'); -- cache accesses can never be an atomic memory operation
- bus_req_o.debug <= host_req_i.debug;
- if (state = S_IDLE) then
- bus_req_o.sleep <= host_req_i.sleep;
- else
- bus_req_o.sleep <= '0';
- end if;
-
- -- fsm --
- case state is
-
- when S_IDLE => -- wait for request
- -- ------------------------------------------------------------
- addr_nxt.ofs <= (others => '0'); -- align block base address for upload/download (and flush)
- if (cmd_sync_i = '1') then -- cache sync
- state_nxt <= S_FLUSH_START;
- elsif (cmd_miss_i = '1') then -- cache miss
- state_nxt <= S_CHECK;
- end if;
-
- when S_CHECK => -- check if accessed block is dirty (cache address is still applied by host controller!)
- -- ------------------------------------------------------------
- upret_nxt <= S_DOWNLOAD_REQ; -- go straight to S_DOWNLOAD_REQ when S_UPLOAD_GET has completed (if executed)
- addr_nxt.idx <= baddr.idx; -- index of reference cache block
- if (dirty_i = '1') and (not READ_ONLY) then -- block is dirty, upload first
- addr_nxt.tag <= baddr.tag; -- base address (tag + index) of accessed block
- state_nxt <= S_UPLOAD_GET;
- else -- block is clean, download new block
- addr_nxt.tag <= haddr.tag; -- base address (tag + index) of requested block
- state_nxt <= S_DOWNLOAD_REQ;
- end if;
-
-
- when S_DOWNLOAD_REQ => -- download new cache block: request new word
- -- ------------------------------------------------------------
- bus_req_o.rw <= '0'; -- read access
- bus_req_o.stb <= '1'; -- request new transfer
- state_nxt <= S_DOWNLOAD_RSP;
-
- when S_DOWNLOAD_RSP => -- download new cache block: wait for bus response
- -- ------------------------------------------------------------
- bus_req_o.rw <= '0'; -- read access
- we_o <= (others => '1'); -- cache: full-word write (write all the time until ACK/ERR)
- swe_o <= '1'; -- cache: write status bit (bus error response)
- new_o <= '1'; -- set new block (set tag, make valid, make clean)
- if (bus_rsp_i.ack = '1') or (bus_rsp_i.err = '1') then -- wait for response
- addr_nxt.ofs <= std_ulogic_vector(unsigned(addr.ofs) + 1);
- if (and_reduce_f(addr.ofs) = '1') then -- block completed? offset will be all-zero again after block completion
- state_nxt <= S_IDLE;
- else -- get next word
- state_nxt <= S_DOWNLOAD_REQ;
- end if;
- end if;
-
-
- when S_UPLOAD_GET => -- upload dirty cache block: read word from cache
- -- ------------------------------------------------------------
- if READ_ONLY then
- state_nxt <= S_IDLE;
- else
- bus_req_o.rw <= '1'; -- write access
- state_nxt <= S_UPLOAD_REQ;
- end if;
-
- when S_UPLOAD_REQ => -- upload dirty cache block: request bus write
- -- ------------------------------------------------------------
- if READ_ONLY then
- state_nxt <= S_IDLE;
- else
- bus_req_o.rw <= '1'; -- write access
- bus_req_o.stb <= '1'; -- request new transfer
- state_nxt <= S_UPLOAD_RSP;
- end if;
-
- when S_UPLOAD_RSP => -- upload dirty cache block: wait for bus response
- -- ------------------------------------------------------------
- if READ_ONLY then
- state_nxt <= S_IDLE;
- else
- bus_req_o.rw <= '1'; -- write access
- new_o <= '1'; -- set new block (set tag, make valid, make clean)
- if (bus_rsp_i.ack = '1') or (bus_rsp_i.err = '1') then -- wait for response
- addr_nxt.ofs <= std_ulogic_vector(unsigned(addr.ofs) + 1);
- if (and_reduce_f(addr.ofs) = '1') then -- block completed? offset will be all-zero again after block completion
- state_nxt <= upret; -- go back to "upload-done return state"
- else -- get next word
- state_nxt <= S_UPLOAD_GET;
- end if;
- end if;
- end if;
-
-
- when S_FLUSH_START => -- start checking for dirty blocks
- -- ------------------------------------------------------------
- addr_nxt.idx <= (others => '0'); -- start with index 0
- bus_req_o.fence <= bool_to_ulogic_f(READ_ONLY); -- forward fence request
- upret_nxt <= S_FLUSH_CHECK; -- come back to S_FLUSH_CHECK after block upload
- state_nxt <= S_FLUSH_READ;
-
- when S_FLUSH_READ => -- cache read access latency cycle
- -- ------------------------------------------------------------
- state_nxt <= S_FLUSH_CHECK;
-
- when S_FLUSH_CHECK => -- check if currently indexed block is dirty
- -- ------------------------------------------------------------
- addr_nxt.tag <= baddr.tag; -- tag of currently index block
- inval_o <= '1'; -- invalidate currently index block
- if (dirty_i = '1') and (not READ_ONLY) then -- block dirty?
- state_nxt <= S_UPLOAD_GET;
- else -- move on to next block
- addr_nxt.idx <= std_ulogic_vector(unsigned(addr.idx) + 1);
- if (and_reduce_f(addr.idx) = '1') then -- all blocks done?
- bus_req_o.fence <= not bool_to_ulogic_f(READ_ONLY); -- forward fence request
- state_nxt <= S_IDLE;
- else -- go to next block
- state_nxt <= S_FLUSH_READ;
- end if;
- end if;
-
-
- when others => -- undefined
- -- ------------------------------------------------------------
- state_nxt <= S_IDLE;
-
- end case;
- end process ctrl_engine_comb;
-
- -- bus arbiter operation in progress --
- cmd_busy_o <= '0' when (state = S_IDLE) else '1';
-
-
-end neorv32_cache_bus_rtl;
diff --git a/rtl/core/neorv32_cpu.vhd b/rtl/core/neorv32_cpu.vhd
index 7e59f46bb..6312e3843 100644
--- a/rtl/core/neorv32_cpu.vhd
+++ b/rtl/core/neorv32_cpu.vhd
@@ -47,7 +47,7 @@ entity neorv32_cpu is
RISCV_ISA_Zkne : boolean; -- implement cryptography NIST AES encryption extension
RISCV_ISA_Zknh : boolean; -- implement cryptography NIST hash extension
RISCV_ISA_Zksed : boolean; -- implement ShangMi hash extension
- RISCV_ISA_Zksh : boolean; -- implement ShangMi block cypher extension
+ RISCV_ISA_Zksh : boolean; -- implement ShangMi hash extension
RISCV_ISA_Zmmul : boolean; -- implement multiply-only M sub-extension
RISCV_ISA_Zxcfu : boolean; -- implement custom (instr.) functions unit
RISCV_ISA_Sdext : boolean; -- implement external debug mode extension
@@ -69,23 +69,25 @@ entity neorv32_cpu is
);
port (
-- global control --
- clk_i : in std_ulogic; -- global clock, rising edge
- rstn_i : in std_ulogic; -- global reset, low-active, async
+ clk_i : in std_ulogic; -- global clock, rising edge
+ rstn_i : in std_ulogic; -- global reset, low-active, async
-- interrupts --
- msi_i : in std_ulogic; -- risc-v machine software interrupt
- mei_i : in std_ulogic; -- risc-v machine external interrupt
- mti_i : in std_ulogic; -- risc-v machine timer interrupt
- firq_i : in std_ulogic_vector(15 downto 0); -- custom fast interrupts
- dbi_i : in std_ulogic; -- risc-v debug halt request interrupt
+ msi_i : in std_ulogic; -- risc-v machine software interrupt
+ mei_i : in std_ulogic; -- risc-v machine external interrupt
+ mti_i : in std_ulogic; -- risc-v machine timer interrupt
+ firq_i : in std_ulogic_vector(15 downto 0); -- custom fast interrupts
+ dbi_i : in std_ulogic; -- risc-v debug halt request interrupt
-- inter-core communication links --
- icc_tx_o : out icc_t; -- TX links
- icc_rx_i : in icc_t; -- RX links
+ icc_tx_o : out icc_t; -- TX links
+ icc_rx_i : in icc_t; -- RX links
-- instruction bus interface --
- ibus_req_o : out bus_req_t; -- request bus
- ibus_rsp_i : in bus_rsp_t; -- response bus
+ ibus_req_o : out bus_req_t; -- request bus
+ ibus_rsp_i : in bus_rsp_t; -- response bus
-- data bus interface --
- dbus_req_o : out bus_req_t; -- request bus
- dbus_rsp_i : in bus_rsp_t -- response bus
+ dbus_req_o : out bus_req_t; -- request bus
+ dbus_rsp_i : in bus_rsp_t; -- response bus
+ -- memory synchronization --
+ mem_sync_i : in std_ulogic -- synchronization operation done
);
end neorv32_cpu;
@@ -238,7 +240,7 @@ begin
RISCV_ISA_Zkne => RISCV_ISA_Zkne, -- implement cryptography NIST AES encryption extension
RISCV_ISA_Zknh => RISCV_ISA_Zknh, -- implement cryptography NIST hash extension
RISCV_ISA_Zks => riscv_zks_c, -- ShangMi algorithm suite available
- RISCV_ISA_Zksed => RISCV_ISA_Zksed, -- implement ShangMi block cypher extension
+ RISCV_ISA_Zksed => RISCV_ISA_Zksed, -- implement ShangMi block cipher extension
RISCV_ISA_Zksh => RISCV_ISA_Zksh, -- implement ShangMi hash extension
RISCV_ISA_Zkt => riscv_zkt_c, -- data-independent execution time available (for cryptographic operations)
RISCV_ISA_Zmmul => RISCV_ISA_Zmmul, -- implement multiply-only M sub-extension
@@ -289,7 +291,9 @@ begin
-- load/store unit interface --
lsu_wait_i => lsu_wait, -- wait for data bus
lsu_mar_i => lsu_mar, -- memory address register
- lsu_err_i => lsu_err -- alignment/access errors
+ lsu_err_i => lsu_err, -- alignment/access errors
+ -- memory synchronization --
+ mem_sync_i => mem_sync_i -- synchronization operation done
);
-- RISC-V machine interrupts --
diff --git a/rtl/core/neorv32_cpu_control.vhd b/rtl/core/neorv32_cpu_control.vhd
index 53e7b5dc1..a163fd24f 100644
--- a/rtl/core/neorv32_cpu_control.vhd
+++ b/rtl/core/neorv32_cpu_control.vhd
@@ -106,7 +106,9 @@ entity neorv32_cpu_control is
-- load/store unit interface --
lsu_wait_i : in std_ulogic; -- wait for data bus
lsu_mar_i : in std_ulogic_vector(XLEN-1 downto 0); -- memory address register
- lsu_err_i : in std_ulogic_vector(3 downto 0) -- alignment/access errors
+ lsu_err_i : in std_ulogic_vector(3 downto 0); -- alignment/access errors
+ -- memory synchronization --
+ mem_sync_i : in std_ulogic -- synchronization operation done
);
end neorv32_cpu_control;
@@ -153,7 +155,7 @@ architecture neorv32_cpu_control_rtl of neorv32_cpu_control is
-- instruction execution engine --
type exe_engine_state_t is (EX_DISPATCH, EX_TRAP_ENTER, EX_TRAP_EXIT, EX_RESTART, EX_SLEEP, EX_EXECUTE,
- EX_ALU_WAIT, EX_BRANCH, EX_BRANCHED, EX_SYSTEM, EX_MEM_REQ, EX_MEM_RSP);
+ EX_ALU_WAIT, EX_FENCE, EX_BRANCH, EX_BRANCHED, EX_SYSTEM, EX_MEM_REQ, EX_MEM_RSP);
type exe_engine_t is record
state : exe_engine_state_t;
ir : std_ulogic_vector(31 downto 0); -- instruction word being executed right now
@@ -161,6 +163,7 @@ architecture neorv32_cpu_control_rtl of neorv32_cpu_control is
pc : std_ulogic_vector(XLEN-1 downto 0); -- current PC (current instruction)
pc2 : std_ulogic_vector(XLEN-1 downto 0); -- next PC (next linear instruction)
ra : std_ulogic_vector(XLEN-1 downto 0); -- return address
+ msync : std_ulogic; -- memory synchronization completed
end record;
signal exe_engine, exe_engine_nxt : exe_engine_t;
@@ -308,7 +311,7 @@ begin
fetch_engine.state <= IF_RESTART;
fetch_engine.restart <= '1'; -- reset IPB and issue engine
fetch_engine.pc <= (others => '0');
- fetch_engine.priv <= '0';
+ fetch_engine.priv <= priv_mode_m_c;
elsif rising_edge(clk_i) then
case fetch_engine.state is
@@ -364,16 +367,15 @@ begin
ipb.we(1) <= '1' when (fetch_engine.state = IF_PENDING) and (fetch_engine.resp = '1') else '0';
-- bus access meta data --
- ibus_req_o.priv <= fetch_engine.priv; -- current effective privilege level
ibus_req_o.data <= (others => '0'); -- read-only
ibus_req_o.ben <= (others => '0'); -- read-only
ibus_req_o.rw <= '0'; -- read-only
- ibus_req_o.src <= '1'; -- source = instruction fetch
+ ibus_req_o.src <= '1'; -- always "instruction fetch" access
+ ibus_req_o.priv <= fetch_engine.priv; -- current effective privilege level
+ ibus_req_o.debug <= debug_ctrl.run; -- debug mode, valid without STB being set
ibus_req_o.amo <= '0'; -- cannot be an atomic memory operation
ibus_req_o.amoop <= (others => '0'); -- cannot be an atomic memory operation
ibus_req_o.fence <= ctrl.if_fence; -- fence operation, valid without STB being set
- ibus_req_o.sleep <= sleep_mode; -- sleep mode, valid without STB being set
- ibus_req_o.debug <= debug_ctrl.run; -- debug mode, valid without STB being set
-- Instruction Prefetch Buffer (FIFO) -----------------------------------------------------
@@ -555,6 +557,7 @@ begin
exe_engine.pc <= BOOT_ADDR(XLEN-1 downto 2) & "00"; -- 32-bit-aligned boot address
exe_engine.pc2 <= BOOT_ADDR(XLEN-1 downto 2) & "00"; -- 32-bit-aligned boot address
exe_engine.ra <= (others => '0');
+ exe_engine.msync <= '0';
elsif rising_edge(clk_i) then
ctrl <= ctrl_nxt;
exe_engine <= exe_engine_nxt;
@@ -573,7 +576,7 @@ begin
-- Execute Engine FSM Comb ----------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
execute_engine_fsm_comb: process(exe_engine, debug_ctrl, trap_ctrl, hw_trigger_match, opcode, issue_engine, csr,
- ctrl, alu_cp_done_i, lsu_wait_i, alu_add_i, branch_taken, pmp_fault_i)
+ ctrl, alu_cp_done_i, lsu_wait_i, alu_add_i, branch_taken, pmp_fault_i, mem_sync_i)
variable funct3_v : std_ulogic_vector(2 downto 0);
variable funct7_v : std_ulogic_vector(6 downto 0);
begin
@@ -588,6 +591,7 @@ begin
exe_engine_nxt.pc <= exe_engine.pc;
exe_engine_nxt.pc2 <= exe_engine.pc2;
exe_engine_nxt.ra <= (others => '0'); -- output zero if not a branch instruction
+ exe_engine_nxt.msync <= mem_sync_i and (not ctrl.lsu_fence);
issue_engine.ack <= '0';
fetch_engine.reset <= '0';
trap_ctrl.env_enter <= '0';
@@ -752,9 +756,8 @@ begin
-- memory fence operations (execute even if illegal funct3) --
when opcode_fence_c =>
- ctrl_nxt.if_fence <= exe_engine.ir(instr_funct3_lsb_c); -- fence.i
- ctrl_nxt.lsu_fence <= not exe_engine.ir(instr_funct3_lsb_c); -- fence
- exe_engine_nxt.state <= EX_RESTART; -- reset instruction fetch + IPB (actually only required for fence.i)
+ ctrl_nxt.lsu_fence <= '1'; -- load/store fence (always executed)
+ exe_engine_nxt.state <= EX_FENCE;
-- FPU: floating-point operations --
when opcode_fop_c =>
@@ -785,6 +788,17 @@ begin
exe_engine_nxt.state <= EX_DISPATCH;
end if;
+ when EX_FENCE => -- wait for LOAD/STORE memory synchronization
+ -- ------------------------------------------------------------
+ if (exe_engine.msync = '1') then -- wait for pending synchronization request to complete
+ if (exe_engine.ir(instr_funct3_lsb_c) = '0') then -- fence
+ exe_engine_nxt.state <= EX_DISPATCH;
+ else -- fence.i
+ ctrl_nxt.if_fence <= '1'; -- instruction-fetch fence
+ exe_engine_nxt.state <= EX_RESTART; -- reset instruction fetch + IPB
+ end if;
+ end if;
+
when EX_BRANCH => -- update next PC on taken branches and jumps
-- ------------------------------------------------------------
exe_engine_nxt.ra <= exe_engine.pc2(XLEN-1 downto 1) & '0'; -- output return address
diff --git a/rtl/core/neorv32_cpu_lsu.vhd b/rtl/core/neorv32_cpu_lsu.vhd
index 59a0907f2..9f7c37bbd 100644
--- a/rtl/core/neorv32_cpu_lsu.vhd
+++ b/rtl/core/neorv32_cpu_lsu.vhd
@@ -78,6 +78,7 @@ begin
if (rstn_i = '0') then
dbus_req_o.rw <= '0';
dbus_req_o.priv <= priv_mode_m_c;
+ dbus_req_o.debug <= '0';
dbus_req_o.amo <= '0';
dbus_req_o.amoop <= (others => '0');
dbus_req_o.data <= (others => '0');
@@ -87,6 +88,7 @@ begin
-- type identifiers --
dbus_req_o.rw <= ctrl_i.lsu_rw; -- read/write
dbus_req_o.priv <= ctrl_i.lsu_priv; -- privilege level
+ dbus_req_o.debug <= ctrl_i.cpu_debug; -- debug-mode access
dbus_req_o.amo <= bool_to_ulogic_f(AMO_EN) and ctrl_i.ir_opcode(2); -- atomic memory operation
dbus_req_o.amoop <= amo_cmd;
-- data alignment + byte-enable --
@@ -108,11 +110,11 @@ begin
end if;
end process mem_do_reg;
- dbus_req_o.src <= '0'; -- 0 = data access
- dbus_req_o.fence <= ctrl_i.lsu_fence; -- out-of-band: this is valid without STB being set
- dbus_req_o.sleep <= ctrl_i.cpu_sleep; -- out-of-band: this is valid without STB being set
- dbus_req_o.debug <= ctrl_i.cpu_debug; -- out-of-band: this is valid without STB being set
+ -- hardwired signals --
+ dbus_req_o.src <= '0'; -- always "data" access
+ -- out-of-band signals --
+ dbus_req_o.fence <= ctrl_i.lsu_fence;
-- atomic memory access operation encoding --
amo_encode: process(ctrl_i.ir_funct12)
diff --git a/rtl/core/neorv32_dma.vhd b/rtl/core/neorv32_dma.vhd
index e7c9e6248..f2de69a80 100644
--- a/rtl/core/neorv32_dma.vhd
+++ b/rtl/core/neorv32_dma.vhd
@@ -303,11 +303,10 @@ begin
dma_req_o.addr <= engine.src_addr when (engine.state = S_READ) else engine.dst_addr;
dma_req_o.src <= '0'; -- source = data access
dma_req_o.priv <= priv_mode_m_c; -- DMA accesses are always privileged
+ dma_req_o.debug <= '0'; -- can never ever be in debug mode
dma_req_o.amo <= '0'; -- no atomic memory operation possible
dma_req_o.amoop <= (others => '0'); -- no atomic memory operation possible
- dma_req_o.fence <= '0'; -- no fences
- dma_req_o.sleep <= '1' when (engine.state = S_IDLE) else '0'; -- idle = sleep mode
- dma_req_o.debug <= '0'; -- can never ever be in debug mode
+ dma_req_o.fence <= '0';
-- address increment --
address_inc: process(cfg.qsel)
diff --git a/rtl/core/neorv32_gpio.vhd b/rtl/core/neorv32_gpio.vhd
index 2c3ba5613..c1fa1992a 100644
--- a/rtl/core/neorv32_gpio.vhd
+++ b/rtl/core/neorv32_gpio.vhd
@@ -144,4 +144,4 @@ begin
end process irq_buffer;
-end neorv32_gpio_rtl;
\ No newline at end of file
+end neorv32_gpio_rtl;
diff --git a/rtl/core/neorv32_package.vhd b/rtl/core/neorv32_package.vhd
index bcff4fcfb..853d93029 100644
--- a/rtl/core/neorv32_package.vhd
+++ b/rtl/core/neorv32_package.vhd
@@ -29,7 +29,7 @@ package neorv32_package is
-- Architecture Constants -----------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
- constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01110007"; -- hardware version
+ constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01110008"; -- hardware version
constant archid_c : natural := 19; -- official RISC-V architecture ID
constant XLEN : natural := 32; -- native data path width
@@ -123,20 +123,19 @@ package neorv32_package is
data : std_ulogic_vector(31 downto 0); -- write data
ben : std_ulogic_vector(3 downto 0); -- byte enable
stb : std_ulogic; -- request strobe, single-shot
- rw : std_ulogic; -- 0=read, 1=write
- src : std_ulogic; -- access source (1=instruction fetch, 0=data access)
+ rw : std_ulogic; -- 0 = read, 1 = write
+ src : std_ulogic; -- 0 = data access, 1 = instruction fetch
priv : std_ulogic; -- set if privileged (machine-mode) access
+ debug : std_ulogic; -- set if debug mode access
amo : std_ulogic; -- set if atomic memory operation
amoop : std_ulogic_vector(3 downto 0); -- type of atomic memory operation
-- out-of-band signals --
- fence : std_ulogic; -- set if fence(.i) request by upstream device, single-shot
- sleep : std_ulogic; -- set if ALL upstream sources are in sleep mode
- debug : std_ulogic; -- set if upstream device is in debug mode
+ fence : std_ulogic; -- set if fence(.i) operation, single-shot
end record;
-- bus response --
type bus_rsp_t is record
- data : std_ulogic_vector(31 downto 0); -- read data, valid if ack=1
+ data : std_ulogic_vector(31 downto 0); -- read data, valid if ack = 1
ack : std_ulogic; -- set if access acknowledge, single-shot
err : std_ulogic; -- set if access error, single-shot, has priority over ack
end record;
@@ -150,11 +149,10 @@ package neorv32_package is
rw => '0',
src => '0',
priv => '0',
+ debug => '0',
amo => '0',
amoop => (others => '0'),
- fence => '0',
- sleep => '1',
- debug => '0'
+ fence => '0'
);
-- endpoint (response) termination --
diff --git a/rtl/core/neorv32_sysinfo.vhd b/rtl/core/neorv32_sysinfo.vhd
index 4986495bc..4d8b0eff5 100644
--- a/rtl/core/neorv32_sysinfo.vhd
+++ b/rtl/core/neorv32_sysinfo.vhd
@@ -115,7 +115,7 @@ begin
sysinfo(2)(7) <= '0'; -- reserved
sysinfo(2)(8) <= '1' when xcache_en_c else '0'; -- external bus interface cache implemented?
sysinfo(2)(9) <= '0'; -- reserved
- sysinfo(2)(10) <= '0'; -- reservedented?
+ sysinfo(2)(10) <= '0'; -- reserved
sysinfo(2)(11) <= '1' when ocd_auth_en_c else '0'; -- on-chip debugger authentication implemented?
sysinfo(2)(12) <= '1' when int_imem_rom_c else '0'; -- processor-internal instruction memory implemented as pre-initialized ROM?
sysinfo(2)(13) <= '1' when IO_TWD_EN else '0'; -- two-wire device (TWD) implemented?
diff --git a/rtl/core/neorv32_top.vhd b/rtl/core/neorv32_top.vhd
index 4af56902d..6ee87077e 100644
--- a/rtl/core/neorv32_top.vhd
+++ b/rtl/core/neorv32_top.vhd
@@ -316,6 +316,10 @@ architecture neorv32_top_rtl of neorv32_top is
signal iodev_req : iodev_req_t;
signal iodev_rsp : iodev_rsp_t;
+ -- memory synchronization / ordering / coherence --
+ signal mem_sync, dcache_clean : std_ulogic_vector(num_cores_c-1 downto 0);
+ signal xcache_clean : std_ulogic;
+
-- IRQs --
type firq_enum_t is (
FIRQ_TWD, FIRQ_UART0_RX, FIRQ_UART0_TX, FIRQ_UART1_RX, FIRQ_UART1_TX, FIRQ_SPI, FIRQ_SDI, FIRQ_TWI,
@@ -542,9 +546,14 @@ begin
ibus_rsp_i => cpu_i_rsp(i),
-- data bus interface --
dbus_req_o => cpu_d_req(i),
- dbus_rsp_i => cpu_d_rsp(i)
+ dbus_rsp_i => cpu_d_rsp(i),
+ -- memory synchronization --
+ mem_sync_i => mem_sync(i)
);
+ -- memory synchronization (ordering / coherence) --
+ mem_sync(i) <= dcache_clean(i) and xcache_clean;
+
-- CPU L1 Instruction Cache (I-Cache) -----------------------------------------------------
-- -------------------------------------------------------------------------------------------
@@ -555,12 +564,12 @@ begin
NUM_BLOCKS => ICACHE_NUM_BLOCKS,
BLOCK_SIZE => ICACHE_BLOCK_SIZE,
UC_BEGIN => mem_uncached_begin_c(31 downto 28),
- UC_ENABLE => true,
READ_ONLY => true
)
port map (
clk_i => clk_i,
rstn_i => rstn_sys,
+ clean_o => open, -- cache is read-only so it cannot be dirty
host_req_i => cpu_i_req(i),
host_rsp_o => cpu_i_rsp(i),
bus_req_o => icache_req(i),
@@ -584,12 +593,12 @@ begin
NUM_BLOCKS => DCACHE_NUM_BLOCKS,
BLOCK_SIZE => DCACHE_BLOCK_SIZE,
UC_BEGIN => mem_uncached_begin_c(31 downto 28),
- UC_ENABLE => true,
READ_ONLY => false
)
port map (
clk_i => clk_i,
rstn_i => rstn_sys,
+ clean_o => dcache_clean(i),
host_req_i => cpu_d_req(i),
host_rsp_o => cpu_d_rsp(i),
bus_req_o => dcache_req(i),
@@ -599,8 +608,9 @@ begin
neorv32_dcache_disabled:
if not DCACHE_EN generate
- dcache_req(i) <= cpu_d_req(i);
- cpu_d_rsp(i) <= dcache_rsp(i);
+ dcache_clean(i) <= '1';
+ dcache_req(i) <= cpu_d_req(i);
+ cpu_d_rsp(i) <= dcache_rsp(i);
end generate;
@@ -613,15 +623,14 @@ begin
PORT_B_READ_ONLY => true -- instruction fetch is read-only
)
port map (
- clk_i => clk_i,
- rstn_i => rstn_sys,
- a_lock_i => '0', -- no exclusive accesses
- a_req_i => dcache_req(i), -- data accesses are prioritized
- a_rsp_o => dcache_rsp(i),
- b_req_i => icache_req(i),
- b_rsp_o => icache_rsp(i),
- x_req_o => core_req(i),
- x_rsp_i => core_rsp(i)
+ clk_i => clk_i,
+ rstn_i => rstn_sys,
+ a_req_i => dcache_req(i), -- data accesses are prioritized
+ a_rsp_o => dcache_rsp(i),
+ b_req_i => icache_req(i),
+ b_rsp_o => icache_rsp(i),
+ x_req_o => core_req(i),
+ x_rsp_i => core_rsp(i)
);
end generate; -- /core_complex
@@ -647,15 +656,14 @@ begin
PORT_B_READ_ONLY => false
)
port map (
- clk_i => clk_i,
- rstn_i => rstn_sys,
- a_lock_i => '0',
- a_req_i => core_req(core_req'left),
- a_rsp_o => core_rsp(core_rsp'left),
- b_req_i => core_req(core_req'right),
- b_rsp_o => core_rsp(core_rsp'right),
- x_req_o => sys1_req,
- x_rsp_i => sys1_rsp
+ clk_i => clk_i,
+ rstn_i => rstn_sys,
+ a_req_i => core_req(core_req'left),
+ a_rsp_o => core_rsp(core_rsp'left),
+ b_req_i => core_req(core_req'right),
+ b_rsp_o => core_rsp(core_rsp'right),
+ x_req_o => sys1_req,
+ x_rsp_i => sys1_rsp
);
end generate;
@@ -697,15 +705,14 @@ begin
PORT_B_READ_ONLY => false
)
port map (
- clk_i => clk_i,
- rstn_i => rstn_sys,
- a_lock_i => '0', -- no exclusive accesses
- a_req_i => sys1_req, -- prioritized
- a_rsp_o => sys1_rsp,
- b_req_i => dma_req,
- b_rsp_o => dma_rsp,
- x_req_o => sys2_req,
- x_rsp_i => sys2_rsp
+ clk_i => clk_i,
+ rstn_i => rstn_sys,
+ a_req_i => sys1_req, -- prioritized
+ a_rsp_o => sys1_rsp,
+ b_req_i => dma_req,
+ b_rsp_o => dma_rsp,
+ x_req_o => sys2_req,
+ x_rsp_i => sys2_rsp
);
end generate; -- /neorv32_dma_complex_enabled
@@ -876,12 +883,12 @@ begin
NUM_BLOCKS => XBUS_CACHE_NUM_BLOCKS,
BLOCK_SIZE => XBUS_CACHE_BLOCK_SIZE,
UC_BEGIN => mem_uncached_begin_c(31 downto 28),
- UC_ENABLE => true,
READ_ONLY => false
)
port map (
clk_i => clk_i,
rstn_i => rstn_sys,
+ clean_o => xcache_clean,
host_req_i => xbus_req,
host_rsp_o => xbus_rsp,
bus_req_o => xcache_req,
@@ -891,22 +898,25 @@ begin
neorv32_xcache_disabled:
if not XBUS_CACHE_EN generate
- xcache_req <= xbus_req;
- xbus_rsp <= xcache_rsp;
+ xcache_clean <= '1';
+ xcache_req <= xbus_req;
+ xbus_rsp <= xcache_rsp;
end generate;
end generate; -- /neorv32_xbus_enabled
neorv32_xbus_disabled:
if not XBUS_EN generate
- xbus_rsp <= rsp_terminate_c;
- xbus_adr_o <= (others => '0');
- xbus_dat_o <= (others => '0');
- xbus_tag_o <= (others => '0');
- xbus_we_o <= '0';
- xbus_sel_o <= (others => '0');
- xbus_stb_o <= '0';
- xbus_cyc_o <= '0';
+ xcache_clean <= '1';
+ xcache_req <= req_terminate_c;
+ xbus_rsp <= rsp_terminate_c;
+ xbus_adr_o <= (others => '0');
+ xbus_dat_o <= (others => '0');
+ xbus_tag_o <= (others => '0');
+ xbus_we_o <= '0';
+ xbus_sel_o <= (others => '0');
+ xbus_stb_o <= '0';
+ xbus_cyc_o <= '0';
end generate;
end generate; -- /memory_system
diff --git a/rtl/core/neorv32_wdt.vhd b/rtl/core/neorv32_wdt.vhd
index b4827bc66..8c1fb0b9d 100644
--- a/rtl/core/neorv32_wdt.vhd
+++ b/rtl/core/neorv32_wdt.vhd
@@ -3,7 +3,7 @@
-- -------------------------------------------------------------------------------- --
-- The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 --
-- Copyright (c) NEORV32 contributors. --
--- Copyright (c) 2020 - 2024 Stephan Nolting. All rights reserved. --
+-- Copyright (c) 2020 - 2025 Stephan Nolting. All rights reserved. --
-- Licensed under the BSD-3-Clause license, see LICENSE for details. --
-- SPDX-License-Identifier: BSD-3-Clause --
-- ================================================================================ --
@@ -37,12 +37,10 @@ architecture neorv32_wdt_rtl of neorv32_wdt is
-- Control register bits --
constant ctrl_enable_c : natural := 0; -- r/w: WDT enable
constant ctrl_lock_c : natural := 1; -- r/w: lock write access to control register when set
- constant ctrl_dben_c : natural := 2; -- r/w: allow WDT to continue operation even when CPU is in debug mode
- constant ctrl_sen_c : natural := 3; -- r/w: allow WDT to continue operation even when CPU is in sleep mode
- constant ctrl_strict_c : natural := 4; -- r/w: force hardware reset if reset password is incorrect or if access to locked config
- constant ctrl_rcause_lo_c : natural := 5; -- r/-: cause of last system reset - low
- constant ctrl_rcause_hi_c : natural := 6; -- r/-: cause of last system reset - high
---constant ctrl_reserved_c : natural := 7; -- r/-: reserved
+ constant ctrl_strict_c : natural := 2; -- r/w: force hardware reset if reset password is incorrect or on write attempt to locked config
+ constant ctrl_rcause_lo_c : natural := 3; -- r/-: cause of last system reset - low
+ constant ctrl_rcause_hi_c : natural := 4; -- r/-: cause of last system reset - high
+ --
constant ctrl_timeout_lsb_c : natural := 8; -- r/w: timeout value LSB
constant ctrl_timeout_msb_c : natural := 31; -- r/w: timeout value MSB
@@ -50,8 +48,6 @@ architecture neorv32_wdt_rtl of neorv32_wdt is
type ctrl_t is record
enable : std_ulogic;
lock : std_ulogic;
- dben : std_ulogic;
- sen : std_ulogic;
strict : std_ulogic;
timeout : std_ulogic_vector(23 downto 0);
end record;
@@ -61,7 +57,6 @@ architecture neorv32_wdt_rtl of neorv32_wdt is
signal cnt : std_ulogic_vector(23 downto 0); -- timeout counter
signal cnt_started : std_ulogic; -- set when timeout counter has started
signal cnt_inc : std_ulogic; -- increment counter when set
- signal cnt_inc_ff : std_ulogic;
signal cnt_timeout : std_ulogic; -- counter matches programmed timeout value
signal reset_cause : std_ulogic_vector(1 downto 0); -- cause of last reset
signal hw_rst_timeout : std_ulogic; -- trigger reset because of timeout
@@ -79,8 +74,6 @@ begin
bus_rsp_o <= rsp_terminate_c;
ctrl.enable <= '0'; -- disable WDT after reset
ctrl.lock <= '0'; -- unlock after reset
- ctrl.dben <= '0';
- ctrl.sen <= '0';
ctrl.strict <= '0';
ctrl.timeout <= (others => '0');
reset_wdt <= '0';
@@ -100,8 +93,6 @@ begin
if (ctrl.lock = '0') then -- update configuration only if not locked
ctrl.enable <= bus_req_i.data(ctrl_enable_c);
ctrl.lock <= bus_req_i.data(ctrl_lock_c) and ctrl.enable; -- lock only if already enabled
- ctrl.dben <= bus_req_i.data(ctrl_dben_c);
- ctrl.sen <= bus_req_i.data(ctrl_sen_c);
ctrl.strict <= bus_req_i.data(ctrl_strict_c);
ctrl.timeout <= bus_req_i.data(ctrl_timeout_msb_c downto ctrl_timeout_lsb_c);
else -- write access attempt to locked CTRL register
@@ -117,8 +108,6 @@ begin
else -- read access
bus_rsp_o.data(ctrl_enable_c) <= ctrl.enable;
bus_rsp_o.data(ctrl_lock_c) <= ctrl.lock;
- bus_rsp_o.data(ctrl_dben_c) <= ctrl.dben;
- bus_rsp_o.data(ctrl_sen_c) <= ctrl.sen;
bus_rsp_o.data(ctrl_rcause_hi_c downto ctrl_rcause_lo_c) <= reset_cause;
bus_rsp_o.data(ctrl_strict_c) <= ctrl.strict;
bus_rsp_o.data(ctrl_timeout_msb_c downto ctrl_timeout_lsb_c) <= ctrl.timeout;
@@ -133,15 +122,15 @@ begin
wdt_counter: process(rstn_sys_i, clk_i)
begin
if (rstn_sys_i = '0') then
- cnt_inc_ff <= '0';
+ cnt_inc <= '0';
cnt_started <= '0';
cnt <= (others => '0');
elsif rising_edge(clk_i) then
- cnt_inc_ff <= cnt_inc;
+ cnt_inc <= prsc_tick and cnt_started; -- clock tick and started
cnt_started <= ctrl.enable and (cnt_started or prsc_tick); -- start with next clock tick
if (ctrl.enable = '0') or (reset_wdt = '1') then -- watchdog disabled or reset with correct password
cnt <= (others => '0');
- elsif (cnt_inc_ff = '1') then
+ elsif (cnt_inc = '1') then
cnt <= std_ulogic_vector(unsigned(cnt) + 1);
end if;
end if;
@@ -151,11 +140,6 @@ begin
clkgen_en_o <= ctrl.enable; -- enable clock generator
prsc_tick <= clkgen_i(clk_div4096_c); -- clock enable tick
- -- valid counter increment? --
- cnt_inc <= '1' when ((prsc_tick = '1') and (cnt_started = '1')) and -- clock tick and started
- ((bus_req_i.debug = '0') or (ctrl.dben = '1')) and -- not in debug mode or allowed to run in debug mode
- ((bus_req_i.sleep = '0') or (ctrl.sen = '1')) else '0'; -- not in sleep mode or allowed to run in sleep mode
-
-- timeout detector --
cnt_timeout <= '1' when (cnt_started = '1') and (cnt = ctrl.timeout) else '0';
diff --git a/rtl/core/neorv32_xbus.vhd b/rtl/core/neorv32_xbus.vhd
index 0afd83d2b..cc65f37eb 100644
--- a/rtl/core/neorv32_xbus.vhd
+++ b/rtl/core/neorv32_xbus.vhd
@@ -150,7 +150,11 @@ begin
xbus_sel_o <= bus_req.ben;
xbus_stb_o <= bus_req.stb;
xbus_cyc_o <= bus_req.stb or pending(1);
- xbus_tag_o <= bus_req.src & '0' & bus_req.priv; -- instr/data, secure, privileged/unprivileged
+
+ -- access metadata (compatible with AXI4 "xPROT") --
+ xbus_tag_o(2) <= bus_req.src; -- 0 = data access, 1 = instruction fetch
+ xbus_tag_o(1) <= '0'; -- always "secure" access
+ xbus_tag_o(0) <= bus_req.priv or bus_req.debug; -- 0 = unprivileged access, 1 = privileged access
-- response gating --
bus_rsp.data <= xbus_dat_i when (pending(1) = '1') else (others => '0');
diff --git a/rtl/file_list_soc.f b/rtl/file_list_soc.f
index b66bda255..88c4ae4ec 100644
--- a/rtl/file_list_soc.f
+++ b/rtl/file_list_soc.f
@@ -17,8 +17,8 @@
NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_cpu_pmp.vhd
NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_cpu_icc.vhd
NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_cpu.vhd
-NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_bus.vhd
NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_cache.vhd
+NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_bus.vhd
NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_dma.vhd
NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_application_image.vhd
NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_imem.vhd
diff --git a/sw/example/demo_wdt/main.c b/sw/example/demo_wdt/main.c
index 95d7de38f..eaf617851 100644
--- a/sw/example/demo_wdt/main.c
+++ b/sw/example/demo_wdt/main.c
@@ -1,7 +1,7 @@
// ================================================================================ //
// The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 //
// Copyright (c) NEORV32 contributors. //
-// Copyright (c) 2020 - 2024 Stephan Nolting. All rights reserved. //
+// Copyright (c) 2020 - 2025 Stephan Nolting. All rights reserved. //
// Licensed under the BSD-3-Clause license, see LICENSE for details. //
// SPDX-License-Identifier: BSD-3-Clause //
// ================================================================================ //
@@ -82,9 +82,9 @@ int main() {
return -1;
}
- // setup watchdog: no lock, disable in debug mode, enable in sleep mode, enable strict mode
+ // setup watchdog: no lock, enable strict mode
neorv32_uart0_puts("Starting WDT...\n");
- neorv32_wdt_setup(timeout, 0, 0, 1, 1);
+ neorv32_wdt_setup(timeout, 0, 1);
// feed the watchdog
diff --git a/sw/example/processor_check/main.c b/sw/example/processor_check/main.c
index ee47ba02f..78c01197e 100644
--- a/sw/example/processor_check/main.c
+++ b/sw/example/processor_check/main.c
@@ -80,8 +80,6 @@ volatile uint32_t num_hpm_cnts_global = 0; // global number of available hpms
volatile int vectored_mei_handler_ack = 0; // vectored mei trap handler acknowledge
volatile uint32_t gpio_trap_handler_ack = 0; // gpio trap handler acknowledge
volatile uint32_t hw_brk_mscratch_ok = 0; // set when mepc was correct in trap handler
-
-
volatile uint32_t dma_src; // dma source & destination data
volatile uint32_t store_access_addr[2]; // variable to test store accesses
volatile uint32_t __attribute__((aligned(4))) pmp_access[2]; // variable to test pmp
@@ -281,6 +279,32 @@ int main() {
}
+ // ----------------------------------------------------------
+ // Test fence instructions
+ // ----------------------------------------------------------
+ neorv32_cpu_csr_write(CSR_MCAUSE, mcause_never_c);
+ PRINT_STANDARD("[%i] Fences ", cnt_test);
+
+ cnt_test++;
+
+ // test that we do not crash the core and check if cache flushing works
+ store_access_addr[0] = 0x01234567;
+ asm volatile ("fence");
+ asm volatile ("fence.i");
+ store_access_addr[0] += 0x76543210;
+ asm volatile ("fence");
+ asm volatile ("fence.i");
+ store_access_addr[0] += 0x11111111;
+
+ if ((store_access_addr[0] == 0x88888888) &&
+ (neorv32_cpu_csr_read(CSR_MCAUSE) == mcause_never_c)) { // no exception
+ test_ok();
+ }
+ else {
+ test_fail();
+ }
+
+
// ----------------------------------------------------------
// Test standard RISC-V counters
// ----------------------------------------------------------
diff --git a/sw/lib/include/neorv32_wdt.h b/sw/lib/include/neorv32_wdt.h
index a7fab5c15..1ca449597 100644
--- a/sw/lib/include/neorv32_wdt.h
+++ b/sw/lib/include/neorv32_wdt.h
@@ -1,7 +1,7 @@
// ================================================================================ //
// The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 //
// Copyright (c) NEORV32 contributors. //
-// Copyright (c) 2020 - 2024 Stephan Nolting. All rights reserved. //
+// Copyright (c) 2020 - 2025 Stephan Nolting. All rights reserved. //
// Licensed under the BSD-3-Clause license, see LICENSE for details. //
// SPDX-License-Identifier: BSD-3-Clause //
// ================================================================================ //
@@ -9,10 +9,6 @@
/**
* @file neorv32_wdt.h
* @brief Watchdog Timer (WDT) HW driver header file.
- *
- * @note These functions should only be used if the WDT unit was synthesized (IO_WDT_EN = true).
- *
- * @see https://stnolting.github.io/neorv32/sw/files.html
*/
#ifndef neorv32_wdt_h
@@ -38,11 +34,9 @@ typedef volatile struct __attribute__((packed,aligned(4))) {
enum NEORV32_WDT_CTRL_enum {
WDT_CTRL_EN = 0, /**< WDT control register(0) (r/w): Watchdog enable */
WDT_CTRL_LOCK = 1, /**< WDT control register(1) (r/w): Lock write access to control register, clears on reset only */
- WDT_CTRL_DBEN = 2, /**< WDT control register(2) (r/w): Allow WDT to continue operation even when CPU is in debug mode */
- WDT_CTRL_SEN = 3, /**< WDT control register(3) (r/w): Allow WDT to continue operation even when CPU is in sleep mode */
- WDT_CTRL_STRICT = 4, /**< WDT control register(4) (r/w): Force hardware reset if reset password is incorrect or if write attempt to locked CTRL register */
- WDT_CTRL_RCAUSE_LO = 5, /**< WDT control register(5) (r/-): Cause of last system reset - low */
- WDT_CTRL_RCAUSE_HI = 6, /**< WDT control register(5) (r/-): Cause of last system reset - high */
+ WDT_CTRL_STRICT = 2, /**< WDT control register(2) (r/w): Force hardware reset if reset password is incorrect or if write attempt to locked CTRL register */
+ WDT_CTRL_RCAUSE_LO = 3, /**< WDT control register(3) (r/-): Cause of last system reset - low */
+ WDT_CTRL_RCAUSE_HI = 4, /**< WDT control register(4) (r/-): Cause of last system reset - high */
WDT_CTRL_TIMEOUT_LSB = 8, /**< WDT control register(8) (r/w): Timeout value, LSB */
WDT_CTRL_TIMEOUT_MSB = 31 /**< WDT control register(31) (r/w): Timeout value, MSB */
@@ -72,7 +66,7 @@ enum NEORV32_WDT_RCAUSE_enum {
**************************************************************************/
/**@{*/
int neorv32_wdt_available(void);
-void neorv32_wdt_setup(uint32_t timeout, int lock, int debug_en, int sleep_en, int strict);
+void neorv32_wdt_setup(uint32_t timeout, int lock, int strict);
int neorv32_wdt_disable(void);
void neorv32_wdt_feed(uint32_t password);
int neorv32_wdt_get_cause(void);
diff --git a/sw/lib/source/neorv32_wdt.c b/sw/lib/source/neorv32_wdt.c
index 58cacee71..af807dbb9 100644
--- a/sw/lib/source/neorv32_wdt.c
+++ b/sw/lib/source/neorv32_wdt.c
@@ -38,11 +38,9 @@ int neorv32_wdt_available(void) {
* @param[in] timeout 24-bit timeout value. A WDT IRQ is triggered when the internal counter reaches
* 'timeout/2'. A system hardware reset is triggered when the internal counter reaches 'timeout'.
* @param[in] lock Control register will be locked when 1 (until next reset).
- * @param[in] debug_en Allow watchdog to continue operation even when CPU is in debug mode.
- * @param[in] sleep_en Allow watchdog to continue operation even when CPU is in sleep mode.
* @param[in] strict Force hardware reset if reset password is incorrect or if trying to alter a locked configuration.
**************************************************************************/
-void neorv32_wdt_setup(uint32_t timeout, int lock, int debug_en, int sleep_en, int strict) {
+void neorv32_wdt_setup(uint32_t timeout, int lock, int strict) {
NEORV32_WDT->CTRL = 0; // reset and disable
@@ -50,8 +48,6 @@ void neorv32_wdt_setup(uint32_t timeout, int lock, int debug_en, int sleep_en, i
uint32_t ctrl = 0;
ctrl |= ((uint32_t)(1)) << WDT_CTRL_EN;
ctrl |= ((uint32_t)(timeout & 0xffffffU)) << WDT_CTRL_TIMEOUT_LSB;
- ctrl |= ((uint32_t)(debug_en & 0x1U)) << WDT_CTRL_DBEN;
- ctrl |= ((uint32_t)(sleep_en & 0x1U)) << WDT_CTRL_SEN;
ctrl |= ((uint32_t)(strict & 0x1U)) << WDT_CTRL_STRICT;
NEORV32_WDT->CTRL = ctrl;
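Reviewer note: the control-word layout after this change can be checked off-target with a small sketch like the one below. It mirrors only the lines visible in this hunk (enable, timeout, strict); lock handling is not shown in the hunk and is left out. The helper names `wdt_ctrl_word` and `wdt_ctrl_rcause` are hypothetical and not part of the NEORV32 library; the bit positions are copied from the updated `NEORV32_WDT_CTRL_enum`.

```c
#include <stdint.h>

// Bit positions as listed in the updated NEORV32_WDT_CTRL_enum (this PR).
enum {
  WDT_CTRL_EN          = 0, // watchdog enable
  WDT_CTRL_LOCK        = 1, // lock control register
  WDT_CTRL_STRICT      = 2, // strict mode (was bit 4 before this change)
  WDT_CTRL_RCAUSE_LO   = 3, // reset cause, low bit (was bit 5)
  WDT_CTRL_RCAUSE_HI   = 4, // reset cause, high bit (was bit 6)
  WDT_CTRL_TIMEOUT_LSB = 8  // 24-bit timeout value starts here
};

// Hypothetical helper: compose the control word the same way the visible
// part of neorv32_wdt_setup() does, without touching any hardware register.
static uint32_t wdt_ctrl_word(uint32_t timeout, int strict) {
  uint32_t ctrl = 0;
  ctrl |= ((uint32_t)(1)) << WDT_CTRL_EN;
  ctrl |= ((uint32_t)(timeout & 0xffffffU)) << WDT_CTRL_TIMEOUT_LSB;
  ctrl |= ((uint32_t)(strict & 0x1U)) << WDT_CTRL_STRICT;
  return ctrl;
}

// Hypothetical helper: extract the 2-bit reset cause from a control word.
static uint32_t wdt_ctrl_rcause(uint32_t ctrl) {
  return (ctrl >> WDT_CTRL_RCAUSE_LO) & 0x3U;
}
```

Software that hard-coded the old bit positions (strict at bit 4, reset cause at bits 6:5) would presumably need to be rebuilt against the updated header; callers of `neorv32_wdt_setup` also have to drop the two removed arguments.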
diff --git a/sw/svd/neorv32.svd b/sw/svd/neorv32.svd
index febed041b..2d60918a4 100644
--- a/sw/svd/neorv32.svd
+++ b/sw/svd/neorv32.svd
@@ -1404,24 +1404,14 @@
[1:1]
Lock write access to control register, clears on reset (HW or WDT) only
-
- WDT_CTRL_DBEN
- [2:2]
- Allow WDT to continue operation even when in debug mode
-
-
- WDT_CTRL_SEN
- [3:3]
- Allow WDT to continue operation even when in sleep mode
-
WDT_CTRL_STRICT
- [4:4]
+ [2:2]
Force hardware reset if reset password is incorrect or if write attempt to locked CTRL register
WDT_CTRL_RCAUSE
- [6:5]
+ [4:3]
read-only
Cause of last system reset: 0=external reset, 1=OCD reset, 2=WDT reset, 3=WDT access violation