diff --git a/CHANGELOG.md b/CHANGELOG.md index 872896d94..f3fb1ce38 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12 | Date | Version | Comment | Ticket | |:----:|:-------:|:--------|:------:| +| 03.02.2025 | 1.11.0.8 | :sparkles: add explicit memory ordering/coherence support; :warning: remove WDT "halt-on-debug" and "halt-on-sleep" options; :bug: rework cache module fixing several (minor?) design flaws | [#1176](https://github.com/stnolting/neorv32/pull/1176) | | 03.02.2025 | 1.11.0.7 | :bug: add missing CFS clock gen enable signal | [#1177](https://github.com/stnolting/neorv32/pull/1177) | | 01.02.2025 | 1.11.0.6 | :warning: remove XIP module | [#1175](https://github.com/stnolting/neorv32/pull/1175) | | 01.02.2025 | 1.11.0.5 | minor rtl optimizations and cleanups; :warning: remove DMA "fence" feature | [#1174](https://github.com/stnolting/neorv32/pull/1174) | diff --git a/docs/datasheet/cpu.adoc b/docs/datasheet/cpu.adoc index 3fe4529f5..213afc025 100644 --- a/docs/datasheet/cpu.adoc +++ b/docs/datasheet/cpu.adoc @@ -1,3 +1,4 @@ +<<< :sectnums: == NEORV32 Central Processing Unit (CPU) @@ -66,7 +67,7 @@ direction as seen from the CPU. [options="header", grid="rows"] |======================= | Signal | Width/Type | Dir | Description -4+^| **Global Signals** +4+^| **Clock and reset** | `clk_i` | 1 | in | Global clock line, all registers triggering on rising edge. | `rstn_i` | 1 | in | Global reset, low-active. 4+^| **Interrupts (<<_traps_exceptions_and_interrupts>>)** @@ -75,20 +76,17 @@ direction as seen from the CPU. | `mti_i` | 1 | in | RISC-V machine timer interrupt. | `firq_i` | 16 | in | Custom fast interrupt request signals. | `dbi_i` | 1 | in | Request CPU to halt and enter debug mode (RISC-V <<_on_chip_debugger_ocd>>). 
+4+^| **<<_inter_core_communication_icc>> links** +| `icc_tx_o` | `icc_t` | out | TX link +| `icc_rx_i` | `icc_t` | in | RX link 4+^| **Instruction <<_bus_interface>>** | `ibus_req_o` | `bus_req_t` | out | Instruction fetch bus request. | `ibus_rsp_i` | `bus_rsp_t` | in | Instruction fetch bus response. 4+^| **Data <<_bus_interface>>** | `dbus_req_o` | `bus_req_t` | out | Data access (load/store) bus request. | `dbus_rsp_i` | `bus_rsp_t` | in | Data access (load/store) bus response. -4+^| **<<_inter_core_communication_icc>> TX links** -| `icc_tx_rdy_o` | 2 | out | Data available for cores `0..1`. -| `icc_tx_ack_i` | 2 | in | Read-enable from cores `0..1`. -| `icc_tx_dat_o` | 2*32 | out | Data for cores `0..1`. -4+^| **<<_inter_core_communication_icc>> RX links** -| `icc_rx_rdy_i` | 2 | in | Data available from cores `0..1`. -| `icc_rx_ack_o` | 2 | out | Read-enable for cores `0..1`. -| `icc_rx_dat_i` | 2*32 | in | Data from cores `0..1`. +4+^| **<<_memory_coherence>> status** +| `mem_sync_i` | 1 | in | Requested coherence established when set (single-shot) |======================= .Bus Interface Protocol @@ -424,12 +422,11 @@ always valid when set. 
| `rw` | 1 | Access direction (`0` = read, `1` = write) | `src` | 1 | Access source (`0` = instruction fetch, `1` = load/store) | `priv` | 1 | Set if privileged (M-mode) access +| `debug` | 1 | Set if debug mode access | `amo` | 1 | Set if current access is an atomic memory operation (<<_atomic_memory_access>>) | `amoop` | 4 | Type of atomic memory operation (<<_atomic_memory_access>>) 3+^| **Out-Of-Band Signals** -| `fence` | 1 | Data/instruction fence request; single-shot -| `sleep` | 1 | Set if ALL upstream devices are in <<_sleep_mode>> -| `debug` | 1 | Set if the upstream device is in debug-mode +| `fence` | 1 | Data (load/store; `fence`) or instruction (instruction-fetch; `fence.i`) fence request; single-shot; see <<_memory_coherence>> |======================= .Bus Interface - Response Bus (`bus_rsp_t`) @@ -463,7 +460,7 @@ The figure below shows three exemplary bus accesses: . A write access to address `B_addr` writing `wdata` (fastest response; `ACK` arrives right in the next cycle). . A failing read access to address `C_addr` (slow response; `ERR` arrives after several cycles). -.Three Exemplary Bus Transactions (showing only in-band signals) +.Three Exemplary Bus Transactions (showing only in-band signals; privileged non-debug non-atomic accesses) image::bus_interface.png[700] .Adding Register Stages @@ -501,8 +498,8 @@ operation: .Cache Coherency [IMPORTANT] -Atomic operations **always bypass** the CPU caches using direct/uncached accesses. Care must be taken -to maintain data <<_cache_coherency>>. +Atomic operations **always bypass** the (CPU) caches using direct/uncached accesses. Care must be taken +to maintain data synchronization. See section <<_memory_coherence>> for more information. <<< @@ -632,7 +629,7 @@ The `I` ISA extensions is the base RISC-V integer ISA that is always enabled. 
| Jump/call | `jal[r]` | 6 | Load/store | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 5 | System | `ecall` `ebreak` | 3 -| Data fence | `fence` | 5 +| Data fence | `fence` | depends on the memory system | System | `wfi` | 3 | System | `mret` | 5 | Illegal inst. | - | 3 @@ -641,10 +638,10 @@ The `I` ISA extensions is the base RISC-V integer ISA that is always enabled. .`fence` Instruction [NOTE] Analogous to the `fence.i` instruction (<<_zifencei_isa_extension>>) the `fence` instruction triggers -a data cache synchronization operation. See section <<_cache_coherency>> for more information. -Furthermore, the `fence` instruction word's _predecessor_ and _successor_ bits (used for memory ordering) -are not evaluated by the hardware at all. - +a load/store memory synchronization operation. The CPU will stall until the requested coherence is +established (`mem_sync_i` goes high). See section <<_memory_coherence>> for more information. +NEORV32 ignores the predecessor and successor fields and always executes a conservative fence on all +operations. .`wfi` Instruction [NOTE] @@ -716,16 +713,16 @@ The instruction word's `aq` and `lr` memory ordering bits are not evaluated by t ==== `Zifencei` ISA Extension The `Zifencei` CPU extension allows manual synchronization of the instruction stream. This extension is always enabled. - -Analogous to the `fence` instruction the `fence.i` instruction triggers an instruction cache synchronization operation. -See section <<_cache_coherency>> for more information. +This instruction is the only standard mechanism to ensure that stores visible to a hart will also be visible to its +instruction fetches. The CPU will stall until the requested coherence is established (`mem_sync_i` goes high). +See section <<_memory_coherence>> for more information. 
.Instructions and Timing [cols="<2,<4,<3"] [options="header", grid="rows"] |======================= | Class | Instructions | Execution cycles -| Instruction fence | `fence.i` | 5 +| Instruction fence | `fence.i` | depends on the memory system |======================= diff --git a/docs/datasheet/on_chip_debugger.adoc b/docs/datasheet/on_chip_debugger.adoc index bd2def5e1..05aec8d7c 100644 --- a/docs/datasheet/on_chip_debugger.adoc +++ b/docs/datasheet/on_chip_debugger.adoc @@ -667,7 +667,7 @@ Debug-mode is entered on any of the following events: . A hardware trigger from the <<_trigger_module>> fires (`exe` and `action` in <<_tdata1>> / `mcontrol` are set). [NOTE] -From a hardware point of view these debug-mode-entry conditions are special traps (synchronous exceptions or +From a hardware point of view these debug-mode-entry conditions are normal traps (synchronous exceptions or asynchronous interrupts) that are handled transparently by the control logic. **Whenever the CPU enters debug-mode it performs the following operations:** @@ -684,6 +684,8 @@ asynchronous interrupts) that are handled transparently by the control logic. 
**When the CPU is in debug-mode:** * while in debug mode, the CPU executes the parking loop and - if requested by the DM - the program buffer +* all **caches are bypassed** when in debug-mode; hence, a <<_memory_coherence>> has to be re-established when entering debug-mode +and when leaving debug-mode * effective CPU privilege level is `machine` mode; any active physical memory protection (PMP) configuration is bypassed * the `wfi` instruction acts as a `nop` (also during single-stepping) * if an exception occurs while being in debug mode: diff --git a/docs/datasheet/overview.adoc b/docs/datasheet/overview.adoc index ace607d80..b292055bf 100644 --- a/docs/datasheet/overview.adoc +++ b/docs/datasheet/overview.adoc @@ -1,3 +1,4 @@ +<<< :sectnums: == Overview diff --git a/docs/datasheet/rationale.adoc b/docs/datasheet/rationale.adoc index d98dc790b..2560c1adb 100644 --- a/docs/datasheet/rationale.adoc +++ b/docs/datasheet/rationale.adoc @@ -1,3 +1,4 @@ +<<< :sectnums: === Rationale diff --git a/docs/datasheet/soc.adoc b/docs/datasheet/soc.adoc index 30808c046..dbe2a0e6a 100644 --- a/docs/datasheet/soc.adoc +++ b/docs/datasheet/soc.adoc @@ -1,5 +1,4 @@ - -// #################################################################################################################### +<<< :sectnums: == NEORV32 Processor (SoC) @@ -595,7 +594,7 @@ content of the addresses memory cell) is sent back to the requesting CPU. .Direct Access [IMPORTANT] Atomic operations **always bypass** the CPU's <<_processor_internal_data_cache_dcache, data cache>> -using direct/uncached accesses. Care must be taken to maintain data <<_cache_coherency>>. +using direct/uncached accesses. Care must be taken to maintain data <<_memory_coherence>>. .Physical Memory Attributes [NOTE] @@ -610,43 +609,50 @@ cannot be interrupted. Hence, they execute in an atomic way. 
:sectnums: -==== Cache Coherency +==== Memory Coherence -In total the NEORV32 Processor provides up to three optional caches organized in two levels. Level-1 -caches are closer to the CPU while level-2 caches are closer to main memory (however, this highly depends -on the the actual cache configurations). +Depending on the configuration, the NEORV32 processor provides several _layers_ of memory consisting +of caches, buffers and storage. +* The CPU instruction prefetch buffer ("level-0") * The <<_processor_internal_data_cache_dcache>> (level-1) * The <<_processor_internal_instruction_cache_icache>> (level-1) * The cache of the <<_processor_external_bus_interface_xbus>> (level-2) +* Internal and external memories -As all caches operate transparently for the software, special attention must therefore be paid to coherence. -Note that coherence and cache _synchronization_ is **not** performed by the hardware itself (there is no -snooping implemented). +All caches and buffers operate transparently to the software. Hence, special attention must be +paid to maintaining coherence. Note that coherence and cache _synchronization_ are **not** automatically performed +by the hardware itself as there is no snooping implemented. -The NEORV32 uses two instructions for manual cache synchronization (both instructions are always available -regardless of the actual CPU/ISA configuration): +NEORV32 uses two instructions for manual memory synchronization, which are always available +regardless of the actual CPU/ISA configuration: * `fence` (<<_i_isa_extension>> / <<_e_isa_extension>>) * `fence.i` (<<_zifencei_isa_extension>>) -By executing the "data" `fence` instruction the CPU's data cache is synchronized in four steps: +By executing the "data" `fence` instruction the CPU's load/store operations are ordered +and synchronized across the entire system: [start=1] -.
The CPU data cache is flushed: all local modifications are copied to the next higher memory level; -this can be the XBUS cache or main memory. -. The CPU data cache is cleared invalidating all local entries. -. The synchronization request is sent to the next-higher memory level (for example to the XBUS cache -so it can perform the same synchronization steps). -. The CPU data cache is reloaded with up-to-date data from the next higher memory level. +. The CPU data cache (if enabled) is flushed: all local modifications are copied to +the next higher memory level (for example the internal DMEM or the XBUS cache). +. The CPU data cache is cleared (all local entries are invalidated) so the next load/store access will cause a cache miss +that fetches up-to-date data from the memory system. +. The synchronization request is forwarded to the next-higher memory level. If the XBUS cache is implemented +it will also be flushed and invalidated. -By executing the "instruction" `fence.i` instruction the CPU's instruction cache is synchronized in three steps: +By executing the "instruction" `fence.i` instruction the CPU's instruction fetches are ordered +and synchronized across the entire system: [start=1] -. The synchronization request is sent to the next-higher memory level (for example to the XBUS cache -so it can perform the same synchronization steps). -. The CPU instruction cache is cleared invalidating all local entries. -. The CPU instruction cache is reloaded with up-to-date data from the next higher memory level. +. All steps of the `fence` instruction are performed. +. The CPU instruction cache is cleared (all local entries are invalidated) so the next instruction fetch +will cause a cache miss that fetches up-to-date data from the memory system. + +.CPU Stall While Synchronizing +[IMPORTANT] +Executing either fence instruction stalls the CPU until all requested ordering/synchronization +steps are completed.
<<< diff --git a/docs/datasheet/soc_dcache.adoc b/docs/datasheet/soc_dcache.adoc index 8d76c92bc..163fec950 100644 --- a/docs/datasheet/soc_dcache.adoc +++ b/docs/datasheet/soc_dcache.adoc @@ -1,4 +1,5 @@ <<< +<<< :sectnums: ==== Processor-Internal Data Cache (dCACHE) @@ -6,11 +7,11 @@ [grid="none"] |======================= | Hardware source files: | neorv32_cache.vhd | Generic cache module -| Software driver files: | none | _implicitly used_ +| Software driver files: | none | | Top entity ports: | none | | Configuration generics: | `DCACHE_EN` | implement processor-internal data cache when `true` -| | `DCACHE_NUM_BLOCKS` | number of cache blocks (pages/lines) -| | `DCACHE_BLOCK_SIZE` | size of a cache block in bytes +| | `DCACHE_NUM_BLOCKS` | number of cache blocks (pages or lines); has to be a power of two +| | `DCACHE_BLOCK_SIZE` | size of a cache block in bytes; has to be a power of two | CPU interrupts: | none | |======================= @@ -21,24 +22,17 @@ The processor features an optional data cache to improve performance when using access latency. The cache is connected directly to the CPU's data access interface and provides full-transparent accesses. The cache is direct-mapped and uses "write-allocate" and "write-back" strategies. -.Cached/Uncached Accesses +.Uncached Accesses [NOTE] The data cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO (like the processor-internal IO/peripheral modules). All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF` -will not be cached at all (see section <<_address_space>>). Direct/uncached accesses have **lower** priority than -cache block operations to allow continuous burst transfer and also to maintain logical instruction forward -progress / data coherency. Furthermore, the atomic memory operations of the <<_zaamo_isa_extension>> will -always **bypass** the cache. 
- -.Caching Internal Memories -[NOTE] -The data cache is intended to accelerate data access to **processor-external** memories. -The CPU cache(s) should not be implemented when using only processor-internal data and instruction memories. +will not be cached at all (see section <<_address_space>>). Furthermore, the atomic memory operations +of the <<_zaamo_isa_extension>> will always **bypass** the cache. -.Manual Cache Flush/Clear/Reload +.Manual Cache Flush/Clear/Reload and Memory Coherence [NOTE] By executing the `fence` instruction the data cache is flushed, cleared and reloaded. -See section <<_cache_coherency>> for more information. +See section <<_memory_coherence>> for more information. .Retrieve Cache Configuration from Software [TIP] @@ -46,8 +40,6 @@ Software can retrieve the cache configuration/layout from the <<_sysinfo_cache_c .Bus Access Fault Handling [NOTE] -The cache always loads a complete cache block (aligned to the block size) every time a -cache miss is detected. Each cached word from this block provides a single status bit that indicates if the -according bus access was successful or caused a bus error. Hence, the whole cache block remains valid even -if certain addresses inside caused a bus error. If the CPU accesses any of the faulty cache words, a -data bus error exception is raised. +If the cache encounters a bus error when uploading a modified block to the next memory level or when +downloading a new block from the next memory level, the entire block is invalidated and a bus access +error exception is raised. 
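As a rough illustration of the direct-mapped, write-allocate, write-back behavior described above, and of what the flush/clear steps of a `fence` achieve, the C sketch below models a tiny data cache in front of a word-addressed memory array. It is a behavioral model only and is not derived from `neorv32_cache.vhd`; the block count, block size, and all names (`cache_t`, `cache_load`, `cache_store`, `cache_fence`) are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: 4 blocks of 4 words (16 bytes) over 64 words of memory. */
#define NUM_BLOCKS  4u  /* power of two, like DCACHE_NUM_BLOCKS */
#define BLOCK_WORDS 4u  /* words per block */
#define MEM_WORDS   64u

typedef struct {
  int      valid;
  int      dirty;
  uint32_t tag;
  uint32_t data[BLOCK_WORDS];
} block_t;

typedef struct {
  block_t   blk[NUM_BLOCKS];
  uint32_t *mem; /* next-higher memory level */
} cache_t;

/* Write-back: copy a modified block to memory and mark it clean. */
static void writeback(cache_t *c, uint32_t idx) {
  block_t *b = &c->blk[idx];
  if (b->valid && b->dirty) {
    uint32_t base = (b->tag * NUM_BLOCKS + idx) * BLOCK_WORDS;
    memcpy(&c->mem[base], b->data, sizeof(b->data));
    b->dirty = 0;
  }
}

/* Direct-mapped lookup; on a miss the old block is evicted and the new one fetched. */
static block_t *lookup(cache_t *c, uint32_t waddr) {
  uint32_t blk_addr = waddr / BLOCK_WORDS;
  uint32_t idx      = blk_addr % NUM_BLOCKS;
  uint32_t tag      = blk_addr / NUM_BLOCKS;
  block_t *b = &c->blk[idx];
  if (!b->valid || b->tag != tag) {
    writeback(c, idx);
    memcpy(b->data, &c->mem[blk_addr * BLOCK_WORDS], sizeof(b->data));
    b->valid = 1;
    b->dirty = 0;
    b->tag   = tag;
  }
  return b;
}

uint32_t cache_load(cache_t *c, uint32_t waddr) {
  return lookup(c, waddr)->data[waddr % BLOCK_WORDS];
}

/* Write-allocate: a write miss fetches the block; memory is updated only later. */
void cache_store(cache_t *c, uint32_t waddr, uint32_t v) {
  block_t *b = lookup(c, waddr);
  b->data[waddr % BLOCK_WORDS] = v;
  b->dirty = 1;
}

/* Data fence model: flush every dirty block, then invalidate all blocks. */
void cache_fence(cache_t *c) {
  for (uint32_t i = 0; i < NUM_BLOCKS; i++) {
    writeback(c, i);
    c->blk[i].valid = 0;
  }
}
```

In this model a store only updates the cached copy, so another bus host (for example a DMA or a second core) reading memory directly keeps seeing the old value until the block is evicted or a fence writes it back. After the fence, the next load misses and picks up any change made by other hosts in the meantime.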
diff --git a/docs/datasheet/soc_icache.adoc b/docs/datasheet/soc_icache.adoc index 8f77eb8e3..765d10e01 100644 --- a/docs/datasheet/soc_icache.adoc +++ b/docs/datasheet/soc_icache.adoc @@ -1,4 +1,5 @@ <<< +<<< :sectnums: ==== Processor-Internal Instruction Cache (iCACHE) @@ -6,11 +7,11 @@ [grid="none"] |======================= | Hardware source files: | neorv32_cache.vhd | Generic cache module -| Software driver files: | none | _implicitly used_ +| Software driver files: | none | | Top entity ports: | none | | Configuration generics: | `ICACHE_EN` | implement processor-internal instruction cache when `true` -| | `ICACHE_NUM_BLOCKS` | number of cache blocks (pages/lines) -| | `ICACHE_BLOCK_SIZE` | size of a cache block in bytes +| | `ICACHE_NUM_BLOCKS` | number of cache blocks (pages or lines); has to be a power of two +| | `ICACHE_BLOCK_SIZE` | size of a cache block in bytes; has to be a power of two | CPU interrupts: | none | |======================= @@ -21,24 +22,17 @@ The processor features an optional instruction cache to improve performance when access latency. The cache is connected directly to the CPU's instruction fetch interface and provides full-transparent accesses. The cache is direct-mapped and read-only. -.Cached/Uncached Accesses +.Uncached Accesses [NOTE] The data cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO (like the processor-internal IO/peripheral modules). All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF` -will not be cached at all (see section <<_address_space>>). Direct/uncached accesses have **lower** priority than -cache block operations to allow continuous burst transfer and also to maintain logical instruction forward -progress / data coherency. Furthermore, the atomic memory operations of the <<_zaamo_isa_extension>> will -always **bypass** the cache. 
- -.Caching Internal Memories -[NOTE] -The data cache is intended to accelerate data access to **processor-external** memories. -The CPU cache(s) should not be implemented when using only processor-internal data and instruction memories. +will not be cached at all (see section <<_address_space>>). Furthermore, the atomic memory operations +of the <<_zaamo_isa_extension>> will always **bypass** the cache. -.Manual Cache Clear/Reload +.Manual Cache Clear/Reload and Memory Coherence [NOTE] By executing the `fence.i` instruction the instruction cache is cleared and reloaded. -See section <<_cache_coherency>> for more information. +See section <<_memory_coherence>> for more information. .Retrieve Cache Configuration from Software [TIP] @@ -46,8 +40,6 @@ Software can retrieve the cache configuration/layout from the <<_sysinfo_cache_c .Bus Access Fault Handling [NOTE] -The cache always loads a complete cache block (aligned to the block size) every time a -cache miss is detected. Each cached word from this block provides a single status bit that indicates if the -according bus access was successful or caused a bus error. Hence, the whole cache block remains valid even -if certain addresses inside caused a bus error. If the CPU accesses any of the faulty cache words, an -instruction bus error exception is raised. +If the cache encounters a bus error when +downloading a new block from the next memory level, the entire block is invalidated and a bus access +error exception is raised. diff --git a/docs/datasheet/soc_wdt.adoc b/docs/datasheet/soc_wdt.adoc index 5337c16af..009c4ba60 100644 --- a/docs/datasheet/soc_wdt.adoc +++ b/docs/datasheet/soc_wdt.adoc @@ -33,17 +33,9 @@ hardware reset is triggered. The watchdog's timeout counter is reset ("feeding the watchdog") by writing the reset **PASSWORD** to the `RESET` register. The password is hardwired to hexadecimal `0x709D1AB3`.
-.Watchdog Operation during Debugging [IMPORTANT] -By default, the watchdog stops operation when the CPU enters debug mode and will resume normal operation after -the CPU has left debug mode again. This will prevent an unintended watchdog timeout during a debug session. However, -the watchdog can also be configured to keep operating even when the CPU is in debug mode by setting the control -register's `WDT_CTRL_DBEN` bit. - -.Watchdog Operation during CPU Sleep -[IMPORTANT] -By default, the watchdog stops operating when the CPU enters sleep mode. However, the watchdog can also be configured -to keep operating even when the CPU is in sleep mode by setting the control register's `WDT_CTRL_SEN` bit. +Once enabled, the watchdog keeps operating even if the CPU is in <<_sleep_mode>> or if the processor is being +debugged via the <<_on_chip_debugger_ocd>>. **Configuration Lock** @@ -91,12 +83,10 @@ processor's main reset signal is active (even if the watchdog is deactivated or [options="header",grid="all"] |======================= | Address | Name [C] | Bit(s), Name [C] | R/W | Reset value | Writable if locked | Function -.8+<| `0xfffb0000` .8+<| `CTRL` <|`0` `WDT_CTRL_EN` ^| r/w ^| `0` ^| no <| watchdog enable +.6+<| `0xfffb0000` .6+<| `CTRL` <|`0` `WDT_CTRL_EN` ^| r/w ^| `0` ^| no <| watchdog enable <|`1` `WDT_CTRL_LOCK` ^| r/w ^| `0` ^| no <| lock configuration when set, clears only on system reset, can only be set if enable bit is set already - <|`2` `WDT_CTRL_DBEN` ^| r/w ^| `0` ^| no <| set to allow WDT to continue operation even when CPU is in debug mode - <|`3` `WDT_CTRL_SEN` ^| r/w ^| `0` ^| no <| set to allow WDT to continue operation even when CPU is in sleep mode - <|`4` `WDT_CTRL_STRICT` ^| r/w ^| `0` ^| no <| set to enable strict mode (force hardware reset if reset password is incorrect or if write access to locked CTRL register) - <|`6:5` `WDT_CTRL_RCAUSE_HI : WDT_CTRL_RCAUSE_LO` ^| r/- ^| `0` ^| - <| cause of last system reset; 0=external reset, 
1=ocd-reset, 2=watchdog reset + <|`2` `WDT_CTRL_STRICT` ^| r/w ^| `0` ^| no <| set to enable strict mode (force hardware reset if reset password is incorrect or if write access to locked CTRL register) + <|`4:3` `WDT_CTRL_RCAUSE_HI : WDT_CTRL_RCAUSE_LO` ^| r/- ^| `0` ^| - <| cause of last system reset; 0=external reset, 1=ocd-reset, 2=watchdog reset <|`7` - ^| r/- ^| - ^| - <| _reserved_, reads as zero <|`31:8` `WDT_CTRL_TIMEOUT_MSB : WDT_CTRL_TIMEOUT_LSB` ^| r/w ^| 0 ^| no <| 24-bit watchdog timeout value | `0xfffb0004` | `RESET` |`31:0` | -/w | - | yes | Write _PASSWORD_ to reset WDT timeout counter diff --git a/docs/datasheet/soc_xbus.adoc b/docs/datasheet/soc_xbus.adoc index 52ce6ee9d..05d1457dd 100644 --- a/docs/datasheet/soc_xbus.adoc +++ b/docs/datasheet/soc_xbus.adoc @@ -7,30 +7,30 @@ |======================= | Hardware source files: | neorv32_xbus.vhd | External bus gateway | | neorv32_cache.vhd | Generic cache module -| Software driver files: | none | _implicitly used_ +| Software driver files: | none | | Top entity ports: | `xbus_adr_o` | address output (32-bit) +| | `xbus_dat_i` | data input (32-bit) | | `xbus_dat_o` | data output (32-bit) | | `xbus_tag_o` | access tag (3-bit) | | `xbus_we_o` | write enable (1-bit) | | `xbus_sel_o` | byte enable (4-bit) | | `xbus_stb_o` | bus strobe (1-bit) | | `xbus_cyc_o` | valid cycle (1-bit) -| | `xbus_dat_i` | data input (32-bit) | | `xbus_ack_i` | acknowledge (1-bit) | | `xbus_err_i` | bus error (1-bit) | Configuration generics: | `XBUS_EN` | enable external bus interface when `true` | | `XBUS_TIMEOUT` | number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled) | | `XBUS_REGSTAGE_EN` | implement XBUS register stages -| | `XBUS_CACHE_EN` | implement the external bus cache -| | `XBUS_CACHE_NUM_BLOCKS` | number of blocks ("lines"), has to be a power of two. -| | `XBUS_CACHE_BLOCK_SIZE` | size in bytes of each block, has to be a power of two. 
+| | `XBUS_CACHE_EN` | implement the external bus cache when `true` +| | `XBUS_CACHE_NUM_BLOCKS` | number of cache blocks (pages or lines); has to be a power of two +| | `XBUS_CACHE_BLOCK_SIZE` | size of a cache block in bytes; has to be a power of two | CPU interrupts: | none | |======================= **Overview** -The external bus interface provides a **Wishbone b4**-compatible on-chip bus interface that is +The external bus interface provides a **Wishbone b4**-compatible on-chip bus interface that gets implemented if the `XBUS_EN` generic is `true`. This bus interface can be used to attach processor-external modules like memories, custom hardware accelerators or additional peripheral devices. An optional cache module ("XCACHE") can be enabled to improve memory access latency. @@ -76,12 +76,8 @@ device's / bus system's `cyc` and `stb` signals (omitting the processor's `xbus_ .Atomic Memory Accesses [NOTE] -<<_Atomic_Memory_Access>> keep the `cyc` signal active to perform a back-to-back bus access consisting of -two `stb` strobes (one for the load/read operation and another one for the store/write operation). - -.Endianness -[NOTE] -Just like the processor itself the XBUS interface uses **little-endian** byte order. +<<_atomic_memory_access>> operations keep the `cyc` signal active to perform a back-to-back bus access +consisting of two `stb` strobes (one for the load/read operation and another one for the store/write operation). .Wishbone Specs. [TIP] @@ -123,36 +119,28 @@ It compatible to the the AXI4 `ARPROT` and `AWPROT` signals. The XBUS interface provides an optional internal cache that can be used to buffer processor-external accesses. The x-cache is enabled via the `XBUS_CACHE_EN` generic. The total size of the cache is split into the number of cache lines or cache blocks (`XBUS_CACHE_NUM_BLOCKS` generic) and the line or block size in bytes -(`XBUS_CACHE_BLOCK_SIZE` generic). 
- -.Simplified X-Cache Architecture -[source,asciiart] ---------------------------------------- - Direct Access +----------+ - /|------------------------->| Register |------------------------>|\ - | | +----------+ | | -Core --->| | | |---> XBUS - | | +--------------+ +--------------+ +-------------+ | | - \|--->| Host Arbiter |--->| Cache Memory |<---| Bus Arbiter |--->|/ - +--------------+ +--------------+ +-------------+ ---------------------------------------- - -The cache uses a direct-mapped architecture that implements "write-allocate" and "write-back" strategies. -The **write-allocate** strategy will fetch the entire referenced block from main memory when encountering -a cache write-miss. The **write-back** strategy will gather all writes locally inside the cache until the according -cache block is about to be replaced. In this case, the entire modified cache block is written back to main memory. - -.Manual Cache Flush/Clear/Reload +(`XBUS_CACHE_BLOCK_SIZE` generic). The cache uses a direct-mapped architecture that implements "write-allocate" +and "write-back" strategies. + +.Uncached Accesses +[NOTE] +The data cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO. +All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF` +will not be cached at all (see section <<_address_space>>). Furthermore, the atomic memory operations +of the <<_zaamo_isa_extension>> will always **bypass** the cache. + +.Manual Cache Flush/Clear/Reload and Memory Coherence [NOTE] By executing a `fence` **or** `fence.i` instruction the XBUS cache is flushed (local modifications are send back to main memory), cleared (all cache entries are invalidated) and a reloaded (fetching new data from main memory). -See section <<_cache_coherency>> for more information. +See section <<_memory_coherence>> for more information. 
+ +.Retrieve Cache Configuration from Software +[TIP] +Software can retrieve the cache configuration/layout from the <<_sysinfo_cache_configuration>> register. -.Cached/Uncached Accesses +.Bus Access Fault Handling [NOTE] -The data cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO. -All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF` -will not be cached at all (see section <<_address_space>>). Direct/uncached accesses have **lower** priority than -cache block operations to allow continuous burst transfer and also to maintain logical instruction forward -progress / data coherency. Furthermore, the atomic memory operations of the <<_zaamo_isa_extension>> will -always **bypass** the cache. +If the cache encounters a bus error when uploading a modified block to the next memory level or when +downloading a new block from the next memory level, the entire block is invalidated and a bus access +error exception is raised. diff --git a/docs/datasheet/software.adoc b/docs/datasheet/software.adoc index 47c4e9485..cd4af117b 100644 --- a/docs/datasheet/software.adoc +++ b/docs/datasheet/software.adoc @@ -1,3 +1,4 @@ +<<< :sectnums: == Software Framework diff --git a/docs/datasheet/software_bootloader.adoc b/docs/datasheet/software_bootloader.adoc index b20ec6708..db65d39db 100644 --- a/docs/datasheet/software_bootloader.adoc +++ b/docs/datasheet/software_bootloader.adoc @@ -1,3 +1,4 @@ +<<< :sectnums: === Bootloader diff --git a/docs/datasheet/software_rte.adoc b/docs/datasheet/software_rte.adoc index 6b32fa6ee..d379e98ab 100644 --- a/docs/datasheet/software_rte.adoc +++ b/docs/datasheet/software_rte.adoc @@ -1,3 +1,4 @@ +<<< :sectnums: === NEORV32 Runtime Environment diff --git a/docs/figures/bus_interface.png b/docs/figures/bus_interface.png index 13a8b03bd..131ee5f01 100644 Binary files a/docs/figures/bus_interface.png and b/docs/figures/bus_interface.png differ diff --git a/docs/sources/bus_interface.json 
b/docs/sources/bus_interface.json index a1ca92605..3fcc0244a 100644 --- a/docs/sources/bus_interface.json +++ b/docs/sources/bus_interface.json @@ -6,9 +6,12 @@ {name: 'data', wave: 'x..|..4.x..|..', data: ['wdata']}, {name: 'ben', wave: 'x..|..4.x..|..', data: ['ben']}, {name: 'stb', wave: '010|..10.10|..', node: '.a....d..f....'}, - {name: 'rw', wave: '0..|..1..0.|..', node: '..............'}, - {name: 'src', wave: 'x0.|.x0.x..|..'}, - {name: 'priv', wave: 'x0.|.x0.x..|..'}, + {name: 'rw', wave: 'x0.|.x1.x0.|..', node: '..............'}, + {name: 'src', wave: 'x0.|.x0.x0.|.x'}, + {name: 'priv', wave: 'x1.|.x1.x1.|.x'}, + {name: 'debug', wave: 'x0.|.x0.x0.|.x'}, + {name: 'amo', wave: 'x0.|.x0.x0.|.x'}, + {name: 'amoop', wave: 'x0.|.x0.x0.|.x'}, ], {}, [ diff --git a/rtl/core/neorv32_bus.vhd b/rtl/core/neorv32_bus.vhd index 29eb7cc5f..a9f54b28f 100644 --- a/rtl/core/neorv32_bus.vhd +++ b/rtl/core/neorv32_bus.vhd @@ -21,15 +21,14 @@ entity neorv32_bus_switch is PORT_B_READ_ONLY : boolean := false -- set if port B is read-only ); port ( - clk_i : in std_ulogic; -- global clock, rising edge - rstn_i : in std_ulogic; -- global reset, low-active, async - a_lock_i : in std_ulogic; -- exclusive access for port A while set - a_req_i : in bus_req_t; -- host port A request bus - a_rsp_o : out bus_rsp_t; -- host port A response bus - b_req_i : in bus_req_t; -- host port B request bus - b_rsp_o : out bus_rsp_t; -- host port B response bus - x_req_o : out bus_req_t; -- device port request bus - x_rsp_i : in bus_rsp_t -- device port response bus + clk_i : in std_ulogic; -- global clock, rising edge + rstn_i : in std_ulogic; -- global reset, low-active, async + a_req_i : in bus_req_t; -- host port A request bus + a_rsp_o : out bus_rsp_t; -- host port A response bus + b_req_i : in bus_req_t; -- host port B request bus + b_rsp_o : out bus_rsp_t; -- host port B response bus + x_req_o : out bus_req_t; -- device port request bus + x_rsp_i : in bus_rsp_t -- device port response bus ); 
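The prioritized arbiter in the next hunk (with `a_lock_i` removed by this change) can be summarized by a small C sketch of its idle-state decision and of the `fence` forwarding done in the output assignments further down. This is a simplified model under stated assumptions: it ignores the busy states and the round-robin variant, and the types and function names are hypothetical, not part of the NEORV32 sources.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of one host port's request (not the VHDL bus_req_t). */
typedef struct {
  bool stb;   /* new request strobe in this cycle */
  bool fence; /* out-of-band fence request */
} port_req_t;

typedef enum { GRANT_NONE, GRANT_A, GRANT_B } grant_t;

/* Idle-state decision: port A wins whenever it requests; port B is served otherwise.
 * a_pend / b_pend model the buffered requests (a_req / b_req) that arrived
 * while the switch was busy serving the other port. */
grant_t arbitrate_idle(port_req_t a, bool a_pend, port_req_t b, bool b_pend) {
  if (a.stb || a_pend) return GRANT_A;
  if (b.stb || b_pend) return GRANT_B;
  return GRANT_NONE;
}

/* Fence requests are not arbitrated: a fence from either host is forwarded. */
bool forward_fence(port_req_t a, port_req_t b) {
  return a.fence || b.fence;
}
```

Since there is no lock input anymore, port B is granted as soon as port A has nothing pending, while fence requests bypass arbitration entirely.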
end neorv32_bus_switch; @@ -71,7 +70,7 @@ begin -- ------------------------------------------------------------------------------------------- arbiter_prioritized: if not ROUND_ROBIN_EN generate - arbiter_fsm: process(state, a_req, b_req, a_lock_i, a_req_i, b_req_i, x_rsp_i) + arbiter_fsm: process(state, a_req, b_req, a_req_i, b_req_i, x_rsp_i) begin -- defaults -- state_nxt <= state; @@ -101,7 +100,7 @@ begin sel <= '0'; stb <= '1'; state_nxt <= S_BUSY_A; - elsif ((b_req_i.stb = '1') or (b_req = '1')) and (a_lock_i = '0') then -- request from port B? + elsif (b_req_i.stb = '1') or (b_req = '1') then -- request from port B? sel <= '1'; stb <= '1'; state_nxt <= S_BUSY_B; @@ -175,11 +174,10 @@ begin x_req_o.amo <= a_req_i.amo when (sel = '0') else b_req_i.amo; x_req_o.amoop <= a_req_i.amoop when (sel = '0') else b_req_i.amoop; x_req_o.priv <= a_req_i.priv when (sel = '0') else b_req_i.priv; + x_req_o.debug <= a_req_i.debug when (sel = '0') else b_req_i.debug; x_req_o.src <= a_req_i.src when (sel = '0') else b_req_i.src; x_req_o.rw <= a_req_i.rw when (sel = '0') else b_req_i.rw; - x_req_o.fence <= a_req_i.fence or b_req_i.fence; -- propagate any fence request - x_req_o.sleep <= a_req_i.sleep and b_req_i.sleep; -- set if ALL upstream devices are in sleep mode - x_req_o.debug <= a_req_i.debug when (sel = '0') else b_req_i.debug; + x_req_o.fence <= a_req_i.fence or b_req_i.fence; x_req_o.data <= b_req_i.data when PORT_A_READ_ONLY else a_req_i.data when PORT_B_READ_ONLY else @@ -855,11 +853,10 @@ begin sys_req_o.rw <= '1' when (arbiter.state = S_WRITE) or (arbiter.state = S_WRITE_WAIT) else core_req_i.rw; sys_req_o.src <= core_req_i.src; sys_req_o.priv <= core_req_i.priv; + sys_req_o.debug <= core_req_i.debug; sys_req_o.amo <= core_req_i.amo; -- set during the entire read-modify-write operation sys_req_o.amoop <= (others => '0'); -- the specific AMO type should not matter after this point sys_req_o.fence <= core_req_i.fence; - sys_req_o.sleep <= core_req_i.sleep; - 
sys_req_o.debug <= core_req_i.debug; -- response switch -- core_rsp_o.data <= sys_rsp_i.data when (arbiter.state = S_IDLE) else arbiter.rdata; diff --git a/rtl/core/neorv32_cache.vhd b/rtl/core/neorv32_cache.vhd index 9d61dfa67..478949441 100644 --- a/rtl/core/neorv32_cache.vhd +++ b/rtl/core/neorv32_cache.vhd @@ -4,20 +4,11 @@ -- Configurable generic cache module. The cache is direct-mapped and implements -- -- "write-back" and "write-allocate" strategies. -- -- -- --- All requests targeting the "uncached address space page" (or higher), defined by -- --- the 4 most significant address bits, well as all atomic (reservation set) -- --- operations will always **bypass** the cache resulting in "direct accesses". -- --- -- --- Simplified cache architecture ("-->" = direction of access requests): -- --- -- --- Direct Access +----------+ -- --- /|----------------------->| Register |---------------------->|\ -- --- | | +----------+ | | -- --- Host -->| | | |--> Bus -- --- | | +--------------+ +--------------+ +-------------+ | | -- --- \|-->| Host Arbiter |-->| Cache Memory |<--| Bus Arbiter |-->|/ -- --- +--------------+ +--------------+ +-------------+ -- --- -- +-- Uncached / direct accesses: Several bus transaction types will bypass the cache: -- +-- * atomic memory operations -- +-- * accesses within debug-mode (on-chip debugger) -- +-- * accesses to the explicit "uncached address space page" (or higher); defined by -- +-- the 4 most significant address bits (UC_BEGIN) -- -- -------------------------------------------------------------------------------- -- -- The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 -- -- Copyright (c) NEORV32 contributors. 
-- @@ -38,12 +29,12 @@ entity neorv32_cache is NUM_BLOCKS : natural range 2 to 1024; -- number of cache blocks (min 2), has to be a power of 2 BLOCK_SIZE : natural range 4 to 32768; -- cache block size in bytes (min 4), has to be a power of 2 UC_BEGIN : std_ulogic_vector(3 downto 0); -- begin of uncached address space (page number / 4 MSBs of address) - UC_ENABLE : boolean; -- enable direct/uncached accesses READ_ONLY : boolean -- read-only accesses for host ); port ( clk_i : in std_ulogic; -- global clock, rising edge rstn_i : in std_ulogic; -- global reset, low-active, async + clean_o : out std_ulogic; -- cache is clean host_req_i : in bus_req_t; -- host request host_rsp_o : out bus_rsp_t; -- host response bus_req_o : out bus_req_t; -- bus request @@ -53,30 +44,14 @@ end neorv32_cache; architecture neorv32_cache_rtl of neorv32_cache is - -- host access arbiter (handle CPU accesses to cache) -- - component neorv32_cache_host - generic ( - READ_ONLY : boolean - ); - port ( - rstn_i : in std_ulogic; - clk_i : in std_ulogic; - req_i : in bus_req_t; - rsp_o : out bus_rsp_t; - bus_sync_o : out std_ulogic; - bus_miss_o : out std_ulogic; - bus_busy_i : in std_ulogic; - dirty_o : out std_ulogic; - hit_i : in std_ulogic; - addr_o : out std_ulogic_vector(31 downto 0); - we_o : out std_ulogic_vector(3 downto 0); - swe_o : out std_ulogic; - wdata_o : out std_ulogic_vector(31 downto 0); - wstat_o : out std_ulogic; - rdata_i : in std_ulogic_vector(31 downto 0); - rstat_i : in std_ulogic - ); - end component; + -- make sure cache sizes are a power of two -- + constant block_num_c : natural := 2**index_size_f(NUM_BLOCKS); + constant block_size_c : natural := 2**index_size_f(BLOCK_SIZE); + + -- cache layout -- + constant offset_size_c : natural := index_size_f(block_size_c/4); -- WORD offset! 
+ constant index_size_c : natural := index_size_f(block_num_c); + constant tag_size_c : natural := 32 - (offset_size_c + index_size_c + 2); -- cache memory core (cache memory and management) -- component neorv32_cache_memory @@ -86,337 +61,66 @@ architecture neorv32_cache_rtl of neorv32_cache is READ_ONLY : boolean ); port ( - rstn_i : in std_ulogic; - clk_i : in std_ulogic; - inval_i : in std_ulogic; - new_i : in std_ulogic; - dirty_i : in std_ulogic; - hit_o : out std_ulogic; - dirty_o : out std_ulogic; - base_o : out std_ulogic_vector(31 downto 0); - addr_i : in std_ulogic_vector(31 downto 0); - we_i : in std_ulogic_vector(3 downto 0); - swe_i : in std_ulogic; - wdata_i : in std_ulogic_vector(31 downto 0); - wstat_i : in std_ulogic; - rdata_o : out std_ulogic_vector(31 downto 0); - rstat_o : out std_ulogic + rstn_i : in std_ulogic; + clk_i : in std_ulogic; + inval_i : in std_ulogic; + new_i : in std_ulogic; + dirty_i : in std_ulogic; + hit_o : out std_ulogic; + dirty_o : out std_ulogic; + tag_o : out std_ulogic_vector(31 downto 0); + clean_o : out std_ulogic; + addr_i : in std_ulogic_vector(31 downto 0); + we_i : in std_ulogic_vector(3 downto 0); + wdata_i : in std_ulogic_vector(31 downto 0); + rdata_o : out std_ulogic_vector(31 downto 0) ); end component; - -- bus access arbiter (handle cache misses) -- - component neorv32_cache_bus - generic ( - NUM_BLOCKS : natural; - BLOCK_SIZE : natural; - READ_ONLY : boolean - ); - port ( - rstn_i : in std_ulogic; - clk_i : in std_ulogic; - host_req_i : in bus_req_t; - bus_req_o : out bus_req_t; - bus_rsp_i : in bus_rsp_t; - cmd_sync_i : in std_ulogic; - cmd_miss_i : in std_ulogic; - cmd_busy_o : out std_ulogic; - inval_o : out std_ulogic; - new_o : out std_ulogic; - dirty_i : in std_ulogic; - base_i : in std_ulogic_vector(31 downto 0); - addr_o : out std_ulogic_vector(31 downto 0); - we_o : out std_ulogic_vector(3 downto 0); - swe_o : out std_ulogic; - wdata_o : out std_ulogic_vector(31 downto 0); - wstat_o : out 
std_ulogic; - rdata_i : in std_ulogic_vector(31 downto 0) - ); - end component; - - -- make sure cache sizes are a power of two -- - constant block_num_c : natural := 2**index_size_f(NUM_BLOCKS); - constant block_size_c : natural := 2**index_size_f(BLOCK_SIZE); - - -- bus de-mux control for direct/uncached or caches access -- - signal dir_acc_d, dir_acc_q : std_ulogic; - - -- internal bus system -- - signal bus_req, dir_req_d, dir_req_q, cache_req : bus_req_t; - signal bus_rsp, dir_rsp_d, dir_rsp_q, cache_rsp : bus_rsp_t; - - -- cache memory module interface -- - type cache_in_t is record - addr : std_ulogic_vector(31 downto 0); - we : std_ulogic_vector(3 downto 0); - swe : std_ulogic; - wdata : std_ulogic_vector(31 downto 0); - wstat : std_ulogic; + -- control -> cache interface -- + type cache_o_t is record + cmd_inv : std_ulogic; + cmd_new : std_ulogic; + cmd_dir : std_ulogic; + addr : std_ulogic_vector(31 downto 0); + data : std_ulogic_vector(31 downto 0); + we : std_ulogic_vector(3 downto 0); end record; - signal cache_in_host, cache_in_bus, cache_in : cache_in_t; - -- - type cache_out_t is record - rdata : std_ulogic_vector(31 downto 0); - rstat : std_ulogic; + signal cache_o : cache_o_t; + + -- cache -> control interface -- + type cache_i_t is record + sta_hit : std_ulogic; + sta_dir : std_ulogic; + sta_cln : std_ulogic; + sta_tag : std_ulogic_vector(31 downto 0); + data : std_ulogic_vector(31 downto 0); end record; - signal cache_out : cache_out_t; - - -- cache status -- - signal cache_stat_dirty, cache_stat_hit : std_ulogic; - signal cache_stat_base : std_ulogic_vector(31 downto 0); - - -- operation commands -- - signal cache_cmd_inval, cache_cmd_new, cache_cmd_dirty, bus_cmd_sync, bus_cmd_miss, bus_cmd_busy : std_ulogic; - -begin - - -- Check if Direct/Uncached Access -------------------------------------------------------- - -- ------------------------------------------------------------------------------------------- - dir_acc_d <= '1' when UC_ENABLE 
and -- direct accesses implemented - ((unsigned(host_req_i.addr(31 downto 28)) >= unsigned(UC_BEGIN)) or -- uncached memory page - (host_req_i.amo = '1')) else '0'; -- atomic memory operation - - -- request splitter: cached or direct access -- - req_splitter: process(host_req_i, dir_acc_d) - begin - -- default: pass-through all bus signals -- - cache_req <= host_req_i; - dir_req_d <= host_req_i; - -- direct access -- - dir_req_d.stb <= host_req_i.stb and dir_acc_d; - dir_req_d.fence <= '0'; -- no fence requests from this side - -- cached access -- - cache_req.stb <= host_req_i.stb and (not dir_acc_d); - end process req_splitter; - - -- direct/uncached access path pipeline stage -- - direct_acc_enable: - if UC_ENABLE generate - bus_buffer: process(rstn_i, clk_i) - begin - if (rstn_i = '0') then - dir_acc_q <= '0'; - dir_req_q <= req_terminate_c; - dir_rsp_q <= rsp_terminate_c; - elsif rising_edge(clk_i) then - dir_acc_q <= dir_acc_d; - if READ_ONLY then -- do not propagate STB on write access, issue ERR instead - dir_req_q <= dir_req_d; - dir_req_q.stb <= dir_req_d.stb and (not dir_req_d.rw); -- read accesses only - dir_rsp_q <= dir_rsp_d; - dir_rsp_q.err <= dir_rsp_d.err or (dir_req_d.stb and dir_req_d.rw); -- error on write access - else - dir_req_q <= dir_req_d; - dir_rsp_q <= dir_rsp_d; - end if; - end if; - end process bus_buffer; - - -- internal response switch -- - host_rsp_o <= cache_rsp when (dir_acc_q = '0') else dir_rsp_q; - end generate; - - -- direct accesses not implemented -- - direct_acc_disable: - if not UC_ENABLE generate - dir_req_q <= req_terminate_c; - host_rsp_o <= cache_rsp; - end generate; - - - -- Host Access Arbiter (Handle *Cached* CPU Bus Requests) --------------------------------- - -- ------------------------------------------------------------------------------------------- - neorv32_cache_host_inst: neorv32_cache_host - generic map ( - READ_ONLY => READ_ONLY -- host accesses are read-only - ) - port map ( - -- global control -- - 
rstn_i => rstn_i, -- global reset, async, low-active - clk_i => clk_i, -- global clock, rising edge - -- host access port -- - req_i => cache_req, -- request - rsp_o => cache_rsp, -- response - -- bus unit interface -- - bus_sync_o => bus_cmd_sync, -- sync cache and main memory - bus_miss_o => bus_cmd_miss, -- cache miss - bus_busy_i => bus_cmd_busy, -- bus operation in progress - -- cache status interface -- - dirty_o => cache_cmd_dirty, -- make accessed block dirty - hit_i => cache_stat_hit, -- cache hit - -- cache data interface -- - addr_o => cache_in_host.addr, -- access address - we_o => cache_in_host.we, -- byte-wide data write enable - swe_o => cache_in_host.swe, -- status write enable - wdata_o => cache_in_host.wdata, -- write data - wstat_o => cache_in_host.wstat, -- write status - rdata_i => cache_out.rdata, -- read data - rstat_i => cache_out.rstat -- read status - ); - - - -- Cache Memory Core (Cache Storage and Status Management) -------------------------------- - -- ------------------------------------------------------------------------------------------- - neorv32_cache_memory_inst: neorv32_cache_memory - generic map ( - NUM_BLOCKS => block_num_c, -- number of blocks (min 2), has to be a power of 2 - BLOCK_SIZE => block_size_c, -- block size in bytes (min 4), has to be a power of 2 - READ_ONLY => READ_ONLY -- cache is read-only (for host) - ) - port map ( - -- global control -- - rstn_i => rstn_i, -- global reset, async, low-active - clk_i => clk_i, -- global clock, rising edge - -- management -- - inval_i => cache_cmd_inval, -- make accessed block invalid - new_i => cache_cmd_new, -- make accessed block valid, clean and set tag - dirty_i => cache_cmd_dirty, -- make accessed block dirty - -- status -- - hit_o => cache_stat_hit, -- cache hit - dirty_o => cache_stat_dirty, -- accessed block is dirty - base_o => cache_stat_base, -- base address of current block - -- cache access -- - addr_i => cache_in.addr, -- access address - we_i => cache_in.we, -- 
byte-wide data write enable - swe_i => cache_in.swe, -- status write enable - wdata_i => cache_in.wdata, -- write data - wstat_i => cache_in.wstat, -- write status - rdata_o => cache_out.rdata, -- read data - rstat_o => cache_out.rstat -- read status - ); - - -- cache access switch -- - cache_in <= cache_in_host when (bus_cmd_busy = '0') else cache_in_bus; - - - -- Bus Access Arbiter (Handle Cache Miss and Flush/Reload) -------------------------------- - -- ------------------------------------------------------------------------------------------- - neorv32_cache_bus_inst: neorv32_cache_bus - generic map ( - NUM_BLOCKS => block_num_c, -- number of blocks (min 2), has to be a power of 2 - BLOCK_SIZE => block_size_c, -- block size in bytes (min 4), has to be a power of 2 - READ_ONLY => READ_ONLY -- read-only bus accesses - ) - port map ( - -- global control -- - rstn_i => rstn_i, -- global reset, async, low-active - clk_i => clk_i, -- global clock, rising edge - -- host access port -- - host_req_i => host_req_i, -- request - -- bus access port -- - bus_req_o => bus_req, -- request - bus_rsp_i => bus_rsp, -- response - -- operation interface -- - cmd_sync_i => bus_cmd_sync, -- sync cache and main memory - cmd_miss_i => bus_cmd_miss, -- cache miss - cmd_busy_o => bus_cmd_busy, -- bus operation in progress - -- cache status interface -- - inval_o => cache_cmd_inval, -- invalidate accessed block - new_o => cache_cmd_new, -- set new cache entry - dirty_i => cache_stat_dirty, -- accessed block is dirty - base_i => cache_stat_base, -- base address of accessed block - -- cache data interface -- - addr_o => cache_in_bus.addr, -- access address - we_o => cache_in_bus.we, -- byte-wide data write enable - swe_o => cache_in_bus.swe, -- status write enable - wdata_o => cache_in_bus.wdata, -- write data - wstat_o => cache_in_bus.wstat, -- write status - rdata_i => cache_out.rdata -- read data - ); - - - -- Bus Access Switch 
---------------------------------------------------------------------- - -- ------------------------------------------------------------------------------------------- - bus_switch_enable: - if UC_ENABLE generate - -- Use a real switch here to buffer direct access requests during - -- out-of-band cache operations (downstream memory synchronization). - neorv32_cache_bus_switch: entity neorv32.neorv32_bus_switch - generic map ( - PORT_A_READ_ONLY => READ_ONLY, - PORT_B_READ_ONLY => READ_ONLY - ) - port map ( - clk_i => clk_i, - rstn_i => rstn_i, - a_lock_i => bus_cmd_busy, -- cache accesses have exclusive access - a_req_i => bus_req, - a_rsp_o => bus_rsp, - b_req_i => dir_req_q, - b_rsp_o => dir_rsp_d, - x_req_o => bus_req_o, - x_rsp_i => bus_rsp_i - ); - end generate; - - bus_switch_disable: - if not UC_ENABLE generate - bus_req_o <= bus_req; - bus_rsp <= bus_rsp_i; - end generate; - + signal cache_i : cache_i_t; -end neorv32_cache_rtl; - - --- ================================================================================ -- --- NEORV32 CPU - Generic Cache: Host Access Controller -- --- -------------------------------------------------------------------------------- -- --- Handle host accesses to the cache (check for hit/miss) or bypass cache if -- --- direct/uncached access. If a cache miss occurs or a fence request is received an -- --- according command is sent to the bus interface unit. -- --- -------------------------------------------------------------------------------- -- --- The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 -- --- Copyright (c) NEORV32 contributors. -- --- Copyright (c) 2020 - 2025 Stephan Nolting. All rights reserved. -- --- Licensed under the BSD-3-Clause license, see LICENSE for details. 
-- --- SPDX-License-Identifier: BSD-3-Clause -- --- ================================================================================ -- - -library ieee; -use ieee.std_logic_1164.all; - -library neorv32; -use neorv32.neorv32_package.all; - -entity neorv32_cache_host is - generic ( - READ_ONLY : boolean -- host accesses are read-only - ); - port ( - -- global control -- - rstn_i : in std_ulogic; -- global reset, async, low-active - clk_i : in std_ulogic; -- global clock, rising edge - -- host access port -- - req_i : in bus_req_t; -- request - rsp_o : out bus_rsp_t; -- response - -- bus unit interface -- - bus_sync_o : out std_ulogic; -- sync cache and main memory - bus_miss_o : out std_ulogic; -- cache miss - bus_busy_i : in std_ulogic; -- bus operation in progress - -- cache status interface -- - dirty_o : out std_ulogic; -- make accessed block dirty - hit_i : in std_ulogic; -- cache hit - -- cache data interface -- - addr_o : out std_ulogic_vector(31 downto 0); -- access address - we_o : out std_ulogic_vector(3 downto 0); -- byte-wide data write enable - swe_o : out std_ulogic; -- status write enable - wdata_o : out std_ulogic_vector(31 downto 0); -- write data - wstat_o : out std_ulogic; -- write status - rdata_i : in std_ulogic_vector(31 downto 0); -- read data - rstat_i : in std_ulogic -- read status + -- control fsm -- + type state_t is ( + S_IDLE, S_CHECK, S_MISS, S_DIRECT_REQ, S_DIRECT_RSP, + S_DOWNLOAD_REQ, S_DOWNLOAD_RSP, S_DOWNLOAD_DONE, S_DOWNLOAD_ERR, + S_UPLOAD_GET, S_UPLOAD_REQ, S_UPLOAD_RSP, + S_FLUSH_START, S_FLUSH_READ, S_FLUSH_CHECK, S_FLUSH_DONE, + S_ERROR ); -end neorv32_cache_host; - -architecture neorv32_cache_host_rtl of neorv32_cache_host is - - -- control engine -- - type ctrl_state_t is (S_IDLE, S_CHECK, S_WAIT_MISS, S_WAIT_SYNC, S_ERROR); type ctrl_t is record - state, state_nxt : ctrl_state_t; -- FSM state - req_buf, req_buf_nxt : std_ulogic; -- access request buffer - sync_buf, sync_buf_nxt : std_ulogic; -- flush/reload (sync with main 
memory) request buffer + state : state_t; + upret : state_t; + buf_req : std_ulogic; + buf_sync : std_ulogic; end record; - signal ctrl : ctrl_t; + signal ctrl, ctrl_nxt : ctrl_t; + + -- address generator -- + type addr_t is record + tag : std_ulogic_vector(tag_size_c-1 downto 0); + idx : std_ulogic_vector(index_size_c-1 downto 0); + ofs : std_ulogic_vector(offset_size_c-1 downto 0); -- word offset + end record; + signal addr, addr_nxt : addr_t; begin @@ -426,100 +130,290 @@ begin begin if (rstn_i = '0') then ctrl.state <= S_IDLE; - ctrl.req_buf <= '0'; - ctrl.sync_buf <= '0'; + ctrl.upret <= S_IDLE; + ctrl.buf_req <= '0'; + ctrl.buf_sync <= '0'; + addr.tag <= (others => '0'); + addr.idx <= (others => '0'); + addr.ofs <= (others => '0'); + clean_o <= '0'; elsif rising_edge(clk_i) then - ctrl.state <= ctrl.state_nxt; - ctrl.req_buf <= ctrl.req_buf_nxt; - ctrl.sync_buf <= ctrl.sync_buf_nxt; + ctrl.state <= ctrl_nxt.state; + ctrl.upret <= ctrl_nxt.upret; + ctrl.buf_req <= ctrl_nxt.buf_req; + ctrl.buf_sync <= ctrl_nxt.buf_sync; + addr <= addr_nxt; + -- cache clean (sync with downstream memory)? 
-- + if (cache_i.sta_cln = '1') and (ctrl.state = S_IDLE) then + clean_o <= '1'; + else + clean_o <= '0'; + end if; end if; end process ctrl_engine_sync; -- Control Engine FSM Comb ---------------------------------------------------------------- -- ------------------------------------------------------------------------------------------- - ctrl_engine_comb: process(ctrl, req_i, hit_i, rdata_i, rstat_i, bus_busy_i) + ctrl_engine_comb: process(ctrl, addr, host_req_i, bus_rsp_i, cache_i) begin - -- control defaults -- - ctrl.state_nxt <= ctrl.state; - ctrl.req_buf_nxt <= ctrl.req_buf or req_i.stb; - ctrl.sync_buf_nxt <= ctrl.sync_buf or req_i.fence; + -- control engine defaults -- + ctrl_nxt.state <= ctrl.state; + ctrl_nxt.upret <= ctrl.upret; + ctrl_nxt.buf_req <= ctrl.buf_req or host_req_i.stb; + ctrl_nxt.buf_sync <= ctrl.buf_sync or host_req_i.fence; + addr_nxt <= addr; -- cache access defaults -- - dirty_o <= '0'; - addr_o <= req_i.addr; - we_o <= (others => '0'); - swe_o <= '0'; -- host cannot alter status bits - wdata_o <= req_i.data; - wstat_o <= '0'; -- host cannot alter status bits + cache_o.cmd_inv <= '0'; + cache_o.cmd_new <= '0'; + cache_o.cmd_dir <= '0'; + cache_o.addr <= host_req_i.addr; + cache_o.we <= (others => '0'); + cache_o.data <= host_req_i.data; - -- bus unit command defaults -- - bus_sync_o <= '0'; - bus_miss_o <= '0'; + -- host response defaults -- + host_rsp_o <= rsp_terminate_c; - -- host interface defaults -- - rsp_o <= rsp_terminate_c; + -- bus interface defaults -- + bus_req_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; -- always word-aligned + bus_req_o.data <= cache_i.data; + bus_req_o.ben <= (others => '1'); -- full-word writes only + bus_req_o.stb <= '0'; -- no request by default + bus_req_o.rw <= '0'; + bus_req_o.src <= host_req_i.src; -- pass-through + bus_req_o.priv <= host_req_i.priv; -- pass-through + bus_req_o.debug <= host_req_i.debug; -- pass-through + bus_req_o.amo <= '0'; -- cache accesses cannot be atomic + 
bus_req_o.amoop <= (others => '0'); -- cache accesses cannot be atomic + bus_req_o.fence <= '0'; -- no fence by default -- fsm -- case ctrl.state is - when S_IDLE => -- wait for host request + when S_IDLE => -- wait for request -- ------------------------------------------------------------ - if (ctrl.sync_buf = '1') then -- flush and reload cache (sync with main memory) - bus_sync_o <= '1'; -- trigger bus unit: sync operation - ctrl.state_nxt <= S_WAIT_SYNC; - elsif (req_i.stb = '1') or (ctrl.req_buf = '1') then -- (pending) access request - if (req_i.rw = '1') and READ_ONLY then -- invalid write access? - ctrl.state_nxt <= S_ERROR; + if (host_req_i.fence = '1') or (ctrl.buf_sync = '1') then -- (pending) sync request + ctrl_nxt.state <= S_FLUSH_START; + elsif (host_req_i.stb = '1') or (ctrl.buf_req = '1') then -- (pending) access request + if (host_req_i.rw = '1') and (READ_ONLY = true) then -- invalid write access + ctrl_nxt.state <= S_ERROR; + elsif (unsigned(host_req_i.addr(31 downto 28)) >= unsigned(UC_BEGIN)) or + (host_req_i.amo = '1') or (host_req_i.debug = '1') then + ctrl_nxt.state <= S_DIRECT_REQ; else - ctrl.state_nxt <= S_CHECK; + ctrl_nxt.state <= S_CHECK; end if; end if; + + when S_DIRECT_REQ => -- direct (uncached) access request + -- ------------------------------------------------------------ + bus_req_o <= host_req_i; + bus_req_o.stb <= '1'; + ctrl_nxt.state <= S_DIRECT_RSP; + + when S_DIRECT_RSP => -- wait for direct (uncached) access response + -- ------------------------------------------------------------ + bus_req_o <= host_req_i; + bus_req_o.stb <= '0'; + host_rsp_o <= bus_rsp_i; + ctrl_nxt.buf_req <= '0'; -- access (about to be) completed + if (bus_rsp_i.ack = '1') or (bus_rsp_i.err = '1') then + ctrl_nxt.state <= S_IDLE; + end if; + + when S_CHECK => -- check if cache hit -- ------------------------------------------------------------ - rsp_o.data <= rdata_i; -- output read data - ctrl.req_buf_nxt <= '0'; -- access request completed - if 
(hit_i = '1') then - if (req_i.rw = '1') and (not READ_ONLY) then -- write access - dirty_o <= '1'; -- cache block is dirty now - we_o <= req_i.ben; -- finalize write access + ctrl_nxt.buf_req <= '0'; -- access (about to be) completed + host_rsp_o.data <= cache_i.data; + if (cache_i.sta_hit = '1') then + if (host_req_i.rw = '0') then -- read access + host_rsp_o.ack <= '1'; + else -- write access + cache_o.cmd_dir <= '1'; -- cache block is dirty now + cache_o.we <= host_req_i.ben; -- finalize write access + host_rsp_o.ack <= '1'; end if; - rsp_o.ack <= not rstat_i; -- data word fine? - rsp_o.err <= rstat_i; -- data word faulty? - ctrl.state_nxt <= S_IDLE; + ctrl_nxt.state <= S_IDLE; else -- cache miss - bus_miss_o <= '1'; -- trigger bus unit: cache miss - ctrl.state_nxt <= S_WAIT_MISS; + ctrl_nxt.state <= S_MISS; + end if; + + when S_MISS => -- check if accessed block is dirty (cache address is still applied by host controller!) + -- ------------------------------------------------------------ + ctrl_nxt.buf_req <= '0'; -- access (about to be) completed + addr_nxt.ofs <= (others => '0'); -- align block base address for upload/download + addr_nxt.idx <= host_req_i.addr((offset_size_c+2+index_size_c)-1 downto offset_size_c+2); -- index of referenced block + ctrl_nxt.upret <= S_MISS; -- come back here after UPLOAD + -- + if (cache_i.sta_dir = '1') and (READ_ONLY = false) then -- block is dirty, upload first + addr_nxt.tag <= cache_i.sta_tag(31 downto 32-tag_size_c); -- tag of accessed block + ctrl_nxt.state <= S_UPLOAD_GET; + else -- block is clean, replace by new block + addr_nxt.tag <= host_req_i.addr(31 downto 32-tag_size_c); -- tag of referenced block + ctrl_nxt.state <= S_DOWNLOAD_REQ; + end if; + + + when S_DOWNLOAD_REQ => -- download new cache block: request new word + -- ------------------------------------------------------------ + cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; + cache_o.data <= bus_rsp_i.data; + bus_req_o.rw <= '0'; -- read access + 
bus_req_o.stb <= '1'; -- request new transfer + ctrl_nxt.state <= S_DOWNLOAD_RSP; + + when S_DOWNLOAD_RSP => -- download new cache block: wait for bus response + -- ------------------------------------------------------------ + cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; + cache_o.data <= bus_rsp_i.data; + cache_o.cmd_new <= '1'; -- set new block (set tag, make valid, make clean) + bus_req_o.rw <= '0'; -- read access + if (bus_rsp_i.err = '1') then -- + ctrl_nxt.state <= S_DOWNLOAD_ERR; + elsif (bus_rsp_i.ack = '1') then + cache_o.we <= (others => '1'); -- cache: full-word write + addr_nxt.ofs <= std_ulogic_vector(unsigned(addr.ofs) + 1); + if (and_reduce_f(addr.ofs) = '1') then -- block completed + ctrl_nxt.state <= S_DOWNLOAD_DONE; + else -- get next word + ctrl_nxt.state <= S_DOWNLOAD_REQ; + end if; + end if; + + when S_DOWNLOAD_DONE => -- delay cycle for update of cache status + -- ------------------------------------------------------------ + ctrl_nxt.state <= S_CHECK; + + when S_DOWNLOAD_ERR => -- error during block download + -- ------------------------------------------------------------ + cache_o.cmd_inv <= '1'; -- this block is broken + ctrl_nxt.state <= S_ERROR; + + + when S_UPLOAD_GET => -- upload dirty cache block: read word from cache + -- ------------------------------------------------------------ + if (READ_ONLY = true) then + ctrl_nxt.state <= S_IDLE; + else + cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; + bus_req_o.rw <= '1'; -- write access + ctrl_nxt.state <= S_UPLOAD_REQ; + end if; + + when S_UPLOAD_REQ => -- upload dirty cache block: request bus write + -- ------------------------------------------------------------ + if (READ_ONLY = true) then + ctrl_nxt.state <= S_IDLE; + else + cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; + bus_req_o.rw <= '1'; -- write access + bus_req_o.stb <= '1'; -- request new transfer + ctrl_nxt.state <= S_UPLOAD_RSP; end if; - when S_WAIT_SYNC => -- wait for bus engine to handle cache 
sync + when S_UPLOAD_RSP => -- upload dirty cache block: wait for bus response + -- ------------------------------------------------------------ + if (READ_ONLY = true) then + ctrl_nxt.state <= S_IDLE; + else + cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; + bus_req_o.rw <= '1'; -- write access + cache_o.cmd_new <= '1'; -- set new block (set tag, make valid, make clean) + if (bus_rsp_i.err = '1') then -- bus error (this is really bad...) + ctrl_nxt.state <= S_ERROR; + elsif (bus_rsp_i.ack = '1') then + addr_nxt.ofs <= std_ulogic_vector(unsigned(addr.ofs) + 1); + if (and_reduce_f(addr.ofs) = '1') then -- block completed + ctrl_nxt.state <= ctrl.upret; -- go back to "upload-done return state" + else -- get next word + ctrl_nxt.state <= S_UPLOAD_GET; + end if; + end if; + end if; + + + when S_FLUSH_START => -- start checking for dirty blocks + -- ------------------------------------------------------------ + cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; + addr_nxt.idx <= (others => '0'); -- start with index 0 + ctrl_nxt.upret <= S_FLUSH_READ; -- come back to S_FLUSH_READ after block UPLOAD + ctrl_nxt.state <= S_FLUSH_READ; + + when S_FLUSH_READ => -- cache read access latency cycle -- ------------------------------------------------------------ - ctrl.sync_buf_nxt <= '0'; -- sync operation has been issued - if (bus_busy_i = '0') then - ctrl.state_nxt <= S_IDLE; + cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; + ctrl_nxt.state <= S_FLUSH_CHECK; + + when S_FLUSH_CHECK => -- check if currently indexed block is dirty + -- ------------------------------------------------------------ + cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; + addr_nxt.tag <= cache_i.sta_tag(31 downto 32-tag_size_c); -- tag of currently indexed block + cache_o.cmd_inv <= '1'; -- invalidate currently indexed block + if (cache_i.sta_dir = '1') and (READ_ONLY = false) then -- block dirty? 
+ ctrl_nxt.state <= S_UPLOAD_GET; + else -- move on to next block + addr_nxt.idx <= std_ulogic_vector(unsigned(addr.idx) + 1); + if (and_reduce_f(addr.idx) = '1') then -- all blocks done + ctrl_nxt.state <= S_FLUSH_DONE; + else -- go to next block + ctrl_nxt.state <= S_FLUSH_READ; + end if; end if; - when S_WAIT_MISS => -- wait for bus engine to handle cache miss + when S_FLUSH_DONE => -- flush completed -- ------------------------------------------------------------ - if (bus_busy_i = '0') then - ctrl.state_nxt <= S_CHECK; -- redo cache access + if not READ_ONLY then + bus_req_o.fence <= '1'; -- forward fence request end if; + ctrl_nxt.buf_sync <= '0'; -- sync completed + ctrl_nxt.state <= S_IDLE; - when S_ERROR => -- access error + + when S_ERROR => -- error -- ------------------------------------------------------------ - rsp_o.err <= '1'; - ctrl.state_nxt <= S_IDLE; + host_rsp_o.err <= '1'; + ctrl_nxt.state <= S_IDLE; when others => -- undefined -- ------------------------------------------------------------ - ctrl.state_nxt <= S_IDLE; + ctrl_nxt.state <= S_IDLE; end case; end process ctrl_engine_comb; -end neorv32_cache_host_rtl; + -- Cache Memory Core (Cache Storage and Status Management) -------------------------------- + -- ------------------------------------------------------------------------------------------- + neorv32_cache_memory_inst: neorv32_cache_memory + generic map ( + NUM_BLOCKS => block_num_c, -- number of blocks (min 2), has to be a power of 2 + BLOCK_SIZE => block_size_c, -- block size in bytes (min 4), has to be a power of 2 + READ_ONLY => READ_ONLY -- cache is read-only (for host) + ) + port map ( + -- global control -- + rstn_i => rstn_i, -- global reset, async, low-active + clk_i => clk_i, -- global clock, rising edge + -- management -- + inval_i => cache_o.cmd_inv, -- make accessed block invalid + new_i => cache_o.cmd_new, -- make accessed block valid, clean and set tag + dirty_i => cache_o.cmd_dir, -- make accessed block dirty + -- 
status -- + hit_o => cache_i.sta_hit, -- cache hit + dirty_o => cache_i.sta_dir, -- accessed block is dirty + tag_o => cache_i.sta_tag, -- tag of current block (MSB-aligned) + clean_o => cache_i.sta_cln, -- cache is clean (global status) + -- cache access -- + addr_i => cache_o.addr, -- access address + we_i => cache_o.we, -- byte-wide data write enable + wdata_i => cache_o.data, -- write data + rdata_o => cache_i.data -- read data + ); + +end neorv32_cache_rtl; -- ================================================================================ -- @@ -547,24 +441,22 @@ entity neorv32_cache_memory is ); port ( -- global control -- - rstn_i : in std_ulogic; -- global reset, async, low-active - clk_i : in std_ulogic; -- global clock, rising edge + rstn_i : in std_ulogic; -- global reset, async, low-active + clk_i : in std_ulogic; -- global clock, rising edge -- management -- - inval_i : in std_ulogic; -- make accessed block invalid - new_i : in std_ulogic; -- make accessed block valid, clean and set tag - dirty_i : in std_ulogic; -- make accessed block dirty + inval_i : in std_ulogic; -- make accessed block invalid + new_i : in std_ulogic; -- make accessed block valid, clean and set tag + dirty_i : in std_ulogic; -- make accessed block dirty -- status -- - hit_o : out std_ulogic; -- cache hit - dirty_o : out std_ulogic; -- accessed block is dirty - base_o : out std_ulogic_vector(31 downto 0); -- base address of current block + hit_o : out std_ulogic; -- cache hit + dirty_o : out std_ulogic; -- accessed block is dirty + tag_o : out std_ulogic_vector(31 downto 0); -- tag of current block (MSB-aligned) + clean_o : out std_ulogic; -- cache is clean (global status) -- cache access -- - addr_i : in std_ulogic_vector(31 downto 0); -- access address - we_i : in std_ulogic_vector(3 downto 0); -- byte-wide data write enable - swe_i : in std_ulogic; -- status write enable - wdata_i : in std_ulogic_vector(31 downto 0); -- write data - wstat_i : in std_ulogic; -- write status - 
rdata_o : out std_ulogic_vector(31 downto 0); -- read data - rstat_o : out std_ulogic -- read status + addr_i : in std_ulogic_vector(31 downto 0); -- access address + we_i : in std_ulogic_vector(3 downto 0); -- byte-wide data write enable + wdata_i : in std_ulogic_vector(31 downto 0); -- write data + rdata_o : out std_ulogic_vector(31 downto 0) -- read data ); end neorv32_cache_memory; @@ -576,26 +468,21 @@ architecture neorv32_cache_memory_rtl of neorv32_cache_memory is constant tag_size_c : natural := 32 - (offset_size_c + index_size_c + 2); -- 2 additional bits for byte offset -- status flag memory -- - signal valid_mem, dirty_mem : std_ulogic_vector(NUM_BLOCKS-1 downto 0); + signal valid_mem, dirty_mem : std_ulogic_vector(NUM_BLOCKS-1 downto 0); signal valid_mem_rd, dirty_mem_rd : std_ulogic; -- tag memory -- type tag_mem_t is array (0 to NUM_BLOCKS-1) of std_ulogic_vector(tag_size_c-1 downto 0); - signal tag_mem : tag_mem_t; + signal tag_mem : tag_mem_t; signal tag_mem_rd : std_ulogic_vector(tag_size_c-1 downto 0); -- cache data memory -- type data_mem_t is array (0 to (NUM_BLOCKS * (BLOCK_SIZE/4))-1) of std_ulogic_vector(7 downto 0); signal data_mem_b0, data_mem_b1, data_mem_b2, data_mem_b3 : data_mem_t; -- byte-wide sub-memories - signal data_mem_rd : std_ulogic_vector(31 downto 0); - - -- cache data status memory (used for the bus error response - just mark individual words as faults and not the entire block) -- - signal stat_mem : std_ulogic_vector((NUM_BLOCKS * (BLOCK_SIZE/4))-1 downto 0); - signal stat_mem_rd : std_ulogic; -- access address decomposition -- - signal acc_tag, acc_tag_ff : std_ulogic_vector(tag_size_c-1 downto 0); - signal acc_idx, acc_idx_ff : std_ulogic_vector(index_size_c-1 downto 0); + signal acc_tag : std_ulogic_vector(tag_size_c-1 downto 0); + signal acc_idx : std_ulogic_vector(index_size_c-1 downto 0); signal acc_off : std_ulogic_vector(offset_size_c-1 downto 0); signal acc_adr : std_ulogic_vector((index_size_c+offset_size_c)-1 
downto 0); @@ -608,26 +495,16 @@ begin acc_off <= addr_i(2+(offset_size_c-1) downto 2); acc_adr <= acc_idx & acc_off; - -- access buffer (tag + index) -- - access_buffer: process(rstn_i, clk_i) - begin - if (rstn_i = '0') then - acc_tag_ff <= (others => '0'); - acc_idx_ff <= (others => '0'); - elsif rising_edge(clk_i) then - acc_tag_ff <= acc_tag; - acc_idx_ff <= acc_idx; - end if; - end process access_buffer; - -- Status Flag Memory --------------------------------------------------------------------- -- ------------------------------------------------------------------------------------------- status_memory: process(rstn_i, clk_i) begin if (rstn_i = '0') then - valid_mem <= (others => '0'); - dirty_mem <= (others => '0'); + valid_mem <= (others => '0'); + dirty_mem <= (others => '0'); + valid_mem_rd <= '0'; + dirty_mem_rd <= '0'; elsif rising_edge(clk_i) then if (new_i = '1') then -- set new block valid_mem(to_integer(unsigned(acc_idx))) <= '1'; -- valid @@ -636,7 +513,7 @@ begin if (inval_i = '1') then -- invalidate current block valid_mem(to_integer(unsigned(acc_idx))) <= '0'; end if; - if (dirty_i = '1') then -- make current block dirty + if (dirty_i = '1') and (READ_ONLY = false) then -- make current block dirty dirty_mem(to_integer(unsigned(acc_idx))) <= '1'; end if; end if; @@ -659,16 +536,27 @@ begin end if; end process tag_memory; + -- tag of accessed block -- + tag_o(31 downto 31-(tag_size_c-1)) <= tag_mem_rd; + tag_o(31-tag_size_c downto 0) <= (others => '0'); + -- Access Status (1 Cycle Latency) -------------------------------------------------------- -- ------------------------------------------------------------------------------------------- - hit_o <= '1' when (valid_mem_rd = '1') and (tag_mem_rd = acc_tag_ff) else '0'; -- cache access hit - dirty_o <= '1' when (valid_mem_rd = '1') and (dirty_mem_rd = '1') and (not READ_ONLY) else '0'; -- accessed block is dirty + hit_o <= '1' when (valid_mem_rd = '1') and (tag_mem_rd = acc_tag) else '0'; -- cache 
access hit + dirty_o <= '1' when (valid_mem_rd = '1') and (dirty_mem_rd = '1') and (READ_ONLY = false) else '0'; -- block is dirty - -- base address of accessed block -- - base_o(31 downto 31-(tag_size_c-1)) <= tag_mem_rd; - base_o(31-tag_size_c downto 2+offset_size_c) <= acc_idx_ff; - base_o(2+(offset_size_c-1) downto 0) <= (others => '0'); + -- cache is clean if all blocks are invalid -- + clean_read_only: + if READ_ONLY generate + clean_o <= '1' when (or_reduce_f(valid_mem) = '0') else '0'; + end generate; + + -- cache is clean if all valid blocks are clean -- + clean_read_write: + if not READ_ONLY generate + clean_o <= '1' when (or_reduce_f(valid_mem and dirty_mem) = '0') else '0'; + end generate; -- Cache Data Memory ---------------------------------------------------------------------- @@ -689,287 +577,13 @@ begin if (we_i(3) = '1') then data_mem_b3(to_integer(unsigned(acc_adr))) <= wdata_i(31 downto 24); end if; - if (swe_i = '1') then - stat_mem(to_integer(unsigned(acc_adr))) <= wstat_i; - end if; -- read access -- - data_mem_rd(07 downto 00) <= data_mem_b0(to_integer(unsigned(acc_adr))); - data_mem_rd(15 downto 08) <= data_mem_b1(to_integer(unsigned(acc_adr))); - data_mem_rd(23 downto 16) <= data_mem_b2(to_integer(unsigned(acc_adr))); - data_mem_rd(31 downto 24) <= data_mem_b3(to_integer(unsigned(acc_adr))); - stat_mem_rd <= stat_mem(to_integer(unsigned(acc_adr))); + rdata_o(7 downto 0) <= data_mem_b0(to_integer(unsigned(acc_adr))); + rdata_o(15 downto 8) <= data_mem_b1(to_integer(unsigned(acc_adr))); + rdata_o(23 downto 16) <= data_mem_b2(to_integer(unsigned(acc_adr))); + rdata_o(31 downto 24) <= data_mem_b3(to_integer(unsigned(acc_adr))); end if; end process cache_mem_access; - -- read-data + status -- - rdata_o <= data_mem_rd; - rstat_o <= stat_mem_rd and valid_mem_rd; - end neorv32_cache_memory_rtl; - - --- ================================================================================ -- --- NEORV32 CPU - Generic Cache: Bus Interface Unit -- --- 
-------------------------------------------------------------------------------- -- --- The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 -- --- Copyright (c) NEORV32 contributors. -- --- Copyright (c) 2020 - 2025 Stephan Nolting. All rights reserved. -- --- Licensed under the BSD-3-Clause license, see LICENSE for details. -- --- SPDX-License-Identifier: BSD-3-Clause -- --- ================================================================================ -- - -library ieee; -use ieee.std_logic_1164.all; -use ieee.numeric_std.all; - -library neorv32; -use neorv32.neorv32_package.all; - -entity neorv32_cache_bus is - generic ( - NUM_BLOCKS : natural; -- number of blocks (min 2), has to be a power of 2 - BLOCK_SIZE : natural; -- block size in bytes (min 4), has to be a power of 2 - READ_ONLY : boolean -- read-only bus accesses - ); - port ( - -- global control -- - rstn_i : in std_ulogic; -- global reset, async, low-active - clk_i : in std_ulogic; -- global clock, rising edge - -- host access port -- - host_req_i : in bus_req_t; -- request - -- bus access port -- - bus_req_o : out bus_req_t; -- request - bus_rsp_i : in bus_rsp_t; -- response - -- operation interface -- - cmd_sync_i : in std_ulogic; -- sync cache and main memory - cmd_miss_i : in std_ulogic; -- cache miss - cmd_busy_o : out std_ulogic; -- bus operation in progress - -- cache status interface -- - inval_o : out std_ulogic; -- invalidate accessed block - new_o : out std_ulogic; -- set new cache entry - dirty_i : in std_ulogic; -- accessed block is dirty - base_i : in std_ulogic_vector(31 downto 0); -- base address of accessed block - -- cache data interface -- - addr_o : out std_ulogic_vector(31 downto 0); -- access address - we_o : out std_ulogic_vector(3 downto 0); -- byte-wide data write enable - swe_o : out std_ulogic; -- status write enable - wdata_o : out std_ulogic_vector(31 downto 0); -- write data - wstat_o : out std_ulogic; -- write status - rdata_i : in std_ulogic_vector(31 
downto 0) -- read data - ); -end neorv32_cache_bus; - -architecture neorv32_cache_bus_rtl of neorv32_cache_bus is - - -- cache layout -- - constant offset_size_c : natural := index_size_f(BLOCK_SIZE/4); -- WORD offset! - constant index_size_c : natural := index_size_f(NUM_BLOCKS); - constant tag_size_c : natural := 32 - (offset_size_c + index_size_c + 2); - - -- control fsm -- - type state_t is (S_IDLE, S_CHECK, S_DOWNLOAD_REQ, S_DOWNLOAD_RSP, S_UPLOAD_GET, - S_UPLOAD_REQ, S_UPLOAD_RSP, S_FLUSH_START, S_FLUSH_READ, S_FLUSH_CHECK); - signal state, upret, state_nxt, upret_nxt: state_t; - - -- address generator -- - type addr_t is record - tag : std_ulogic_vector(tag_size_c-1 downto 0); - idx : std_ulogic_vector(index_size_c-1 downto 0); - ofs : std_ulogic_vector(offset_size_c-1 downto 0); -- WORD offset! - end record; - signal haddr, baddr, addr, addr_nxt : addr_t; - -begin - - -- Address Decomposition ------------------------------------------------------------------ - -- ------------------------------------------------------------------------------------------- - -- base address of original host access -- - haddr.tag <= host_req_i.addr(31 downto (32-tag_size_c)); - haddr.idx <= (others => '0'); -- unused - haddr.ofs <= (others => '0'); -- unused - - -- base address of indexed cache block -- - baddr.tag <= base_i(31 downto (32-tag_size_c)); - baddr.idx <= base_i((offset_size_c+2+index_size_c)-1 downto offset_size_c+2); - baddr.ofs <= (others => '0'); -- unused - - - -- Control Engine FSM Sync ---------------------------------------------------------------- - -- ------------------------------------------------------------------------------------------- - ctrl_engine_sync: process(rstn_i, clk_i) - begin - if (rstn_i = '0') then - state <= S_IDLE; - upret <= S_IDLE; - addr.tag <= (others => '0'); - addr.idx <= (others => '0'); - addr.ofs <= (others => '0'); - elsif rising_edge(clk_i) then - state <= state_nxt; - upret <= upret_nxt; - addr <= addr_nxt; - end if; - end 
process ctrl_engine_sync; - - - -- Control Engine FSM Comb ---------------------------------------------------------------- - -- ------------------------------------------------------------------------------------------- - ctrl_engine_comb: process(state, upret, addr, haddr, baddr, host_req_i, bus_rsp_i, cmd_sync_i, cmd_miss_i, rdata_i, dirty_i) - begin - -- control engine defaults -- - state_nxt <= state; - upret_nxt <= upret; - addr_nxt <= addr; - - -- cache access defaults -- - addr_o <= addr.tag & addr.idx & addr.ofs & "00"; -- always word-aligned - we_o <= (others => '0'); - swe_o <= '0'; - wdata_o <= bus_rsp_i.data; - wstat_o <= bus_rsp_i.err; - - -- cache command defaults -- - inval_o <= '0'; - new_o <= '0'; - - -- bus interface defaults -- - bus_req_o <= req_terminate_c; -- all-zero - bus_req_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; -- always word-aligned - bus_req_o.data <= rdata_i; - bus_req_o.ben <= (others => '1'); -- full-word writes only - bus_req_o.src <= '0'; -- cache accesses are always data accesses - bus_req_o.priv <= '0'; -- cache accesses are always "unprivileged" accesses - bus_req_o.amo <= '0'; -- cache accesses can never be an atomic memory operation - bus_req_o.amoop <= (others => '0'); -- cache accesses can never be an atomic memory operation - bus_req_o.debug <= host_req_i.debug; - if (state = S_IDLE) then - bus_req_o.sleep <= host_req_i.sleep; - else - bus_req_o.sleep <= '0'; - end if; - - -- fsm -- - case state is - - when S_IDLE => -- wait for request - -- ------------------------------------------------------------ - addr_nxt.ofs <= (others => '0'); -- align block base address for upload/download (and flush) - if (cmd_sync_i = '1') then -- cache sync - state_nxt <= S_FLUSH_START; - elsif (cmd_miss_i = '1') then -- cache miss - state_nxt <= S_CHECK; - end if; - - when S_CHECK => -- check if accessed block is dirty (cache address is still applied by host controller!) 
- -- ------------------------------------------------------------ - upret_nxt <= S_DOWNLOAD_REQ; -- go straight to S_DOWNLOAD_REQ when S_UPLOAD_GET has completed (if executed) - addr_nxt.idx <= baddr.idx; -- index of reference cache block - if (dirty_i = '1') and (not READ_ONLY) then -- block is dirty, upload first - addr_nxt.tag <= baddr.tag; -- base address (tag + index) of accessed block - state_nxt <= S_UPLOAD_GET; - else -- block is clean, download new block - addr_nxt.tag <= haddr.tag; -- base address (tag + index) of requested block - state_nxt <= S_DOWNLOAD_REQ; - end if; - - - when S_DOWNLOAD_REQ => -- download new cache block: request new word - -- ------------------------------------------------------------ - bus_req_o.rw <= '0'; -- read access - bus_req_o.stb <= '1'; -- request new transfer - state_nxt <= S_DOWNLOAD_RSP; - - when S_DOWNLOAD_RSP => -- download new cache block: wait for bus response - -- ------------------------------------------------------------ - bus_req_o.rw <= '0'; -- read access - we_o <= (others => '1'); -- cache: full-word write (write all the time until ACK/ERR) - swe_o <= '1'; -- cache: write status bit (bus error response) - new_o <= '1'; -- set new block (set tag, make valid, make clean) - if (bus_rsp_i.ack = '1') or (bus_rsp_i.err = '1') then -- wait for response - addr_nxt.ofs <= std_ulogic_vector(unsigned(addr.ofs) + 1); - if (and_reduce_f(addr.ofs) = '1') then -- block completed? 
offset will be all-zero again after block completion - state_nxt <= S_IDLE; - else -- get next word - state_nxt <= S_DOWNLOAD_REQ; - end if; - end if; - - - when S_UPLOAD_GET => -- upload dirty cache block: read word from cache - -- ------------------------------------------------------------ - if READ_ONLY then - state_nxt <= S_IDLE; - else - bus_req_o.rw <= '1'; -- write access - state_nxt <= S_UPLOAD_REQ; - end if; - - when S_UPLOAD_REQ => -- upload dirty cache block: request bus write - -- ------------------------------------------------------------ - if READ_ONLY then - state_nxt <= S_IDLE; - else - bus_req_o.rw <= '1'; -- write access - bus_req_o.stb <= '1'; -- request new transfer - state_nxt <= S_UPLOAD_RSP; - end if; - - when S_UPLOAD_RSP => -- upload dirty cache block: wait for bus response - -- ------------------------------------------------------------ - if READ_ONLY then - state_nxt <= S_IDLE; - else - bus_req_o.rw <= '1'; -- write access - new_o <= '1'; -- set new block (set tag, make valid, make clean) - if (bus_rsp_i.ack = '1') or (bus_rsp_i.err = '1') then -- wait for response - addr_nxt.ofs <= std_ulogic_vector(unsigned(addr.ofs) + 1); - if (and_reduce_f(addr.ofs) = '1') then -- block completed? 
offset will be all-zero again after block completion - state_nxt <= upret; -- go back to "upload-done return state" - else -- get next word - state_nxt <= S_UPLOAD_GET; - end if; - end if; - end if; - - - when S_FLUSH_START => -- start checking for dirty blocks - -- ------------------------------------------------------------ - addr_nxt.idx <= (others => '0'); -- start with index 0 - bus_req_o.fence <= bool_to_ulogic_f(READ_ONLY); -- forward fence request - upret_nxt <= S_FLUSH_CHECK; -- come back to S_FLUSH_CHECK after block upload - state_nxt <= S_FLUSH_READ; - - when S_FLUSH_READ => -- cache read access latency cycle - -- ------------------------------------------------------------ - state_nxt <= S_FLUSH_CHECK; - - when S_FLUSH_CHECK => -- check if currently indexed block is dirty - -- ------------------------------------------------------------ - addr_nxt.tag <= baddr.tag; -- tag of currently index block - inval_o <= '1'; -- invalidate currently index block - if (dirty_i = '1') and (not READ_ONLY) then -- block dirty? - state_nxt <= S_UPLOAD_GET; - else -- move on to next block - addr_nxt.idx <= std_ulogic_vector(unsigned(addr.idx) + 1); - if (and_reduce_f(addr.idx) = '1') then -- all blocks done? 
- bus_req_o.fence <= not bool_to_ulogic_f(READ_ONLY); -- forward fence request - state_nxt <= S_IDLE; - else -- go to next block - state_nxt <= S_FLUSH_READ; - end if; - end if; - - - when others => -- undefined - -- ------------------------------------------------------------ - state_nxt <= S_IDLE; - - end case; - end process ctrl_engine_comb; - - -- bus arbiter operation in progress -- - cmd_busy_o <= '0' when (state = S_IDLE) else '1'; - - -end neorv32_cache_bus_rtl; diff --git a/rtl/core/neorv32_cpu.vhd b/rtl/core/neorv32_cpu.vhd index 7e59f46bb..6312e3843 100644 --- a/rtl/core/neorv32_cpu.vhd +++ b/rtl/core/neorv32_cpu.vhd @@ -47,7 +47,7 @@ entity neorv32_cpu is RISCV_ISA_Zkne : boolean; -- implement cryptography NIST AES encryption extension RISCV_ISA_Zknh : boolean; -- implement cryptography NIST hash extension RISCV_ISA_Zksed : boolean; -- implement ShangMi hash extension - RISCV_ISA_Zksh : boolean; -- implement ShangMi block cypher extension + RISCV_ISA_Zksh : boolean; -- implement ShangMi block cipher extension RISCV_ISA_Zmmul : boolean; -- implement multiply-only M sub-extension RISCV_ISA_Zxcfu : boolean; -- implement custom (instr.) 
functions unit RISCV_ISA_Sdext : boolean; -- implement external debug mode extension @@ -69,23 +69,25 @@ entity neorv32_cpu is ); port ( -- global control -- - clk_i : in std_ulogic; -- global clock, rising edge - rstn_i : in std_ulogic; -- global reset, low-active, async + clk_i : in std_ulogic; -- global clock, rising edge + rstn_i : in std_ulogic; -- global reset, low-active, async -- interrupts -- - msi_i : in std_ulogic; -- risc-v machine software interrupt - mei_i : in std_ulogic; -- risc-v machine external interrupt - mti_i : in std_ulogic; -- risc-v machine timer interrupt - firq_i : in std_ulogic_vector(15 downto 0); -- custom fast interrupts - dbi_i : in std_ulogic; -- risc-v debug halt request interrupt + msi_i : in std_ulogic; -- risc-v machine software interrupt + mei_i : in std_ulogic; -- risc-v machine external interrupt + mti_i : in std_ulogic; -- risc-v machine timer interrupt + firq_i : in std_ulogic_vector(15 downto 0); -- custom fast interrupts + dbi_i : in std_ulogic; -- risc-v debug halt request interrupt -- inter-core communication links -- - icc_tx_o : out icc_t; -- TX links - icc_rx_i : in icc_t; -- RX links + icc_tx_o : out icc_t; -- TX links + icc_rx_i : in icc_t; -- RX links -- instruction bus interface -- - ibus_req_o : out bus_req_t; -- request bus - ibus_rsp_i : in bus_rsp_t; -- response bus + ibus_req_o : out bus_req_t; -- request bus + ibus_rsp_i : in bus_rsp_t; -- response bus -- data bus interface -- - dbus_req_o : out bus_req_t; -- request bus - dbus_rsp_i : in bus_rsp_t -- response bus + dbus_req_o : out bus_req_t; -- request bus + dbus_rsp_i : in bus_rsp_t; -- response bus + -- memory synchronization -- + mem_sync_i : in std_ulogic -- synchronization operation done ); end neorv32_cpu; @@ -238,7 +240,7 @@ begin RISCV_ISA_Zkne => RISCV_ISA_Zkne, -- implement cryptography NIST AES encryption extension RISCV_ISA_Zknh => RISCV_ISA_Zknh, -- implement cryptography NIST hash extension RISCV_ISA_Zks => riscv_zks_c, -- ShangMi algorithm 
suite available - RISCV_ISA_Zksed => RISCV_ISA_Zksed, -- implement ShangMi block cypher extension + RISCV_ISA_Zksed => RISCV_ISA_Zksed, -- implement ShangMi block cipher extension RISCV_ISA_Zksh => RISCV_ISA_Zksh, -- implement ShangMi hash extension RISCV_ISA_Zkt => riscv_zkt_c, -- data-independent execution time available (for cryptographic operations) RISCV_ISA_Zmmul => RISCV_ISA_Zmmul, -- implement multiply-only M sub-extension @@ -289,7 +291,9 @@ begin -- load/store unit interface -- lsu_wait_i => lsu_wait, -- wait for data bus lsu_mar_i => lsu_mar, -- memory address register - lsu_err_i => lsu_err -- alignment/access errors + lsu_err_i => lsu_err, -- alignment/access errors + -- memory synchronization -- + mem_sync_i => mem_sync_i -- synchronization operation done ); -- RISC-V machine interrupts -- diff --git a/rtl/core/neorv32_cpu_control.vhd b/rtl/core/neorv32_cpu_control.vhd index 53e7b5dc1..a163fd24f 100644 --- a/rtl/core/neorv32_cpu_control.vhd +++ b/rtl/core/neorv32_cpu_control.vhd @@ -106,7 +106,9 @@ entity neorv32_cpu_control is -- load/store unit interface -- lsu_wait_i : in std_ulogic; -- wait for data bus lsu_mar_i : in std_ulogic_vector(XLEN-1 downto 0); -- memory address register - lsu_err_i : in std_ulogic_vector(3 downto 0) -- alignment/access errors + lsu_err_i : in std_ulogic_vector(3 downto 0); -- alignment/access errors + -- memory synchronization -- + mem_sync_i : in std_ulogic -- synchronization operation done ); end neorv32_cpu_control; @@ -153,7 +155,7 @@ architecture neorv32_cpu_control_rtl of neorv32_cpu_control is -- instruction execution engine -- type exe_engine_state_t is (EX_DISPATCH, EX_TRAP_ENTER, EX_TRAP_EXIT, EX_RESTART, EX_SLEEP, EX_EXECUTE, - EX_ALU_WAIT, EX_BRANCH, EX_BRANCHED, EX_SYSTEM, EX_MEM_REQ, EX_MEM_RSP); + EX_ALU_WAIT, EX_FENCE, EX_BRANCH, EX_BRANCHED, EX_SYSTEM, EX_MEM_REQ, EX_MEM_RSP); type exe_engine_t is record state : exe_engine_state_t; ir : std_ulogic_vector(31 downto 0); -- instruction word being executed 
right now @@ -161,6 +163,7 @@ architecture neorv32_cpu_control_rtl of neorv32_cpu_control is pc : std_ulogic_vector(XLEN-1 downto 0); -- current PC (current instruction) pc2 : std_ulogic_vector(XLEN-1 downto 0); -- next PC (next linear instruction) ra : std_ulogic_vector(XLEN-1 downto 0); -- return address + msync : std_ulogic; -- memory synchronization completed end record; signal exe_engine, exe_engine_nxt : exe_engine_t; @@ -308,7 +311,7 @@ begin fetch_engine.state <= IF_RESTART; fetch_engine.restart <= '1'; -- reset IPB and issue engine fetch_engine.pc <= (others => '0'); - fetch_engine.priv <= '0'; + fetch_engine.priv <= priv_mode_m_c; elsif rising_edge(clk_i) then case fetch_engine.state is @@ -364,16 +367,15 @@ begin ipb.we(1) <= '1' when (fetch_engine.state = IF_PENDING) and (fetch_engine.resp = '1') else '0'; -- bus access meta data -- - ibus_req_o.priv <= fetch_engine.priv; -- current effective privilege level ibus_req_o.data <= (others => '0'); -- read-only ibus_req_o.ben <= (others => '0'); -- read-only ibus_req_o.rw <= '0'; -- read-only - ibus_req_o.src <= '1'; -- source = instruction fetch + ibus_req_o.src <= '1'; -- always "instruction fetch" access + ibus_req_o.priv <= fetch_engine.priv; -- current effective privilege level + ibus_req_o.debug <= debug_ctrl.run; -- debug mode, valid without STB being set ibus_req_o.amo <= '0'; -- cannot be an atomic memory operation ibus_req_o.amoop <= (others => '0'); -- cannot be an atomic memory operation ibus_req_o.fence <= ctrl.if_fence; -- fence operation, valid without STB being set - ibus_req_o.sleep <= sleep_mode; -- sleep mode, valid without STB being set - ibus_req_o.debug <= debug_ctrl.run; -- debug mode, valid without STB being set -- Instruction Prefetch Buffer (FIFO) ----------------------------------------------------- @@ -555,6 +557,7 @@ begin exe_engine.pc <= BOOT_ADDR(XLEN-1 downto 2) & "00"; -- 32-bit-aligned boot address exe_engine.pc2 <= BOOT_ADDR(XLEN-1 downto 2) & "00"; -- 32-bit-aligned boot 
address exe_engine.ra <= (others => '0'); + exe_engine.msync <= '0'; elsif rising_edge(clk_i) then ctrl <= ctrl_nxt; exe_engine <= exe_engine_nxt; @@ -573,7 +576,7 @@ begin -- Execute Engine FSM Comb ---------------------------------------------------------------- -- ------------------------------------------------------------------------------------------- execute_engine_fsm_comb: process(exe_engine, debug_ctrl, trap_ctrl, hw_trigger_match, opcode, issue_engine, csr, - ctrl, alu_cp_done_i, lsu_wait_i, alu_add_i, branch_taken, pmp_fault_i) + ctrl, alu_cp_done_i, lsu_wait_i, alu_add_i, branch_taken, pmp_fault_i, mem_sync_i) variable funct3_v : std_ulogic_vector(2 downto 0); variable funct7_v : std_ulogic_vector(6 downto 0); begin @@ -588,6 +591,7 @@ begin exe_engine_nxt.pc <= exe_engine.pc; exe_engine_nxt.pc2 <= exe_engine.pc2; exe_engine_nxt.ra <= (others => '0'); -- output zero if not a branch instruction + exe_engine_nxt.msync <= mem_sync_i and (not ctrl.lsu_fence); issue_engine.ack <= '0'; fetch_engine.reset <= '0'; trap_ctrl.env_enter <= '0'; @@ -752,9 +756,8 @@ begin -- memory fence operations (execute even if illegal funct3) -- when opcode_fence_c => - ctrl_nxt.if_fence <= exe_engine.ir(instr_funct3_lsb_c); -- fence.i - ctrl_nxt.lsu_fence <= not exe_engine.ir(instr_funct3_lsb_c); -- fence - exe_engine_nxt.state <= EX_RESTART; -- reset instruction fetch + IPB (actually only required for fence.i) + ctrl_nxt.lsu_fence <= '1'; -- load/store fence (always executed) + exe_engine_nxt.state <= EX_FENCE; -- FPU: floating-point operations -- when opcode_fop_c => @@ -785,6 +788,17 @@ begin exe_engine_nxt.state <= EX_DISPATCH; end if; + when EX_FENCE => -- wait for LOAD/STORE memory synchronization + -- ------------------------------------------------------------ + if (exe_engine.msync = '1') then -- wait for pending synchronization request to complete + if (exe_engine.ir(instr_funct3_lsb_c) = '0') then -- fence + exe_engine_nxt.state <= EX_DISPATCH; + else -- fence.i + 
ctrl_nxt.if_fence <= '1'; -- instruction-fetch fence + exe_engine_nxt.state <= EX_RESTART; -- reset instruction fetch + IPB + end if; + end if; + when EX_BRANCH => -- update next PC on taken branches and jumps -- ------------------------------------------------------------ exe_engine_nxt.ra <= exe_engine.pc2(XLEN-1 downto 1) & '0'; -- output return address diff --git a/rtl/core/neorv32_cpu_lsu.vhd b/rtl/core/neorv32_cpu_lsu.vhd index 59a0907f2..9f7c37bbd 100644 --- a/rtl/core/neorv32_cpu_lsu.vhd +++ b/rtl/core/neorv32_cpu_lsu.vhd @@ -78,6 +78,7 @@ begin if (rstn_i = '0') then dbus_req_o.rw <= '0'; dbus_req_o.priv <= priv_mode_m_c; + dbus_req_o.debug <= '0'; dbus_req_o.amo <= '0'; dbus_req_o.amoop <= (others => '0'); dbus_req_o.data <= (others => '0'); @@ -87,6 +88,7 @@ begin -- type identifiers -- dbus_req_o.rw <= ctrl_i.lsu_rw; -- read/write dbus_req_o.priv <= ctrl_i.lsu_priv; -- privilege level + dbus_req_o.debug <= ctrl_i.cpu_debug; -- debug-mode access dbus_req_o.amo <= bool_to_ulogic_f(AMO_EN) and ctrl_i.ir_opcode(2); -- atomic memory operation dbus_req_o.amoop <= amo_cmd; -- data alignment + byte-enable -- @@ -108,11 +110,11 @@ begin end if; end process mem_do_reg; - dbus_req_o.src <= '0'; -- 0 = data access - dbus_req_o.fence <= ctrl_i.lsu_fence; -- out-of-band: this is valid without STB being set - dbus_req_o.sleep <= ctrl_i.cpu_sleep; -- out-of-band: this is valid without STB being set - dbus_req_o.debug <= ctrl_i.cpu_debug; -- out-of-band: this is valid without STB being set + -- hardwired signals -- + dbus_req_o.src <= '0'; -- always "data" access + -- out-of-band signals -- + dbus_req_o.fence <= ctrl_i.lsu_fence; -- atomic memory access operation encoding -- amo_encode: process(ctrl_i.ir_funct12) diff --git a/rtl/core/neorv32_dma.vhd b/rtl/core/neorv32_dma.vhd index e7c9e6248..f2de69a80 100644 --- a/rtl/core/neorv32_dma.vhd +++ b/rtl/core/neorv32_dma.vhd @@ -303,11 +303,10 @@ begin dma_req_o.addr <= engine.src_addr when (engine.state = S_READ) else
engine.dst_addr; dma_req_o.src <= '0'; -- source = data access dma_req_o.priv <= priv_mode_m_c; -- DMA accesses are always privileged + dma_req_o.debug <= '0'; -- can never ever be in debug mode dma_req_o.amo <= '0'; -- no atomic memory operation possible dma_req_o.amoop <= (others => '0'); -- no atomic memory operation possible - dma_req_o.fence <= '0'; -- no fences - dma_req_o.sleep <= '1' when (engine.state = S_IDLE) else '0'; -- idle = sleep mode - dma_req_o.debug <= '0'; -- can never ever be in debug mode + dma_req_o.fence <= '0'; -- address increment -- address_inc: process(cfg.qsel) diff --git a/rtl/core/neorv32_gpio.vhd b/rtl/core/neorv32_gpio.vhd index 2c3ba5613..c1fa1992a 100644 --- a/rtl/core/neorv32_gpio.vhd +++ b/rtl/core/neorv32_gpio.vhd @@ -144,4 +144,4 @@ begin end process irq_buffer; -end neorv32_gpio_rtl; \ No newline at end of file +end neorv32_gpio_rtl; diff --git a/rtl/core/neorv32_package.vhd b/rtl/core/neorv32_package.vhd index bcff4fcfb..853d93029 100644 --- a/rtl/core/neorv32_package.vhd +++ b/rtl/core/neorv32_package.vhd @@ -29,7 +29,7 @@ package neorv32_package is -- Architecture Constants ----------------------------------------------------------------- -- ------------------------------------------------------------------------------------------- - constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01110007"; -- hardware version + constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01110008"; -- hardware version constant archid_c : natural := 19; -- official RISC-V architecture ID constant XLEN : natural := 32; -- native data path width @@ -123,20 +123,19 @@ package neorv32_package is data : std_ulogic_vector(31 downto 0); -- write data ben : std_ulogic_vector(3 downto 0); -- byte enable stb : std_ulogic; -- request strobe, single-shot - rw : std_ulogic; -- 0=read, 1=write - src : std_ulogic; -- access source (1=instruction fetch, 0=data access) + rw : std_ulogic; -- 0 = read, 1 = write + src : std_ulogic; -- 0 = data 
access, 1 = instruction fetch priv : std_ulogic; -- set if privileged (machine-mode) access + debug : std_ulogic; -- set if debug mode access amo : std_ulogic; -- set if atomic memory operation amoop : std_ulogic_vector(3 downto 0); -- type of atomic memory operation -- out-of-band signals -- - fence : std_ulogic; -- set if fence(.i) request by upstream device, single-shot - sleep : std_ulogic; -- set if ALL upstream sources are in sleep mode - debug : std_ulogic; -- set if upstream device is in debug mode + fence : std_ulogic; -- set if fence(.i) operation, single-shot end record; -- bus response -- type bus_rsp_t is record - data : std_ulogic_vector(31 downto 0); -- read data, valid if ack=1 + data : std_ulogic_vector(31 downto 0); -- read data, valid if ack = 1 ack : std_ulogic; -- set if access acknowledge, single-shot err : std_ulogic; -- set if access error, single-shot, has priority over ack end record; @@ -150,11 +149,10 @@ package neorv32_package is rw => '0', src => '0', priv => '0', + debug => '0', amo => '0', amoop => (others => '0'), - fence => '0', - sleep => '1', - debug => '0' + fence => '0' ); -- endpoint (response) termination -- diff --git a/rtl/core/neorv32_sysinfo.vhd b/rtl/core/neorv32_sysinfo.vhd index 4986495bc..4d8b0eff5 100644 --- a/rtl/core/neorv32_sysinfo.vhd +++ b/rtl/core/neorv32_sysinfo.vhd @@ -115,7 +115,7 @@ begin sysinfo(2)(7) <= '0'; -- reserved sysinfo(2)(8) <= '1' when xcache_en_c else '0'; -- external bus interface cache implemented? sysinfo(2)(9) <= '0'; -- reserved - sysinfo(2)(10) <= '0'; -- reservedented? + sysinfo(2)(10) <= '0'; -- reserved sysinfo(2)(11) <= '1' when ocd_auth_en_c else '0'; -- on-chip debugger authentication implemented? sysinfo(2)(12) <= '1' when int_imem_rom_c else '0'; -- processor-internal instruction memory implemented as pre-initialized ROM? sysinfo(2)(13) <= '1' when IO_TWD_EN else '0'; -- two-wire device (TWD) implemented? 
diff --git a/rtl/core/neorv32_top.vhd b/rtl/core/neorv32_top.vhd index 4af56902d..6ee87077e 100644 --- a/rtl/core/neorv32_top.vhd +++ b/rtl/core/neorv32_top.vhd @@ -316,6 +316,10 @@ architecture neorv32_top_rtl of neorv32_top is signal iodev_req : iodev_req_t; signal iodev_rsp : iodev_rsp_t; + -- memory synchronization / ordering / coherence -- + signal mem_sync, dcache_clean : std_ulogic_vector(num_cores_c-1 downto 0); + signal xcache_clean : std_ulogic; + -- IRQs -- type firq_enum_t is ( FIRQ_TWD, FIRQ_UART0_RX, FIRQ_UART0_TX, FIRQ_UART1_RX, FIRQ_UART1_TX, FIRQ_SPI, FIRQ_SDI, FIRQ_TWI, @@ -542,9 +546,14 @@ begin ibus_rsp_i => cpu_i_rsp(i), -- data bus interface -- dbus_req_o => cpu_d_req(i), - dbus_rsp_i => cpu_d_rsp(i) + dbus_rsp_i => cpu_d_rsp(i), + -- memory synchronization -- + mem_sync_i => mem_sync(i) ); + -- memory synchronization (ordering / coherence) -- + mem_sync(i) <= dcache_clean(i) and xcache_clean; + -- CPU L1 Instruction Cache (I-Cache) ----------------------------------------------------- -- ------------------------------------------------------------------------------------------- @@ -555,12 +564,12 @@ begin NUM_BLOCKS => ICACHE_NUM_BLOCKS, BLOCK_SIZE => ICACHE_BLOCK_SIZE, UC_BEGIN => mem_uncached_begin_c(31 downto 28), - UC_ENABLE => true, READ_ONLY => true ) port map ( clk_i => clk_i, rstn_i => rstn_sys, + clean_o => open, -- cache is read-only so it cannot be dirty host_req_i => cpu_i_req(i), host_rsp_o => cpu_i_rsp(i), bus_req_o => icache_req(i), @@ -584,12 +593,12 @@ begin NUM_BLOCKS => DCACHE_NUM_BLOCKS, BLOCK_SIZE => DCACHE_BLOCK_SIZE, UC_BEGIN => mem_uncached_begin_c(31 downto 28), - UC_ENABLE => true, READ_ONLY => false ) port map ( clk_i => clk_i, rstn_i => rstn_sys, + clean_o => dcache_clean(i), host_req_i => cpu_d_req(i), host_rsp_o => cpu_d_rsp(i), bus_req_o => dcache_req(i), @@ -599,8 +608,9 @@ begin neorv32_dcache_disabled: if not DCACHE_EN generate - dcache_req(i) <= cpu_d_req(i); - cpu_d_rsp(i) <= dcache_rsp(i); + 
dcache_clean(i) <= '1'; + dcache_req(i) <= cpu_d_req(i); + cpu_d_rsp(i) <= dcache_rsp(i); end generate; @@ -613,15 +623,14 @@ begin PORT_B_READ_ONLY => true -- instruction fetch is read-only ) port map ( - clk_i => clk_i, - rstn_i => rstn_sys, - a_lock_i => '0', -- no exclusive accesses - a_req_i => dcache_req(i), -- data accesses are prioritized - a_rsp_o => dcache_rsp(i), - b_req_i => icache_req(i), - b_rsp_o => icache_rsp(i), - x_req_o => core_req(i), - x_rsp_i => core_rsp(i) + clk_i => clk_i, + rstn_i => rstn_sys, + a_req_i => dcache_req(i), -- data accesses are prioritized + a_rsp_o => dcache_rsp(i), + b_req_i => icache_req(i), + b_rsp_o => icache_rsp(i), + x_req_o => core_req(i), + x_rsp_i => core_rsp(i) ); end generate; -- /core_complex @@ -647,15 +656,14 @@ begin PORT_B_READ_ONLY => false ) port map ( - clk_i => clk_i, - rstn_i => rstn_sys, - a_lock_i => '0', - a_req_i => core_req(core_req'left), - a_rsp_o => core_rsp(core_rsp'left), - b_req_i => core_req(core_req'right), - b_rsp_o => core_rsp(core_rsp'right), - x_req_o => sys1_req, - x_rsp_i => sys1_rsp + clk_i => clk_i, + rstn_i => rstn_sys, + a_req_i => core_req(core_req'left), + a_rsp_o => core_rsp(core_rsp'left), + b_req_i => core_req(core_req'right), + b_rsp_o => core_rsp(core_rsp'right), + x_req_o => sys1_req, + x_rsp_i => sys1_rsp ); end generate; @@ -697,15 +705,14 @@ begin PORT_B_READ_ONLY => false ) port map ( - clk_i => clk_i, - rstn_i => rstn_sys, - a_lock_i => '0', -- no exclusive accesses - a_req_i => sys1_req, -- prioritized - a_rsp_o => sys1_rsp, - b_req_i => dma_req, - b_rsp_o => dma_rsp, - x_req_o => sys2_req, - x_rsp_i => sys2_rsp + clk_i => clk_i, + rstn_i => rstn_sys, + a_req_i => sys1_req, -- prioritized + a_rsp_o => sys1_rsp, + b_req_i => dma_req, + b_rsp_o => dma_rsp, + x_req_o => sys2_req, + x_rsp_i => sys2_rsp ); end generate; -- /neorv32_dma_complex_enabled @@ -876,12 +883,12 @@ begin NUM_BLOCKS => XBUS_CACHE_NUM_BLOCKS, BLOCK_SIZE => XBUS_CACHE_BLOCK_SIZE, UC_BEGIN => 
mem_uncached_begin_c(31 downto 28), - UC_ENABLE => true, READ_ONLY => false ) port map ( clk_i => clk_i, rstn_i => rstn_sys, + clean_o => xcache_clean, host_req_i => xbus_req, host_rsp_o => xbus_rsp, bus_req_o => xcache_req, @@ -891,22 +898,25 @@ begin neorv32_xcache_disabled: if not XBUS_CACHE_EN generate - xcache_req <= xbus_req; - xbus_rsp <= xcache_rsp; + xcache_clean <= '1'; + xcache_req <= xbus_req; + xbus_rsp <= xcache_rsp; end generate; end generate; -- /neorv32_xbus_enabled neorv32_xbus_disabled: if not XBUS_EN generate - xbus_rsp <= rsp_terminate_c; - xbus_adr_o <= (others => '0'); - xbus_dat_o <= (others => '0'); - xbus_tag_o <= (others => '0'); - xbus_we_o <= '0'; - xbus_sel_o <= (others => '0'); - xbus_stb_o <= '0'; - xbus_cyc_o <= '0'; + xcache_clean <= '1'; + xcache_req <= req_terminate_c; + xbus_rsp <= rsp_terminate_c; + xbus_adr_o <= (others => '0'); + xbus_dat_o <= (others => '0'); + xbus_tag_o <= (others => '0'); + xbus_we_o <= '0'; + xbus_sel_o <= (others => '0'); + xbus_stb_o <= '0'; + xbus_cyc_o <= '0'; end generate; end generate; -- /memory_system diff --git a/rtl/core/neorv32_wdt.vhd b/rtl/core/neorv32_wdt.vhd index b4827bc66..8c1fb0b9d 100644 --- a/rtl/core/neorv32_wdt.vhd +++ b/rtl/core/neorv32_wdt.vhd @@ -3,7 +3,7 @@ -- -------------------------------------------------------------------------------- -- -- The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 -- -- Copyright (c) NEORV32 contributors. -- --- Copyright (c) 2020 - 2024 Stephan Nolting. All rights reserved. -- +-- Copyright (c) 2020 - 2025 Stephan Nolting. All rights reserved. -- -- Licensed under the BSD-3-Clause license, see LICENSE for details. 
-- -- SPDX-License-Identifier: BSD-3-Clause -- -- ================================================================================ -- @@ -37,12 +37,10 @@ architecture neorv32_wdt_rtl of neorv32_wdt is -- Control register bits -- constant ctrl_enable_c : natural := 0; -- r/w: WDT enable constant ctrl_lock_c : natural := 1; -- r/w: lock write access to control register when set - constant ctrl_dben_c : natural := 2; -- r/w: allow WDT to continue operation even when CPU is in debug mode - constant ctrl_sen_c : natural := 3; -- r/w: allow WDT to continue operation even when CPU is in sleep mode - constant ctrl_strict_c : natural := 4; -- r/w: force hardware reset if reset password is incorrect or if access to locked config - constant ctrl_rcause_lo_c : natural := 5; -- r/-: cause of last system reset - low - constant ctrl_rcause_hi_c : natural := 6; -- r/-: cause of last system reset - high ---constant ctrl_reserved_c : natural := 7; -- r/-: reserved + constant ctrl_strict_c : natural := 2; -- r/w: force hardware reset if reset password is incorrect or if access to locked config + constant ctrl_rcause_lo_c : natural := 3; -- r/-: cause of last system reset - low + constant ctrl_rcause_hi_c : natural := 4; -- r/-: cause of last system reset - high + -- constant ctrl_timeout_lsb_c : natural := 8; -- r/w: timeout value LSB constant ctrl_timeout_msb_c : natural := 31; -- r/w: timeout value MSB @@ -50,8 +48,6 @@ architecture neorv32_wdt_rtl of neorv32_wdt is type ctrl_t is record enable : std_ulogic; lock : std_ulogic; - dben : std_ulogic; - sen : std_ulogic; strict : std_ulogic; timeout : std_ulogic_vector(23 downto 0); end record; @@ -61,7 +57,6 @@ architecture neorv32_wdt_rtl of neorv32_wdt is signal cnt : std_ulogic_vector(23 downto 0); -- timeout counter signal cnt_started : std_ulogic; -- set when timeout counter has started signal cnt_inc : std_ulogic; -- increment counter when set - signal cnt_inc_ff : std_ulogic; signal cnt_timeout : std_ulogic; -- counter matches 
programmed timeout value signal reset_cause : std_ulogic_vector(1 downto 0); -- cause of last reset signal hw_rst_timeout : std_ulogic; -- trigger reset because of timeout @@ -79,8 +74,6 @@ begin bus_rsp_o <= rsp_terminate_c; ctrl.enable <= '0'; -- disable WDT after reset ctrl.lock <= '0'; -- unlock after reset - ctrl.dben <= '0'; - ctrl.sen <= '0'; ctrl.strict <= '0'; ctrl.timeout <= (others => '0'); reset_wdt <= '0'; @@ -100,8 +93,6 @@ begin if (ctrl.lock = '0') then -- update configuration only if not locked ctrl.enable <= bus_req_i.data(ctrl_enable_c); ctrl.lock <= bus_req_i.data(ctrl_lock_c) and ctrl.enable; -- lock only if already enabled - ctrl.dben <= bus_req_i.data(ctrl_dben_c); - ctrl.sen <= bus_req_i.data(ctrl_sen_c); ctrl.strict <= bus_req_i.data(ctrl_strict_c); ctrl.timeout <= bus_req_i.data(ctrl_timeout_msb_c downto ctrl_timeout_lsb_c); else -- write access attempt to locked CTRL register @@ -117,8 +108,6 @@ begin else -- read access bus_rsp_o.data(ctrl_enable_c) <= ctrl.enable; bus_rsp_o.data(ctrl_lock_c) <= ctrl.lock; - bus_rsp_o.data(ctrl_dben_c) <= ctrl.dben; - bus_rsp_o.data(ctrl_sen_c) <= ctrl.sen; bus_rsp_o.data(ctrl_rcause_hi_c downto ctrl_rcause_lo_c) <= reset_cause; bus_rsp_o.data(ctrl_strict_c) <= ctrl.strict; bus_rsp_o.data(ctrl_timeout_msb_c downto ctrl_timeout_lsb_c) <= ctrl.timeout; @@ -133,15 +122,15 @@ begin wdt_counter: process(rstn_sys_i, clk_i) begin if (rstn_sys_i = '0') then - cnt_inc_ff <= '0'; + cnt_inc <= '0'; cnt_started <= '0'; cnt <= (others => '0'); elsif rising_edge(clk_i) then - cnt_inc_ff <= cnt_inc; + cnt_inc <= prsc_tick and cnt_started; -- clock tick and started cnt_started <= ctrl.enable and (cnt_started or prsc_tick); -- start with next clock tick if (ctrl.enable = '0') or (reset_wdt = '1') then -- watchdog disabled or reset with correct password cnt <= (others => '0'); - elsif (cnt_inc_ff = '1') then + elsif (cnt_inc = '1') then cnt <= std_ulogic_vector(unsigned(cnt) + 1); end if; end if; @@ -151,11 +140,6 @@ 
begin clkgen_en_o <= ctrl.enable; -- enable clock generator prsc_tick <= clkgen_i(clk_div4096_c); -- clock enable tick - -- valid counter increment? -- - cnt_inc <= '1' when ((prsc_tick = '1') and (cnt_started = '1')) and -- clock tick and started - ((bus_req_i.debug = '0') or (ctrl.dben = '1')) and -- not in debug mode or allowed to run in debug mode - ((bus_req_i.sleep = '0') or (ctrl.sen = '1')) else '0'; -- not in sleep mode or allowed to run in sleep mode - -- timeout detector -- cnt_timeout <= '1' when (cnt_started = '1') and (cnt = ctrl.timeout) else '0'; diff --git a/rtl/core/neorv32_xbus.vhd b/rtl/core/neorv32_xbus.vhd index 0afd83d2b..cc65f37eb 100644 --- a/rtl/core/neorv32_xbus.vhd +++ b/rtl/core/neorv32_xbus.vhd @@ -150,7 +150,11 @@ begin xbus_sel_o <= bus_req.ben; xbus_stb_o <= bus_req.stb; xbus_cyc_o <= bus_req.stb or pending(1); - xbus_tag_o <= bus_req.src & '0' & bus_req.priv; -- instr/data, secure, privileged/unprivileged + + -- access meta data (compatible to AXI4 "xPROT") -- + xbus_tag_o(2) <= bus_req.src; -- 0 = data access, 1 = instruction fetch + xbus_tag_o(1) <= '0'; -- always "secure" access + xbus_tag_o(0) <= bus_req.priv or bus_req.debug; -- 0 = unprivileged access, 1 = privileged access -- response gating -- bus_rsp.data <= xbus_dat_i when (pending(1) = '1') else (others => '0'); diff --git a/rtl/file_list_soc.f b/rtl/file_list_soc.f index b66bda255..88c4ae4ec 100644 --- a/rtl/file_list_soc.f +++ b/rtl/file_list_soc.f @@ -17,8 +17,8 @@ NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_cpu_pmp.vhd NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_cpu_icc.vhd NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_cpu.vhd -NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_bus.vhd NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_cache.vhd +NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_bus.vhd NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_dma.vhd NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_application_image.vhd NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_imem.vhd diff --git 
a/sw/example/demo_wdt/main.c b/sw/example/demo_wdt/main.c index 95d7de38f..eaf617851 100644 --- a/sw/example/demo_wdt/main.c +++ b/sw/example/demo_wdt/main.c @@ -1,7 +1,7 @@ // ================================================================================ // // The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 // // Copyright (c) NEORV32 contributors. // -// Copyright (c) 2020 - 2024 Stephan Nolting. All rights reserved. // +// Copyright (c) 2020 - 2025 Stephan Nolting. All rights reserved. // // Licensed under the BSD-3-Clause license, see LICENSE for details. // // SPDX-License-Identifier: BSD-3-Clause // // ================================================================================ // @@ -82,9 +82,9 @@ int main() { return -1; } - // setup watchdog: no lock, disable in debug mode, enable in sleep mode, enable strict mode + // setup watchdog: no lock, enable strict mode neorv32_uart0_puts("Starting WDT...\n"); - neorv32_wdt_setup(timeout, 0, 0, 1, 1); + neorv32_wdt_setup(timeout, 0, 1); // feed the watchdog diff --git a/sw/example/processor_check/main.c b/sw/example/processor_check/main.c index ee47ba02f..78c01197e 100644 --- a/sw/example/processor_check/main.c +++ b/sw/example/processor_check/main.c @@ -80,8 +80,6 @@ volatile uint32_t num_hpm_cnts_global = 0; // global number of available hpms volatile int vectored_mei_handler_ack = 0; // vectored mei trap handler acknowledge volatile uint32_t gpio_trap_handler_ack = 0; // gpio trap handler acknowledge volatile uint32_t hw_brk_mscratch_ok = 0; // set when mepc was correct in trap handler - - volatile uint32_t dma_src; // dma source & destination data volatile uint32_t store_access_addr[2]; // variable to test store accesses volatile uint32_t __attribute__((aligned(4))) pmp_access[2]; // variable to test pmp @@ -281,6 +279,32 @@ int main() { } + // ---------------------------------------------------------- + // Test fence instructions + // 
---------------------------------------------------------- + neorv32_cpu_csr_write(CSR_MCAUSE, mcause_never_c); + PRINT_STANDARD("[%i] Fences ", cnt_test); + + cnt_test++; + + // test that we do no crash the core and check if cache flushing works + store_access_addr[0] = 0x01234567; + asm volatile ("fence"); + asm volatile ("fence.i"); + store_access_addr[0] += 0x76543210; + asm volatile ("fence"); + asm volatile ("fence.i"); + store_access_addr[0] += 0x11111111; + + if ((store_access_addr[0] == 0x88888888) && + (neorv32_cpu_csr_read(CSR_MCAUSE) == mcause_never_c)) { // no exception + test_ok(); + } + else { + test_fail(); + } + + // ---------------------------------------------------------- // Test standard RISC-V counters // ---------------------------------------------------------- diff --git a/sw/lib/include/neorv32_wdt.h b/sw/lib/include/neorv32_wdt.h index a7fab5c15..1ca449597 100644 --- a/sw/lib/include/neorv32_wdt.h +++ b/sw/lib/include/neorv32_wdt.h @@ -1,7 +1,7 @@ // ================================================================================ // // The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 // // Copyright (c) NEORV32 contributors. // -// Copyright (c) 2020 - 2024 Stephan Nolting. All rights reserved. // +// Copyright (c) 2020 - 2025 Stephan Nolting. All rights reserved. // // Licensed under the BSD-3-Clause license, see LICENSE for details. // // SPDX-License-Identifier: BSD-3-Clause // // ================================================================================ // @@ -9,10 +9,6 @@ /** * @file neorv32_wdt.h * @brief Watchdog Timer (WDT) HW driver header file. - * - * @note These functions should only be used if the WDT unit was synthesized (IO_WDT_EN = true). 
- * - * @see https://stnolting.github.io/neorv32/sw/files.html */ #ifndef neorv32_wdt_h @@ -38,11 +34,9 @@ typedef volatile struct __attribute__((packed,aligned(4))) { enum NEORV32_WDT_CTRL_enum { WDT_CTRL_EN = 0, /**< WDT control register(0) (r/w): Watchdog enable */ WDT_CTRL_LOCK = 1, /**< WDT control register(1) (r/w): Lock write access to control register, clears on reset only */ - WDT_CTRL_DBEN = 2, /**< WDT control register(2) (r/w): Allow WDT to continue operation even when CPU is in debug mode */ - WDT_CTRL_SEN = 3, /**< WDT control register(3) (r/w): Allow WDT to continue operation even when CPU is in sleep mode */ - WDT_CTRL_STRICT = 4, /**< WDT control register(4) (r/w): Force hardware reset if reset password is incorrect or if write attempt to locked CTRL register */ - WDT_CTRL_RCAUSE_LO = 5, /**< WDT control register(5) (r/-): Cause of last system reset - low */ - WDT_CTRL_RCAUSE_HI = 6, /**< WDT control register(5) (r/-): Cause of last system reset - high */ + WDT_CTRL_STRICT = 2, /**< WDT control register(2) (r/w): Force hardware reset if reset password is incorrect or if write attempt to locked CTRL register */ + WDT_CTRL_RCAUSE_LO = 3, /**< WDT control register(3) (r/-): Cause of last system reset - low */ + WDT_CTRL_RCAUSE_HI = 4, /**< WDT control register(4) (r/-): Cause of last system reset - high */ WDT_CTRL_TIMEOUT_LSB = 8, /**< WDT control register(8) (r/w): Timeout value, LSB */ WDT_CTRL_TIMEOUT_MSB = 31 /**< WDT control register(31) (r/w): Timeout value, MSB */ @@ -72,7 +66,7 @@ enum NEORV32_WDT_RCAUSE_enum { **************************************************************************/ /**@{*/ int neorv32_wdt_available(void); -void neorv32_wdt_setup(uint32_t timeout, int lock, int debug_en, int sleep_en, int strict); +void neorv32_wdt_setup(uint32_t timeout, int lock, int strict); int neorv32_wdt_disable(void); void neorv32_wdt_feed(uint32_t password); int neorv32_wdt_get_cause(void); diff --git a/sw/lib/source/neorv32_wdt.c 
b/sw/lib/source/neorv32_wdt.c index 58cacee71..af807dbb9 100644 --- a/sw/lib/source/neorv32_wdt.c +++ b/sw/lib/source/neorv32_wdt.c @@ -38,11 +38,9 @@ int neorv32_wdt_available(void) { * @param[in] timeout 24-bit timeout value. A WDT IRQ is triggered when the internal counter reaches * 'timeout/2'. A system hardware reset is triggered when the internal counter reaches 'timeout'. * @param[in] lock Control register will be locked when 1 (until next reset). - * @param[in] debug_en Allow watchdog to continue operation even when CPU is in debug mode. - * @param[in] sleep_en Allow watchdog to continue operation even when CPU is in sleep mode. * @param[in] strict Force hardware reset if reset password is incorrect or if trying to alter a locked configuration. **************************************************************************/ -void neorv32_wdt_setup(uint32_t timeout, int lock, int debug_en, int sleep_en, int strict) { +void neorv32_wdt_setup(uint32_t timeout, int lock, int strict) { NEORV32_WDT->CTRL = 0; // reset and disable @@ -50,8 +48,6 @@ void neorv32_wdt_setup(uint32_t timeout, int lock, int debug_en, int sleep_en, i uint32_t ctrl = 0; ctrl |= ((uint32_t)(1)) << WDT_CTRL_EN; ctrl |= ((uint32_t)(timeout & 0xffffffU)) << WDT_CTRL_TIMEOUT_LSB; - ctrl |= ((uint32_t)(debug_en & 0x1U)) << WDT_CTRL_DBEN; - ctrl |= ((uint32_t)(sleep_en & 0x1U)) << WDT_CTRL_SEN; ctrl |= ((uint32_t)(strict & 0x1U)) << WDT_CTRL_STRICT; NEORV32_WDT->CTRL = ctrl; diff --git a/sw/svd/neorv32.svd b/sw/svd/neorv32.svd index febed041b..2d60918a4 100644 --- a/sw/svd/neorv32.svd +++ b/sw/svd/neorv32.svd @@ -1404,24 +1404,14 @@ [1:1] Lock write access to control register, clears on reset (HW or WDT) only - - WDT_CTRL_DBEN - [2:2] - Allow WDT to continue operation even when in debug mode - - - WDT_CTRL_SEN - [3:3] - Allow WDT to continue operation even when in sleep mode - WDT_CTRL_STRICT - [4:4] + [2:2] Force hardware reset if reset password is incorrect or if write attempt to locked CTRL 
register WDT_CTRL_RCAUSE - [6:5] + [4:3] read-only Cause of last system reset: 0=external reset, 1=OCD reset, 2=WDT reset, 3=WDT access violation
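This patch moves the WDT CTRL fields to new bit positions: EN = 0, LOCK = 1, STRICT = 2, RCAUSE = 4:3, and TIMEOUT = 31:8. The host-side C sketch below packs a CTRL word the same way the reworked `neorv32_wdt_setup()` does (enable, timeout and strict flag). It also decodes RCAUSE using the encoding from the SVD description. This can be useful for checking the new layout without hardware.

The helpers `wdt_ctrl_word()` and `wdt_rcause()` and the `WDT_RCAUSE_*` names are hypothetical and not part of the NEORV32 software framework. Only the bit positions and the cause encoding are taken from the patch. LOCK is deliberately left out: the RTL latches it only when the WDT is already enabled (`ctrl.lock <= ... and ctrl.enable`), so it would need a separate, later write.

```c
#include <assert.h>
#include <stdint.h>

/* WDT CTRL bit positions after this patch (from NEORV32_WDT_CTRL_enum in neorv32_wdt.h). */
enum {
  WDT_CTRL_EN          = 0,  /* r/w: watchdog enable */
  WDT_CTRL_LOCK        = 1,  /* r/w: lock CTRL (latched only while already enabled) */
  WDT_CTRL_STRICT      = 2,  /* r/w: strict mode */
  WDT_CTRL_RCAUSE_LO   = 3,  /* r/-: cause of last reset, low bit */
  WDT_CTRL_RCAUSE_HI   = 4,  /* r/-: cause of last reset, high bit */
  WDT_CTRL_TIMEOUT_LSB = 8   /* r/w: 24-bit timeout in bits 31:8 */
};

/* RCAUSE must stay a contiguous 2-bit field. */
static_assert(WDT_CTRL_RCAUSE_HI == WDT_CTRL_RCAUSE_LO + 1, "RCAUSE must be a 2-bit field");

/* Hypothetical names for the RCAUSE encoding given in the SVD description. */
enum wdt_rcause {
  WDT_RCAUSE_EXTERNAL = 0, /* external reset */
  WDT_RCAUSE_OCD      = 1, /* on-chip debugger reset */
  WDT_RCAUSE_TIMEOUT  = 2, /* WDT reset */
  WDT_RCAUSE_ACCESS   = 3  /* WDT access violation */
};

/* Hypothetical helper: builds a CTRL word with enable + timeout + strict, mirroring
 * the packing in the reworked neorv32_wdt_setup(). LOCK is not set here. */
static uint32_t wdt_ctrl_word(uint32_t timeout, int strict) {
  uint32_t ctrl = 0;
  ctrl |= ((uint32_t)1) << WDT_CTRL_EN;
  ctrl |= (timeout & 0xffffffU) << WDT_CTRL_TIMEOUT_LSB; /* only 24 timeout bits are kept */
  ctrl |= ((uint32_t)(strict & 0x1)) << WDT_CTRL_STRICT;
  return ctrl;
}

/* Hypothetical helper: extracts the 2-bit RCAUSE field from a CTRL read-back value. */
static enum wdt_rcause wdt_rcause(uint32_t ctrl) {
  return (enum wdt_rcause)((ctrl >> WDT_CTRL_RCAUSE_LO) & 0x3U);
}
```

As a worked example, `wdt_ctrl_word(0x123456, 1)` gives `0x12345605`: the timeout occupies bits 31:8, EN is bit 0 and STRICT is bit 2. Going the other way, a read-back value of `0x10` decodes to `WDT_RCAUSE_TIMEOUT`.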