Significant Overhaul of the Interpreter's Timing Model #2235

Draft
wants to merge 370 commits into
base: master

Jaklyy
Contributor

@Jaklyy Jaklyy commented Dec 13, 2024

Heavily reworks the ARM9 & ARM7 timing models to greatly improve accuracy (and slaughter performance).
Builds upon my work in #2125 and uses the excellent cache implementation found in #1955, so those two should probably be merged first. (Hopefully building this PR on top of those two doesn't cause any stupid or weird issues with git. Fingers crossed.)

Implements:

  1. Cache streaming
  2. Write buffer
  3. Bus cycle rounding
  4. Main RAM contention
  5. Improvements to certain instruction timings
  6. Memory stage cycles are now distinguished from the execute stage
  7. Interlocks
  8. Improvements to memory access timings
  9. Minor improvements to DMA timings, particularly Main RAM DMA timings
  10. ARM9 now only stops for DMA when accessing the bus
  11. Fixes ExMemCnt having the incorrect default state (at least for direct boot; the non-direct boot state shouldn't matter...?). Also prevents software from toggling certain bits.
  12. Removes a few non-existent CP15 cache commands
  13. Allows stopping mid-instruction should an event occur (improves DMA start timings)
  14. Card reads now fetch one word ahead (plus some minor improvements to card read timings)
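
To illustrate what "bus cycle rounding" (item 3) means in practice, here is a minimal sketch, not the code from this PR. It assumes the ARM9 clock is a power-of-two multiple of the bus clock (2x on the DS), so an ARM9 timestamp has to be rounded up to the next bus-cycle boundary before a bus access can begin. The helper name `RoundUpToBusCycle` and the `clockShift` parameter are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper (not from this PR): round an ARM9 cycle timestamp up to
// the next bus-cycle boundary. clockShift is log2(ARM9 clock / bus clock);
// a value of 1 is assumed here for the DS, where the ARM9 runs at twice the bus clock.
constexpr std::uint64_t RoundUpToBusCycle(std::uint64_t arm9Cycles, unsigned clockShift)
{
    const std::uint64_t mask = (std::uint64_t{1} << clockShift) - 1;
    // Adding the mask and then clearing the low bits rounds up to a multiple
    // of 2^clockShift; timestamps already on a boundary are left unchanged.
    return (arm9Cycles + mask) & ~mask;
}
```

With `clockShift == 1`, an access requested at ARM9 cycle 5 would start at cycle 6, while one requested at cycle 6 would start immediately.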

Known Issues:

  1. JIT is completely broken and will most likely need a significant amount of effort to work again.
  2. Write Buffer is very approximate; it needs a lot more work to really be accurate...
  3. There are actually two different types of interlock; this PR treats all interlocks as identical, which is wrong.
  4. Most DSi stuff has either not been implemented or not been extensively tested yet.
  5. There are probably oodles of regressions, freezes, and crashes I have yet to spot.
  6. Main RAM DMA Timings are slightly worse for long DMAs.
  7. Interpreter is roughly half the speed. This is unfortunately just a consequence of chasing high levels of accuracy, and unlikely to be fixed.
  8. ARM7 DMA has yet to be touched.
  9. Full ExMemCnt defaults have yet to be validated; all I know for sure is that bit 15 should be set by default (TwilightMenu++ relies on this to boot).
  10. Write buffer also uses a shortcut of sorts: my implementation doesn't actually use and increment the address value passed via the FIFO (hardware seemingly does use it?). I'm not entirely sure why, but doing so caused issues.
  11. Nothing is included in savestates yet, so they may be a little broken.
  12. ARM7 should be able to run for a brief moment depending on when exactly it is interrupted by a DMA starting. This is unimplemented, but ultimately should be of little consequence since the ARM7 lacks any non-bus memory regions to execute from.
  13. ARM9 & ARM7 do not restart their bursts when interrupted in the middle of one by a DMA.
  14. "Async" reads/writes on the ARM9 aren't stalled by DMAs.
  15. DMA and Main RAM cache streaming are run in advance of the CPU; this could cause issues with IRQs/DMAs/etc. triggering earlier than intended relative to the CPU.
  16. Sound DMA is still implemented in an inaccurate way, and does not have any timing impact like it should.
  17. Timers are not handled quite right with regard to when they start/stop?
  18. Exact delays for IRQ/DMA/misc events are unverified.
  19. GX timings are unverified, but most likely fine.
  20. Cache PRNG is not correct, but I'd be surprised if it mattered at all.
  21. Write Buffer/Cache Streaming behavior while the CPU is halted is not verified.
  22. How long it takes the CPU to wake up from a Halt is unverified.
  23. The ARM9 should trigger IRQs after code fetches and interlocks, and before instructions are executed. Currently I'm handling them before code fetches to make interlocks simpler to handle...
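
For readers unfamiliar with interlocks, known issue 3 can be pictured with a sketch along these lines. This is an illustrative model under my own assumptions, not the implementation in this PR: each register gets a single "ready" timestamp, and an instruction that reads a register too early is stalled until that time. Because there is only one timestamp per register, the model cannot tell different kinds of interlock apart, which is the simplification that issue describes. The `InterlockTracker` type and its member names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model (not from this PR): one "ready" timestamp per register,
// with no distinction between different kinds of interlock.
struct InterlockTracker
{
    std::uint64_t readyAt[16] = {}; // cycle at which each register's value becomes usable

    // Record that `reg` receives its result at `cycle` (e.g. after a load's memory stage).
    constexpr void SetResultTime(unsigned reg, std::uint64_t cycle)
    {
        readyAt[reg & 0xF] = cycle;
    }

    // Return the cycle at which an instruction reading `reg` may proceed:
    // either `now`, or later if the register's value is not ready yet.
    constexpr std::uint64_t StallUntilReady(unsigned reg, std::uint64_t now) const
    {
        return readyAt[reg & 0xF] > now ? readyAt[reg & 0xF] : now;
    }
};

// Usage example: r3 becomes ready at cycle 12, but the next instruction
// wants to read it at cycle 10, so it is stalled for two cycles.
constexpr std::uint64_t ExampleStall()
{
    InterlockTracker tracker{};
    tracker.SetResultTime(3, 12);
    return tracker.StallUntilReady(3, 10); // evaluates to 12
}
```

Modelling the two interlock types separately would presumably need more state than a single timestamp per register, but what that state looks like is not something this description specifies.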

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants