Large-scale PowerPC recompiler rework #641

Exzap · 2023-01-30T05:16:29Z

Disclaimer: This is work-in-progress. I'm opening this draft PR for visibility, so others can track progress and know not to alter recompiler code. Work started on this in November and the ETA for completion is somewhere in the span of the next few months, depending on my motivation.

Goals

I originally started work on the recompiler in 2014 and since then I have learned a lot more about state-of-the-art compiler and IR design. While I'm generally happy with the quality of our code translation, some of the design choices I made along the way make it hard to introduce further optimizations or fixes. A lot of the complexity currently sits in the x86-64 backend, which means all of it would have to be reimplemented when targeting another architecture.

Overall, the idea is to make both the front-end (PPC to IR) and the back-end (IR to x86-64) as "dumb" as possible so that all the complex logic can be shifted to operate on platform-independent IR, lowering the burden on platform-specific code.
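To illustrate what "complex logic operating on platform-independent IR" can look like, here is a minimal sketch of a constant-folding pass over a toy IR. Everything in it (`Op`, `IrInstr`, `foldConstants`) is hypothetical and made up for this example; it is not Cemu's IML and not how the actual passes are written. The point is only that a pass like this never needs to know whether the backend is x86-64 or something else.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical toy IR, invented for illustration only (not Cemu's IML).
enum class Op { LoadImm, Add };

struct IrInstr
{
    Op op;
    int dst;          // virtual register written by this instruction
    int srcA = 0;     // first source register (used by Add)
    int srcB = 0;     // second source register (used by Add)
    uint32_t imm = 0; // immediate value (used by LoadImm)
};

// Folds Add instructions whose inputs are both known constants into LoadImm.
// Works on a single straight-line block and knows nothing about the target CPU.
void foldConstants(std::vector<IrInstr>& block)
{
    std::unordered_map<int, uint32_t> known; // register -> known constant value
    for (IrInstr& ins : block)
    {
        if (ins.op == Op::Add)
        {
            auto a = known.find(ins.srcA);
            auto b = known.find(ins.srcB);
            if (a == known.end() || b == known.end())
            {
                // Result is not a compile-time constant, forget any old value.
                known.erase(ins.dst);
                continue;
            }
            // Both inputs are constants: replace the Add with a LoadImm.
            ins = IrInstr{ Op::LoadImm, ins.dst, 0, 0, a->second + b->second };
        }
        // At this point the instruction is a LoadImm, so dst is a known constant.
        known[ins.dst] = ins.imm;
    }
}
```

A backend fed with the output of such a pass only has to translate the simpler instruction stream, which is the division of labor described above.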

State

Please do not report bugs yet. In fact I don't recommend trying this out, it's an active construction site.

I know a lot of these are pretty abstract, so in the future I might add a few before-vs-after code examples to this text.

Q&A

Will this PR add ARM support?

No. But it will make adding a new target architecture a lot easier and if I am motivated enough I'll look into adding an aarch64 backend after this is done.

Will this make Cemu faster?

Maybe? After everything is done the recompiler should output faster code, but CPU execution speed generally isn't a bottleneck in Cemu so it's hard to predict whether there will be an actual difference.

What about the proposed plan to use LLVM?

I did quite a bit of research on that. The biggest downside is that LLVM is still quite JIT-unfriendly and comes with significant bloat. Not saying that it wouldn't work, but the cons outweigh the pros in my opinion. Plus we already have a pretty sophisticated recompiler and it would be a waste to throw it away.
On a personal note, I enjoy working on custom solutions more than plugging in libraries, so it's easier for me to stay motivated and make progress. In terms of total effort both solutions are about the same.

Wunkolo · 2023-01-30T05:43:54Z

What would be the scope of changing the x64 emitter over to something like xbyak?

With the current x64 emitter, adding a new instruction or class of instructions would involve implementing the encoding for those instructions (REX, VEX, EVEX, ModR/M, SIB, etc.) from scratch, then implementing the new instruction itself, and also detecting the relevant CPUID flags, when this redundant work could probably just be pushed onto a proven library.
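For a sense of the per-instruction work involved, here is a rough sketch of hand-encoding a single form, `mov r64, r64`, with its REX prefix and ModR/M byte. The helper name `encodeMovReg64` is made up for this example and is not taken from Cemu's emitter; VEX/EVEX forms, memory operands (SIB) and CPUID checks are left out, and each of those adds more of the same kind of code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Cemu's emitter): encodes `mov dst, src` for two
// 64-bit general-purpose registers, numbered 0-15 (rax=0, rcx=1, ..., r15=15).
// Uses the encoding REX.W + 89 /r (MOV r/m64, r64).
std::vector<uint8_t> encodeMovReg64(uint8_t dst, uint8_t src)
{
    assert(dst < 16 && src < 16);
    // REX prefix has the bit layout 0100WRXB.
    // W=1 selects 64-bit operand size.
    uint8_t rex = 0x48;
    if (src >= 8) rex |= 0x04; // REX.R extends the ModR/M reg field (source)
    if (dst >= 8) rex |= 0x01; // REX.B extends the ModR/M rm field (destination)
    // ModR/M: mod=11 (register direct), reg=src, rm=dst.
    uint8_t modrm = 0xC0 | ((src & 7) << 3) | (dst & 7);
    return { rex, 0x89, modrm };
}
```

For example, `encodeMovReg64(0, 1)` yields the bytes `48 89 C8`, which is `mov rax, rcx`. A library like xbyak hides this layer behind calls that look like the assembly itself.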

Exzap · 2023-01-30T06:28:26Z

Thanks for pointing out Xbyak, I wasn't aware of it. The assemblers I looked at were always a bit overkill for our purposes, usually focusing on human-friendly API and less towards a simple interface for machine generated code. We only need a very thin emitter, but Xbyak seems to be exactly that.

As part of this rework I also started a new "cleaner" x86-64 high-performance emitter which I auto-generate from encoding tables. The effort for this is relatively minimal, but using a premade emitter would certainly cut down the effort even further. I'll think about it.

amayra · 2023-05-16T22:51:13Z

Did you drop this project?

Exzap · 2023-05-17T10:33:47Z

Nah just busy with other stuff. I'll get back to this eventually

iMonZ · 2023-09-26T19:48:47Z

> Nah just busy with other stuff. I'll get back to this eventually

Thanks! ARM64 Support would make the CEMU emulator finally done and future proof!

Wunkolo · 2023-09-26T20:40:03Z

On ARM64: I've been using oaknut on other projects. It is structured very similarly to xbyak.

Gabezin64 · 2023-10-13T00:20:31Z

This will finally fix the lens flare issue in The Wind Waker HD and Twilight Princess HD?

Exzap · 2023-10-13T13:34:18Z

> This will finally fix the lens flare issue in The Wind Waker HD and Twilight Princess HD?

That's a graphical issue. It's unaffected by this CPU rework.

Intermediate commit while I'm still fixing things, but I didn't want to pile too many changes into a single commit.

New: Reworked the PPC->IML converter to first create a graph of basic blocks and then turn those into IML segment(s). This was mainly done to decouple the IML design from PPC-specific knowledge like branch target addresses. The previous design also didn't allow cycle counting to be preserved properly in all cases, since it was based on IML instruction counting.

The new solution supports functions with a non-contiguous body. A pretty common example of this is a function that ends with a trailing B instruction to some other place.

Current limitations:
- BL inlining not implemented
- MFTB not implemented
- BCCTR and BCLR are only partially implemented

Undo vcpkg change
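As a rough illustration of the "graph of basic blocks first" idea from the converter rework, the sketch below finds the start addresses of basic blocks in a run of PPC instruction words. It is a simplified stand-in under stated assumptions, not the actual PPCRec code: the names `signExtend`, `decodeJump` and `findBlockLeaders` are invented here, only `B` (primary opcode 18) and `BC` (primary opcode 16) without the link bit are treated as jumps, and targets outside the given range are ignored, whereas a converter that supports non-contiguous function bodies would have to follow them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Sign-extends the low `bits` bits of `value` (value must already be masked).
static int32_t signExtend(uint32_t value, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    const uint32_t signBit = 1u << (bits - 1);
    return static_cast<int32_t>((value ^ signBit) - signBit);
}

// Returns true and sets `target` if `instr` at address `addr` is a B or BC
// with LK=0, i.e. a jump rather than a call. Other branch forms are ignored.
static bool decodeJump(uint32_t instr, uint32_t addr, uint32_t& target)
{
    const uint32_t opcode = instr >> 26;    // primary opcode, top 6 bits
    const bool absolute = (instr & 2) != 0; // AA bit
    const bool link = (instr & 1) != 0;     // LK bit
    if (link)
        return false;
    int32_t disp;
    if (opcode == 18)
        disp = signExtend(instr & 0x03FFFFFC, 26); // LI field shifted left by 2
    else if (opcode == 16)
        disp = signExtend(instr & 0x0000FFFC, 16); // BD field shifted left by 2
    else
        return false;
    target = absolute ? static_cast<uint32_t>(disp)
                      : addr + static_cast<uint32_t>(disp);
    return true;
}

// Collects the addresses at which a basic block starts ("leaders"):
// the entry point, every in-range jump target, and the instruction
// following each jump.
std::set<uint32_t> findBlockLeaders(const std::vector<uint32_t>& code, uint32_t base)
{
    std::set<uint32_t> leaders;
    if (code.empty())
        return leaders;
    const uint32_t end = base + static_cast<uint32_t>(code.size()) * 4;
    leaders.insert(base);
    for (std::size_t i = 0; i < code.size(); i++)
    {
        const uint32_t addr = base + static_cast<uint32_t>(i) * 4;
        uint32_t target;
        if (!decodeJump(code[i], addr, target))
            continue;
        if (target >= base && target < end)
            leaders.insert(target);
        if (addr + 4 < end)
            leaders.insert(addr + 4);
    }
    return leaders;
}
```

Once blocks are identified this way, branch targets can be expressed as edges between blocks, so the IR itself no longer needs to carry PPC addresses.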

Instead of having fixed macros for BCCTR/BCCTRL/BCLR/BCLRL we now have only one single macro instruction that takes the jump destination as a register parameter. This also allows us to reuse an already loaded LR register (by something like MTLR) instead of loading it again from memory. As a necessary requirement for this: The register allocator now has support for read operations in suffix instructions

Also removed associatedPPCAddress field from IMLInstruction as it's no longer used

Exzap · 2024-10-29T01:32:21Z

I consider this PR complete. There is more work that can be done but it's at a good point to merge so let's do that.
But first it would be nice to get some feedback. Anyone interested in testing this, please grab the executable from GitHub Actions and let me know about any issues.

Here is a benchmark. Previous PPC JIT in Cemu 2.2:

The reworked PPC JIT from this PR:

(lower numbers are better) While in the benchmark some tasks are much faster, real world performance will probably remain largely the same since CPU emulation never really was a bottleneck for us.

There have also been some general accuracy improvements and the top post has all the under-the-hood changes that were made.

boggydigital · 2024-10-29T04:47:34Z

@Exzap I've tried the macOS build and it crashed upon loading pipelines in every single game I've tried.

If that helps - I've confirmed that this doesn't happen on 2.2.

Exzap · 2024-10-29T10:23:13Z

@boggydigital Can you post the log for other games as well?

goeiecool9999 · 2024-10-29T10:27:15Z

It's the same story on Linux. The point where the access violation happens varies but most of the time it happens at the jump instruction in this screenshot (same behaviour in different games too). Hopefully this gives you a decent clue.

Exzap · 2024-10-29T10:30:27Z

I was able to get it to crash by turning off BMI2 extension. Unsure if it's directly related to your crashes but we will see. Working on a fix

Exzap · 2024-10-29T11:53:28Z

Can you grab the latest build and check again @goeiecool9999 @boggydigital

goeiecool9999 · 2024-10-29T14:01:29Z

No change. It crashes in the same spot.

boggydigital · 2024-10-29T18:54:06Z

Tried the latest build. It crashes for me as well.

goeiecool9999 · 2024-10-30T01:50:27Z

That fixes it 🥳

boggydigital · 2024-10-30T02:06:45Z

Likewise, I can't repro the crash in any of the ~10 titles I've tried. Thank you @Exzap!

Ammar-Sadaoui · 2024-10-31T11:46:37Z

what is a real bottleneck here if CPU emulation was not the problem ?

Exzap · 2024-10-31T13:29:06Z

> what is a real bottleneck here if CPU emulation was not the problem ?

It differs by game, but for the more graphically complex games it's usually the GPU command processor.
Discussing this at full length goes outside the scope of this PR but if you are curious about Cemu's architecture the best place to learn more is our discord where we have discussions about these things and anyone can ask questions.

squidbus · 2024-11-03T20:33:10Z

Tested on macOS with most first party titles and didn't encounter any issues compared to main.

mkrcos · 2024-11-15T17:11:27Z

Tested on linux and most games run great. The only issue that I've found is in Mario Kart 8, it gets stuck on the loading screen after finishing the first race.

SirHrVedel · 2024-11-15T17:24:27Z

Runs great on Windows with most titles I've tried. The only issue I've noticed is that loading gets stuck in Mario Kart 8 when finishing the first race in a cup. Wii Party U also crashes when attempting to go into any minigame (no stack trace in the log, but this crash also happens on the stable 2.X builds).

Exzap force-pushed the jit-work branch from cdfcd96 to 3590ad9 Compare March 13, 2023 04:10

jcrm1 mentioned this pull request Sep 26, 2023

CI: Add macOS build #274

Merged

Exzap force-pushed the jit-work branch 2 times, most recently from a671611 to 570e2f6 Compare January 13, 2024 16:15

Exzap mentioned this pull request Feb 18, 2024

Deleting some spaces #1088

Closed

Exzap mentioned this pull request May 27, 2024

Cemu crashes without a stack trace or any debugging related files when using ui to enable interpreter #1223

Open

Exzap added 16 commits August 30, 2024 00:47

Latte: Fix race condition on close during game boot

4c16397

PPCRec: Use vector for segment list + deduplicate RA file

f523b21

PPCRec: Use vector for instruction list

0265108

PPCRec: Move Segment and Instruction struct into separate files

b1b46f3

PPCRec: Rename IML structs for better clarity

5b2bc7e

PPCRec: Move debug printing + smaller clean up

625874a

PPCRec: Move analyzer file + move some funcs to IMLInstruction

101a2ef

PPCRec: Move IML optimizer file

e53c6ad

PPCRec: Move IML register allocator

d1fe1a9

PPCRec: Emit x86 movd for non-AVX + more restructuring

27f70d5

PPCRec: Move X64 files into subdirectory and rename

db60ea6

PPCRec: Fix merge conflicts

a5f6faa

PPCRec: Fix single segment loop not being detected

874e376

PPCRec: Remove now unused PPC_ENTER and jumpMarkAddress

93f5615

Exzap added 4 commits October 25, 2024 09:17

PPCRec: Use 32bit mov for 32bit operations

70c99fd

PPCRec: Update spill cost calculation

96d7c75

PPCRec: Refactor read/write access tracking for liveness ranges

636b63f

PPCRec: Clean up some outdated code

126a682

Exzap force-pushed the jit-work branch 2 times, most recently from de1a45e to a52e39d Compare October 27, 2024 13:42

PPCRec: Code cleanup

f309d5d

Exzap force-pushed the jit-work branch from a52e39d to f309d5d Compare October 27, 2024 13:49

Exzap added 2 commits October 28, 2024 09:21

PPCRec: Rework RLWIMI

099d1d4

PPCRec: Optimizations

e332726

Exzap marked this pull request as ready for review October 29, 2024 01:32

PPCRec: Handle edge case for x86 shift instructions

a05b655

PPCRec: Avoid relying on undefined behavior in std::copy_backwards

83569ae

Exzap added 2 commits October 30, 2024 03:49

PPCRec: Fix stack pointer alignment for calls

8219a5f

PPCRec: Use named register constants instead of hardcoding regs

9187044
