-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
goalc: Some more macOS ARM64 work #3290
base: master
Are you sure you want to change the base?
Conversation
…ut arm instructions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I think this is a good start.
My main concern at this point is that ARM is actually quite a bit different from x86, so it will be hard to fit the ARM instructions into the existing pattern. I also think that using ifdef
s to change between arm/x86 code generation will make it harder to maintain.
I'm looking through the compiler code now, and I'm wondering if we should just do the division at the codegen_object_file
level - just totally new code for x86 and ARM at that point. We'll need a new linking format, different logic for picking instructions, different function prologues/epilogues, etc. With this approach, we'd probably want each IR to support an do_codegen_x86
and do_codegen_arm
version, which would give a x86Instruction
on x86 or ArmInstruction
on ARM. I haven't thought this through fully yet.
if (saved_reg.is_xmm()) { | ||
if (saved_reg.is_128bit_simd()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(unrelated to your changes here) I think that we will probably want an entirely separate do_goal_function
for x86 vs. ARM. I think the way that the stack pointer is manipulated will be different, and this currently has some x86-specific things like sub_gpr64_imm8s
which is using an x86-only smaller size instruction because an immediate is only 8-bits.
if (saved_reg.is_xmm()) { | ||
if (saved_reg.is_128bit_simd()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to keep these as is_xmm
. I think having both an is_xmm
(true only for x86 XMM) and an is_128bit_simd
(true for both arm/x86 vector regs) is good, rather than trying to have a abstraction over "general purpose" and "128bit" registers at this level.
ARM will probably have different code for loading/storing register to the stack. It has instructions for storing multiple registers at once, rather than a sequence of push
instructions like we used on x86.
@@ -862,7 +862,8 @@ RegAllocInstr IR_ConditionalBranch::to_rai() { | |||
void IR_ConditionalBranch::do_codegen(emitter::ObjectGenerator* gen, | |||
const AllocationResult& allocs, | |||
emitter::IR_Record irec) { | |||
Instruction jump_instr(0); | |||
#ifndef __aarch64__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather keep the ability for the compiler to generate either ARM or x86 when built on either ARM or x86.
In C++, the code inside the #else
here won't be checked by the compiler at all on x86, which will make it harder for developers on x86 to know if their changes broke the build on ARM.
if (cc.return_reg && cc.return_reg->is_xmm()) { | ||
if (cc.return_reg && cc.return_reg->is_128bit_simd()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I think this is a good place to use is_128bit_simd
over is_xmm
InstructionInfo(const emitter::Instruction& _instruction, Kind _kind, int _ir_idx) | ||
InstructionInfo(const emitter::Instruction _instruction, Kind _kind, int _ir_idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on what I've read above, I think that you have classes which inherit from Instruction
, and you'll pass them here. This will cause the copy constructor of Instruction
(the base class) to be called, which will cause slicing (see https://stackoverflow.com/questions/274626/what-is-object-slicing#274636). This ends up being undefined behavior and probably means that any virtual methods of Instruction
that you call will crash or silently do the wrong thing.
The way around this is usually to pass a std::unique_ptr<Instruction>
or std::shared_ptr<Instruction>
... but it might be worth checking what we actually need an instruction for here. As far as I see, we just need the length, rather than the whole instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(oops - I think I'm wrong since Instruction
is just InstructionX86
or InstructionARM
)
In that case, can we keep this as a const ref?
@@ -67,23 +67,37 @@ void CodeTester::emit_return() { | |||
* Pops RSP always, which is weird, but doesn't cause issues. | |||
*/ | |||
void CodeTester::emit_pop_all_gprs(bool exclude_rax) { | |||
#ifndef __aarch64__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better not to have ifdef ARM
and instead have some runtime variable. Otherwise it'll be really hard to make sure goalc
actually builds on x86 and ARM.
#if defined(__APPLE__) && defined(__aarch64__) | ||
mprotect(code_buffer, code_buffer_capacity, PROT_EXEC | PROT_READ); | ||
auto ret = ((u64(*)())code_buffer)(); | ||
mprotect(code_buffer, code_buffer_capacity, PROT_WRITE | PROT_READ); | ||
return ret; | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 this is a good place to use the ifdef - it's relatively simple, and the mprotect
doesn't exist on windows.
/*! | ||
* Move data from src to dst. Moves all 64-bits of the GPR. | ||
*/ | ||
extern Instruction mov_gpr64_gpr64(Register dst, Register src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need extern
here.
ASSERT(src.is_gpr()); | ||
InstructionX86 instr(0x89); | ||
instr.set_modrm_and_rex(src.hw_id(), dst.hw_id(), 3, true); | ||
return instr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning an InstructionX86
by value like this will also cause slicing
|
||
Instruction movd_gpr32_xmm32(Register dst, Register src) { | ||
ASSERT(dst.is_gpr()); | ||
ASSERT(src.is_128bit_simd()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a case when we should keep is_xmm
I think. In this case, we're only moving the lower 32 bits between an XMM and GPR, so the xmm is being used as a 32-bit floating point register. I tried to keep the concept of "register type" and "register size" separate, since there's a lot of cases where you only use part of a register (x86 GPRs are 64-bits, but for some goal ops we treat them as 32-bits too).
Not a whole lot in this PR, but I want to get this off my laptop where it's been sitting for a few months.
Started to split apart the implementation for different architectures and get CodeTester tests passing. At one point I proved to myself that I got it working (it being a single simple instruction) but then things fell apart when i began the grueling refactor.