Getting Darling To Build On An ARM64 Device #642
Comments
It didn't take too long for a build error to occur. The weird thing is that this error is not caused by Darling's code, but rather by the header files. This error comes from
Here is the full log: make_log.txt Edit: I figured out the problem, it has to do with the It makes sense why this would fail (since I don't have the 32-bit libraries installed). Seems like disabling Edit2: I added an if condition to build the 32-bit version if
The next roadblock I am running into is
Going off of this useful link, adding The weird thing about this issue is that it does not occur in the original project (at least from my very basic compile test).
Edit: No, updating it doesn't fix the issue. It has to do with the fact that |
What is
I noticed that some of the files in that folder are similar to cctools-port:cctools/include/foreign. The boolean.h file in the darling repo doesn't include the other architectures that are found in the cctools-port boolean.h file. Are the files in the |
You might also need to update https://github.com/darlinghq/darling/tree/master/src/kernel/libsyscall/bsdsyscalls for ARM64, (currently the macros only check for i386 or x86_64). |
@TheBrokenRail I am actually going through that currently. Right now I am just copying and pasting Apple's xnu code in there, but it looks like I am going to have to make some manual changes as well. |
Will your work support ARM32? If so, Darling would work on older Raspberry Pis and rooted Android devices as well!
@TheBrokenRail Not in its current form. I probably won't support it until I merge ARM64 support into Darling and move to a distro that also supports ARM32 (Manjaro ARM only supports ARM64). For now, I want to keep things simple.
Any updates on this? |
@TheBrokenRail I have been busy with some IRL stuff (working hard to get my entry level career job!), so I haven't had a lot of time to work on this. IRL stuff aside, I am waiting for @LubosD to update Darling to use the new |
@CuriousTommy Done. I happened to do just that because I'm trying to update |
Hope to get an update on this. |
Right now I’m trying to update Apple’s source code to the newer version (some of the newer code contains better support for ARM64). This needs to be completed before I can continue my work on building Darling on ARM64 devices. You can see my progress on the Edit:
Any update on this? I'm just curious about this initiative. I have Ubuntu Touch on ARM64, and maybe it will run on that if the ARM64 build succeeds.
Generally speaking, the situation is the same as before. The As for the current status on |
This comment was marked as off-topic.
I guess I should have investigated a bit before spending my morning setting up arm64 Ubuntu in a VM on my Apple Silicon machine... However, this is a very interesting project, and being able to build and run darling on arm64 is very desirable. Are there any major hurdles standing in the way of supporting arm64? I'm more than happy to help in any way I can. |
The situation is mostly the same as what I documented above. The sources need to be updated before I can continue my work on ARM64 support. Besides getting the sources updated (you can see my progress here), I am running into issues where I can't get JavaScriptCore and Heimdal to build.
If you are interested in looking into getting JavaScriptCore and Heimdal to build with the new source changes, I can provide more details in #1173. You will need to do this on an x86_64 device (or use a translation layer). Be warned though, this stuff is very involved.
Darlingserver Issues Notes
Linkage issues
get_threadtask
/usr/bin/ld: /home/user/Documents/CodingProjects/GitHub/darling/src/external/darlingserver/duct-tape/xnu/osfmk/ipc/ipc_kmsg.c:6001: undefined reference to `get_threadtask'
This one confuses me... Looking at the source code, this should affect i386/x86_64 as well...
// Taken from src/external/darlingserver/duct-tape/xnu/osfmk/ipc/ipc_kmsg.c
mach_msg_trailer_size_t
ipc_kmsg_trailer_size(
mach_msg_option_t option,
__unused thread_t thread)
{
if (!(option & MACH_RCV_TRAILER_MASK)) {
return MACH_MSG_TRAILER_MINIMUM_SIZE;
} else {
return REQUESTED_TRAILER_SIZE(thread_is_64bit_addr(thread), option);
}
}
// Taken from src/external/darlingserver/duct-tape/xnu/osfmk/kern/thread.h
#define thread_is_64bit_addr(thd) \
task_has_64Bit_addr(get_threadtask(thd))
// Taken from src/external/darlingserver/duct-tape/xnu/osfmk/kern/bsd_kern.c
task_t
get_threadtask(thread_t th)
{
return th->task;
}
Yet i386/x86_64 builds fine without it...
(lldb) image lookup --symbol get_threadtask
(lldb)
But ipc_kmsg_trailer_size exists in the executable???
(lldb) image lookup --symbol ipc_kmsg_trailer_size
1 symbols match 'ipc_kmsg_trailer_size' in /usr/local/bin/darlingserver:
Address: darlingserver[0x000000000052df50] (darlingserver.PT_LOAD[1]..text + 1199280)
Summary: darlingserver`ipc_kmsg_trailer_size at ipc_kmsg.c:5997
(lldb)
Older Notes
Researching `ptrace` For ARM64
/home/user/Documents/CodingProjects/GitHub/darling/src/external/darlingserver/src/thread.cpp:148:16: error: use of undeclared identifier 'PTRACE_GETREGS'; did you mean 'PTRACE_GETREGSET'?
if (ptrace(PTRACE_GETREGS, id, 0, &regs) == -1) {
^~~~~~~~~~~~~~
PTRACE_GETREGSET
Going off of the man page for ptrace, "PTRACE_GETREGS is not present on all architectures". Unfortunately, ARM64 is one of them. Going off of the documentation, it seems that PTRACE_GETREGSET is the closest equivalent to PTRACE_GETREGS.
Example (from my understanding, could be wrong):
// To grab general-purpose registers
#include <sys/ptrace.h> // ptrace function
#include <sys/types.h>  // pid_t
#include <sys/uio.h>    // struct iovec
#include <sys/user.h>   // user_regs_struct
#include <sys/wait.h>   // waitpid
#include <elf.h>        // NT_PRSTATUS & NT_ARM_*
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
void print_registers(struct user_regs_struct *reg) {
    for (int i = 0; i < 31; i++) {
        printf("reg%d: 0x%016llx\n", i, reg->regs[i]);
    }
    printf("sp: 0x%016llx\n", reg->sp);
    printf("pc: 0x%016llx\n", reg->pc);
    printf("pstate: 0x%016llx\n", reg->pstate);
}
int main() {
    pid_t child = fork();
    if (child == -1) {
        perror("Fork unable to create child");
        return -1;
    }
    else if (child == 0) {
        // Child process: ask to be traced, then stop so the parent can inspect us
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) { perror("Unable to send request to be traced"); return -1; }
        if (raise(SIGSTOP) == -1) { perror("Unable to stop process"); return -1; }
        return 0;
    }
    else {
        // Parent process: wait until the child has actually stopped (more reliable than sleeping)
        int status = 0;
        if (waitpid(child, &status, 0) == -1) {
            perror("Unable to wait for child");
            return -1;
        }
        struct user_regs_struct reg = {0};
        struct iovec reg_iov = {
            .iov_base = &reg,
            .iov_len = sizeof(reg)
        };
        long result = ptrace(PTRACE_GETREGSET, child, NT_PRSTATUS, &reg_iov);
        if (result == -1) {
            perror("Unable to grab general-purpose registers");
            return -1;
        }
        print_registers(&reg);
        // Let the child continue and exit on its own
        ptrace(PTRACE_DETACH, child, NULL, NULL);
    }
    return 0;
}
Investigating Build Failure: no member named 'dtape_interlock' in 'struct lck_mtx'
/home/user/Documents/CodingProjects/GitHub/darling/src/external/darlingserver/duct-tape/src/misc.c:242:51: error: no member named 'dtape_interlock' in 'struct lck_mtx'
return wq->dtape_waitq_interlock.dtape_interlock.dtape_interlock.dtape_mutex.dtape_owner == (uintptr_t)current_thread();
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
struct waitq {
uint32_t /* flags */
waitq_type:2, /* only public field */
waitq_fifo:1, /* fifo wakeup policy? */
waitq_prepost:1, /* waitq supports prepost? */
waitq_irq:1, /* waitq requires interrupts disabled */
waitq_isvalid:1, /* waitq structure is valid */
waitq_turnstile:1, /* waitq is embedded in a turnstile */
waitq_eventmask:_EVENT_MASK_BITS;
/* the wait queue set (set-of-sets) to which this queue belongs */
#ifdef __DARLING__
// simple_lock_data_t dtape_waitq_interlock
decl_simple_lock_data(, dtape_waitq_interlock);
#else
#if __arm64__
hw_lock_bit_t waitq_interlock; /* interlock */
#else
hw_lock_data_t waitq_interlock; /* interlock */
#endif /* __arm64__ */
#endif // __DARLING__
uint64_t waitq_set_id;
uint64_t waitq_prepost_id;
union {
queue_head_t waitq_queue; /* queue of elements - used for waitq not embedded in turnstile or ports */
struct priority_queue_sched_max waitq_prio_queue; /* priority ordered queue of elements - used for waitqs embedded in turnstiles */
struct { /* used for waitqs embedded in ports */
struct turnstile *waitq_ts; /* used to store receive turnstile of the port */
union {
void *waitq_tspriv; /* non special-reply port, used to store the watchport element for port used to store
* receive turnstile of the port */
int waitq_priv_pid; /* special-reply port, used to store the pid that copies out the send once right of the
* special-reply port. */
};
};
};
};
// From duct-tape/xnu/osfmk/arm/simple_lock.h
typedef usimple_lock_data_t simple_lock_data_t;
// From duct-tape/internal-include/darlingserver/duct-tape/simple_lock.h
struct usimple_lock {
lck_spin_t dtape_interlock;
};
typedef struct usimple_lock usimple_lock_data_t;
typedef usimple_lock_data_t* usimple_lock_t;
// From duct-tape/internal-include/darlingserver/duct-tape/locks.h
typedef struct lck_spin {
lck_mtx_t dtape_interlock;
} lck_spin_t;
// Mimics includes in `duct-tape/src/misc.c` and prints `waitq` struct.
#include <darlingserver/duct-tape.h>
#include <darlingserver/duct-tape/hooks.internal.h>
#include <darlingserver/duct-tape/log.h>
#include <darlingserver/duct-tape/processor.h>
#include <darlingserver/duct-tape/memory.h>
#include <darlingserver/duct-tape/task.h>
#include <darlingserver/duct-tape/psynch.h>
#include <kern/waitq.h>
#include <kern/clock.h>
#include <kern/turnstile.h>
#include <kern/thread_call.h>
#include <ipc/ipc_init.h>
#include <ipc/ipc_space.h>
#include <ipc/ipc_object.h>
#include <ipc/ipc_pset.h>
#include <kern/host.h>
#include <kern/sync_sema.h>
#include <kern/ux_handler.h>
#include <ipc/ipc_importance.h>
#include <kern/ipc_host.h>
#include <sys/types.h>
int main() {
struct waitq d = {0};
__builtin_dump_struct(&d, &printf);
}
ARM64 waitq struct output
struct waitq {
uint32_t waitq_type : 2 = 0
uint32_t waitq_fifo : 1 = 0
uint32_t waitq_prepost : 1 = 0
uint32_t waitq_irq : 1 = 0
uint32_t waitq_isvalid : 1 = 0
uint32_t waitq_turnstile : 1 = 0
uint32_t waitq_eventmask : 25 = 0
simple_lock_data_t dtape_waitq_interlock = {
lck_mtx_t dtape_interlockA = {
dtape_mutex_t dtape_mutex = {
volatile uintptr_t dtape_owner = 0
libsimple_lock_t dtape_queue_lock = {
uint32_t state = 0
}
dtape_mutex_head_t dtape_queue_head = {
struct dtape_mutex_link * tqh_first = (nil)
struct dtape_mutex_link ** tqh_last = (nil)
}
}
}
}
uint64_t waitq_set_id = 0
uint64_t waitq_prepost_id = 0
queue_head_t waitq_queue = {
struct queue_entry * next = (nil)
struct queue_entry * prev = (nil)
}
struct priority_queue_sched_max waitq_prio_queue = {
struct priority_queue_entry_sched * pq_root = (nil)
}
struct turnstile * waitq_ts = (nil)
void * waitq_tspriv = (nil)
int waitq_priv_pid = 0
}
x86_64 waitq struct output
struct waitq {
uint32_t waitq_type : 2 = 0
uint32_t waitq_fifo : 1 = 0
uint32_t waitq_prepost : 1 = 0
uint32_t waitq_irq : 1 = 0
uint32_t waitq_isvalid : 1 = 0
uint32_t waitq_turnstile : 1 = 0
uint32_t waitq_eventmask : 25 = 0
simple_lock_data_t dtape_waitq_interlock = {
lck_spin_t dtape_interlock = {
lck_mtx_t dtape_interlock = {
dtape_mutex_t dtape_mutex = {
volatile uintptr_t dtape_owner = 0
libsimple_lock_t dtape_queue_lock = {
uint32_t state = 0
}
dtape_mutex_head_t dtape_queue_head = {
struct dtape_mutex_link * tqh_first = (nil)
struct dtape_mutex_link ** tqh_last = (nil)
}
}
}
}
}
uint64_t waitq_set_id = 0
uint64_t waitq_prepost_id = 0
queue_head_t waitq_queue = {
struct queue_entry * next = (nil)
struct queue_entry * prev = (nil)
}
struct priority_queue_sched_max waitq_prio_queue = {
struct priority_queue_entry_sched * pq_root = (nil)
}
struct turnstile * waitq_ts = (nil)
void * waitq_tspriv = (nil)
int waitq_priv_pid = 0
}
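Comparing the two dumps: on x86_64, `simple_lock_data_t` wraps an `lck_spin_t`, which in turn wraps the `lck_mtx_t`, while the ARM64 dump goes straight from `simple_lock_data_t` to `lck_mtx_t`. That missing level seems consistent with the compiler error from `misc.c`. Below is a minimal sketch of the x86_64 nesting, using simplified mock types; the `mock_` names and the reduced mutex are stand-ins for illustration, not the real duct-tape definitions.

```c
#include <assert.h>
#include <stdint.h>

// Simplified stand-ins: only the nesting of the members matters here.
typedef struct { volatile uintptr_t dtape_owner; } mock_dtape_mutex_t;
typedef struct { mock_dtape_mutex_t dtape_mutex; } mock_lck_mtx_t;
typedef struct { mock_lck_mtx_t dtape_interlock; } mock_lck_spin_t;
typedef struct { mock_lck_spin_t dtape_interlock; } mock_simple_lock_data_t;

struct mock_waitq {
    mock_simple_lock_data_t dtape_waitq_interlock;
};

// Same member chain as the failing line in duct-tape/src/misc.c.
// It needs two `dtape_interlock` levels: simple lock -> spin lock -> mutex.
static inline int mock_waitq_owned_by(const struct mock_waitq *wq, uintptr_t thread) {
    return wq->dtape_waitq_interlock.dtape_interlock.dtape_interlock
               .dtape_mutex.dtape_owner == thread;
}

// Small usage check: the owner stored at the innermost level is what gets compared.
void mock_waitq_demo(void) {
    struct mock_waitq wq = {0};
    wq.dtape_waitq_interlock.dtape_interlock.dtape_interlock.dtape_mutex.dtape_owner = 42;
    assert(mock_waitq_owned_by(&wq, 42));
}
```

If the spin lock level is dropped (as the ARM64 dump suggests), the second `.dtape_interlock` would be looked up inside the mutex type, which has no such member.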
I would like to try cross-compilation for AArch64-based NVIDIA Jetson devices. Any initial pointers would be appreciated. Thanks in advance.
It will be a while before you can build Darling for an ARM64 device. I still need to fix the source code. |
MLDR Issues Notes
Missing implementation of
#ifdef __x86_64__
	__asm__ __volatile__ (
	"movq %1, %%rdi\n"
	"movq 80(%0), %%rsp\n"
	"movq 40(%0), %%rsi\n"
	"movq 8(%0), %%rdx\n"
	"testq %%rdx, %%rdx\n"
	"jnz 1f\n"
	"movq 72(%0), %%rdx\n" // wqthread hack: if 3rd arg is null, we pass the stack bottom
	"1:\n"
	"movq 16(%0), %%rcx\n"
	"movq 24(%0), %%r8\n"
	"movq 32(%0), %%r9\n"
	"movq %%rdi, 56(%0)\n"
	"movq (%0), %%rax\n"
	"andq $-0x10, %%rsp\n"
	"pushq $0\n"
	"pushq $0\n"
	"jmpq *%%rax\n"
	:: "a" (&args), "di" (args.pth));
Notes on args struct
darling/src/startup/mldr/elfcalls/threads.c
Lines 203 to 204 in 4a74c18
struct arg_struct* in_args = (struct arg_struct*) p;
struct arg_struct args;
darling/src/startup/mldr/elfcalls/threads.c
Lines 52 to 70 in 4a74c18
typedef void (*thread_ep)(void**, int, ...);
struct arg_struct
{
	thread_ep entry_point;
	uintptr_t real_entry_point;
	uintptr_t arg1; // `user_arg` for normal threads; `keventlist` for workqueues
	uintptr_t arg2; // `stack_addr` for normal threads; `flags` for workqueues
	uintptr_t arg3; // `flags` for normal threads; `nkevents` for workqueues
	union {
		void* _backwards_compat; // kept around to avoid modifiying assembly
		int port;
	};
	unsigned long pth_obj_size;
	void* pth;
	darling_thread_create_callbacks_t callbacks;
	uintptr_t stack_bottom;
	uintptr_t stack_addr;
	bool is_workqueue;
};
darling/src/startup/mldr/elfcalls/elfcalls.h
Lines 8 to 15 in 4a74c18
struct darling_thread_create_callbacks {
	unsigned int (*thread_self_trap)(void);
	void (*thread_set_tsd_base)(void*, int);
	void (*rpc_guard)(int);
	void (*rpc_unguard)(int);
};
typedef const struct darling_thread_create_callbacks* darling_thread_create_callbacks_t;
Breaking down the x86_64 ASM
__asm__ __volatile__ (
"movq %1, %%rdi\n"
"movq 80(%0), %%rsp\n"
"movq 40(%0), %%rsi\n" // 2nd arg
"movq 8(%0), %%rdx\n" // 3rd arg
"testq %%rdx, %%rdx\n"
"jnz 1f\n"
"movq 72(%0), %%rdx\n"
"1:\n"
"movq 16(%0), %%rcx\n" // 4th arg
"movq 24(%0), %%r8\n" // 5th arg
"movq 32(%0), %%r9\n" // 6th arg
"movq %%rdi, 56(%0)\n" // 1st arg
"movq (%0), %%rax\n"
"andq $-0x10, %%rsp\n"
"pushq $0\n"
"pushq $0\n"
"jmpq *%%rax\n"
:: "a" (&args), "di" (args.pth));
// Notes on the position of the struct
struct arg_struct
{
thread_ep entry_point; // 0
uintptr_t real_entry_point; // 8
uintptr_t arg1; // 16
uintptr_t arg2; // 24
uintptr_t arg3; // 32
union {
void* _backwards_compat; // kept around to avoid modifiying assembly
int port;
}; // 40
unsigned long pth_obj_size; // 48
void* pth; // 56
darling_thread_create_callbacks_t callbacks; // 64
uintptr_t stack_bottom; // 72
uintptr_t stack_addr; // 80
bool is_workqueue; // 88
};
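The hardcoded displacements in the assembly only stay correct while the struct layout matches the annotations above. Here is a small sketch of how that could be checked at compile time with `offsetof` and `static_assert`, assuming an LP64 target (8-byte pointers and `long`). The struct is a local copy for illustration, named `arg_struct_copy` so it is not confused with the real definition in `threads.c`.

```c
#include <assert.h>   // static_assert (C11)
#include <stdbool.h>
#include <stddef.h>   // offsetof
#include <stdint.h>

typedef void (*thread_ep)(void**, int, ...);
struct darling_thread_create_callbacks;  // left opaque in this sketch
typedef const struct darling_thread_create_callbacks* darling_thread_create_callbacks_t;

// Illustrative copy of the struct quoted above.
struct arg_struct_copy {
    thread_ep entry_point;
    uintptr_t real_entry_point;
    uintptr_t arg1;
    uintptr_t arg2;
    uintptr_t arg3;
    union {
        void* _backwards_compat;
        int port;
    };
    unsigned long pth_obj_size;
    void* pth;
    darling_thread_create_callbacks_t callbacks;
    uintptr_t stack_bottom;
    uintptr_t stack_addr;
    bool is_workqueue;
};

// Displacements that the x86_64 inline assembly relies on.
static_assert(offsetof(struct arg_struct_copy, entry_point) == 0, "(%0)");
static_assert(offsetof(struct arg_struct_copy, real_entry_point) == 8, "8(%0)");
static_assert(offsetof(struct arg_struct_copy, arg1) == 16, "16(%0)");
static_assert(offsetof(struct arg_struct_copy, arg2) == 24, "24(%0)");
static_assert(offsetof(struct arg_struct_copy, arg3) == 32, "32(%0)");
static_assert(offsetof(struct arg_struct_copy, port) == 40, "40(%0)");
static_assert(offsetof(struct arg_struct_copy, pth) == 56, "56(%0)");
static_assert(offsetof(struct arg_struct_copy, stack_bottom) == 72, "72(%0)");
static_assert(offsetof(struct arg_struct_copy, stack_addr) == 80, "80(%0)");
```

With checks like these next to the assembly, reordering a field would fail the build instead of silently loading the wrong value.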
https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html#s5
asm ( assembler template : output operands /* optional */ : input operands /* optional */ : list of clobbered registers /* optional */ );
If there are no output operands but there are input operands, you must place two consecutive colons surrounding the place where the output operands would go.
When the "r" constraint is specified, gcc may keep the variable in any of the available GPRs. To specify the register, you must directly specify the register names by using specific register constraints. They are:
+---+--------------------+
| r |    Register(s)     |
+---+--------------------+
| a |   %eax, %ax, %al   |
| b |   %ebx, %bx, %bl   |
| c |   %ecx, %cx, %cl   |
| d |   %edx, %dx, %dl   |
| S |   %esi, %si        |
| D |   %edi, %di        |
+---+--------------------+
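To make the template/outputs/inputs/clobbers layout concrete, here is a tiny sketch that uses only the generic "r" constraint, so the compiler picks the registers. The function name `add_u64` is made up for this example, and the plain C fallback is only there so the snippet builds on other architectures.

```c
#include <assert.h>
#include <stdint.h>

// Adds two 64-bit values with one inline-asm instruction.
static inline uint64_t add_u64(uint64_t a, uint64_t b) {
#if defined(__x86_64__)
    // "+r"(a): read-write operand in any GPR; "r"(b): input in any GPR.
    // AT&T order is source, destination, so this computes a += b.
    __asm__("addq %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#elif defined(__aarch64__)
    uint64_t out;
    // "=r"(out): write-only output; ARM order is destination, then sources.
    __asm__("add %0, %1, %2" : "=r"(out) : "r"(a), "r"(b));
    return out;
#else
    return a + b;  // fallback for other targets
#endif
}

// Small usage check.
void add_u64_demo(void) {
    assert(add_u64(2, 3) == 5);
}
```

The letter constraints in the table (such as "a" or "D") pin an operand to one specific x86 register instead of leaving the choice to the compiler.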
"i" : An immediate integer operand (one with constant value) is allowed. This includes symbolic constants whose values will be known only at assembly time.
Pseudo-C
args.stack_addr = rsp;
args._backwards_compat = rsi;
args.real_entry_point = rdx
if (rdx == NULL) {
args.real_entry_point = rdx;
}
args.arg1 = rcx;
args.arg2 = r8;
args.arg3 = r9;
rdi = args.pth;
rax = &args;
rsp -= 16 // 0x10
push(0);
push(0);
jump_without_return(rax);
I got a question: since this is AT&T syntax, isn't the operand order reversed? Wouldn't it be:
rdi = args.pth; // MARK: redundant?
rsp = args.stack_addr;
rsi = args._backwards_compat;
rdx = args.real_entry_point;
if (rdx == NULL) {
rdx = args.real_entry_point;
}
rcx = args.arg1;
r8 = args.arg2;
r9 = args.arg3;
args.pth = rdi; // MARK: redundant cuz of first line?
rax = args->entry_point;
rsp -= 16 // 0x10
push(0);
push(0);
jump_without_return(rax);
ARM equivalent (ChatGPT made, but hand optimized 🙂):
__asm__ __volatile__ (
"mov x0, %1\n"
"ldr x2, [%0, #80]\n"
"ldr x3, [%0, #40]\n"
"ldr x4, [%0, #8]\n"
"cbnz x4, 1f\n"
"ldr x4, [%0, #72]\n" // wqthread hack: if 3rd arg is null, we pass the stack bottom
"1:\n"
"ldr x5, [%0, #16]\n"
"ldr x6, [%0, #24]\n"
"ldr x7, [%0, #32]\n"
"str x0, [%0, #56]\n"
"ldr x8, [%0]\n"
"and sp, sp, #-0x10\n"
"stp xzr, xzr, [sp, #-16]!\n"
"br x8\n"
:: "a" (&args), "r" (args.pth)
: "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "cc", "memory"
); |
You're right. I forgot that there was more than one way for representing x86_64 assembly.
I actually got this part wrong, it's supposed to be
I didn't expect anyone to provide me an ARM64 translation, thanks! One thing I'm going to change is how the values are assigned to the registers. I'm not a huge fan of doing |
|
The idea here is to do as much as possible in C, since that's portable (can be shared among architectures) and easier to understand, and you also don't have to hardcode things like field offsets.
For x86_64, GCC (and Clang) provide constraints, where you can tell it to place an input (or output, but in this case we've got none) into a specific register (or class of registers), like this:
// wqthread hack: if 3rd arg is null, we pass the stack bottom
long arg3 = args->arg3;
if (arg3 == 0) {
arg3 = (long) args->stack_bottom;
}
// Make super sure the stack pointer is 16-aligned.
void *stack_ptr = align_16(args->stack_ptr);
asm volatile(
// Zero out the frame base register.
"xorq %%rbp, %%rbp\n"
// Switch to the new stack.
"movq %[stack_ptr], %%rsp\n"
// Push a fake return address.
"pushq $0\n"
// Jump to the entry point.
"jmp *%[jump_here]" ::
"D"(args->arg1), // "D" means %rdi
"S"(args->arg2), // "S" means %rsi
"d"(arg3), // "d" means %rdx
// "r" means any general-purpose register;
// we also give it a name "jump_here" that
// we'll be able to use inside the asm to
// refer to it, instead of %0
[jump_here] "r"(args->jump_here),
// Same for the stack pointer.
[stack_ptr] "r"(stack_ptr)
);
// The above never returns, let the compiler know that.
__builtin_unreachable();
(on Godbolt).
For aarch64, specific register constraints are not available, and instead you're supposed to use explicit register local variables:
register long arg1 asm("x0") = args->arg1;
register long arg2 asm("x1") = args->arg2;
register long arg3 asm("x2") = args->arg3;
// wqthread hack: if 3rd arg is null, we pass the stack bottom
if (arg3 == 0) {
arg3 = (long) args->stack_bottom;
}
// Make super sure the stack pointer is 16-aligned.
void *stack_ptr = align_16(args->stack_ptr);
asm volatile(
// Switch to the new stack.
"mov sp, %[stack_ptr]\n"
// Store a fake zero frame.
"stp xzr, xzr, [sp, #-16]!\n"
// Jump to the entry point.
"br %[jump_here]" ::
"r"(arg1),
"r"(arg2),
"r"(arg3),
[jump_here] "r"(args->jump_here),
[stack_ptr] "r"(stack_ptr)
);
// The above never returns, let the compiler know that.
__builtin_unreachable();
(on Godbolt).
Wait, but aren't
Even on x86_64, not all registers have corresponding constraints, so you have to resort to explicit register variables if you want to place your variable into r8 or some such. Disclaimer: I'm not as comfortable with arm/aarch64 assembly, I could have messed something up here. I'm only showing a few args, your real version has more things. Also note that there should be no need to clobber anything since we're never returning from the inline asm. (And a nitpick: Bugaev, my last name, is of course capitalized, but bugaevc, my display name, is all lowercase.) |
I'm not that great at inline x86 asm tbh. Got a question though, what's the difference between |
@bugaevc By the way, AArch64 passes the new frame entirely within registers, so to create a null frame for the function we're jumping to, you would zero-out the frame pointer (
asm volatile(
// Switch to the new stack.
"mov sp, %[stack_ptr]\n"
// Set up a fake zero frame by zeroing the frame pointer and link register
"mov x29, xzr\n"
"mov x30, xzr\n"
// Jump to the entry point.
"br %[jump_here]" ::
"r"(arg1),
"r"(arg2),
"r"(arg3),
[jump_here] "r"(args->jump_here),
[stack_ptr] "r"(stack_ptr)
);
@johnothwolo |
Riggght, because the stack needs to be 16 byte aligned! |
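The `align_16` helper used in the snippets above is never defined in this thread. Here is a minimal sketch of what such a helper might look like, assuming it rounds the address down in the same way `andq $-0x10, %rsp` does in the original x86_64 assembly (`-0x10` has the same bit pattern as `~0xF`).

```c
#include <assert.h>
#include <stdint.h>

// Round an address down to the previous 16-byte boundary.
// Rounding down (not up) keeps the result inside the stack region,
// because the stack grows towards lower addresses on x86_64 and AArch64.
static inline void *align_16(void *ptr) {
    return (void *)((uintptr_t)ptr & ~(uintptr_t)0xF);
}

// Small usage check: already-aligned addresses are left untouched.
void align_16_demo(void) {
    assert(align_16((void *)0x1000) == (void *)0x1000);
    assert(align_16((void *)0x1008) == (void *)0x1000);
}
```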
I'll need to create an ARM64 equivalent of the
; typedef void* marg_list;
; void __invoke__(
; void (*msgSend)(...), // rdi
; void *retdata, // rsi
; marg_list args, // rdx
; size_t frame_length, // ecx
; const char *return_type // r8
; )
; Make new call frame
push %rbp
movq %rsp, %rbp
; Push following values to stack
push %rdi ; void (*msgSend)(...)
push %rsi ; void *retdata
push %r8 ; const char *return_type
; rsi = rdx (args)
movq %rdx, %rsi
; Push stack down and align
subq %rcx, %rsp ; rsp -= frame_length
andq $-16, %rsp ; rsp &= -16
; Shift stack contents (frame_length/8) times, 8 bytes at a time
; TODO: More efficient than the Lpush loop in i386 assembly above
movq %rsp, %rdi ; rdi = rsp (stack pointer)
shrq $3, %rcx ; frame_length = frame_length >> 3 (frame_length / 8)
cld ; Clear direction flag (Incrementing the pointer to the data
; after every iteration | See https://stackoverflow.com/a/9636772/5988706)
rep movsq ; Move RCX (frame_length) quadwords (8 bytes) from RSI (args) to RDI (stack pointer).
; Copy args into registers
; (Why do we grab the values in this order)?
movq 0xb0(%rsp), %rax ; rax = rsp[0xb0]
movapd 0xa0(%rsp), %xmm7 ; xmm7 = rsp[0xa0]
movapd 0x90(%rsp), %xmm6 ; xmm6 = rsp[0x90]
movapd 0x80(%rsp), %xmm5 ; xmm5 = rsp[0x80]
movapd 0x70(%rsp), %xmm4 ; xmm4 = rsp[0x70]
movapd 0x60(%rsp), %xmm3 ; xmm3 = rsp[0x60]
movapd 0x50(%rsp), %xmm2 ; xmm2 = rsp[0x50]
movapd 0x40(%rsp), %xmm1 ; xmm1 = rsp[0x40]
movapd 0x30(%rsp), %xmm0 ; xmm0 = rsp[0x30]
movq 0x28(%rsp), %r9 ; r9 = rsp[0x28]
movq 0x20(%rsp), %r8 ; r8 = rsp[0x20]
movq 0x18(%rsp), %rcx ; rcx = rsp[0x18]
movq 0x10(%rsp), %rdx ; rdx = rsp[0x10]
movq 8(%rsp), %rsi ; rsi = rsp[0x08]
movq (%rsp), %rdi ; rdi = rsp[0x00]
addq $224, %rsp ; rsp += 224 (We restore the stack?)
movq -8(%rbp), %r10 ; r10 = objc_msgSend
callq *%r10 ; call objc_msgSend
; Grab retdata and return_type
movq -16(%rbp), %rsi ; rsi = retdata
movq -24(%rbp), %rcx ; rcx = return_type
; cl is the lower 8 bits of rcx
; 0x44 is 'D' in ASCII
cmpb $0x44, %cl ; if (returnType[0] == 'D') // long double
je Llongdoubleret ; goto Llongdoubleret
; Store the return double value into `retdata` array
movapd %xmm1, 32(%rsi)
movapd %xmm0, 16(%rsi)
; Store the return int128 value into `retdata` array
movq %rdx, 8(%rsi)
movq %rax, (%rsi)
; goto Ldone
jmp Ldone
Llongdoubleret:
; Store the return long double value into `retdata` array
fstpt (%rsi)
Ldone:
; restore old call frame
movq %rbp, %rsp
pop %rbp
; Return
ret
Tracing the method calls
Using test class:
// This debug method is added to `NSMethodSignature.m`
- (void) darlingDebugPrinting {
printf("{\n");
for (NSUInteger i = 0; i < _count; i++) {
printf("\t{ ");
printf("_types[%lu].size: %lu, ", (unsigned long)i, _types[i].size);
printf("_types[%lu].alignment: %lu, ", (unsigned long)i, _types[i].alignment);
printf("_types[%lu].offset: %zu, ", (unsigned long)i, _types[i].offset);
printf("_types[%lu].type: \"%s\" ", (unsigned long)i, _types[i].type);
printf("}\n");
}
printf("};\n");
}
To get a better understanding on what the argument's value should be for For example, if you run |
I'll need to convert the following
;/**************************************
; * The marg_list's layout is:
; * d0 <-- args
; * d1
; * d2 | increasing address
; * d3 v
; * d4
; * d5
; * d6
; * d7
; * a1
; * a2
; * a3
; * a4
; * stack args...
; *
; * typedef struct objc_sendv_margs {
; * int a[4];
; * int stackArgs[...];
; * };
; *
; **************************************/
;
; __CF_forwarding_prep_0
; __CF_forwarding_prep_1
;
.section __TEXT,__text,regular,pure_instructions
.globl __CF_forwarding_prep_0
.globl __CF_forwarding_prep_1
.align 4, 0x90
__CF_forwarding_prep_0:
__CF_forwarding_prep_1:
push %rbp
movq %rsp, %rbp
; Copy args from regs into a stack var
subq $0xd0, %rsp
movq %rax, 0xb0(%rsp)
movapd %xmm7, 0xa0(%rsp)
movapd %xmm6, 0x90(%rsp)
movapd %xmm5, 0x80(%rsp)
movapd %xmm4, 0x70(%rsp)
movapd %xmm3, 0x60(%rsp)
movapd %xmm2, 0x50(%rsp)
movapd %xmm1, 0x40(%rsp)
movapd %xmm0, 0x30(%rsp)
movq %r9, 0x28(%rsp)
movq %r8, 0x20(%rsp)
movq %rcx, 0x18(%rsp)
movq %rdx, 0x10(%rsp)
movq %rsi, 8(%rsp)
movq %rdi, (%rsp)
; rdi (arg1), rsi (arg2)
; id ___forwarding___(struct objc_sendv_margs *args, void *returnStorage)
movq %rsp, %rdi
leaq 0xc0(%rsp), %rsi
call ____forwarding___
; check for forwarding completion
cmpq $0, %rax
jne Lfail
; if it's nil, we're done
; now, load the return value from the on-stack storage
; and jump back to our caller
; here's how we get the return values (see NSInvoke.S)
movq 0xc0(%rsp), %rax
movq 0xc8(%rsp), %rdx
movapd 0xd0(%rsp), %xmm0
movapd 0xe0(%rsp), %xmm1
movq %rbp, %rsp
pop %rbp
ret
Lfail:
; if we got a non-nil value, it's our forwarding target
movq %rax, %rdi
movq 0x80(%rsp), %rax
movapd 0xa0(%rsp), %xmm7
movapd 0x90(%rsp), %xmm6
movapd 0x80(%rsp), %xmm5
movapd 0x70(%rsp), %xmm4
movapd 0x60(%rsp), %xmm3
movapd 0x50(%rsp), %xmm2
movapd 0x40(%rsp), %xmm1
movapd 0x30(%rsp), %xmm0
movq 0x28(%rsp), %r9
movq 0x20(%rsp), %r8
movq 0x18(%rsp), %rcx
movq 0x10(%rsp), %rdx
movq 8(%rsp), %rsi
; movq (%rsp), %rdi // self overwritten
movq %rbp, %rsp
pop %rbp
; restart message send
jmp _objc_msgSend
;
; __CF_forwarding_prep_b
;
.globl __CF_forwarding_prep_b
.align 4, 0x90
__CF_forwarding_prep_b:
push %rbp
movq %rsp, %rbp
; Copy args from regs into a stack var
subq $0xd0, %rsp
movq %rax, 0xb0(%rsp)
movapd %xmm7, 0xa0(%rsp)
movapd %xmm6, 0x90(%rsp)
movapd %xmm5, 0x80(%rsp)
movapd %xmm4, 0x70(%rsp)
movapd %xmm3, 0x60(%rsp)
movapd %xmm2, 0x50(%rsp)
movapd %xmm1, 0x40(%rsp)
movapd %xmm0, 0x30(%rsp)
movq %r9, 0x28(%rsp)
movq %r8, 0x20(%rsp)
movq %rcx, 0x18(%rsp)
movq %rdx, 0x10(%rsp)
movq %rsi, 8(%rsp)
movq %rdi, (%rsp)
; call into the actual forwarder
; void __block_forwarding__(void* frame)
movq %rsp, %rdi
call ___block_forwarding__
movq %rbp, %rsp
pop %rbp
ret |
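The fixed offsets used by `__invoke__` and the `__CF_forwarding_prep_*` stubs above describe an implicit layout. As a starting point for an ARM64 port, it may help to write that layout down as C structs first. The sketch below only restates the x86_64 offsets visible in the assembly; the struct and field names are invented for illustration, and the listing does not say what the bytes between `0xb8` and `0xe0` are for.

```c
#include <assert.h>   // static_assert (C11)
#include <stddef.h>   // offsetof
#include <stdint.h>

// Hypothetical C view of the register area at the start of the marg_list.
struct x86_64_marg_regs {
    uint64_t rdi, rsi, rdx, rcx, r8, r9;   // 0x00 .. 0x28
    struct { uint64_t lo, hi; } xmm[8];    // 0x30 .. 0xa8, 16 bytes per register
    uint64_t rax;                          // 0xb0
    uint8_t  rest[0xe0 - 0xb8];            // remainder of the 224 bytes skipped by `addq $224, %rsp`
};

// Hypothetical C view of `retdata` as written at the end of `__invoke__`.
struct x86_64_retdata {
    uint64_t rax;                          // 0
    uint64_t rdx;                          // 8
    struct { uint64_t lo, hi; } xmm0;      // 16
    struct { uint64_t lo, hi; } xmm1;      // 32
};

static_assert(offsetof(struct x86_64_marg_regs, r9) == 0x28, "r9 slot");
static_assert(offsetof(struct x86_64_marg_regs, xmm) == 0x30, "xmm0 slot");
static_assert(offsetof(struct x86_64_marg_regs, xmm) + 7 * 16 == 0xa0, "xmm7 slot");
static_assert(offsetof(struct x86_64_marg_regs, rax) == 0xb0, "rax slot");
static_assert(sizeof(struct x86_64_marg_regs) == 224, "matches addq $224");
static_assert(offsetof(struct x86_64_retdata, xmm1) == 32, "xmm1 return slot");
```

An ARM64 version would need its own struct, following the d0-d7 and integer-register ordering described in the marg_list comment above.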
This is when trying to cmake inside debian arm with UTM, inside Apple silicon:
|
@superbonaci Weird... I thought I fixed that. With that being said, I don't recommend anyone trying to build or use the ARM64 branch for now, it's still very WIP.
|
Also the package
|
I'm planning to update the build instructions for Fedora to include the dependencies needed for ARM64 (for the other distros, I'll let other people create PRs for the needed ARM64 dependencies). However, I'll only do that after ARM64 support is ready. |
Possibly relevant: currently LINUX_SYSCALL() needs an architecture-dependent identifier to find the syscall. Switching to syscall names could make the ARM64 port more future-proof. I would do it myself, but at the moment I can't even get any code functioning within darwin, so until I've tackled some interim projects I won't be doing this.
To help me better understand, what do you mean by "architecture dependent identifier"? Are you referring to having macros/const values for the syscall numbers? |
a. You would be totally right not to understand it; I worded it poorly. When I wrote this I was thinking: "If I can make Linux syscalls by name in C code, the LINUX_SYSCALL() macro can too." A hidden assumption I was making was that since I could access any syscall from my C code by name, I could access them all by name. Lately I've been having doubts about this, mostly in an "if I can think of it, other smarter people could have thought of it too" way. If I'm right, nobody working on the Darling project needs to be looking up Linux syscall numbers, because the writers of our compilers, libraries and kernels already have.
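For reference, this is roughly what "syscalls by name" looks like in ordinary Linux userspace C: `<sys/syscall.h>` provides `SYS_*` constants that resolve to the right number for the target architecture. The sketch below is a plain glibc example and not Darling code; whether `LINUX_SYSCALL()` can rely on these headers in its build environment is a separate question. The helper names are made up for illustration.

```c
#define _GNU_SOURCE       // for syscall() under strict -std= modes
#include <assert.h>
#include <sys/syscall.h>  // SYS_* constants, resolved per architecture
#include <sys/types.h>
#include <unistd.h>       // syscall(), getpid()

// Calls getpid by name; SYS_getpid expands to the number for the build target.
static inline pid_t getpid_by_name(void) {
    return (pid_t)syscall(SYS_getpid);
}

// Not every name exists on every architecture: arm64 has no `open`
// syscall (only `openat`), so SYS_open is simply not defined there.
static inline int has_legacy_open_syscall(void) {
#ifdef SYS_open
    return 1;
#else
    return 0;
#endif
}

// Small usage check: the raw syscall agrees with the libc wrapper.
void syscall_by_name_demo(void) {
    assert(getpid_by_name() == getpid());
}
```

So names remove the need to look up numbers, but code still has to handle syscalls that are missing on a given architecture.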
I understand that ARM64 support isn't a high priority for Darling, but there are some parts that I probably would not understand or know how to resolve on my own. I am hoping the Darling team will be able to help me or point me in the right direction.
So I recently got a Pinebook Pro and installed Manjaro ARM on it. The majority of the packages needed are available, with the exception of gcc-multilib and lib32-gcc-libs. As a result, I have disabled TARGET_i386. I should include a proper build target for ARM64, but for the time being, I want to see how far I can get away with using TARGET_x86_64.
Any issues I have with ARM64 will be posted here (to avoid clutter, since ARM64 isn't an officially supported platform). I also included some cmake logs if you need to take a look at them.
cmake_log.txt
CMakeError.log
CMakeOutput.log