Getting Darling To Build On An ARM64 Device #642

Open
CuriousTommy opened this issue Jan 23, 2020 · 40 comments

Comments

@CuriousTommy
Contributor

I understand that ARM64 support isn't a high priority for Darling, but there are some parts that I probably won't understand or know how to resolve on my own. I am hoping the Darling team will be able to help me or point me in the right direction.


So I recently got a Pinebook Pro and installed Manjaro ARM on it. The majority of the required packages are available, with the exception of gcc-multilib and lib32-gcc-libs. As a result, I have disabled TARGET_i386. I should add a proper build target for ARM64, but for the time being, I want to see how far I can get using TARGET_x86_64.

Any issues I have with ARM64 will be posted here (to avoid clutter since ARM64 isn't an officially supported platform). I also included some cmake logs if you need to take a look at the logs.

cmake_log.txt
CMakeError.log
CMakeOutput.log

@CuriousTommy
Contributor Author

CuriousTommy commented Jan 23, 2020

It didn't take long for a build error to occur. The weird thing is that this error is not caused by Darling's code, but rather by the system header files. The error comes from /usr/include/asm/sigcontext.h and /usr/include/sys/user.h

In file included from /home/thomasa/Downloads/darling/src/libelfloader/native/threads.c:27:
In file included from /usr/include/signal.h:291:
In file included from /usr/include/bits/sigcontext.h:30:
/usr/include/asm/sigcontext.h:77:2: error: unknown type name '__uint128_t'
        __uint128_t vregs[32];
        ^
In file included from /home/thomasa/Downloads/darling/src/libelfloader/native/threads.c:27:

Here is the full log: make_log.txt

Edit: I figured out the problem; it has to do with the -m32 build option located here.

It makes sense why this would fail (since I don't have the 32-bit libraries installed). Seems like disabling TARGET_i386 isn't enough.

Edit2: I added an if condition so that the 32-bit version is built only if TARGET_i386 is set.

@CuriousTommy
Contributor Author

CuriousTommy commented Jan 24, 2020

The next roadblock I am running into is cctools-port. It seems like the code currently has issues detecting the machine as arm64.

/home/thomasa/Downloads/darling/src/external/cctools-port/cctools/ld64/src/../../include/foreign/mach/machine/boolean.h:39:2: error: architecture not supported
...
/home/thomasa/Downloads/darling/src/external/cctools-port/cctools/ld64/src/../../include/foreign/mach/machine/vm_types.h:39:2: error: architecture not supported
...

Going off of this useful link, adding __aarch64__ fixed this error message.

The weird thing about this issue is that it does not occur in the original project (at least from my very basic compile test).

I am currently trying to use an updated version of cctools-port in my fork to see if that fixes any other compilation issues I am experiencing.

Edit: No, updating it doesn't fix the issue. It has to do with the fact that __arm64__ is an Apple thing.
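
The mismatch between the two spellings can be handled with a guard that accepts both. This is a minimal sketch, assuming only the compiler-defined macros (`__arm64__` on Apple toolchains, `__aarch64__` from GCC and Clang on Linux). The `DEMO_ARCH_*` macros and `demo_arch_name` are hypothetical names for illustration, not identifiers from cctools-port or Darling:

```c
/* Apple toolchains define __arm64__, while GCC and Clang on Linux define
 * __aarch64__. Accept either spelling. DEMO_ARCH_* are hypothetical names. */
#if defined(__arm64__) || defined(__aarch64__)
#define DEMO_ARCH_ARM64 1
#elif defined(__x86_64__)
#define DEMO_ARCH_X86_64 1
#elif defined(__i386__)
#define DEMO_ARCH_I386 1
#endif

/* Returns a printable name for the architecture this file was compiled for. */
static const char *demo_arch_name(void)
{
#if defined(DEMO_ARCH_ARM64)
    return "arm64";
#elif defined(DEMO_ARCH_X86_64)
    return "x86_64";
#elif defined(DEMO_ARCH_I386)
    return "i386";
#else
    return "unknown";
#endif
}
```

Headers such as boolean.h could then test one project-level macro instead of repeating the compiler-specific checks in every file.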

@CuriousTommy
Contributor Author

What is platform-include used for? bootstrap_cmds is currently failing.

/home/thomasa/Downloads/darling/src/bootstrap_cmd/include/mach/machine/boolean.h:35:2 error: architecture not supported
#error architecture not supported

I noticed that some of the files in that folder are similar to cctools-port:cctools/include/foreign. The boolean.h file in the darling repo doesn't include the other architectures that are found in the cctools-port boolean.h file. Are the files in the platform-include folder old?

@TheBrokenRail
Contributor

You might also need to update https://github.com/darlinghq/darling/tree/master/src/kernel/libsyscall/bsdsyscalls for ARM64 (currently the macros only check for i386 or x86_64).

@CuriousTommy
Contributor Author

@TheBrokenRail I am actually going through that currently. Right now I am just copying and pasting Apple's xnu code in there, but it looks like I am going to have to make some manual changes as well.

@TheBrokenRail
Contributor

Will your work support ARM32? Then Darling would work on older Raspberry Pis and rooted Android devices as well!

@CuriousTommy
Contributor Author

CuriousTommy commented Feb 15, 2020

@TheBrokenRail Not in its current form. I probably won't support it until I merge ARM64 support into Darling and move to a distro that also supports ARM32 (Manjaro ARM only supports ARM64).

For now, I want to keep things simple.

@TheBrokenRail
Contributor

Any updates on this?

@CuriousTommy
Contributor Author

CuriousTommy commented Apr 17, 2020

@TheBrokenRail I have been busy with some IRL stuff (working hard to get my entry level career job!), so I haven't had a lot of time to work on this.

IRL stuff aside, I am waiting for @LubosD to update Darling to use the new cctools-port code. After the new code is used, I can continue working on the arm-support branch (when I have the free time).

@LubosD
Member

LubosD commented Apr 17, 2020

@CuriousTommy Done. I happened to do just that because I'm trying to update dyld to the newest "dyld3".

@ghost

ghost commented Jan 12, 2022

Hope to get an update on this.

@CuriousTommy
Contributor Author

CuriousTommy commented Jan 13, 2022

Hope to get an update on this.

Right now I’m trying to update Apple’s source code to a newer version (some of the newer code contains better support for ARM64). This needs to be completed before I can continue my work on building Darling on ARM64 devices.

You can see my progress on the update_sources_11.5 branch. Just keep in mind that this will take a while to complete. Updating the source code isn’t always straightforward…

Edit: I put together a project so that people can see the current status of the source code update. I made another project that goes into more detail on the sources that I have updated.

@jhay06

jhay06 commented Sep 21, 2022

Any update on this? I'm just curious about this initiative. I have Ubuntu Touch on arm64, and maybe Darling will run on it if the arm64 build succeeds.

@CuriousTommy
Contributor Author

Any update on this? I'm just curious about this initiative

Generally speaking, the situation is the same as before. The update_sources_11.5 branch needs to be merged into master before I can continue work on implementing ARM64 support.

As for the current status on update_sources_11.5, the code does build successfully; however, some programs are currently broken (such as notifyd). I need to fix those broken applications (and update any remaining sources that have not been updated yet).

@dingari
Contributor

dingari commented Dec 9, 2022

I guess I should have investigated a bit before spending my morning setting up arm64 Ubuntu in a VM on my Apple Silicon machine...

However, this is a very interesting project, and being able to build and run darling on arm64 is very desirable. Are there any major hurdles standing in the way of supporting arm64? I'm more than happy to help in any way I can.

@CuriousTommy
Contributor Author

Are there any major hurdles standing in the way of supporting arm64?

The situation is mostly the same as what I documented above. The sources need to be updated before I can continue my work on ARM64 support.

Besides getting the sources updated (you can see my progress here), I am running into issues where I can't get JavaScriptCore and Heimdal to build.

I'm more than happy to help in any way I can.

If you are interested in looking into getting JavaScriptCore and Heimdal to build with the new source changes, I can provide more details in #1173.

You will need to do this on an x86_64 device (or use a translation layer). Be warned though, this stuff is very involved.

@CuriousTommy CuriousTommy unpinned this issue Jan 20, 2023
@CuriousTommy CuriousTommy pinned this issue May 27, 2023
@CuriousTommy
Contributor Author

CuriousTommy commented May 27, 2023

Darlingserver Issues Notes

darling_thread_entry - x86_64 Porting Notes

I need to port the following assembly code to ARM64

#ifdef __x86_64__
__asm__ __volatile__ (
"movq %1, %%rdi\n"
"movq 80(%0), %%rsp\n"
"movq 40(%0), %%rsi\n"
"movq 8(%0), %%rdx\n"
"testq %%rdx, %%rdx\n"
"jnz 1f\n"
"movq 72(%0), %%rdx\n" // wqthread hack: if 3rd arg is null, we pass the stack bottom
"1:\n"
"movq 16(%0), %%rcx\n"
"movq 24(%0), %%r8\n"
"movq 32(%0), %%r9\n"
"movq %%rdi, 56(%0)\n"
"movq (%0), %%rax\n"
"andq $-0x10, %%rsp\n"
"pushq $0\n"
"pushq $0\n"
"jmpq *%%rax\n"
:: "a" (&args), "di" (args.pth));

Notes on args struct

struct arg_struct* in_args = (struct arg_struct*) p;
struct arg_struct args;

typedef void (*thread_ep)(void**, int, ...);
struct arg_struct
{
	thread_ep entry_point;
	uintptr_t real_entry_point;
	uintptr_t arg1; // `user_arg` for normal threads; `keventlist` for workqueues
	uintptr_t arg2; // `stack_addr` for normal threads; `flags` for workqueues
	uintptr_t arg3; // `flags` for normal threads; `nkevents` for workqueues
	union {
		void* _backwards_compat; // kept around to avoid modifiying assembly
		int port;
	};
	unsigned long pth_obj_size;
	void* pth;
	darling_thread_create_callbacks_t callbacks;
	uintptr_t stack_bottom;
	uintptr_t stack_addr;
	bool is_workqueue;
};

struct darling_thread_create_callbacks {
	unsigned int (*thread_self_trap)(void);
	void (*thread_set_tsd_base)(void*, int);
	void (*rpc_guard)(int);
	void (*rpc_unguard)(int);
};
typedef const struct darling_thread_create_callbacks* darling_thread_create_callbacks_t;

Linkage issues

get_threadtask

/usr/bin/ld: /home/user/Documents/CodingProjects/GitHub/darling/src/external/darlingserver/duct-tape/xnu/osfmk/ipc/ipc_kmsg.c:6001: undefined reference to `get_threadtask'

This one confuses me... Looking at the source code, this should affect i386/x86_64 as well...

// Taken from src/external/darlingserver/duct-tape/xnu/osfmk/ipc/ipc_kmsg.c
mach_msg_trailer_size_t
ipc_kmsg_trailer_size(
	mach_msg_option_t option,
	__unused thread_t thread)
{
	if (!(option & MACH_RCV_TRAILER_MASK)) {
		return MACH_MSG_TRAILER_MINIMUM_SIZE;
	} else {
		return REQUESTED_TRAILER_SIZE(thread_is_64bit_addr(thread), option);
	}
}
// Taken from src/external/darlingserver/duct-tape/xnu/osfmk/kern/thread.h
#define thread_is_64bit_addr(thd)       \
	task_has_64Bit_addr(get_threadtask(thd))
// Taken from src/external/darlingserver/duct-tape/xnu/osfmk/kern/bsd_kern.c
task_t
get_threadtask(thread_t th)
{
	return th->task;
}

Yet i386/x86_64 builds fine without it...

(lldb) image lookup --symbol get_threadtask
(lldb) 

But ipc_kmsg_trailer_size exists in the executable ???

(lldb) image lookup --symbol ipc_kmsg_trailer_size
1 symbols match 'ipc_kmsg_trailer_size' in /usr/local/bin/darlingserver:
        Address: darlingserver[0x000000000052df50] (darlingserver.PT_LOAD[1]..text + 1199280)
        Summary: darlingserver`ipc_kmsg_trailer_size at ipc_kmsg.c:5997

(lldb)
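
One possible explanation, offered as an unverified assumption: if `REQUESTED_TRAILER_SIZE` is defined differently per architecture and the x86_64 variant discards its first argument, the preprocessor drops the `get_threadtask(...)` call before the compiler ever sees it, so no symbol reference is emitted on x86_64. The sketch below shows that effect with hypothetical names (`demo_missing_symbol`, `DEMO_SIZE_IGNORING_FLAG`, `DEMO_SIZE_USING_FLAG`). It is not the XNU definition:

```c
/* Declared but never defined anywhere, like a function whose source file
 * is not part of the build. */
extern int demo_missing_symbol(int);

/* Variant that discards its first argument: the call expression passed as
 * is64 disappears during preprocessing. */
#define DEMO_SIZE_IGNORING_FLAG(is64, option) ((option) * 4)

/* Variant that evaluates its first argument: using this one with
 * demo_missing_symbol() would fail at link time. */
#define DEMO_SIZE_USING_FLAG(is64, option) ((is64) ? (option) * 4 : (option) * 2)

/* Compiles and links even though demo_missing_symbol has no definition,
 * because the macro never expands its first argument. */
static int demo_trailer_size(int option)
{
    return DEMO_SIZE_IGNORING_FLAG(demo_missing_symbol(option), option);
}
```

Comparing the preprocessed output of ipc_kmsg.c (for example with the compiler's `-E` flag) on both architectures would confirm or rule this out.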

Older Notes

Researching `ptrace` For ARM64
/home/user/Documents/CodingProjects/GitHub/darling/src/external/darlingserver/src/thread.cpp:148:16: error: use of undeclared identifier 'PTRACE_GETREGS'; did you mean 'PTRACE_GETREGSET'?
                                if (ptrace(PTRACE_GETREGS, id, 0, &regs) == -1) {
                                           ^~~~~~~~~~~~~~
                                           PTRACE_GETREGSET

According to the ptrace man page, "PTRACE_GETREGS is not present on all architectures", and unfortunately ARM64 is one of the architectures without it. From the same documentation, PTRACE_GETREGSET appears to be the closest equivalent to PTRACE_GETREGS.

Example (from my understanding, could be wrong):

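// To grab general-purpose registers
#include <sys/ptrace.h> // ptrace function
#include <sys/types.h> // pid_t
#include <sys/uio.h> // struct iovec
#include <sys/user.h> // user_regs_struct
#include <sys/wait.h> // waitpid
#include <elf.h> // NT_PRSTATUS & NT_ARM_*

#include <stdio.h>
#include <signal.h>
#include <unistd.h>

// Prints the ARM64 general-purpose registers (x0-x30), sp, pc and pstate
void print_registers(struct user_regs_struct *reg) {
    for (int i = 0; i < 31; i++) {
        printf("reg%d: 0x%016llx\n", i, reg->regs[i]);
    }

    printf("sp: 0x%016llx\n", reg->sp);
    printf("pc: 0x%016llx\n", reg->pc);
    printf("pstate: 0x%016llx\n", reg->pstate);
}

int main() {
    pid_t child = fork();

    if (child == -1) {
        perror("Fork unable to create child");
        return -1;
    }

    else if (child == 0) {
        // Child process: ask to be traced, then stop so the parent can inspect us
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) { perror("Unable to send request to be traced"); return -1; }
        if (raise(SIGSTOP) == -1) { perror("Unable to stop process"); return -1; }
    }

    else {
        // Parent process: wait until the child has actually stopped (instead of sleeping)
        int status = 0;
        if (waitpid(child, &status, 0) == -1) {
            perror("Unable to wait for child");
            return -1;
        }
        if (!WIFSTOPPED(status)) {
            fprintf(stderr, "Child did not stop as expected\n");
            return -1;
        }

        struct user_regs_struct reg = {0};
        struct iovec reg_iov = {
            .iov_base = &reg,
            .iov_len = sizeof(reg)
        };

        // NT_PRSTATUS selects the general-purpose register set; cast it because ptrace reads a pointer-sized argument
        long result = ptrace(PTRACE_GETREGSET, child, (void *)(long)NT_PRSTATUS, &reg_iov);
        if (result == -1) {
            perror("Unable to grab general-purpose registers");
            return -1;
        }

        print_registers(&reg);

        // Detach so the child is not left stopped
        ptrace(PTRACE_DETACH, child, NULL, NULL);
    }

    return 0;
}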
Investigating Build Failure: no member named 'dtape_interlock' in 'struct lck_mtx'
/home/user/Documents/CodingProjects/GitHub/darling/src/external/darlingserver/duct-tape/src/misc.c:242:51: error: no member named 'dtape_interlock' in 'struct lck_mtx'
        return wq->dtape_waitq_interlock.dtape_interlock.dtape_interlock.dtape_mutex.dtape_owner == (uintptr_t)current_thread();
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
struct waitq {
	uint32_t /* flags */
	    waitq_type:2,        /* only public field */
	    waitq_fifo:1,        /* fifo wakeup policy? */
	    waitq_prepost:1,     /* waitq supports prepost? */
	    waitq_irq:1,         /* waitq requires interrupts disabled */
	    waitq_isvalid:1,     /* waitq structure is valid */
	    waitq_turnstile:1,   /* waitq is embedded in a turnstile */
	    waitq_eventmask:_EVENT_MASK_BITS;
	/* the wait queue set (set-of-sets) to which this queue belongs */
#ifdef __DARLING__
	// simple_lock_data_t dtape_waitq_interlock
	decl_simple_lock_data(, dtape_waitq_interlock);
#else
#if __arm64__
	hw_lock_bit_t   waitq_interlock;        /* interlock */
#else
	hw_lock_data_t  waitq_interlock;        /* interlock */
#endif /* __arm64__ */
#endif // __DARLING__

	uint64_t waitq_set_id;
	uint64_t waitq_prepost_id;
	union {
		queue_head_t            waitq_queue;               /* queue of elements - used for waitq not embedded in turnstile or ports */
		struct priority_queue_sched_max waitq_prio_queue;  /* priority ordered queue of elements - used for waitqs embedded in turnstiles */
		struct {                                           /* used for waitqs embedded in ports */
			struct turnstile   *waitq_ts;              /* used to store receive turnstile of the port */
			union {
				void               *waitq_tspriv;  /* non special-reply port, used to store the watchport element for port used to store
				                                    * receive turnstile of the port */
				int                waitq_priv_pid; /* special-reply port, used to store the pid that copies out the send once right of the
				                                    * special-reply port. */
			};
		};
	};
};
// From duct-tape/xnu/osfmk/arm/simple_lock.h
typedef usimple_lock_data_t     simple_lock_data_t;

// From duct-tape/internal-include/darlingserver/duct-tape/simple_lock.h
struct usimple_lock {
	lck_spin_t dtape_interlock;
};

typedef struct usimple_lock usimple_lock_data_t;
typedef usimple_lock_data_t* usimple_lock_t;
// From duct-tape/internal-include/darlingserver/duct-tape/locks.h
typedef struct lck_spin {
	lck_mtx_t dtape_interlock;
} lck_spin_t;

// Mimics includes in `duct-tape/src/misc.c` and prints `waitq` struct.
#include <darlingserver/duct-tape.h>
#include <darlingserver/duct-tape/hooks.internal.h>
#include <darlingserver/duct-tape/log.h>
#include <darlingserver/duct-tape/processor.h>
#include <darlingserver/duct-tape/memory.h>
#include <darlingserver/duct-tape/task.h>
#include <darlingserver/duct-tape/psynch.h>

#include <kern/waitq.h>
#include <kern/clock.h>
#include <kern/turnstile.h>
#include <kern/thread_call.h>
#include <ipc/ipc_init.h>
#include <ipc/ipc_space.h>
#include <ipc/ipc_object.h>
#include <ipc/ipc_pset.h>
#include <kern/host.h>
#include <kern/sync_sema.h>
#include <kern/ux_handler.h>
#include <ipc/ipc_importance.h>
#include <kern/ipc_host.h>

#include <sys/types.h>

int main() {
    struct waitq d = {0};
    __builtin_dump_struct(&d, &printf);
}

ARM64 waitq struct output

struct waitq {
  uint32_t waitq_type : 2 = 0
  uint32_t waitq_fifo : 1 = 0
  uint32_t waitq_prepost : 1 = 0
  uint32_t waitq_irq : 1 = 0
  uint32_t waitq_isvalid : 1 = 0
  uint32_t waitq_turnstile : 1 = 0
  uint32_t waitq_eventmask : 25 = 0
  simple_lock_data_t dtape_waitq_interlock = {
    lck_mtx_t dtape_interlockA = {
      dtape_mutex_t dtape_mutex = {
        volatile uintptr_t dtape_owner = 0
        libsimple_lock_t dtape_queue_lock = {
          uint32_t state = 0
        }
        dtape_mutex_head_t dtape_queue_head = {
          struct dtape_mutex_link * tqh_first = (nil)
          struct dtape_mutex_link ** tqh_last = (nil)
        }
      }
    }
  }
  uint64_t waitq_set_id = 0
  uint64_t waitq_prepost_id = 0
  queue_head_t waitq_queue = {
    struct queue_entry * next = (nil)
    struct queue_entry * prev = (nil)
  }
  struct priority_queue_sched_max waitq_prio_queue = {
    struct priority_queue_entry_sched * pq_root = (nil)
  }
  struct turnstile * waitq_ts = (nil)
  void * waitq_tspriv = (nil)
  int waitq_priv_pid = 0
}

x86_64 waitq struct output

struct waitq {
  uint32_t waitq_type : 2 = 0
  uint32_t waitq_fifo : 1 = 0
  uint32_t waitq_prepost : 1 = 0
  uint32_t waitq_irq : 1 = 0
  uint32_t waitq_isvalid : 1 = 0
  uint32_t waitq_turnstile : 1 = 0
  uint32_t waitq_eventmask : 25 = 0
  simple_lock_data_t dtape_waitq_interlock = {
    lck_spin_t dtape_interlock = {
      lck_mtx_t dtape_interlock = {
        dtape_mutex_t dtape_mutex = {
          volatile uintptr_t dtape_owner = 0
          libsimple_lock_t dtape_queue_lock = {
            uint32_t state = 0
          }
          dtape_mutex_head_t dtape_queue_head = {
            struct dtape_mutex_link * tqh_first = (nil)
            struct dtape_mutex_link ** tqh_last = (nil)
          }
        }
      }
    }
  }
  uint64_t waitq_set_id = 0
  uint64_t waitq_prepost_id = 0
  queue_head_t waitq_queue = {
    struct queue_entry * next = (nil)
    struct queue_entry * prev = (nil)
  }
  struct priority_queue_sched_max waitq_prio_queue = {
    struct priority_queue_entry_sched * pq_root = (nil)
  }
  struct turnstile * waitq_ts = (nil)
  void * waitq_tspriv = (nil)
  int waitq_priv_pid = 0
}

@1div0

1div0 commented Jun 29, 2023

I would like to try cross-compilation for AArch64-based NVIDIA Jetson devices. Any initial pointers would be appreciated. Thank you in advance.

@CuriousTommy
Contributor Author

It will be a while before you can build Darling for an ARM64 device. I still need to fix the source code.

@CuriousTommy
Contributor Author

CuriousTommy commented Jul 2, 2023

MLDR Issues Notes

Missing implementation of _get_commpage_priv_address

/*
[  6%] Linking C executable mldr
/usr/bin/ld: CMakeFiles/mldr.dir/commpage.c.o: in function `commpage_setup':
/home/user/Documents/CodingProjects/GitHub/darling/src/startup/mldr/commpage.c:51: undefined reference to `_get_commpage_priv_address'
/usr/bin/ld: /home/user/Documents/CodingProjects/GitHub/darling/src/startup/mldr/commpage.c:51: undefined reference to `_get_commpage_priv_address'
/usr/bin/ld: /home/user/Documents/CodingProjects/GitHub/darling/src/startup/mldr/commpage.c:52: undefined reference to `_get_commpage_priv_address'
/usr/bin/ld: /home/user/Documents/CodingProjects/GitHub/darling/src/startup/mldr/commpage.c:52: undefined reference to `_get_commpage_priv_address'
/usr/bin/ld: /home/user/Documents/CodingProjects/GitHub/darling/src/startup/mldr/commpage.c:53: undefined reference to `_get_commpage_priv_address'
/usr/bin/ld: CMakeFiles/mldr.dir/commpage.c.o:/home/user/Documents/CodingProjects/GitHub/darling/src/startup/mldr/commpage.c:53: more undefined references to `_get_commpage_priv_address' follow
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
*/

void commpage_setup(bool _64bit)
{
// ...
	signature = (char*)CGET(_COMM_PAGE_SIGNATURE);
	version = (uint16_t*)CGET(_COMM_PAGE_VERSION);
	cpu_caps64 = (uint64_t*)CGET(_COMM_PAGE_CPU_CAPABILITIES64);
	cpu_caps = (uint32_t*)CGET(_COMM_PAGE_CPU_CAPABILITIES);
// ...
}
// [xnu]/osfmk/mach/arm/vm_types.h:101
typedef uintptr_t               vm_offset_t;

// [xnu]/osfmk/mach/vm_types.h:41
typedef vm_offset_t     	vm_address_t;

// [xnu]/osfmk/arm/commpage/commpage.c:317
vm_address_t
_get_commpage_priv_address(void)
{
	return sharedpage_rw_addr;
}

// [xnu]/osfmk/arm/commpage/commpage.c:95
void
commpage_populate(void)
{
// ...
	// Create the data and the text commpage
	vm_map_address_t kernel_data_addr, kernel_text_addr, user_text_addr;
	pmap_create_sharedpages(&kernel_data_addr, &kernel_text_addr, &user_text_addr);

	sharedpage_rw_addr = kernel_data_addr;
	sharedpage_rw_text_addr = kernel_text_addr;
	commPagePtr = (vm_address_t) _COMM_PAGE_BASE_ADDRESS;
// ...
}

// [xnu]/osfmk/arm/pmap.c:12934 ?
void
pmap_create_sharedpages(vm_map_address_t *kernel_data_addr, vm_map_address_t *kernel_text_addr,
    vm_map_address_t *user_text_addr)
{
// ...
	*kernel_data_addr = 0;
	*kernel_text_addr = 0;
	*user_text_addr = 0;
// ...
	/* For manipulation in kernel, go straight to physical page */
	*kernel_data_addr = phystokv(data_pa);
	*kernel_text_addr = (text_pa) ? phystokv(text_pa) : 0;

	return;
}
`darling_thread_entry` - x86_64 ASM Porting Notes

Breaking down the x86_64 ASM

	__asm__ __volatile__ (
	"movq %1, %%rdi\n"
	"movq 80(%0), %%rsp\n"
	"movq 40(%0), %%rsi\n" // 2nd arg
	"movq 8(%0), %%rdx\n" // 3rd arg
	"testq %%rdx, %%rdx\n"
	"jnz 1f\n"
	"movq 72(%0), %%rdx\n"
	"1:\n"
	"movq 16(%0), %%rcx\n" // 4th arg
	"movq 24(%0), %%r8\n" // 5th arg
	"movq 32(%0), %%r9\n" // 6th arg
	"movq %%rdi, 56(%0)\n" // 1st arg
	"movq (%0), %%rax\n"
	"andq $-0x10, %%rsp\n"
	"pushq $0\n"
	"pushq $0\n"
	"jmpq *%%rax\n"
	:: "a" (&args), "di" (args.pth));

// Notes on the position of the struct
struct arg_struct
{
	thread_ep entry_point; // 0
	uintptr_t real_entry_point; // 8
	uintptr_t arg1; // 16
	uintptr_t arg2; // 24
	uintptr_t arg3; // 32
	union {
		void* _backwards_compat; // kept around to avoid modifiying assembly
		int port;
	}; // 40
	unsigned long pth_obj_size; // 48
	void* pth; // 56
	darling_thread_create_callbacks_t callbacks; // 64
	uintptr_t stack_bottom; // 72
	uintptr_t stack_addr; // 80
	bool is_workqueue; // 88
};
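
Since the assembly hardcodes displacements such as 56 and 80, the offsets annotated above can be checked at compile time. This is a minimal sketch, assuming an LP64 target (8-byte pointers and `unsigned long`). `demo_callbacks_t` is a hypothetical stand-in for `darling_thread_create_callbacks_t`, used only because the pointer size is all that matters for the layout:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*thread_ep)(void **, int, ...);
/* Hypothetical stand-in: only the size of the pointer affects the layout. */
typedef const void *demo_callbacks_t;

struct arg_struct {
    thread_ep entry_point;
    uintptr_t real_entry_point;
    uintptr_t arg1;
    uintptr_t arg2;
    uintptr_t arg3;
    union {
        void *_backwards_compat;
        int port;
    };
    unsigned long pth_obj_size;
    void *pth;
    demo_callbacks_t callbacks;
    uintptr_t stack_bottom;
    uintptr_t stack_addr;
    bool is_workqueue;
};

#if defined(__LP64__)
/* Each check mirrors a displacement used in the x86_64 assembly. */
_Static_assert(offsetof(struct arg_struct, entry_point) == 0, "entry_point moved");
_Static_assert(offsetof(struct arg_struct, real_entry_point) == 8, "real_entry_point moved");
_Static_assert(offsetof(struct arg_struct, arg1) == 16, "arg1 moved");
_Static_assert(offsetof(struct arg_struct, arg2) == 24, "arg2 moved");
_Static_assert(offsetof(struct arg_struct, arg3) == 32, "arg3 moved");
_Static_assert(offsetof(struct arg_struct, port) == 40, "port moved");
_Static_assert(offsetof(struct arg_struct, pth) == 56, "pth moved");
_Static_assert(offsetof(struct arg_struct, stack_bottom) == 72, "stack_bottom moved");
_Static_assert(offsetof(struct arg_struct, stack_addr) == 80, "stack_addr moved");
_Static_assert(offsetof(struct arg_struct, is_workqueue) == 88, "is_workqueue moved");
#endif
```

If a field is added or reordered, the build fails at the assertion rather than the thread entry code silently reading the wrong slot.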

https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html#s5

       asm ( assembler template 
           : output operands                  /* optional */
           : input operands                   /* optional */
           : list of clobbered registers      /* optional */
           );

If there are no output operands but there are input operands, you must place two consecutive colons surrounding the place where the output operands would go.

When the "r" constraint is specified, gcc may keep the variable in any of the available GPRs. To specify the register, you must directly specify the register names by using specific register constraints. They are:

    +---+--------------------+
    | r |    Register(s)     |
    +---+--------------------+
    | a |   %eax, %ax, %al   |
    | b |   %ebx, %bx, %bl   |
    | c |   %ecx, %cx, %cl   |
    | d |   %edx, %dx, %dl   |
    | S |   %esi, %si        |
    | D |   %edi, %di        |
    +---+--------------------+

"i" : An immediate integer operand (one with constant value) is allowed. This includes symbolic constants whose values will be known only at assembly time.

Pseudo-C

args.stack_addr = rsp;
args._backwards_compat = rsi;
args.real_entry_point = rdx
if (rdx == NULL) {
    args.real_entry_point = rdx;
}
args.arg1 = rcx;
args.arg2 = r8;
args.arg3 = r9;
rdi = args.pth;
rax = &args;
rsp -= 16 // 0x10
push(0);
push(0);
jump_without_return(rax);

@johnothwolo

johnothwolo commented Jul 3, 2023

Pseudo-C

I got a question, since this is AT&T syntax, isn't the operand order reversed? Wouldn't it be:

rdi = args.pth; // MARK: redundant?
rsp = args.stack_addr;
rsi = args._backwards_compat;
rdx = args.real_entry_point;
if (rdx == NULL) {
    rdx = args.stack_bottom; // offset 72: the wqthread hack
}
rcx = args.arg1;
r8 = args.arg2;
r9 = args.arg3;
args.pth = rdi; // MARK: redundant cuz of first line?
rax = args->entry_point;
rsp -= 16 // 0x10
push(0);
push(0);
jump_without_return(rax);

ARM equivalent (chatgpt made, but hand optimized 🙂):

__asm__ __volatile__ (
    "mov x0, %1\n"
    "ldr x2, [%0, #80]\n"
    "ldr x3, [%0, #40]\n"
    "ldr x4, [%0, #8]\n"
    "cbnz x4, 1f\n"
    "ldr x4, [%0, #72]\n" // wqthread hack: if 3rd arg is null, we pass the stack bottom
    "1:\n"
    "ldr x5, [%0, #16]\n"
    "ldr x6, [%0, #24]\n"
    "ldr x7, [%0, #32]\n"
    "str x0, [%0, #56]\n"
    "ldr x8, [%0]\n"
    "and sp, sp, #-0x10\n"
    "stp xzr, xzr, [sp, #-16]!\n"
    "br x8\n"
    :: "a" (&args), "r" (args.pth)
    : "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "cc", "memory"
);

@CuriousTommy
Contributor Author

CuriousTommy commented Jul 3, 2023

I got a question, since this is AT&T syntax, isn't the operand order reversed? Wouldn't it be:

You're right. I forgot that there was more than one way for representing x86_64 assembly.

rsp -= 16 // 0x10

I actually got this part wrong. It's supposed to be rsp &= -16 (my first correction, rsp &= 16, was also wrong).

ARM equivalent (chatgpt made, but hand optimized 🙂):

__asm__ __volatile__ (
    "mov x0, %1\n"
    "ldr x2, [%0, #80]\n"
    "ldr x3, [%0, #40]\n"
    "ldr x4, [%0, #8]\n"
    "cbnz x4, 1f\n"
    "ldr x4, [%0, #72]\n" // wqthread hack: if 3rd arg is null, we pass the stack bottom
    "1:\n"
    "ldr x5, [%0, #16]\n"
    "ldr x6, [%0, #24]\n"
    "ldr x7, [%0, #32]\n"
    "str x0, [%0, #56]\n"
    "ldr x8, [%0]\n"
    "and sp, sp, #-0x10\n"
    "stp xzr, xzr, [sp, #-16]!\n"
    "br x8\n"
    :: "a" (&args), "r" (args.pth)
    : "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "cc", "memory"
);

I didn't expect anyone to provide me an ARM64 translation, thanks!

One thing I'm going to change is how the values are assigned to the registers. I'm not a huge fan of doing ldr x2, [%0, #80], since it's not clear which variable we are assigning to x2. bugaevc told me there is a way to assign variables to a register, but I need to figure out how to do that.

@bugaevc
Member

bugaevc commented Jul 3, 2023

it's suppose to be rsp &= 16 // 0x10

rsp &= -16 rather (or ~15, which is the same as -16). This zeroes out the last four bits.

@bugaevc
Member

bugaevc commented Jul 4, 2023

Bugaevc told me there is a way to assign variables to a register, but I need to figure out how to do that.

The idea here is to do as much as possible in C, since that's portable (can be shared among architectures) and easier to understand, and you also don't have to hardcode things like field offsets. For x86_64, GCC (and Clang) provide constraints that let you tell the compiler to place an input (or output, but in this case we've got none) into a specific register (or class of registers), like this:

// wqthread hack: if 3rd arg is null, we pass the stack bottom 
long arg3 = args->arg3;
if (arg3 == 0) {
    arg3 = (long) args->stack_bottom;
}

// Make super sure the stack pointer is 16-aligned.
void *stack_ptr = align_16(args->stack_ptr);

asm volatile(
    // Zero out the frame base register.
    "xorq %%rbp, %%rbp\n"
    // Switch to the new stack.
    "movq %[stack_ptr], %%rsp\n"
    // Push a fake return address.
    "pushq $0\n"
    // Jump to the entry point.
    "jmp *%[jump_here]" ::

    "D"(args->arg1),  // "D" means %rdi
    "S"(args->arg2),  // "S" means %rsi
    "d"(arg3),        // "d" means %rdx

    // "r" means any general-purpose register;
    // we also give it a name "jump_here" that
    // we'll be able to use inside the asm to
    // refer to it, instead of %0
    [jump_here] "r"(args->jump_here),
    // Same for the stack pointer.
    [stack_ptr] "r"(stack_ptr)
);
// The above never returns, let the compiler know that.
__builtin_unreachable();

(on Godbolt). For aarch64, specific register constraints are not available, and instead you're supposed to use explicit register local variables:

register long arg1 asm("x0") = args->arg1;
register long arg2 asm("x1") = args->arg2;
register long arg3 asm("x2") = args->arg3;

// wqthread hack: if 3rd arg is null, we pass the stack bottom 
if (arg3 == 0) {
    arg3 = (long) args->stack_bottom;
}

// Make super sure the stack pointer is 16-aligned.
void *stack_ptr = align_16(args->stack_ptr);

asm volatile(
    // Switch to the new stack.
    "mov sp, %[stack_ptr]\n"
    // Store a fake zero frame.
    "stp xzr, xzr, [sp, #-16]!\n"
    // Jump to the entry point.
    "br %[jump_here]" ::
    "r"(arg1),
    "r"(arg2),
    "r"(arg3),
    [jump_here] "r"(args->jump_here),
    [stack_ptr] "r"(stack_ptr)
);
// The above never returns, let the compiler know that.
__builtin_unreachable();

(on Godbolt).

Wait, but aren't register variables a very old, very deprecated C feature? Yes, but that's not it, this is GCC's explicit register variables extension (register asm("")), this is not in standard C and not deprecated. When you do it this way,

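  • the variable is guaranteed to be in the register you name at the point where it is passed as an operand to inline asm,
  • if you pass it with an "r" constraint into inline asm, it is passed in this very same register, not copied to another one,
  • the register is not reserved beyond that: GCC documents operands of extended asm as the only supported use, so assign the variable right before the asm statement and avoid function calls in between.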

Even on x86_64, not all registers have corresponding constraints, so you have to resort to explicit register variables if you want to place your variable into r8 or some such.
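
A minimal sketch of that last point, assuming GCC or Clang on x86_64 or aarch64. The function name read_back_pinned and the choice of r8 / x9 are made up for illustration. The asm reads the register by name instead of through the operand, so the value only comes back if the variable really was passed in the register it was pinned to.

```c
#include <assert.h>

// Pins `value` to a named register, then reads that register back by name
// inside the asm. Hypothetical helper, not part of Darling.
long read_back_pinned(long value)
{
    long out;
#if defined(__x86_64__)
    // r8 has no dedicated constraint letter, so it has to be pinned explicitly.
    register long pinned __asm__("r8") = value;
    __asm__ volatile("movq %%r8, %0" : "=r"(out) : "r"(pinned));
#elif defined(__aarch64__)
    // aarch64 has no per-register constraints at all, so the same trick applies.
    register long pinned __asm__("x9") = value;
    __asm__ volatile("mov %0, x9" : "=r"(out) : "r"(pinned));
#else
    out = value; // other targets: plain copy, no inline asm
#endif
    assert(out == value);
    return out;
}
```

Calling read_back_pinned(42) should hand back 42 on either architecture. The variable is only used as an asm input operand here, which is the use the extension is meant for.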

Disclaimer: I'm not as comfortable with arm/aarch64 assembly, I could have messed something up here. I'm only showing a few args, your real version has more things. Also note that there should be no need to clobber anything since we're never returning from the inline asm.

(And a nitpick: Bugaev, my last name, is of course capitalized, but bugaevc, my display name, is all lowercase.)

@johnothwolo

rsp &= -16 rather (or ~15, which is the same as -16). This zeroes out the four last bits.

I'm not that great at inline x86 asm tbh. Got a question though, what's the difference between rsp -= 16 and rsp &= -16?

@facekapow
Member

asm volatile(
    // Switch to the new stack.
    "mov sp, %[stack_ptr]\n"
    // Store a fake zero frame.
    "stp xzr, xzr, [sp, #-16]!\n"
    // Jump to the entry point.
    "br %[jump_here]" ::
    "r"(arg1),
    "r"(arg2),
    "r"(arg3),
    [jump_here] "r"(args->jump_here),
    [stack_ptr] "r"(stack_ptr)
);

@bugaevc By the way, AARCH64 passes the new frame entirely within registers, so to create a null frame for the function we're jumping to, you would zero-out the frame pointer (fp or x29) and link register (lr or x30) instead:

asm volatile(
    // Switch to the new stack.
    "mov sp, %[stack_ptr]\n"
    // Set up a fake zero frame by zeroing the frame pointer and link register
    "mov x29, xzr\n"
    "mov x30, xzr\n"
    // Jump to the entry point.
    "br %[jump_here]" ::
    "r"(arg1),
    "r"(arg2),
    "r"(arg3),
    [jump_here] "r"(args->jump_here),
    [stack_ptr] "r"(stack_ptr)
);

I'm not that great at inline x86 asm tbh. Got a question though, what's the difference between rsp -= 16 and rsp &= -16?

@johnothwolo rsp -= 16 just subtracts 16 bytes from the stack pointer (equivalent to growing it by 16 bytes). rsp &= -16, on the other hand, aligns it to 16 bytes by bitwise ANDing it with -16. This is because the 2's complement representation of -16 is 0xFFFFFFFFFFFFFFF0 (i.e. all ones except the 4 least significant bits). When you bitwise AND something with this value, it just discards the 4 least significant bits from that value, effectively making it 16-byte aligned.
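
The difference can be checked in plain C on an integer standing in for the stack pointer. This is only an illustrative sketch; grow_by_16 and align_down_16 are made-up names, not functions from Darling.

```c
#include <assert.h>
#include <stdint.h>

// rsp -= 16: moves the pointer down by 16 bytes, low bits are left as they were.
uintptr_t grow_by_16(uintptr_t sp)
{
    return sp - 16;
}

// rsp &= -16: rounds the pointer down to a 16-byte boundary.
uintptr_t align_down_16(uintptr_t sp)
{
    // ~15 and -16 are the same bit pattern in two's complement.
    uintptr_t aligned = sp & ~(uintptr_t)15;
    assert((aligned & 15) == 0);
    return aligned;
}
```

For example, starting from 0x1008, grow_by_16 gives 0xff8 (still misaligned by 8), while align_down_16 gives 0x1000. An already aligned value passes through align_down_16 unchanged.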

@johnothwolo

rsp -= 16 just subtracts 16 bytes from the stack pointer (equivalent to growing it by 16 bytes). rsp &= -16, on the other hand, aligns it to 16 bytes by bitwise ANDing it with -16.

Riggght, because the stack needs to be 16 byte aligned!

@CuriousTommy
Contributor Author

CuriousTommy commented Oct 7, 2023

I'll need to create an ARM64 equivalent of the __invoke__ method that lives in NSInvoke.S (or NSInvoke-x86.S).

  • Understanding the x86_64 assembly code can help me better understand how to implement the ARM64 code. However, the logic for ARM64 may differ from the x86_64 code. I'll probably need to research how the objc_msgSend method works.
  • For now, I'll ignore the CFI stuff. However, the CFI will need to be implemented to support proper exception handling.
  • __invoke__ seems to rely on a few pieces before it is called:
    • _frame - Seems to hold arguments that are applied through the __invoke__ assembly code. It's not yet clear why the arguments are structured the way they are.
      • There are two ways the _frame variable is initialized in NSInvocation. Either it copies the provided frame (likely from _CF_forwarding_prep_b), or it creates an empty frame, into which the arguments can be stored by using the - (void)setArgument:(void *)argumentLocation atIndex:(NSInteger)idx method.
        • When it comes to the setArgument:atIndex: method, it seems to rely on NSMethodSignature to figure out where the arguments should be stored.
; typedef void* marg_list;
; void __invoke__(
;     void (*msgSend)(...),            // rdi 
;     void *retdata,                   // rsi
;     marg_list args,                  // rdx
;     size_t frame_length,             // rcx
;     const char *return_type          // r8
; )

; Make new call frame
push %rbp
movq %rsp, %rbp

; Push following values to stack
push %rdi   ; void (*msgSend)(...)
push %rsi   ; void *retdata
push %r8    ; const char *return_type

; rsi = rdx (args)
movq %rdx, %rsi

; Push stack down and align
subq %rcx, %rsp    ; rsp -= frame_length
andq $-16, %rsp    ; rsp &= -16

; Shift stack contents (frame_length/8) times, 8 bytes at a time
; TODO: More efficient than the Lpush loop in i386 assembly above
movq %rsp, %rdi       ; rdi = rsp (stack pointer)
shrq $3, %rcx         ; frame_length = frame_length >> 3 (frame_length / 8)
cld                   ; Clear direction flag (Incrementing the pointer to the data
                      ;   after every iteration | See https://stackoverflow.com/a/9636772/5988706)
rep movsq             ; Move RCX (frame_length) quadwords (8 bytes) from RSI (args) to RDI (stack pointer).

; Copy args into registers
; (Why do we grab the values in this order)?
movq 0xb0(%rsp), %rax      ; rax  = rsp[0xb0]
movapd 0xa0(%rsp), %xmm7   ; xmm7 = rsp[0xa0]
movapd 0x90(%rsp), %xmm6   ; xmm6 = rsp[0x90]
movapd 0x80(%rsp), %xmm5   ; xmm5 = rsp[0x80]
movapd 0x70(%rsp), %xmm4   ; xmm4 = rsp[0x70]
movapd 0x60(%rsp), %xmm3   ; xmm3 = rsp[0x60]
movapd 0x50(%rsp), %xmm2   ; xmm2 = rsp[0x50]
movapd 0x40(%rsp), %xmm1   ; xmm1 = rsp[0x40]
movapd 0x30(%rsp), %xmm0   ; xmm0 = rsp[0x30]
movq 0x28(%rsp), %r9       ; r9   = rsp[0x28]
movq 0x20(%rsp), %r8       ; r8   = rsp[0x20]
movq 0x18(%rsp), %rcx      ; rcx  = rsp[0x18]
movq 0x10(%rsp), %rdx      ; rdx  = rsp[0x10]
movq 8(%rsp), %rsi         ; rsi  = rsp[0x08]
movq (%rsp), %rdi          ; rdi  = rsp[0x00]

addq $224, %rsp       ; rsp += 224 (0xe0): skip the register area just loaded above, leaving rsp at the stack arguments
movq -8(%rbp), %r10   ; r10 = objc_msgSend
callq *%r10           ; call objc_msgSend

; Grab retdata and return_type
movq -16(%rbp), %rsi   ; rsi = retdata
movq -24(%rbp), %rcx   ; rcx = return_type

; cl is the lower 8 bits of rcx
; 0x44 is 'D' in ASCII
cmpb $0x44, %cl        ; if (returnType[0] == 'D') // long double
je Llongdoubleret      ;     goto Llongdoubleret

; Store the return double value into `retdata` array
movapd %xmm1, 32(%rsi)
movapd %xmm0, 16(%rsi)

; Store the return int128 value into `retdata` array
movq %rdx, 8(%rsi)
movq %rax, (%rsi)

; goto Ldone
jmp Ldone

Llongdoubleret:
; Store the return long double value into `retdata` array 
fstpt (%rsi)

Ldone:
; restore old call frame
movq %rbp, %rsp
pop %rbp

; Return
ret

Tracing the method calls

The two methods that call _initWithMethodSignature:frame:

  1. - (instancetype)initWithMethodSignature:(NSMethodSignature *)sig
    1. + (instancetype)invocationWithMethodSignature:(NSMethodSignature *)sig
  2. + (instancetype)_invocationWithMethodSignature: (NSMethodSignature*)sig frame: (void*)frame
    1. void __block_forwarding__(void* frame)
    2. _CF_forwarding_prep_b
      1. This function stores the registers in a stack variable (in the same order as in the __invoke__ method).

^^^^^

This method seems to be responsible for initializing the _frame variable.
- (instancetype)_initWithMethodSignature: (NSMethodSignature*)sig frame: (void*)frame

  • This method uses calloc to initialize the _frame with zeros. If the frame is not NULL, then the contents of frame are copied to _frame.

^^^^^

A common pattern I noticed with the following methods that call _invokeUsingIMP:withFrame: is that they all use _frame (or a copy of it).

  1. - (void) invokeUsingIMP: (IMP) imp
  2. - (void)invoke
  3. - (void) invokeSuper

^^^^^

- (void) _invokeUsingIMP: (IMP) imp withFrame: (void *) frame

^^^^^

[ Calls __invoke__(...) ]


Using test class: -[FunObjClass funConcatenation:withSecondArg:withThirdArg:withFourthArg:withFifthArg:withSixthArg:] :

// This debug method is added to `NSMethodSignature.m`
- (void) darlingDebugPrinting {
    printf("{\n");
    for (NSUInteger i = 0; i < _count; i++) {
        printf("\t{ ");
        printf("_types[%lu].size: %lu, ", (unsigned long)i, _types[i].size);
        printf("_types[%lu].alignment: %lu, ", (unsigned long)i, _types[i].alignment);
        printf("_types[%lu].offset: %zu, ", (unsigned long)i, _types[i].offset);
        printf("_types[%lu].type: \"%s\" ", (unsigned long)i, _types[i].type);
        printf("}\n");
    }
    printf("};\n");
}

{
        { _types[0].size: 8, _types[0].alignment: 8, _types[0].offset: 0, _types[0].type: "@" }
        { _types[1].size: 8, _types[1].alignment: 8, _types[1].offset: 0, _types[1].type: "@" }
        { _types[2].size: 8, _types[2].alignment: 8, _types[2].offset: 8, _types[2].type: ":" }
        { _types[3].size: 8, _types[3].alignment: 8, _types[3].offset: 16, _types[3].type: "@" }
        { _types[4].size: 8, _types[4].alignment: 8, _types[4].offset: 24, _types[4].type: "@" }
        { _types[5].size: 8, _types[5].alignment: 8, _types[5].offset: 32, _types[5].type: "@" }
        { _types[6].size: 8, _types[6].alignment: 8, _types[6].offset: 40, _types[6].type: "@" }
        { _types[7].size: 8, _types[7].alignment: 8, _types[7].offset: 224, _types[7].type: "@" }
        { _types[8].size: 8, _types[8].alignment: 8, _types[8].offset: 232, _types[8].type: "@" }
};
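
The offsets in that output line up with the offsets used in the __invoke__ listing: six 8-byte register slots, then the vector registers, then the stack arguments starting at 224 (0xe0). As a rough sketch, the layout can be written down as a C struct. The struct and its field names are hypothetical (they are not taken from the Darling source), and the `reserved` field just covers bytes that the listing never loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical mirror of the x86_64 frame that __invoke__ reads from.
struct invoke_frame_x86_64 {
    uint64_t gpr[6];                // rdi, rsi, rdx, rcx, r8, r9 at 0x00..0x28
    uint8_t  xmm[8][16];            // xmm0..xmm7 at 0x30..0xa0
    uint64_t rax;                   // 0xb0
    uint8_t  reserved[0xe0 - 0xb8]; // rest of the 224-byte register area
    uint64_t stack_args[2];         // first two stack arguments, at 0xe0
};

// Compile-time checks against the offsets used in the assembly.
static_assert(offsetof(struct invoke_frame_x86_64, xmm) == 0x30, "xmm0 slot");
static_assert(offsetof(struct invoke_frame_x86_64, rax) == 0xb0, "rax slot");
static_assert(offsetof(struct invoke_frame_x86_64, stack_args) == 0xe0,
              "stack arguments start at 224");

// Byte offset of the n-th stack argument within the frame.
size_t stack_arg_offset(size_t n)
{
    return offsetof(struct invoke_frame_x86_64, stack_args) + n * sizeof(uint64_t);
}
```

With this layout, stack_arg_offset(0) and stack_arg_offset(1) give 224 and 232, matching _types[7] and _types[8] above. An ARM64 frame would presumably need its own layout, since the set of argument registers is different.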

To get a better understanding of what the argument's value should be for +[NSMethodSignature signatureWithObjCTypes:], you'll need to look into the method_getTypeEncoding function.

For example, if you run [FunObjClass instanceMethodSignatureForSelector:selector];, the types argument would become @64@0:8@16@24@32@40@48@56.
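
To make that string easier to read, here is a small sketch that splits it into (type code, number) pairs. It assumes every type code is a single character followed by a decimal number, which holds for this example but not for encodings with structs, qualifiers, or pointers. The names encoded_arg and parse_simple_encoding are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// One entry of a simple type encoding: a type code and the number after it.
struct encoded_arg {
    char type;
    long number;
};

// Splits an encoding such as "@64@0:8" into pairs.
// Returns the number of pairs written, or -1 if a type code has no number.
int parse_simple_encoding(const char *enc, struct encoded_arg *out, int max)
{
    assert(enc != NULL && out != NULL);
    int n = 0;
    while (*enc != '\0' && n < max) {
        char type = *enc++;
        char *end;
        long number = strtol(enc, &end, 10);
        if (end == enc) {
            return -1; // no digits followed the type code
        }
        out[n].type = type;
        out[n].number = number;
        n++;
        enc = end;
    }
    return n;
}
```

For @64@0:8@16@24@32@40@48@56 this yields nine pairs, the same count as the debug output above: the first pair is the return type together with the total size (64), followed by self, _cmd, and the six object arguments. Note that the numbers in the encoding (..., 48, 56) are not the same as the frame offsets NSMethodSignature computed (..., 224, 232).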

@CuriousTommy
Contributor Author

CuriousTommy commented Oct 28, 2023

I'll need to convert the following x86_64 CFForwardingPrep.S code to ARM64.

  • One thing I find interesting is that only __CF_forwarding_prep_0 exists on macOS.
;/**************************************
; * The marg_list's layout is:
; * d0   <-- args
; * d1
; * d2   |  increasing address
; * d3   v
; * d4
; * d5
; * d6
; * d7
; * a1
; * a2
; * a3
; * a4
; * stack args...
; * 
; * typedef struct objc_sendv_margs {
; *    int  a[4];
; *    int  stackArgs[...];
; * };
; *
; **************************************/



;
; __CF_forwarding_prep_0
; __CF_forwarding_prep_1
;

.section __TEXT,__text,regular,pure_instructions
.globl __CF_forwarding_prep_0
.globl __CF_forwarding_prep_1
.align 4, 0x90

__CF_forwarding_prep_0:
__CF_forwarding_prep_1:
push %rbp
movq %rsp, %rbp

; Copy args from regs into a stack var
subq   $0xd0, %rsp
movq   %rax, 0xb0(%rsp)
movapd %xmm7, 0xa0(%rsp)
movapd %xmm6, 0x90(%rsp)
movapd %xmm5, 0x80(%rsp)
movapd %xmm4, 0x70(%rsp)
movapd %xmm3, 0x60(%rsp)
movapd %xmm2, 0x50(%rsp)
movapd %xmm1, 0x40(%rsp)
movapd %xmm0, 0x30(%rsp)
movq   %r9, 0x28(%rsp)
movq   %r8, 0x20(%rsp)
movq   %rcx, 0x18(%rsp)
movq   %rdx, 0x10(%rsp)
movq   %rsi, 8(%rsp)
movq   %rdi, (%rsp)

; rdi (arg1), rsi (arg2)
; id ___forwarding___(struct objc_sendv_margs *args, void *returnStorage)
movq   %rsp, %rdi
leaq   0xc0(%rsp), %rsi
call   ____forwarding___ 

; check for forwarding completion
cmpq   $0, %rax 
jne    Lfail

; if it's nil, we're done
; now, load the return value from the on-stack storage
; and jump back to our caller

; here's how we get the return values (see NSInvoke.S)
movq   0xc0(%rsp), %rax
movq   0xc8(%rsp), %rdx
movapd 0xd0(%rsp), %xmm0
movapd 0xe0(%rsp), %xmm1

movq   %rbp, %rsp
pop    %rbp

ret

Lfail:
; if we got a non-nil value, it's our forwarding target
movq   %rax, %rdi
movq   0x80(%rsp), %rax   ; note: rax was saved at 0xb0 above, but this reloads from 0x80 (xmm5's slot)
movapd 0xa0(%rsp), %xmm7
movapd 0x90(%rsp), %xmm6
movapd 0x80(%rsp), %xmm5
movapd 0x70(%rsp), %xmm4
movapd 0x60(%rsp), %xmm3
movapd 0x50(%rsp), %xmm2
movapd 0x40(%rsp), %xmm1
movapd 0x30(%rsp), %xmm0
movq   0x28(%rsp), %r9
movq   0x20(%rsp), %r8
movq   0x18(%rsp), %rcx
movq   0x10(%rsp), %rdx
movq   8(%rsp), %rsi
; movq   (%rsp), %rdi // self overwritten

movq   %rbp, %rsp
pop    %rbp

; restart message send
jmp    _objc_msgSend



;
; __CF_forwarding_prep_b
;

.globl __CF_forwarding_prep_b
.align 4, 0x90

__CF_forwarding_prep_b:
push %rbp
movq %rsp, %rbp

; Copy args from regs into a stack var
subq   $0xd0, %rsp
movq   %rax, 0xb0(%rsp)
movapd %xmm7, 0xa0(%rsp)
movapd %xmm6, 0x90(%rsp)
movapd %xmm5, 0x80(%rsp)
movapd %xmm4, 0x70(%rsp)
movapd %xmm3, 0x60(%rsp)
movapd %xmm2, 0x50(%rsp)
movapd %xmm1, 0x40(%rsp)
movapd %xmm0, 0x30(%rsp)
movq   %r9, 0x28(%rsp)
movq   %r8, 0x20(%rsp)
movq   %rcx, 0x18(%rsp)
movq   %rdx, 0x10(%rsp)
movq   %rsi, 8(%rsp)
movq   %rdi, (%rsp)

; call into the actual forwarder
; void __block_forwarding__(void* frame)
movq   %rsp, %rdi
call   ___block_forwarding__

movq   %rbp, %rsp
pop    %rbp
ret

@superbonaci

This is what happens when running cmake on Debian ARM64 inside UTM on Apple silicon:

$ cat /etc/issue
Debian GNU/Linux 12 \n \l
$ uname -r
6.1.0-18-arm64
Including component: gui
Including component: iokitd
Python 2 not available; bytecode compilation is disabled
-- Could NOT find Vulkan (missing: Vulkan_LIBRARY Vulkan_INCLUDE_DIR) (found version "")
Did not find required libraries (Vulkan and LLVM); building without Metal support
-- Found dsymutil: /usr/bin/dsymutil
-- Found Setcap: /usr/sbin/setcap  
In file included from /home/debian/darling/build/cinctest.c:1:
/usr/lib/llvm-14/lib/clang/14.0.6/include/cpuid.h:14:2: error: this header is for x86 only
#error this header is for x86 only
 ^
1 error generated.
CMake Error at cmake/compiler_include.cmake:11 (message):
  Cannot detect compiler header include path
Call Stack (most recent call first):
  src/CMakeLists.txt:57 (GetCompilerInclude)


-- Configuring incomplete, errors occurred!

@CuriousTommy
Contributor Author

@superbonaci Weird... I thought I fixed that.

With that being said, I don't recommend that anyone try to build or use the ARM64 branch for now. It's still very WIP.

  • You still need to make code changes to get the ARM64 version to build successfully.
  • Even if you get it to build successfully, you would still get stuck in dyld (no ARM64 macOS app is going to work at this point).

@superbonaci
Copy link

Also the package libc6-dev-i386 does not exist in Debian 12 ARM64 (mentioned here https://docs.darlinghq.org/build-instructions.html):

$ apt list libc6*
Listing... Done
libc6-amd64-cross/stable,now 2.36-8cross1 all [installed,automatic]
libc6-amd64-i386-cross/stable,now 2.36-8cross1 all [installed,automatic]
libc6-amd64-x32-cross/stable,now 2.36-8cross1 all [installed,automatic]
libc6-arc-cross/stable 2.36-8cross1 all
libc6-arm64-cross/stable,now 2.36-8cross1 all [installed,automatic]
libc6-armel-cross/stable 2.36-8cross1 all
libc6-armhf-cross/stable 2.36-8cross1 all
libc6-dbg/stable,stable-security 2.36-9+deb12u4 arm64
libc6-dev-amd64-cross/stable,now 2.36-8cross1 all [installed]
libc6-dev-amd64-i386-cross/stable,now 2.36-8cross1 all [installed]
libc6-dev-amd64-x32-cross/stable,now 2.36-8cross1 all [installed]
libc6-dev-arc-cross/stable 2.36-8cross1 all
libc6-dev-arm64-cross/stable,now 2.36-8cross1 all [installed]
libc6-dev-armel-cross/stable 2.36-8cross1 all
libc6-dev-armhf-cross/stable 2.36-8cross1 all
libc6-dev-hppa-cross/stable 2.36-8cross1 all
libc6-dev-i386-amd64-cross/stable,now 2.36-8cross1 all [installed]
libc6-dev-i386-cross/stable,now 2.36-8cross1 all [installed]
libc6-dev-i386-x32-cross/stable,now 2.36-8cross1 all [installed]
libc6-dev-m68k-cross/stable 2.36-8cross1 all
libc6-dev-mips-cross/stable 2.36-8cross2 all
libc6-dev-mips32-mips64-cross/stable 2.36-8cross2 all
libc6-dev-mips32-mips64el-cross/stable 2.36-8cross2 all
libc6-dev-mips32-mips64r6-cross/stable 2.36-8cross2 all
libc6-dev-mips32-mips64r6el-cross/stable 2.36-8cross2 all
libc6-dev-mips32-mipsn32-cross/stable 2.36-8cross2 all
libc6-dev-mips32-mipsn32el-cross/stable 2.36-8cross2 all
libc6-dev-mips32-mipsn32r6-cross/stable 2.36-8cross2 all
libc6-dev-mips32-mipsn32r6el-cross/stable 2.36-8cross2 all
libc6-dev-mips64-cross/stable 2.36-8cross2 all
libc6-dev-mips64-mips-cross/stable 2.36-8cross2 all
libc6-dev-mips64-mipsel-cross/stable 2.36-8cross2 all
libc6-dev-mips64-mipsn32-cross/stable 2.36-8cross2 all
libc6-dev-mips64-mipsn32el-cross/stable 2.36-8cross2 all
libc6-dev-mips64-mipsn32r6-cross/stable 2.36-8cross2 all
libc6-dev-mips64-mipsn32r6el-cross/stable 2.36-8cross2 all
libc6-dev-mips64-mipsr6-cross/stable 2.36-8cross2 all
libc6-dev-mips64-mipsr6el-cross/stable 2.36-8cross2 all
libc6-dev-mips64el-cross/stable 2.36-8cross2 all
libc6-dev-mips64r6-cross/stable 2.36-8cross2 all
libc6-dev-mips64r6el-cross/stable 2.36-8cross2 all
libc6-dev-mipsel-cross/stable 2.36-8cross2 all
libc6-dev-mipsn32-cross/stable 2.36-8cross2 all
libc6-dev-mipsn32-mips-cross/stable 2.36-8cross2 all
libc6-dev-mipsn32-mips64-cross/stable 2.36-8cross2 all
libc6-dev-mipsn32-mips64el-cross/stable 2.36-8cross2 all
libc6-dev-mipsn32-mips64r6-cross/stable 2.36-8cross2 all
libc6-dev-mipsn32-mips64r6el-cross/stable 2.36-8cross2 all
libc6-dev-mipsn32-mipsel-cross/stable 2.36-8cross2 all
libc6-dev-mipsn32-mipsr6-cross/stable 2.36-8cross2 all
libc6-dev-mipsn32-mipsr6el-cross/stable 2.36-8cross2 all
libc6-dev-mipsn32el-cross/stable 2.36-8cross2 all
libc6-dev-mipsn32r6-cross/stable 2.36-8cross2 all
libc6-dev-mipsn32r6el-cross/stable 2.36-8cross2 all
libc6-dev-mipsr6-cross/stable 2.36-8cross2 all
libc6-dev-mipsr6el-cross/stable 2.36-8cross2 all
libc6-dev-powerpc-cross/stable 2.36-8cross1 all
libc6-dev-powerpc-ppc64-cross/stable 2.36-8cross1 all
libc6-dev-ppc64-cross/stable 2.36-8cross1 all
libc6-dev-ppc64-powerpc-cross/stable 2.36-8cross1 all
libc6-dev-ppc64el-cross/stable 2.36-8cross1 all
libc6-dev-riscv64-cross/stable 2.36-8cross1 all
libc6-dev-s390-s390x-cross/stable 2.36-8cross1 all
libc6-dev-s390x-cross/stable 2.36-8cross1 all
libc6-dev-sh4-cross/stable 2.36-8cross1 all
libc6-dev-sparc-sparc64-cross/stable 2.36-8cross1 all
libc6-dev-sparc64-cross/stable 2.36-8cross1 all
libc6-dev-x32-amd64-cross/stable 2.36-8cross1 all
libc6-dev-x32-cross/stable,now 2.36-8cross1 all [installed,automatic]
libc6-dev-x32-i386-cross/stable,now 2.36-8cross1 all [installed]
libc6-dev/stable,stable-security,now 2.36-9+deb12u4 arm64 [installed]
libc6-hppa-cross/stable 2.36-8cross1 all
libc6-i386-amd64-cross/stable,now 2.36-8cross1 all [installed,automatic]
libc6-i386-cross/stable,now 2.36-8cross1 all [installed,automatic]
libc6-i386-x32-cross/stable,now 2.36-8cross1 all [installed,automatic]
libc6-m68k-cross/stable 2.36-8cross1 all
libc6-mips-cross/stable 2.36-8cross2 all
libc6-mips32-mips64-cross/stable 2.36-8cross2 all
libc6-mips32-mips64el-cross/stable 2.36-8cross2 all
libc6-mips32-mips64r6-cross/stable 2.36-8cross2 all
libc6-mips32-mips64r6el-cross/stable 2.36-8cross2 all
libc6-mips32-mipsn32-cross/stable 2.36-8cross2 all
libc6-mips32-mipsn32el-cross/stable 2.36-8cross2 all
libc6-mips32-mipsn32r6-cross/stable 2.36-8cross2 all
libc6-mips32-mipsn32r6el-cross/stable 2.36-8cross2 all
libc6-mips64-cross/stable 2.36-8cross2 all
libc6-mips64-mips-cross/stable 2.36-8cross2 all
libc6-mips64-mipsel-cross/stable 2.36-8cross2 all
libc6-mips64-mipsn32-cross/stable 2.36-8cross2 all
libc6-mips64-mipsn32el-cross/stable 2.36-8cross2 all
libc6-mips64-mipsn32r6-cross/stable 2.36-8cross2 all
libc6-mips64-mipsn32r6el-cross/stable 2.36-8cross2 all
libc6-mips64-mipsr6-cross/stable 2.36-8cross2 all
libc6-mips64-mipsr6el-cross/stable 2.36-8cross2 all
libc6-mips64el-cross/stable 2.36-8cross2 all
libc6-mips64r6-cross/stable 2.36-8cross2 all
libc6-mips64r6el-cross/stable 2.36-8cross2 all
libc6-mipsel-cross/stable 2.36-8cross2 all
libc6-mipsn32-cross/stable 2.36-8cross2 all
libc6-mipsn32-mips-cross/stable 2.36-8cross2 all
libc6-mipsn32-mips64-cross/stable 2.36-8cross2 all
libc6-mipsn32-mips64el-cross/stable 2.36-8cross2 all
libc6-mipsn32-mips64r6-cross/stable 2.36-8cross2 all
libc6-mipsn32-mips64r6el-cross/stable 2.36-8cross2 all
libc6-mipsn32-mipsel-cross/stable 2.36-8cross2 all
libc6-mipsn32-mipsr6-cross/stable 2.36-8cross2 all
libc6-mipsn32-mipsr6el-cross/stable 2.36-8cross2 all
libc6-mipsn32el-cross/stable 2.36-8cross2 all
libc6-mipsn32r6-cross/stable 2.36-8cross2 all
libc6-mipsn32r6el-cross/stable 2.36-8cross2 all
libc6-mipsr6-cross/stable 2.36-8cross2 all
libc6-mipsr6el-cross/stable 2.36-8cross2 all
libc6-powerpc-cross/stable 2.36-8cross1 all
libc6-powerpc-ppc64-cross/stable 2.36-8cross1 all
libc6-ppc64-cross/stable 2.36-8cross1 all
libc6-ppc64-powerpc-cross/stable 2.36-8cross1 all
libc6-ppc64el-cross/stable 2.36-8cross1 all
libc6-riscv64-cross/stable 2.36-8cross1 all
libc6-s390-s390x-cross/stable 2.36-8cross1 all
libc6-s390x-cross/stable 2.36-8cross1 all
libc6-sh4-cross/stable 2.36-8cross1 all
libc6-sparc-sparc64-cross/stable 2.36-8cross1 all
libc6-sparc64-cross/stable 2.36-8cross1 all
libc6-x32-amd64-cross/stable 2.36-8cross1 all
libc6-x32-cross/stable,now 2.36-8cross1 all [installed,automatic]
libc6-x32-i386-cross/stable,now 2.36-8cross1 all [installed,automatic]
libc6.1-alpha-cross/stable 2.36-8cross1 all
libc6.1-dev-alpha-cross/stable 2.36-8cross1 all
libc6/stable,stable-security,now 2.36-9+deb12u4 arm64 [installed]

@CuriousTommy
Contributor Author

CuriousTommy commented Feb 11, 2024

Also the package libc6-dev-i386 does not exist in Debian 12 ARM64

I'm planning to update the build instructions for Fedora to include the dependencies needed for ARM64 (for the other distros, I'll let other people create PRs for the needed ARM64 dependencies).

However, I'll only do that after ARM64 support is ready.

@Informeli

Informeli commented Apr 16, 2024

Possibly relevant: currently, LINUX_SYSCALL() needs an architecture-dependent identifier to find the syscall. Switching to syscall names could make the ARM64 port more future-proof.

I would do it myself, but at the current moment I can't even get any code functioning within darwin, so until I've tackled some interim projects I won't be doing this.

@CuriousTommy
Contributor Author

Possibly relevant: currently, LINUX_SYSCALL() needs an architecture-dependent identifier to find the syscall. Switching to syscall names could make the ARM64 port more future-proof.

To help me better understand, what do you mean by "architecture dependent identifier"? Are you referring to having macros/const values for the syscall numbers?

@Informeli

Possibly relevant: currently, LINUX_SYSCALL() needs an architecture-dependent identifier to find the syscall. Switching to syscall names could make the ARM64 port more future-proof.

To help me better understand, what do you mean by "architecture dependent identifier"? Are you referring to having macros/const values for the syscall numbers?

a. You would be totally right to not understand it. I worded it poorly.
b. Close. I wanted function prototypes/headers for the syscalls.
An example, because I'm still not certain I got the terminology right (C code):
int sys_settimeofday(struct timeval *restrict tv, struct timezone *_Nullable restrict tz); or int sys_settimeofday(void *restrict tv, void *_Nullable restrict tz);

When I wrote this I was thinking: "If I can make Linux syscalls by name in C code, the LINUX_SYSCALL() macro can too.
That's way more hardware-agnostic than typing CPU-dependent syscall number values, and could even pave the way for more intelligent defaults than returning 0 or NULL."

A hidden assumption I was making was that since I could access any syscall from my C code by name, I could access them all by name from my C code.
I made this assumption because the way named syscalls are formatted (lower case) indicates that the compiler treats them as functions and not macros. If they are functions, they are clearly dynamically loaded: you don't have to tell the compiler what they contain or even where to find them, just what they're called.

Lately I've been having doubts about this, mostly in an "if I can think of it, other smarter people could have thought of it too" way.
A possible reason why I could be wrong about this:
Maybe the compiler has to leave the identification of syscalls by name to the static header files, which achieve this through macros and const values like you proposed.
This would imply the system doesn't have the information about which syscall name is associated with which syscall number, because otherwise not implementing this, while implementing the same mechanism for standard libraries, seems like more work than implementing it.
I find this one dubious, because when I observed syscalls and library calls with xtrace they all had names and syscall numbers, which would be impossibly tedious to implement if that information wasn't provided by the program or the system, and the program only provides information the system can use in system calls (if the program works). This could be Mac-specific, but that requires two maybes.
Also, at least the man project knows all syscall names and at least the kernel knows all syscall numbers, so the information is already on the system.

If I'm right, nobody working on the Darling project needs to be looking up Linux syscall numbers, because the writers of our compilers, libraries and kernels already have.
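
For reference, this is roughly how syscalls can be made "by name" from ordinary Linux C code with glibc or musl. The names are resolved at compile time: <sys/syscall.h> provides SYS_* macros that expand to the number for the architecture being compiled for, so the lookup comes from headers rather than from any dynamic loading. This is only a sketch of the libc mechanism; how Darling's LINUX_SYSCALL() is defined is not shown in this thread, and getpid_by_name / getpid_syscall_number are made-up helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h> // SYS_* names, defined per architecture by the headers
#include <unistd.h>      // syscall(), getpid()

// Invokes getpid through the generic syscall() wrapper, using only the name.
long getpid_by_name(void)
{
    long pid = syscall(SYS_getpid);
    assert(pid > 0);
    return pid;
}

// The number hiding behind the name on the architecture this was compiled for.
long getpid_syscall_number(void)
{
    return SYS_getpid;
}
```

The same source compiles unchanged on x86_64 and ARM64 even though getpid_syscall_number() returns a different value on each, which is the kind of architecture independence being discussed. The catch is that some syscalls exist on one architecture but not another, so a SYS_* name that compiles on x86_64 is not guaranteed to exist on ARM64.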
