support for arm64 #308
Replies: 13 comments 14 replies
-
Thanks for your interest! Unfortunately, I don't have an aarch64 CPU to build on. Are GH200s not compatible with x86_64? I wonder if setuptools allows emulating different CPU arches. We currently use GH Actions to build the wheels.
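For what it's worth, setuptools itself won't emulate a different CPU: the wheel's platform tag follows the interpreter it runs under, so building aarch64 wheels on x86_64 GH Actions runners generally means running the build under QEMU emulation or on a native arm64 runner. A rough sketch of how a setup.py can at least see which arch it is actually building for (the flag below is a guess of mine, not anything from aphrodite's setup.py):

```python
# Sketch only: inspect the architecture the build is really running on.
# Under QEMU emulation this reports the emulated arch (aarch64), which is
# exactly what makes emulated CI builds produce correctly tagged wheels.
import platform
import sysconfig

build_arch = platform.machine()        # e.g. "x86_64" or "aarch64"
plat_tag = sysconfig.get_platform()    # e.g. "linux-x86_64" or "linux-aarch64"

extra_compile_args = []
if build_arch in ("aarch64", "arm64"):
    # Hypothetical tuning flag for Grace (Neoverse V2); not a project setting.
    extra_compile_args.append("-mcpu=neoverse-v2")

print(f"building for {plat_tag}, extra flags: {extra_compile_args or 'none'}")
```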
-
Yeah, that's what I was wondering, too. I'm not totally clear on it, but here's what I'm using (it's on vast.ai, and if I rent it as interruptible it costs pennies; I've gotten hours out of it before someone interrupts me, lol): https://resources.nvidia.com/en-us-grace-cpu/grace-hopper-superchip The thing is some sort of GPU+CPU on one die (or something like that). I love it. It absolutely TROUNCES the A100, and you get 96G of VRAM (it may use some sort of unified memory, I'm not sure). The NVLink equivalent for it is coherent with the system RAM. Look at that absolute unit. It's glorious.
-
But yeah, it's definitely aarch64, which makes it kinda annoying to find packages for. I've been using abacusai/gh200-llm/llm-train-serve as the docker image, or one of NVIDIA's own. I've mostly used koboldcpp with it, as I don't have to deal with finding any python packages. I haven't gotten this project working with it because, well, dependency hell, and I'm not a python guy, even though I should be. llama.cpp/koboldcpp are straightforward to get working (I build them with cmake/ninja and the machine flies through it).
-
I didn't know they had GH200s on vast! I'll look into this and see if I can set up a build pipeline for arm CPUs. Will let you know. Thanks for reporting!
-
On that note, @BlairSadewitz, when you have the chance, can you confirm whether all the packages in requirements.txt are available for that CPU arch? It would be pointless if aphrodite is built for it but not its dependencies.
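In case it helps, here's a rough, untested sketch of one way to check that from the PyPI side (it only looks at each package's latest release and treats pure-Python "none-any" wheels as fine on aarch64):

```python
# Rough sketch: for each name in requirements.txt, ask PyPI whether the latest
# release ships an aarch64 wheel or a pure-Python ("none-any") wheel.
# Anything reported as "needs source build" would have to be compiled on the GH200.
import json
import re
import urllib.request

def aarch64_installable(package: str) -> bool:
    with urllib.request.urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        files = json.load(resp)["urls"]
    for f in files:
        name = f["filename"]
        if f["packagetype"] == "bdist_wheel" and ("aarch64" in name or name.endswith("-none-any.whl")):
            return True
    return False

with open("requirements.txt") as req:
    for line in req:
        # Strip version specifiers, extras, and comments; keep just the project name.
        name = re.split(r"[<>=!~;\[ #]", line.strip(), maxsplit=1)[0]
        if name:
            print(f"{name}: {'ok' if aarch64_installable(name) else 'needs source build'}")
```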
-
Yeah. I got it to work again, pretty easily. I started with this docker image: ghcr.io/abacusai/gh200-llm/llm-train-serve:latest Then I changed requirements.txt:

requirements.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index 5cf906f..5847777 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -3,7 +3,7 @@ psutil
 ray >= 2.9
 sentencepiece
 numpy
-torch == 2.2.0
+torch >= 2.2.0
 transformers >= 4.36.0 # for mixtral
 uvicorn
 openai # for fastapi's openai proxy emulation
@@ -24,4 +24,4 @@ rich

I then installed the requirements, and it recognized everything from the system. Then I did:

then:
-
Nice, thanks for confirming. On a related note, I'm not seeing any GH200s on vast.ai; are they so rare that the GPU type isn't listed yet?
-
At least half the time when I rent it as interruptible, no one else takes it for at least an hour or so, often way more.
-
I think that wheel I built somehow picked up the wrong pytorch dependency, but I'm not on the machine right now to check. Hmm. I think it's about time to, like, maybe eat or something. It sure would be handy if conda or pip or SOMEWHERE had this stuff prebuilt. They don't even have the CUDA wheels for aarch64 on pytorch.org (that I've found, anyway). Ugh.
-
Hi, so I found a docker image with CUDA 12.1 and pytorch 2.2.0 built for aarch64. Granted, I don't know if it actually WORKS (well, I mean, the machine runs it, and llama.cpp works on it, but I haven't tested pytorch yet). That saves me a lot of effort, because otherwise, to get this to work, I was going to have to build pytorch myself. As exciting a rite of passage as that seems, I'd rather not pay for GPU time to do it, haha. I've either tracked down or built the various other dependencies myself, and the build seems to be fine.

However, I ABSOLUTELY CANNOT GET THE DAMN THING TO INSTALL hadamard.safetensors and the compiled objects. I've tried ripping out the logic in setup.py and FORCING it to do it, and it just won't. I am absolutely at a loss as to why it will just not install them. Do you have any idea WTF is going on here? Also, I already have pytorch 2.2.0 in my conda environment, yet it always wants to reinstall it itself. I do not want it to do that, because that pytorch has no GPU support, lol. Do you have any idea why it insists on doing that? I mean, it's a dependency, yeah, OK, but it is clearly already present. The conda environment I'm using is a clone of the base environment, but it also happens if I don't use conda at all. Thanks.
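For anyone hitting the same wall: data files like that generally only land in the wheel if they are declared via package_data or MANIFEST.in, and pip re-downloading torch is typically pip's isolated build environment installing its own copy of the build requirements rather than reusing the one already in the conda env (pip install --no-build-isolation sidesteps that). A minimal sketch of the package_data side, with the file's location guessed rather than taken from aphrodite's actual setup.py:

```python
# Hedged sketch only: how extra data files are normally pulled into a wheel.
# The package path below is an assumption about where hadamard.safetensors
# lives in the source tree, not copied from the real setup.py.
from setuptools import setup, find_packages

setup(
    name="aphrodite-engine",
    packages=find_packages(exclude=("tests",)),
    package_data={
        # assumed location of the data file
        "aphrodite.modeling.layers.quantization": ["hadamard.safetensors"],
    },
)
```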
-
Hey, there are pytorch 2.3.0 + CUDA (11.8/12.1) images available on Docker Hub as of today for aarch64, so I'm stoked, lol. I wasn't expecting that to happen until the next (minor version?) release. Oddly enough, there aren't any for amd64, but I presume they're building them, as the aarch64 images are only like 14 hours old. Unfortunately, nothing new seems to be available in the anaconda repos. I really like using the GH200, because I can rent it pretty cheap as an interruptible instance; it's hard to go wrong with 96GB of VRAM on one device. I think there may be less demand for them because a lot of docker images simply don't support the architecture. I'm going to see how this works out with it.

Also, notably, 2.3.0 includes the following (not that I am capable of doing anything with it, but the first one, especially, seems like it might be useful):

- [Beta] Support for user-defined Triton kernels in torch.compile
- [Prototype] Weight-only quantization introduced into the Inductor CPU backend
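On that first item, the 2.3 feature means torch.compile can now trace through a hand-written Triton kernel instead of graph-breaking on it. A toy example of the shape of it (my own, not from aphrodite; it needs a CUDA-capable torch plus triton installed):

```python
# Toy example of the PyTorch 2.3 feature: a user-defined Triton kernel that
# torch.compile can trace through without a graph break.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def triton_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

compiled_add = torch.compile(triton_add)
a = torch.randn(4096, device="cuda")
b = torch.randn(4096, device="cuda")
print(torch.allclose(compiled_add(a, b), a + b))
```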
-
I built pytorch 2.3.0 on the GH200 yesterday: -rw-r--r-- 1 blair staff 188M Apr 27 23:09 torch-2.3.0-cp310-cp310-cu121-linux_aarch64.whl

There are plenty of operating systems that are way easier to build than that thing, lol. NetBSD takes one command. Happily, that machine is such a beastly unit that it absolutely blazed through it. I mean, to be fair, I wouldn't rate my competence as particularly high; on the other hand, I can't remember the last time I had any difficulty whatsoever following instructions to build something [that should build in the first place]. I ended up downloading the source for the conda package, ripping out the stuff that didn't need to run, setting the variables it needed, etc. (took like 15 minutes, tops), then just turned it loose on the root of the filesystem. It worked right out of the gate, haha. I didn't do 2.2.x because all of the NVIDIA pytorch containers use snapshots, so I figured it wasn't quite ripe yet.

I then built aphrodite-engine with it, and it seemed to work fine. That was the last dependency I needed to build a wheel for (the others are xformers, flash_attn, and triton). Having 96GB of VRAM on one device is pretty sweet. Next, I'm gonna try to set up proper builds for these packages, since they just don't seem to be forthcoming. I assume that's probably because most people who actually use a GH200 have their own development environments where they build this stuff for their own purposes as a matter of course, or they use the containers. Are you planning on switching to pytorch 2.3.0?
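Before building the other wheels on top of a home-built torch, a quick sanity check along these lines is worth running (my own habit, nothing from the thread beyond the version/CUDA combo):

```python
# Quick sanity check that a locally built torch wheel actually has CUDA support
# on the GH200, since a CPU-only build will import fine and only fail later.
import torch

print(torch.__version__)               # expect a 2.3.0 / cu121 build
print(torch.version.cuda)              # CUDA toolkit the wheel was compiled against
print(torch.cuda.is_available())       # must be True on the GH200
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))        # should report the Hopper GPU
    print(torch.cuda.get_device_capability(0))  # Hopper is (9, 0)
```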
-
Have you ever taken a look at spack, the package manager? The thing will build pytorch to order with a simple command. It's been making my life a lot easier on aarch64.
-
Hi,
I was just wondering if you could build wheels for aarch64 (to support the GH200). BTW, I'm impressed with all the new features that you've recently added.
Thanks,
Blair