Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rocmPackages: drop symlinks & split outputs #276846

Open
SomeoneSerge opened this issue Dec 26, 2023 · 9 comments
Open

rocmPackages: drop symlinks & split outputs #276846

SomeoneSerge opened this issue Dec 26, 2023 · 9 comments
Labels
6.topic: closure size The final size of a derivation, including its dependencies 6.topic: rocm

Comments

@SomeoneSerge
Copy link
Contributor

SomeoneSerge commented Dec 26, 2023

...if splittable. CC #197885

Issue description

❯ nix eval nixpkgs#rocmPackages.rocblas.outputs
[ "out" ]
❯ nix build --refresh github:philiptaron/llama.cpp/nix#rocm --print-out-paths | xargs nix path-info --human-readable --closure-size --recursive | grep G'$'
/nix/store/8fpf7jmnmyf96bjabx4r8rj3vki8z7pj-rocm-llvm-clang-unwrapped-5.7.1        1.7G
/nix/store/8xa2qq8xkd4qjmlary2j0wmblmkfb5q0-rocm-llvm-llvm-5.7.1                   1.5G
/nix/store/mslhw90n1yi8xwmvz60zh408g15hmica-rocm-llvm-clang-5.7.1                  3.8G
/nix/store/yc0v7l0bka0hcyy82vkqhaxmj5c91lz8-rocm-runtime-5.7.1                     3.8G
/nix/store/2bhwj31y2fa4kqc3sp4bs1b1nd4d8qic-rocminfo-5.7.1                         3.9G
/nix/store/wcmixj44ihf7mjpf5wn77y9gr1p9zzxv-rocm-llvm-binutils-5.7.1               2.0G
/nix/store/ivdnfrb7ymzbcv12a52jnz9sdf6fvpdp-rocm-llvm-binutils-wrapper-5.7.1       2.0G
/nix/store/6xj537dwafp618f084hkl4gvlf5sfqqa-rocm-llvm-clang-wrapper-5.7.1          4.0G
/nix/store/68xb3gyp792qyhpg6wwpamh9w08iwdkr-clr-5.7.1                              4.4G
/nix/store/pk2gwl2bqr3yqwb8kb524ixbdbkms7dp-rocblas-tensile-gfx90-5.7.1            1.8G
/nix/store/9fm8cfjwd9nhpc06bz4fdgmc4gwdi2dq-rocblas-5.7.1                          7.9G
/nix/store/gblj9yaj27f50h5fh2vyr774ygxvd3qf-rocsparse-5.7.1                        5.4G
/nix/store/8q2vbagp8j602490l2v3rwn435l71k03-rocsolver-5.7.1                       10.4G
/nix/store/fdbz6r7h9kxzypyfz9h1mq93qwsrq9jx-hipblas-5.7.1                         10.4G
/nix/store/1b6qcr7l9p0yx4mji8c6pbjcybz4fp88-llama-cpp-rocm-0.0.0                  10.6G
❯ # Baseline:
❯ nix build --refresh github:philiptaron/llama.cpp/nix#{cuda,default} --print-out-paths | xargs nix path-info --human-readable --closure-size 
/nix/store/61xjz6l35r1c6xp4v5py8iaia18bs5xy-llama-cpp-cuda-0.0.0         985.6M
/nix/store/ndx1xc79in3kmf65r4h1a28j3i72yv8q-llama-cpp-blas-0.0.0         340.1M

image

@NixOS/rocm-maintainers (and @NixOS/cuda-maintainers as potentially interested)

Technical details

❯ nix flake metadata github:philiptaron/llama.cpp/nix
Resolved URL:  github:philiptaron/llama.cpp/nix
Locked URL:    github:philiptaron/llama.cpp/4522c47a2282a595344ba8dcb85222910b2ffc4f
Description:   Port of Facebook's LLaMA model in C/C++
Path:          /nix/store/rylr1mlncn5jrziws9wm641fp9mc5dsf-source
Revision:      4522c47a2282a595344ba8dcb85222910b2ffc4f
Last modified: 2023-12-26 04:07:02
Inputs:
├───flake-parts: github:hercules-ci/flake-parts/34fed993f1674c8d06d58b37ce1e0fe5eebcb9f5 (2023-12-01 23:39:28)
│   └───nixpkgs-lib: github:NixOS/nixpkgs/e92039b55bcd58469325ded85d4f58dd5a4eaf58?dir=lib (2023-11-29 10:33:01)
└───nixpkgs: github:NixOS/nixpkgs/75dd68c36f458c6593c5bbb48abfd3e59bfed380 (2023-12-26 03:05:57)
@SomeoneSerge SomeoneSerge added 6.topic: rocm 6.topic: closure size The final size of a derivation, including its dependencies labels Dec 26, 2023
@SomeoneSerge SomeoneSerge changed the title rocmPackages: split outputs rocmPackages: split outputs? Dec 26, 2023
@SomeoneSerge
Copy link
Contributor Author

image

(I haven't looked up what those are yet)

@SomeoneSerge
Copy link
Contributor Author

SomeoneSerge commented Dec 26, 2023

Symlinks are the main reasons the runtime closure is so huge (CC #260299):

for path in ${gfx80} ${gfx90} ${gfx94} ${gfx10} ${gfx11} ${fallbacks}; do
ln -s $path/lib/rocblas/library/* build/Tensile/library
done

We should work out some propagatedBuildInputs-based solution and ensure that tools like CMake handle the splayed layouts well.

I wonder if we could start by just moving the symlinks into a separate, say, dev output and hoping that the dependency isn't retained at runtime

@Madouura
Copy link
Contributor

Pretty sure we need everything that's in the (split) rocblas closure.
I haven't really found a better solution than what we have now.

@Madouura
Copy link
Contributor

Symlinks are the main reasons the runtime closure is so huge (CC #260299):

for path in ${gfx80} ${gfx90} ${gfx94} ${gfx10} ${gfx11} ${fallbacks}; do
ln -s $path/lib/rocblas/library/* build/Tensile/library
done

We should work out some propagatedBuildInputs-based solution and ensure that tools like CMake handle the splayed layouts well.

I wonder if we could start by just moving the symlinks into a separate, say, dev output and hoping that the dependency isn't retained at runtime

Ah I see now.
Hmm, we could do each separately behind an optionalString guard.

@SomeoneSerge
Copy link
Contributor Author

Hmm, we could do each separately behind an optionalString guard.

That doesn't really sound like much of a solution... but are these actually ever accessed at runtime?

@Madouura
Copy link
Contributor

Hmm, we could do each separately behind an optionalString guard.

That doesn't really sound like much of a solution... but are these actually ever accessed at runtime?

After thinking about it in the shower, it would still be problematic, since even if we got it to only do say, "gfx90", the user would still have to recompile rocblas itself (which shouldn't be too bad, since it's only using one gpu target!), or use a non-nixos cache.

@SomeoneSerge
Copy link
Contributor Author

SomeoneSerge commented Dec 27, 2023

Add the Tensile library to your application's CMake target. The Tensile library will be written, compiled and linked to your application at application-compile-time.
https://github.com/ROCm/Tensile/wiki

This sounds more plausible (that we don't need these (e.g. in pytorch) after the build).

Notably, rocblas/cmake doesn't reference them neither by the extensions nor by the relative paths:

❯ ag hsaco result/lib/cmake/rocblas/ --follow
❯ ag library result/lib/cmake/rocblas/ --follow
<target names and variables but no paths>

I tried looking at rocm-merged from torchWithRocm in search of references for these files and didn't find (so far) any that'd look relevant either:

XX in 🌐 YYYYY in /nix/store/a7fwwnhppk6h97hb65dl3rwd4iqxs61p-rocm-merged
❯ ag --follow --search-binary --max-count=4 '\.hsaco'
Binary file rocblas/lib/librocblas.so matches.

Binary file lib/librocblas.so.3 matches.

Binary file lib/librocblas.so matches.
ERR: Too many matches in ./lib/rocblas/library/TensileManifest.txt. Skipping the rest of this file.

lib/rocblas/library/TensileManifest.txt
625:/build/source/build/Tensile/library/TensileLibrary_Type_DD_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback_gfx803.hsaco
626:/build/source/build/Tensile/library/TensileLibrary_Type_DD_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback_gfx900.hsaco
627:/build/source/build/Tensile/library/TensileLibrary_Type_DD_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback_gfx906-xnack-.hsaco
628:/build/source/build/Tensile/library/TensileLibrary_Type_DD_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback_gfx908-xnack-.hsaco

Binary file lib/librocblas.so.3.1.0 matches.
❯ strings rocblas/lib/librocblas.so | rg 'hsaco$'
.hsaco
❯ ag --max-count 4 --follow library/
include/CL/cl_platform.h
494:    /* http://msdn.microsoft.com/en-us/library/373ak2y1%28VS.71%29.aspx                                                 */
ERR: Too many matches in ./lib/rocblas/library/TensileManifest.txt. Skipping the rest of this file.

lib/rocblas/library/TensileManifest.txt
1:/build/source/build/Tensile/library/TensileLibrary_Type_HS_HPA_Contraction_l_Ailk_Bljk_Cijk_Dijk_gfx906.dat

The /build/source/build/ clearly is just cruft coming from the sandbox.

So the question stands: who's using them?

@SomeoneSerge SomeoneSerge changed the title rocmPackages: split outputs? rocmPackages: drop symlinks & split outputs Jan 13, 2024
@mschwaig
Copy link
Member

mschwaig commented Feb 15, 2024

Fedora has added tensile to their rocblas builds for the 6.0.2 release.

They write that MIOpen needs it.

https://src.fedoraproject.org/rpms/rocblas/c/918d514378861b900624c73b91fb75059c96dbf0?branch=rawhide

@mschwaig
Copy link
Member

TLDR: I would like to try to add a rocblasWithTensile package that is built on top of rocblas(without tensile), and consumers who don't need it should pick the one without.

I don't know which other consumers besides MIOpen need/benefit from tensile.

Looking at https://github.com/ROCm/rocBLAS/blob/adb8567f1bdad56b3b688a0b6dec1f79bf438ab4/CMakeLists.txt
i wonder if it is possible with a few changes to split the build into derivations like this

  1. rocblas: first build rocblas without buildTensile
  2. gfx[...device..]: put together a nice working directory to build ONLY the tensile parts per gpuTarget with only $path/lib/rocblas/library as the output
  3. rocblasWithTensile: create rocblas with Tensile by symlinking from 1 and 2 as required

Consumers who don´t need tensile could get that as rocblasWithTensile.override { buildTensile = false; }, even from the cache. That flag already exists. We could decide if we also want to expose it as a package and use that in most downstream consumers. For changes to gpuTargets, caching should also work with this setup. Provided the grouping by GPU arch is not too important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: closure size The final size of a derivation, including its dependencies 6.topic: rocm
Projects
None yet
Development

No branches or pull requests

3 participants