Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rocmPackages_6: init at 6.0.2(?) #289187

Closed
wants to merge 20 commits into from
Closed

Conversation

ScatteredRay
Copy link
Contributor

@ScatteredRay ScatteredRay commented Feb 16, 2024

Description of changes

#197885

Upgrading to ROCm 6.0.2

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@ScatteredRay
Copy link
Contributor Author

Hey, @mschwaig what are you testing out? My rocm6 branch seems to be building rocmModules_6.rocm-all and the project I'm working on seems to work find against it. Maybe you have something you'd like to test against it?

@mschwaig
Copy link
Member

I can test llama-cpp and ollamain the following days. There are also a few small things I noticed, that I could put in a review, if you're ready for that.

@ScatteredRay
Copy link
Contributor Author

Yeah, why not. It'll be one of my first PRs so will need some love I'm sure.

@mschwaig
Copy link
Member

I did some testing now, but I did not get to what I actually wanted to do yet.

By running nixpkgs-review I found that the build for libjpeg-turbo, which is upstream from mivisionx-hip is still failing on your branch. I think it was failing on master a few days ago as well.

I have not gotten to testing llama-cpp or ollama yet.

The command I ran is

nix run github:mic92/nixpks-review -- pr --checkout commit --build-args="--max-jobs 4"  289187

The --checkout commit flag is to make sure that the broken libjpeg-turbo is not just on the merge target and --build-args="--max-jobs 4" is because I think I run out of RAM without that.

I would like to have one of our branches in a state where everything builds, but it's a pain to test this since I only have 64 GB of RAM on my primary machine right now.
I actually started a run ofnixpkgs-review in the cloud with 256 GB, but I will abort this now due to the issue with libjpeg-turbo.

I think to address this on your branch it would make sense to rebase your work onto a some state of master where all the dependencies of ROCm properly build. Alternatively waiting for a version of master where that build failure is resolved and removing the -checkout commit flag would work as well, but that's bad to collaborate on because it's a moving target.

Maybe it makes sense to move this out of the draft stage or do whatever else is required so that hydra can try building it.

@mschwaig
Copy link
Member

mschwaig commented Feb 20, 2024

I managed to get a complete run of

nix run github:mic92/nixpkgs-review -- rev --build-args="--max-jobs 1" HEAD

for this branch now.

Result of nixpkgs-review run on x86_64-linux 1

7 packages marked as broken and skipped:
  • rocmPackages_6.llvm.flang
  • rocmPackages_6.llvm.flang.doc
  • rocmPackages_6.llvm.flang.info
  • rocmPackages_6.llvm.flang.man
  • rocmPackages_6.llvm.libclc
  • rocmPackages_6.rdc
  • rocmPackages_6.rdc.doc
7 packages failed to build:
  • rocmPackages_6.migraphx
  • rocmPackages_6.miopen-opencl
  • rocmPackages_6.mivisionx (rocmPackages_6.mivisionx-hip)
  • rocmPackages_6.mivisionx-cpu
  • rocmPackages_6.mivisionx-opencl
  • rocmPackages_6.rocmlir
  • rocmPackages_6.rocmlir.external
87 packages built:
  • rocmPackages_6.clang-ocl
  • rocmPackages_6.clr
  • rocmPackages_6.clr.icd
  • rocmPackages_6.composable_kernel
  • rocmPackages_6.half
  • rocmPackages_6.hip-common
  • rocmPackages_6.hipblas
  • rocmPackages_6.hipcc
  • rocmPackages_6.hipcub
  • rocmPackages_6.hipfft
  • rocmPackages_6.hipfort
  • rocmPackages_6.hipify
  • rocmPackages_6.hiprand
  • rocmPackages_6.hipsolver
  • rocmPackages_6.hipsparse
  • rocmPackages_6.hsa-amd-aqlprofile-bin
  • rocmPackages_6.llvm.bintools
  • rocmPackages_6.llvm.clang
  • rocmPackages_6.llvm.clang-tools-extra
  • rocmPackages_6.llvm.clang-tools-extra.doc
  • rocmPackages_6.llvm.clang-tools-extra.info
  • rocmPackages_6.llvm.clang-tools-extra.man
  • rocmPackages_6.llvm.clang-unwrapped
  • rocmPackages_6.llvm.clang-unwrapped.doc
  • rocmPackages_6.llvm.clang-unwrapped.info
  • rocmPackages_6.llvm.clang-unwrapped.man
  • rocmPackages_6.llvm.compiler-rt
  • rocmPackages_6.llvm.libc
  • rocmPackages_6.llvm.libc.doc
  • rocmPackages_6.llvm.libcxx
  • rocmPackages_6.llvm.libcxx.doc
  • rocmPackages_6.llvm.libcxxabi
  • rocmPackages_6.llvm.libunwind
  • rocmPackages_6.llvm.libunwind.doc
  • rocmPackages_6.llvm.lld
  • rocmPackages_6.llvm.lld.doc
  • rocmPackages_6.llvm.lldb
  • rocmPackages_6.llvm.lldb.doc
  • rocmPackages_6.llvm.lldb.info
  • rocmPackages_6.llvm.lldb.man
  • rocmPackages_6.llvm.llvm
  • rocmPackages_6.llvm.llvm.doc
  • rocmPackages_6.llvm.llvm.info
  • rocmPackages_6.llvm.llvm.man
  • rocmPackages_6.llvm.mlir
  • rocmPackages_6.llvm.openmp
  • rocmPackages_6.llvm.openmp.doc
  • rocmPackages_6.llvm.openmp.info
  • rocmPackages_6.llvm.openmp.man
  • rocmPackages_6.llvm.polly
  • rocmPackages_6.llvm.polly.doc
  • rocmPackages_6.llvm.polly.info
  • rocmPackages_6.llvm.polly.man
  • rocmPackages_6.llvm.pstl
  • rocmPackages_6.llvm.rocmClangStdenv
  • rocmPackages_6.miopen (rocmPackages_6.miopen-hip)
  • rocmPackages_6.rccl
  • rocmPackages_6.rocalution
  • rocmPackages_6.rocblas
  • rocmPackages_6.rocdbgapi
  • rocmPackages_6.rocdbgapi.doc
  • rocmPackages_6.rocfft
  • rocmPackages_6.rocgdb
  • rocmPackages_6.rocm-cmake
  • rocmPackages_6.rocm-comgr
  • rocmPackages_6.rocm-core
  • rocmPackages_6.rocm-device-libs
  • rocmPackages_6.rocm-docs-core
  • rocmPackages_6.rocm-docs-core.dist
  • rocmPackages_6.rocm-runtime
  • rocmPackages_6.rocm-smi
  • rocmPackages_6.rocm-thunk
  • rocmPackages_6.rocminfo
  • rocmPackages_6.rocmlir-rock
  • rocmPackages_6.rocprim
  • rocmPackages_6.rocprofiler
  • rocmPackages_6.rocr-debug-agent
  • rocmPackages_6.rocsolver
  • rocmPackages_6.rocsparse
  • rocmPackages_6.rocthrust
  • rocmPackages_6.roctracer
  • rocmPackages_6.rocwmma
  • rocmPackages_6.rpp (rocmPackages_6.rpp-hip)
  • rocmPackages_6.rpp-cpu
  • rocmPackages_6.rpp-opencl
  • rocmPackages_6.tensile
  • rocmPackages_6.tensile.dist

EDIT: as far as I can tell the build for those failing packages did not even run because of libjpeg-turbo being broken.

@ScatteredRay
Copy link
Contributor Author

ScatteredRay commented Feb 20, 2024

I think to address this on your branch it would make sense to rebase your work onto a some state of master where all the dependencies of ROCm properly build.

Happy, to do a rebase. I can possibly find the last version that libjpeg-turbo built on, but not sure if there is any reasonable way to find a version that builds all dependents?

I would like to have one of our branches in a state where everything builds, but it's a pain to test this since I only have 64 GB of RAM on my primary machine right now.

By everything, do you mean everything everything? Or all dependents? I have ~190GBs locally, so could possibly leave something running overnight.

Maybe it makes sense to move this out of the draft stage or do whatever else is required so that hydra can try building it

Sounds like I should move this out of draft after I find a rebase target? Sounds good to me.

@mschwaig
Copy link
Member

I think to address this on your branch it would make sense to rebase your work onto a some state of master where all the dependencies of ROCm properly build.

Happy, to do a rebase. I can possibly find the last version that libjpeg-turbo built on, but not sure if there is any reasonable way to find a version that builds all dependents?

You can just use the commit that I started my branch on for the rebase. That's f459aeeb8d1fa5930e183d0ff40ebfdaa5ffacc6.
I don't know a good way to 'compute' that, but usually a lot of things break with big merges and then recover over time.
So you can look at how much of nixpkgs actually builds on hydra over time and pick one of the peaks, and there will be a good chance that all of the upstream dependencies are going to build. In this case I was just luckier than you.

I would like to have one of our branches in a state where everything builds, but it's a pain to test this since I only have 64 GB of RAM on my primary machine right now.

By everything, do you mean everything everything? Or all dependents? I have ~190GBs locally, so could possibly leave something running overnight.

nixpkgs-review builds packages you touched and every other package that they have a downstream effect on.
The downstream stuff for ROCm is basically nothing, unless you look into setting config.rocmSupport = true; for nixpkgs-review. I have not looked into that.

Maybe it makes sense to move this out of the draft stage or do whatever else is required so that hydra can try building it

Sounds like I should move this out of draft after I find a rebase target? Sounds good to me.

Yes it you think that everything that's supposed to work is working you can move it out of draft, which signals to people that it's ready to take a look. I don't actually know if this will trigger some infra to actually build it, or if we have to @ a bot for that. I have never worked on a PR that requires so much compute to build before, so I never had to rely on having build results visible online.

@mschwaig
Copy link
Member

mschwaig commented Feb 29, 2024

Since so far you did not rebase this branch to get rid of that issue with libjpeg-turbo I had a look at the diff between our branches again, and added a few commits with your changes to mine. I hope that's OK for you.

The only changes that I did not take so far where these (diff from my perspective).

diff --git a/pkgs/development/rocm-modules/6/rocm-runtime/default.nix b/pkgs/development/rocm-modules/6/rocm-runtime/default.nix
index 8c3d0cdc976d..149c9b23c933 100644
--- a/pkgs/development/rocm-modules/6/rocm-runtime/default.nix
+++ b/pkgs/development/rocm-modules/6/rocm-runtime/default.nix
@@ -16,7 +16,7 @@
 
 stdenv.mkDerivation (finalAttrs: {
   pname = "rocm-runtime";
-  version = "6.0.2";
+  version = "6.0.0";
 
   src = fetchFromGitHub {
     owner = "ROCm";
@@ -45,6 +45,7 @@ stdenv.mkDerivation (finalAttrs: {
   postPatch = ''
     patchShebangs image/blit_src/create_hsaco_ascii_file.sh
     patchShebangs core/runtime/trap_handler/create_trap_handler_header.sh
+    patchShebangs image/blit_src/create_hsaco_ascii_file.sh
     patchShebangs core/runtime/blit_shaders/create_blit_shader_header.sh
 
     substituteInPlace CMakeLists.txt \
diff --git a/pkgs/development/rocm-modules/6/rocm-thunk/default.nix b/pkgs/development/rocm-modules/6/rocm-thunk/default.nix
index 99a1d3c542d1..9e88f1c7d9f1 100644
--- a/pkgs/development/rocm-modules/6/rocm-thunk/default.nix
+++ b/pkgs/development/rocm-modules/6/rocm-thunk/default.nix
@@ -10,7 +10,7 @@
 
 stdenv.mkDerivation (finalAttrs: {
   pname = "rocm-thunk";
-  version = "6.0.2";
+  version = "6.0.0";
 
   src = fetchFromGitHub {
     owner = "ROCm";

I was under the impression that there would be more differences between our branches. Last time I looked I thought there were a bunch of different hashes. I did not see a lot of that now, so there's not really a lot that I would ask in a review in terms of code, besides asking who's version numbers in the above diff are right and why you did 23a4e3b and de5d15d, since those packages built for me without the changes.

@mschwaig
Copy link
Member

I rented a VM again to build my PR overnight with nixpkgs-review (using --checkout commit so changes in master are not taken into account yet).

If everything builds like that I will look at building against master, or move my PR out of draft (if that's OK with you).
If there are still build failures I will ping this thread as well, to see how we can take it over the finish line.

@mschwaig
Copy link
Member

mschwaig commented Mar 4, 2024

After the nightly build for #287846 there were still a few broken packages.

I fixed some of those build failures, but migraphxis still broken and there are still a few other things that I still need to do listed in the top comment.

@ScatteredRay
Copy link
Contributor Author

Thanks for working on that. I've been away for a bit. Going to check it out a bit this weekend.

@ScatteredRay
Copy link
Contributor Author

Took a look through hydra, it looks like 12f69cd might have been the last successful build of rocmPackages_5 ?

ScatteredRay and others added 10 commits March 8, 2024 12:09
Ran updateScript and a few minor hash fixes, and version renames.
New files need to have patched shebangs.

Output structure of rocm-runtime changed, removing fixup script.
…get upgraded to a 7+ since that should go in rocmModules_7
Upgraded patches, and added missing packages.
Removed gcc12Stdenv usage for some of the packages, moved what I could to rocmClangStdenv and the rest to default stdenv. This resolved a few errors with linking the test cases.
There have not been updates avaialble for this package for some time.
I'm assuming it makes sense to drop it.
The build still fails, but it fails later.
mschwaig and others added 7 commits March 8, 2024 12:09
@mschwaig
Copy link
Member

mschwaig commented Mar 17, 2024

I have not looked into the specific commit that you mentioned, but rocmPackages_5.rocprofiler has been failing on up to date master for me.

I have found the files that were missing from the migraphx build. I hope that will run through now.
I have opened my PR for reviews now.

Since my PR has some of your changes I wanted to ask if it would be OK for you if I merge some of your changes that way?

@ScatteredRay
Copy link
Contributor Author

Since my PR has some of your changes I wanted to ask if it would be OK for you if I merge some of your changes that way?

Of course merge away

@ulrikstrid ulrikstrid mentioned this pull request Mar 19, 2024
13 tasks
@ulrikstrid
Copy link
Member

Hey, I'm currently trying to use this with Ollama, rocblas seems to just get stuck forever on 4 warnings generated when compiling for gfx942

I'm also running a nixpkgs-review on this PR and at least rocmlir-rockand rocmlir are broken

@mschwaig mschwaig closed this Mar 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants