Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[RFC + PR] Use TCP for {LLVM / Torch-MLIR / StableHLO} Green Commit S…
…ync (#11) ## Why When bumping LLVM up, it is crucial to be able to test all downstream repos depending on it to ensure they work **in tandem** (and not just in isolation). In the past, LLVM upgrades were simpler because torch-mlir took a hard dependency on mhlo/stablehlo and, in doing so, ensured that the llvm "green commit" (sha1) that torch-mlir and stablehlo were built+tested against was pre-identified. During this time mlir-tcp was developed on a branch of torch-mlir. This meant when upgrades were needed downstream, we’d simply point to torch-mlir@HEAD (sha4) and pick the llvm-project (sha1) and mhlo/stablehlo (sha3) hashes it’d refer to, since these are already tested to work together. This became our set of green commits (llvm@sha1, stablehlo@sha3, torch-mlir@sha4) for downstream integrations (e.g cruise monorepo). <img width="500" alt="image" src="https://github.com/cruise-automation/mlir-tcp/assets/19234106/42078522-466c-449f-8d7e-496facc1447c"> At present the situation is complicated because torch-mlir no longer takes a hard dependency on stablehlo (stablehlo e2e tests [disabled](llvm/torch-mlir#2460)). Here's details from a recent upgrade scenario that motivated this RFC. We picked torch-mlir@HEAD which was right after the llvm bump in llvm/torch-mlir#2511 pointing to llvm/llvm-project@b44b349, but soon realized (when we started building torch-mlir) that the llvm bazel build upstream was broken: ``` ERROR: /root/.cache/bazel/_bazel_root/b89349c08f7224396763d14fe35cba11/external/llvm-project/mlir/BUILD.bazel:5837:18: TdGenerate external/llvm-project/mlir/include/mlir/Dialect/LLVMIR/NVVMOpsInterface.h.inc failed: (Exit 1): mlir-tblgen failed: error executing command ... external/llvm-project/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td:20:9: error: Could not find include file 'mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td' include "mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td" ^ ``` The bazel fixes followed in a subsequent commit at llvm/llvm-project@28b27c1. Hence llvm had to be re-bumped in torch-mlir (llvm/torch-mlir#2517). However, after a bit more work we hit these failing stablehlo tests, which surfaced the fact that stablehlo pointed to by torch-mlir could no longer be used, and we had to separately identify the sha3 of stablehlo that would build cleanly against sha1 of llvm. ``` @stablehlo//stablehlo/conversions/tosa/tests:binary.mlir.test FAILED in 0.7s @stablehlo//stablehlo/tests:print_stablehlo.mlir.test FAILED in 4.7s ``` This meant the burden of identifying the llvm green commit (that works across the board) is shifted further downstream from torch-mlir. Incidentally we are in a great position to leverage mlir-tcp to identify the set of green commits, given it already directly depends on each of these repos. <img width="500" alt="image" src="https://github.com/cruise-automation/mlir-tcp/assets/19234106/cadd38c4-71ec-45b0-8888-85ac0bfd4e99"> ## What This PR is an attempt to leverage the mlir-tcp repo as our "proxy" for such downstream integrations, and _I think_ contains everything needed to be able to do that. ## How Specifically, we should now be able to run these from the comfort of `mlir-tcp`: ```shell bazel test --config=clang_linux @llvm-project//mlir/... bazel test --config=clang_linux @stablehlo//... bazel test --config=clang_linux @torch-mlir//... ``` We provide `local_repos.bzl` that allows easier local testing of patches that later need to be upstreamed, and while they're being upstreamed we could land them as patches to our `http_archive` targets. Note: I include a `stablehlo.patch` that allows testing stablehlo from `mlir-tcp`. This is temporary and can be removed once openxla/stablehlo#1810 lands. This PR also enables each of the 3p test suites as GHA workflows (non-merge gating for now, we can change this). These workflows are automatically skipped unless a change is made to `deps.bzl` (which usually means bumping 3p deps), as it would be unnecessary to run them for every PR and `main` commit post-merge. Here's a snapshot from this PR's workflows, having bumped stablehlo commit. <img width="747" alt="image" src="https://github.com/cruise-automation/mlir-tcp/assets/19234106/e535ed39-33f7-4941-958c-3a5d0c0adef6">
- Loading branch information