forked from tensorflow/tensorflow
Develop upstream sync 250825 #3087
Open
mmakevic-amd wants to merge 1,631 commits into develop-upstream from develop-upstream-sync-250825
+543,634
−61,921
Conversation
ScXfjiang approved these changes on Sep 1, 2025
i-chaochen requested changes on Sep 1, 2025
I noticed @ScXfjiang approved it, so I'm requesting changes: we need to list all skipped UTs in this weekly sync.
I have put the skipped tests in the Kanban ticket.
PiperOrigin-RevId: 795785519
PiperOrigin-RevId: 795787496
This pass rewrites `scaled-dot` HLO instructions into a sequence of `convert`, `broadcast`, `multiply`, and `dot` operations. The `scaled-dot` operands and scales are first converted to BF16. The scales are then broadcast and reshaped to match the operand shapes. Finally, element-wise multiplications are performed between operands and their respective scales, and the results are used as inputs to a standard `dot` instruction. We run it unconditionally because we don't have any support for the scaled-dot on the codegen side. PiperOrigin-RevId: 795800493
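For illustration, here is a minimal numeric sketch of the decomposition described above, assuming plain float matrices with one scale per row of an operand (the actual pass operates on BF16 HLO values and broadcasts/reshapes the scale tensors to the full operand shapes); all names below are illustrative and not the pass's code.

```cpp
#include <cstddef>
#include <vector>

// Broadcast + multiply step: scale each row of an (m x k) operand by its
// per-row scale, mirroring how the pass multiplies operands by their scales.
std::vector<float> ApplyRowScales(const std::vector<float>& operand,
                                  const std::vector<float>& row_scales,
                                  std::size_t m, std::size_t k) {
  std::vector<float> scaled(operand.size());
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < k; ++j) {
      scaled[i * k + j] = operand[i * k + j] * row_scales[i];
    }
  }
  return scaled;
}

// Standard dot on the already-scaled operands: (m x k) * (k x n) -> (m x n).
// This corresponds to the final `dot` instruction emitted by the rewrite.
std::vector<float> Dot(const std::vector<float>& lhs,
                       const std::vector<float>& rhs,
                       std::size_t m, std::size_t k, std::size_t n) {
  std::vector<float> out(m * n, 0.0f);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t p = 0; p < k; ++p) {
      for (std::size_t j = 0; j < n; ++j) {
        out[i * n + j] += lhs[i * k + p] * rhs[p * n + j];
      }
    }
  }
  return out;
}
```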
PiperOrigin-RevId: 795839055
PiperOrigin-RevId: 795856504
…flow::FromAbslStatus`, `tensorflow::ToAbslStatus` PiperOrigin-RevId: 795900634
PiperOrigin-RevId: 795958757
PiperOrigin-RevId: 796037587
PiperOrigin-RevId: 796037667
… source also feeds into the same instruction. PiperOrigin-RevId: 796119376
PiperOrigin-RevId: 796148874
PiperOrigin-RevId: 796246237
PiperOrigin-RevId: 796246377
PiperOrigin-RevId: 796247106
PiperOrigin-RevId: 796254391
PiperOrigin-RevId: 796289153
…oProto There is a constructor `DeviceDescription(const GpuDeviceInfoProto&)`, but I would like to introduce some validation when constructing a `DeviceDescription` from a `GpuDeviceInfoProto`. Therefore I'm replacing the constructor with a factory function that can return an `absl::StatusOr<DeviceDescription>` in case of a validation error. This change does not yet introduce the validation; it only migrates all the users of the now-removed constructor. The exact changes in this CL:
1. Remove `DeviceDescription::DeviceDescription(const GpuDeviceInfoProto&)`
2. Add `static absl::StatusOr<DeviceDescription> DeviceDescription::FromProto(const GpuDeviceInfoProto&)`
3. Remove `Compiler::TargetConfig::TargetConfig(const GpuDeviceInfoProto&)`
4. Add `static absl::StatusOr<Compiler::TargetConfig> Compiler::TargetConfig::FromProto(const GpuDeviceInfoProto&)`
5. Fix up all call sites
PiperOrigin-RevId: 796298477
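As a minimal sketch of the constructor-to-factory pattern described above, using a simplified stand-in type rather than the real GpuDeviceInfoProto/DeviceDescription (the validation shown is purely illustrative):

```cpp
#include <string>
#include <utility>

#include "absl/status/status.h"
#include "absl/status/statusor.h"

// Stand-in for GpuDeviceInfoProto.
struct DeviceInfoProtoSketch {
  std::string name;
  int core_count = 0;
};

class DeviceDescriptionSketch {
 public:
  // Factory function instead of a constructor: it can reject an invalid proto
  // by returning a status rather than constructing a bogus object.
  static absl::StatusOr<DeviceDescriptionSketch> FromProto(
      const DeviceInfoProtoSketch& proto) {
    if (proto.core_count <= 0) {
      return absl::InvalidArgumentError("core_count must be positive");
    }
    return DeviceDescriptionSketch(proto.name, proto.core_count);
  }

 private:
  DeviceDescriptionSketch(std::string name, int core_count)
      : name_(std::move(name)), core_count_(core_count) {}

  std::string name_;
  int core_count_ = 0;
};
```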
Fixed grammar in error message for invalid regularization penalty numbers. Changed "is not a property value" to "is not a valid property value" for better clarity and correctness.
PiperOrigin-RevId: 796310648
Those jobs have been using L4 GPUs instead of T4 GPUs for a while now, but the name was still the old one. PiperOrigin-RevId: 796316549
PiperOrigin-RevId: 796325270
PiperOrigin-RevId: 796325322
…ording Imported from GitHub PR openxla/xla#30239
Non-MNNVL clusters don't need some NVML libraries like `nvmlDeviceGetGpuFabricInfoV`, and the current message wording includes `Error`, causing some confusion when debugging. This PR removes the error keyword from the message and makes it a warning.
Copybara import of the project:
-- 5a950b82b3dc1d78e2ee75e87fae6740be802ea8 by Terry Sun <[email protected]>: tweak message wording
-- 46982450277fae84da18c3880138cc009f1fc32e by Terry Sun <[email protected]>: drop test
Merging this change closes tensorflow#30239
PiperOrigin-RevId: 796336762
… oneAPI Imported from GitHub PR openxla/xla#30072
This PR addresses a linking failure caused by an overflow of command-line flags, resulting in an exit code 127 error during the linking stage. To resolve this, we introduced the following changes:
**Improved Handling of Whole-Archive Object Files** Object files with .o or .lo extensions are now linked using the --whole-archive and --no-whole-archive flags. This forces the linker to include all symbols from these files, ensuring none are removed during linking. This change helps reduce the total number of linker flags while preserving necessary symbols, which in turn prevents command-line overflow issues.
For better debugging, we introduced support for the VERBOSE=1 environment variable. When set, it prints the full command line used to invoke the compiler, which helps with diagnosing cross-compilation issues and verifying correct toolchain usage.
Copybara import of the project:
-- eb2414f53d263f1aa802ca0e1bfb87c222d6a2fe by mraunak <[email protected]>: Fix the linking error
-- 50809c553c37567782e09e5de98140d1fdc9a82b by mraunak <[email protected]>: remove duplicates
Merging this change closes tensorflow#30072
PiperOrigin-RevId: 796336822
…erate on `!tt.ptr` types. This change modifies `triton_xla.extract` and `triton_xla.insert` to take `!tt.ptr` types for the base memory operand instead of `tensor` types. An explicit `shape` attribute is added to both ops to represent the shape of the original tensor in memory. `triton_xla.insert` no longer produces a result. The emitter and transformation passes are updated to reflect these changes, including handling scalar loads/stores separately. PiperOrigin-RevId: 796348485
PiperOrigin-RevId: 798342110
PiperOrigin-RevId: 798344886
…s always blocking since the behavior is that even if the key does not exist, it will be initialized (consistent with other kv-store implementations). PiperOrigin-RevId: 798349776
…needed after compilation is done PiperOrigin-RevId: 798356871
PiperOrigin-RevId: 798357657
PiperOrigin-RevId: 798358577
PiperOrigin-RevId: 798370221
…ventPool::Handle> and treat the definition event specially (to avoid some locking). Crucially this allows in the future making event_ an indirect async value in order to implement the PjRtDeviceEventPromise API. Reverts fd32c1a PiperOrigin-RevId: 798372919
…oadcast_to. This enables the prepare-quantize pass to move Quantize/Dequantize operations (QDQs) through broadcast ops. This is valid for per-tensor quantization as broadcasting doesn't change the range of tensor values. A test case is added to verify the propagation. PiperOrigin-RevId: 798375402
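A small numeric sketch of why this propagation is sound for per-tensor quantization: broadcasting only repeats existing values, so the per-tensor scale and zero point stay valid and quantizing after the broadcast matches broadcasting the already quantized values. The parameters below are illustrative, not taken from the pass.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-tensor affine quantization: (scale, zero_point) depend only on the
// value range of the whole tensor.
int8_t Quantize(float v, float scale, int zero_point) {
  int q = static_cast<int>(std::lround(v / scale)) + zero_point;
  return static_cast<int8_t>(std::clamp(q, -128, 127));
}

// Broadcasting a row `repeat` times cannot introduce values outside the
// original range, so the same (scale, zero_point) remain correct afterwards;
// quantize(broadcast(x)) therefore equals broadcast(quantize(x)) elementwise.
std::vector<int8_t> QuantizeBroadcastedRow(const std::vector<float>& row,
                                           int repeat, float scale,
                                           int zero_point) {
  std::vector<int8_t> out;
  out.reserve(row.size() * static_cast<std::size_t>(repeat));
  for (int r = 0; r < repeat; ++r) {
    for (float v : row) {
      out.push_back(Quantize(v, scale, zero_point));
    }
  }
  return out;
}
```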
PiperOrigin-RevId: 798410650
…ime faster. Move CallContext into hlo_opcode since it's based on opcode and to avoid a circular dependency. Also, skip_async_execution_thread_overwrite is no longer needed and has been removed. PiperOrigin-RevId: 798413636
PiperOrigin-RevId: 798507258
PiperOrigin-RevId: 798525391
…ory pressure PiperOrigin-RevId: 798631247
…nalValue`. This change decouples `OriginalValue` from a specific `Shape`, allowing it to represent original values without being constrained by a fixed shape structure. The `OriginalValueProto` is updated to remove the shape field, and related code is adjusted accordingly. Also, this CL removes OriginalValuePointer, adds some generic utils for using shared_ptr as map keys, and uses those instead. PiperOrigin-RevId: 798633988
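A generic sketch of the shared_ptr-as-map-key idea mentioned above, hashing and comparing the pointee rather than the pointer; these names are illustrative and are not the actual utilities added in the commit.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hash a shared_ptr by the value it points to, so two distinct pointers to
// equal values land in the same bucket.
struct PointeeHash {
  template <typename T>
  std::size_t operator()(const std::shared_ptr<T>& p) const {
    return p ? std::hash<T>()(*p) : 0;
  }
};

// Compare shared_ptr keys by pointee equality (null compares equal to null).
struct PointeeEq {
  template <typename T>
  bool operator()(const std::shared_ptr<T>& a,
                  const std::shared_ptr<T>& b) const {
    if (a == b) return true;
    if (!a || !b) return false;
    return *a == *b;
  }
};

// Example: a map keyed by shared_ptr<std::string>, deduplicated by value.
using SharedKeyMap =
    std::unordered_map<std::shared_ptr<std::string>, int, PointeeHash,
                       PointeeEq>;
```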
PiperOrigin-RevId: 798756132
PiperOrigin-RevId: 798756149
…n CopyCleanUpPass. If a memory effect is attached to an op rather than an OpOperand, `effect.getValue()` can be nullptr. PiperOrigin-RevId: 798861213
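A hedged sketch of the kind of guard this fix implies, assuming MLIR's MemoryEffectOpInterface / EffectInstance API; the helper name is illustrative and this is not the actual CopyCleanUpPass code.

```cpp
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/Operation.h"
#include "mlir/Interfaces/SideEffectInterfaces.h"

// Collect the values touched by an op's memory effects, skipping effects that
// are attached to the op itself (for those, getValue() returns a null Value).
void CollectEffectValues(mlir::Operation* op,
                         llvm::SmallVectorImpl<mlir::Value>& values) {
  auto iface = llvm::dyn_cast<mlir::MemoryEffectOpInterface>(op);
  if (!iface) return;
  llvm::SmallVector<mlir::MemoryEffects::EffectInstance> effects;
  iface.getEffects(effects);
  for (const auto& effect : effects) {
    if (mlir::Value value = effect.getValue()) {
      values.push_back(value);
    }
  }
}
```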
.. as long as the order of contracting and free dimensions still matches Triton's expectations and we don't need to add a transpose. PiperOrigin-RevId: 798984441
…est. This change updates the default compute capability used in `compilation_provider_test.cc` from sm_52 to sm_80. Additionally, the test for spilling detection now uses a more elaborate kernel. Both changes were necessary to support CUDA 13, which requires at least compute capability 8.0 and only allows setting the maximum register count to 24 (previously 16 was allowed). This required more elaborate data-shuffling logic to push register usage past 24 registers and trigger spilling. PiperOrigin-RevId: 798991435
33fbd29 to 00c3dd0
Skipped Tests
TensorFlow: none
XLA:
527f636: @local_xla//xla/xla/backends/gpu/runtime:command_buffer_conversion_pass_test
dc1376b: @local_xla//xla/xla/backends/gpu/codegen/triton:dot_algorithms_legacy_test
f527f4e: @local_xla//xla/xla/service/gpu:determinism_test
33fbd29: @local_xla//xla/xla/service/gpu/tests:hlo_lit_tests