forked from tensorflow/tensorflow
Develop upstream sync 250825 #3087
Open
mmakevic-amd wants to merge 1,631 commits into develop-upstream from develop-upstream-sync-250825
+543,634
−61,921
Conversation
ScXfjiang approved these changes on Sep 1, 2025
i-chaochen requested changes on Sep 1, 2025
I noticed @ScXfjiang approved it, so I'm requesting changes: we need to list all skipped UTs in this weekly sync.
I have put the skipped tests in the Kanban ticket.
PiperOrigin-RevId: 795785519
PiperOrigin-RevId: 795787496
This pass rewrites `scaled-dot` HLO instructions into a sequence of `convert`, `broadcast`, `multiply`, and `dot` operations. The `scaled-dot` operands and scales are first converted to BF16. The scales are then broadcast and reshaped to match the operand shapes. Finally, element-wise multiplications are performed between operands and their respective scales, and the results are used as inputs to a standard `dot` instruction. We run it unconditionally because we don't have any support for the scaled-dot on the codegen side. PiperOrigin-RevId: 795800493
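For illustration, here is a minimal numeric sketch of the decomposition described above, assuming plain float matrices with one scale per row of an operand (the actual pass operates on BF16 HLO values and broadcasts/reshapes the scale tensors to the full operand shapes); all names below are illustrative and not the pass's code.

```cpp
#include <cstddef>
#include <vector>

// Broadcast + multiply step: scale each row of an (m x k) operand by its
// per-row scale, mirroring how the pass multiplies operands by their scales.
std::vector<float> ApplyRowScales(const std::vector<float>& operand,
                                  const std::vector<float>& row_scales,
                                  std::size_t m, std::size_t k) {
  std::vector<float> scaled(operand.size());
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < k; ++j) {
      scaled[i * k + j] = operand[i * k + j] * row_scales[i];
    }
  }
  return scaled;
}

// Standard dot on the already-scaled operands: (m x k) * (k x n) -> (m x n).
// This corresponds to the final `dot` instruction emitted by the rewrite.
std::vector<float> Dot(const std::vector<float>& lhs,
                       const std::vector<float>& rhs,
                       std::size_t m, std::size_t k, std::size_t n) {
  std::vector<float> out(m * n, 0.0f);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t p = 0; p < k; ++p) {
      for (std::size_t j = 0; j < n; ++j) {
        out[i * n + j] += lhs[i * k + p] * rhs[p * n + j];
      }
    }
  }
  return out;
}
```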
PiperOrigin-RevId: 795839055
PiperOrigin-RevId: 795856504
…flow::FromAbslStatus`, `tensorflow::ToAbslStatus` PiperOrigin-RevId: 795900634
PiperOrigin-RevId: 795958757
PiperOrigin-RevId: 796037587
PiperOrigin-RevId: 796037667
… source also feeds into the same instruction. PiperOrigin-RevId: 796119376
PiperOrigin-RevId: 796148874
PiperOrigin-RevId: 796246237
PiperOrigin-RevId: 796246377
PiperOrigin-RevId: 796247106
PiperOrigin-RevId: 796254391
PiperOrigin-RevId: 796289153
…oProto There is a constructor `DeviceDescription(const GpuDeviceInfoProto&)`, but I would like to introduce some validation when constructing a `DeviceDescription` from a `GpuDeviceInfoProto`. Therefore I'm replacing the constructor with a factory function that can return an `absl::StatusOr<DeviceDescription>` in case of a validation error. This change does not yet introduce the validation; it only migrates all the users of the now-removed constructor. The exact changes in this CL:
1. Remove `DeviceDescription::DeviceDescription(const GpuDeviceInfoProto&)`
2. Add `static absl::StatusOr<DeviceDescription> DeviceDescription::FromProto(const GpuDeviceInfoProto&)`
3. Remove `Compiler::TargetConfig::TargetConfig(const GpuDeviceInfoProto&)`
4. Add `static absl::StatusOr<Compiler::TargetConfig> Compiler::TargetConfig::FromProto(const GpuDeviceInfoProto&)`
5. Fix up all call sites
PiperOrigin-RevId: 796298477
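As a minimal sketch of the constructor-to-factory pattern described above, using a simplified stand-in type rather than the real GpuDeviceInfoProto/DeviceDescription (the validation shown is purely illustrative):

```cpp
#include <string>
#include <utility>

#include "absl/status/status.h"
#include "absl/status/statusor.h"

// Stand-in for GpuDeviceInfoProto.
struct DeviceInfoProtoSketch {
  std::string name;
  int core_count = 0;
};

class DeviceDescriptionSketch {
 public:
  // Factory function instead of a constructor: it can reject an invalid proto
  // by returning a status rather than constructing a bogus object.
  static absl::StatusOr<DeviceDescriptionSketch> FromProto(
      const DeviceInfoProtoSketch& proto) {
    if (proto.core_count <= 0) {
      return absl::InvalidArgumentError("core_count must be positive");
    }
    return DeviceDescriptionSketch(proto.name, proto.core_count);
  }

 private:
  DeviceDescriptionSketch(std::string name, int core_count)
      : name_(std::move(name)), core_count_(core_count) {}

  std::string name_;
  int core_count_ = 0;
};
```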
Fixed grammar in error message for invalid regularization penalty numbers. Changed "is not a property value" to "is not a valid property value" for better clarity and correctness.
PiperOrigin-RevId: 796310648
Those jobs have been using L4 GPUs instead of T4 GPUs for a while now, but the name was still the old one. PiperOrigin-RevId: 796316549
PiperOrigin-RevId: 796325270
PiperOrigin-RevId: 796325322
…ording Imported from GitHub PR openxla/xla#30239
Non-MNNVL clusters don't need some NVML libraries like `nvmlDeviceGetGpuFabricInfoV`, and the current message wording includes `Error`, causing some confusion when debugging. This PR removes the error keyword from the message and makes it a warning.
Copybara import of the project:
-- 5a950b82b3dc1d78e2ee75e87fae6740be802ea8 by Terry Sun <[email protected]>: tweak message wording
-- 46982450277fae84da18c3880138cc009f1fc32e by Terry Sun <[email protected]>: drop test
Merging this change closes tensorflow#30239
PiperOrigin-RevId: 796336762
… oneAPI Imported from GitHub PR openxla/xla#30072
This PR addresses a linking failure caused by an overflow of command-line flags, resulting in an exit code 127 error during the linking stage. To resolve this, we introduced the following changes:
**Improved Handling of Whole-Archive Object Files** Object files with .o or .lo extensions are now linked using the --whole-archive and --no-whole-archive flags. This forces the linker to include all symbols from these files, ensuring none are removed during linking. This change helps reduce the total number of linker flags while preserving necessary symbols, which in turn prevents command-line overflow issues.
For better debugging, we introduced support for the VERBOSE=1 environment variable. When set, it prints the full command line used to invoke the compiler, which helps with diagnosing cross-compilation issues and verifying correct toolchain usage.
Copybara import of the project:
-- eb2414f53d263f1aa802ca0e1bfb87c222d6a2fe by mraunak <[email protected]>: Fix the linking error
-- 50809c553c37567782e09e5de98140d1fdc9a82b by mraunak <[email protected]>: remove duplicates
Merging this change closes tensorflow#30072
PiperOrigin-RevId: 796336822
…erate on `!tt.ptr` types. This change modifies `triton_xla.extract` and `triton_xla.insert` to take `!tt.ptr` types for the base memory operand instead of `tensor` types. An explicit `shape` attribute is added to both ops to represent the shape of the original tensor in memory. `triton_xla.insert` no longer produces a result. The emitter and transformation passes are updated to reflect these changes, including handling scalar loads/stores separately. PiperOrigin-RevId: 796348485
PiperOrigin-RevId: 798342110
PiperOrigin-RevId: 798344886
…s always blocking since the behavior is that even if the key does not exist, it will be initialized (consistent with other kv-store implementations). PiperOrigin-RevId: 798349776
…needed after compilation is done PiperOrigin-RevId: 798356871
PiperOrigin-RevId: 798357657
PiperOrigin-RevId: 798358577
PiperOrigin-RevId: 798370221
…ventPool::Handle> and treat the definition event specially (to avoid some locking). Crucially this allows in the future making event_ an indirect async value in order to implement the PjRtDeviceEventPromise API. Reverts fd32c1a PiperOrigin-RevId: 798372919
…oadcast_to. This enables the prepare-quantize pass to move Quantize/Dequantize operations (QDQs) through broadcast ops. This is valid for per-tensor quantization as broadcasting doesn't change the range of tensor values. A test case is added to verify the propagation. PiperOrigin-RevId: 798375402
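A small numeric sketch of why this propagation is sound for per-tensor quantization: broadcasting only repeats existing values, so the per-tensor scale and zero point stay valid and quantizing after the broadcast matches broadcasting the already quantized values. The parameters below are illustrative, not taken from the pass.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-tensor affine quantization: (scale, zero_point) depend only on the
// value range of the whole tensor.
int8_t Quantize(float v, float scale, int zero_point) {
  int q = static_cast<int>(std::lround(v / scale)) + zero_point;
  return static_cast<int8_t>(std::clamp(q, -128, 127));
}

// Broadcasting a row `repeat` times cannot introduce values outside the
// original range, so the same (scale, zero_point) remain correct afterwards;
// quantize(broadcast(x)) therefore equals broadcast(quantize(x)) elementwise.
std::vector<int8_t> QuantizeBroadcastedRow(const std::vector<float>& row,
                                           int repeat, float scale,
                                           int zero_point) {
  std::vector<int8_t> out;
  out.reserve(row.size() * static_cast<std::size_t>(repeat));
  for (int r = 0; r < repeat; ++r) {
    for (float v : row) {
      out.push_back(Quantize(v, scale, zero_point));
    }
  }
  return out;
}
```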
PiperOrigin-RevId: 798410650
…ime faster. Move CallContext into hlo_opcode since it's based on opcode and to avoid a circular dependency. Also, skip_async_execution_thread_overwrite is no longer needed and has been removed. PiperOrigin-RevId: 798413636
PiperOrigin-RevId: 798507258
PiperOrigin-RevId: 798525391
…ory pressure PiperOrigin-RevId: 798631247
…nalValue`. This change decouples `OriginalValue` from a specific `Shape`, allowing it to represent original values without being constrained by a fixed shape structure. The `OriginalValueProto` is updated to remove the shape field, and related code is adjusted accordingly. Also, this CL removes OriginalValuePointer, adds some generic utils for using shared_ptr as map keys, and uses those instead. PiperOrigin-RevId: 798633988
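A generic sketch of the shared_ptr-as-map-key idea mentioned above, hashing and comparing the pointee rather than the pointer; these names are illustrative and are not the actual utilities added in the commit.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hash a shared_ptr by the value it points to, so two distinct pointers to
// equal values land in the same bucket.
struct PointeeHash {
  template <typename T>
  std::size_t operator()(const std::shared_ptr<T>& p) const {
    return p ? std::hash<T>()(*p) : 0;
  }
};

// Compare shared_ptr keys by pointee equality (null compares equal to null).
struct PointeeEq {
  template <typename T>
  bool operator()(const std::shared_ptr<T>& a,
                  const std::shared_ptr<T>& b) const {
    if (a == b) return true;
    if (!a || !b) return false;
    return *a == *b;
  }
};

// Example: a map keyed by shared_ptr<std::string>, deduplicated by value.
using SharedKeyMap =
    std::unordered_map<std::shared_ptr<std::string>, int, PointeeHash,
                       PointeeEq>;
```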
PiperOrigin-RevId: 798756132
PiperOrigin-RevId: 798756149
…n CopyCleanUpPass. If a memory effect is attached to an op rather than an OpOperand, `effect.getValue()` can be nullptr. PiperOrigin-RevId: 798861213
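A hedged sketch of the kind of guard this fix implies, assuming MLIR's MemoryEffectOpInterface / EffectInstance API; the helper name is illustrative and this is not the actual CopyCleanUpPass code.

```cpp
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/Operation.h"
#include "mlir/Interfaces/SideEffectInterfaces.h"

// Collect the values touched by an op's memory effects, skipping effects that
// are attached to the op itself (for those, getValue() returns a null Value).
void CollectEffectValues(mlir::Operation* op,
                         llvm::SmallVectorImpl<mlir::Value>& values) {
  auto iface = llvm::dyn_cast<mlir::MemoryEffectOpInterface>(op);
  if (!iface) return;
  llvm::SmallVector<mlir::MemoryEffects::EffectInstance> effects;
  iface.getEffects(effects);
  for (const auto& effect : effects) {
    if (mlir::Value value = effect.getValue()) {
      values.push_back(value);
    }
  }
}
```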
.. as long as the order of contracting and free dimensions still matches Triton's expectations and we don't need to add a transpose. PiperOrigin-RevId: 798984441
…est. This change updates the default compute capability used in `compilation_provider_test.cc` from sm_52 to sm_80. Additionally, the test for spilling detection now uses a more elaborate kernel. Both changes were necessary to support CUDA 13, which requires at least compute capability 8.0 and only allows setting the maximum register count to 24 (previously 16 was allowed). This required more elaborate data-shuffling logic to push register usage past 24 registers and trigger spilling. PiperOrigin-RevId: 798991435
33fbd29 to 00c3dd0
Skipped Tests
TensorFlow: none
XLA:
527f636: @local_xla//xla/xla/backends/gpu/runtime:command_buffer_conversion_pass_test
dc1376b: @local_xla//xla/xla/backends/gpu/codegen/triton:dot_algorithms_legacy_test
f527f4e: @local_xla//xla/xla/service/gpu:determinism_test
33fbd29: @local_xla//xla/xla/service/gpu/tests:hlo_lit_tests