Conversation

@mmakevic-amd commented Aug 25, 2025

Skipped Tests

TensorFlow

none

XLA

527f636: @local_xla//xla/xla/backends/gpu/runtime:command_buffer_conversion_pass_test
- CommandBufferConversionPassTest.ConvertWhileThunk
- CommandBufferConversionPassTest.ConvertWhileThunkWithAsyncPair

dc1376b: @local_xla//xla/xla/backends/gpu/codegen/triton:dot_algorithms_legacy_test
- TritonAndBlasSupportForDifferentTensorSizes.IsDotAlgorithmSupportedByTriton

f527f4e: @local_xla//xla/xla/service/gpu:determinism_test
- DeterminismTest.Conv

33fbd29: @local_xla//xla/xla/service/gpu/tests:hlo_lit_tests


@i-chaochen left a comment


I notice @ScXfjiang approved it, so I'm requesting changes: we need to list all skipped UTs in this weekly sync.

@ScXfjiang

> I notice @ScXfjiang approved it, so I'm requesting changes: we need to list all skipped UTs in this weekly sync.

I have put the skipped tests in the Kanban ticket.

allanrenucci and others added 25 commits September 17, 2025 16:45
PiperOrigin-RevId: 795785519
This pass rewrites `scaled-dot` HLO instructions into a sequence of `convert`, `broadcast`, `multiply`, and `dot` operations. The `scaled-dot` operands and scales are first converted to BF16. The scales are then broadcast and reshaped to match the operand shapes. Finally, element-wise multiplications are performed between operands and their respective scales, and the results are used as inputs to a standard `dot` instruction.

We run it unconditionally because there is no support for scaled-dot on the codegen side.

PiperOrigin-RevId: 795800493
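For illustration, here is a minimal numeric sketch of that decomposition, not the pass's actual code: the function name, the per-row/per-column scale layout, and the use of plain float in place of BF16 are assumptions made to keep the example self-contained.

```cpp
// Hypothetical numeric sketch of the scaled-dot decomposition: convert,
// broadcast the scales, multiply element-wise, then run a standard dot.
// Plain float stands in for BF16; scale layout is assumed per-row/per-column.
#include <cstddef>
#include <vector>

std::vector<float> ScaledDot(const std::vector<float>& lhs,        // [m, k]
                             const std::vector<float>& rhs,        // [k, n]
                             const std::vector<float>& lhs_scale,  // [m]
                             const std::vector<float>& rhs_scale,  // [n]
                             std::size_t m, std::size_t k, std::size_t n) {
  // "broadcast" + "multiply": scale each operand element-wise.
  std::vector<float> scaled_lhs(lhs), scaled_rhs(rhs);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < k; ++j) scaled_lhs[i * k + j] *= lhs_scale[i];
  for (std::size_t i = 0; i < k; ++i)
    for (std::size_t j = 0; j < n; ++j) scaled_rhs[i * n + j] *= rhs_scale[j];
  // Standard "dot" on the scaled operands.
  std::vector<float> out(m * n, 0.0f);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t p = 0; p < k; ++p)
        out[i * n + j] += scaled_lhs[i * k + p] * scaled_rhs[p * n + j];
  return out;
}
```

The broadcast here is implicit in indexing the scale by row/column; in HLO the pass materializes it as explicit broadcast and reshape ops, as described above.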
PiperOrigin-RevId: 795856504
…flow::FromAbslStatus`, `tensorflow::ToAbslStatus`

PiperOrigin-RevId: 795900634
PiperOrigin-RevId: 795958757
PiperOrigin-RevId: 796037587
… source

also feeds into the same instruction.

PiperOrigin-RevId: 796119376
PiperOrigin-RevId: 796148874
PiperOrigin-RevId: 796246237
PiperOrigin-RevId: 796246377
PiperOrigin-RevId: 796254391
…oProto

There is a constructor `DeviceDescription(const GpuDeviceInfoProto&)`, but I would like to introduce some validation when constructing a `DeviceDescription` from a `GpuDeviceInfoProto`.

Therefore I'm replacing the constructor with a factory function that can return an `absl::StatusOr<DeviceDescription>` in case of a validation error.

This change does not yet introduce the validation; it only migrates all the users of the now-removed constructor.

The exact changes in this CL:

1. Remove `DeviceDescription::DeviceDescription(const GpuDeviceInfoProto&)`
2. Add `static absl::StatusOr<DeviceDescription> DeviceDescription::FromProto(const GpuDeviceInfoProto&)`
3. Remove `Compiler::TargetConfig::TargetConfig(const GpuDeviceInfoProto&)`
4. Add `static absl::StatusOr<Compiler::TargetConfig> Compiler::TargetConfig::FromProto(const GpuDeviceInfoProto&)`
5. Fix up all call sites

PiperOrigin-RevId: 796298477
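A rough call-site sketch of the migration (the include paths, namespaces, and surrounding error handling are assumptions; only the FromProto signatures above come from the change description):

```cpp
// Hypothetical call site after the migration.
#include "absl/status/status.h"
#include "absl/status/statusor.h"
#include "xla/stream_executor/device_description.h"     // assumed header
#include "xla/stream_executor/device_description.pb.h"  // assumed header

absl::Status Configure(const stream_executor::GpuDeviceInfoProto& proto) {
  // Before: stream_executor::DeviceDescription desc(proto);
  // After: the factory can reject an invalid proto with a status.
  absl::StatusOr<stream_executor::DeviceDescription> desc =
      stream_executor::DeviceDescription::FromProto(proto);
  if (!desc.ok()) {
    return desc.status();  // surface the validation error to the caller
  }
  // ... use *desc as before ...
  return absl::OkStatus();
}
```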
Fixed grammar in error message for invalid regularization penalty numbers. Changed "is not a property value" to "is not a valid property value" for better clarity and correctness.
PiperOrigin-RevId: 796310648
Those jobs have been using L4 GPUs instead of T4 GPUs for a while now, but the name was still the old one.

PiperOrigin-RevId: 796316549
PiperOrigin-RevId: 796325270
…ording

Imported from GitHub PR openxla/xla#30239

Non-MNNVL clusters don't need some NVML APIs such as `nvmlDeviceGetGpuFabricInfoV`, yet the current message wording includes `Error`, which causes confusion when debugging. This PR removes the error keyword from the message and makes it a warning.
Copybara import of the project:

--
5a950b82b3dc1d78e2ee75e87fae6740be802ea8 by Terry Sun <[email protected]>:

tweak message wording

--
46982450277fae84da18c3880138cc009f1fc32e by Terry Sun <[email protected]>:

drop test

Merging this change closes tensorflow#30239

PiperOrigin-RevId: 796336762
… oneAPI

Imported from GitHub PR openxla/xla#30072

This PR addresses a linking failure caused by an overflow of command-line flags, resulting in an exit code 127 error during the linking stage. To resolve this, we introduced the following changes:

**Improved Handling of Whole-Archive Object Files**
Object files with .o or .lo extensions are now linked using the --whole-archive and --no-whole-archive flags. This forces the linker to include all symbols from these files, ensuring none are removed during linking. This change helps reduce the total number of linker flags while preserving necessary symbols, which in turn prevents command-line overflow issues.

For better debugging, we introduced support for the VERBOSE=1 environment variable. When set, it prints the full command line used to invoke the compiler, which helps with diagnosing cross-compilation issues and verifying correct toolchain usage.
Copybara import of the project:

--
eb2414f53d263f1aa802ca0e1bfb87c222d6a2fe by mraunak <[email protected]>:

Fix the linking error
--
50809c553c37567782e09e5de98140d1fdc9a82b by mraunak <[email protected]>:

remove duplicates

Merging this change closes tensorflow#30072

PiperOrigin-RevId: 796336822
…erate on `!tt.ptr` types.

This change modifies `triton_xla.extract` and `triton_xla.insert` to take `!tt.ptr` types for the base memory operand instead of `tensor` types. An explicit `shape` attribute is added to both ops to represent the shape of the original tensor in memory. `triton_xla.insert` no longer produces a result. The emitter and transformation passes are updated to reflect these changes, including handling scalar loads/stores separately.

PiperOrigin-RevId: 796348485
tensorflower-gardener and others added 29 commits September 17, 2025 16:56
PiperOrigin-RevId: 798338574
…s always blocking, since even if the key does not exist it will be initialized (consistent with other kv-store implementations).

PiperOrigin-RevId: 798349776
…needed after compilation is done

PiperOrigin-RevId: 798356871
…ventPool::Handle>

and treat the definition event specially (to avoid some locking).

Crucially, this allows making `event_` an indirect async value in the future, in order to implement the PjRtDeviceEventPromise API.

Reverts fd32c1a

PiperOrigin-RevId: 798372919
…oadcast_to.

This enables the prepare-quantize pass to move Quantize/Dequantize operations (QDQs) through broadcast ops. This is valid for per-tensor quantization as broadcasting doesn't change the range of tensor values.

A test case is added to verify the propagation.

PiperOrigin-RevId: 798375402
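A tiny numeric illustration of why this is valid for per-tensor parameters (an assumed min/max-based scheme, not the pass's code): broadcasting only repeats existing values, so the tensor's value range, and hence its per-tensor scale and zero-point, is the same before and after the broadcast.

```cpp
// Hypothetical check: broadcasting a [2] tensor to [3, 2] repeats values
// without introducing new ones, so min/max (and the per-tensor quantization
// range derived from them) are unchanged. Per-channel parameters would not
// commute with a broadcast this way in general.
#include <algorithm>
#include <cassert>
#include <vector>

int main() {
  std::vector<float> t = {-1.5f, 4.0f};  // original tensor, shape [2]
  std::vector<float> b;                  // broadcast result, shape [3, 2]
  for (int row = 0; row < 3; ++row) b.insert(b.end(), t.begin(), t.end());

  auto [tmin, tmax] = std::minmax_element(t.begin(), t.end());
  auto [bmin, bmax] = std::minmax_element(b.begin(), b.end());
  assert(*tmin == *bmin && *tmax == *bmax);  // identical quantization range
  return 0;
}
```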
PiperOrigin-RevId: 798375472
…ime faster. Move CallContext into hlo_opcode since it is based on the opcode and to avoid a circular dependency.

skip_async_execution_thread_overwrite is no longer needed and has been removed.

PiperOrigin-RevId: 798413636
PiperOrigin-RevId: 798507258
…nalValue`.

This change decouples `OriginalValue` from a specific `Shape`, allowing it to represent original values without being constrained by a fixed shape structure. The `OriginalValueProto` is updated to remove the shape field, and related code is adjusted accordingly.

Also, this CL removes OriginalValuePointer, adds some generic utils for using shared_ptr as map keys, and uses those instead.

PiperOrigin-RevId: 798633988
PiperOrigin-RevId: 798756132
…n CopyCleanUpPass.

If a memory effect is attached to an op rather than an OpOperand, `effect.getValue()` can be nullptr.

PiperOrigin-RevId: 798861213
.. as long as the order of contracting and free dimensions still matches Triton's expectations and we don't need to add a transpose.

PiperOrigin-RevId: 798984441
…est.

This change updates the default compute capability used in `compilation_provider_test.cc` from sm_52 to sm_80.

Additionally, the test for spilling detection now uses a more elaborate kernel.

Both changes were necessary to support CUDA 13, which requires at least compute capability 8.0 and no longer allows setting the maximum register count below 24 (previously 16 was allowed). This required more elaborate data-shuffling logic to push register usage past 24 registers and trigger spilling.

PiperOrigin-RevId: 798991435
@mmakevic-amd force-pushed the develop-upstream-sync-250825 branch from 33fbd29 to 00c3dd0 on September 17, 2025 17:00