Fixed some issues around compiling on Windows. #15444

eaplatanios · 2024-07-29T15:31:58Z

This PR fixes some issues I bumped into when trying to compile XLA on Windows. I still haven't gotten GPU support to work, but I'm making progress. The CPU-only version compiles fine after some of the changes in this PR. I'll point out the specific issues this PR fixes in comments.

There are also TSL-specific changes, which have been split out into a separate PR (#15499).

eaplatanios · 2024-07-29T15:33:31Z

third_party/tsl/third_party/gpus/cuda/build_defs.bzl.tpl

-    configured_version = "%{cuda_version}"
-    configured_major = int(configured_version.split('.')[0])
-    configured_minor = int(configured_version.split('.')[1])
+    # Strip "64_" which appears in the CUDA version on Windows.


This is mostly self-explanatory. On Windows, the CUDA version looks something like 64_121. Other parts of the build already handle such version strings, but this template did not.
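As a rough illustration of the prefix handling described above, here is a minimal C++ sketch. The actual fix lives in a Starlark template, so this is only an analogy. `StripWindowsArchPrefix` is a hypothetical helper, and how the remaining digits map to major and minor versions is not shown in this thread, so the sketch stops at stripping the prefix.

```cpp
#include <string>

// Hypothetical helper: removes a leading "64_" from a CUDA version string,
// as seen in Windows version strings such as "64_121".
// Strings without that prefix are returned unchanged.
std::string StripWindowsArchPrefix(const std::string& version) {
  const std::string prefix = "64_";
  if (version.compare(0, prefix.size(), prefix) == 0) {
    return version.substr(prefix.size());
  }
  return version;
}
```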

eaplatanios · 2024-07-29T15:34:20Z

xla/backends/profiler/gpu/cupti_buffer_events.cc

-      .context_id = graph_trace->contextId,
-      .stream_id = graph_trace->streamId,
-      .graph_id = graph_trace->graphId,
+      /* .type = */ CuptiTracerEventType::CudaGraph,


XLA is configured to build with C++17. However, designated initializers are a C++20 feature, which results in the following error when compiling on Windows:

error C7555: use of designated initializers requires at least '/std:c++20'
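The replacement pattern in the diff above (positional initialization with the field name kept in a comment) can be sketched as follows. `GraphEvent` and `MakeGraphEvent` are hypothetical stand-ins, not the real profiler types. Positional initializers must follow the declaration order of the members.

```cpp
#include <cstdint>

// Hypothetical stand-in for the profiler event type; names are illustrative.
struct GraphEvent {
  int type;
  std::uint32_t context_id;
  std::uint32_t stream_id;
  std::uint32_t graph_id;
};

GraphEvent MakeGraphEvent(std::uint32_t context_id, std::uint32_t stream_id,
                          std::uint32_t graph_id) {
  // C++20 only: GraphEvent{.type = 1, .context_id = context_id, ...}
  // C++17-compatible: positional aggregate initialization in declaration
  // order, with comments standing in for the field names.
  return GraphEvent{
      /* .type = */ 1,
      /* .context_id = */ context_id,
      /* .stream_id = */ stream_id,
      /* .graph_id = */ graph_id,
  };
}
```

The comments keep the call site readable, but the compiler does not check them, so they have to be kept in sync with the struct by hand.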

eaplatanios · 2024-07-29T15:35:36Z

xla/backends/profiler/gpu/cupti_buffer_events.h

@@ -56,7 +56,7 @@ struct MemcpyDetails {
  int8_t dst_mem_kind;

  // ID of the hardware channel on which this operation ran.
-  uint32_t channel_id = -1;
+  uint32_t channel_id = static_cast<uint32_t>(-1);


This resulted in an implicit type narrowing error (I believe it was C2397). The explicit static cast fixes it.
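For reference, a small sketch of what the cast evaluates to. `MemcpyDetailsSketch` and `kInvalidChannelId` are hypothetical names used only for illustration. Converting -1 to an unsigned 32-bit type wraps to the largest representable value, so the explicit cast keeps the same sentinel while making the conversion intentional.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical reduced version of the struct from the diff above.
struct MemcpyDetailsSketch {
  // Explicit conversion: -1 wraps to the largest uint32_t value.
  std::uint32_t channel_id = static_cast<std::uint32_t>(-1);
};

// An equivalent spelling of the same sentinel that avoids the signed literal.
constexpr std::uint32_t kInvalidChannelId =
    std::numeric_limits<std::uint32_t>::max();

static_assert(static_cast<std::uint32_t>(-1) == kInvalidChannelId,
              "the cast and numeric_limits::max() name the same value");
```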

eaplatanios · 2024-07-29T15:36:16Z

xla/service/cpu/runtime/conv_impl.h

@@ -41,7 +41,7 @@ void EigenConv2DImpl(
    Eigen::Index padding_y_after, Eigen::Index lhs_x_dilation,
    Eigen::Index lhs_y_dilation, Eigen::Index rhs_x_dilation,
    Eigen::Index rhs_y_dilation, Eigen::Index feature_group_count,
-    std::optional<std::function<void()>> done_callback = std::nullopt) {


This resulted in this error:

error C2765: 'tensorflow::xla::EigenConv2DImpl': an explicit specialization or instantiation of a function template cannot have any default arguments

I removed the couple of default arguments that were causing this error and passed the values explicitly at the call sites that had relied on them.
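A reduced sketch of the resulting shape, under the assumption that the template is explicitly instantiated somewhere. `ScaleImpl` is a hypothetical function, not part of XLA. The template declares no default for the callback, the explicit instantiation repeats only the parameter types, and callers pass `std::nullopt` themselves.

```cpp
#include <functional>
#include <optional>

// Hypothetical template standing in for EigenConv2DImpl: the trailing
// callback parameter has no default argument.
template <typename T>
T ScaleImpl(T value, T factor,
            std::optional<std::function<void()>> done_callback) {
  T result = value * factor;
  if (done_callback) {
    (*done_callback)();  // Notify the caller, if a callback was supplied.
  }
  return result;
}

// Explicit instantiation: only parameter types appear, no default arguments.
template float ScaleImpl<float>(float, float,
                                std::optional<std::function<void()>>);
```

A call site that previously relied on the default would then read `ScaleImpl<float>(2.0f, 3.0f, std::nullopt)`.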

eaplatanios · 2024-07-30T03:17:44Z

xla/pjrt/gpu/se_gpu_pjrt_compiler.cc

@@ -199,13 +199,17 @@ StreamExecutorGpuCompiler::Compile(CompileOptions options,
 #endif
 }

+#if TENSORFLOW_USE_ROCM


The nested macro seems to not be supported by MSVC. Pushing the inner ifdef outside the other macro seems to work and doesn't change the behavior/functionality of the code here.

eaplatanios · 2024-07-30T03:19:59Z

xla/service/gpu/kernels/cutlass_gemm_custom_kernel.cc

    alignas(128) std::byte storage[1024];
+#endif


This results in the following error on Windows:

error C2719: '<args_0>': formal parameter with requested alignment of 128 won't be aligned

cc @dimvar who previously made the change from 64 to 128.
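The diff is truncated, so the exact guard is not visible. One plausible shape, assuming the fix makes the over-alignment conditional on the compiler, is sketched below. `ArgsStorageSketch` and `StorageBytes` are hypothetical names, and the choice of `_MSC_VER` as the condition is an assumption.

```cpp
#include <cstddef>

// Hypothetical storage type that is passed to a function by value.
struct ArgsStorageSketch {
#if defined(_MSC_VER)
  // Assumption: drop the over-alignment where by-value parameters with a
  // requested alignment of 128 are rejected (error C2719).
  std::byte storage[1024];
#else
  alignas(128) std::byte storage[1024];
#endif
};

// Taking the struct by value is the pattern that triggers C2719 on MSVC
// when the type carries the alignas(128) request.
std::size_t StorageBytes(ArgsStorageSketch args) {
  return sizeof(args.storage);
}
```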

eaplatanios · 2024-07-30T15:45:24Z

I also have some changes for third_party/tsl. I assume I should make those directly to the openxla/tsl repository? If I do that, how do I get the submodule here to update since it appears to be a copy of the code?

ezhulenev · 2024-07-30T15:55:06Z

@ddunl

ddunl · 2024-07-30T16:41:35Z

Thanks!! LGTM, for TSL changes I think it's easiest to open a separate PR on this repo which does the TSL edits in the copy. We are in limbo with TSL right now, some of it has been moved here to xla/tsl and I'm still moving the rest. It'd be good if the third_party/gpus/cuda change could be a part of that TSL PR also.

eaplatanios · 2024-07-30T18:53:42Z

Thanks that's super helpful! I'll go ahead and open that PR as well in this repo.

Imported from GitHub PR openxla/xla#15444

This PR fixes some issues I bumped into when trying to compile XLA on Windows. I still haven't gotten GPU support to work but I'm making progress. The CPU only version compiles fine after some of the changes in this PR. I'll point out some specific issues this PR fixes in comments.

There are also TSL-specific changes that are pulled in a separate PR (#15499).

Copybara import of the project:

-- eacee95f41abc49a21516ee389861d84a40eca85 by eaplatanios: Fixed some issues around compiling on Windows.
-- b12e4cf0d23c2690111125a651e486ec6a112e54 by eaplatanios: .
-- e23ef176de72cf04555242174a19a407884f3f0e by eaplatanios: .
-- bdae19b9e15c396985703bb7e88a4db6fcddc7f6 by eaplatanios: .
-- 2f90e6ba564f92fafa564b104ed0ce82b7642563 by eaplatanios: .
-- 57009793b74c4d7d51fb39547a70a3ec142dadab by eaplatanios: .
-- a978b1f7f70d49f1426fe46b107fdcc3618e3085 by eaplatanios: .
-- d7fe81dc9cf909a6a8d70e2be8cfffca4063493e by eaplatanios: .
-- fc40d919619330bce596555613e425cb6267eea4 by eaplatanios: .
-- 326aec3fd73a67ca3c667cfeb5c88a8ffa52eb3d by eaplatanios: .
-- a7603b7e1be990ff012440c74bd2c2ecbc2b1e2f by eaplatanios: .
-- edcc97a67016584c285d84ac732952c572283119 by eaplatanios: .
-- cec244808a8df163f9a803db450ca2bebdda9315 by eaplatanios: .
-- df3eb2215eea9076cb352378c5745e113df7cc7d by eaplatanios: .
-- 8997345fd1e1aa6f55e445615460124c6e14417c by eaplatanios: .
-- 219a9f1bff7fb12c3407ab2e47512560001900fe by eaplatanios: .
-- 73f3cd7e0135ec05c97595f795ec318fb635bd32 by eaplatanios: .

Merging this change closes #15444

PiperOrigin-RevId: 657937707

@hawkinsp

This is following up on #15444. There's still one issue blocking Windows support that I haven't resolved. I've described it here. Any suggestions/advice on how to proceed for that one would be helpful. cc @hawkinsp @ddunl and also fyi @metab0t. This closes #15499. PiperOrigin-RevId: 662267938

eaplatanios added 2 commits July 29, 2024 08:30

Fixed some issues around compiling on Windows.

eacee95

.

b12e4cf

eaplatanios mentioned this pull request Jul 29, 2024

cuda_stub for Windows #6993

Open

eaplatanios added 8 commits July 29, 2024 15:57

.

e23ef17

.

bdae19b

.

2f90e6b

.

5700979

.

a978b1f

.

d7fe81d

.

fc40d91

.

326aec3

.

a7603b7

eaplatanios added 5 commits July 29, 2024 20:29

.

edcc97a

.

cec2448

Merge branch 'main' of github.com:eaplatanios/xla into u/eaplatanios/cpp-17-fixes

28342ae

.

df3eb22

.

8997345

NaiyerRizz requested a review from ezhulenev July 30, 2024 06:59

NaiyerRizz self-assigned this Jul 30, 2024

ezhulenev requested a review from ddunl July 30, 2024 15:55

copybara-service bot mentioned this pull request Jul 31, 2024

[xla:cpu] Optimize Thunk::OkExecuteEvent tensorflow/tensorflow#72819

Merged

copybara-service bot mentioned this pull request Jul 31, 2024

[xla:cpu] Use iterators for executing thunks sequentially tensorflow/tensorflow#72823

Merged

copybara-service bot mentioned this pull request Jul 31, 2024

Automated Code Change tensorflow/tensorflow#72870

Merged

copybara-service bot mentioned this pull request Jul 31, 2024

Update XNNPack version and add it's new KleidiAI dependency. tensorflow/tensorflow#72445

Merged

copybara-service bot mentioned this pull request Jul 31, 2024

Hide the SDY dialect right before MLIR->HLO conversion in the XLA pipeline. tensorflow/tensorflow#72386

Merged

copybara-service bot mentioned this pull request Jul 31, 2024

[XLA:GPU] Make "DumpingWorks" test smaller. tensorflow/tensorflow#72880

Merged

copybara-service bot mentioned this pull request Jul 31, 2024

Integrate LLVM at llvm/llvm-project@42d641ef5cc4 tensorflow/tensorflow#72881

Merged

eaplatanios mentioned this pull request Jul 31, 2024

Fixed remaining issues for Windows CUDA support. #15565

Closed

copybara-service bot mentioned this pull request Aug 12, 2024

Support TSL compilation on Windows with CUDA support. #16007

Merged
