Update OpenXLA-pin to Nov24 #3

wbmc · 2023-12-05T11:11:46Z

Update from PR

…ary. The library can be (and is) tested. PiperOrigin-RevId: 582142162

…ions CustomKernel and CustomFusion are already unique enough, no need to put them into a unique namespace. PiperOrigin-RevId: 582174192

PiperOrigin-RevId: 582177793

PiperOrigin-RevId: 582199198

PiperOrigin-RevId: 582275667

Serializing them is in line with all other module properties which affect compilation (aliasing, layout, etc.), and not serializing creates an impure compilation environment where IR does not and can not capture semantics of the module. PiperOrigin-RevId: 582290809

We do not need to check the backend config field. PiperOrigin-RevId: 582292178

PiperOrigin-RevId: 582305685

PiperOrigin-RevId: 582343268

…isualization The HTML codepath has bitrotted, is not tested, and isn't currently working. Let's use the same approach as for fusion visualization, as it is working. PiperOrigin-RevId: 582346207

@xla-rotation

Imported from GitHub PR openxla#6964

Here is a fix for the upcoming build breaks due to recent changes in the GpuDriver API. I have also fixed the issue with headers in xla/service/gpu/ir_emitter_unnested.cc: otherwise, this would generate linker errors on the ROCM platform when TF_HIPBLASLT=0.

@xla-rotation: would you have a look, please?

Copybara import of the project:

-- 14c2e30 by Pavel Emeliyanenko <[email protected]>: fixing buildbrakes

-- 22b0962 by Pavel Emeliyanenko <[email protected]>: fixing buildifier warnings

Merging this change closes openxla#6964

COPYBARA_INTEGRATE_REVIEW=openxla#6964 from ROCmSoftwarePlatform:ci_rocm_build_brakes_231113 22b0962 PiperOrigin-RevId: 582361553

- Make loop detection more accurate by recording the latest instance of an instruction with matching fingerprint.
- If the loop value allocation type isn't supported by the optimizer, still allow that tensor to get alternate memory allocation using the usual MSA algorithm.
- Export the minimum num loop iteration as a field in the proto.

PiperOrigin-RevId: 582363814

PiperOrigin-RevId: 582369654

Updates LLVM usage to match [ed86e740effa](llvm/llvm-project@ed86e740effa) PiperOrigin-RevId: 582371041

PiperOrigin-RevId: 582385702

… shardings when enumerating sharding strategies for those ops. This is as opposed to the previous approach of using sharding propagation to infer operand shardings given the dot/conv. The previous approach does not work when one is looking to shard the contraction dimension and is therefore less clean than this new approach. PiperOrigin-RevId: 582397850

PiperOrigin-RevId: 582399236

http://github.com/tensorflow/runtime/commit/f5091e2c05925158e0be192370a37a6cf6fcf241. PiperOrigin-RevId: 582401066

PiperOrigin-RevId: 582419743

Re-arrange structs/classes declarations in kernel.h to avoid forward declaring arguments types. PiperOrigin-RevId: 582428745

PiperOrigin-RevId: 582440759

PiperOrigin-RevId: 582447653

Add a boolean field, no_parallel_gpu_op, to CollectiveBackendConfig. This field asserts that an asynchronous collective operation does not execute in parallel with other operations in GPU. The default value of the attribute is false, which should lead to conservative runtime behavior. Add BackendConfig test for the field. Add gpu-schedule-postprocessing pass, to refine the attribute value. Add test cases for the pass. PiperOrigin-RevId: 582457930
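The refinement idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual gpu-schedule-postprocessing pass: the `Op` record, its fields other than `no_parallel_gpu_op`, and the rule "any GPU op scheduled between the start and its done counts as parallel work" are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    """Hypothetical stand-in for a scheduled instruction."""
    name: str
    runs_on_gpu: bool = True
    is_async_start: bool = False
    done_for: Optional[str] = None  # set on a "done" op; names its start op
    no_parallel_gpu_op: bool = False  # conservative default, as in the commit


def refine_no_parallel_gpu_op(schedule: List[Op]) -> None:
    """Sets the flag on async starts that have no GPU op before their done."""
    for i, start in enumerate(schedule):
        if not start.is_async_start:
            continue
        found_done = False
        for later in schedule[i + 1:]:
            if later.done_for == start.name:
                found_done = True
                break
            if later.runs_on_gpu:
                # Something else may run on the GPU while the collective is in flight.
                break
        # Stay conservative (False) if the done op is missing or work overlaps.
        start.no_parallel_gpu_op = found_done


isolated = [
    Op("all-reduce-start", is_async_start=True),
    Op("all-reduce-done", done_for="all-reduce-start"),
    Op("fusion"),
]
overlapped = [
    Op("all-reduce-start", is_async_start=True),
    Op("fusion"),
    Op("all-reduce-done", done_for="all-reduce-start"),
]
refine_no_parallel_gpu_op(isolated)
refine_no_parallel_gpu_op(overlapped)
```

Keeping the default at `False` means a consumer that never runs the refinement still gets the conservative behavior the commit describes.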

PiperOrigin-RevId: 582458600

…nc ops to be efficiently scheduled. PiperOrigin-RevId: 582470977

…that are not necessarily in the same computation as the use. PiperOrigin-RevId: 582495253

…ys and values to std::string_view. I plan to add a caller that has a std::vector<char>, and this saves a copy in that case. PiperOrigin-RevId: 582500430

PiperOrigin-RevId: 582526517

Updates LLVM usage to match [5d6304f01742](llvm/llvm-project@5d6304f01742) PiperOrigin-RevId: 582539795

http://github.com/tensorflow/runtime/commit/fce7c27b3191264a8ed581e03900e094c793593a. PiperOrigin-RevId: 582546998

… integrate PiperOrigin-RevId: 584838739

PiperOrigin-RevId: 584840548

PiperOrigin-RevId: 584841378

Imported from GitHub PR openxla#7201

ncclGetLastError returns the last log entry generated at the "WARN/ERROR" level. Here is an example of the new error:

```
NCCL operation ncclCommInitRank(&comm, nranks, id, rank) failed: unhandled cuda error (run with NCCL_DEBUG=INFO for details). Last NCCL warning(error) log entry (may be unrelated) 'Cuda failure 2 'out of memory''.; current tracing scope: all-reduce-start.285; current profiling annotation: XlaModule:#hlo_module=pjit__wrapped_step_fn,program_id=25#.
```

The new part is:

```
Last NCCL warning(error) log entry (may be unrelated) 'Cuda failure 2 'out of memory''.
```

Copybara import of the project:

-- 348df80 by Frederic Bastien <[email protected]>: Add extra error information when NCCL error out.

Merging this change closes openxla#7201

COPYBARA_INTEGRATE_REVIEW=openxla#7201 from nouiz:nccl_warn_log_as_error_upstream 348df80 PiperOrigin-RevId: 584842170

…pu docker image

Imported from GitHub PR openxla#7237

The "devel" docker image is not updated and does not seem to be maintained anymore. We should probably recommend a more up-to-date image.

Copybara import of the project:

-- 3412f30 by Mehdi Amini <[email protected]>: Update build_from_source.md doc to point to latest-gpu docker image The "devel" docker image is not updated and does not seem to be maintained anymore. We should probably recommend a more up-to-date image.

-- 4b7a36b by Mehdi Amini <[email protected]>: Update build_from_source.md

Merging this change closes openxla#7237

COPYBARA_INTEGRATE_REVIEW=openxla#7237 from joker-eph:patch-2 4b7a36b PiperOrigin-RevId: 584848729

PiperOrigin-RevId: 584852376

Obviously, an experienced XLA engineer would know that IsElementwise/IsOpElementwise/IsElementwiseImpl/IsElementwiseOnOperand are very different functions and one should be very careful when using them. PiperOrigin-RevId: 584873063

This is needed once we want to enable it by default. PiperOrigin-RevId: 584873521

PiperOrigin-RevId: 584878714

PiperOrigin-RevId: 584907007

…n --xla_gpu_autotuner_level=0 is set Instead, pick the first tiling available. This is consistent with autotuner_level=0 behavior in non-deviceless mode, and allows for better QOL while developing without a (matching) GPU. PiperOrigin-RevId: 584908154

…itTritonFusion xla/tests:dot_operation_test_autotune_disabled_gpu_a100 was flaky because of this. PiperOrigin-RevId: 584935941

…rough decoding APIs PiperOrigin-RevId: 584940021

PiperOrigin-RevId: 584941044

This adds the necessary changes in XlaBuilder API, verifier, and shape inference following StableHLO rules for unbounded dynamism. Implicit broadcasting support in XlaBuilder API will be addressed in a follow up CL. PiperOrigin-RevId: 584967526

PiperOrigin-RevId: 585006293

Updates LLVM usage to match [af7a1453526a](llvm/llvm-project@af7a1453526a) PiperOrigin-RevId: 585072558

http://github.com/tensorflow/runtime/commit/4347953799d066962cb1897814de77c8e195499d. PiperOrigin-RevId: 585077428

PiperOrigin-RevId: 585083299

Boundary functions seemed like a nice and easy abstraction for fusions, but they turned out to be too difficult to use in practice. The main problem is that everything is still based on HloInstructions, whose users and operands are difficult to traverse in general. The solution here is to introduce an HloFusionAdaptor class with a simple interface, and an HloInstructionAdaptor which always behaves as if the HLO was completely unfused. If I had more time, I would have made a smaller change. PiperOrigin-RevId: 585087631

This is another step towards tile analysis being able to tile all HLOs. Tiling dot requires some care to ensure that output dimensions are mapped to the appropriate dimensions. The [StableHLO specification for dot_general](https://github.com/openxla/stablehlo/blob/main/docs/spec.md#dot_general) describes how output dimensions are constructed from the input dimensions and the operation's attributes. PiperOrigin-RevId: 585095167
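As a rough illustration of the dimension mapping mentioned above, the sketch below follows the ordering given in the StableHLO `dot_general` specification: batching dimensions first, then the remaining LHS dimensions, then the remaining RHS dimensions. The function name and the `(lhs_dim, rhs_dim)` source pairs are hypothetical helpers for this example, not part of XLA's tile analysis.

```python
from typing import List, Optional, Sequence, Tuple


def dot_general_output_dims(
    lhs_shape: Sequence[int],
    rhs_shape: Sequence[int],
    lhs_batch: Sequence[int],
    rhs_batch: Sequence[int],
    lhs_contract: Sequence[int],
    rhs_contract: Sequence[int],
) -> Tuple[List[int], List[Tuple[Optional[int], Optional[int]]]]:
    """Returns the output shape and, per output dim, its (lhs_dim, rhs_dim) source."""
    # "Free" dimensions are neither batching nor contracting.
    lhs_free = [d for d in range(len(lhs_shape))
                if d not in lhs_batch and d not in lhs_contract]
    rhs_free = [d for d in range(len(rhs_shape))
                if d not in rhs_batch and d not in rhs_contract]

    shape: List[int] = []
    sources: List[Tuple[Optional[int], Optional[int]]] = []
    # Batching dimensions come first and map to both operands.
    for lb, rb in zip(lhs_batch, rhs_batch):
        shape.append(lhs_shape[lb])
        sources.append((lb, rb))
    # Then the LHS free dimensions, in order.
    for d in lhs_free:
        shape.append(lhs_shape[d])
        sources.append((d, None))
    # Finally the RHS free dimensions, in order.
    for d in rhs_free:
        shape.append(rhs_shape[d])
        sources.append((None, d))
    return shape, sources


# Batched matmul: [2, 3, 4] x [2, 4, 5], batch on dim 0, contract lhs dim 2 with rhs dim 1.
out_shape, out_sources = dot_general_output_dims(
    [2, 3, 4], [2, 4, 5], [0], [0], [2], [1])
```

Contracting dimensions never appear in the output, which is why a tiling of the output alone does not determine how those dimensions are traversed.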

They don't use gml_st dialect. PiperOrigin-RevId: 585099478

This op is unsupported by tile analysis. Adding a test so that we don't shoot ourselves in the foot by using `isElementwise` method, for example. PiperOrigin-RevId: 585123986

We can revert this when/if we need this. PiperOrigin-RevId: 585124862

wbmc · 2023-12-05T11:37:11Z

…art #3 PiperOrigin-RevId: 599039077

Currently we look for ptxas and nvlink in a few different places on the host machine, then we choose the first found binary without taking its version into account. If the chosen binary doesn't fulfill our version requirements we will later fail even if there was a suitable ptxas or nvlink in the search path in the first place.

This change makes the search take the version of each binary into account when going through the search path. Unsuitable binaries will be discarded right away and the search continues until we are out of locations to check. This should help with host environments that have multiple CUDA toolkits installed and should make ptxas and nvlink selection more robust.

The concrete changes:

1. `FindCudaExecutable` now also takes a minimum version and a list of forbidden (think buggy) versions that are supposed to be skipped.
2. `WarnIfBadPtxAsVersion` has been removed. It was checking for ptxas < 11.1 which is way older than our minimum supported version of 11.8 and was not doing anything given the check described in #3.
3. There was another version check for `ptxas` in `NVPTXCompiler::ChooseLinkingMethod` which was checking for `version(ptxas)` < 11.8. This has also been removed/replaced by the version check described in #4.
4. Version checking for `ptxas` and `nvlink` has been consolidated into 2 methods `FindPtxAsExecutable` and `FindNvLinkExecutable`. These methods hard code the current minimum version (and the list of excluded versions) of each tool in one place. It's still not great but at least less spaghetti-like.

PiperOrigin-RevId: 618797392
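The version-aware search described above might look roughly like the following. This is a minimal sketch and not the XLA implementation: `find_suitable_executable`, the `get_version` callback, the paths, and the forbidden version used in the example are all hypothetical, and a real version probe would run the binary rather than consult a table.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

Version = Tuple[int, int, int]


def find_suitable_executable(
    candidates: Iterable[str],
    get_version: Callable[[str], Optional[Version]],
    min_version: Version,
    forbidden: Sequence[Version] = (),
) -> Optional[str]:
    """Returns the first candidate whose version is acceptable, else None."""
    for path in candidates:
        version = get_version(path)
        if version is None:
            continue  # binary missing or version could not be determined
        if version < min_version:
            continue  # too old; keep searching instead of failing later
        if version in forbidden:
            continue  # explicitly excluded version
        return path
    return None


# Hypothetical host with several toolkits installed; versions are made up.
fake_versions: Dict[str, Version] = {
    "/usr/local/cuda-11.1/bin/ptxas": (11, 1, 0),
    "/opt/excluded/bin/ptxas": (12, 0, 0),
    "/usr/local/cuda-12.2/bin/ptxas": (12, 2, 0),
}
search_path = ["/nonexistent/ptxas", *fake_versions]
chosen = find_suitable_executable(
    search_path,
    fake_versions.get,
    min_version=(11, 8, 0),
    forbidden=[(12, 0, 0)],  # assumption for illustration only
)
```

Passing the version probe in as a callback keeps the selection logic testable without any CUDA toolkit on the machine.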

pizzud and others added 30 commits November 13, 2023 18:07

[xla_compile][NFC] Extract the compilation and file-writing to a libr…

a129c9b

…ary. The library can be (and is) tested. PiperOrigin-RevId: 582142162

[xla:gpu] NFC: Remove nested kernel namespace from custom kernels/fus…

3e2b71a

…ions CustomKernel and CustomFusion are already unique enough, no need to put them into a unique namespace. PiperOrigin-RevId: 582174192

Enable macOS Arm64 nightly builds

5159949

PiperOrigin-RevId: 582177793

Enable NVCC only for XLA's build_gpu_nvcc Kokoro job

548a766

PiperOrigin-RevId: 582199198

Reverts fbe8a54

1e05b09

PiperOrigin-RevId: 582275667

[XLA:GPU] Loosen up expectations on Int8 gemm test.

b4479af

We do not need to check the backend config field. PiperOrigin-RevId: 582292178

Error out if ptxas version < 11.8

f1f77cd

PiperOrigin-RevId: 582305685

Fix test gpu_aot_compilation_test

c5800a1

PiperOrigin-RevId: 582343268

[XLA] [NFC] Unify graph rendering for fusion visualization and HTML v…

e25203a

…isualization The HTML codepath has bitrotted, is not tested, and isn't currently working. Let's use the same approach as for fusion visualization, as it is working. PiperOrigin-RevId: 582346207

Enable running passes for H100

2a10123

PiperOrigin-RevId: 582369654

Integrate LLVM at llvm/llvm-project@ed86e740effa

6843ab3

Updates LLVM usage to match [ed86e740effa](llvm/llvm-project@ed86e740effa) PiperOrigin-RevId: 582371041

Add nvml headers for Windows, based on openxla#6994

01cbfb3

PiperOrigin-RevId: 582385702

[xla:gpu] Disable gpu_aot_compilation_test in jitrt_executable_tests

ccd828e

PiperOrigin-RevId: 582399236

Update TFRT dependency to use revision

3d46b55

http://github.com/tensorflow/runtime/commit/f5091e2c05925158e0be192370a37a6cf6fcf241. PiperOrigin-RevId: 582401066

[xla:gpu] NFC: Fix typo in filecheck based test

1be779f

PiperOrigin-RevId: 582419743

[stream_executor] NFC: Rename KernelArgsArrayBase to KernelArgs

3a8a598

Re-arrange structs/classes declarations in kernel.h to avoid forward declaring arguments types. PiperOrigin-RevId: 582428745

Create script for generating compile_commands.json

e963478

PiperOrigin-RevId: 582440759

Prevent OOB indexing in StableHLO/MHLO ops.

e176b35

PiperOrigin-RevId: 582447653

Populate manual sharding for asynchronous instructions.

da8f498

PiperOrigin-RevId: 582458600

[XLA] Exclude async ops from elapsed times in MSA since we expect asy…

508c700

…nc ops to be efficiently scheduled. PiperOrigin-RevId: 582470977

Fix a bug in sliced prefetching in which it allows slice start times …

b3d902c

…that are not necessarily in the same computation as the use. PiperOrigin-RevId: 582495253

[TSL] Change coordination service const std::string& arguments for ke…

182544f

…ys and values to std::string_view. I plan to add a caller that has a std::vector<char>, and this saves a copy in that case. PiperOrigin-RevId: 582500430

[xla:gpu] Add name to custom kernels to improve logging and debugging

dece2a6

PiperOrigin-RevId: 582526517

Integrate LLVM at llvm/llvm-project@5d6304f01742

0528619

Updates LLVM usage to match [5d6304f01742](llvm/llvm-project@5d6304f01742) PiperOrigin-RevId: 582539795

Update TFRT dependency to use revision

8fb606f

http://github.com/tensorflow/runtime/commit/fce7c27b3191264a8ed581e03900e094c793593a. PiperOrigin-RevId: 582546998

gflegar and others added 25 commits November 23, 2023 03:00

Update XLA's Triton pipeline with the missing changes from the latest…

ae906af

… integrate PiperOrigin-RevId: 584838739

[xla:gpu] Allow missing ObjFile / MLIR in GpuCompiler::Export.

3803c88

PiperOrigin-RevId: 584840548

[XLA] Disable autosharding tests in OSS until it's fully fixed upstream

0cfb44a

PiperOrigin-RevId: 584841378

Fix a typo in comment.

a1dd132

PiperOrigin-RevId: 584852376

Add MoveCopyToUsers pass to multi-headed attention fusion pipeline.

a53fcaa

This is needed once we want to enable it by default. PiperOrigin-RevId: 584873521

[XLA:GPU] Tiled fusion: fix nested slicing.

7239eb4

PiperOrigin-RevId: 584878714

Import openai/triton from GitHub.

2f88516

PiperOrigin-RevId: 584907007

[XLA:GPU] Fix occasional nullptr dereference in IrEmitterUnnested::Em…

a191d92

…itTritonFusion xla/tests:dot_operation_test_autotune_disabled_gpu_a100 was flaky because of this. PiperOrigin-RevId: 584935941

[xla:ffi] Sketched basic diagnostics infrastructure and plumbed it th…

1dba8f6

…rough decoding APIs PiperOrigin-RevId: 584940021

[stream_executor] Add If-Else conditional commands to CommandBuffer

f6856c2

PiperOrigin-RevId: 584941044

Adding fingerprint to module's profile information

af6dbb7

PiperOrigin-RevId: 585006293

Integrate LLVM at llvm/llvm-project@af7a1453526a

7e2c601

Updates LLVM usage to match [af7a1453526a](llvm/llvm-project@af7a1453526a) PiperOrigin-RevId: 585072558

Update TFRT dependency to use revision

a3bfe5d

http://github.com/tensorflow/runtime/commit/4347953799d066962cb1897814de77c8e195499d. PiperOrigin-RevId: 585077428

[TileAnalysis] Add indexing computation for fusionOp.

eb15c02

PiperOrigin-RevId: 585083299

Move vectorize_copy.cc and copy_removal.cc passes out of gml_st.

e5a0ad2

They don't use gml_st dialect. PiperOrigin-RevId: 585099478

[TileAnalysis] Add a test for dynamic-update-slice.

acd327d

This op is unsupported by tile analysis. Adding a test so that we don't shoot ourselves in the foot by using `isElementwise` method, for example. PiperOrigin-RevId: 585123986

Put thlo & gml_st on ice.

8744c9a

We can revert this when/if we need this. PiperOrigin-RevId: 585124862

Merge commit '8744c9a94782cd7804f015e6d29df253437af3cb' into HEAD

e826787

github-actions bot added the kokoro:force-run label Dec 5, 2023

wbmc closed this Dec 5, 2023

wbmc pushed a commit that referenced this pull request Jan 19, 2024

[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi p…

48a80dd

…art #3 PiperOrigin-RevId: 599039077
