Update OpenXLA-pin to Nov24 #3

Closed
wants to merge 635 commits into from

@wbmc wbmc commented Dec 5, 2023

  • Update from PR

pizzud and others added 30 commits November 13, 2023 18:07
…ary.

The library can be (and is) tested.

PiperOrigin-RevId: 582142162
…ions

CustomKernel and CustomFusion are already unique enough, no need to put them into a unique namespace.

PiperOrigin-RevId: 582174192
PiperOrigin-RevId: 582177793
PiperOrigin-RevId: 582275667
Serializing them is in line with all other module properties which affect
compilation (aliasing, layout, etc.), and not serializing creates an impure
compilation environment where IR does not and can not capture semantics of the
module.

PiperOrigin-RevId: 582290809
We do not need to check the backend config field.

PiperOrigin-RevId: 582292178
PiperOrigin-RevId: 582305685
PiperOrigin-RevId: 582343268
…isualization

The HTML codepath has bitrotted, is not tested, and isn't currently working.
Let's use the same approach as for fusion visualization, as it is working.

PiperOrigin-RevId: 582346207
Imported from GitHub PR openxla#6964

Here is a fix for the upcoming build breaks due to recent changes in the GpuDriver API.

I have also fixed the issue with headers in xla/service/gpu/ir_emitter_unnested.cc; without this fix, linker errors would occur on the ROCm platform when TF_HIPBLASLT=0.

@xla-rotation: would you have a look, please?
Copybara import of the project:

--
14c2e30 by Pavel Emeliyanenko <[email protected]>:

fixing buildbrakes

--
22b0962 by Pavel Emeliyanenko <[email protected]>:

fixing buildifier warnings

Merging this change closes openxla#6964

COPYBARA_INTEGRATE_REVIEW=openxla#6964 from ROCmSoftwarePlatform:ci_rocm_build_brakes_231113 22b0962
PiperOrigin-RevId: 582361553
- Make loop detection more accurate by recording the latest instance of an
  instruction with matching fingerprint.
- If the loop value allocation type isn't supported by the optimizer, still
  allow that tensor to get alternate memory allocation using the usual MSA
  algorithm.
- Export the minimum num loop iteration as a field in the proto.

PiperOrigin-RevId: 582363814
PiperOrigin-RevId: 582369654
Updates LLVM usage to match
[ed86e740effa](llvm/llvm-project@ed86e740effa)

PiperOrigin-RevId: 582371041
… shardings when enumerating sharding strategies for those ops.

This is as opposed to the previous approach of using sharding propagation to infer operand shardings given the dot/conv. The previous approach does not work when one is looking to shard the contraction dimension, and is therefore less clean than the new approach.

PiperOrigin-RevId: 582397850
Re-arrange structs/classes declarations in kernel.h to avoid forward declaring arguments types.

PiperOrigin-RevId: 582428745
Add a boolean field, no_parallel_gpu_op, to CollectiveBackendConfig. This field
asserts that an asynchronous collective operation does not execute in parallel
with other operations in GPU. The default value of the attribute is false,
which should lead to conservative runtime behavior.

Add BackendConfig test for the field.

Add gpu-schedule-postprocessing pass, to refine the attribute value. Add test
cases for the pass.

PiperOrigin-RevId: 582457930
…nc ops to be efficiently scheduled.

PiperOrigin-RevId: 582470977
…that are not necessarily in the same computation as the use.

PiperOrigin-RevId: 582495253
…ys and values to std::string_view.

I plan to add a caller that has a std::vector<char>, and this saves a copy in that case.

PiperOrigin-RevId: 582500430
Updates LLVM usage to match
[5d6304f01742](llvm/llvm-project@5d6304f01742)

PiperOrigin-RevId: 582539795
gflegar and others added 25 commits November 23, 2023 03:00
Imported from GitHub PR openxla#7201

ncclGetLastError returns the last log entry generated at the "WARN/ERROR" level.

Here is an example of the new error:
```
NCCL operation ncclCommInitRank(&comm, nranks, id, rank) failed: unhandled cuda error (run with NCCL_DEBUG=INFO for details). Last NCCL warning(error) log entry (may be unrelated) 'Cuda failure 2 'out of memory''.; current tracing scope: all-reduce-start.285; current profiling annotation: XlaModule:#hlo_module=pjit__wrapped_step_fn,program_id=25#.
```

The new part is:
```
Last NCCL warning(error) log entry (may be unrelated) 'Cuda failure 2 'out of memory''.
```
Copybara import of the project:

--
348df80 by Frederic Bastien <[email protected]>:

Add extra error information when NCCL error out.

Merging this change closes openxla#7201

COPYBARA_INTEGRATE_REVIEW=openxla#7201 from nouiz:nccl_warn_log_as_error_upstream 348df80
PiperOrigin-RevId: 584842170
…pu docker image

Imported from GitHub PR openxla#7237

The "devel" docker image is not updated and does not seem to be maintained anymore. We should probably recommend a more up-to-date image.
Copybara import of the project:

--
3412f30 by Mehdi Amini <[email protected]>:

Update build_from_source.md doc to point to latest-gpu docker image

The "devel" docker image is not updated and does not seem maintained anymore.
We probably should recommend a more up-to-date image.
--
4b7a36b by Mehdi Amini <[email protected]>:

Update build_from_source.md

Merging this change closes openxla#7237

COPYBARA_INTEGRATE_REVIEW=openxla#7237 from joker-eph:patch-2 4b7a36b
PiperOrigin-RevId: 584848729
PiperOrigin-RevId: 584852376
Obviously, an experienced XLA engineer would know that IsElementwise/IsOpElementwise/IsElementwiseImpl/IsElementwiseOnOperand are very different functions and one should be very careful when using them.

PiperOrigin-RevId: 584873063
This is needed once we want to enable it by default.

PiperOrigin-RevId: 584873521
PiperOrigin-RevId: 584878714
PiperOrigin-RevId: 584907007
…n --xla_gpu_autotuner_level=0 is set

Instead, pick the first tiling available.

This is consistent with autotuner_level=0 behavior in non-deviceless mode, and
allows for better QOL while developing without a (matching) GPU.
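
The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: `choose_tiling`, its parameters, and the `measure` callback are hypothetical names and are not part of XLA; the profiling branch is a placeholder for whatever the autotuner does at higher levels.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def choose_tiling(
    tilings: Sequence[T],
    autotune_level: int,
    measure: Optional[Callable[[T], float]] = None,
) -> T:
    """Pick a tiling (hypothetical helper, not an XLA API).

    At autotune_level == 0 no profiling is attempted and the first
    available tiling is returned, so no GPU is needed.
    """
    if not tilings:
        raise ValueError("no tilings available")
    if autotune_level == 0:
        # Level 0: skip measurement entirely and take the first candidate.
        return tilings[0]
    if measure is None:
        raise RuntimeError("profiling requested but no way to measure")
    # Placeholder for real autotuning: pick the fastest measured candidate.
    return min(tilings, key=measure)


first = choose_tiling(["tile_a", "tile_b", "tile_c"], autotune_level=0)
```

With level 0 the result depends only on candidate order, which is what makes the behavior the same with and without a device.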

PiperOrigin-RevId: 584908154
…itTritonFusion

xla/tests:dot_operation_test_autotune_disabled_gpu_a100 was flaky because of this.

PiperOrigin-RevId: 584935941
…rough decoding APIs

PiperOrigin-RevId: 584940021
This adds the necessary changes in XlaBuilder API, verifier, and shape inference following StableHLO rules for unbounded dynamism.

Implicit broadcasting support in XlaBuilder API will be addressed in a follow up CL.

PiperOrigin-RevId: 584967526
Updates LLVM usage to match
[af7a1453526a](llvm/llvm-project@af7a1453526a)

PiperOrigin-RevId: 585072558
Boundary functions seemed like a nice and easy abstraction for fusions,
but they turned out to be too difficult to use in practice. The main
problem is that everything is still based on HloInstructions, whose
users and operands are difficult to traverse in general.

The solution introduced here is to introduce an HloFusionAdaptor class
with a simple interface, and an HloInstructionAdaptor which always behaves
as if the HLO was completely unfused.

If I had more time, I would have made a smaller change.

PiperOrigin-RevId: 585087631
This is another step towards tile analysis being able to tile all HLOs.
Tiling dot requires some care to ensure that output dimensions are mapped to the
appropriate dimensions. The [StableHLO specification for dot_general](https://github.com/openxla/stablehlo/blob/main/docs/spec.md#dot_general)
describes how output dimensions are constructed from the input dimensions and the
operation's attributes.
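
As a rough illustration of the mapping that the StableHLO specification describes, the sketch below lists, for each output dimension of a dot_general, which operand dimension it comes from: batching dimensions first, then the remaining lhs dimensions, then the remaining rhs dimensions. The function name and the tuple layout are assumptions made for this example; this is not the tile analysis code.

```python
from typing import List, Sequence, Tuple


def dot_general_output_dims(
    lhs_shape: Sequence[int],
    rhs_shape: Sequence[int],
    lhs_batch: Sequence[int],
    rhs_batch: Sequence[int],
    lhs_contract: Sequence[int],
    rhs_contract: Sequence[int],
) -> List[Tuple[str, int, int]]:
    """Return (operand, operand_dim, size) for every output dimension."""
    # Batching and contracting dimensions must pair up with equal sizes.
    for l, r in zip(lhs_batch, rhs_batch):
        assert lhs_shape[l] == rhs_shape[r], "batch size mismatch"
    for l, r in zip(lhs_contract, rhs_contract):
        assert lhs_shape[l] == rhs_shape[r], "contracting size mismatch"

    # "Result" dimensions are those that are neither batching nor contracting.
    lhs_free = [d for d in range(len(lhs_shape))
                if d not in lhs_batch and d not in lhs_contract]
    rhs_free = [d for d in range(len(rhs_shape))
                if d not in rhs_batch and d not in rhs_contract]

    out = [("lhs", d, lhs_shape[d]) for d in lhs_batch]
    out += [("lhs", d, lhs_shape[d]) for d in lhs_free]
    out += [("rhs", d, rhs_shape[d]) for d in rhs_free]
    return out


# Batched matmul: [2, 3, 4] x [2, 4, 5] -> [2, 3, 5]
dims = dot_general_output_dims(
    lhs_shape=[2, 3, 4], rhs_shape=[2, 4, 5],
    lhs_batch=[0], rhs_batch=[0],
    lhs_contract=[2], rhs_contract=[1],
)
output_shape = [size for _, _, size in dims]
```

Here output dimension 2 comes from rhs dimension 2, not from any lhs dimension, which is the kind of bookkeeping a tiling of dot has to get right.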

PiperOrigin-RevId: 585095167
They don't use gml_st dialect.

PiperOrigin-RevId: 585099478
This op is unsupported by tile analysis. Adding a test so that we don't shoot
ourselves in the foot by using `isElementwise` method, for example.

PiperOrigin-RevId: 585123986
We can revert this when/if we need this.

PiperOrigin-RevId: 585124862
wbmc commented Dec 5, 2023

@wbmc wbmc closed this Dec 5, 2023
wbmc pushed a commit that referenced this pull request Jan 19, 2024
ApsarasX pushed a commit that referenced this pull request Apr 8, 2024
Currently we look for ptxas and nvlink in a few different places on the host machine, then we choose the first found binary without taking its version into account. If the chosen binary doesn't fulfill our version requirements we will later fail even if there was a suitable ptxas or nvlink in the search path in the first place.

This change makes it take the version of each binary into account when going through the search path. Unsuitable binaries will be discarded right away and the search continues until we are out of locations to check.

This should help with host environments that have multiple CUDA toolkits installed and should make ptxas and nvlink selection more robust.

The concrete changes:

1. `FindCudaExecutable` now also takes a minimum version and a list of forbidden (think buggy) versions that are supposed to be skipped.
2. `WarnIfBadPtxAsVersion` has been removed. It checked for ptxas < 11.1, which is far older than our minimum supported version of 11.8, and it had no effect given the check described in item 3.
3. There was another version check for `ptxas` in `NVPTXCompiler::ChooseLinkingMethod`, which checked for `version(ptxas)` < 11.8. This has also been removed and replaced by the version check described in item 4.
4. Version checking for `ptxas` and `nvlink` has been consolidated into two methods, `FindPtxAsExecutable` and `FindNvLinkExecutable`. These methods hard-code the current minimum version (and the list of excluded versions) of each tool in one place. It's still not great, but at least less spaghetti-like.
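
The search behavior described above can be sketched roughly as below. This is a minimal illustration, not the XLA implementation: `find_executable`, the `get_version` callback, the example paths, and the forbidden version `(12, 0, 0)` are all made up for the example. Only the minimum version of 11.8 comes from the text above.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

Version = Tuple[int, int, int]


def find_executable(
    candidates: Iterable[str],
    get_version: Callable[[str], Optional[Version]],
    minimum_version: Version,
    forbidden_versions: Sequence[Version] = (),
) -> Optional[str]:
    """Return the first candidate with an acceptable version, else None."""
    for path in candidates:
        version = get_version(path)
        if version is None:
            # Binary is missing or its version could not be determined.
            continue
        if version < minimum_version:
            # Too old: discard and keep searching instead of failing later.
            continue
        if version in forbidden_versions:
            # Known-bad version: skip it as well.
            continue
        return path
    return None


# Hypothetical host with several CUDA toolkits, listed in search order.
installed = {
    "/opt/cuda-11.1/bin/ptxas": (11, 1, 0),
    "/opt/cuda-12.0/bin/ptxas": (12, 0, 0),
    "/opt/cuda-12.3/bin/ptxas": (12, 3, 1),
}
chosen = find_executable(
    installed,                      # iterates over paths in insertion order
    installed.get,                  # stand-in for querying the binary
    minimum_version=(11, 8, 0),
    forbidden_versions=[(12, 0, 0)],
)
```

In this example the first binary is too old and the second is on the forbidden list, so the search continues to the third entry rather than settling on the first binary found.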

PiperOrigin-RevId: 618797392