This repository has been archived by the owner on Oct 19, 2024. It is now read-only.

[RUNTIME] TPU Backend Support #433

Open · 1 of 2 tasks
merrymercy opened this issue May 7, 2022 · 11 comments
Labels: enhancement (New feature)

@merrymercy (Member) commented May 7, 2022

Background

Currently, Alpa only supports GPUs. With XLA as Alpa's backend, it is easy to support TPUs as well: we can call the auto-sharding pass to generate the sharding-annotated HLO and then pass it to the closed-source TPU compiler.
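
For context, this is roughly what shard_parallel usage looks like on GPUs today; the idea above is that the same sharding-annotated HLO could be handed to the TPU compiler instead. A minimal sketch, assuming a working Alpa GPU install; the model and update rule are illustrative, not from this issue:

```python
# Sketch of shard_parallel on GPU; the proposal is to feed the same
# sharding-annotated HLO to the (closed-source) TPU compiler instead.
import alpa
import jax
import jax.numpy as jnp

@alpa.parallelize(method=alpa.ShardParallel())  # runs the auto-sharding pass
def train_step(params, batch):
    def loss_fn(p):
        pred = jnp.tanh(batch["x"] @ p["w"])
        return jnp.mean((pred - batch["y"]) ** 2)
    grads = jax.grad(loss_fn)(params)
    # Plain SGD update; a real script would use an optimizer library.
    return jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)

params = {"w": jnp.ones((1024, 1024))}
batch = {"x": jnp.ones((64, 1024)), "y": jnp.ones((64, 1024))}
params = train_step(params, batch)  # executes sharded across local devices
```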

TODO

  • Support shard_parallel on TPU
  • Reproduce benchmark results on TPU
@OhadRubin commented:

What is the status of this issue?
It would be nice to run larger models on multiple v3-8 TPUs.

@merrymercy (Member, Author) commented:

@OhadRubin It should be supported very soon. See our tpu-support branch and the official TensorFlow code.

@merrymercy (Member, Author) commented:

#764

@zhisbug (Member) commented Nov 15, 2022

Is TPU+pipeshard still in our scope? @merrymercy @ZYHowell

@merrymercy (Member, Author) commented Nov 17, 2022

No. I updated the second todo item to "reproduce benchmark results on TPU".

@merrymercy added the "enhancement (New feature)" label on Dec 20, 2022
@jon-chuang commented:

> Reproduce benchmark results on TPU

I guess the free TPUs are not good enough to run a proper benchmark... 😅 I wonder if Google would sponsor many TPUs for research purposes.

@Lime-Cakes commented:

Pipeshard isn't going to be supported on TPU?

@ZYHowell (Collaborator) commented May 6, 2023

We have no plan to, mainly because 1) the TPU backend of XLA is closed-source, and 2) unlike NCCL for GPUs, TPUs have no exposed communication library.

@Lime-Cakes commented May 6, 2023

Thanks!

Is there going to be a non-CUDA jaxlib (the Alpa fork)? And is there a guide for installing Alpa on TPU? It seems all the guides are for CUDA machines.

The CUDA wheels seemed to work fine.

Edit: Actually, they had conflicts with libtpu. I might have done something wrong; errors pop up while trying to run on TPU.

Do we have to build our own jaxlib with the enable_tpu flag on?

@ZYHowell (Collaborator) commented May 9, 2023

Yes, I think you need to compile it yourself (we can't do so, mainly because compiling requires a machine with the TPU backend and TPU libraries). But since we only support shard parallel on the TPU side, and that part is already in the upstream official jax/jaxlib, I'd suggest you try pjit's auto-sharding mode.
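
For anyone landing here, this is a minimal sketch of what "pjit auto" refers to: letting XLA's auto-sharding pass choose the partitioning instead of hand-written PartitionSpecs. The AUTO spelling, import paths, and keyword names have moved between jax releases, so treat this as illustrative of a ~0.4-era jax/jaxlib with TPU support, not a pinned recipe:

```python
# Sketch: XLA auto-sharding via pjit's AUTO mode on a TPU VM (e.g. a v3-8).
# Assumes a jax ~0.4-era API; exact AUTO spelling varies across versions.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh
from jax.experimental.pjit import pjit, AUTO

# A 2D logical mesh over the 8 local TPU cores of a v3-8.
mesh = Mesh(np.array(jax.devices()).reshape(2, 4), axis_names=("x", "y"))

def step(w, x):
    return jnp.tanh(x @ w)

w = jnp.ones((4096, 4096))
x = jnp.ones((512, 4096))

with mesh:
    # AUTO asks XLA to pick the shardings at compile time.
    f = pjit(step, in_axis_resources=AUTO, out_axis_resources=AUTO)
    compiled = f.lower(w, x).compile()   # the auto-sharding pass runs here
    print(compiled.input_shardings)      # inspect what XLA chose
    # Place the inputs according to compiled.input_shardings
    # (e.g. with jax.device_put) before calling compiled(w, x).
```

This mirrors what Alpa's shard_parallel does on GPU, but entirely through upstream jax.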

@Lime-Cakes commented:

It's upstreamed into jax? Wow, I didn't know that. Thank you so much!
