Add shard on batch mode. Also update version of torchxla2 #80
Conversation
@@ -12,10 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import torch_xla2
import jax
import jax.numpy as jnp
import torch
Thanks for the change! It's more meaningful than wrap and unwrap...
jetstream_pt/engine.py
Outdated
)
if self.env.shard_on_batch:
  return Prefix(
      self.replicated,  # cache is replicated because bs=1 for prefill
The performance of multiple isolated prefill instances is better than this replicated sharding (or we could use a smaller VM to test, instead of 4 chips or 8 chips).
Hi Fanhai, do you mean each TPU chip does a certain length of the prefill?
Do you mean having 8 InterleavedEngine instances with one device each, vs. sharding on batch from JAX?
The 8 InterleavedEngine instances vs. sharding on batch from JAX.
Of course, because there are no collective operations involved. IIUC, the best config might then be to enable disaggregated serving and use multiple prefill engine instances instead of sharding.
jetstream_pt/engine.py
Outdated
)
else:
  return DecodeState(
      self.replicated,  # shard on batch
remove the comments?
done.
scores = torch_xla2.extra.call_jax(
    jnp.einsum, "ikjl,ikml->ikjm", xq, keys
) / math.sqrt(head_dim)
self.env.apply_sharding(scores, axis=1)
Has the matrix transpose HLO issue been fixed?
yes
Can you share the PR for the fix?
And also update b/329899712?
Thank you for the fix! Not directly related, but I see you special-cased einsum to fix the issue. Please also fix b/329899713 for matmul. That said, the conversation in this PR can be closed now.
Thank you for improving torch_xla2, adding Gemma 2 support, and cleaning up the code! If we can have one PR for each task, that would be great; it won't block you from merging.
self.cache_sharding = self.env.cache_sharding

jax.config.update("jax_enable_x64", False)
Nit: we have config set up in both engine.py and the script; in the future we should place them together.
jetstream_pt/engine.py
Outdated
for (k, v), (ks, vs) in torch_xla2.tensor.wrap(
    list(zip(caches, cache_scales))
)
for (k, v), (ks, vs) in from_jax(list(zip(caches, cache_scales)))
Nit: since PyTorch should be the focus, is it better to use "to_torch" and "from_torch" instead of "from_jax" and "to_jax"? Ideally we should not see any JAX-related terms.
done
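For illustration, a minimal sketch of how the suggested names could map onto the existing wrap/unwrap helpers referenced above; whether the project exposes them as thin aliases like this, and whether an unwrap counterpart lives alongside torch_xla2.tensor.wrap, are assumptions:

import torch_xla2

# Hypothetical PyTorch-centric aliases over the existing conversion helpers.
def to_torch(values):
  # Wrap JAX arrays as torch_xla2-backed torch tensors.
  return torch_xla2.tensor.wrap(values)

def from_torch(values):
  # Unwrap torch_xla2 tensors back into plain JAX arrays.
  return torch_xla2.tensor.unwrap(values)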
jetstream_pt/engine.py
Outdated
with self._lock:
  with torch_xla2.tensor.XLADispatchMode():
  with torch_xla2.default_env():
For PyTorch users, DispatchMode is more descriptive than default env? We also need a comment here explaining why we need TorchDispatchMode.
done
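As a sketch of the requested comment; the wording below reflects the editor's understanding of what torch_xla2.default_env() does rather than text from this PR:

with self._lock:
  # default_env() activates the torch_xla2 environment, which intercepts
  # torch operations and runs them through their JAX implementations on the
  # XLA device, replacing the older XLADispatchMode context.
  with torch_xla2.default_env():
    ...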
jetstream_pt/engine.py
Outdated
self.replicated,
)
if self.env.shard_on_batch:
  return DecodeState(
Nit, can we do

return DecodeState(
    self.x_sharding if self.env.shard_on_batch else self.replicated,  # shard on batch
    self.cache_sharding,
    self.replicated,
    self.replicated,
    self.replicated,
    self.replicated,
    self.replicated,
)
done.
jetstream_pt/layers.py
Outdated
if self.env.shard_on_batch:
  self.env.apply_sharding(output, axis=0)
else:
  self.env.apply_sharding(output, axis=1)
Nit, can we use self.env.shard_on_batch to select the axis instead of duplicating the apply_sharding call?
done.
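A minimal sketch of the suggested collapse, assuming apply_sharding accepts the axis keyword exactly as in the diff above:

# Pick the sharded axis from the mode instead of duplicating the call.
self.env.apply_sharding(output, axis=0 if self.env.shard_on_batch else 1)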
jetstream_pt/layers.py
Outdated
self.env.apply_sharding(xq, axis=2)
self.env.apply_sharding(xk, axis=2)
self.env.apply_sharding(xv, axis=2)
if self.env.shard_on_batch:
Ditto
done
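The same pattern would apply to the projections in this snippet; assuming the batch dimension is axis 0 here, a sketch could look like:

# Choose the sharded axis once and reuse it for all three projections.
axis = 0 if self.env.shard_on_batch else 2
self.env.apply_sharding(xq, axis=axis)
self.env.apply_sharding(xk, axis=axis)
self.env.apply_sharding(xv, axis=axis)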
@@ -148,15 +148,15 @@ def forward(
xk = xk.view(batch_size, -1, self.num_kv_heads, self.head_dim)
xv = xv.view(batch_size, -1, self.num_kv_heads, self.head_dim)

if self.num_kv_heads > 1:
  if self.env.shard_on_batch:
Ditto
done
@@ -75,6 +75,9 @@
_SHARDING_CONFIG = flags.DEFINE_string(
    "sharding_config", "", "config file for sharding"
)
_SHARD_ON_BATCH = flags.DEFINE_bool(
Nit, can you add "when enabled, it overwrites the sharding_config" or something similar to the flag's help text?
done
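A sketch of the flag with the suggested help text; the flag name, default value, and exact wording are assumptions based on the surrounding diff, and flags is assumed to be absl.flags as in the existing DEFINE_string call:

from absl import flags

_SHARD_ON_BATCH = flags.DEFINE_bool(
    "shard_on_batch",
    False,
    "Shard tensors along the batch dimension instead of the head dimension. "
    "When enabled, it overwrites the sharding_config.",
)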
@@ -89,6 +89,9 @@
_SHARDING_CONFIG = flags.DEFINE_string(
    "sharding_config", "", "config file for sharding"
)
_SHARD_ON_BATCH = flags.DEFINE_bool(
Ditto
done