
pmap inside jit #5681

Closed
cgarciae opened this issue Feb 8, 2021 · 3 comments
Labels
question Questions for the JAX team

Comments

@cgarciae
Collaborator
cgarciae commented Feb 8, 2021

Here w is being captured by g, so it works in terms of shapes, but it's unclear if/how w is being distributed to each device.

import jax

@jax.jit
def f(w, x):
    @jax.pmap
    def g(x):
        return w * x

    return g(x)

What is actually happening here?

Edit
Assume:

x.shape == (device, batch, d)
w.shape == (batch, d)
@jakevdp
Collaborator
jakevdp commented Feb 9, 2021

You can see how this is passed to XLA using jax.make_jaxpr:

import jax.numpy as jnp

x = jnp.ones((1, 2, 3))
w = jnp.ones((2, 3))
jax.make_jaxpr(f)(w, x)
{ lambda  ; a b.
  let c = xla_call[ backend=None
                    call_jaxpr={ lambda  ; a b.
                                 let c = xla_pmap[ axis_name=<axis 0x7f4c49bbad08>
                                                   axis_size=1
                                                   backend=None
                                                   call_jaxpr={ lambda  ; a b.
                                                                let c = mul a b
                                                                in (c,) }
                                                   devices=None
                                                   donated_invars=(False, False)
                                                   global_arg_shapes=(None,)
                                                   global_axis_size=None
                                                   in_axes=(None, 0)
                                                   name=g
                                                   out_axes=(0,) ] a b
                                 in (c,) }
                    device=None
                    donated_invars=(False, False)
                    name=f ] a b
  in (c,) }

In particular, you can see that both a (which represents w) and b (which represents x) are passed as arguments to pmap, with in_axes=(None, 0). I believe this means that the values of w are replicated on each device in order to perform the computation, similar to what you would get if you had defined the function like this:

@jax.jit
def f(w, x):
  @jax.partial(jax.pmap, in_axes=(None, 0))
  def g(w, x):
    return w * x
  return g(w, x)

which generates a nearly identical jaxpr.
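As a quick sanity check, the two formulations can be compared directly. This is a sketch assuming any JAX backend (on CPU, `jax.device_count()` is typically 1); the names `f_closure` and `f_explicit` are made up for the comparison:

```python
# Sketch: the closed-over-w version and the explicit in_axes=(None, 0)
# version of f produce the same result.
from functools import partial

import jax
import jax.numpy as jnp

n = jax.device_count()

@jax.jit
def f_closure(w, x):
    @jax.pmap
    def g(x):
        return w * x  # w captured by closure -> in_axes=None for it
    return g(x)

@jax.jit
def f_explicit(w, x):
    @partial(jax.pmap, in_axes=(None, 0))
    def g(w, x):
        return w * x  # w passed explicitly, replicated across devices
    return g(w, x)

x = jnp.ones((n, 2, 3))
w = 2.0 * jnp.ones((2, 3))
out_closure = f_closure(w, x)
out_explicit = f_explicit(w, x)
```

Both calls trace to nearly the same jaxpr, so equal outputs are expected.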

@jakevdp jakevdp added the question Questions for the JAX team label Feb 9, 2021
@skye
Member
skye commented Feb 10, 2021

Also note that calling pmap inside of jit is usually not what you want! pmap already compiles your function the same way jit does, and furthermore, adding the extra jit can often cause performance issues (see #2926 for more info on why).
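Following that advice, the example can be written with pmap alone (a sketch, assuming a setup where `x`'s leading axis equals the device count; no outer jit is needed since pmap compiles the function itself):

```python
# Sketch: apply pmap directly; it already JIT-compiles g.
from functools import partial

import jax
import jax.numpy as jnp

n = jax.device_count()

@partial(jax.pmap, in_axes=(None, 0))
def g(w, x):
    return w * x  # w is replicated to each device; x is split along axis 0

x = jnp.ones((n, 2, 3))
w = jnp.ones((2, 3))
out = g(w, x)
```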

@cgarciae
Collaborator Author
cgarciae commented Feb 10, 2021

Thanks @jakevdp, your answer was really helpful in understanding a bit more about jax tracing in general!

@skye thanks for the tip! Luckily I got a warning message about this when actually running some test code on a TPU that pointed towards this :)
