This repository has been archived by the owner on Oct 31, 2022. It is now read-only.

How do I train GPT-2 XL on a TPU? Which TPU types can be used for training, and how much memory would be needed?
I'm not 100% sure, because I ditched my TPU efforts before I got training working (TPUs ended up being far too expensive, and during my dev work I was on a VM that was much too small, so training kept failing with OOM errors on the VM). But I think that if you put the following code before the tf.Session() is created in train.py, it will connect to a TPU:
import tensorflow as tf

tpu_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="<TPU NODE NAME HERE>")
tf.config.experimental_connect_to_cluster(tpu_resolver)
tf.tpu.experimental.initialize_tpu_system(tpu_resolver)
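For completeness, here is an untested sketch of how the whole thing might be wired together, assuming train.py builds a TF1-style session. The tf.compat.v1.Session call and the sess name are my assumptions, not code from this repo; resolver.master() returns the grpc:// address of the TPU worker, which is what the session needs as its target:

```python
import tensorflow as tf

# Untested sketch: "<TPU NODE NAME HERE>" is a placeholder for your
# Cloud TPU node name (or its grpc:// address).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="<TPU NODE NAME HERE>")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Point the TF1-style session at the TPU worker instead of the local
# target; master() returns an address like "grpc://10.0.0.2:8470".
sess = tf.compat.v1.Session(target=resolver.master())
```

Note that simply pointing a session at the TPU does not by itself compile the graph for the TPU cores, which is part of why I'm not certain this is sufficient on its own.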