From 8c875aa48ae4eab91dedaa01ee95311353a0bab8 Mon Sep 17 00:00:00 2001
From: JackCaoG <59073027+JackCaoG@users.noreply.github.com>
Date: Thu, 30 Nov 2023 16:48:10 -0800
Subject: [PATCH] Update troubleshooting doc with few common env var
 combination (#5929)

---
 TROUBLESHOOTING.md | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/TROUBLESHOOTING.md b/TROUBLESHOOTING.md
index ca3ddf702c2a..e2425cbc8280 100644
--- a/TROUBLESHOOTING.md
+++ b/TROUBLESHOOTING.md
@@ -259,6 +259,10 @@ only be enabled for debugging.
 * ```XLA_SYNC_WAIT```: Forces the XLA tensor sync operation to wait for its completion, before moving to the
   next step.
 
+* ```XLA_USE_EAGER_DEBUG_MODE```: Forces XLA tensors to execute eagerly, meaning the torch operations are compiled
+  and executed one by one. This is useful for bypassing long compilation times, but the overall step time will be
+  much slower and memory usage will be higher since all compiler optimizations are skipped.
+
 * ```XLA_USE_BF16```: If set to 1, transforms all the _PyTorch_ _Float_ values into _BiFloat16_ when sending to
   the _TPU_ device. Note that when using `XLA_USE_BF16=1` tensor arithmetic will be done in reduced precision and
   so tensors will not be accurate if accumulated over time.
@@ -278,28 +282,12 @@ only be enabled for debugging.
 * ```XLA_USE_F16```: If set to 1, transforms all the _PyTorch_ _Float_ values into _Float16_ (_PyTorch_ _Half_
   type) when sending to devices which supports them.
 
-* ```XLA_USE_32BIT_LONG```: If set to 1, maps _PyTorch_ _Long_ types to _XLA_ 32bit type.
-  On the versions of the TPU HW at the time of writing, 64bit integer computations are
-  expensive, so setting this flag might help. It should be verified by the user that truncating
-  to 32bit values is a valid operation according to the use of _PyTorch_ _Long_ values in it.
-
 * ```TF_CPP_LOG_THREAD_ID```: If set to 1, the TF logs will show the thread ID helping with debugging
   multithreaded processes.
 
 * ```TF_CPP_VMODULE```: Environment variable used for TF VLOGs and takes the form of
   `TF_CPP_VMODULE=name=value,...`. Note that for VLOGs you must set
-  `TF_CPP_MIN_LOG_LEVEL=0`. For PyTorch/XLA using a configuration like
-  `TF_CPP_VMODULE=tensor=5` would enable logging such as:
-
-  ```
-  2019-10-03 17:23:56.419040: I 27891 torch_xla/csrc/tensor.cpp:1104]
-  Executing IR graph hash 4211381954965020633 on device TPU:3 done!
-  2019-10-03 17:23:56.419448: I 27890 torch_xla/csrc/tensor.cpp:1104]
-  Executing IR graph hash 15483856951158150605 on device TPU:5 done!
-  2019-10-03 17:23:56.419539: I 27896 torch_xla/csrc/tensor.cpp:1104]
-  Executing IR graph hash 4211381954965020633 on device TPU:4 done!
-  ...
-  ```
+  `TF_CPP_MIN_LOG_LEVEL=0`.
 
 * ```TF_CPP_MIN_LOG_LEVEL```: Level to print messages for. `TF_CPP_MIN_LOG_LEVEL=0` will turn on INFO logging,
   `TF_CPP_MIN_LOG_LEVEL=1` WARNING and so on. Our PyTorch/XLA `TF_VLOG` uses
@@ -308,3 +296,19 @@ only be enabled for debugging.
 * ```XLA_DUMP_HLO_GRAPH```: If set to `=1` in case of a compilation or execution error the offending HLO
   graph will be dumped as part of the runtime error raised by `xla_util.cc`.
 
+### Common Debugging Environment Variable Combinations
+
+* Record the graph execution in the IR format
+  ```
+  XLA_SAVE_TENSORS_FMT="text" XLA_SAVE_TENSORS_FILE="/tmp/save1.ir"
+  ```
+
+* Record the graph execution in the HLO format
+  ```
+  XLA_SAVE_TENSORS_FMT="hlo" XLA_SAVE_TENSORS_FILE="/tmp/save1.hlo"
+  ```
+
+* Show debugging VLOGs for the runtime and for graph compilation/execution
+  ```
+  TF_CPP_MIN_LOG_LEVEL=0 TF_CPP_VMODULE="xla_graph_executor=5,pjrt_computation_client=3"
+  ```
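
These settings are ordinary environment variables, so any of the combinations added above can simply be prefixed to the command that launches the workload. A minimal shell sketch, assuming a hypothetical `train.py` entry point:

```
# "train.py" is a placeholder for your own PyTorch/XLA entry point.
# Dump the executed graphs in IR text format for this run:
XLA_SAVE_TENSORS_FMT="text" XLA_SAVE_TENSORS_FILE="/tmp/save1.ir" python train.py

# Enable runtime and compilation/execution VLOGs for the same script:
TF_CPP_MIN_LOG_LEVEL=0 TF_CPP_VMODULE="xla_graph_executor=5,pjrt_computation_client=3" python train.py
```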
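
The `XLA_USE_EAGER_DEBUG_MODE` flag documented in the first hunk can be applied the same way when long compilation times are the suspected problem; again a sketch using the same hypothetical script:

```
# Compile and execute the torch operations one by one instead of as a traced graph.
# Expect slower steps and higher memory usage, since compiler optimizations are skipped.
XLA_USE_EAGER_DEBUG_MODE=1 python train.py
```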