This is the official code of the paper: Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era. In this repo, we implement several explanation methods for LLMs, including a gradient-based attribution method, an EK-FAC-approximated influence function, and an in-context demonstration strategy. Our implementations can be easily extended to various language model families, such as GPT-2, LLaMA, and Mistral. This codebase can serve as a foundational resource for advancing discussions on XAI in the era of LLMs.
- Setup: We assume that you manage the Python environment with Conda.
>>> conda create -n UsableXAI python=3.9 -y
>>> conda activate UsableXAI
>>> pip install -U -r requirements.txt
- Datasets: We include three public datasets for case studies: MultiRC, HalluEval-V2, and SciFact. They are located in the ./datasets/ folder.
- The implemented explanation methods are in the ./libs/core/ folder. Our implementation can be easily adapted to different language model families from the Huggingface Transformers library.
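As a rough illustration of how a gradient-based attribution method plugs into a Huggingface causal LM, here is a minimal sketch; it is not this repo's exact API, and the model name and the gradient-times-input aggregation are assumptions:

```python
# Minimal sketch (not the repo's exact interface): gradient-based attribution of a
# generated response to its prompt tokens, using a Huggingface causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: other causal LMs (e.g., LLaMA, Mistral) work similarly
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of France is"
response = " Paris."

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
response_ids = tokenizer(response, return_tensors="pt", add_special_tokens=False).input_ids
input_ids = torch.cat([prompt_ids, response_ids], dim=1)

# Gradients of the response log-likelihood w.r.t. the input embeddings;
# prompt positions are masked out of the loss with the ignore index -100.
embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
labels = input_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100
loss = model(inputs_embeds=embeds, labels=labels).loss
loss.backward()

# Gradient-times-input saliency per token (one common aggregation choice).
saliency = (embeds.grad * embeds).sum(dim=-1).abs().squeeze(0)
for token, score in zip(tokenizer.convert_ids_to_tokens(input_ids[0]), saliency.tolist()):
    print(f"{token:>12s}  {score:.4f}")
```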
Our case studies are listed in the ./Case_Studies/ folder.
- Hallucination Detection
We propose to use attribution scores between the responses and the prompts to develop a hallucination detector. Our case study shows that a smaller language model (i.e., Vicuna-7B) can be used to detect hallucinated responses generated by a larger model (i.e., ChatGPT). See details here.
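A minimal sketch of the detection step, assuming a hypothetical token_attributions helper (standing in for the repo's attribution utilities) that returns one score per prompt token:

```python
# Sketch: flag a response as potentially hallucinated when it places little
# attribution mass on the prompt. `token_attributions` is a hypothetical helper,
# e.g. built on the gradient-based attribution sketch above; the threshold is an
# assumption that would be tuned on a labeled validation split.
from typing import Callable, List

def prompt_reliance(prompt_attr: List[float]) -> float:
    """Average attribution mass the response places on the prompt tokens."""
    return sum(prompt_attr) / max(len(prompt_attr), 1)

def is_hallucinated(
    token_attributions: Callable[[str, str], List[float]],
    prompt: str,
    response: str,
    threshold: float = 0.1,
) -> bool:
    """Return True when the response relies too little on the prompt."""
    return prompt_reliance(token_attributions(prompt, response)) < threshold
```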
- LLM Response Verification
We propose to use attribution scores between the responses and the input contents to estimate whether a generated response is reliable. Our case study shows that the content highlighted by the attribution scores can be used to verify the quality of the corresponding response. See details here.
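As a sketch of how the highlighted content could be surfaced for verification (the helper name and the verifier prompt are hypothetical, not the repo's API):

```python
# Sketch: rank input tokens by attribution score and surface the top-k tokens,
# which can then be checked against the response (by a user or a second model).
from typing import List, Tuple

def top_attributed_tokens(
    tokens: List[str], scores: List[float], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k input tokens with the largest attribution scores."""
    return sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)[:k]

# Example verifier prompt built from the highlighted content:
# "Does the response follow from this evidence? Evidence: {top tokens} Response: {response}"
```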
- Training Sample Influence Estimation
We implement the influence function for LLMs (e.g., Vicuna-13B and Mistral-7B) based on the EK-FAC approximation suggested by Grosse et al. (2023). Our case study shows that EK-FAC is a practical strategy for estimating the contribution of each training sample to response generation. See details here.
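The full EK-FAC pipeline is involved; the sketch below only illustrates the scoring step, replacing the EK-FAC inverse-Hessian preconditioning with a plain gradient inner product, so it is a simplification rather than the method of Grosse et al.:

```python
# Conceptual sketch: rank training samples by the inner product between the query
# gradient and each training-sample gradient. The actual method additionally
# preconditions with an EK-FAC approximation of the layer-wise inverse Hessian,
# which is omitted here for brevity.
import torch

def flat_grad(model, loss):
    """Flatten the gradient of `loss` over all trainable parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence_ranking(model, loss_fn, query_batch, train_batches):
    """Score each training batch by its (unpreconditioned) influence on the query loss."""
    query_grad = flat_grad(model, loss_fn(model, query_batch))
    scores = []
    for batch in train_batches:
        train_grad = flat_grad(model, loss_fn(model, batch))
        # Larger score = the training gradient aligns with the query gradient,
        # i.e., this batch is more helpful for generating the query response.
        scores.append(torch.dot(query_grad, train_grad).item())
    return scores
```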
- Is CoT Really Making LLM Inferences Explainable?
We use the fidelity metric to measure the faithfulness of Chain-of-Thought (CoT) explanations to model predictions. Our case study shows that the explanation content in a CoT can generally be regarded as an explanation for the final prediction; in some cases, however, these explanations are not faithful to the final prediction. Details are coming soon.
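One simple way to probe this, sketched below with a hypothetical predict function, is to compare the model's final answer with and without the CoT rationale in the prompt; this is an illustrative check, not necessarily the exact fidelity metric used in the case study:

```python
# Sketch: a fidelity-style check of whether the final answer actually depends on
# the CoT rationale. `predict` is a hypothetical function that returns the model's
# final answer for a given prompt.
from typing import Callable

def cot_is_influential(
    predict: Callable[[str], str],
    question: str,
    rationale: str,
) -> bool:
    """Return True when removing the rationale changes the final answer."""
    answer_with_cot = predict(f"{question}\n{rationale}\nTherefore, the answer is:")
    answer_without_cot = predict(f"{question}\nThe answer is:")
    return answer_with_cot != answer_without_cot
```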
- Improve documentation and README
- Release the explanation methods as a package