DeLLMa: Decision Making Under Uncertainty with Large Language Models

Set up the environment by first downloading this repository and then running:

pip install -e .

Reproduce the DeLLMa results by running:

bash ./results.sh

Baselines

Query GPT-4 for baseline methods. [AGENT] can be chosen from {farmer, trader}, and [BASELINE] can be chosen from {zero-shot, cot, self-consistency}.

python main.py --agent_name [AGENT] --dellma_mode [BASELINE] --results_path PATH/TO/RESULT

Evaluate performance of baseline methods.

python evaluate_dellma.py --agent_name [AGENT] --pref_enum_mode [BASELINE] --results_path PATH/TO/RESULT

DeLLMa Agents

Query GPT-4 for DeLLMa-Naive. Here, [SIZE] denotes the total sample size used for DeLLMa-Naive (i.e., distributed across all actions). We use 50 in our paper.

python main.py --agent_name [AGENT] --dellma_mode rank --sample_size [SIZE] --results_path PATH/TO/RESULT

Query GPT-4 for DeLLMa-{Pairs, Top1}. Here, [SIZE] denotes the per-action sample size. We use 64 for our best-performing agent and vary it from 4 to 64 in our ablation studies. [PCT] denotes the proportion of samples shared between minibatches. We use 0.25 for farmer and 0.5 for trader; our ablation studies cover the values 0, 0.25, 0.5, and 0.75.

python main.py --agent_name [AGENT] --dellma_mode rank-minibatch --sample_size [SIZE] --overlap_pct [PCT] --results_path PATH/TO/RESULT
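To make the role of [PCT] concrete, here is a minimal sketch of one way overlapping minibatches could be built, assuming that consecutive minibatches share a fixed fraction of their samples. The function name `overlapping_minibatches` and the `batch_size` parameter are hypothetical and are not taken from this repository; the actual batching logic in `main.py` may differ.

```python
def overlapping_minibatches(items, batch_size, overlap_pct):
    """Split `items` into consecutive minibatches that share samples.

    Hypothetical illustration: each minibatch reuses the last
    int(batch_size * overlap_pct) items of the previous one.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not 0 <= overlap_pct < 1:
        raise ValueError("overlap_pct must be in [0, 1)")

    overlap = int(batch_size * overlap_pct)  # items shared with the previous batch
    step = batch_size - overlap              # new items introduced per batch

    batches = []
    start = 0
    while start < len(items):
        batches.append(items[start:start + batch_size])
        if start + batch_size >= len(items):
            break  # the last batch already reaches the end of the list
        start += step
    return batches


if __name__ == "__main__":
    samples = list(range(10))
    for batch in overlapping_minibatches(samples, batch_size=4, overlap_pct=0.5):
        print(batch)
```

With an overlap of 0 the minibatches are disjoint; larger values make adjacent minibatches share more samples, which is what lets rankings from different minibatches be related to each other.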

Evaluate DeLLMa agents. [DeLLMa] can be chosen from {rank, rank-minibatch}. [ALPHA] controls the regularization strength when optimizing the Bradley-Terry model with the ILSR algorithm. [SOFTMAX] governs how the utility function is normalized after the ILSR procedure; the choices are full (apply softmax to the complete utility vector) and action (first group all state-action pairs by action, then apply softmax within each group). [TEMP] sets the temperature scaling for the softmax.

python evaluate_dellma.py    \
    --agent_name [AGENT]     \
    --pref_enum_mode [DeLLMa]  \
    --sample_size [SIZE]     \
    --overlap_pct [PCT]      \
    --alpha [ALPHA]          \
    --softmax_mode [SOFTMAX] \
    --temperature [TEMP]     \
    --results_path PATH/TO/RESULT
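The difference between the two softmax modes can be sketched as follows. This is an illustrative example only, assuming the utilities are a flat list with one entry per state-action pair and that temperature divides the utilities before exponentiation; the names `softmax` and `normalize_utilities` are hypothetical and do not come from `evaluate_dellma.py`.

```python
import math
from collections import defaultdict


def softmax(values, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def normalize_utilities(utilities, actions, mode="full", temperature=1.0):
    """Normalize utilities of state-action pairs (hypothetical helper).

    utilities[i] is the raw utility of pair i and actions[i] is its action.
    mode="full":   one softmax over the complete utility vector.
    mode="action": group pairs by action, then softmax within each group.
    """
    if mode == "full":
        return softmax(utilities, temperature)
    if mode == "action":
        groups = defaultdict(list)
        for idx, action in enumerate(actions):
            groups[action].append(idx)
        normalized = [0.0] * len(utilities)
        for indices in groups.values():
            probs = softmax([utilities[i] for i in indices], temperature)
            for i, p in zip(indices, probs):
                normalized[i] = p
        return normalized
    raise ValueError("mode must be 'full' or 'action'")


if __name__ == "__main__":
    raw = [1.0, 2.0, 0.5, 0.5]
    acts = ["apple", "apple", "pear", "pear"]
    print(normalize_utilities(raw, acts, mode="full"))
    print(normalize_utilities(raw, acts, mode="action"))
```

In the full mode all entries sum to 1 across every action, whereas in the action mode the entries sum to 1 separately within each action. A lower temperature sharpens the resulting distribution and a higher one flattens it.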
