An example utils.py
:
api_key = "<API key>"
temperature = <model temperature>
delay_time = <the seconds between each request>
model = "<the name of the model>"
In main.py
, specify the server parameters:
template
: a list of prompt templates.version
: a list of question versions.language
: a list of language versions.label
: a list of level option labels.order
: a list of level orders.questionnaire_name
: the selected questionnaire.name_exp
: name of the save file.
Start a Server
class, all pre-testing cases are created and stored in save/<name_exp>.json
test = Server(questionnaire_name, template, version, language, label, order, name_exp)
Load the saved file as a new save, a protection mechanism for test interruption
test = load("<save_path>", "<new_save_name>")
Run for all pre-testing cases
test.run()
from server import *
template = ['t1','t2','t3','t4','t5']
version = ['v1','v2','v3','v4','v5']
language = ['En', 'Zh', 'Ko', 'Es', 'Fr', 'De', 'It', 'Ar', 'Ru', 'Ja']
label = ['n', 'al', 'au', 'rl', 'ru']
order = ['r', 'f']
questionnaire_name = 'BFI'
name_exp = 'bfi-save'
bfi_test = Server(questionnaire_name, template, version, language, label, order, name_exp)
bfi_test.run()
In main.py
, execute:
rephrase("<questionnaire_name>", "<specified_language>")
For more details, please refer to this paper. Please remember to cite us if you find our work helpful in your work!
@inproceedings{huang2024reliability,
author = {Jen{-}tse Huang and
Wenxiang Jiao and
Man Ho Lam and
Eric John Li and
Wenxuan Wang and
Michael R. Lyu},
title = {On the Reliability of Psychological Scales on Large Language Models},
booktitle = {The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP Main)},
year = {2024}
}