The data was downloaded from the Controllable Dialogue project. We also use the pre-trained models from the same GitHub repo. The hyperparameters of these models are listed in `model_configs.py`.

The `stats.csv` file summarizes the human evaluation performance (i.e., ground truth) of 10 metrics for each model.
The code for generating the data is similar to ParlAI/issues/2855.
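To make the role of `stats.csv` concrete, here is a minimal sketch of ranking models by a human-evaluation metric. The column names (`engagingness`, etc.) and values are assumptions for illustration only; substitute the actual columns of `stats.csv`.

```python
import csv
import io

# Hypothetical excerpt of stats.csv: the real file summarizes the human
# evaluation (ground-truth) performance of 10 metrics per model; these
# column names and values are made up for illustration.
csv_text = io.StringIO(
    "model,engagingness,interestingness,humanness\n"
    "responsiveness_model_bfw_setting_00,3.1,2.9,3.4\n"
    "responsiveness_model_bfw_setting_01,2.8,3.0,3.2\n"
)
rows = list(csv.DictReader(csv_text))

# Rank models by one human-evaluation metric -- this ranking is the
# "ground truth" that an off-policy evaluation method tries to recover.
ranking = sorted(rows, key=lambda r: float(r["engagingness"]), reverse=True)
best = ranking[0]["model"]
print(best)
```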
There are 4 different settings:

- `opedata`: all human-model evaluation logs
- `opedata_half`: subsampled by 50%
- `opedata_small`: subsampled by 10%
- `opedata_hard`: filters out very similar experience data for each target model. See below.
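The `opedata_hard` filtering idea (detailed further below) can be sketched as follows. The actual similarity measure used to build the release is not specified here, so this sketch assumes a simple token-overlap (Jaccard) proxy; the names `token_overlap` and `is_hard` are hypothetical.

```python
def token_overlap(a, b):
    # Jaccard overlap between token sets -- an ASSUMED similarity proxy;
    # the real measure behind opedata_hard may differ.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

def is_hard(behavior_responses, target_responses, threshold=0.8):
    # Keep a dialogue only if at least one (ai, ai') pair differs enough;
    # if ai and ai' are similar for every i, the dialogue is filtered out.
    return any(
        token_overlap(a, a_prime) < threshold
        for a, a_prime in zip(behavior_responses, target_responses)
    )

filtered = is_hard(["hello there", "i like music"],
                   ["hello there", "i like music"])   # identical pairs
kept = is_hard(["hello there", "i like music"],
               ["hi !", "i enjoy country songs"])     # divergent pairs
print(filtered, kept)
```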
There are 28 folders within each setting, one per target model, containing the pre-processed OPE data for that model.
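A minimal sketch of enumerating this layout, assuming each setting directory holds one folder per target model with a `data.json` inside (the mock model folder names here are hypothetical):

```python
import os
import tempfile

# Build a mock of the on-disk layout: setting/<model>/data.json.
root = tempfile.mkdtemp()
for setting in ["opedata", "opedata_half", "opedata_small", "opedata_hard"]:
    for model in ["model_00", "model_01"]:  # the real data has 28 folders
        path = os.path.join(root, setting, model)
        os.makedirs(path)
        open(os.path.join(path, "data.json"), "w").close()

# Enumerate every (setting, target model) pair with a data.json file.
pairs = sorted(
    (setting, model)
    for setting in sorted(os.listdir(root))
    for model in sorted(os.listdir(os.path.join(root, setting)))
    if os.path.exists(os.path.join(root, setting, model, "data.json"))
)
print(len(pairs))  # 4 settings x 2 mock models
```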
The JSON fields of `data.json`:

- `dialog`: a sequence of responses [e1, a1, a1', e2, a2, a2', ...], where ei is a response from the human, ai is a response from the behavior model, and ai' is a response from the target model. Each response contains a speaker tag and response text. Explanation: [e1, a1, e2, a2, ...] is the original experience data; ai' is generated by the target model conditioned on the previous experience ([e1, a1, e2, a2, ..., ei]), as illustrated below.
- `reward`: 10 human evaluation scores for the behavior dialogue (e1, a1, e2, a2, ...).
- `model_persona`, `human_persona`: the personas of the models.

For the `opedata_hard` setting, if ai and ai' are similar for all i's, the experience dialogue is filtered out.

The `data.json` file can be directly used by ENIGMA.
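A minimal sketch of walking one `data.json` record in (ei, ai, ai') triples, following the schema described above. The record below is a mock: the speaker tag values (`human`, `behavior_model`, `target_model`) and the texts are assumptions, not the release's actual tag names.

```python
import json

# Mock of one data.json record (speaker tags and texts are illustrative only).
record = json.loads("""{
  "dialog": [
    {"speaker": "human", "text": "hi , how are you ?"},
    {"speaker": "behavior_model", "text": "i am great , just got back from a show ."},
    {"speaker": "target_model", "text": "doing well , thanks for asking ."},
    {"speaker": "human", "text": "oh nice , what kind of show ?"},
    {"speaker": "behavior_model", "text": "a country music show ."},
    {"speaker": "target_model", "text": "it was a small concert ."}
  ],
  "reward": [3, 4, 3, 3, 4, 3, 4, 3, 3, 4]
}""")

# The dialog is [e1, a1, a1', e2, a2, a2', ...]: walk it in triples.
triples = [record["dialog"][i:i + 3] for i in range(0, len(record["dialog"]), 3)]
human = [t[0]["text"] for t in triples]      # e_i: human responses
behavior = [t[1]["text"] for t in triples]   # a_i: behavior-model responses
target = [t[2]["text"] for t in triples]     # a_i': target-model responses
print(len(triples), len(record["reward"]))
```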
Install from the Controllable Dialogue repo, then a pre-trained model can be run interactively:

```python
from parlai.core.agents import create_agent
from parlai.core.params import ParlaiParser

import model_configs

SILENCE = '__SILENCE__'
PERSONA_PREFIX = "your persona: "

# Set up the agent from its config in model_configs.py
model_name = 'responsiveness_model_bfw_setting_00'
config = getattr(model_configs, model_name)
parser = ParlaiParser(True, True)
opt = parser.parse_args()
opt.update(config)
opt['override'] = config
agent = create_agent(opt, True)
agent.reset()

# Observe model_persona: prefix each persona line with "your persona: "
model_persona = "i grew up in the industry.\nwe spent summers ring at national parks.\ni am a country music singer.\nmy favorite band is rascal flats."
model_persona = PERSONA_PREFIX + model_persona.replace("\n", "\n" + PERSONA_PREFIX)

# Run for 10 rounds
for i in range(10):
    text = input()
    if i == 0:
        # Prepend the persona to the first user turn
        text = model_persona + '\n' + text
    observed = agent.observe(
        {'id': 'SPEAKER_1', 'text': text, 'episode_done': False}
    )
    response = agent.act()
    print(response)
```
Abigail See, Stephen Roller, Douwe Kiela, Jason Weston. What makes a good conversation? How controllable attributes affect human judgments. NAACL 2019.