variables in dataset.py #3

Open
qfzhu opened this issue Nov 28, 2019 · 4 comments

qfzhu commented Nov 28, 2019

Thanks for sharing the code.
May I ask what the variables "base_conv", "bias_conv", "base_nonc", and "bias_nonc" stand for?

Owner

golsun commented Nov 28, 2019

Hi @qfzhu,
conv = "conversation"
nonc = "non-conversation"
base = background, non-stylized data
bias = corpus of the target style
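
Read as a two-by-two grid, each of the four names combines a data source (base or bias) with a data form (conv or nonc). A minimal Python sketch of that reading; the dictionaries and the `describe` helper are illustrative only and are not taken from dataset.py:

```python
from itertools import product

# Glosses copied from the reply above; the dict layout itself is an assumption.
SOURCE = {
    "base": "background, non-stylized data",
    "bias": "corpus of the target style",
}
FORM = {
    "conv": "conversation",
    "nonc": "non-conversation",
}


def describe(name):
    """Spell out a variable name such as 'bias_nonc' using the glosses above."""
    source, form = name.split("_")
    return f"{name}: {FORM[form]}; {SOURCE[source]}"


# All four combinations of source and form, in a fixed order.
names = [f"{s}_{f}" for s, f in product(SOURCE, FORM)]

for name in names:
    print(describe(name))
```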

Author

qfzhu commented Nov 29, 2019

Thanks for the reply.
I think there are only two types of data: parallel non-stylized conversational data (base_conv) and monolingual stylized data (bias_nonc).
What do "base_nonc" and "bias_conv" refer to,
how can I build them from the two types of data,
and are they necessary for training the model?

Owner

golsun commented Nov 29, 2019

> I think there are only two types of data: parallel non-stylized conversational data (base_conv) and monolingual stylized data (bias_nonc).

Yes, this is true for the implementation in the paper.

> What do "base_nonc" and "bias_conv" refer to,
> how can I build them from the two types of data,
> and are they necessary for training the model?

These two are optional (and we didn't use them in the paper).
bias_conv requires extracting conversations from the stylized corpus. This can be done for some corpora, e.g. Holmes novels (e.g. by regex), but is not available for others, e.g. arXiv.
For base_nonc, you can simply use all the utterance turns from base_conv.
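
For anyone who wants to build the two optional sets, here is a rough Python sketch of both steps. It is not the code from this repo. It assumes base_conv is held in memory as (context, response) pairs with context turns joined by a separator token (`TURN_SEP` is a hypothetical choice), and that the novel text marks speech with double quotes and starts a new paragraph for each speaker. Dropping duplicate turns is also my own choice, not something stated above.

```python
import re

# Hypothetical separator between turns inside a context string.
TURN_SEP = " EOS "

# Text between straight or curly double quotes.
QUOTE_RE = re.compile(r'["“]([^"“”]+)["”]')


def conv_to_nonc(conv_pairs, turn_sep=TURN_SEP):
    """Collect every utterance turn from (context, response) pairs.

    Duplicates are dropped and first-seen order is kept.
    """
    seen = set()
    utterances = []
    for context, response in conv_pairs:
        for turn in context.split(turn_sep) + [response]:
            turn = turn.strip()
            if turn and turn not in seen:
                seen.add(turn)
                utterances.append(turn)
    return utterances


def extract_conv_pairs(text):
    """Heuristically pair quoted speech from consecutive paragraphs.

    Each paragraph's quoted segments are joined into one utterance.
    A paragraph without quotes ends the current exchange.
    """
    pairs = []
    prev = None
    for paragraph in re.split(r"\n\s*\n", text):
        quoted = " ".join(q.strip() for q in QUOTE_RE.findall(paragraph))
        if not quoted:
            prev = None
            continue
        if prev is not None:
            pairs.append((prev, quoted))
        prev = quoted
    return pairs


if __name__ == "__main__":
    sample = (
        '"Where are we going?" asked Watson.\n\n'
        '"To the station," said Holmes, "and quickly."'
    )
    print(extract_conv_pairs(sample))
    print(conv_to_nonc([("hi EOS how are you", "fine")]))
```

The paragraph heuristic will miss dialogue that is not quoted and can mis-pair speakers when narration sits between turns, so the output is worth spot-checking before training on it.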

Owner

golsun commented Dec 4, 2019

Closing, as the questions are answered.

@golsun golsun closed this as completed Dec 4, 2019
@golsun golsun reopened this Feb 27, 2020