Hi all,
I am trying to understand the implementation of the anti-LM model, in particular the meaning of this line:
line 128 of tf_chatbot_seq2seq_antilm/lib/seq2seq_model_utils.py: all_prob_t = model_step(dummy_encoder_inputs, cand['dec_inp'], dptr, target_weights, bucket_id)
where dummy_encoder_inputs = [np.array([data_utils.PAD_ID]) for _ in range(len(encoder_inputs))].
This is presumably the probability of the target (P(T)) from the paper https://arxiv.org/pdf/1510.03055.pdf, but how does feeding in an encoder input sequence of PAD give you the probability of T?
Anyone have any ideas?
Cheers,
Kuhan
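For context, here is a minimal sketch of what the quoted call appears to compute, under the assumption that model_step returns the vocabulary distribution at the current decode position; the approx_log_p_t helper, the source_len parameter, and the PAD_ID value are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

PAD_ID = 0  # assumed value of data_utils.PAD_ID


def approx_log_p_t(model_step, cand_dec_inp, target_weights, bucket_id, source_len):
    """Sketch: score a candidate response T using a PAD-only source.

    Feeding a source made entirely of PAD tokens gives the decoder no real
    context, so its per-step distributions act like an (approximate)
    unconditional language model over T. Summing the log-probabilities of
    the candidate's own tokens then approximates log P(T).
    """
    dummy_encoder_inputs = [np.array([PAD_ID]) for _ in range(source_len)]
    log_p_t = 0.0
    for dptr in range(1, len(cand_dec_inp)):
        # Assumed: model_step returns the vocab distribution at position dptr,
        # conditioned on the (PAD-only) source and the tokens before dptr.
        all_prob_t = model_step(dummy_encoder_inputs, cand_dec_inp,
                                dptr, target_weights, bucket_id)
        log_p_t += np.log(all_prob_t[cand_dec_inp[dptr]] + 1e-12)
    return log_p_t
```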
One way to calculate P(T) would be to train a separate language model on the dataset of target responses. Alternatively, using dummy encoder_inputs amounts to asking: given no input sentence, what is the probability of the target response T?
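If that reading is right, the rescoring step would look roughly like the sketch below; lambda_weight and the two log-probability inputs are placeholders, regardless of whether log P(T) comes from the PAD-source trick or from a separately trained language model.

```python
def antilm_score(log_p_t_given_s, log_p_t, lambda_weight=0.5):
    """MMI-antiLM objective from the paper: log p(T|S) - lambda * log p(T).

    log_p_t_given_s : candidate score from the seq2seq model run on the real source S.
    log_p_t         : candidate score from either (a) a separately trained LM over
                      target responses, or (b) the same seq2seq model fed a
                      PAD-only source, as in the quoted line 128.
    """
    return log_p_t_given_s - lambda_weight * log_p_t
```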
@kuhanw I think it makes sense. Using PAD as the encoder input means the decoder starts from the same initial state no matter what the real source S is, i.e. the same state is shared across different p(T|S). Naturally, the first several output words of the decoder are influenced more by p(T|S) than by U(T), which is consistent with the original idea of Jiwei Li's paper. The decoder is commonly viewed as a language model, and I think feeding an empty input is a simple way to implement MMI-antiLM without an external model. I hope the author can clarify.
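For what it's worth, the paper's U(T) only penalizes the first gamma tokens of the response (g(k) = 1 for k <= gamma, else 0), which matches the point about the first several output words. A sketch of that truncated penalty, with the per-token log-probability lists as assumed inputs:

```python
def mmi_antilm_score(logp_t_given_s, logp_t_tokens, lam=0.5, gamma=5):
    """Token-level MMI-antiLM score, following Li et al. (2016).

    logp_t_given_s : per-token log p(t_k | S, t_<k) from the seq2seq model.
    logp_t_tokens  : per-token log p(t_k | t_<k) from the anti-LM pass
                     (e.g. the PAD-source decode discussed above).
    Only the first `gamma` tokens are penalized, so later words are chosen
    purely by p(T|S).
    """
    score = sum(logp_t_given_s)
    penalty = sum(lp for k, lp in enumerate(logp_t_tokens) if k < gamma)
    return score - lam * penalty
```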