How to do inference with a Transformer? decoder.decode_seq() vs decoder()? #632
-
Hi guys, I trained a Transformer net recently, but I ran into a problem. I thought decoder.decode_seq() is used for training: we have to feed in target_seq, so the net is trained with teacher forcing. Am I wrong? There are two different pieces of code below: the first one uses the decoder.decode_seq() API, and the second one calls decoder() step by step to implement teacher forcing. I thought these two would give the same result, but they don't, and I can't tell why. Does anyone know why, or how I should do inference correctly? Thanks.
decoder.decode_seq()
decoder()
Replies: 13 comments
-
@chenjunweii you're right in that decode_seq is implementing teacher forcing. Note that in teacher forcing the decoder knows the ground truth of the previous step, whereas a regular decoder only takes its own prediction from the last step. This means that with teacher forcing the decoder is actually dealing with a simpler problem than the free decode in the second case. Small errors in free decoding can accumulate from the previous steps and make the prediction worse for longer sequences.
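To make the contrast concrete, here is a minimal, framework-agnostic sketch of the two loops; `decoder`, `state`, and the token handling are placeholders for illustration rather than the GluonNLP API:

```python
# Teacher forcing: every step is conditioned on the ground-truth prefix,
# so a mistake at one step cannot corrupt the inputs of later steps.
def teacher_forcing_decode(decoder, state, target_seq):
    outputs = []
    for gold_token in target_seq[:-1]:          # feed ground-truth tokens
        out, state = decoder(gold_token, state)
        outputs.append(out)
    return outputs

# Free-running (greedy) decoding: each step is conditioned on the model's
# own previous prediction, so early mistakes are fed back in and accumulate.
def free_running_decode(decoder, state, bos_token, max_len):
    outputs, token = [], bos_token
    for _ in range(max_len):
        out, state = decoder(token, state)
        token = out.argmax(axis=-1)             # feed back the prediction
        outputs.append(out)
    return outputs
```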
-
@szha Thanks for your reply. I know the difference between free running and teacher forcing, but what I mean is that I thought the second piece of code is also teacher forcing: the decoder takes each step of the target sequence as input, so I expected the two to give the same result, but they don't, and I can't figure out what is wrong with the second one. Thanks
-
🤦‍♂️ Sorry that I completely misunderstood you, as I missed that the …
-
@chenjunweii For inference, you can use the following code: gluon-nlp/scripts/machine_translation/train_transformer.py, lines 241 to 248 in 6ec0c84. It uses the translator to output the sampled sequences. Internally, it uses beam search and the decode_step function.
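As a rough sketch of what that snippet does (assuming the `BeamSearchTranslator` defined in `scripts/machine_translation/translation.py` around that commit; the hyperparameter values here are illustrative, so check the linked lines for the exact call):

```python
import gluonnlp as nlp
from translation import BeamSearchTranslator   # scripts/machine_translation/translation.py

# Wrap the trained model in a beam-search translator.
translator = BeamSearchTranslator(
    model=model,                                # trained NMT Transformer model
    beam_size=4,
    scorer=nlp.model.BeamSearchScorer(alpha=0.6, K=5),
    max_length=100)

# src_seq: (batch_size, src_length) token ids; src_valid_length: (batch_size,)
samples, scores, sample_valid_length = translator.translate(
    src_seq=src_seq, src_valid_length=src_valid_length)

# samples: (batch_size, beam_size, length); the first beam is the best-scoring one.
best_hypothesis = samples[:, 0, :]
```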
-
@szhengac What are the shapes of the inputs to translator.translate? I tried to run it as in your example, but I get a mismatch:
MXNetError: Shape inconsistent, Provided = [n_hid,3200], inferred shape=(n_hid,n_hid)
where 3200 = sequence length * n_hid. Details: example_input_batch is a tensor of shape (batch_size=100, length=25). I run either …
I added print-debugs to your code to show the intermediate tensor shapes; when I print before the offending line ( …
-
@lambdaofgod The input has the following shape: gluon-nlp/scripts/machine_translation/translation.py, lines 60 to 61 in e1910c5.
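In other words, an illustrative example of the expected shapes (the values here are made up; the real inputs would come from your data pipeline):

```python
import mxnet as mx

batch_size, src_length = 100, 25
# Token ids, shape (batch_size, src_length)
src_seq = mx.nd.random.randint(low=0, high=10000,
                               shape=(batch_size, src_length)).astype('float32')
# True length of each source sentence, shape (batch_size,)
src_valid_length = mx.nd.full((batch_size,), src_length).astype('float32')

samples, scores, sample_valid_length = translator.translate(
    src_seq=src_seq, src_valid_length=src_valid_length)
```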
-
@szhengac My prints seem to confirm that. If so, do you have an idea why it doesn't work? To me it looks like the second decoder_state should only have shape (batch_size,), but I've also checked your NMT script for that and its shapes are consistent with what I get here. Still, your NMT script runs, and I get that error, even though my input is consistent with the one from your example.
-
@lambdaofgod Can you comment out …
-
@szhengac Thanks for your reply, that's exactly what I need! By the way, what is ParallelTransformer? Could it speed up the training process?
-
@chenjunweii ParallelTransformer uses multi-threading for multi-GPU training. The naive implementation of multi-GPU training in Gluon does not fully achieve parallelization.
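A minimal sketch of that pattern, assuming the `Parallelizable`/`Parallel` helpers in `gluonnlp.utils` (the actual `ParallelTransformer` in the training script wraps the loss computation along these lines, but the model/loss signatures here are assumptions and may differ between versions):

```python
import mxnet as mx
import gluonnlp as nlp

class ParallelLoss(nlp.utils.Parallelizable):
    """Runs forward/backward for one device's shard in a worker thread."""
    def __init__(self, model, loss_fn):
        self._model = model
        self._loss_fn = loss_fn

    def forward_backward(self, shard):
        src, tgt, src_len, tgt_len = shard
        with mx.autograd.record():
            out, _ = self._model(src, tgt[:, :-1], src_len, tgt_len - 1)
            loss = self._loss_fn(out, tgt[:, 1:], tgt_len - 1).mean()
        loss.backward()
        return loss

# One worker thread per device; shards are dispatched with put() and the
# results collected with get(), so the GPUs compute concurrently.
parallel = nlp.utils.Parallel(len(ctx_list), ParallelLoss(model, loss_fn))
for shard in shards:                    # one shard per GPU
    parallel.put(shard)
losses = [parallel.get() for _ in shards]
```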
-
OK, this is getting unwieldy; I created this issue
-
Let me know if you need this issue reopened.