Attention on different input and output lengths #14
Comments
Hello, Aayushee - I've practiced with this library a bit and ultimately made it work for my practice project (though it does conflict with later versions of Keras, as another issue about the TimeDistributed dense layer suggests). I have also experienced similarly imperfect translations at some point, when the model was not well tuned, but I was able to make it work eventually. Notice that the first and second symbols in your translations are different, so your model is technically able to generate different outputs. Perhaps the model has simply not learned the right translations yet? With long sequences, the parameter space of the model may be too complex (e.g. high curvature) to be learned quickly. I chose to stick with words rather than symbols for the output encoding, to shorten the sequence length and facilitate learning. Could you confirm what happens if you run the optimization further? Can you see the loss function improving substantially as you tune the model? I suggest using a relatively small learning rate and going through many iterations of gradient descent to see if you can notice any improvement.
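For reference, a minimal sketch of that suggestion, assuming a Keras model named model has already been built from the layers discussed in this thread; the optimizer, learning rate, and epoch count below are illustrative, not taken from the thread:

from keras.optimizers import Adam

# Compile with a relatively small learning rate so progress, if any, is gradual but visible.
model.compile(optimizer=Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])

# Train for many epochs and watch whether the training loss keeps decreasing.
history = model.fit(x_train, y_train, epochs=200, batch_size=16, validation_split=0.1, verbose=2)
print(history.history['loss'][-5:])  # last few training-loss values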
Hi Aayushee,
@chungfu27 If return_sequences is set to True together with RepeatVector, then you will get this error before the output even reaches the decoder: ValueError: Input 0 is incompatible with layer repeat_vector_1: expected ndim=2, found ndim=3
Yeah, @chungfu27, that's right. As ghost said, making return_sequences True means RepeatVector can't be used, which is what makes the different lengths incompatible here.
Yeah, @chungfu27. It doesn't make sense to make return_sequences False.
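As a side note, here is a minimal sketch (standard Keras layers only, nothing from this repository) of the shape issue discussed above: RepeatVector expects a 2-D tensor, which is exactly what an LSTM returns when return_sequences=False:

from keras.layers import Input, LSTM, RepeatVector

x = Input(shape=(600, 4))
h_last = LSTM(32, return_sequences=False)(x)  # shape (None, 32): compatible with RepeatVector
h_seq = LSTM(32, return_sequences=True)(x)    # shape (None, 600, 32): not compatible

r = RepeatVector(70)(h_last)                  # works; output shape (None, 70, 32)
# RepeatVector(70)(h_seq) raises:
# ValueError: Input 0 is incompatible with layer repeat_vector_1: expected ndim=2, found ndim=3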
Hello,
Thanks a lot for providing an easy-to-understand tutorial and attention layer implementation.
I am trying to use attention on a dataset where the input and output lengths differ.
Each training input sequence has size 600x4 (600 four-dimensional points), and each one-hot-encoded output has size 70x66 (a sequence of 70 symbols, each one-hot encoded over a 66-symbol vocabulary). I have to map the 600-point sequences to the 70-symbol sequences for ~15000 such pairs.
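For context, a hypothetical sketch of how a target tensor of that shape could be built with keras.utils.to_categorical; the symbol indices below are random placeholders, not the actual data:

import numpy as np
from keras.utils import to_categorical

# Hypothetical placeholder: 15000 output sequences, each 70 symbol indices drawn from a 66-symbol vocabulary.
symbol_indices = np.random.randint(0, 66, size=(15000, 70))
y = to_categorical(symbol_indices, num_classes=66)  # shape (15000, 70, 66)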
Right after the LSTM layer, I tried using a RepeatVector with the output length on a small dataset. I read that RepeatVector is used in encoder-decoder models where the output and input sequences are not of the same length. Here is what I tried:
from keras.layers import Input, LSTM, Bidirectional, RepeatVector
# AttentionDecoder is this repository's custom layer; import it from wherever it lives in your local setup.

# x_train.shape == (50, 600, 4)
# y_train.shape == (50, 70, 66)

inputs = Input(shape=x_train.shape[1:])
rnn_encoded = Bidirectional(LSTM(32, return_sequences=False), name='bidirectional_1', merge_mode='concat', trainable=True)(inputs)
encoded = RepeatVector(y_train.shape[1])(rnn_encoded)
y_hat = AttentionDecoder(70, name='attention_decoder_1', output_dim=y_train.shape[2], return_probabilities=False, trainable=True)(encoded)
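For completeness, a sketch (not part of the original post) of how these layers would typically be wrapped into a model and how the one-hot predictions are turned back into symbols; index_to_symbol is a hypothetical index-to-character lookup, and the loss and optimizer are assumptions:

from keras.models import Model

model = Model(inputs=inputs, outputs=y_hat)
model.compile(optimizer='adam', loss='categorical_crossentropy')

pred = model.predict(x_train[:1])                                # shape (1, 70, 66)
decoded = [index_to_symbol[i] for i in pred[0].argmax(axis=-1)]  # argmax over the 66-symbol dimension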
But the prediction from this model always gives the same symbol repeated across the output sequence, on every run:
('decoded model output:', ['d', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I'])
('decoded original output:', ['A', ' ', 'M', 'O', 'V', 'E', ' ', 't', 'o', ' ', 's', 't', 'o', 'p', ' ', 'M', 'r', ' ', '.', ' ', 'G', 'a', 'i', 't', 's', 'k', 'e', 'l', 'l', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0'])
Could you please give me an idea of where I am going wrong and what I can do to solve the problem?
Any help would be much appreciated.
Thanks
Aayushee