It generates a caption describing the picture given to the model. I used the pretrained Xception model from Keras (a CNN) to extract image features, and an LSTM with GloVe embeddings to handle the text data. The image and text representations are combined with an add operation to predict the next word, and the caption is then decoded using either greedy search or beam search.
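To illustrate the difference between the two decoding strategies, here is a minimal sketch in plain Python. It does not use the trained model: `next_word_probs`, `TOY_TABLE`, and the `<start>`/`<end>` tokens are hypothetical stand-ins for the model's softmax output over the vocabulary, and the probabilities are made up for illustration. The beam search shown here ranks candidates by summed log-probability without length normalization, which is an assumption and may differ from the project's actual implementation.

```python
import math

# Hypothetical next-word distributions, keyed by the words generated so far.
# In the real project these would come from the captioning model's softmax.
TOY_TABLE = {
    ("<start>",): {"a": 0.6, "the": 0.4},
    ("<start>", "a"): {"dog": 0.55, "cat": 0.45},
    ("<start>", "the"): {"dog": 0.9, "cat": 0.1},
}


def next_word_probs(seq):
    """Return {word: probability} for the next word (toy stand-in)."""
    # Any sequence not in the table simply ends the caption.
    return TOY_TABLE.get(tuple(seq), {"<end>": 1.0})


def greedy_search(probs_fn, max_len=10):
    """Pick the single most probable word at every step."""
    seq = ["<start>"]
    for _ in range(max_len):
        probs = probs_fn(seq)
        word = max(probs, key=probs.get)
        if word == "<end>":
            break
        seq.append(word)
    return seq[1:]  # drop the start token


def beam_search(probs_fn, beam_width=3, max_len=10):
    """Keep the `beam_width` best partial captions at every step."""
    beams = [(0.0, ["<start>"])]  # (summed log-probability, words so far)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "<end>":
                # Finished captions are carried over unchanged.
                candidates.append((score, seq))
                continue
            for word, p in probs_fn(seq).items():
                if p > 0:
                    candidates.append((score + math.log(p), seq + [word]))
        # Keep only the highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == "<end>" for _, seq in beams):
            break
    best = beams[0][1]
    return [w for w in best if w not in ("<start>", "<end>")]


print(" ".join(greedy_search(next_word_probs)))               # a dog
print(" ".join(beam_search(next_word_probs, beam_width=2)))   # the dog
```

In this toy example greedy search commits to "a" because it is the most probable first word, and ends with an overall probability of 0.6 × 0.55 = 0.33. Beam search with a width of 2 also keeps "the" alive and finds "the dog", whose overall probability is higher (0.4 × 0.9 = 0.36). With a width of 1, beam search behaves like greedy search.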