The bulk of the code is in the Jupyter notebook called Multimodal Learning. Submission.pdf contains most of the written explanation. The coolest part is definitely LiMBeRModel.py; the simplest is FlickrEval.py. For anyone interested, I have some self-contained environments for the daring. Good luck.