Document similarity measures are the basis for several downstream applications in natural language processing (NLP) and information retrieval (IR).
- ERCNN: Enhanced Recurrent Convolutional Neural Networks for Learning Sentence Similarity
- BERT
- GPT
- Generative Pre-Training-2 (GPT-2)
- Universal Language Model Fine-tuning (ULMFiT)
- XLNet
Overcoming BERT's 512-token limit:
- Long-form document classification with BERT (`bert_document_classification`)
- BERT-AL: BERT for Arbitrarily Long Document Understanding
- Blockwise Self-Attention for Long Document Understanding
- BP-Transformer: Modelling Long-Range Context via Binary Partitioning (2019)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
- Longformer: The Long-Document Transformer
- Reformer
- Compressive Transformers for Long-Range Sequence Modelling
- Siamese Recurrent Architectures for Learning Sentence Similarity
- SMASH-RNN: Jiang, J. et al. 2019. Semantic Text Matching for Long-Form Documents. The World Wide Web Conference (WWW ’19), New York, NY, USA, 795–806.
- Liu, B. et al. 2018. Matching Article Pairs with Graphical Decomposition and Convolutions (Feb. 2018).
- Simple and Effective Text Matching with Richer Alignment Features
- Enhanced Text Matching Based on Semantic Transformation
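
For orientation, the simplest workaround for the 512-token limit is a sliding window: split the document into overlapping windows, embed each window separately, average the window embeddings, and compare documents by cosine similarity. The sketch below illustrates that baseline only; it is not the method of any paper listed above. The names `chunk_tokens`, `toy_embed`, `mean_pool`, and `document_similarity` are hypothetical, the whitespace split stands in for a subword tokenizer, and `toy_embed` is a hashed bag-of-words placeholder where a real encoder such as BERT would go.

```python
import math
import zlib
from typing import Callable, List, Sequence

Vector = List[float]


def chunk_tokens(tokens: Sequence[str], max_len: int = 512,
                 stride: int = 256) -> List[List[str]]:
    """Split tokens into overlapping windows of at most max_len tokens.

    With a real BERT tokenizer, max_len would need to leave room for the
    special tokens ([CLS], [SEP]); that is ignored here for simplicity.
    """
    if max_len <= 0 or not 0 < stride <= max_len:
        raise ValueError("need 0 < stride <= max_len")
    if not tokens:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window reached the end of the document
        start += stride
    return chunks


def toy_embed(chunk: Sequence[str], dim: int = 64) -> Vector:
    """Placeholder encoder: hashed bag-of-words counts (not a real model)."""
    vec = [0.0] * dim
    for tok in chunk:
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def document_similarity(doc_a: str, doc_b: str,
                        embed: Callable[[Sequence[str]], Vector] = toy_embed) -> float:
    """Chunk both documents, embed and mean-pool the chunks, then compare."""
    pooled = []
    for doc in (doc_a, doc_b):
        # Whitespace split is an assumption standing in for a subword tokenizer.
        chunks = chunk_tokens(doc.split())
        if not chunks:
            return 0.0  # an empty document has no embedding to compare
        pooled.append(mean_pool([embed(c) for c in chunks]))
    return cosine(pooled[0], pooled[1])
```

With the default settings, a 1000-token input is split into three windows of 512, 512, and 488 tokens. Mean pooling discards the order of the windows and any interaction between them, which is one of the limitations the long-document architectures listed above aim to address.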