Document Similarity

Document similarity measures are the basis for several downstream applications in natural language processing (NLP) and information retrieval (IR).
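As a point of reference for the neural approaches listed below, here is a minimal sketch of one classic similarity measure: cosine similarity over raw term-frequency vectors. The function name `cosine_similarity` and the whitespace tokenization are assumptions made for illustration; none of the listed papers prescribes this exact baseline.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two documents."""
    # Assumption: lowercased whitespace tokenization is good enough for a sketch.
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())

    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))

    # An empty document has no direction, so define its similarity as 0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Documents that share no terms score 0, and documents with identical term counts score 1 (up to floating-point rounding). The measure ignores word order and meaning, which is the gap the models below aim to close.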

Sentence-level

ERCNN: Enhanced Recurrent Convolutional Neural Networks for Learning Sentence Similarity

BERT and other Transformer Language Models

BERT
GPT
Generative Pre-Training-2 (GPT-2)
Universal Language Model Fine-tuning (ULMFiT)
XLNet

Overcoming BERT's 512 token limit:

Long-form document classification with BERT (repository: bert_document_classification)
BERT-AL: BERT for Arbitrarily Long Document Understanding
Blockwise Self-Attention for Long Document Understanding
BP-Transformer: Modelling Long-Range Context via Binary Partitioning (2019)
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Longformer: The Long-Document Transformer
Reformer
Compressive Transformers for Long-Range Sequence Modelling
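The works above change the model or its attention pattern. A much simpler workaround, sketched here only as a baseline, is to split a long token sequence into overlapping windows that each fit the limit, encode each window separately, and combine the results. The helper `sliding_windows` and its parameters `max_len` and `stride` are hypothetical names, and this is not the method of any paper in the list.

```python
from typing import List, Sequence


def sliding_windows(
    tokens: Sequence[str], max_len: int = 510, stride: int = 255
) -> List[List[str]]:
    """Split `tokens` into overlapping windows of at most `max_len` tokens.

    Assumption: 510 leaves room for the two special tokens ([CLS] and [SEP])
    that BERT adds to a single sequence within its 512-token limit.
    """
    if max_len <= 0 or not 0 < stride <= max_len:
        raise ValueError("require max_len > 0 and 0 < stride <= max_len")
    if not tokens:
        return []

    windows: List[List[str]] = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        # Stop once the current window reaches the end of the document.
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows
```

Each window can then be encoded on its own, and the per-window vectors pooled (for example by averaging) into one document representation. The overlap controlled by `stride` reduces the chance that a relevant passage is cut at a window boundary, at the cost of extra computation.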

Siamese Networks

Siamese Recurrent Architectures for Learning Sentence Similarity
SMASH-RNN: Jiang, J. et al. 2019. Semantic Text Matching for Long-Form Documents. The World Wide Web Conference (WWW ’19) (New York, New York, USA, 2019), 795–806.
Liu, B. et al. 2018. Matching Article Pairs with Graphical Decomposition and Convolutions (Feb. 2018).
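The defining idea of a Siamese network is that both inputs pass through the same encoder with shared weights, and the similarity is computed from the two resulting vectors. The sketch below illustrates only that structure. `toy_encoder` is a hypothetical, untrained stand-in (it averages hash-derived token vectors), and the similarity function, the exponentiated negative Manhattan distance, is chosen on the assumption that it matches the one used in the Siamese recurrent architecture paper listed above.

```python
import hashlib
import math
from typing import Callable, List

Vector = List[float]


def toy_encoder(sentence: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for a trained encoder (e.g. an LSTM).

    Averages deterministic, hash-derived vectors of the tokens.
    """
    tokens = sentence.lower().split()
    vec = [0.0] * dim
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        for i in range(dim):
            # Map each digest byte to [0, 1]; wrap around if dim > 16.
            vec[i] += digest[i % len(digest)] / 255.0
    count = max(len(tokens), 1)
    return [value / count for value in vec]


def siamese_similarity(
    sentence_a: str,
    sentence_b: str,
    encoder: Callable[[str], Vector] = toy_encoder,
) -> float:
    """Encode both inputs with the SAME encoder, then compare the vectors."""
    h_a = encoder(sentence_a)
    h_b = encoder(sentence_b)  # shared weights: identical function for both sides
    manhattan = sum(abs(x - y) for x, y in zip(h_a, h_b))
    # exp(-L1 distance) lies in (0, 1] and equals 1 for identical encodings.
    return math.exp(-manhattan)
```

In a real system the encoder would be trained so that semantically similar pairs end up close together. With the toy encoder the scores only reflect token overlap, so they demonstrate the wiring rather than any semantic behaviour.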

Text matching

Simple and Effective Text Matching with Richer Alignment Features
Enhanced Text Matching Based on Semantic Transformation
