Document Similarity

Document similarity measures are the basis for several downstream applications in natural language processing (NLP) and information retrieval (IR).

Sentence-level

  • ERCNN: Enhanced Recurrent Convolutional Neural Networks for Learning Sentence Similarity

BERT and other Transformer Language Models

  • BERT
  • GPT
  • Generative Pre-Training-2 (GPT-2)
  • Universal Language Model Fine-tuning (ULMFiT)
  • XLNet

Overcoming BERT's 512 token limit:

  • Long-form document classification with BERT (bert_document_classification)
  • BERT-AL: BERT for Arbitrarily Long Document Understanding
  • Blockwise Self-Attention for Long Document Understanding
  • BP-Transformer: Modelling Long-Range Context via Binary Partitioning (2019)
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
  • Longformer: The Long-Document Transformer
  • Reformer
  • Compressive Transformers for Long-Range Sequence Modelling
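
A fixed input limit is often worked around by splitting a long token sequence into overlapping windows, encoding each window separately, and combining the results. The sketch below illustrates only the splitting step and is not taken from any of the works listed above. The function name `sliding_windows` and the default `stride` of 256 are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def sliding_windows(
    tokens: Sequence[str], max_len: int = 512, stride: int = 256
) -> List[List[str]]:
    """Split `tokens` into overlapping windows of at most `max_len` tokens.

    Assumption: `tokens` is already tokenized. With a real BERT tokenizer,
    special tokens such as [CLS] and [SEP] also count against the limit,
    so `max_len` would need to be reduced accordingly.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if stride > max_len:
        raise ValueError("stride must not exceed max_len, or tokens would be skipped")
    if not tokens:
        return []

    windows: List[List[str]] = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        # Stop once the current window reaches the end of the document.
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows


if __name__ == "__main__":
    doc = [f"tok{i}" for i in range(1000)]
    for i, window in enumerate(sliding_windows(doc)):
        print(i, len(window), window[0], window[-1])
```

Each window can then be passed to the encoder on its own. How the per-window outputs are combined (averaging, a recurrent layer, attention over windows) is a separate design decision that this sketch does not cover.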

Siamese Networks

  • Siamese Recurrent Architectures for Learning Sentence Similarity
  • SMASH-RNN: Jiang, J. et al. 2019. Semantic Text Matching for Long-Form Documents. The World Wide Web Conference (WWW ’19), New York, New York, USA, 795–806.
  • Liu, B. et al. 2018. Matching Article Pairs with Graphical Decomposition and Convolutions. (Feb. 2018).

Text matching

  • Simple and Effective Text Matching with Richer Alignment Features
  • Enhanced Text Matching Based on Semantic Transformation