Awesome Long-Tail Learning

This repo focuses on long-tailed distributions, where labels follow a long-tailed or power-law distribution in the training and/or test dataset. Related papers are summarized, covering applications in computer vision (in particular image classification) and extreme multi-label learning (XML, in particular text categorization).

🔆 Updated 2021-04-23

Long-tailed Distribution
- Long-tailed Distribution in Computer Vision
- eXtreme Multi-label Learning

Long-tailed Distribution in Computer Vision

Type of Long-Tail Learning Methods

Type	`TST`	`IS`	`CBS`	`CLW`	`NC`	`ENS`	`DA`
Meaning	Two-Stage Training	Instance Sampling	Class-Balanced Sampling	Class-Level Weighting	Normalized Classifier	Ensemble	Data Augmentation
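
As a rough illustration of three of the method types above (`IS`, `CBS`, `CLW`), here is a minimal sketch in plain Python. The function names and the toy label list are hypothetical, and inverse-frequency weighting is only one common choice of class-level weighting, not the scheme of any particular paper listed below.

```python
from collections import Counter

def sampling_probabilities(labels, mode="instance"):
    """Return the probability of drawing each class under a sampling scheme.

    mode="instance":       every example is equally likely (IS), so a class is
                           drawn in proportion to its frequency.
    mode="class_balanced": every class is equally likely (CBS), regardless of
                           how many examples it has.
    """
    counts = Counter(labels)
    if mode == "instance":
        total = len(labels)
        return {cls: n / total for cls, n in counts.items()}
    if mode == "class_balanced":
        return {cls: 1.0 / len(counts) for cls in counts}
    raise ValueError(f"unknown mode: {mode}")

def inverse_frequency_weights(labels):
    """Class-level loss weights (CLW) proportional to 1 / class frequency.

    Assumption: weights are scaled so that a perfectly balanced dataset
    would give every class a weight of 1.
    """
    counts = Counter(labels)
    total, n_classes = len(labels), len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}

# Toy long-tailed label list (hypothetical): 90 head, 9 mid, 1 tail example.
toy_labels = ["head"] * 90 + ["mid"] * 9 + ["tail"] * 1
print(sampling_probabilities(toy_labels, "instance"))
print(sampling_probabilities(toy_labels, "class_balanced"))
print(inverse_frequency_weights(toy_labels))
```

Instance sampling reproduces the skew of the data, class-balanced sampling flattens it at the sampler, and class-level weighting leaves the sampler alone and rescales the loss instead.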

Year	Venue	Title	Remark
2021	arXiv	Balanced Knowledge Distillation for Long-tailed Learning	`CBS`+`IS`, Code
2021	arXiv	Class-Balanced Distillation for Long-Tailed Visual Recognition	`TST`+`NC`+`CBS`+`IS`, by Google Research
2021	arXiv	Distributional Robustness Loss for Long-tail Learning	`TST`+`CBS`
2021	CVPR	Improving Calibration for Long-Tailed Recognition	`DA`
2021	CVPR	Distribution Alignment: A Unified Framework for Long-tail Visual Recognition	`TST`
2021	CVPR	Adversarial Robustness under Long-Tailed Distribution
2021	CVPR	CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning	by Google
2021	ICLR	Long-tailed Recognition by Routing Diverse Distribution-Aware Experts	`ENS`+`NC`, Code, by Zi-Wei Liu
2021	ICLR	Long-Tail Learning via Logit Adjustment	by Google
2021	AAAI	Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks
2020	CVPR	Equalization Loss for Long-Tailed Object Recognition
2020	CVPR	Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective
2020	ICLR	Decoupling representation and classifier for long-tailed recognition	Code
2020	NeurIPS	Rethinking the Value of Labels for Improving Class-Imbalanced Learning	Code
2020	CVPR	BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition	Code
2020	ECCV	Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification
2019	NeurIPS	Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss	Code
2019	CVPR	Large-Scale Long-Tailed Recognition in an Open World	Code, bibtex, by CUHK
2018	-	iNaturalist: The iNaturalist 2018 Competition Dataset	long-tailed dataset
2017	arXiv	The Devil is in the Tails: Fine-grained Classification in the Wild
2017	NeurIPS	Learning to model the tail

eXtreme Multi-label Learning

Binary Relevance

Year	Venue	Title
2019	Machine Learning	Data Scarcity, Robustness and Extreme Multi-label Classification
2019	WSDM	Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches
2017	KDD	PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification
2017	AISTATS	Label Filters for Large Scale Multilabel Classification
2016	WSDM	DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification
2016	ICML	PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification

Tree-based Methods

Year	Venue	Title	Remark
2020	arXiv	Probabilistic Label Trees for Extreme Multi-label Classification	PLT survey, code
2020	arXiv	Online probabilistic label trees
2020	AISTATS	LdSM: Logarithm-depth Streaming Multi-label Decision Trees	Instance tree, C++ code
2019	NeurIPS	AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks	Label tree
2019	arXiv	Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification	Label tree
2018	ICML	CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning	Instance tree
2018	WWW	Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising	Label tree, by Manik Varma
2016	ICML	Extreme F-Measure Maximization using Sparse Probability Estimates	Label tree
2016	KDD	Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications	Instance tree
2014	KDD	A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning	Instance tree, python implementation
2013	ICML	Label Partitioning For Sublinear Ranking	Label tree
2013	WWW	Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages	Instance tree, Random Forest, Gini Index
2011	NeurIPS	Efficient label tree learning for large scale object recognition	Label tree, multi-class
2010	NeurIPS	Label embedding trees for large multi-class tasks	Label tree, multi-class
2008	ECML Workshop	Effective and Efficient Multilabel Classification in Domains with Large Number of Labels	Label tree

Embedding-based Methods

Year	Venue	Title	Remark
2019	AAAI	Distributional Semantics Meets Multi-Label Learning	bibtex
2019	arXiv	Ranking-Based Autoencoder for Extreme Multi-label Classification
2019	NeurIPS	Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces	by Google Research
2017	KDD	AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification
2015	NeurIPS	Sparse Local Embeddings for Extreme Multi-label Classification
2014	ICML	Large-scale Multi-label Learning with Missing Labels
2014	ICML	Multi-label Classification via Feature-aware Implicit Label Space Encoding
2013	ICML	Efficient Multi-label Classification with Many Labels
2012	NeurIPS	Feature-aware Label Space Dimension Reduction for Multi-label Classification
2011	IJCAI	WSABIE: Scaling Up To Large Vocabulary Image Annotation	bibtex
2009	NeurIPS	Multi-Label Prediction via Compressed Sensing
2008	KDD	Extracting Shared Subspaces for Multi-label Classification

Speed-up and Compression

Year	Venue	Title	Remark
2020	KDD	Large-Scale Training System for 100-Million Classification at Alibaba	Applied Data Science Track
2020	arXiv	SOLAR: Sparse Orthogonal Learned and Random Embeddings
2020	ICLR	Extreme Classification via Adversarial Softmax Approximation
2019	AISTATS	Stochastic Negative Mining for Learning with Large Output Spaces	by Google
2019	NeurIPS	Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products	Rice University, bibtex
2019	arXiv	An Embarrassingly Simple Baseline for eXtreme Multi-label Prediction
2019	arXiv	Accelerating Extreme Classification via Adaptive Feature Agglomeration	bibtex, authors from IIT
2019	SDM	Fast Training for Large-Scale One-versus-All Linear Classifiers using Tree-Structured Initialization	code, bibtex

Novel XML Settings

Year	Venue	Title	Remark
2020	arXiv	Extreme Multi-label Classification from Aggregated Labels	by Inderjit Dhillon. This paper considers multi-instance learning in XML
2020	arXiv	Unbiased Loss Functions for Extreme Classification With Missing Labels	by Rohit Babbar. Missing labels
2020	ICML	Deep Streaming Label Learning	code, by Dacheng Tao, streaming multi-label learning
2016	arXiv	Streaming Label Learning for Modeling Labels on the Fly	by Dacheng Tao, streaming multi-label learning

Theoretical Studies

Year	Venue	Title	Remark
2019	ICML	Sparse Extreme Multi-label Learning with Oracle Property	Code, by Weiwei Liu
2019	NeurIPS	Multilabel reductions: what is my loss optimising?	bibtex, by Google

Text Classification

Year	Venue	Title	Remark
2020	KDD	Correlation Networks for Extreme Multi-label Text Classification	code
2020	arXiv	GNN-XML: Graph Neural Networks for Extreme Multi-label Text Classification
2020	ICML	Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification	code
2019	ACL	Large-Scale Multi-Label Text Classification on EU Legislation	Eur-Lex 4.3K, bibtex
2019	arXiv	X-BERT: eXtreme Multi-label Text Classification with BERT	code, by Yiming Yang, Inderjit Dhillon
2019	NeurIPS	AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks
2018	EMNLP	Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces	few-shot, zero-shot, evaluation metric
2018	NeurIPS	A no-regret generalization of hierarchical softmax to extreme multi-label classification	code, PLT code
2017	SIGIR	Deep Learning for Extreme Multi-label Text Classification	by Yiming Yang at CMU, bibtex

Others

Label Correlation

Year	Venue	Title
2019	ICML	DL2: Training and Querying Neural Networks with Logic
2015	KDD	Discovering and Exploiting Deterministic Label Relationships in Multi-Label Learning
2010	KDD	Multi-Label Learning by Exploiting Label Dependency

Long-tailed Continual Learning

Year	Venue	Title	Remark
2020	ECCV	Imbalanced Continual Learning with Partitioning Reservoir Sampling

Train/Test Split

Year	Venue	Title	Remark
2021	arXiv	Stratified Sampling for Extreme Multi-Label Data

XML Seminar

Year	Venue	Title	Remark
2019	Dagstuhl Seminar 18291	Extreme Classification

Survey References:

XML Datasets link

Extreme Classification Workshops link
