This repo pays special attention to long-tailed distributions, where labels follow a long-tailed or power-law distribution in the training dataset and/or the test dataset. Related papers are summarized, including their applications in computer vision (in particular image classification) and in extreme multi-label learning (XML), in particular text categorization.
- Long-tailed Distribution
Type | TST | IS | CBS | CLW | NC | ENS | DA |
---|---|---|---|---|---|---|---|
Meaning | Two-Stage Training | Instance Sampling | Class-Balanced Sampling | Class-Level Weighting | Normalized Classifier | Ensemble | Data Augmentation |
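
As a rough illustration of two of the abbreviations above, the sketch below contrasts Instance Sampling (IS) with Class-Balanced Sampling (CBS) by computing the probability of drawing each class under either scheme. The toy label counts and the function names are hypothetical and are not taken from any of the listed papers.

```python
from collections import Counter


def instance_sampling_probs(labels):
    """IS: every example is equally likely, so a class is drawn
    in proportion to how many examples it has."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def class_balanced_sampling_probs(labels):
    """CBS: every class is equally likely, regardless of its size."""
    classes = set(labels)
    return {cls: 1.0 / len(classes) for cls in classes}


# Toy long-tailed dataset (hypothetical counts): one head class, two rarer classes.
labels = ["head"] * 80 + ["mid"] * 15 + ["tail"] * 5

is_probs = instance_sampling_probs(labels)
cbs_probs = class_balanced_sampling_probs(labels)

for cls in ("head", "mid", "tail"):
    print(f"{cls}: IS={is_probs[cls]:.3f}  CBS={cbs_probs[cls]:.3f}")
```

Under IS the head class dominates each batch, while CBS gives tail classes the same chance of being drawn at the cost of repeating their few examples more often.
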
Year | Venue | Title | Remark |
---|---|---|---|
2019 | Machine Learning | Data Scarcity, Robustness and Extreme Multi-label Classification | |
2019 | WSDM | Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches | |
2017 | KDD | PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification | |
2017 | AISTATS | Label Filters for Large Scale Multilabel Classification | |
2016 | WSDM | DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification | |
2016 | ICML | PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification | |
Year | Venue | Title | Remark |
---|---|---|---|
2019 | AAAI | Distributional Semantics Meets Multi-Label Learning | bibtex |
2019 | arXiv | Ranking-Based Autoencoder for Extreme Multi-label Classification | |
2019 | NeurIPS | Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces | by Google Research |
2017 | KDD | AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification | |
2015 | NeurIPS | Sparse Local Embeddings for Extreme Multi-label Classification | |
2014 | ICML | Large-scale Multi-label Learning with Missing Labels | |
2014 | ICML | Multi-label Classification via Feature-aware Implicit Label Space Encoding | |
2013 | ICML | Efficient Multi-label Classification with Many Labels | |
2012 | NeurIPS | Feature-aware Label Space Dimension Reduction for Multi-label Classification | |
2011 | IJCAI | WSABIE: Scaling Up To Large Vocabulary Image Annotation | bibtex |
2009 | NeurIPS | Multi-Label Prediction via Compressed Sensing | |
2008 | KDD | Extracting Shared Subspaces for Multi-label Classification | |
Year | Venue | Title | Remark |
---|---|---|---|
2020 | KDD | Large-Scale Training System for 100-Million Classification at Alibaba | Applied Data Science Track |
2020 | arXiv | SOLAR: Sparse Orthogonal Learned and Random Embeddings | |
2020 | ICLR | Extreme Classification via Adversarial Softmax Approximation | |
2019 | AISTATS | Stochastic Negative Mining for Learning with Large Output Spaces | by Google |
2019 | NeurIPS | Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products | Rice University, bibtex |
2019 | arXiv | An Embarrassingly Simple Baseline for eXtreme Multi-label Prediction | |
2019 | arXiv | Accelerating Extreme Classification via Adaptive Feature Agglomeration | bibtex, authors from IIT |
2019 | SDM | Fast Training for Large-Scale One-versus-All Linear Classifiers using Tree-Structured Initialization | code, bibtex |
Year | Venue | Title | Remark |
---|---|---|---|
2020 | arXiv | Extreme Multi-label Classification from Aggregated Labels | by Inderjit Dhillon. This paper considers multi-instance learning in XML |
2020 | arXiv | Unbiased Loss Functions for Extreme Classification With Missing Labels | by Rohit Babbar. Missing labels |
2020 | ICML | Deep Streaming Label Learning | code, by Dacheng Tao, streaming multi-label learning |
2016 | arXiv | Streaming Label Learning for Modeling Labels on the Fly | by Dacheng Tao, streaming multi-label learning |
Year | Venue | Title | Remark |
---|---|---|---|
2019 | ICML | Sparse Extreme Multi-label Learning with Oracle Property | Code, by Weiwei Liu |
2019 | NeurIPS | Multilabel reductions: what is my loss optimising? | bibtex, by Google |
Year | Venue | Title | Remark |
---|---|---|---|
2020 | KDD | Correlation Networks for Extreme Multi-label Text Classification | code |
2020 | arXiv | GNN-XML: Graph Neural Networks for Extreme Multi-label Text Classification | |
2020 | ICML | Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification | code |
2019 | ACL | Large-Scale Multi-Label Text Classification on EU Legislation | Eur-Lex 4.3K, bibtex |
2019 | arXiv | X-BERT: eXtreme Multi-label Text Classification with BERT | code, by Yiming Yang, Inderjit Dhillon |
2019 | NeurIPS | AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks | |
2018 | EMNLP | Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces | few-shot, zero-shot, evaluation metric |
2018 | NeurIPS | A no-regret generalization of hierarchical softmax to extreme multi-label classification | code, PLT code |
2017 | SIGIR | Deep Learning for Extreme Multi-label Text Classification | by Yiming Yang at CMU, bibtex |
Year | Venue | Title | Remark |
---|---|---|---|
2019 | ICML | DL2: Training and Querying Neural Networks with Logic | |
2015 | KDD | Discovering and Exploiting Deterministic Label Relationships in Multi-Label Learning | |
2010 | KDD | Multi-Label Learning by Exploiting Label Dependency | |
Year | Venue | Title | Remark |
---|---|---|---|
2020 | ECCV | Imbalanced Continual Learning with Partitioning Reservoir Sampling | |
Year | Venue | Title | Remark |
---|---|---|---|
2021 | arXiv | Stratified Sampling for Extreme Multi-Label Data | |
Year | Venue | Title | Remark |
---|---|---|---|
2019 | Dagstuhl Seminar 18291 | Extreme Classification | |
- https://arxiv.org/pdf/1901.00248.pdf
- http://www.iith.ac.in/~saketha/research/AkshatMTP2018.pdf
- http://manikvarma.org/pubs/bengio19.pdf
- The Emerging Trends of Multi-Label Learning