Commit

data efficiency
mmaz committed Oct 30, 2023
1 parent b24d9f9 commit cf2a1eb
Showing 2 changed files with 34 additions and 2 deletions.
4 changes: 2 additions & 2 deletions benchmarking.qmd
@@ -776,13 +776,13 @@ There are several approaches that can be taken to improve data quality. These me

- Data Cleaning: This involves handling missing values, correcting errors, and removing outliers. Clean data ensures that the model is not learning from noise or inaccuracies.

- Data Interpretability and Explainability: Common techniques include [[LIME]{.underline}](https://arxiv.org/abs/1602.04938) which provides insight into the decision boundaries of classifiers, and [[Shapley values]{.underline}](https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf) which estimate the importance of individual samples in contributing to a model's predictions.
- Data Interpretability and Explainability: Common techniques include LIME [@ribeiro2016should] which provides insight into the decision boundaries of classifiers, and Shapley values [@lundberg2017unified] which estimate the importance of individual samples in contributing to a model's predictions.

- Feature Engineering: Transforming or creating new features can significantly improve model performance by providing more relevant information for learning.

- Data Augmentation: Augmenting data by creating new samples through various transformations can help improve model robustness and generalization.

- Active Learning: This is a semi-supervised learning approach where the model actively queries a human oracle to label the most informative samples [[[Coleman et al, 2020]{.underline}](https://arxiv.org/abs/2007.00077)]. This ensures that the model is trained on the most relevant data.
- Active Learning: This is a semi-supervised learning approach where the model actively queries a human oracle to label the most informative samples [@coleman2022similarity]. This ensures that the model is trained on the most relevant data.

- Dimensionality Reduction: Techniques like PCA can be used to reduce the number of features in a dataset, thereby reducing complexity and training time.

32 changes: 32 additions & 0 deletions references.bib
@@ -540,4 +540,36 @@ @article{xu2023demystifying
author={Xu, Hu and Xie, Saining and Tan, Xiaoqing Ellen and Huang, Po-Yao and Howes, Russell and Sharma, Vasu and Li, Shang-Wen and Ghosh, Gargi and Zettlemoyer, Luke and Feichtenhofer, Christoph},
journal={arXiv preprint arXiv:2309.16671},
year={2023}
}
@inproceedings{coleman2022similarity,
title={Similarity search for efficient active learning and search of rare concepts},
author={Coleman, Cody and Chou, Edward and Katz-Samuels, Julian and Culatana, Sean and Bailis, Peter and Berg, Alexander C and Nowak, Robert and Sumbaly, Roshan and Zaharia, Matei and Yalniz, I Zeki},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={36},
number={6},
pages={6402--6410},
year={2022}
}
@inproceedings{ribeiro2016should,
title={" Why should i trust you?" Explaining the predictions of any classifier},
author={Ribeiro, Marco Tulio and Singh, Sameer and Guestrin, Carlos},
booktitle={Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining},
pages={1135--1144},
year={2016}
}
@article{lundberg2017unified,
title={A unified approach to interpreting model predictions},
author={Lundberg, Scott M and Lee, Su-In},
journal={Advances in neural information processing systems},
volume={30},
year={2017}
}
@inproceedings{coleman2022similarity,
title={Similarity search for efficient active learning and search of rare concepts},
author={Coleman, Cody and Chou, Edward and Katz-Samuels, Julian and Culatana, Sean and Bailis, Peter and Berg, Alexander C and Nowak, Robert and Sumbaly, Roshan and Zaharia, Matei and Yalniz, I Zeki},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={36},
number={6},
pages={6402--6410},
year={2022}
}
