Maksym Andriushchenko max-andr

PhD student @ EPFL🇨🇭. Interested in robustness and generalization in LLMs.

225 followers · 339 following

EPFL
Lausanne
https://andriushchenko.me/
@maksym_andr

Pinned

tml-epfl/llm-past-tense (Public)

Does Refusal Training in LLMs Generalize to the Past Tense? [NeurIPS 2024 Safe Generative AI Workshop (Oral)]

Python · 57 stars · 8 forks

tml-epfl/llm-adaptive-attacks (Public)

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024]

Shell · 222 stars · 23 forks

JailbreakBench/jailbreakbench (Public)

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]

Python · 241 stars · 24 forks

RobustBench/robustbench (Public)

RobustBench: a standardized adversarial robustness benchmark [NeurIPS 2021 Benchmarks and Datasets Track]

Python · 670 stars · 99 forks

square-attack (Public)

Square Attack: a query-efficient black-box adversarial attack via random search [ECCV 2020]

Python · 151 stars · 28 forks

relu_networks_overconfident (Public)

Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem [CVPR 2019, oral]

Python · 182 stars · 21 forks