i specialize in building big systems to crunch through big data and develop big models.
i previously developed one of the largest reinforcement learning systems at openai for openai five, along with the llm training infrastructure at meta ai / fair that produced opt-175b.
opt-175b was the first release in the industry to include:
- a 175b parameter model for research use
- a 114-page logbook detailing the challenges encountered during the 56 days it took to train a 175b llm on new hardware for the first time
- the entire training codebase
- a full suite of smaller-scale models ranging from 125m to 66b parameters for studying scaling laws.
before getting into ai systems, i worked on scaling out data infrastructure and processing pipelines across various cloud providers.
you can refer to my linkedin for more details on my work experience.
in recent years, i have mainly presented talks on openai five and on opt-175b:
- October 21, 2022: Scale Transform X Conference - Top Tips from Netflix, NVIDIA, and Meta on Large Language Models
- December 2, 2022: NeurIPS 2022 - Has It Trained Yet? Workshop
- March 1, 2023: Stanford MLSys Seminar Series
- April 1, 2023: CMU LLM Seminar
selected publications:
- Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
- LIMA: Less Is More for Alignment
- A Theory on Adam Instability in Large-Scale Machine Learning
- Effective Theory of Transformers at Initialization
- Scaling Laws for Generative Mixed-Modal Language Models
- OPT: Open Pre-trained Transformer Language Models
- Neural Network Surgery with Sets
- Long-Term Planning and Situational Awareness in OpenAI Five
- Dota 2 with Large Scale Deep Reinforcement Learning