nanowell

🎯 Focusing

optimizer.step() carefully

34 followers · 5 following

World

Achievements

Highlights

Developer Program Member

Pinned

Q-Sparse-LLM (Public)

My implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

Python · 31 stars · 2 forks

AdEMAMix-Optimizer-Pytorch (Public)

The AdEMAMix Optimizer: Better, Faster, Older.

Python · 173 stars · 10 forks

Differential-Transformer-PyTorch (Public)

PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model similar to large language models (LLMs). The architecture in…

Python · 45 stars · 5 forks

Brainstorm-science (Public)

Sample from uniform distribution towards automation of math.

C · 150 stars · 21 forks