Phi2 rewrite #1058

winglian · 2024-01-07T07:01:54Z

start over with latest phi2 modeling code from hf repo
Gradient checkpointing for ParallelBlock by @abacaj via https://fxtwitter.com/abacaj/status/1743749460872634506
update gradient checkpointing signature to match transformers
Properly enable flash attention, thanks @vikhyat via https://fxtwitter.com/vikhyatk/status/1743876177037861291
~~disable most of the upcasting to float32 in favor of bfloat16~~

casper-hansen · 2024-01-07T18:54:54Z

Looks good to me. I would remove the code you commented out. You can come back to the PR if you need to look up what changed.

Also, what are the possibilities of using sample packing with Phi2?

winglian · 2024-01-07T19:33:56Z

@casper-hansen will clean up the commented out code

As far as sample packing, it should be pretty straightforward. I started working on a fix for the previous implementation #877 but I may simply start over.

You had mentioned last year figuring out a way to manage sample packing across all the architectures in a more manageable way. I'm happy to take a stab at it if you have a proof of concept or anything.

casper-hansen · 2024-01-07T19:45:12Z

> You had mentioned last year figuring out a way to manage sample packing across all the architectures in a more manageable way. I'm happy to take a stab at it if you have a proof of concept or anything.

I had a branch going but didn't get to test and further implement it as I got busy with other stuff. The concept is to have one implementation that can be managed more easily in one module.

https://github.com/OpenAccess-AI-Collective/axolotl/tree/refactor-flash-attention
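For readers unfamiliar with the term, sample packing means placing several short training samples into one fixed-length row so that less of the batch is padding. The sketch below is only an illustration of that idea, not the code from this PR or from the linked branch: the function names `pack_samples`, `cumulative_seqlens`, and `position_ids` are hypothetical, and greedy first-fit is an assumed packing strategy chosen for brevity.

```python
from typing import List


def pack_samples(lengths: List[int], max_len: int) -> List[List[int]]:
    """Greedy first-fit: group sample indices into rows whose total length fits max_len."""
    rows: List[List[int]] = []
    free: List[int] = []  # free space left in each row
    for idx, length in enumerate(lengths):
        if length > max_len:
            raise ValueError(f"sample {idx} (length {length}) exceeds max_len {max_len}")
        for row, space in enumerate(free):
            if length <= space:
                rows[row].append(idx)
                free[row] -= length
                break
        else:
            # No existing row had room, so open a new one.
            rows.append([idx])
            free.append(max_len - length)
    return rows


def cumulative_seqlens(lengths: List[int], row: List[int]) -> List[int]:
    """Start/end offsets of each sample inside one packed row, beginning at 0."""
    bounds = [0]
    for idx in row:
        bounds.append(bounds[-1] + lengths[idx])
    return bounds


def position_ids(lengths: List[int], row: List[int]) -> List[int]:
    """Position ids that restart at 0 for every sample in the packed row."""
    ids: List[int] = []
    for idx in row:
        ids.extend(range(lengths[idx]))
    return ids


if __name__ == "__main__":
    lengths = [5, 3, 4, 2]
    rows = pack_samples(lengths, max_len=8)
    print(rows)
    print(cumulative_seqlens(lengths, rows[0]))
    print(position_ids(lengths, rows[1]))
```

The per-row boundaries are the kind of information an attention implementation needs in order to keep packed samples from attending to each other, which is why handling packing in one shared module, as discussed above, is attractive.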

NanoCode012 · 2024-01-08T16:37:13Z

src/axolotl/core/trainer_builder.py

@@ -843,7 +844,14 @@ def build_collator(self, training_args: AxolotlTrainingArguments, **kwargs):
        if self.cfg.model_config_type == "mamba":
            return MambaDataCollator(tokenizer=self.tokenizer)

-        return BatchSamplerDataCollatorForSeq2Seq(
+        if training_args.sample_packing:


I would recommend maybe consolidating the class selection?

data_collator = BatchSamplerDataCollatorForSeq2Seq if training_args.sample_packing else DataCollatorForSeq2Seq

my IDE doesn't like that 😭
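A minimal, self-contained sketch of the consolidation suggested above. The two collator classes here are stand-ins defined only so the example runs; they are not the real axolotl or transformers classes, and the `build_collator` signature is simplified. Annotating the selected class with the base type is one way that may keep static checkers quiet, though whether it resolves the IDE complaint is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Type


@dataclass
class DataCollatorForSeq2Seq:
    """Stand-in for the regular collator (hypothetical minimal version)."""

    tokenizer: Any
    padding: bool = True


@dataclass
class BatchSamplerDataCollatorForSeq2Seq(DataCollatorForSeq2Seq):
    """Stand-in for the collator used when sample packing is enabled."""


def build_collator(tokenizer: Any, sample_packing: bool, **kwargs: Any) -> DataCollatorForSeq2Seq:
    # Pick the class once, then share a single construction call.
    collator_cls: Type[DataCollatorForSeq2Seq] = (
        BatchSamplerDataCollatorForSeq2Seq if sample_packing else DataCollatorForSeq2Seq
    )
    return collator_cls(tokenizer, **kwargs)


if __name__ == "__main__":
    print(type(build_collator("tok", sample_packing=True)).__name__)
    print(type(build_collator("tok", sample_packing=False)).__name__)
```

Choosing the class first keeps the constructor arguments in one place, so the packing and non-packing paths cannot drift apart.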

winglian · 2024-01-08T17:24:50Z

alright, looks good on a single 4090 https://api.wandb.ai/links/oaaic/51qvcv4z

fakerybakery · 2024-01-09T02:15:14Z

Hi, do I need to change any configuration options or just use the default ones w/ Phi 2?

* restore to current phi modeling code from phi-2
* enable gradient checkpointing
* don't cast everything to float32 all the time
* gradient checkpointing for phi2 ParallelBlock module too
* fix enabling flash attn for phi2
* add comment about import
* fix phi2 example
* fix model type check for tokenizer
* revert float32 -> bf16 casting changes
* support fused dense flash attn
* fix the repo for flash-attn
* add package name for subdir pkg
* fix the data collator when not using sample packing
* install packaging for pytests in ci
* also fix setup to not install flash attn fused dense subdir if not extras
* split out the fused-dense-lib in extra requires
* don't train w group_by_length for phi
* update integration test to use phi2
* set max steps and save steps for phi e2e tests
* try to workaround ssave issue in ci
* skip phi2 e2e test for now

winglian added 7 commits January 7, 2024 02:02

restore to current phi modeling code from phi-2

f2fb132

enable gradient checkpointing

1d4e2ac

don't cast everything to float32 all the time

421c319

gradient checkpointing for phi2 ParallelBlock module too

ba3214b

fix enabling flash attn for phi2

6b14db8

add comment about import

158b369

fix phi2 example

1e4ff39

winglian force-pushed the phi2-rewrite branch from 40b7488 to 1e4ff39 Compare January 7, 2024 07:02

fix model type check for tokenizer

e24c7b2

winglian requested review from NanoCode012, tmm1, casper-hansen and hamelsmu January 7, 2024 07:52

winglian added 7 commits January 7, 2024 21:07

revert float32 -> bf16 casting changes

18f1731

support fused dense flash attn

066ced8

fix the repo for flash-attn

c30396f

add package name for subdir pkg

83c8060

fix the data collator when not using sample packing

b4c6158

install packaging for pytests in ci

95abeee

also fix setup to not install flash attn fused dense subdir if not extras

3ac199e

NanoCode012 reviewed Jan 8, 2024

winglian added 2 commits January 8, 2024 11:46

split out the fused-dense-lib in extra requires

a174ca3

don't train w group_by_length for phi

a4cd052

update integration test to use phi2

b8f6d1e

winglian force-pushed the phi2-rewrite branch from b4332e8 to b8f6d1e Compare January 8, 2024 17:43

set max steps and save steps for phi e2e tests

649b45e

winglian added 2 commits January 8, 2024 13:29

try to workaround ssave issue in ci

632413d

skip phi2 e2e test for now

c923282

winglian merged commit 732851f into main Jan 8, 2024
6 checks passed

winglian deleted the phi2-rewrite branch January 8, 2024 19:04
