Modular phi #34361
base: main
Conversation
Awesome PR! 🥳 CC @Cyrilvallez
Force-pushed from 48303e6 to d942994
Rebased for failing CIs
Hey~
As a first review, let's try to lean more on inheritance. If you look at PhiAttention, a lot of it can be skimmed out because it already exists in Gemma, for example. That way you only add what's required and use del to remove what has to go (for example, dense is o_proj I think). PhiForCausalLM and the other heads should inherit from Gemma rather than Llama, as they can be 1-to-1 the same this way!
Congrats on the PR 🤗
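A minimal sketch of what this suggestion could look like in modular_phi.py, assuming GemmaAttention is close enough to PhiAttention that only the differences need spelling out; the class names come from the discussion above, while the layer sizes are simplified for illustration and are not the merged code:

```python
# Hedged sketch, not the final PR code: only the Phi-specific differences are
# spelled out, everything else is inherited from Gemma.
from torch import nn

from transformers.models.gemma.modeling_gemma import GemmaAttention, GemmaForCausalLM


class PhiAttention(GemmaAttention):
    def __init__(self, config, layer_idx=None):
        super().__init__(config, layer_idx)
        # Phi checkpoints call the output projection "dense", so expose it under
        # that name and drop the inherited o_proj (the forward would also need to
        # reference self.dense instead of self.o_proj).
        self.dense = nn.Linear(config.hidden_size, config.hidden_size, bias=True)
        del self.o_proj


class PhiForCausalLM(GemmaForCausalLM):
    # Identical to Gemma once the attention-level differences are handled.
    pass
```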
Thanks for the PR! 🤗 I left some comments to use more inheritance; you could try to go even further if you can!
class PhiForTokenClassification(LlamaForTokenClassification):
    def forward(
This cannot use Llama here, as it causes self.classifier to become self.score, which may break weight initialization of pretrained models. You could try to find another model with the same names in the init 🤗
LlamaForTokenClassification is the closest match to PhiForTokenClassification. It is itself copied from MptForTokenClassification, but subclassing Mpt directly causes issues with build_phi_alibi_tensor (i.e. build_mpt_alibi_tensor). I've addressed this by inheriting from LlamaForTokenClassification, deleting self.score and adding self.classifier (basically a rename); see the sketch below.
@Cyrilvallez let me know if changes are required.
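A sketch of that rename, assuming the attribute names from the discussion (the forward in the PR would also need to reference self.classifier instead of the inherited self.score):

```python
# Hedged sketch of the rename: inherit Llama's token-classification head but
# expose it under Phi's original attribute name so pretrained weights still load.
from torch import nn

from transformers.models.llama.modeling_llama import LlamaForTokenClassification


class PhiForTokenClassification(LlamaForTokenClassification):
    def __init__(self, config):
        super().__init__(config)
        # Llama names the head "score"; Phi checkpoints expect "classifier".
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        del self.score
```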
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from d942994 to 12b3901
Feel free to ping us again once ready! 🤗
@ArthurZucker Oh, I think that's all. Let me know if any changes are required.
Nice! Just some final details to take care of 🤗 Sorry for the delay, the whole team was in Martinique for a big offsite and we just got back.
def __init__(self, config):
    super().__init__(config)
    self.model = PhiModel(config)
    self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=True)
    self.post_init()

def forward(
    self,
    input_ids: torch.LongTensor = None,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.LongTensor] = None,
    past_key_values: Optional[List[torch.FloatTensor]] = None,
    inputs_embeds: Optional[torch.FloatTensor] = None,
    labels: Optional[torch.LongTensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
    cache_position: Optional[torch.LongTensor] = None,
    num_logits_to_keep: int = 0,
    **loss_kwargs,
) -> Union[Tuple, CausalLMOutputWithPast]:
    r"""
    ```python
    >>> from transformers import AutoTokenizer, PhiForCausalLM

    >>> model = PhiForCausalLM.from_pretrained("microsoft/phi-1")
    >>> tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1")

    >>> prompt = "This is an example script ."
    >>> inputs = tokenizer(prompt, return_tensors="pt")

    >>> # Generate
    >>> generate_ids = model.generate(inputs.input_ids, max_length=30)
    >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    'This is an example script .\n\n\n\nfrom typing import List\n\ndef find_most_common_letter(words: List[str'
    ```"""

    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
    outputs = self.model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
        cache_position=cache_position,
    )

    hidden_states = outputs[0]
    # Only compute necessary logits, and do not upcast them to float if we are not computing the loss
    logits = self.lm_head(hidden_states[:, -num_logits_to_keep:, :])

    loss = None
    if labels is not None:
        loss = self.loss_function(logits, labels, self.vocab_size, **loss_kwargs)

    if not return_dict:
        output = (logits,) + outputs[1:]
        return (loss,) + output if loss is not None else output

    return CausalLMOutputWithPast(
        loss=loss,
        logits=logits,
        past_key_values=outputs.past_key_values,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )
All this code should not be needed; a single pass should be enough, as this is identical to GemmaForCausalLM.
We can't remove the forward method completely, as it is required at least for the documentation. Changed it to delegate to super().forward(...); a sketch is below. @Cyrilvallez
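For reference, the slimmed-down override might look roughly like this (a sketch only; the Phi docstring example quoted above would stay attached to it):

```python
# Sketch: keep a thin forward so the Phi-specific docstring example survives,
# but delegate all of the actual work to the parent implementation.
def forward(
    self,
    input_ids=None,
    attention_mask=None,
    position_ids=None,
    past_key_values=None,
    inputs_embeds=None,
    labels=None,
    use_cache=None,
    output_attentions=None,
    output_hidden_states=None,
    return_dict=None,
    cache_position=None,
    num_logits_to_keep=0,
    **loss_kwargs,
):
    r"""(Phi docstring with the microsoft/phi-1 generation example goes here.)"""
    return super().forward(
        input_ids=input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        labels=labels,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
        cache_position=cache_position,
        num_logits_to_keep=num_logits_to_keep,
        **loss_kwargs,
    )
```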
Force-pushed from d67d5b8 to 89199ac
Rebased. @Cyrilvallez
Hey! Sorry for the delay!! Last small comments to cut even more code, then we're good to go! 🤗
def __init__(self, config):
    super().__init__(config)
    self.model = PhiModel(config)
    self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=True)
    self.post_init()
This can be removed
Hi @Cyrilvallez, I'm concerned that removing this would result in the lm_head having bias=False, as it does in Gemma, and I suspect that would affect the model initialization, or at least the output. I don't think we can neglect the bias, so I kept the lm_head with bias=True in PhiForCausalLM.
Let me know if I'm correct or not.
Oh yes sorry, did not notice the bias! Then of course we need to keep it. You can still remove the self.model = ... and the post_init() lines though.
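Combining both comments, the remaining override might boil down to something like this (a sketch; the parent class is assumed from the earlier discussion, and the modular converter re-expands super().__init__ in the generated modeling file):

```python
# Sketch: keep only the Phi-specific difference (bias=True on lm_head) and let
# the parent class provide self.model and the post_init() call.
from torch import nn

from transformers.models.gemma.modeling_gemma import GemmaForCausalLM


class PhiForCausalLM(GemmaForCausalLM):
    def __init__(self, config):
        super().__init__(config)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=True)
```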
Force-pushed from 77d1fd3 to 67ebbaf
All right, LGTM! Great job thanks!
If #34858 gets merged before this one, you'll just need to make sure to modify accordingly, but it will be extremely straightforward!
🤗
Force-pushed from 67ebbaf to 9bf4eae (commits: …amline initialization, …e past_key_values type, …osition embeddings handling)
Done! @Cyrilvallez
Nice! Do you want to include the recent changes in #35235? 🤗
@@ -1089,8 +1082,8 @@ def _prepare_4d_causal_attention_mask_with_cache_position(

class PhiForCausalLM(PhiPreTrainedModel, GenerationMixin):
    _tied_weights_keys = ["lm_head.weight"]
    _tp_plan = {"lm_head": "colwise_rep"}
For TP, the config needs a TP plan as well!
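If I read this correctly, the model-level _tp_plan above needs a matching base_model_tp_plan on PhiConfig, mirroring the pattern LlamaConfig uses; a hedged sketch, where the per-module names are assumptions based on Phi's original layer names (dense, fc1, fc2), not copied from this PR:

```python
# Hypothetical sketch: declare a tensor-parallel plan on the config so the
# model-level _tp_plan has something to resolve against. Module names are
# assumed from the original Phi implementation.
from transformers import PretrainedConfig


class PhiConfig(PretrainedConfig):
    model_type = "phi"
    base_model_tp_plan = {
        "layers.*.self_attn.q_proj": "colwise",
        "layers.*.self_attn.k_proj": "colwise",
        "layers.*.self_attn.v_proj": "colwise",
        "layers.*.self_attn.dense": "rowwise",
        "layers.*.mlp.fc1": "colwise",
        "layers.*.mlp.fc2": "rowwise",
    }
```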
    config.hidden_size // self.num_heads, eps=config.layer_norm_eps, elementwise_affine=True
)

self.rotary_emb = PhiRotaryEmbedding(config=self.config)
should be removed!
What does this PR do?
Adds Modular Phi #33916
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@ArthurZucker @LysandreJik
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.