Hello,
Just wondering if the model that you provided on Hugging Face was instruction-tuned to perform the needle-in-a-haystack test.

Also, hypothetically speaking, would some of the practices used to reduce GPU requirements also apply to SSM models? For example, Unsloth reduces GPU demand enough that consumer GPUs can train Llama2-7B and Mistral-7B models; my 8GB GPU was able to fine-tune Mistral for a small use case of mine. It would be absolutely amazing to see a Mamba-7B model train with half the resources that Unsloth's Mistral-7B needs.
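To make the question concrete, here is a minimal sketch of the kind of recipe I mean: 4-bit quantized loading plus LoRA adapters, applied to a Mamba checkpoint. The checkpoint name (`state-spaces/mamba-130m-hf`) and the LoRA target module names are assumptions on my part, taken from the Hugging Face Mamba implementation; I have not verified that the 4-bit path plays nicely with the selective-scan kernels, especially at 7B scale.

```python
# Hedged sketch: QLoRA-style fine-tuning setup for a Mamba model.
# Assumes transformers >= 4.39 (Mamba support), peft, and bitsandbytes
# are installed; checkpoint and module names are assumptions, not
# anything confirmed by the maintainers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "state-spaces/mamba-130m-hf"  # assumed HF-format checkpoint

# Load the base model quantized to 4-bit to cut weight memory roughly 4x.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach LoRA adapters to the linear projections inside each Mamba block.
# Module names ("in_proj", "x_proj", "dt_proj", "out_proj") come from the
# HF Mamba implementation and may differ for other checkpoints.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["in_proj", "x_proj", "dt_proj", "out_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters train; base stays frozen
```

If that general approach is sound for SSMs, then presumably the further kernel-level optimizations Unsloth layers on top could apply too, which is what I am really asking about.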