Why does it change the PyTorch version? #5

Open
Oxi84 opened this issue Sep 10, 2024 · 1 comment

Oxi84 commented Sep 10, 2024

Why does it change the PyTorch version and install a different CUDA on the system?

This would actually break most people's environments, because there can be only one CUDA version on Ubuntu, and it has to match the one in the environment.
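
For reference, a minimal sketch (plain PyTorch, nothing specific to this package) that records which PyTorch/CUDA build is active; running it before and after installing makes any version swap easy to spot:

```python
import torch

# Report the active PyTorch build and the CUDA version it was compiled against.
# Run this before and after installing to see whether the install replaced them.
print("torch:", torch.__version__)            # e.g. "2.4.0+cu121"
print("built for CUDA:", torch.version.cuda)  # CUDA toolkit the wheel targets
print("CUDA available:", torch.cuda.is_available())
```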


Oxi84 commented Sep 10, 2024

It is also slower than the default; here is one example:

```python
from turbot5 import T5ForConditionalGeneration, T5Config
from transformers import T5Tokenizer
import torch
import time

# Initialize the tokenizer and model
tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-large")  # Use smaller model if needed
model = T5ForConditionalGeneration.from_pretrained(
    "google-t5/t5-large",
    attention_type='flash',  # Specify attention type
    use_triton=True,
).to('cuda')

# GradScaler is only needed for training; inference mixed precision comes from autocast below
scaler = torch.cuda.amp.GradScaler()

# List of input sentences for translation
input_texts = [
    "translate English to German: How old are you?",
    "translate English to French: I am learning how to use transformers.",
    "translate English to Spanish: This is a test of T5 with Flash attention.",
    "translate English to Italian: The sky is clear today.",
    "translate English to Portuguese: I like to play soccer on weekends.",
]

# Tokenize the input sentences (process smaller batches if needed)
input_ids = tokenizer(input_texts, return_tensors="pt", padding=True, truncation=True).input_ids.to('cuda')

# Function to measure execution time
def measure_time(func):
    start_time = time.time()
    result = func()
    end_time = time.time()
    return result, end_time - start_time

# Number of repetitions
num_repetitions = 5
total_time = 0.0

# Loop to repeat the translation process 5 times
for i in range(num_repetitions):
    with torch.cuda.amp.autocast():  # Enable mixed precision for memory efficiency
        outputs, exec_time = measure_time(lambda: model.generate(input_ids))
    total_time += exec_time

    # Decode and print the translated outputs
    translated_texts = [tokenizer.decode(output, skip_special_tokens=True) for output in outputs]
    print(f"Iteration {i+1}:")
    for input_text, translated_text in zip(input_texts, translated_texts):
        print(f"Input: {input_text}")
        print(f"Translated Output: {translated_text}")

# Calculate and print the average execution time
average_time = total_time / num_repetitions
print(f"Average Execution Time: {average_time:.4f} seconds")
```
