Why does it change the PyTorch version? #5

Open
Oxi84 opened this issue Sep 10, 2024 · 1 comment

Oxi84 commented Sep 10, 2024

Why does it change the PyTorch version and install a different CUDA on the system?

This would actually break most people's environments, because there can be only one CUDA version on Ubuntu, and it has to match the one in the environment.
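
For reference, a minimal sketch (plain PyTorch, nothing specific to this package) that records which PyTorch/CUDA build is active; running it before and after installing makes any version swap easy to spot:

```python
import torch

# Report the active PyTorch build and the CUDA version it was compiled against.
# Run this before and after installing to see whether the install replaced them.
print("torch:", torch.__version__)            # e.g. "2.4.0+cu121"
print("built for CUDA:", torch.version.cuda)  # CUDA toolkit the wheel targets
print("CUDA available:", torch.cuda.is_available())
```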


Oxi84 commented Sep 10, 2024

It is also slower than the default; here is one example:

```python
from turbot5 import T5ForConditionalGeneration, T5Config
from transformers import T5Tokenizer
import torch
import time

# Initialize the tokenizer and model
tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-large")  # Use smaller model if needed
model = T5ForConditionalGeneration.from_pretrained(
    "google-t5/t5-large",
    attention_type='flash',  # Specify attention type
    use_triton=True,
).to('cuda')

# GradScaler is only needed for training; inference mixed precision comes from autocast below
scaler = torch.cuda.amp.GradScaler()

# List of input sentences for translation
input_texts = [
    "translate English to German: How old are you?",
    "translate English to French: I am learning how to use transformers.",
    "translate English to Spanish: This is a test of T5 with Flash attention.",
    "translate English to Italian: The sky is clear today.",
    "translate English to Portuguese: I like to play soccer on weekends.",
]

# Tokenize the input sentences (process smaller batches if needed)
input_ids = tokenizer(input_texts, return_tensors="pt", padding=True, truncation=True).input_ids.to('cuda')

# Function to measure execution time
def measure_time(func):
    start_time = time.time()
    result = func()
    end_time = time.time()
    return result, end_time - start_time

# Number of repetitions
num_repetitions = 5
total_time = 0.0

# Loop to repeat the translation process 5 times
for i in range(num_repetitions):
    with torch.cuda.amp.autocast():  # Enable mixed precision for memory efficiency
        outputs, exec_time = measure_time(lambda: model.generate(input_ids))
    total_time += exec_time

    # Decode and print the translated outputs
    translated_texts = [tokenizer.decode(output, skip_special_tokens=True) for output in outputs]
    print(f"Iteration {i+1}:")
    for input_text, translated_text in zip(input_texts, translated_texts):
        print(f"Input: {input_text}")
        print(f"Translated Output: {translated_text}")

# Calculate and print the average execution time
average_time = total_time / num_repetitions
print(f"Average Execution Time: {average_time:.4f} seconds")
```
