
WGPU issues in Macbook M1 #19

Open
DavidGOrtega opened this issue Oct 14, 2023 · 5 comments

@DavidGOrtega
Contributor

DavidGOrtega commented Oct 14, 2023

As a TVM user, I'm very excited about this project because it uses Burn and its native WGPU access. Personally, I think this is the way to go.
However, my tests are very discouraging: WGPU seems to perform worse than the CPU.

WGPU

cargo run --release --bin transcribe --features wgpu-backend medium audio16k.wav en transcription.txt

     Running `target/release/transcribe medium audio16k.wav en transcription.txt`
Loading waveform...
Loading model...
Depth: 0
...
Chunk 0:  Hello, I am the whisper machine learning model. If you see this as text, then I am working properly.

infer took: 49665 ms

CPU

cargo run --release --bin transcribe medium audio16k.wav en transcription.txt

    Running `target/release/transcribe medium audio16k.wav en transcription.txt`
Loading waveform...
Loading model...
Depth: 0
...
Chunk 0:  Hello, I am the whisper machine learning model. If you see this as text, then I am working properly.

infer took: 19517 ms
Transcription finished.
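Worked out from the two medium-model timings above, the WGPU run took roughly 2.5 times as long as the CPU run:

```latex
\frac{t_{\text{WGPU}}}{t_{\text{CPU}}} = \frac{49665\ \text{ms}}{19517\ \text{ms}} \approx 2.54
```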

The code was slightly modified:

use std::process;
use std::time::Instant;

fn main() {
    cfg_if::cfg_if! {
        if #[cfg(feature = "wgpu-backend")] {
            type Backend = WgpuBackend<AutoGraphicsApi, f32, i32>;
            let device = WgpuDevice::BestAvailable;
        } else if #[cfg(feature = "torch-backend")] {
            type Backend = TchBackend<f32>;
            let device = TchDevice::Cpu;
        }
    }

    // ...

    // Time only the transcription call.
    let start_time = Instant::now();
    let (text, tokens) = match waveform_to_text(&whisper, &bpe, lang, waveform, sample_rate) {
        Ok((text, tokens)) => (text, tokens),
        Err(e) => {
            eprintln!("Error during transcription: {}", e);
            process::exit(1);
        }
    };
    let elapsed_time_ms = start_time.elapsed().as_millis();

    println!("infer took: {} ms", elapsed_time_ms);

    // ...
}
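A single `Instant` measurement mixes one-time setup with steady-state cost. On a GPU backend, any lazy initialization (such as building compute pipelines on first use) would land inside the timed call. A minimal sketch of a repeated-run timer, assuming a hypothetical helper `bench_median` (not part of whisper-burn or Burn), that discards warm-up runs and reports the median of the timed runs:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical benchmarking helper (not from whisper-burn or Burn):
/// runs `f` `warmup` times untimed, then `runs` times timed, and returns
/// the median duration (the upper median when `runs` is even).
fn bench_median<T, F: FnMut() -> T>(warmup: usize, runs: usize, mut f: F) -> Duration {
    assert!(runs > 0, "need at least one timed run");
    // Warm-up iterations absorb one-time costs before any timing starts.
    for _ in 0..warmup {
        black_box(f());
    }
    // Time each run separately; black_box keeps the result from being optimized away.
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            black_box(f());
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[runs / 2]
}

fn main() {
    // Stand-in workload; a real measurement would put the transcription call here.
    let median = bench_median(2, 5, || (0..1_000_000u64).sum::<u64>());
    println!("median of 5 runs: {:?}", median);
}
```

To compare backends this way, the transcription call from the snippet above could be wrapped in the closure, cloning any inputs it consumes on each run.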

The tiny model shows the same roughly 3x gap between CPU and WGPU.

Maybe it isn't optimized for my machine? Or maybe it's not working?

@DavidGOrtega
Contributor Author

Might be related.

@DavidGOrtega
Contributor Author

MPS performance is also quite bad, similar to WGPU:

type Backend = TchBackend<f32>;
let device = TchDevice::Mps;

@antimora

We need to review/profile whether the STFT (featurizer) is slow. I believe it was implemented manually and might be slow compared to specialized libraries.
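To illustrate why a hand-rolled STFT can be slow, here is a minimal sketch of a naive STFT that computes every frequency bin with a direct DFT. That costs O(n_fft²) work per frame, versus O(n_fft log n_fft) for an FFT-based library. This is an assumption for illustration only; the thread does not say how the featurizer is actually implemented. The function name `naive_stft_magnitude` is hypothetical, and unlike center-padded STFTs it does not pad the signal.

```rust
use std::f32::consts::PI;

/// Hypothetical naive STFT (illustration only): each frame is multiplied by a
/// periodic Hann window and transformed with a direct O(n_fft^2) DFT.
/// Returns one magnitude vector of n_fft / 2 + 1 bins per frame; no padding.
fn naive_stft_magnitude(signal: &[f32], n_fft: usize, hop: usize) -> Vec<Vec<f32>> {
    assert!(n_fft > 0 && hop > 0, "n_fft and hop must be positive");
    // Periodic Hann window.
    let window: Vec<f32> = (0..n_fft)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f32 / n_fft as f32).cos())
        .collect();
    let n_bins = n_fft / 2 + 1;
    let mut frames = Vec::new();
    let mut start = 0;
    while start + n_fft <= signal.len() {
        let frame = &signal[start..start + n_fft];
        let mut mags = Vec::with_capacity(n_bins);
        for k in 0..n_bins {
            // Direct DFT sum for bin k: n_fft multiply-adds (plus a cos and sin) per bin.
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (n, (&x, &w)) in frame.iter().zip(&window).enumerate() {
                // Reduce k*n modulo n_fft to keep the angle small and exact in f32.
                let angle = -2.0 * PI * ((k * n) % n_fft) as f32 / n_fft as f32;
                re += x * w * angle.cos();
                im += x * w * angle.sin();
            }
            mags.push((re * re + im * im).sqrt());
        }
        frames.push(mags);
        start += hop;
    }
    frames
}

fn main() {
    // One second of a 440 Hz tone at 16 kHz, framed like Whisper (n_fft = 400, hop = 160).
    let sample_rate = 16_000.0f32;
    let signal: Vec<f32> = (0..16_000)
        .map(|n| (2.0 * PI * 440.0 * n as f32 / sample_rate).sin())
        .collect();
    let frames = naive_stft_magnitude(&signal, 400, 160);
    println!("{} frames x {} bins", frames.len(), frames[0].len()); // 98 frames x 201 bins
}
```

With Whisper's n_fft = 400 and hop = 160, a 30 s window of 16 kHz audio (480,000 samples) gives 2,998 unpadded frames of 201 bins, so this loop would run roughly 241 million inner iterations, each with a `cos` and a `sin`. Profiling the featurizer on its own would show whether anything like this dominates the runtime.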

@DavidGOrtega
Contributor Author

Thanks for the reply, @antimora. I'm going to profile it and see.

@Gadersd
Owner

Gadersd commented Oct 14, 2023

I haven't yet prioritized optimization. Caching should speed up the inference significantly. I don't think the burn-wgpu backend has been significantly optimized yet. You might want to check with its maintainers.
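The comment does not say which cache is meant. Assuming it is the usual key/value cache for the autoregressive text decoder, the toy sketch below counts per-token key/value projections with and without caching. Without a cache, generating T tokens recomputes T(T+1)/2 projections; with one, only T. `kv_projections` is a hypothetical illustration, not whisper-burn code.

```rust
/// Toy model of decoder key/value work (hypothetical, not whisper-burn code):
/// counts how many per-token key/value projections are computed while
/// generating `steps` tokens one at a time.
fn kv_projections(steps: usize, use_cache: bool) -> usize {
    let mut cache: Vec<usize> = Vec::new(); // stands in for cached K/V tensors
    let mut computed = 0;
    for t in 1..=steps {
        if use_cache {
            // Only the newest token's key/value is projected; older entries are reused.
            cache.push(t);
            computed += 1;
        } else {
            // Every prefix token's key/value is recomputed from scratch.
            cache.clear();
            for tok in 1..=t {
                cache.push(tok);
                computed += 1;
            }
        }
        // Either way, attention at step t sees keys/values for all t tokens.
        assert_eq!(cache.len(), t);
    }
    computed
}

fn main() {
    for steps in [10, 100, 400] {
        println!(
            "{steps} tokens: {} projections without cache, {} with cache",
            kv_projections(steps, false),
            kv_projections(steps, true)
        );
    }
}
```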
