Research - Dynamic speech reflex #4

Open
yacineMTB opened this issue Jun 12, 2023 · 8 comments
Labels
help wanted Extra attention is needed

Comments

@yacineMTB
Owner

yacineMTB commented Jun 12, 2023

Right now, I'm planning to initiate the response with a "vim pedal", aka a hotkey, because knowing when to respond is difficult. https://github.com/yacineMTB/talk/blob/master/index.ts#L108-L135

When humans speak to each other, we use intonation and other signals to let the other person know when the floor is open, and we use the same signals to show that we want the floor.

Right now, we just need some naive event firing when the speaker stops speaking.
Is this something that we can get out of whisper.cpp's embeddings? Possibly a classifier trained on top of the embeddings?

Also, I wouldn't shy away from running a Python sidecar that takes requests from the main Node process.

What would be awesome

Figuring out how to get either whisper.cpp or some sidecar to take a byte stream and output a continuous "activation function" based on the likelihood that it should respond.

image
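
A minimal sketch of what such a continuous "activation function" could look like, assuming mono float PCM arrives in fixed-size frames. Everything here (`frame_rms`, `ResponseActivation`, the `speech_rms` level, the `ramp_ms` ramp) is hypothetical and not part of whisper.cpp or talk. It only ramps a score from 0 to 1 as silence accumulates after speech has been heard; a classifier on embeddings would replace the RMS check.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: root-mean-square level of one frame of PCM samples.
float frame_rms(const std::vector<float>& frame) {
    if (frame.empty()) return 0.0f;
    double sum = 0.0;
    for (float s : frame) sum += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sum / frame.size()));
}

// Hypothetical tracker: the activation stays at 0 until speech has been heard,
// then ramps linearly from 0 to 1 as silence accumulates.
struct ResponseActivation {
    float speech_rms = 0.02f;  // assumed level at or above which a frame counts as speech
    float ramp_ms = 800.0f;    // assumed silence duration that maps to activation 1.0
    float silence_ms = 0.0f;   // silence accumulated since the last speech frame
    bool heard_speech = false;

    // Feed one frame; returns the current activation in [0, 1].
    float update(const std::vector<float>& frame, float sample_rate) {
        const float frame_ms = 1000.0f * frame.size() / sample_rate;
        if (frame_rms(frame) >= speech_rms) {
            heard_speech = true;
            silence_ms = 0.0f;
        } else if (heard_speech) {
            silence_ms += frame_ms;
        }
        const float a = silence_ms / ramp_ms;
        return a > 1.0f ? 1.0f : a;
    }
};
```

Calling `update` once per captured frame yields the kind of curve shown in the image; the caller decides what to do with the value.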

@yacineMTB yacineMTB added the help wanted Extra attention is needed label Jun 12, 2023
@yacineMTB yacineMTB pinned this issue Jun 12, 2023
@odkken

odkken commented Jun 12, 2023

Interesting line of thought from here. The issue that immediately pops up for me is "personality" - how much of a pushover is the thing? Does it stop talking as soon as you make a sound? Does it only speak when spoken to?

Repository owner deleted a comment from jmanhype Jun 14, 2023
Repository owner deleted a comment from jmanhype Jun 14, 2023
@yacineMTB
Owner Author

yacineMTB commented Jun 14, 2023

@odkken

Does it stop talking as soon as you make a sound? Does it only speak when spoken to?

Abstractly, it's an event that fires based on some activation threshold. The threshold should be configurable!
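
As a rough sketch of that idea, assuming the activation is a float in [0, 1]: the trigger below fires a callback once when the activation crosses a configurable threshold, and re-arms only after it falls back below a lower one. `ThresholdTrigger`, `fire_above` and `rearm_below` are hypothetical names, not anything in talk.

```cpp
#include <functional>
#include <utility>

// Hypothetical trigger: fires once per upward crossing of `fire_above`,
// then stays quiet until the activation drops to `rearm_below` or lower.
struct ThresholdTrigger {
    float fire_above;                    // configurable "how eager is it" threshold
    float rearm_below;                   // lower threshold that re-arms the trigger
    std::function<void(float)> on_fire;  // event handler, receives the activation
    bool armed = true;

    ThresholdTrigger(float fire, float rearm, std::function<void(float)> cb)
        : fire_above(fire), rearm_below(rearm), on_fire(std::move(cb)) {}

    // Feed the latest activation; returns true if the event fired on this call.
    bool update(float activation) {
        if (armed && activation >= fire_above) {
            armed = false;
            if (on_fire) on_fire(activation);
            return true;
        }
        if (!armed && activation <= rearm_below) armed = true;
        return false;
    }
};
```

A high `fire_above` makes the assistant wait longer before taking the floor; a low one makes it jump in quickly. The same shape could drive "stop talking" with a second trigger on the user's speech activity.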

@stangirala

stangirala commented Jun 21, 2023

Thoughts on reusing this? https://github.com/ggerganov/whisper.cpp/blob/1d716d6e34f3f4ba57bd9706a9258a0bdb008153/examples/stream/stream.cpp#L584-L592

If that looks good, it should be easy enough to modify the current audio stream and fire an event (actually, what event for Talk?)
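
For reference, a naive energy-based check of this kind can be sketched as below. This is an illustration under assumptions, not the linked whisper.cpp code: `speaker_stopped`, `mean_abs`, `last_ms` and `ratio` are hypothetical names. It reports that the speaker stopped when the tail of the buffer is much quieter than the buffer as a whole.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute amplitude of the samples in [begin, end).
float mean_abs(const std::vector<float>& pcm, std::size_t begin, std::size_t end) {
    if (end <= begin) return 0.0f;
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) sum += std::fabs(pcm[i]);
    return static_cast<float>(sum / (end - begin));
}

// Hypothetical check: true when the last `last_ms` milliseconds of `pcm`
// are at most `ratio` times as loud as the whole buffer.
bool speaker_stopped(const std::vector<float>& pcm, int sample_rate, int last_ms, float ratio) {
    const std::size_t n = pcm.size();
    const std::size_t n_last = static_cast<std::size_t>(sample_rate) * last_ms / 1000;
    if (n_last == 0 || n_last >= n) return false;  // not enough audio to compare

    const float energy_all = mean_abs(pcm, 0, n);
    const float energy_last = mean_abs(pcm, n - n_last, n);
    return energy_last <= ratio * energy_all;
}
```

Note that a buffer of pure silence also returns true, so the caller would need to confirm that speech occurred before treating the result as an event.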

@simonMoisselin

Thoughts on reusing this? https://github.com/ggerganov/whisper.cpp/blob/1d716d6e34f3f4ba57bd9706a9258a0bdb008153/examples/stream/stream.cpp#L584-L592

If that looks good, it should be easy enough to modify the current audio stream and fire an event (actually, what event for Talk?)

This is just a high-pass filter; it might fire too often, on almost any type of noise. I think we need something more specific and ML-based.

But we can use the same logic, just replacing this high_pass_filter:

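#include <cmath>   // M_PI
#include <vector>

// Filters `data` in place. `cutoff` and `sample_rate` are both in Hz.
void high_pass_filter(std::vector<float> & data, float cutoff, float sample_rate) {
    const float rc = 1.0f / (2.0f * M_PI * cutoff);  // RC time constant for the cutoff
    const float dt = 1.0f / sample_rate;             // time between samples
    const float alpha = dt / (rc + dt);              // smoothing coefficient

    float y = data[0];  // assumes `data` is non-empty

    // Each output depends on the previous output and the change in input.
    for (size_t i = 1; i < data.size(); i++) {
        y = alpha * (y + data[i] - data[i - 1]);
        data[i] = y;
    }
}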
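
One way to keep "the same logic" while making the signal-processing part replaceable is to hide it behind a scoring function. The sketch below is an assumption about how that could be wired, with hypothetical names (`SpeechScorer`, `EndOfSpeechDetector`, `peak_amplitude`); the scorer could later be backed by an ML model instead of a filter.

```cpp
#include <cmath>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical interface: maps one audio frame to a "speech present" score.
using SpeechScorer = std::function<float(const std::vector<float>&)>;

// Naive placeholder scorer: peak absolute amplitude of the frame.
float peak_amplitude(const std::vector<float>& frame) {
    float peak = 0.0f;
    for (float s : frame) peak = std::fmax(peak, std::fabs(s));
    return peak;
}

// Hypothetical detector: reports the frame on which speech turns into non-speech.
struct EndOfSpeechDetector {
    SpeechScorer score;      // pluggable: filter-based, energy-based, or a model
    float speech_threshold;  // score at or above this counts as speech
    bool in_speech = false;

    EndOfSpeechDetector(SpeechScorer scorer, float threshold)
        : score(std::move(scorer)), speech_threshold(threshold) {}

    // Feed one frame; returns true exactly on a speech-to-silence transition.
    bool update(const std::vector<float>& frame) {
        const bool speaking = score(frame) >= speech_threshold;
        const bool ended = in_speech && !speaking;
        in_speech = speaking;
        return ended;
    }
};
```

Swapping detectors then means passing a different callable, for example `EndOfSpeechDetector d(peak_amplitude, 0.1f);` today and a model-backed lambda later, without touching the event-firing code.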

@choombaa
Collaborator

We could also use [BLANK_AUDIO] as a response reflex when it is transcribed. This might require shrinking the buffer size to reduce latency; I'm not sure how that is controlled right now.
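
A small sketch of that reflex, assuming each transcribed segment reaches the caller as a plain string. `is_blank_audio` and `BlankAudioReflex` are hypothetical names, and requiring a few consecutive blank segments is an added assumption to avoid firing on a single short pause.

```cpp
#include <string>

// True when the transcribed segment contains the "[BLANK_AUDIO]" marker.
bool is_blank_audio(const std::string& segment_text) {
    return segment_text.find("[BLANK_AUDIO]") != std::string::npos;
}

// Hypothetical reflex: fires after `needed` consecutive blank segments,
// but only once some real text has been transcribed first.
struct BlankAudioReflex {
    int needed;               // consecutive blank segments required to fire
    int consecutive = 0;      // blank segments seen since the last real text
    bool heard_text = false;  // whether any non-blank text preceded the blanks

    explicit BlankAudioReflex(int n) : needed(n) {}

    // Feed one transcribed segment; returns true when the reflex fires.
    bool update(const std::string& segment_text) {
        if (is_blank_audio(segment_text)) {
            if (heard_text) ++consecutive;
        } else if (!segment_text.empty()) {
            heard_text = true;
            consecutive = 0;
        }
        if (heard_text && consecutive >= needed) {
            heard_text = false;  // reset so the next utterance starts fresh
            consecutive = 0;
            return true;
        }
        return false;
    }
};
```

With `needed` set to 1 this fires on the first blank segment after speech, so the response latency is bounded by how often segments are produced, which is the buffer-size concern raised above.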

@yacineMTB
Owner Author

@choombaa I merged your voice detection
good shit

@yacineMTB
Owner Author

#39

@yacineMTB
Owner Author

Keeping issue open as we might make it a bit more involved

5 participants