Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add turn detector #225

Draft
wants to merge 7 commits into
base: next
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .changeset/kind-buses-destroy.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
"@livekit/agents": minor
"@livekit/agents-plugin-livekit": minor
---

feat: add turn detector
113 changes: 113 additions & 0 deletions LICENSES/LicenseRef-LiveKitModelLicense.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
LIVEKIT MODEL LICENSE AGREEMENT

1. Introduction

LiveKit Incorporated ("LiveKit") is making available its proprietary models for
use pursuant to the terms and conditions of this Agreement. As further
described below, you may use these LiveKit models freely but can only use them
together with the LiveKit Agents framework. You cannot use the LiveKit models
on a standalone basis or with any other frameworks.

BY CLICKING "I ACCEPT," OR BY DOWNLOADING, INSTALLING, OR OTHERWISE ACCESSING
OR USING THE LIVEKIT MATERIALS, YOU AGREE THAT YOU HAVE READ AND UNDERSTOOD,
AND, AS A CONDITION TO YOUR USE OF THE LIVEKIT MATERIALS, YOU AGREE TO BE
BOUND BY, THE FOLLOWING TERMS AND CONDITIONS.

2. Definitions

"Agreement" means this LiveKit Model License Agreement.

"Documentation" means the specifications, manuals, and documentation
accompanying any LiveKit Model and distributed by LiveKit.

"Licensee" or "you" means the individual or entity agreeing to be bound by
this Agreement.

"LiveKit Agents" means the proprietary LiveKit software framework for building
real-time multimodal AI applications with programmable backend participants.

"LiveKit Materials" means, collectively, the LiveKit Models and Documentation.

"LiveKit Model" means any of LiveKit's proprietary software models or
algorithms, including machine-learning software code, model weights,
inference-enabling software code, training-enabling software code, and
fine-tuning enabling software code. Any derivative works of a LiveKit Model,
whether developed by LiveKit, you, or any third party, will be deemed the
"LiveKit Model" for the purposes of this Agreement.

3. License Rights

Right to Use LiveKit Materials. Subject to the terms and conditions of this
Agreement, including the requirements of Section 3.b, LiveKit grants you a
nonexclusive, nontransferable, worldwide, royalty-free license under LiveKit's
intellectual property rights to use, reproduce, distribute, copy, and create
derivative works of the LiveKit Materials.

Limitation on Use. As a condition to your use of the LiveKit Materials, you
agree: (i) not to use any LiveKit Models on a standalone basis or with any
frameworks other than LiveKit Agents; (ii) not to use any LiveKit Materials or
any output from, or results of using, LiveKit Models (including any derivative
works thereof) to improve or otherwise develop any other models that are not
LiveKit Models; or (iii) distribute or otherwise make available the LiveKit
Materials (including any derivative works thereof) except (x) pursuant to the
terms of this Agreement, and (y) you reproduce the above copyright notice.

4. Intellectual Property

The LiveKit Materials are owned by LiveKit and its licensors. Except for the
rights granted to you under this Agreement, all rights are reserved and no
other express or implied rights are granted.

You will own any derivative works that you created from the LiveKit Materials,
subject to the terms of this Agreement.

5. Disclaimer

UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING, LIVEKIT PROVIDES
THE LIVEKIT MATERIALS, AND ANY OUTPUT OR RESULTS THEREFROM, ON AN "AS IS"
BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE,
NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU
ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR
REDISTRIBUTING THE LIVEKIT MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR
USE OF THE LIVEKIT MATERIALS AND ANY OUTPUT AND RESULTS.

6. Limitation of Liability

IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE),
CONTRACT, OR OTHERWISE, UNLESS REQUIRED BY APPLICABLE LAW (SUCH AS DELIBERATE
AND GROSSLY NEGLIGENT ACTS) OR AGREED TO IN WRITING, WILL LIVEKIT BE LIABLE TO
YOU FOR INDIRECT DAMAGES, INCLUDING ANY SPECIAL, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES OF ANY CHARACTER ARISING AS A RESULT OF THIS AGREEMENT OR OUT OF THE
USE OR INABILITY TO USE THE LIVEKIT MATERIALS OR ANY OUTPUT OR RESULTS
THEREFROM (INCLUDING BUT NOT LIMITED TO DAMAGES FOR LOSS OF GOODWILL, WORK
STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER COMMERCIAL
DAMAGES OR LOSSES), EVEN IF LIVEKIT HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.

7. Trademarks

This Agreement does not grant permission to use the trade names, trademarks,
service marks, or product names of LiveKit, except as required for reasonable
and customary use in describing the origin of the LiveKit Materials.

8. Term and Termination

The term of this Agreement commences upon your acceptance of this Agreement
and continues in effect until you cease using the LiveKit Materials or it is
terminated by either party (on immediate written notice to the other party).
This Agreement will automatically terminate if you breach any of its terms.
Upon termination, you must immediately cease all use of the LiveKit Materials.
Sections 4, 5, 6, and 9 will survive termination.

9. Governing Law and Venue

This Agreement is subject to the laws of the State of California, without
regard to its conflict of laws principles. The UN Convention on Contracts for
the International Sale of Goods does not apply to this Agreement. The courts
located in San Francisco, California, have exclusive jurisdiction for any
dispute arising out of this Agreement.

+ + + +

Last Updated: November 25, 2024
18 changes: 7 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,20 +27,15 @@ originally written in Python.

<!--END_DESCRIPTION-->

## ✨ [NEW] OpenAI Realtime API support
## ✨ [NEW] In-house phrase endpointing model

We're partnering with OpenAI on a new MultimodalAgent API in the Agents framework. This class
completely wraps OpenAI’s Realtime API, abstract away the raw wire protocol, and provide an
ultra-low latency WebRTC transport between GPT-4o and your users’ devices. This same stack powers
Advanced Voice in the ChatGPT app.

- Try the Realtime API in our [playground](https://playground.livekit.io/)
[[code](https://github.com/livekit-examples/realtime-playground)]
- Check out our [guide](https://docs.livekit.io/agents/openai) to building your first app with this new API
We’ve trained a new, open weights phrase endpointing model that significantly improves end-of-turn
detection and conversational flow between voice agents and users by reducing agent interruptions.
Optimized to run on CPUs, it’s available via [`@livekit/agents-plugin-livekit`](plugins/livekit)
package.

> [!WARNING]
> This SDK is in Developer Preview. During this period, you may encounter bugs, and the APIs may
> change.
> This SDK is in beta. During this period, you may encounter bugs, and the APIs may change.
>
> For production, we recommend using the [more mature version](https://github.com/livekit/agents)
> of this framework, built with Python, which supports a larger number of integrations.
Expand Down Expand Up @@ -72,6 +67,7 @@ The following plugins are available today:
| [@livekit/agents-plugin-deepgram](https://www.npmjs.com/package/@livekit/agents-plugin-deepgram) | STT |
| [@livekit/agents-plugin-elevenlabs](https://www.npmjs.com/package/@livekit/agents-plugin-elevenlabs) | TTS |
| [@livekit/agents-plugin-silero](https://www.npmjs.com/package/@livekit/agents-plugin-silero) | VAD |
| [@livekit/agents-plugin-livekit](https://www.npmjs.com/package/@livekit/agents-plugin-livekit) | End-of-turn detection |

## Usage

Expand Down
8 changes: 7 additions & 1 deletion REUSE.toml
Original file line number Diff line number Diff line change
Expand Up @@ -27,10 +27,16 @@ SPDX-License-Identifier = "Apache-2.0"

# silero onnx file
[[annotations]]
path = ["**/*.onnx"]
path = ["**/silero_vad.onnx"]
SPDX-FileCopyrightText = "2024 Silero Team"
SPDX-License-Identifier = "CC-BY-NC-SA-4.0"

# turn detector onnx file
[[annotations]]
path = ["**/turn_detector.onnx"]
SPDX-FileCopyrightText = "2024 LiveKit, Inc."
SPDX-License-Identifier = "LicenseRef-LiveKitModelLicense"

# testing files
[[annotations]]
path = ["**/.gitattributes", "**.wav"]
Expand Down
1 change: 1 addition & 0 deletions agents/src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -28,5 +28,6 @@ export * from './log.js';
export * from './generator.js';
export * from './audio.js';
export * from './transcription.js';
export * from './inference_runner.js';

export { cli, stt, tts, llm, pipeline, multimodal, tokenize, metrics };
19 changes: 19 additions & 0 deletions agents/src/inference_runner.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
// SPDX-FileCopyrightText: 2024 LiveKit, Inc.
//
// SPDX-License-Identifier: Apache-2.0

/** @internal */
export abstract class InferenceRunner {
abstract INFERENCE_METHOD: string;
static registeredRunners: { [id: string]: InferenceRunner } = {};

static registerRunner(runner: InferenceRunner) {
if (InferenceRunner.registeredRunners[runner.INFERENCE_METHOD]) {
throw new Error(`Inference runner ${runner.INFERENCE_METHOD} already registered`);
}
InferenceRunner.registeredRunners[runner.INFERENCE_METHOD] = runner;
}

abstract initialize(): Promise<void>;
abstract run(data: unknown): Promise<unknown>;
}
7 changes: 7 additions & 0 deletions agents/src/ipc/inference_executor.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
// SPDX-FileCopyrightText: 2024 LiveKit, Inc.
//
// SPDX-License-Identifier: Apache-2.0

export abstract class InferenceExecutor {
abstract doInference(method: string, data: unknown): Promise<unknown>;
}
3 changes: 3 additions & 0 deletions agents/src/ipc/inference_proc_executor.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
// SPDX-FileCopyrightText: 2024 LiveKit, Inc.
//
// SPDX-License-Identifier: Apache-2.0
9 changes: 5 additions & 4 deletions agents/src/ipc/job_executor.ts
Original file line number Diff line number Diff line change
Expand Up @@ -7,13 +7,14 @@ export interface ProcOpts {
agent: string;
initializeTimeout: number;
closeTimeout: number;
memoryWarnMB: number;
memoryLimitMB: number;
pingInterval: number;
pingTimeout: number;
highPingThreshold: number;
}

export abstract class JobExecutor {
PING_INTERVAL = 2.5 * 1000;
PING_TIMEOUT = 90 * 1000;
HIGH_PING_THRESHOLD = 0.5 * 1000;

abstract get started(): boolean;
abstract get runningJob(): RunningJobInfo | undefined;

Expand Down
15 changes: 14 additions & 1 deletion agents/src/ipc/message.ts
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,12 @@ import type { LoggerOptions } from '../log.js';
export type IPCMessage =
| {
case: 'initializeRequest';
value: { loggerOptions: LoggerOptions };
value: {
loggerOptions: LoggerOptions;
pingInterval?: number;
pingTimeout?: number;
highPingThreshold?: number;
};
}
| {
case: 'initializeResponse';
Expand All @@ -29,6 +34,14 @@ export type IPCMessage =
case: 'shutdownRequest';
value: { reason?: string };
}
| {
case: 'inferenceRequest';
value: { method: string; requestId: string; data: unknown };
}
| {
case: 'inferenceResponse';
value: { requestId: string; data: unknown; error?: Error };
}
| {
case: 'exiting';
value: { reason?: string };
Expand Down
22 changes: 21 additions & 1 deletion agents/src/ipc/proc_pool.ts
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
import { MultiMutex, Mutex } from '@livekit/mutex';
import type { RunningJobInfo } from '../job.js';
import { Queue } from '../utils.js';
import { InferenceExecutor } from './inference_executor.js';
import type { JobExecutor } from './job_executor.js';
import { ProcJobExecutor } from './proc_job_executor.js';

Expand All @@ -20,19 +21,28 @@ export class ProcPool {
procMutex?: MultiMutex;
procUnlock?: () => void;
warmedProcQueue = new Queue<JobExecutor>();
inferenceExecutor: InferenceExecutor;
memoryWarnMB: number;
memoryLimitMB: number;

constructor(
agent: string,
numIdleProcesses: number,
initializeTimeout: number,
closeTimeout: number,
inferenceExecutor: InferenceExecutor,
memoryWarnMB: number,
memoryLimitMB: number,
) {
this.agent = agent;
if (numIdleProcesses > 0) {
this.procMutex = new MultiMutex(numIdleProcesses);
}
this.initializeTimeout = initializeTimeout;
this.closeTimeout = closeTimeout;
this.inferenceExecutor = inferenceExecutor;
this.memoryWarnMB = memoryWarnMB;
this.memoryLimitMB = memoryLimitMB;
}

get processes(): JobExecutor[] {
Expand All @@ -52,7 +62,17 @@ export class ProcPool {
this.procUnlock = undefined;
}
} else {
proc = new ProcJobExecutor(this.agent, this.initializeTimeout, this.closeTimeout);
proc = new ProcJobExecutor(
this.agent,
this.initializeTimeout,
this.closeTimeout,
this.inferenceExecutor,
2500,
60000,
500,
this.memoryWarnMB,
this.memoryLimitMB,
);
this.executors.push(proc);
await proc.start();
await proc.initialize();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import type { ProcOpts } from './job_executor.js';
import { JobExecutor } from './job_executor.js';
import type { IPCMessage } from './message.js';

export class ProcJobExecutor extends JobExecutor {
export class SupervisedProc {
#opts: ProcOpts;
#started = false;
#closing = false;
Expand All @@ -22,12 +22,26 @@ export class ProcJobExecutor extends JobExecutor {
#join = new Future();
#logger = log().child({ runningJob: this.#runningJob });

constructor(agent: string, initializeTimeout: number, closeTimeout: number) {
super();
constructor(
agent: string,
initializeTimeout: number,
closeTimeout: number,
// XXX(nbsp): memoryWarnMB and memoryLimitMB are currently stubbed and serve no use
memoryWarnMB: number,
memoryLimitMB: number,
pingInterval: number,
pingTimeout: number,
highPingThreshold: number,
) {
this.#opts = {
agent,
initializeTimeout,
closeTimeout,
memoryWarnMB,
memoryLimitMB,
pingInterval,
pingTimeout,
highPingThreshold,
};
}

Expand Down Expand Up @@ -61,21 +75,21 @@ export class ProcJobExecutor extends JobExecutor {

this.#pingInterval = setInterval(() => {
this.#proc!.send({ case: 'pingRequest', value: { timestamp: Date.now() } });
}, this.PING_INTERVAL);
}, this.#opts.pingInterval);

this.#pongTimeout = setTimeout(() => {
this.#logger.warn('job is unresponsive');
clearTimeout(this.#pongTimeout);
clearInterval(this.#pingInterval);
this.#proc!.kill();
this.#join.resolve();
}, this.PING_TIMEOUT);
}, this.#opts.pingTimeout);

const listener = (msg: IPCMessage) => {
switch (msg.case) {
case 'pongResponse': {
const delay = Date.now() - msg.value.timestamp;
if (delay > this.HIGH_PING_THRESHOLD) {
if (delay > this.#opts.highPingThreshold) {
this.#logger.child({ delay }).warn('job executor is unresponsive');
}
this.#pongTimeout?.refresh();
Expand Down Expand Up @@ -119,7 +133,15 @@ export class ProcJobExecutor extends JobExecutor {
this.#init.reject(err);
throw err;
}, this.#opts.initializeTimeout);
this.#proc!.send({ case: 'initializeRequest', value: { loggerOptions } });
this.#proc!.send({
case: 'initializeRequest',
value: {
loggerOptions,
pingInterval: this.#opts.pingInterval,
pingTimeout: this.#opts.pingTimeout,
highPingThreshold: this.#opts.highPingThreshold,
},
});
await once(this.#proc!, 'message').then(([msg]: IPCMessage[]) => {
clearTimeout(timer);
if (msg!.case !== 'initializeResponse') {
Expand Down
Loading
Loading