The best option is to call Furhat from your frontend code (SpeechState + XState), in contrast with the backend code (XState only) that you used in Lab 3. This is a new feature of SpeechState which has not been released yet. To use it:
- Update SpeechState to the “pr7” tag:
  `yarn up speechstate@pr7`
- Follow the example in `test/furhat.test.ts`. You can ignore the test
  functions and focus on the definition of the statechart (inside
  `setup(...)`). You can integrate this code into your app to enable Furhat
  lipsync. This is how it works (see the statechart sketch below this list):
  - SpeechState emits a new event, `FURHAT_BLENDSHAPES`. It sends a number of
    such events to control Furhat's animation.
  - Your code reacts to these events and invokes the `fhBlendShape` actor,
    which sends the lip animation parameters to Furhat.
  - ASR and TTS happen in the browser, not on the Furhat, so you don't need
    to invoke the `furhat/say` and `furhat/listen` methods.
- You can implement a bunch of other animations for Furhat, just like you did in Lab 3. They will be blended together with lip movements, e.g. you can make Furhat smile while speaking.
- Unlike the backend code, the frontend code has to deal with CORS: the
  browser blocks cross-origin calls to the Furhat API because the API does
  not send the required CORS headers. The easiest way to get around this
  limitation is to run a proxy server:
  - Clone the cors-anywhere repository into a different folder (not inside your project).
  - Install the dependencies: `yarn install`
  - Run the server: `node server.js`
  - From now on, prepend `http://localhost:8080/` to the Remote API URL when
    calling Furhat, for instance:
    `http://localhost:8080/http://127.0.0.1:54321/furhat/attend?user=CLOSEST`
    (see the fetch sketch below this list).
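
Below is a minimal sketch of how a statechart might react to `FURHAT_BLENDSHAPES`. It is not the actual contents of `test/furhat.test.ts`: the machine name, state names, the placeholder actor body, and the way the event payload is passed to the actor are all assumptions for illustration, so copy the real `fhBlendShape` actor and its wiring from the test file.

```ts
import { fromPromise, setup, spawnChild } from "xstate";

export const dmMachine = setup({
  actors: {
    // Placeholder actor: in your app, use the fhBlendShape actor from
    // test/furhat.test.ts, which forwards the parameters to Furhat.
    fhBlendShape: fromPromise<void, unknown>(async ({ input }) => {
      console.log("would send blendshape parameters to Furhat:", input);
    }),
  },
}).createMachine({
  initial: "Idle",
  states: {
    Idle: {
      on: {
        // SpeechState emits a stream of these events while speaking;
        // each one carries lip animation parameters for Furhat.
        FURHAT_BLENDSHAPES: {
          actions: spawnChild("fhBlendShape", {
            // Passing the whole event is an assumption; check the test file
            // for the exact payload shape.
            input: ({ event }) => event,
          }),
        },
      },
    },
  },
});
```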
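
For the HTTP call itself, you can use `fetch` with the proxied URL. The sketch below reuses the attend example above; it assumes the same POST-style Remote API requests as in Lab 3 and that Furhat's Remote API is reachable at `127.0.0.1:54321`, so adjust the address and endpoint to your setup.

```ts
const PROXY = "http://localhost:8080/";
const FURHAT_API = "http://127.0.0.1:54321/furhat";

// Make Furhat attend to the closest user, going through the CORS proxy.
async function fhAttendClosest() {
  return fetch(`${PROXY}${FURHAT_API}/attend?user=CLOSEST`, {
    method: "POST",
    headers: { accept: "application/json" },
  });
}
```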
There is a LLaVA model available on mltgpu. It is a little tricky to provide the image in the HTTP call, because it has to be base64-encoded.
The easiest way to base64-encode your image is to load it onto a canvas (with `id="canvas"`, see the example below) and then use the following method:
```ts
const canvas = <HTMLCanvasElement>document.getElementById("canvas");
// Strip the "data:image/jpeg;base64," prefix to keep only the raw base64 data
const image = canvas.toDataURL("image/jpeg").split(";base64,")[1];
```
Then you can call the completion API with the following body payload:
```ts
JSON.stringify({
  model: "llava",
  stream: false,
  prompt: "What's on this image?",
  images: [image],
});
```
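
For reference, a full request could look like the sketch below. It assumes the completion endpoint follows an Ollama-style `/api/generate` API (which matches the payload fields above); the host in `COMPLETION_URL` is only a placeholder, so use the actual mltgpu address given in the course instructions.

```ts
// Placeholder address: replace with the actual mltgpu completion endpoint.
const COMPLETION_URL = "http://<mltgpu-address>/api/generate";

async function describeImage(image: string) {
  const response = await fetch(COMPLETION_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llava",
      stream: false,
      prompt: "What's on this image?",
      images: [image], // the base64 string extracted from the canvas
    }),
  });
  return response.json();
}
```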
```html
<canvas id="canvas" height="627" width="627"></canvas>
```

```ts
const canvas = <HTMLCanvasElement>document.getElementById("canvas");
const ctx = canvas.getContext("2d")!;
const img = new Image();
// Needed so that canvas.toDataURL() works with a cross-origin image;
// without it the canvas becomes tainted and toDataURL() throws a SecurityError.
img.crossOrigin = "anonymous";
// ctx.strokeRect(1, 1, 626, 626); // to draw the canvas border
img.onload = () => {
  // Scale the image to fit the canvas while preserving its aspect ratio, and center it.
  const hRatio = canvas.width / img.width;
  const vRatio = canvas.height / img.height;
  const ratio = Math.min(hRatio, vRatio);
  const centerShift_x = (canvas.width - img.width * ratio) / 2;
  const centerShift_y = (canvas.height - img.height * ratio) / 2;
  ctx.drawImage(img, 0, 0, img.width, img.height, centerShift_x, centerShift_y, img.width * ratio, img.height * ratio);
};
img.src =
  "https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/G%C3%B6teborg_2503_stitch_%2828573994096%29.jpg/1280px-G%C3%B6teborg_2503_stitch_%2828573994096%29.jpg";
```