Some notes regarding the project

If you want to call Furhat and also show something on the screen…

The best option would be to call Furhat from your frontend code (SpeechState + XState), in contrast with the backend code (only XState) that you used in Lab 3. This is a new feature of SpeechState which has not been released yet. To use it:

  1. Update SpeechState to “pr7” tag:
    yarn up speechstate@pr7  
        
  2. Follow the example in test/furhat.test.ts. You can ignore the test functions and focus on the definition of the statechart (inside setup(...)). You can integrate this code into your app to enable Furhat lipsync. This is how it works (see the sketch after this list):
    • SpeechState emits a new event, FURHAT_BLENDSHAPES. It sends a stream of such events to control Furhat's animation.
    • Your code reacts to these events and invokes the fhBlendShape actor, which sends the lip animation parameters to Furhat.
    • ASR and TTS happen in the browser, not on the Furhat, so you don't need to invoke the furhat/say and furhat/listen methods.
  3. You can implement a bunch of other animations for Furhat, just like you did in Lab 3. They will be blended together with lip movements, e.g. you can make Furhat smile while speaking.
  4. Unlike the backend code, the frontend code has to deal with something called CORS: the Furhat API will block all calls coming from the browser. The easiest way to get around this limitation is to run a proxy server (the sketch after this list assumes it is running):
    • Clone the cors-anywhere repository into a different folder (not inside your project)
    • Install the dependencies:
      yarn install
              
    • Run the server:
      node server.js
              
    • From now on, to call Furhat, prepend the Remote API URL with http://localhost:8080/, for instance:
      http://localhost:8080/http://127.0.0.1:54321/furhat/attend?user=CLOSEST
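Here is a minimal sketch of how the pieces above could fit together in your frontend statechart. It is not a copy of test/furhat.test.ts: the gesture endpoint, the shape of the event payload (event.value), the query parameter and the assumption that the emitted events reach your dialogue machine directly are all guesses, so copy the exact actor definition and event handling from the test file.

import { setup, fromPromise, spawnChild } from "xstate";

// The Furhat Remote API reached through the cors-anywhere proxy
// (adjust the IP and port to match your robot or virtual Furhat).
const FURHAT = "http://localhost:8080/http://127.0.0.1:54321/furhat";

const dmMachine = setup({
  actors: {
    // Hypothetical version of the fhBlendShape actor: it forwards the lip
    // animation parameters from SpeechState to Furhat. The endpoint and the
    // body format are assumptions; use the definition from test/furhat.test.ts.
    fhBlendShape: fromPromise<Response, { frames: unknown }>(({ input }) =>
      fetch(`${FURHAT}/gesture?blocking=false`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(input.frames),
      }),
    ),
  },
}).createMachine({
  id: "dm",
  initial: "Idle",
  states: { Idle: {} },
  on: {
    // SpeechState sends a stream of these events while speaking; each one
    // is forwarded to Furhat to drive the lip animation.
    FURHAT_BLENDSHAPES: {
      actions: spawnChild("fhBlendShape", {
        // The payload field is an assumption; inspect the event in the test file.
        input: ({ event }) => ({ frames: (event as any).value }),
      }),
    },
  },
});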
              

If you want to generate image descriptions…

There is a LLaVA model available on mltgpu. It is a little tricky to provide the image in the HTTP call, because it has to be base64-encoded.

The easiest way to base64-encode your image is to load it onto the canvas (with id="canvas", see the example below) and then use the following method. Note that this only works after the image has finished loading, i.e. run it inside or after the img.onload handler:

const canvas = <HTMLCanvasElement>document.getElementById("canvas");
// Keep only the base64 part, dropping the "data:image/jpeg;base64," prefix
const image = canvas.toDataURL("image/jpeg").split(";base64,")[1];

Then you can use the completion API with the following body payload.

JSON.stringify({
  model: "llava",
  stream: false,
  prompt: "What's on this image?",
  images: [image],
});
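For reference, here is a sketch of the request itself, assuming the model is served through an Ollama-style /api/generate endpoint. The URL below is a placeholder and the name of the response field may differ, so check the actual mltgpu instructions you were given:

// Placeholder: replace with the actual completion endpoint on mltgpu.
const COMPLETION_URL = "http://mltgpu.example.org:11434/api/generate";

const response = await fetch(COMPLETION_URL, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llava",
    stream: false,
    prompt: "What's on this image?",
    images: [image],
  }),
});
const result = await response.json();
// In the Ollama API the generated text is in result.response; adjust if needed.
console.log(result.response);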

How to load the image onto the canvas

<canvas id="canvas" height="627" width="627"></canvas>
const canvas = <HTMLCanvasElement>document.getElementById("canvas");
const ctx = canvas.getContext("2d")!;
const img = new Image();
// Without this, drawing a cross-origin image "taints" the canvas and
// canvas.toDataURL() will throw a SecurityError later on.
img.crossOrigin = "anonymous";
// ctx.strokeRect(1, 1, 626, 626); // to draw the canvas border
img.onload = () => {
  // Scale the image to fit the canvas while keeping its aspect ratio, and center it
  let hRatio = canvas.width / img.width;
  let vRatio = canvas.height / img.height;
  let ratio = Math.min(hRatio, vRatio);
  let centerShift_x = (canvas.width - img.width * ratio) / 2;
  let centerShift_y = (canvas.height - img.height * ratio) / 2;
  ctx.drawImage(img, 0, 0, img.width, img.height, centerShift_x, centerShift_y, img.width * ratio, img.height * ratio);
};
img.src = "https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/G%C3%B6teborg_2503_stitch_%2828573994096%29.jpg/1280px-G%C3%B6teborg_2503_stitch_%2828573994096%29.jpg";