5. Cloud Run

Now that we have all the code for our LangChain RetrievalQA agent, we package it as an API and deploy it on Cloud Run to expose the functionality.

  • All the required services are enabled in Terraform. See the cloud-run.tf file
  • The API uses the Google Cloud Python client libraries to connect to the Vertex AI Matching Engine Index Endpoint and retrieve the top k nearest neighbors
  • The API uses FastAPI to expose the endpoints
  • We use a custom Dockerfile to build the API image, including its dependencies and the pre-trained models needed to generate the embeddings
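
To make "top k nearest neighbors" concrete, here is a minimal brute-force sketch in plain Python. It is only an illustration: the Matching Engine Index Endpoint performs this lookup as a managed, approximate search over a much larger index, and the names `cosine_similarity`, `top_k_neighbors`, and `toy_index`, as well as the choice of cosine similarity, are assumptions made for this example rather than anything taken from api/main.py.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_neighbors(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, similarity) pairs most similar to the query.

    Hypothetical helper: scores every vector in the index, which is only
    practical for tiny examples like this one.
    """
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy two-dimensional "embeddings" (made up for illustration).
toy_index = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [0.9, 0.1],
}

neighbors = top_k_neighbors([1.0, 0.0], toy_index, k=2)
print(neighbors)
```

In the deployed API, the query vector would come from the embedding model bundled in the image, and the index lookup would be delegated to the Matching Engine endpoint instead of a Python loop.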

The source for the API can be found at api/main.py.

The API exposes two endpoints:

  1. A retrieval-only endpoint that returns the top k nearest neighbors
  2. A full QA system that takes a query and uses Vertex AI PaLM to generate an answer
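
The split between the two endpoints can be sketched as two handler functions that share a retriever. This is a framework-free sketch under stated assumptions, not the code in api/main.py: `Neighbor`, `handle_retrieve`, `handle_qa`, `build_prompt`, and the two `fake_*` stubs are hypothetical names, the prompt wording is invented, and the stubs stand in for the Matching Engine lookup and the PaLM call. In a FastAPI app, each handler would typically be registered as a route.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Neighbor:
    """One retrieved document chunk (hypothetical shape)."""
    doc_id: str
    text: str
    distance: float


# Type aliases for the two pluggable dependencies.
Retriever = Callable[[str, int], List[Neighbor]]
AnswerGenerator = Callable[[str], str]


def build_prompt(question: str, neighbors: List[Neighbor]) -> str:
    """Combine the retrieved text and the question into one LLM prompt."""
    context = "\n".join(f"- {n.text}" for n in neighbors)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def handle_retrieve(query: str, k: int, retriever: Retriever) -> List[Neighbor]:
    """Endpoint 1: return only the top k nearest neighbors."""
    return retriever(query, k)


def handle_qa(
    query: str, k: int, retriever: Retriever, generator: AnswerGenerator
) -> Dict[str, object]:
    """Endpoint 2: retrieve context, then ask the LLM for an answer."""
    neighbors = retriever(query, k)
    answer = generator(build_prompt(query, neighbors))
    return {"answer": answer, "sources": [n.doc_id for n in neighbors]}


def fake_retriever(query: str, k: int) -> List[Neighbor]:
    """Stub standing in for the Matching Engine lookup."""
    docs = [
        Neighbor("doc-1", "Cloud Run runs containers.", 0.1),
        Neighbor("doc-2", "Cloud Build builds images.", 0.2),
        Neighbor("doc-3", "Terraform provisions infrastructure.", 0.3),
    ]
    return docs[:k]


def fake_generator(prompt: str) -> str:
    """Stub standing in for the Vertex AI PaLM call."""
    return "stub answer"


result = handle_qa("What runs containers?", 2, fake_retriever, fake_generator)
print(result)
```

Passing the retriever and generator in as arguments keeps both handlers testable without network access; the real service would supply functions that call Vertex AI.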

A deployed version of this API, along with its OpenAPI endpoint, is available here.

Cloud Build

We use Cloud Build to build a Docker container image from the source code, dependencies, and Dockerfile, upload it to Google Artifact Registry, and deploy it to Cloud Run. See the cloudbuild.yml file for more details.

Next steps

Next, we can create a simple web UI to interact with the API.

Resources