RAG chat: Deploying with minimal costs

This AI RAG chat application is designed to be easily deployed using the Azure Developer CLI, which provisions the infrastructure according to the Bicep files in the infra folder. Those files describe each of the Azure resources needed and configure their SKU (pricing tier) and other parameters. Many Azure services offer a free tier, but the infrastructure files in this project do not default to it, because the free tier often comes with limitations.

However, if your goal is to minimize costs while prototyping your application, follow the steps below before running azd up. Once you've gone through these steps, return to the deployment steps.

📺 Live stream: Deploying from a free account

  1. Log in to your Azure account using the Azure Developer CLI:

    azd auth login
  2. Create a new azd environment for the free resource group:

    azd env new

    Enter a name that will be used for the resource group. This will create a new folder in the .azure folder, and set it as the active environment for any calls to azd going forward.

  3. Use the free tier of App Service:

    azd env set AZURE_APP_SERVICE_SKU F1

    Limitation: You are only allowed a certain number of free App Service instances per region. If you have exceeded your limit in a region, you will get an error during the provisioning stage. If that happens, you can run azd down, then azd env new to create a new environment with a new region.

  4. Use the free tier of Azure AI Search:

    azd env set AZURE_SEARCH_SERVICE_SKU free

    Limitations:

    1. You are only allowed one free search service across all regions. If you have one already, either delete that service or follow instructions to reuse your existing search service.
    2. The free tier does not support semantic ranker, so the app UI will no longer display the option to use the semantic ranker. Note that this will generally result in decreased search relevance.
  5. Use the free tier of Azure Document Intelligence (used in analyzing files):

    azd env set AZURE_DOCUMENTINTELLIGENCE_SKU F0

    Limitation for PDF files:

    The free tier will only scan the first two pages of each PDF. In our sample documents, those first two pages are just title pages, so you won't be able to get answers from the documents. You can either use your own documents that are at most two pages long, or you can use a local Python package for PDF parsing by setting:

    azd env set USE_LOCAL_PDF_PARSER true

    Limitation for HTML files:

    The free tier will only scan the first two pages of each HTML file, so you might not get very accurate answers from the files. You can either use your own files that are at most two pages long, or you can use a local Python package for HTML parsing by setting:

    azd env set USE_LOCAL_HTML_PARSER true
  6. Use the free tier of Azure Cosmos DB:

    azd env set AZURE_COSMOSDB_SKU free

    Limitation: You can have only one free Cosmos DB account. To keep your account free of charge, ensure that you do not exceed the free tier limits. For more information, see the Azure Cosmos DB lifetime free tier.

  7. Turn off Azure Monitor (Application Insights):

    azd env set AZURE_USE_APPLICATION_INSIGHTS false

    Application Insights is already quite inexpensive, so the savings from turning it off may not be worth losing the monitoring data, but it is an option for those who want to minimize costs.

  8. Use OpenAI.com instead of Azure OpenAI: This step is only necessary for Azure free/student accounts, as they do not currently have access to Azure OpenAI.

    azd env set OPENAI_HOST openai
    azd env set OPENAI_ORGANIZATION {Your OpenAI organization}
    azd env set OPENAI_API_KEY {Your OpenAI API key}

    Both Azure OpenAI and openai.com OpenAI accounts will incur costs, based on tokens used, but the costs are fairly low for the amount of sample data (less than $10).

  9. Disable vector search:

    azd env set USE_VECTORS false

    By default, the application computes vector embeddings for documents during the data ingestion phase, and then computes a vector embedding for each user question asked in the application. Those computations require an embedding model, which incurs a cost per token used. The costs are fairly low, so the benefits of vector search would typically outweigh the costs, but it is possible to disable vector support. If you do so, the application will fall back to a keyword search, which is less accurate.

  10. Once you've made the desired customizations, follow the steps in the README to run azd up. We recommend using "eastus" as the region, for availability reasons.
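
The settings from steps 3 to 9 can also be applied in one pass. The script below is a minimal sketch, not part of the project: the `set_var` and `apply_lowcost_settings` helpers and the `DRY_RUN` switch are hypothetical names introduced here for illustration, and the script assumes you have already run `azd auth login` and `azd env new`. It leaves out the OpenAI.com variables from step 8, since those hold account-specific secrets and are only needed for free/student accounts.

```shell
#!/usr/bin/env bash
# Sketch: apply the low-cost settings from the steps above to the active azd environment.
# DRY_RUN (hypothetical switch) defaults to 1, which only prints the commands.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Print the azd command in dry-run mode, otherwise run it.
set_var() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "azd env set $1 $2"
  else
    azd env set "$1" "$2"
  fi
}

apply_lowcost_settings() {
  set_var AZURE_APP_SERVICE_SKU F1              # step 3: free App Service tier
  set_var AZURE_SEARCH_SERVICE_SKU free         # step 4: free Azure AI Search tier
  set_var AZURE_DOCUMENTINTELLIGENCE_SKU F0     # step 5: free Document Intelligence tier
  set_var USE_LOCAL_PDF_PARSER true             # step 5: avoid the two-page PDF limit
  set_var USE_LOCAL_HTML_PARSER true            # step 5: avoid the two-page HTML limit
  set_var AZURE_COSMOSDB_SKU free               # step 6: free Cosmos DB tier
  # The next two are optional trade-offs; remove them to keep monitoring or vector search.
  set_var AZURE_USE_APPLICATION_INSIGHTS false  # step 7
  set_var USE_VECTORS false                     # step 9
}

apply_lowcost_settings
```

Run it as-is to review the commands, then rerun with `DRY_RUN=0` to apply them. Afterwards, `azd env get-values` shows what the active environment holds.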

Reducing costs locally

To save costs during local development, you could use an OpenAI-compatible model. Follow the steps in the local development guide.