Merge pull request #192 from nulib/4570-prompt-markdown

Adjust prompting for markdown and collection
nulib · Mar 13, 2024 · b0f8d86
2 parents c619c82 + 066bd22
commit b0f8d86

Showing 3 changed files with 19 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -132,6 +132,22 @@ bin/start-with-step
 aws stepfunctions create-state-machine --endpoint http://localhost:8083 --definition file://state_machines/av_download.json --name "hlsStitcherStepFunction" --role-arn arn:aws:iam::012345678901:role/DummyRole
 ```
 
+## Deploying a development branch
+
+```
+# `sam sync --watch` hot-deploys changes as you make them. If you don't want that, replace it with `sam sync` or `sam deploy` in the command below.
+
+export STACK_NAME=dc-api-yourdevprefix
+export CONFIG_ENV=staging
+
+sam sync --watch --stack-name $STACK_NAME \
+  --config-env $CONFIG_ENV \
+  --config-file ./samconfig.toml \
+  --parameter-overrides $(while IFS='=' read -r key value; do params+=" $key=$value"; done < ./$CONFIG_ENV.parameters && echo "$params CustomDomainHost=$STACK_NAME")
+```
+
+This gives you API routes such as `https://dc-api-yourdevprefix.rdc-staging.library.northwestern.edu/chat-endpoint`.
+
 ## Deploying the API manually
 
 - Symlink the `*.parameters` file you need from `tfvars/dc-api/` to the application root

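Note on the `--parameter-overrides` subshell in the README hunk above: it reads `KEY=VALUE` lines from `./$CONFIG_ENV.parameters`, joins them with spaces, and appends `CustomDomainHost=$STACK_NAME`. A rough Python equivalent is sketched below for readers who find the one-liner hard to parse. The function name `build_parameter_overrides` is hypothetical and not part of this repository, and unlike the shell loop the sketch skips blank lines.

```python
def build_parameter_overrides(lines, stack_name):
    """Join KEY=VALUE lines into one space-separated override string.

    Hypothetical helper mirroring the shell loop in the README; it is an
    illustration only, not code from this repository.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # Simplification: the shell loop does not skip blank lines.
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        pairs.append(f"{key}={value}")
    # The README command appends the custom domain host last.
    pairs.append(f"CustomDomainHost={stack_name}")
    return " ".join(pairs)


if __name__ == "__main__":
    sample = ["ApiTokenName=example\n", "Flags=a=b\n"]
    print(build_parameter_overrides(sample, "dc-api-yourdevprefix"))
```

The parameter names in the sample are made up; real values come from the `*.parameters` file.
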
diff --git a/chat/src/helpers/prompts.py b/chat/src/helpers/prompts.py
@@ -2,10 +2,11 @@
 
 
 def prompt_template() -> str:
-    return """Please answer the question based on the documents provided, and include some details about why the documents might be relevant to the particular question:
+    return """Please answer the question based on the documents provided, and include some details about why the documents might be relevant to the particular question. The 'title' field is the document title, and the 'source' field is a UUID that uniquely identifies each document:
 
 Documents:
 {context}
+Format the answer as raw markdown. Insert links when referencing documents by title using it's UUID, as in the following guide: [title](https://dc.library.northwestern.edu/items/UUID). Example: [Judy Collins, Jackson Hole Folk Festival](https://dc.library.northwestern.edu/items/f1ca513b-7d13-4af6-ad7b-8c7ffd1d3a37). Suggest keywords searches using the following guide (example: [jazz musicians](https://dc.library.northwestern.edu/search?q=Jazz+musicians)). Offer search terms that vary in scope, highlight specific individuals or groups, or delve deeper into a topic. Remember to include as many direct links to Digital Collections searches as needed for comprehensive study. The `collection` field contains information about the collection the document belongs to. When many of the documents are from the same collection, mention the collection and link to the collection using the collection title and id: [collection['title']](https://dc.library.northwestern.edu/collections/collection['id']), for example [World War II Poster Collection](https://dc.library.northwestern.edu/collections/faf4f60e-78e0-4fbf-96ce-4ca8b4df597a):
 
 Question:
 {question}

diff --git a/chat/test/helpers/test_metrics.py b/chat/test/helpers/test_metrics.py
@@ -48,7 +48,7 @@ def test_token_usage(self):
 
         expected_result = {
             "answer": 6,
-            "prompt": 36,
+            "prompt": 328,
             "question": 15,
             "source_documents": 1,
         }
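
Note on the new prompt text in `chat/src/helpers/prompts.py`: it asks the model to emit three kinds of markdown links (items by UUID, keyword searches, and collections by id). The sketch below builds the same link shapes in Python, which may be useful for checking model output in a test. The helper names are hypothetical and do not exist in this repository; only the URL patterns come from the prompt itself.

```python
from urllib.parse import quote_plus

# Base URL taken from the examples in the prompt text.
BASE_URL = "https://dc.library.northwestern.edu"


def item_link(title, uuid):
    """Markdown link to a single document, keyed by its 'source' UUID."""
    return f"[{title}]({BASE_URL}/items/{uuid})"


def search_link(terms):
    """Markdown link to a keyword search; spaces are encoded as '+'."""
    return f"[{terms}]({BASE_URL}/search?q={quote_plus(terms)})"


def collection_link(collection):
    """Markdown link to a collection, given a dict with 'title' and 'id'."""
    return f"[{collection['title']}]({BASE_URL}/collections/{collection['id']})"


if __name__ == "__main__":
    print(item_link("Judy Collins, Jackson Hole Folk Festival",
                    "f1ca513b-7d13-4af6-ad7b-8c7ffd1d3a37"))
    print(search_link("jazz musicians"))
    print(collection_link({"title": "World War II Poster Collection",
                           "id": "faf4f60e-78e0-4fbf-96ce-4ca8b4df597a"}))
```

The item and collection lines reproduce the examples embedded in the prompt. The search example in the prompt capitalizes the query (`Jazz+musicians`), whereas this sketch keeps the link text's casing.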