Test New LLMs (Llama2, CodeLlama, etc.) on Chat-UI? #3

krrishdholakia · 2023-09-29T18:51:12Z

Notice you forked chat-ui. if you're trying to test other LLMs (codellama, wizardcoder, etc.) with it, I just wrote a 1-click proxy to translate openai calls to huggingface, anthropic, togetherai, etc. api calls.

code

$ pip install litellm

$ litellm --model huggingface/bigcode/starcoder

#INFO:     Uvicorn running on http://0.0.0.0:8000

>> openai.api_base = "http://0.0.0.0:8000"

Here's the PR on adding openai to chat-ui: huggingface#452

I'd love to know if this solves a problem for you

This reverts commit 87c6937.

* Fix reuqest body * update webSearchQueryPromptTemplate * update generate google query parser * Add today's date to google search query creator * crawl top stories if exts; remove answer_box & knowledgeGraph * Create paragraph chunks from top articles * flattened paragprah chunks * update status texts * add gradio client * call gradio app for RAG * Web scrape only "p, li, span" els * add MAX_N_CHUNKS * gradio result typing * parse only <p> elements * rm dev change * update typing WebSearch * buld RAG prompt * Rm dev change * change websearch context msg from user to assisntat type * use hosted gradio app * fix lint * prompt engineering * more prompt engineering * MAX_N_PAGES_SCRAPE = 10 * better error msg * more prompt engineering * revert websearch prompt to previous * rm `top_stories` from websearch as the results are not good * Stop using gradio client, use regular fetch * chore * Rm websearchsummary references as it is no longer used * update readme * Apply suggestions from code review Co-authored-by: Julien Chaumond <[email protected]> * Use tfjs to do embeddings in server node * fix websearch component disapperar after finishing generation * Show sources of closest embeddings used in RAG * fix prompting and also add current date * add comment * comment for search query * sources * hide www * using hostname direclty * Show successful web pages instead of failed ones * rm noisy messages * google query generation using previous messaages as context * handle falcon generation * bring back Browsing webpage msg --------- Co-authored-by: Julien Chaumond <[email protected]> Co-authored-by: Victor Mustar <[email protected]>

* Update README.md * add description of websearch on readme * Apply suggestions from code review Co-authored-by: Victor Muštar <[email protected]> * Update README.md --------- Co-authored-by: Mishig Davaadorj <[email protected]> Co-authored-by: Mishig <[email protected]>

* adjustments and mobile modal * use dvh unit * margin

* Add latex support with marked-katex-extension * Add renderer * Fix marked default option problem * Fix linting error * Fix lock error

* Bump mongodb from 5.3.0 to 5.8.0 Bumps [mongodb](https://github.com/mongodb/node-mongodb-native) from 5.3.0 to 5.8.0. - [Release notes](https://github.com/mongodb/node-mongodb-native/releases) - [Changelog](https://github.com/mongodb/node-mongodb-native/blob/v5.8.0/HISTORY.md) - [Commits](mongodb/node-mongodb-native@v5.3.0...v5.8.0) --- updated-dependencies: - dependency-name: mongodb dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * Store IP in messageEvents * IP based rate limit * Revert "IP based rate limit" This reverts commit 87c6937. * ip rate limit * move rate limit event to top * Add rate limiting to websearch and title summary (huggingface#433) * [Websearch] update (huggingface#427) * Fix reuqest body * update webSearchQueryPromptTemplate * update generate google query parser * Add today's date to google search query creator * crawl top stories if exts; remove answer_box & knowledgeGraph * Create paragraph chunks from top articles * flattened paragprah chunks * update status texts * add gradio client * call gradio app for RAG * Web scrape only "p, li, span" els * add MAX_N_CHUNKS * gradio result typing * parse only <p> elements * rm dev change * update typing WebSearch * buld RAG prompt * Rm dev change * change websearch context msg from user to assisntat type * use hosted gradio app * fix lint * prompt engineering * more prompt engineering * MAX_N_PAGES_SCRAPE = 10 * better error msg * more prompt engineering * revert websearch prompt to previous * rm `top_stories` from websearch as the results are not good * Stop using gradio client, use regular fetch * chore * Rm websearchsummary references as it is no longer used * update readme * Apply suggestions from code review Co-authored-by: Julien Chaumond <[email protected]> * Use tfjs to do embeddings in server node * fix websearch component disapperar after finishing generation * Show sources of closest embeddings used in RAG * fix prompting and also add current date * add comment * comment for search query * sources * hide www * using hostname direclty * Show successful web pages instead of failed ones * rm noisy messages * google query generation using previous messaages as context * handle falcon generation * bring back Browsing webpage msg --------- Co-authored-by: Julien Chaumond <[email protected]> Co-authored-by: Victor Mustar <[email protected]> * bump to 0.6.0 (huggingface#434) * Update README.md (huggingface#435) * Update README.md * add description of websearch on readme * Apply suggestions from code review Co-authored-by: Victor Muštar <[email protected]> * Update README.md --------- Co-authored-by: Mishig Davaadorj <[email protected]> Co-authored-by: Mishig <[email protected]> * Mobile: fix model selection (huggingface#448) * adjustments and mobile modal * use dvh unit * margin * fix lint on main * Add latex support with marked-katex-extension (huggingface#450) * Add latex support with marked-katex-extension * Add renderer * Fix marked default option problem * Fix linting error * Fix lock error --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Nathan Sarrazin <[email protected]> Co-authored-by: Mishig <[email protected]> Co-authored-by: Julien Chaumond <[email protected]> Co-authored-by: Victor Mustar <[email protected]> Co-authored-by: Mishig Davaadorj <[email protected]> Co-authored-by: Blanchon <[email protected]>

…gingface#451) * feat: Improve error handling and parsing of MODELS environment variable * Add more verbose parsing error * Lint * improve message * lint * refactor error handling and default values in models * improve * format --------- Co-authored-by: Nathan Sarrazin <[email protected]>

* Use `gte-base` as the emebdding model * use `bge-small-en-v1.5` * Revert "use `bge-small-en-v1.5`" This reverts commit 8cfe084. * Use `gte-small`

This reverts commit f88542b.

…ted (huggingface#451)" This reverts commit 8ce8b63.

This reverts commit 1061bc2.

* wip: complete refactor of streaming backend * working refactoring * fix missing first token & perf regression in output quality * lint * Fix websearch loading from db * fix loading * fix invalidate * remove logs * fix SSR error * typo: paragraphs * fixed save on abort * lint * lint * remove debug log in console * lint for real

* Refactor summarization * get rid of debug log * remove old todo

* fix JSON.parse for summerize When serving with TGI, summerize calls this function and it errors with `SyntaxError: Unexpected token d in JSON at position 0` This PR fixes the problem and keeps existing behaviour. * fix types --------- Co-authored-by: Nathan Sarrazin <[email protected]>

* add-copytoclipboardbtn for the all message * fix padding * fix padding * Fix styling * Move before like and dislike button * position and spacing * mobile fix --------- Co-authored-by: Victor Mustar <[email protected]>

nsarrazin and others added 27 commits September 12, 2023 08:54

Store IP in messageEvents

b8c0a1d

IP based rate limit

87c6937

Revert "IP based rate limit"

2e8d14d

This reverts commit 87c6937.

ip rate limit

6ee13bf

move rate limit event to top

ba93cf8

Add rate limiting to websearch and title summary (huggingface#433)

0953d85

bump to 0.6.0 (huggingface#434)

e5afba2

Mobile: fix model selection (huggingface#448)

c867764

* adjustments and mobile modal * use dvh unit * margin

fix lint on main

77df078

Add latex support with marked-katex-extension (huggingface#450)

15bf16f

* Add latex support with marked-katex-extension * Add renderer * Fix marked default option problem * Fix linting error * Fix lock error

Update embedding model for WebSearch (huggingface#437)

f88542b

* Use `gte-base` as the emebdding model * use `bge-small-en-v1.5` * Revert "use `bge-small-en-v1.5`" This reverts commit 8cfe084. * Use `gte-small`

Revert "Update embedding model for WebSearch (huggingface#437)"

1061bc2

This reverts commit f88542b.

Revert "Improve error message when the .env MODELS is not well format…

7ddda31

…ted (huggingface#451)" This reverts commit 8ce8b63.

Revert "Revert "Update embedding model for WebSearch (huggingface#437)""

aa07e29

This reverts commit 1061bc2.

Update README.md (huggingface#455)

afbf680

Refactor summarization so it gets called from backend (huggingface#456)

9960338

* Refactor summarization * get rid of debug log * remove old todo

Make embedding model settings more future-proof (huggingface#454)

3acc11d

error console instead of crashing

5b07906

fix types

0134fe1

Add a message wide copy button (huggingface#453)

a7dc1aa

* add-copytoclipboardbtn for the all message * fix padding * fix padding * Fix styling * Move before like and dislike button * position and spacing * mobile fix --------- Co-authored-by: Victor Mustar <[email protected]>

Update README.md

af76417

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test New LLMs (Llama2, CodeLlama, etc.) on Chat-UI? #3

Test New LLMs (Llama2, CodeLlama, etc.) on Chat-UI? #3

krrishdholakia commented Sep 29, 2023 •

edited

Loading

Test New LLMs (Llama2, CodeLlama, etc.) on Chat-UI? #3

Are you sure you want to change the base?

Test New LLMs (Llama2, CodeLlama, etc.) on Chat-UI? #3

Conversation

krrishdholakia commented Sep 29, 2023 • edited Loading

krrishdholakia commented Sep 29, 2023 •

edited

Loading