Update LangChain Support #2188
Conversation
This sample code (slightly modified from the example in the documentation)

```python
from bertopic import BERTopic
from bertopic.representation import LangChain
from langchain.chains.question_answering import load_qa_chain
from langchain_core.documents import Document
from langchain_core.runnables import RunnablePassthrough

representation_llm = ...
representation_prompt = "summarize these documents, here are keywords about them [KEYWORDS]"

chain = (
    {
        "input_documents": (
            lambda inp: [
                Document(
                    page_content=d.page_content.upper()
                )
                for d in inp["input_documents"]
            ]
        ),
        "question": RunnablePassthrough(),
    }
    | load_qa_chain(representation_llm, chain_type="stuff")
    | (lambda output: {"output_text": output["output_text"]})
)

representation_model = LangChain(chain, prompt=representation_prompt, nr_docs=2)

docs = [
    "The sky is blue and the sun is shining.",
    "I love going to the beach on sunny days.",
    "Artificial intelligence is transforming the world.",
    "Machine learning enables computers to learn from data.",
    "It's important to wear sunscreen to avoid sunburns.",
    "Deep learning models require a lot of data and computation.",
    "Today's weather forecast predicts a clear sky.",
    "Neural networks are powerful models in AI.",
    "I need to buy a new pair of sunglasses for summer.",
    "Natural language processing is a subset of AI."
]

topic_model = BERTopic(representation_model=representation_model)
topics, probabilities = topic_model.fit_transform(docs)
```

results in this prompt being created:
I'll fix this as well in the PR 😄
Thanks for your extensive work on this! I left a couple of minor comments here and there.
```
---
Topic:
Sample texts from this topic:
[DOCUMENTS]
```
Is there a newline after this when we fill in [DOCUMENTS], or should we add one? It needs to have the exact same structure as the examples above; otherwise it might hurt performance.
It seems you got this from the Cohere one and I'm not actually sure whether there is an additional newline... perhaps I should also add one there.
Yes, I wasn't sure what kind of "default" prompt you would like to use so I just copied the one from another representation.
I ran that default prompt with an example, and you can see the formatted prompt below. It seems like the default separator is to use two newlines between each document (which I guess is better for long documents). I can change this to be a single newline, and remove the "-" from the examples so that the behaviour is the same everywhere. I think I can also make it so that the documents start with the "-" (in that case the code will have to be a bit more complex to allow for a custom document formatter).
```
This is a list of texts where each collection of texts describes a topic. After each collection of texts, the name of the topic they represent is mentioned as a short, highly descriptive title.
---
Topic:
Sample texts from this topic:
- Traditional diets in most cultures were primarily plant-based with a little meat on top, but with the rise of industrial-style meat production and factory farming, meat has become a staple food.
- Meat, but especially beef, is the worst food in terms of emissions.
- Eating meat doesn't make you a bad person, not eating meat doesn't make you a good one.
Keywords: meat beef eat eating emissions steak food health processed chicken
Topic name: Environmental impacts of eating meat
---
Topic:
Sample texts from this topic:
- I have ordered the product weeks ago but it still has not arrived!
- The website mentions that it only takes a couple of days to deliver but I still have not received mine.
- I got a message stating that I received the monitor but that is not true!
- It took a month longer to deliver than was advised...
Keywords: deliver weeks product shipping long delivery received arrived arrive week
Topic name: Shipping and delivery issues
---
Topic:
Sample texts from this topic:
I love going to the beach on sunny days.

I need to buy a new pair of sunglasses for summer.

Deep learning models require a lot of data and computation.

The sky is blue and the sun is shining.
Keywords: to, the, is, of, data, and, sky, models, learning, ai, artificial, computers, computation, clear, days, for, forecast, deep, enables, going, from, in, blue, buy, avoid, beach, are, learn, language, its
Topic name:
```
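As a rough illustration of the separator / formatter options being discussed, here is a minimal sketch (the helper name and signature are hypothetical, not the PR's actual code):

```python
# Hypothetical sketch of how documents could be joined into the [DOCUMENTS]
# placeholder; not the actual implementation in this PR.
def format_documents(docs, separator="\n\n", prefix=""):
    """Join truncated documents into a single string for the prompt."""
    return separator.join(f"{prefix}{doc.strip()}" for doc in docs)

docs = ["I love going to the beach on sunny days.", "The sky is blue and the sun is shining."]

# Current default: two newlines between documents, no prefix.
print(format_documents(docs))

# Alternative matching the in-prompt examples: single newline with a leading "- ".
print(format_documents(docs, separator="\n", prefix="- "))
```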
I can change this to be a single newline, and remove the "-" from the examples so that the behaviour is the same everywhere. I think I can also make it so that the documents start with the "-" (in that case the code will have to be a bit more complex to allow for a custom document formatter).
Let's go for your first suggestion. That would minimize the amount of additional code necessary whilst still maintaining consistency in the prompting format.
I modified the code and the examples slightly to make the spacing consistent, and added the missing commas between keywords. Now the formatted default prompt looks like this:
```
================================ Human Message =================================

This is a list of texts where each collection of texts describes a topic. After each collection of texts, the name of the topic they represent is mentioned as a short, highly descriptive title.
---
Topic:
Sample texts from this topic:
Traditional diets in most cultures were primarily plant-based with a little meat on top, but with the rise of industrial-style meat production and factory farming, meat has become a staple food.
Meat, but especially beef, is the worst food in terms of emissions.
Eating meat doesn't make you a bad person, not eating meat doesn't make you a good one.
Keywords: meat, beef, eat, eating, emissions, steak, food, health, processed, chicken
Topic name: Environmental impacts of eating meat
---
Topic:
Sample texts from this topic:
I have ordered the product weeks ago but it still has not arrived!
The website mentions that it only takes a couple of days to deliver but I still have not received mine.
I got a message stating that I received the monitor but that is not true!
It took a month longer to deliver than was advised...
Keywords: deliver, weeks, product, shipping, long, delivery, received, arrived, arrive, week
Topic name: Shipping and delivery issues
---
Topic:
Sample texts from this topic:
I love going to the beach on sunny days.
I need to buy a new pair of sunglasses for summer.
Deep learning models require a lot of data and computation.
The sky is blue and the sun is shining.
Keywords: to, the, is, of, data, and, sky, models, learning, ai, artificial, computers, computation, clear, days, for, forecast, deep, enables, going, from, in, blue, buy, avoid, beach, are, learn, language, its
Topic name:
```
There are a lot of ways to create a prompt for representation generation, and as I've mentioned here, I've just taken an existing one from BERTopic and adapted it slightly. If it works for you I propose to leave it as-is, but I can always change it :)
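(For readers following along: filling the [DOCUMENTS] and [KEYWORDS] placeholders boils down to plain string substitution, roughly like the sketch below — the variable names are illustrative, not the exact internals.)

```python
# Illustrative only: how a BERTopic-style prompt template gets filled.
prompt = "Sample texts from this topic:\n[DOCUMENTS]\nKeywords: [KEYWORDS]\nTopic name:"
docs = ["The sky is blue and the sun is shining.", "Today's weather forecast predicts a clear sky."]
keywords = ["sky", "sun", "weather"]

filled = prompt.replace("[DOCUMENTS]", "\n".join(docs)).replace("[KEYWORDS]", ", ".join(keywords))
print(filled)
```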
```
llm: A LangChain text model or chat model used to generate representations, only needed for basic usage.
     Examples include ChatOpenAI or ChatAnthropic. Ignored if a custom chain is provided.
```
Note to myself that I should update BERTopic soon to 0.17.0 considering this is an API change. It will break previous implementations and this new feature should not be put in a minor version.
```python
else:
    # For list output:
    # 1. Convert all elements to stripped strings
    # 2. Take up to 10 elements
    # 3. Assign decreasing weights from 1.0 to 0.1
    # 4. Pad with empty strings if needed to always have 10 elements
    clean_outputs = [str(label).strip() for label in output]
    top_labels = clean_outputs[:10]

    # Create (label, weight) pairs with decreasing weights
    labels = [(label, 1.0 - (i * 0.1)) for i, label in enumerate(top_labels)]

    # Pad with empty strings if we have less than 10 labels
    if len(labels) < 10:
        labels.extend([("", 0.0) for _ in range(10 - len(labels))])
```
This is interesting. The output of any generative model in BERTopic is meant to be the same as what you did above:

```python
[("my label", 1), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0)]
```
but what wasn't implemented before is that you could also generate a list of keywords/labels. Do you have an example of when this piece of code would be executed? When is the output a list rather than a single string?
Also, I'm a bit hesitant about giving decreasing weights rather than all 1s since (if I'm not mistaken) the weights do not have any meaning.
To be honest, I wasn't entirely sure about the meaning behind the list of tuples and the weights so I just kept the old behaviour (which you have provided an example of) and, given that format, I assumed it could be extended to allow for generated lists of labels. In the case of lists, I don't mind setting the weights to all 1s (again, I didn't research the meaning of that format for representations).
The need to allow for lists stemmed from a use-case I had where I used a custom chain to generate topic labels in several languages with the current implementation. Since the current implementation does not allow for lists, I concatenated all elements of the list generated by the chain into a single string with a specific separator so that it could be split later. Allowing for lists in the chain output would make it possible to avoid this. Granted, this may be overkill 😄
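As a small sketch of that workaround (the separator choice here is arbitrary and purely illustrative):

```python
# Old single-string constraint: the chain returns one string with a known
# separator, and the per-language labels are recovered by splitting it afterwards.
SEP = " ||| "  # any token unlikely to appear in a generated label

chain_output = "Environmental impacts" + SEP + "Impacts environnementaux" + SEP + "Impactos ambientales"
labels = [label.strip() for label in chain_output.split(SEP)]
print(labels)  # ['Environmental impacts', 'Impacts environnementaux', 'Impactos ambientales']
```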
I provided an example of basic usage, basic usage with custom prompt and advanced usage with different types of list outputs in this thread. Maybe looking at that code and its output would make it more explicit 😃
To be honest, I wasn't entirely sure about the meaning behind the list of tuples and the weights so I just kept the old behaviour (which you have provided an example of) and, given that format, I assumed it could be extended to allow for generated lists of labels. In the case of lists, I don't mind setting the weights to all 1s (again, I didn't research the meaning of that format for representations).
There isn't actually any LLM implemented in BERTopic currently that returns a list of labels. They all return a single label. Although I like the idea of returning multiple labels, I would suggest removing this here considering this might be a bit out of scope for this PR.
The need to allow for lists stemmed from a use-case I had where I used a custom chain to generate topic labels in several languages with the current implementation. Since the current implementation does not allow for lists, I concatenated all elements of the list generated by the chain into a single string with a specific separator so that it could be split later. Allowing for lists in the chain output would make it possible to avoid this. Granted, this may be overkill 😄
Hmmm, this is a rather interesting use case that I haven't seen before. Now you make me hesitate on the best course of action here... Nevermind what I said above, let's keep this and make sure they all have values of 1 instead of a decreasing value. Since the weights are currently meaningless (and quite frankly not used) in the LLM setting, we can just set them to 1.
I provided an example of basic usage, basic usage with custom prompt and advanced usage with different types of list outputs in this thread. Maybe looking at that code and its output would make it more explicit 😃
Wow, these are amazing examples! Thanks for sharing.
I think that the nice thing about the list behaviour is that it fits nicely with what is already implemented, as returning

```python
[("my label", 1), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0)]
```

or

```python
[("my first label", 1), ("my second label", 1), ("my third label", 1), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0), ("", 0)]
```

seems consistent. In addition, with the basic `llm` + `prompt` usage only the first behaviour happens. This just gives more flexibility for using custom chains.

I have adapted the code so that the weight is 1 for all labels.
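A minimal sketch of what that adapted handling might look like (my reading of the change, not necessarily the exact committed code):

```python
def labels_from_list_output(output):
    """Normalize a chain's list output into BERTopic's 10-tuple format,
    giving every generated label a weight of 1."""
    clean_outputs = [str(label).strip() for label in output]
    labels = [(label, 1.0) for label in clean_outputs[:10]]
    # Pad with empty strings so there are always exactly 10 entries
    labels.extend([("", 0.0) for _ in range(10 - len(labels))])
    return labels

print(labels_from_list_output(["my first label", "my second label", "my third label"]))
```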
````diff
@@ -479,15 +479,23 @@
 To use langchain, you will need to install the langchain package first. Additionally, you will need an underlying LLM to support langchain, like openai:

 ```bash
-pip install langchain, openai
+pip install langchain
+pip install langchain_openai
 ```

 Then, you can create your chain as follows:
````
Let's also give the simple example without the chain since most users will likely want to make use of the `llm` param.
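Something along these lines, presumably (ChatOpenAI is just an example provider, and the exact kwargs depend on the final docs):

```python
from bertopic import BERTopic
from bertopic.representation import LangChain
from langchain_openai import ChatOpenAI

# Basic usage: pass an LLM directly, no custom chain needed.
representation_model = LangChain(llm=ChatOpenAI(model="gpt-4o-mini", temperature=0))
topic_model = BERTopic(representation_model=representation_model)
```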
Good catch! I forgot to update the documentation, will fix :)
I've updated the documentation with explanations on what to install, the basic usage, and the advanced usage. I also adapted the documentation in the code.
If you need a typical user to test run this, I'm happy to do so. I'm currently fighting to get it linked up and working in Jupyter with local Ollama models.
Here is a full example script with the new LangChain representation in action (both with basic and advanced usage, and with list output in the case of advanced usage).

```python
import pandas as pd
from typing import List
from pydantic import BaseModel, Field
from sentence_transformers import SentenceTransformer
from bertopic import BERTopic
from bertopic.representation import LangChain
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import CommaSeparatedListOutputParser
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_openai import AzureChatOpenAI

# Custom callback handler for logging the prompts used internally in the chains after they are formatted
class CustomCallbackHandler(BaseCallbackHandler):
    def on_chat_model_start(self, serialized, messages, **kwargs):
        for message in messages[0]:
            message.pretty_print()

# List of documents
documents = [
    "The sky is blue and the sun is shining.",
    "I love going to the beach on sunny days.",
    "Artificial intelligence is transforming the world.",
    "Machine learning enables computers to learn from data.",
    "It's important to wear sunscreen to avoid sunburns.",
    "Deep learning models require a lot of data and computation.",
    "Today's weather forecast predicts a clear sky.",
    "Neural networks are powerful models in AI.",
    "I need to buy a new pair of sunglasses for summer.",
    "Natural language processing is a subset of AI."
]

# Create a sentence transformer model and compute embeddings
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedding_model.encode(documents, show_progress_bar=False)

# Create an AzureChatOpenAI model
azure_chat_model = AzureChatOpenAI(
    azure_deployment="gpt-4o",
    temperature=0,
    azure_endpoint="redacted",
    openai_api_key="redacted",
    openai_api_version="2024-08-01-preview",
)

# Create a LangChain representation using the default prompt
# Demonstrates basic usage of LangChain with BERTopic using the default prompt.
langchain_repr_default_prompt = LangChain(
    llm=azure_chat_model,
    chain_config={
        "callbacks": [
            CustomCallbackHandler()
        ]
    }
)

# Create a LangChain representation with a custom prompt
# Demonstrates how to use a custom prompt with LangChain and BERTopic.
custom_prompt = "Here is a list of documents: [DOCUMENTS], output a single keyword in Italian that represents these documents. Keyword:"
langchain_repr_custom_prompt = LangChain(
    llm=azure_chat_model,
    prompt=custom_prompt,
    chain_config={
        "callbacks": [
            CustomCallbackHandler()
        ]
    }
)

# Create a LangChain representation with structured output using a custom chain
# Pydantic model for structured output
class MultiLingualKeywords(BaseModel):
    """A class to contain keywords in multiple languages."""
    keyword_en: str = Field(description='A keyword in English that represents the cluster.')
    keywords_es: List[str] = Field(description='A list of several keywords in Spanish that represents the cluster.')

# Demonstrates how to use structured output with LangChain and BERTopic using Pydantic models.
azure_chat_model_structured_output = azure_chat_model.with_structured_output(MultiLingualKeywords, method="json_schema")
structured_output_prompt = ChatPromptTemplate.from_template(
    "Here is a list of documents: {DOCUMENTS}, output a single keyword in English and a list of keywords in Spanish that represents these documents."
)
structured_output_chain = create_stuff_documents_chain(
    llm=azure_chat_model_structured_output,
    prompt=structured_output_prompt,
    document_variable_name="DOCUMENTS",
    output_parser=lambda x: [x.keyword_en] + x.keywords_es  # Transforming the structured output into a list
)
langchain_repr_structured_output = LangChain(
    chain=structured_output_chain,
    chain_config={
        "callbacks": [
            CustomCallbackHandler()
        ]
    }
)

# Create a LangChain representation that outputs a list of keywords
# Demonstrates how to output a list of keywords using a custom chain and output parser.
list_output_prompt = ChatPromptTemplate.from_template(
    "Here is a list of documents: {DOCUMENTS}, output a comma-separated list of keywords in English that represents these documents."
)
list_output_chain = create_stuff_documents_chain(
    llm=azure_chat_model,
    prompt=list_output_prompt,
    document_variable_name="DOCUMENTS",
    output_parser=CommaSeparatedListOutputParser(),
)
langchain_repr_list_output = LangChain(
    chain=list_output_chain,
    chain_config={
        "callbacks": [
            CustomCallbackHandler()
        ]
    }
)

# Create a custom chain that outputs a single string keyword
# Demonstrates how to create a custom chain that outputs a single string keyword.
single_keyword_prompt = ChatPromptTemplate.from_template(
    "Here is a list of documents: {DOCUMENTS}, output a single keyword in English that represents these documents."
)
single_keyword_chain = create_stuff_documents_chain(
    llm=azure_chat_model,
    prompt=single_keyword_prompt,
    document_variable_name="DOCUMENTS",
)
langchain_repr_single_keyword = LangChain(
    chain=single_keyword_chain,
    chain_config={
        "callbacks": [
            CustomCallbackHandler()
        ]
    }
)

representation_models = {
    "DefaultPrompt": langchain_repr_default_prompt,
    "CustomPrompt": langchain_repr_custom_prompt,
    "StructuredOutput": langchain_repr_structured_output,
    "ListOutput": langchain_repr_list_output,
    "SingleKeyword": langchain_repr_single_keyword
}

# Create and fit the BERTopic model with multiple representations
topic_model = BERTopic(representation_model=representation_models)
topics, probabilities = topic_model.fit_transform(documents, embeddings)
topic_info = topic_model.get_topic_info()

def is_non_empty(value):
    if value is None:
        return False
    if isinstance(value, float) and pd.isnull(value):
        return False
    if isinstance(value, (list, tuple, set, str)):
        return len(value) > 0
    return True

# Access topics from different representation models
for index, row in topic_info.iterrows():
    topic_num = row['Topic']
    print(f"\nTopic {topic_num}:")
    for representation_name in representation_models.keys():
        if representation_name in topic_info.columns:
            value = row[representation_name]
            if is_non_empty(value):
                print(f"Representation '{representation_name}': {value}")
```

And here are the formatted prompts and the resulting representations:
As you can see, this implementation should allow for a very flexible way to create more or less complex representations.
Awesome, that would be great!
Ran it with success (yay!) on our system, but got some weird results. Our data input is a set of transcripts of presentations on subsidence. Many of the labels were the type of 1-sentence topic representation we would expect, and then some of them seem to include prompting information as well. Here is an example of a row that seems weird (I saved the model output as csv and I no longer have access to the generating model, though I can re-run it):

These columns are as expected: Name: …

And then there is this output in the 'DefaultPrompt' column:
Edit: Info on model run.

Text preparation: …

LLM Model Representation: …

BERTopic Topic Model: …
Thank you for trying this out! 😄 Am I to understand from your comment that the representation ran fine with the default prompt, but that some of the generated representations actually look like the prompt (like the example you shared)? If so, could it be a hallucination from the model (because it's not used to being prompted like this)? Is the data in the representation fake data or is it related to your documents? If this happens a lot, I can give you a piece of code that splits the examples into several messages instead of a single prompt with examples. This could improve things, but requires using a custom chain as it is not really compatible with the approach of providing a single string for a prompt in the basic usage.
@mepearson To me, it actually seems like the stop token is not correctly initialized/chosen since it should have stopped after …

@Skar0 Could that be related to the PR? I'm guessing it likely relates to the underlying LLM+token settings but want to make sure.
Full disclosure: I am running this within BERTopic because I'm still figuring out how to configure / piece together the LLM pipeline. I would be perfectly happy to run the BERTopic model and THEN run the outputs (and probably just, say, the top X topics at first, as we will likely want to use those to guide the iterations on the model). All of which means I don't actually understand where in the code the stop tokens are initialized, and why it would vary between the assorted runs. I have commented out the …

To answer the question - the sentences listed in the prompt section of the representation don't come from our data. I don't know if they are created whole cloth from the pattern or if they exist somewhere as example prompts. I'm back on the system today, so am re-running it using llama2:7b this time rather than mixtral.

EDIT: I'll add I'm using the nltk.tokenize and …
So this may have something to do with how the prompts are formatted, or the integration with Ollama models? I ran it again but also did an OpenAI representation model at the same time, which gives the results I'm expecting. These are the top couple topics for this run. It seems the 'DefaultPrompt' gives 3 topic names / keywords, but the first 2 are repeated for each entry.

EDIT: looking at the langchain_rep.py I copied in order to try and use this, the items in 1 and 2 come straight from there, so I'm guessing there's some integration piece I'm missing?
Typically, stop tokens are created when you either initialize the model or when you run a given generation. I'm not sure, though, where that's the case for LangChain. This might give you an idea of how to do that when you load in the model.
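For example, with LangChain's Ollama integration something like this should work (an untested sketch; the stop string here is just a placeholder):

```python
from langchain_ollama.chat_models import ChatOllama

# Stop sequences can be set when the model is created...
llm = ChatOllama(model="llama2:7b", stop=["---"])

# ...or bound onto an existing model for a given generation.
llm_with_stop = ChatOllama(model="llama2:7b").bind(stop=["---"])
```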
Thanks for sharing this but I'm not entirely sure what I'm looking at. Is this the output of one topic or perhaps more topics? Or is this the prompt for each topic? Could you describe what all these sections mean?
Sorry for the confusion - that was just a couple columns of output from rows 1:5 of the … So it's the output from the 'Representation', 'OpenAI', and first item in the 'DefaultPrompt' list.
It may also be that I need to grab more code from the branch. As of now I'm just using the langchain_rep.py file.
@mepearson I might be missing something here, but in the examples you showed the …
@mepearson Also, do you perhaps have a reproducible example? That would make it much easier to see what is happening.
@MaartenGr - yes, OpenAI is working great and is what I expected to get. It's when I use our internal Ollama models with the new langchain connection that things get wonky. While I think the transcripts I was using are from public videos, they aren't mine so I don't want to share them. But I'll rerun the model with a news dataset or something and share that.
I think that in the provided examples of output, several representations were configured: one with OpenAI (I guess the existing OpenAI representation from BERTopic) and one with the representation from this PR using a model through LangChain's Ollama integration. It does indeed seem like the OpenAI representation yields good labels while the LangChain representation sometimes yields a label that looks like a prompt, or one that contains garbage (like the model saying "It seems like you have provided a list of topics ..."). To be fair, this seems to me like a problem with the LLM that you are using or the LangChain integration with Ollama rather than an issue with the implementation of the LangChain integration. I think that this can be confirmed by using the OpenAI model in the LangChain representation instead of using the Ollama model - unless this is already what is meant by your …
I wonder if it's related to how the messages are passed to your Ollama model by LangChain. At the moment, with the default prompt that I set up as a single string in the LangChain representation, a single LangChain message will be provided to the model. That message contains formatted examples which may confuse the model. Could you try using the LangChain representation with another prompt, or with the same prompt formatted as a list of messages (to split prompt and completion in examples)? I provide examples on how to do this below.

```python
from bertopic import BERTopic
from bertopic.representation import LangChain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_ollama.chat_models import ChatOllama

# Your data preparation ...

# Setting-up your connection to the Ollama model
ollama_chat_model = ChatOllama(...)

# Create a LangChain representation with a custom prompt that **does not contain examples**
custom_prompt = """I have a topic that contains the following documents:
[DOCUMENTS]
The topic is described by the following keywords: [KEYWORDS]
Based on the information above, extract a short topic label in the following format:
topic: <topic label>
"""

langchain_repr_custom_prompt = LangChain(
    llm=ollama_chat_model,
    prompt=custom_prompt,
    chain_config={
        "callbacks": [
            # CustomCallbackHandler()
        ]
    }
)

# Create a LangChain representation with the same prompt as the hard-coded one used in the basic usage,
# **where examples are split into several messages**
split_examples_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "This is a list of texts where each collection of texts describes a topic. "
     "After each collection of texts, the name of the topic they represent is mentioned as a short, highly descriptive title."),
    ("human",
     "Topic:\n"
     "Sample texts from this topic:\n"
     "Traditional diets in most cultures were primarily plant-based with a little meat on top, but with the rise of industrial-style meat production and factory farming, meat has become a staple food.\n"
     "Meat, but especially beef, is the worst food in terms of emissions.\n"
     "Eating meat doesn't make you a bad person, not eating meat doesn't make you a good one.\n"
     "Keywords: meat, beef, eat, eating, emissions, steak, food, health, processed, chicken\n"
     "Topic name:"),
    ("ai", "Environmental impacts of eating meat"),
    ("human",
     "Topic:\n"
     "Sample texts from this topic:\n"
     "I have ordered the product weeks ago but it still has not arrived!\n"
     "The website mentions that it only takes a couple of days to deliver but I still have not received mine.\n"
     "I got a message stating that I received the monitor but that is not true!\n"
     "It took a month longer to deliver than was advised...\n"
     "Keywords: deliver, weeks, product, shipping, long, delivery, received, arrived, arrive, week\n"
     "Topic name:"),
    ("ai", "Shipping and delivery issues"),
    ("human",
     "Topic:\n"
     "Sample texts from this topic:\n"
     "{DOCUMENTS}\n"
     "Keywords: {KEYWORDS}\n"
     "Topic name:")
])
split_examples_chain = create_stuff_documents_chain(
    llm=ollama_chat_model,
    prompt=split_examples_prompt,
    document_variable_name="DOCUMENTS",
)
langchain_repr_split_examples = LangChain(
    chain=split_examples_chain,
    chain_config={
        "callbacks": [
            # CustomCallbackHandler()
        ]
    }
)

representation_models = {
    "prompt-no-examples": langchain_repr_custom_prompt,
    "custom-chain-split-examples": langchain_repr_split_examples,
}

# Rest of your BERTopic setup ...
```
Hello,

Just checking in to see if anyone else has had a chance to test the basic and advanced usage with a non-OpenAI LLM?

…

The latter option would require a significant redesign, as it's less compatible with the current "string prompt with BERTopic-specific placeholders" approach.

Please let me know if there's anything I can do to assist or clarify further. I'd love to ensure this PR meets everyone's needs and doesn't stall unnecessarily. Looking forward to your feedback!

Best,
@Skar0 I don't have any additional comments. I haven't had the chance to try out Ollama, but for me Ollama is working quite well with the … Other than that, if this works for @mepearson, I'm happy to merge it!
What does this PR do?
WIP, fixes #2187