KeyError: 'topics_from' #2100
Comments
I'm running into the same issue. The code was working three weeks ago.
I just created a PR that should resolve this issue, could you test whether it works for you? If so, I will go ahead and create a new release (0.16.4) since this affects the core functionality of BERTopic.
This doesn't solve the problem for me. I did install from the branch. I'm training the model the following way:

```python
from pathlib import Path

from bertopic import BERTopic
from cuml.cluster import HDBSCAN
from cuml.manifold import UMAP

# Create instances of GPU-accelerated UMAP and HDBSCAN
umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0)
hdbscan_model = HDBSCAN(min_samples=10, gen_min_span_tree=True, prediction_data=True)

# Pass the above models to be used in BERTopic
topic_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model, nr_topics="auto")
topic_model = topic_model.fit(docs, embeds)

path = Path(f"{save_dir}/model.bin")
topic_model.save(path.as_posix(), serialization="pickle")
```

I get the following error:
The fix did not work for me either unfortunately!
I have the same problem when using `nr_topics="auto"`.
Does anybody have a fully reproducible example (data included)? I ask because when I run the following after installing the fix from the related PR, I get no errors:

```python
from sentence_transformers import SentenceTransformer
from datasets import load_dataset
from bertopic import BERTopic
from hdbscan import HDBSCAN
from umap import UMAP

# Extract abstracts to train on and corresponding titles
dataset = load_dataset("CShorten/ML-ArXiv-Papers")["train"]
abstracts = dataset["abstract"][:10_000]

# Pre-calculate embeddings
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedding_model.encode(abstracts, show_progress_bar=True)

# Use sub-models
umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0, random_state=42)
hdbscan_model = HDBSCAN(min_samples=5, gen_min_span_tree=True, prediction_data=True)

# Pass the above models to be used in BERTopic
topic_model = BERTopic(
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    nr_topics="auto",
    verbose=True
)
topic_model = topic_model.fit(abstracts, embeddings)
```
Dear MaartenGr, thank you for sharing the code. Unfortunately, it does not work when using a pipeline to run BERTopic on non-English text data. To be specific, I now get the same `KeyError: 'topics_from'` whenever I try to use the BERTopic commands. The commands worked well several weeks ago, but I don't know why they fail now.

```python
from transformers.pipelines import pipeline
pretrained_model = pipeline("feature-extraction", model="beomi/kcbert-base")
```

In this case, the suggested commands did not work. If I copy the suggested commands and run them in my Python as they are (in other words, if I do not use my original pipeline but use `SentenceTransformer("all-MiniLM-L6-v2")` instead), then the error appears like below:

```
ValueError                                Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\bertopic\_bertopic.py:364, in BERTopic.fit(self, documents, embeddings, images, y)
File ~\anaconda3\lib\site-packages\bertopic\_bertopic.py:492, in BERTopic.fit_transform(self, documents, embeddings, images, y)
File ~\anaconda3\lib\site-packages\bertopic\_bertopic.py:3983, in BERTopic._extract_topics(self, documents, embeddings, mappings, verbose)
File ~\anaconda3\lib\site-packages\bertopic\_bertopic.py:4194, in BERTopic._c_tf_idf(self, documents_per_topic, fit, partial_fit)
File ~\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py:1330, in CountVectorizer.fit_transform(self, raw_documents, y)
File ~\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py:1220, in CountVectorizer._count_vocab(self, raw_documents, fixed_vocab)

ValueError: empty vocabulary; perhaps the documents only contain stop words
```

What should I do to solve this problem? (Please understand that I cannot upload the data, but the KeyError still appears. Please help!)
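As context for the comment above: the `empty vocabulary` error is a separate problem from the `KeyError`. It means scikit-learn's `CountVectorizer` found no usable tokens after tokenization (its default token pattern keeps only tokens of two or more word characters, which can empty out some corpora). A minimal sketch of the failure and one possible remedy, assuming scikit-learn is installed:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["a b c", "x y z"]  # documents containing only single-character tokens

# The default token pattern requires tokens of 2+ word characters,
# so every token here is discarded and the vocabulary ends up empty.
try:
    CountVectorizer().fit_transform(docs)
except ValueError as err:
    print(err)  # empty vocabulary; perhaps the documents only contain stop words

# A looser token pattern keeps single-character tokens as well.
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
X = vectorizer.fit_transform(docs)
print(X.shape)  # (2, 6): two documents, six distinct tokens
```

In BERTopic, such a customized vectorizer can be supplied via `BERTopic(vectorizer_model=vectorizer)`.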
@jlee9095 I'm a bit confused. Are you saying that you have two separate issues? Because you mentioned that running the code I provided did not work for you. Could you share your full code to showcase both issues? Also, I'm not able to reproduce the issue, so if you can reproduce it with dummy data (like the data I shared), I can more easily figure out what is wrong.
@MaartenGr the fix #2101 works for me, thank you!
Yes please. |
@MaartenGr Thank you for your response. Yes, I have two separate issues. The errors that I uploaded above appear whenever I run your suggested commands as they are (that is, when using `SentenceTransformer`). As an alternative, if I use my original pipeline from Hugging Face, the error appears when running the `embeddings = embedding_model.encode(documents, show_progress_bar=True)` command. Below are the commands and the error for the second case.

Commands for the case using the pipeline from Hugging Face:

```python
import pandas as pd
from sentence_transformers import SentenceTransformer
from transformers.pipelines import pipeline

docu = pd.read_csv('C:/Users/BERTopic/after_preprocessing.csv', engine='python')
documents = docu['text'].to_list()

pretrained_model = pipeline("feature-extraction", model="beomi/kcbert-base")
embedding_model = pretrained_model
```

Then, the below error appears:

```
AttributeError                            Traceback (most recent call last)
AttributeError: 'FeatureExtractionPipeline' object has no attribute 'encode'
```

I am sorry that I am struggling to find a good example dataset, but I'll do my best to figure it out as well.
@MaartenGr Hi, here are two cases that I tested using the example data.

Case 1. Commands:

```python
from sentence_transformers import SentenceTransformer

dataset = load_dataset('klue', 'sts')["train"]
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0, random_state=42)
topic_model = BERTopic(
```

Then, I got the error like below:

```
KeyError                                  Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\bertopic\_bertopic.py:364, in BERTopic.fit(self, documents, embeddings, images, y)
File ~\anaconda3\lib\site-packages\bertopic\_bertopic.py:496, in BERTopic.fit_transform(self, documents, embeddings, images, y)
File ~\anaconda3\lib\site-packages\bertopic\_bertopic.py:4347, in BERTopic._reduce_topics(self, documents, use_ctfidf)
File ~\anaconda3\lib\site-packages\bertopic\_bertopic.py:4502, in BERTopic._auto_reduce_topics(self, documents, use_ctfidf)
File ~\anaconda3\lib\site-packages\bertopic\_bertopic.py:3985, in BERTopic._extract_topics(self, documents, embeddings, mappings, verbose)
File ~\anaconda3\lib\site-packages\bertopic\_bertopic.py:4121, in BERTopic._create_topic_vectors(self, documents, embeddings, mappings)

KeyError: 'topics_from'
```

Case 2. Commands:

```python
from sentence_transformers import SentenceTransformer
from transformers.pipelines import pipeline

dataset = load_dataset('klue', 'sts')["train"]
pretrained_model = pipeline("feature-extraction", model="beomi/kcbert-base")
embedding_model = pretrained_model
umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0, random_state=42)
topic_model = BERTopic(
```

Then, I got the error like below:

```
AttributeError                            Traceback (most recent call last)
AttributeError: 'FeatureExtractionPipeline' object has no attribute 'encode'
```

How can I solve this problem? All your help will be greatly appreciated.
@jlee9095 The second example does not seem related to this particular issue. Generally, I would advise opening up a new issue for that, but it seems that you are using the

With respect to your first problem, it seems that the PR I linked resolves it. When you install that PR, make sure that it is properly installed and that you are not using the official release.
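On the second error above: a `transformers` feature-extraction pipeline is callable but has no `.encode()` method, so calling `embedding_model.encode(...)` on it fails. BERTopic's documentation says a pipeline can be passed directly as `embedding_model`, in which case no manual `.encode()` call is needed. If precomputed embeddings are wanted anyway, one workaround is a small adapter that mean-pools the pipeline's per-token vectors into one vector per document. This is a sketch; `EncoderAdapter` is a hypothetical helper, not part of BERTopic:

```python
class EncoderAdapter:
    """Hypothetical wrapper that gives a callable feature-extraction
    pipeline an .encode() method by mean-pooling per-token vectors."""

    def __init__(self, pipeline):
        self.pipeline = pipeline

    def encode(self, docs, show_progress_bar=False):
        vectors = []
        for doc in docs:
            # A feature-extraction pipeline returns [[tok_vec, tok_vec, ...]]
            tokens = self.pipeline(doc)[0]
            dim = len(tokens[0])
            # Mean-pool the token vectors into a single document vector
            vectors.append([sum(tok[i] for tok in tokens) / len(tokens)
                            for i in range(dim)])
        return vectors
```

With the thread's setup this would be used as `EncoderAdapter(pipeline("feature-extraction", model="beomi/kcbert-base")).encode(documents)`, though passing the pipeline straight to `BERTopic(embedding_model=...)` is the simpler route when it works.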
For the error `KeyError: 'topics_from'`, I downgraded to version 0.16.0, which solved the problem.
When I set

```python
topic_model = BERTopic(
    embedding_model=sentence_model,
    vectorizer_model=vectorizer_model,
    # min_topic_size = 100,       # Split sentences "All"
    nr_topics="auto",             # Automatically detect the number of topics
    # nr_topics = 10, #40,        # Limit the total number of topics
    top_n_words=10,               # Use the top n words
    calculate_probabilities=True,
    umap_model=umap_model,        # Fix UMAP random state
    hdbscan_model=hdbscan_model   # Set HDBSCAN model
)
```

When I comment out the line
@smbslt3 Have you tried the PR that I shared above? In my experience, it should fix the issue. |
@MaartenGr Hi Maarten! I can't speak on behalf of @smbslt3, but I was experiencing the same issue, and the changes to `_bertopic.py` in #2101 fixed it for me.

It may also be worth noting, for anybody still facing this issue, that if you installed this library through pip and are trying to update by doing something along the lines of

Once this change is included in an official release (0.16.4), I'd assume that simply running
I'm having the same issue (`KeyError: 'topics_from'`); my workaround is `pip install bertopic==0.16.2`.
To everyone facing this issue: make sure you do not have BERTopic installed before you run

Based on this thread, I can confirm that if the PR is correctly installed, it should solve the issue. I intend to release a new version whenever #2105 is also merged into the main branch.
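A possible sequence for a clean install of the fix before the official release is sketched below. The `refs/pull/2101/head` ref is an assumption about how the PR branch can be fetched with pip's git support; verify the exact branch or ref on the PR page:

```shell
# Remove any existing BERTopic first, so pip does not keep the old version
pip uninstall -y bertopic

# Install directly from the PR's head ref on GitHub (ref syntax is an assumption)
pip install "git+https://github.com/MaartenGr/BERTopic.git@refs/pull/2101/head"

# Confirm which version is now active
python -c "import bertopic; print(bertopic.__version__)"
```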
I also had the same issue. Thanks to your help, I was able to fix it. Thank you. I hope this bug is fixed in the 0.16.4 release.
Have you searched existing issues? 🔎
Describe the bug
When trying to run

```python
topics, probs = TM.fit_transform(docs)
```

where `docs` is a list of strings (we want to cluster topics based on these strings), I run into the following error:

This happens after the following steps of training have already taken place:
Reproduction
BERTopic Version
0.16.13
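For readers unfamiliar with the traceback's final line: `KeyError: 'topics_from'` is an ordinary dictionary lookup failing inside BERTopic's topic-reduction step. A generic illustration of the failure mode follows; the `mappings` dict here is illustrative only, not BERTopic's actual data structure:

```python
# One step builds a mapping; a later step assumes a key that was never set.
mappings = {"topics_to": [0, 1, 2]}  # 'topics_from' is missing

try:
    sources = mappings["topics_from"]
except KeyError as err:
    print(f"KeyError: {err}")  # KeyError: 'topics_from'

# A defensive lookup avoids the crash, though the proper fix is presumably
# to ensure the key is populated in the first place (as the PR does).
sources = mappings.get("topics_from", [])
print(sources)  # []
```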