fit_transform tries to access embedding_model if representation_model is not None #2189

Open
1 task done
rasantangelo opened this issue Oct 17, 2024 · 1 comment
Labels
bug Something isn't working


@rasantangelo

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Describe the bug

I was using BERTopic on a set of queries with my own precomputed embeddings (produced by a model that is hard to pass as a parameter), and it worked as expected.
After creating representation_model = KeyBERTInspired() and passing representation_model=representation_model to BERTopic, I got this error:

AttributeError                            Traceback (most recent call last)
      [1] representation_model = KeyBERTInspired()
      [2] topic_model = BERTopic(
      [3]     calculate_probabilities=True,
      [4]     min_topic_size=1
      [5]     embedding_model=None,
      [6]     representation_model=representation_model,
      [7] )
----> [8] topics, probs = topic_model.fit_transform(corpus, np.array(corpus_embeddings))
      [9] topic_model.get_topic_info()

File ~/query_analysis/bertopic_env/lib/python3.11/site-packages/bertopic/_bertopic.py:493, in BERTopic.fit_transform(self, documents, embeddings, images, y)
    [490]     self._save_representative_docs(custom_documents)
    [491] else:
    [492]     # Extract topics by calculating c-TF-IDF
--> [493]     self._extract_topics(documents, embeddings=embeddings, verbose=self.verbose)
    [495]     # Reduce topics
    [496]     if self.nr_topics:

File ~/query_analysis/bertopic_env/lib/python3.11/site-packages/bertopic/_bertopic.py:3991, in BERTopic._extract_topics(self, documents, embeddings, mappings, verbose)
   [3989] documents_per_topic = documents.groupby(["Topic"], as_index=False).agg({"Document": " ".join})
   [3990] self.c_tf_idf_, words = self._c_tf_idf(documents_per_topic)
-> [3991] self.topic_representations_ = self._extract_words_per_topic(words, documents)
...
   [3680]        "Make sure to use an embedding model that can either embed documents"
   [3681]        "or images depending on which you want to embed."
   [3682]    

AttributeError: 'NoneType' object has no attribute 'embed_documents'

Reproduction

import numpy as np
from bertopic import BERTopic
from bertopic.representation import KeyBERTInspired

# corpus: list of query strings
# corpus_embeddings: one precomputed embedding per query, produced by an
# external model (both are defined elsewhere in the script)
representation_model = KeyBERTInspired()
topic_model = BERTopic(
    calculate_probabilities=True,
    min_topic_size=15,
    embedding_model=None,  # no model passed; only the precomputed embeddings are used
    representation_model=representation_model,
)
topics, probs = topic_model.fit_transform(corpus, np.array(corpus_embeddings))
topic_model.get_topic_info()

BERTopic Version

0.16.4

@rasantangelo rasantangelo added the bug Something isn't working label Oct 17, 2024
@MaartenGr
Owner

You will need to set embedding_model to an embedding model, since KeyBERTInspired actually needs to create word embeddings.
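To illustrate why, here is a minimal, stdlib-only sketch of KeyBERT-style keyword re-ranking, the idea KeyBERTInspired is built on: candidate words are embedded and ranked by cosine similarity to an averaged embedding of a topic's representative documents. This is a simplified stand-in, not BERTopic's implementation. `ToyEmbedder` (a letter-frequency "embedder") and `rerank_keywords` are hypothetical names chosen for illustration; only the method name `embed_documents` echoes the traceback. The point is that the words themselves must be embedded at this stage, so precomputed document embeddings alone are not enough, and passing `None` as the model fails with the same AttributeError reported above.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


class ToyEmbedder:
    """Hypothetical stand-in for an embedding model: L2-normalised
    letter-frequency vectors. Not a real BERTopic backend."""

    def embed_documents(self, texts: Sequence[str]) -> List[Vector]:
        vectors = []
        for text in texts:
            counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
            vec = [float(counts[ch]) for ch in ALPHABET]
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank_keywords(
    candidates: Sequence[str],
    representative_docs: Sequence[str],
    embedding_model: Optional[ToyEmbedder],
) -> List[Tuple[str, float]]:
    """Score each candidate word by cosine similarity to the mean
    embedding of the topic's representative documents."""
    # With embedding_model=None the next line raises
    # AttributeError: 'NoneType' object has no attribute 'embed_documents'
    doc_vectors = embedding_model.embed_documents(representative_docs)
    dim = len(doc_vectors[0])
    topic_vector = [sum(v[i] for v in doc_vectors) / len(doc_vectors) for i in range(dim)]
    # The candidate words need their own embeddings, from the same model.
    word_vectors = embedding_model.embed_documents(candidates)
    scored = [(word, cosine(vec, topic_vector)) for word, vec in zip(candidates, word_vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # "apple" shares letters with the topic's documents, "zzz" shares none,
    # so "apple" ranks first and "zzz" scores 0.0.
    print(rerank_keywords(["apple", "zzz"], ["apple pie", "apple tart"], ToyEmbedder()))
```

For the original use case (a model that is awkward to pass directly), BERTopic's documentation describes wrapping it by subclassing `bertopic.backend.BaseEmbedder` and implementing its `embed` method, then passing that object as `embedding_model`. Check this against the installed version (0.16.4 here) before relying on it.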
