Replies: 1 comment
-
It's not clear from your code, but are you running this after saving and loading the model? Also, are the
-
When I use topics_over_time(), I get the following error:
topics_over_time = ab_topic_model.topics_over_time(ab_list, time_set)
ValueError Traceback (most recent call last)
Cell In[4], line 1
----> 1 topics_over_time = ab_topic_model.topics_over_time(ab_list, time_set)
File d:\anaconda3\envs\bertopic\Lib\site-packages\bertopic\_bertopic.py:799, in BERTopic.topics_over_time(self, docs, timestamps, topics, nr_bins, datetime_format, evolution_tuning, global_tuning)
796 selection = documents.loc[documents.Timestamps == timestamp, :]
797 documents_per_topic = selection.groupby(['Topic'], as_index=False).agg({'Document': ' '.join,
798 "Timestamps": "count"})
--> 799 c_tf_idf, words = self._c_tf_idf(documents_per_topic, fit=False)
801 if global_tuning or evolution_tuning:
802 c_tf_idf = normalize(c_tf_idf, axis=1, norm='l1', copy=False)
File d:\anaconda3\envs\bertopic\Lib\site-packages\bertopic\_bertopic.py:3861, in BERTopic._c_tf_idf(self, documents_per_topic, fit, partial_fit)
3858 if fit:
3859 self.ctfidf_model = self.ctfidf_model.fit(X, multiplier=multiplier)
-> 3861 c_tf_idf = self.ctfidf_model.transform(X)
3863 return c_tf_idf, words
File d:\anaconda3\envs\bertopic\Lib\site-packages\sklearn\utils\_set_output.py:295, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
293 @wraps(f)
294 def wrapped(self, X, *args, **kwargs):
--> 295 data_to_wrap = f(self, X, *args, **kwargs)
296 if isinstance(data_to_wrap, tuple):
297 # only wrap the first output for cross decomposition
...
1076 )
1078 if ensure_min_features > 0 and array.ndim == 2:
1079 n_features = array.shape[1]
ValueError: Found array with 0 sample(s) (shape=(0, 31883)) while a minimum of 1 is required by the normalize function.
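The traceback points at the failing step: line 796 selects documents with `documents.loc[documents.Timestamps == timestamp, :]`, and that selection came back empty for at least one timestamp, producing the 0-row matrix that `normalize` rejects. One way this can happen is a type mismatch between the stored timestamps and the values being compared against them. A toy pandas sketch of that failure mode (the data is made up; the column names follow the traceback):

```python
import pandas as pd

# Mimic BERTopic's internal per-document frame (toy data)
documents = pd.DataFrame({
    "Document": ["a", "b", "c"],
    "Topic": [0, 0, 1],
    "Timestamps": ["2020", "2020", "2021"],  # stored as strings
})

# Comparing against an int never matches the string column -> empty selection,
# which is exactly the 0-sample matrix the ValueError complains about
empty = documents.loc[documents.Timestamps == 2020, :]
print(len(empty))   # 0

# A type-consistent comparison selects the expected rows
ok = documents.loc[documents.Timestamps == "2020", :]
print(len(ok))      # 2
```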
Below is my training code:
embedding_model = SentenceTransformer("\sentence_transformer\all-MiniLM-L6-v2")
embeddings = embedding_model.encode(abstract, show_progress_bar=True)
# Dimensionality reduction
umap_model = UMAP(n_neighbors=12, n_components=5, min_dist=0.0, metric='cosine', random_state=52)
# Clustering
hdbscan_model = HDBSCAN(min_cluster_size=60, min_samples=10, cluster_selection_epsilon=0, metric='euclidean', cluster_selection_method='eom', prediction_data=True)
# Improve the topic representations
vectorizer_model = CountVectorizer(stop_words="english", min_df=2, ngram_range=(1, 2))
ctfidf_model = ClassTfidfTransformer()
keybert_model = KeyBERTInspired()
representation_model = {"KeyBERT": keybert_model}
topic_model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    vectorizer_model=vectorizer_model,
    ctfidf_model=ctfidf_model,
    representation_model=representation_model,
)
# Train the model
topics, probs = topic_model.fit_transform(abstract, embeddings)
topic_model.save("power_battery/model_saved/abstract_topic_model", serialization="safetensors", save_ctfidf=True, save_embedding_model=embedding_model)
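Since the error shows up when calling `topics_over_time()`, it is also worth sanity-checking the inputs first: every document needs exactly one timestamp, and each timestamp value should actually contain documents. A minimal check, reusing the `ab_list`/`time_set` names from the question (the data here is only a placeholder):

```python
from collections import Counter

# Placeholder stand-ins for the real abstracts and timestamps
ab_list = ["doc one", "doc two", "doc three", "doc four"]
time_set = [2020, 2020, 2021, 2021]

# topics_over_time() expects one timestamp per document
assert len(ab_list) == len(time_set), "docs and timestamps must have the same length"

# Inspect how many documents fall into each timestamp value;
# a bin with no (or almost no) documents is a likely culprit
counts = Counter(time_set)
print(sorted(counts.items()))   # [(2020, 2), (2021, 2)]
```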