Is it possible to view the rows that comprise a topic? If so, how?
The package follows, to a certain extent, sklearn's API in that whenever you use `transform` on a set of documents, it will return the topics in the same order. Let's say you have the following code:

    from bertopic import BERTopic
    from sklearn.datasets import fetch_20newsgroups

    docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
    topic_model = BERTopic()
    topics, probs = topic_model.fit_transform(docs)

Here, `docs` is a list of documents on which you train the model. Running `.fit_transform(docs)` will return the variable `topics`. In `topics`, you will find the topic that belongs to each document. The topic in `topics[0]` corresponds to the document in `docs[0]`, …

You can use that structure to extract the documents under a certain topic by using, for example, the following:

    import pandas as pd

    results = pd.DataFrame({"Doc": docs, "Topic": topics})

If you already have a data frame, you can assign the topics directly: `df["Topic"] = topics`
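As a rough illustration of the index alignment described above, here is a minimal sketch that groups documents by their assigned topic using only the standard library. The `docs` and `topics` values are small hypothetical stand-ins for the real corpus and for the list returned by `fit_transform`, and `group_docs_by_topic` is an illustrative helper, not part of BERTopic.

```python
from collections import defaultdict

# Hypothetical stand-ins: in practice `docs` is your corpus and `topics`
# is the list returned by topic_model.fit_transform(docs).
docs = [
    "The team won the hockey game last night.",
    "My graphics card driver keeps crashing.",
    "Playoff tickets sold out in minutes.",
    "Random note that fits no cluster.",
]
topics = [0, 1, 0, -1]


def group_docs_by_topic(docs, topics):
    """Map each topic id to its documents, relying on topics[i] <-> docs[i]."""
    if len(docs) != len(topics):
        raise ValueError("docs and topics must have the same length")
    grouped = defaultdict(list)
    for doc, topic in zip(docs, topics):
        grouped[topic].append(doc)
    return dict(grouped)


docs_by_topic = group_docs_by_topic(docs, topics)

# View every document assigned to topic 0.
for doc in docs_by_topic[0]:
    print(doc)
```

With the pandas data frame, the equivalent filter would be `results[results["Topic"] == 0]`. Note that BERTopic typically uses topic `-1` for outlier documents, so that group may be worth inspecting separately.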