As for the use case you are referring to, it should be possible to load one or more text files and have them processed by BERTopic. One thing to note is that BERTopic typically needs a couple of hundred sentences or documents to work well. If you only have a few documents, I would advise sticking to k-Means-like sub-models in BERTopic. More specifically, you can indeed load in a text file with …
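As a minimal sketch of the loading step, the hypothetical helper `load_documents` below reads one `.txt` file, or every `.txt` file in a folder, and returns the list of strings that `fit_transform` expects. It assumes plain UTF-8 files and treats each blank-line-separated paragraph as one document; both are assumptions about the data, not requirements of BERTopic.

```python
import re
from pathlib import Path
from typing import List, Union

# Assumption: paragraphs are separated by a blank line (possibly containing spaces)
_PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def load_documents(source: Union[str, Path]) -> List[str]:
    """Hypothetical helper: read one .txt file, or all .txt files in a folder,
    into a list of paragraph-level documents."""
    source = Path(source)
    files = sorted(source.glob("*.txt")) if source.is_dir() else [source]

    docs: List[str] = []
    for file in files:
        # Text mode normalizes Windows line endings to "\n"
        text = file.read_text(encoding="utf-8")
        for paragraph in _PARAGRAPH_BREAK.split(text):
            # Collapse internal newlines and repeated whitespace into single spaces
            paragraph = " ".join(paragraph.split())
            if paragraph:
                docs.append(paragraph)
    return docs
```

Usage: `docs = load_documents("sample_text")` followed by `topics, probs = BERTopic(language="english").fit_transform(docs)`. With only a handful of documents, the BERTopic documentation shows replacing the default HDBSCAN clusterer with scikit-learn's `KMeans` by passing it as `hdbscan_model`, which is one way to follow the k-Means advice above.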
We tried our best, but the 20 Newsgroups dataset used in the beginner's guide is large and complex. Can anyone share a rewrite of the code that works on a simple text file (in this case "water.txt")?
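import os
from bertopic import BERTopic

# Folder that holds the text file (sys.path only affects imports, not open())
folder = 'C:/Users/jlevy/Desktop/jlevy/BERTopic-master/sample_text'

# BERTopic expects a list of documents, not one big string from f.read();
# here each non-empty line is treated as one document (an assumption about the file)
with open(os.path.join(folder, "water.txt"), "r", encoding="utf-8") as f:
    docs = [line.strip() for line in f if line.strip()]

# The beginner's guide used the 20 Newsgroups data instead:
# from sklearn.datasets import fetch_20newsgroups
# docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

topic_model = BERTopic(language="english")
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info())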
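If "water.txt" is one long passage rather than many short lines, reading it yields only a few documents, far fewer than the couple of hundred the reply recommends. The hypothetical helper `split_into_sentences` below is a minimal sketch that turns one long string into sentence-level documents. It assumes a naive regex split on `.`, `!` or `?` followed by whitespace is good enough for the text; it will mis-split abbreviations such as "e.g.".

```python
import re
from typing import List

# Naive sentence boundary: ".", "!" or "?" followed by whitespace.
# Assumption: abbreviations like "e.g. " will be split incorrectly.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_sentences(text: str, min_words: int = 3) -> List[str]:
    """Hypothetical helper: split one long text into sentence-level documents.

    Sentences with fewer than `min_words` words are dropped, since very short
    fragments carry little topical signal.
    """
    sentences = (" ".join(s.split()) for s in _SENTENCE_END.split(text))
    return [s for s in sentences if len(s.split()) >= min_words]


if __name__ == "__main__":
    sample = "Water covers most of Earth. Rivers flow to the sea! Is it salty? Yes."
    print(split_into_sentences(sample))
```

The resulting list can be passed to `fit_transform` in place of the line-based `docs`. A dedicated sentence tokenizer, such as NLTK's `sent_tokenize`, would handle abbreviations better than this regex.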