How to use BERTopic to analyze a single (simple) text file #949

jklevy · 2023-01-20T23:08:43Z

We tried our best, but the original newsgroup dataset used in the beginner's guide is large and complex. Can anyone share a rewrite of the code below that works on a simple text file (in this case "water.txt")?

import sys
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
sys.path.insert(1,'C:/Users/jlevy/Desktop/jlevy/BERTopic-master/sample_text')
#f = open("water.txt",'r')
#print(f.read())
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
#docs = f.read()
model = BERTopic(language="english")
topics, probs = model.fit_transform(docs)
#model.get_topic_freq().head(5)
model.get_topic_info()

MaartenGr (Maintainer) · 2023-01-21T08:35:36Z

For the use case you describe, it should be possible to load one or multiple text files and have them processed by BERTopic. One thing to note is that it typically needs a couple of hundred sentences/documents to work well. If you only have a handful of documents, I would advise sticking to k-Means-like sub-models in BERTopic.

More specifically, you can indeed load a text file with .read(), as you mention in your code. If the file contains multiple documents, for example one per line, you could use .readlines() instead.
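To make the suggestion above concrete, here is a minimal sketch rather than a definitive implementation. It assumes that "water.txt" holds one document per line and that five clusters is a sensible number for a small corpus; both are assumptions, as are the helper names `load_documents` and `fit_topics`. Passing a scikit-learn `KMeans` instance through the `hdbscan_model` argument is how BERTopic swaps in k-Means for its default clustering step.

```python
from typing import List


def load_documents(path: str) -> List[str]:
    """Read a text file and return one document per non-empty line."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    # Strip surrounding whitespace and drop blank lines,
    # so every list entry is a real document.
    return [line.strip() for line in lines if line.strip()]


def fit_topics(docs: List[str], n_clusters: int = 5):
    """Fit BERTopic with k-Means in place of the default HDBSCAN clustering."""
    # Imported here so load_documents can be used without the heavy dependencies.
    from bertopic import BERTopic
    from sklearn.cluster import KMeans

    # n_clusters=5 is an assumed value; pick it to suit your data.
    cluster_model = KMeans(n_clusters=n_clusters)
    model = BERTopic(language="english", hdbscan_model=cluster_model)
    topics, _ = model.fit_transform(docs)
    return model, topics


# Example usage (assumes water.txt holds one document per line):
# docs = load_documents("water.txt")
# model, topics = fit_topics(docs)
# print(model.get_topic_info())
```

Note that BERTopic expects a list of strings. A single string returned by `.read()` would need to be split into sentences or paragraphs first, which is why the sketch reads the file line by line.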
