Cast topic modeling outcome to dataframe #891

Answered by MaartenGr
angelospapoutsis asked this question in Q&A

To create a dataframe with topics and documents, you would only need to do the following:

import pandas as pd
from bertopic import BERTopic

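# `docs` is your list of documents (strings)
topic_model = BERTopic()
# fit_transform returns the topic assignments and their probabilities
topics, probs = topic_model.fit_transform(docs)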

# Cast to dataframe
df = pd.DataFrame({"Doc": docs, "Topic": topics})
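
Once the dataframe exists, a common next step is to summarize it per topic. The sketch below is only illustrative: the `docs` and `topics` lists are hypothetical stand-ins for your documents and the output of `fit_transform`, so the snippet runs without fitting a model. It relies on BERTopic labeling outlier documents with topic -1.

```python
import pandas as pd

# Hypothetical stand-ins for `docs` and the `topics` returned by fit_transform
docs = [
    "the team won the game",
    "new gpu drivers released",
    "season tickets on sale",
    "asdf qwerty",
]
topics = [0, 1, 0, -1]

# Same cast as above: one row per document with its assigned topic
df = pd.DataFrame({"Doc": docs, "Topic": topics})

# Drop outliers (BERTopic assigns topic -1 to documents it could not cluster)
assigned = df[df["Topic"] != -1]

# Count how many documents fall into each remaining topic
docs_per_topic = assigned.groupby("Topic")["Doc"].count().to_dict()
```

Filtering on `Topic != -1` before grouping keeps the outlier bucket from dominating the counts; skip that step if you want the outliers included.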

Do note that in the upcoming release of BERTopic, v0.13, there will be an option to extract document information as follows:

>>> topic_model.get_document_info(docs)

Document                           Topic  Name                      Top_n_words             Probability  ...
I am sure some bashers of Pens...  0      0_game_team_games_season  game - team - games...  0.200010     ...
My brother is 

Answer selected by angelospapoutsis