Cast topic modeling outcome to dataframe #891
angelospapoutsis asked this question in Q&A (answered by MaartenGr)
Hello there, I have used BERTopic with KeyBERT to extract some topics from some docs. What I want is a final dataframe that has a column named after the topic, with the elements of the topic as its rows. Is there a way to accomplish this? Is there any relevant Python code? Thank you in advance.
Answered by MaartenGr on Dec 20, 2022
To create a dataframe with topics and documents, you would only need to do the following:

```python
import pandas as pd
from bertopic import BERTopic

# docs is your list of document strings
topic_model = BERTopic()

# fit_transform returns the topic assignments and their probabilities
topics, probs = topic_model.fit_transform(docs)

# Cast to dataframe
df = pd.DataFrame({"Doc": docs, "Topic": topics})
```

Do note that in the upcoming release of BERTopic, v0.13, there will be an option to extract document information as follows:

```
>>> topic_model.get_document_info(docs)
Document                            Topic  Name                       Top_n_words                Probability  ...
I am sure some bashers of Pens...   0      0_game_team_games_season   game - team - games...     0.200010     ...
My brother is in the market for...  -1     -1_can_your_will_any       can - your - will...       0.420668     ...
Finally you said what you dream...  -1     -1_can_your_will_any       can - your - will...       0.807259     ...
Think! It's the SCSI card doing...  49     49_windows_drive_dos_file  windows - drive - docs...  0.071746     ...
1) I have an old Jasmine drive...   49     49_windows_drive_dos_file  windows - drive - docs...  0.038983     ...
```
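The question can also be read as asking for one column per topic, with that topic's documents as the rows. A minimal sketch of that reshaping, assuming a `Doc`/`Topic` dataframe like the one built in the answer, might look like the following. The toy `docs` and `topics` values are made up for illustration, and the `Topic N` column labels are just one possible naming choice.

```python
import pandas as pd

# Toy stand-in for the output of fit_transform; these values are made up.
docs = ["doc a", "doc b", "doc c", "doc d", "doc e"]
topics = [0, 1, 0, -1, 1]

df = pd.DataFrame({"Doc": docs, "Topic": topics})

# Collect the documents that belong to each topic.
per_topic = df.groupby("Topic")["Doc"].apply(list)

# Build one column per topic. Wrapping each list in a Series lets pandas
# pad the shorter columns with NaN, since topics rarely have equal sizes.
wide = pd.DataFrame(
    {f"Topic {topic}": pd.Series(members) for topic, members in per_topic.items()}
)

print(wide)
```

If the topic names are wanted as column headers instead of the numeric ids, the same grouping could be applied to the `Name` column of the `get_document_info` output rather than to `Topic`.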
Answer selected by
angelospapoutsis