Best practices for modeling topics in dialogs #1745
lpietrobon asked this question in Q&A
I have a dataset representing a long dialog among several characters, and I'd like to extract the topics being discussed. I was about to embed the sentences, cluster them, and describe the clusters... and then I found out about BERTopic. Thank you for doing all the hard work!
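For reference, the "describe the clusters" step can be sketched without any library. The snippet below is a minimal, simplified take on the class-based TF-IDF (c-TF-IDF) weighting that BERTopic uses to describe topics: each cluster's documents are pooled into one bag of words, and a term scores high when it is frequent in that cluster but rare across all clusters. The tokenizer, the tiny stop-word list, and the function names are illustrative assumptions, not BERTopic's actual implementation.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stop-word list (an assumption; real pipelines use a fuller list).
STOPWORDS = {"the", "and", "a", "at", "was", "is", "to", "of", "on"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-letters, and drop stop words (deliberately crude)."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def describe_clusters(docs: list[str], labels: list[int], top_n: int = 3) -> dict[int, list[str]]:
    """Return the top_n highest-weighted terms for each cluster label.

    Weight of term t in cluster c: tf(t, c) * log(1 + A / f(t)), where tf(t, c)
    is t's share of all tokens in cluster c, A is the average token count per
    cluster, and f(t) is t's total count across all clusters.
    """
    # Pool each cluster's documents into a single bag of words.
    counts: dict[int, Counter] = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        counts[label].update(tokenize(doc))

    # Corpus-wide term counts and average cluster size in tokens.
    total_per_term: Counter = Counter()
    for c in counts.values():
        total_per_term.update(c)
    avg_tokens = sum(sum(c.values()) for c in counts.values()) / len(counts)

    descriptions = {}
    for label, c in counts.items():
        n_tokens = sum(c.values())
        weights = {
            term: (freq / n_tokens) * math.log(1 + avg_tokens / total_per_term[term])
            for term, freq in c.items()
        }
        # Highest weight first; ties broken alphabetically for determinism.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        descriptions[label] = ranked[:top_n]
    return descriptions
```

With two toy clusters, `describe_clusters(["the pizza was great", "pizza and pasta tonight", "the match starts at nine", "great match, the striker scored"], [0, 0, 1, 1])` puts "pizza" first for cluster 0 and "match" first for cluster 1, while "great", which appears in both, is pushed down.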
I am trying to understand how best to represent my dialog in BERTopic. The dialog I am dealing with is "chatty": I have the sender, timestamp, and text of each turn/message, but most messages are short and often rely on context (i.e., previous messages or "reply-to" messages) for their meaning.
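One way to bring context into short messages, assuming each record has an ID, sender, timestamp, text, and an optional reply-to ID (the `Message` fields below are hypothetical), is to turn every message into a document made of its reply-to parent, the `n` messages before it, and the message itself. This is a sketch of a common sliding-window approach, not something BERTopic does on its own; treating `n` as the window size is an assumption about what N means in the questions below.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    # Hypothetical schema based on the fields mentioned in the post.
    msg_id: str
    sender: str
    timestamp: float
    text: str
    reply_to: Optional[str] = None


def with_context(messages: list[Message], n: int) -> list[str]:
    """Build one document per message: its reply-to parent (if any and not
    already in the window), then the n preceding messages, then the message."""
    ordered = sorted(messages, key=lambda m: m.timestamp)
    by_id = {m.msg_id: m for m in ordered}
    docs = []
    for i, msg in enumerate(ordered):
        # Sliding window of up to n earlier messages, in time order.
        context = ordered[max(0, i - n):i]
        parent = by_id.get(msg.reply_to) if msg.reply_to else None
        # Prepend the replied-to message only if the window doesn't cover it.
        if parent is not None and parent.msg_id not in {m.msg_id for m in context}:
            context = [parent] + context
        docs.append("\n".join(f"{m.sender}: {m.text}" for m in context + [msg]))
    return docs
```

The resulting strings could then be passed to BERTopic as ordinary documents (e.g. `BERTopic().fit_transform(docs)`), so each topic assignment still maps back to exactly one original message.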
Question: Have others tried to model dialog/chat datasets? Is there a place where I can read up on best practices for this kind of dataset?
A few challenges I've encountered so far in bringing context to individual messages:
a. How can I go about finding a good value for N, beyond the "vibes check" of trying a few values and seeing what happens?
b. I could also create several BERTopic models for different values of N and then merge them. Does this sound like a good idea? Is there a principled way to tell whether this works better than using a single value of N?
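A partial, hedged answer to (a) and (b): instead of eyeballing, each candidate N can be scored with a couple of cheap quantitative signals, for example topic diversity (the share of unique words among all topics' top words) and cross-N stability (how well each topic's top words are matched by some topic from a model fitted with a different N). The functions below are a minimal pure-Python sketch with illustrative names; they take plain lists of top words, which in BERTopic could be built from `topic_model.get_topic(topic_id)` (a list of `(word, score)` pairs). These metrics are a suggestion rather than an established protocol, and are best read alongside a coherence score and a manual look at the topics.

```python
def topic_diversity(topics: list[list[str]], top_k: int = 10) -> float:
    """Share of unique words among the top_k words of all topics (1.0 = no overlap)."""
    words = [w for topic in topics for w in topic[:top_k]]
    return len(set(words)) / len(words) if words else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topic_overlap(topics_a: list[list[str]], topics_b: list[list[str]], top_k: int = 10) -> float:
    """For each topic in A, take the Jaccard similarity with its best-matching
    topic in B, then average over A. High values mean A's topics survive the
    change of N that produced B."""
    if not topics_a or not topics_b:
        return 0.0
    scores = [
        max(jaccard(set(ta[:top_k]), set(tb[:top_k])) for tb in topics_b)
        for ta in topics_a
    ]
    return sum(scores) / len(scores)
```

For example, with toy top words `[["pizza", "pasta", "dinner"], ["match", "goal", "striker"]]` for one N and `[["pizza", "dinner", "wine"], ["match", "goal", "referee"]]` for another, diversity is 1.0 for the first model and the overlap is 0.5. Recent BERTopic versions also provide `BERTopic.merge_models` for combining fitted models, and a merged model's topics could be scored the same way against the single-N models.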