Replies: 2 comments 2 replies
-
Sure, feel free to share your model here or via any of my other personal social media platforms. I can't promise that I can spend hours looking at everything, but I'll definitely have a look.
-
There are quite a number of ways to evaluate BERTopic, from topic coherence and diversity scores to cluster and predictive evaluation metrics. This becomes more apparent when you add in the somewhat subjective nature of topic modeling. When have we captured all topics? What exactly is a topic? Can we universally agree on what a given document is about? Especially when stakeholders are involved, this becomes more and more complex.

The solution to evaluation in general, in my opinion, is to start from the beginning. Why are we doing this analysis? What are the end goals? What are we trying to achieve with this model? For whom or what is it being developed? In your case, you mention "because we need to write a term paper". Although that is technically correct, there is a more fundamental reason why you are performing your topic modeling. That reason, which is often the reason for developing the model/paper in the first place, can then serve to decide which evaluation metric you are going to use. For example, if you are writing a paper to show that BERTopic is horrible at making coherent topics, then coherence metrics might serve well here. In contrast, if you want to create the best topic model, then you need to ask yourself what constitutes "best" within your paper/framework/philosophical underpinnings.
Unfortunately, there isn't necessarily a cut-off for when the value that you get back is "good", as the underlying formulas do not really allow for that. Who is to say when a u_mass score suffices or not? These kinds of coherence metrics are typically used for comparisons between models, to show that one is "better" than another. The difficulty here is that these measures are an approximation, and in practice, even in academic research, more is typically needed, such as diversity metrics or, most preferably, human evaluation. The latter is quite difficult and resource-intensive, so it is not always done. Personally, especially if you are not that familiar with coherence metrics, I would start with "c_v", which has been researched quite extensively and has been the most popular one in research. That is not necessarily the best reason for choosing it, but if you select something else, some readers might want an explanation for that choice. In practice, whenever you are choosing evaluation metrics, I would advise taking all of the above into account, arguing for why these metrics are relevant to your use case, and also being transparent about their shortcomings, as they almost always have some.
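For illustration, here is a minimal sketch of how c_v coherence could be computed for a fitted BERTopic model using gensim's CoherenceModel. The 20 newsgroups data is only there to make the snippet self-contained; `docs` stands in for your own abstracts, and the exact setup (vectorizer reuse, outlier handling) is an assumption rather than the one official recipe.

```python
# Minimal sketch: c_v coherence for a fitted BERTopic model via gensim.
from bertopic import BERTopic
from gensim.corpora import Dictionary
from gensim.models.coherencemodel import CoherenceModel
from sklearn.datasets import fetch_20newsgroups

# Placeholder corpus; replace with your own list of abstracts.
docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))["data"]

topic_model = BERTopic()
topics, _ = topic_model.fit_transform(docs)

# Reuse BERTopic's internal CountVectorizer so the tokenization matches
# the tokens that produced the topic representations.
analyzer = topic_model.vectorizer_model.build_analyzer()
tokens = [analyzer(doc) for doc in docs]
dictionary = Dictionary(tokens)
corpus = [dictionary.doc2bow(token) for token in tokens]

# Top words per topic, skipping the outlier topic (-1) and any padding words
# that are not in the dictionary.
topic_words = [
    [word for word, _ in topic_model.get_topic(topic_id) if word in dictionary.token2id]
    for topic_id in topic_model.get_topics()
    if topic_id != -1
]

coherence_model = CoherenceModel(
    topics=topic_words,
    texts=tokens,
    corpus=corpus,
    dictionary=dictionary,
    coherence="c_v",
)
print("c_v coherence:", coherence_model.get_coherence())
```

A simple diversity score, such as the fraction of unique words across all topics' top words, can complement the coherence number and is cheap to add on top of the same `topic_words` list.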
-
Hey!
We are two master's students (Dominik and Emma) from Germany, and we used BERTopic in a university project. We want to publish a paper, and together with our project partner we are looking for some feedback on the model. We thought that since you are the best person to ask, and it doesn't hurt to ask, we might as well try. Would you like to take a look at our model and give us some feedback?
I saw that you are also looking for use cases, so maybe you would benefit from it as well.
For context: we are using it on abstracts of philosophy papers published in highly ranked journals.
Thanks so much!