-
When documents are too long to fit within the embedding model's context size, or when you expect multiple topics within a single document, you split the documents into sentences and cluster the sentences. That extracts the topics across all sentences (and therefore across all documents). If you then want the topics per document, you can combine each document's sentences to get a distribution of topics for that document.
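To make the last step concrete, here is a minimal sketch of combining sentence-level topics back into per-document distributions. It assumes you kept track of which document each sentence came from while flattening. The function name `document_topic_distributions`, the naive `". "` sentence split, and the hard-coded `sentence_topics` labels are all hypothetical stand-ins; in practice the labels would come from whatever topic model you fit on the flattened sentences.

```python
from collections import Counter, defaultdict


def document_topic_distributions(doc_ids, sentence_topics):
    """Turn per-sentence topic labels into a topic distribution per document.

    doc_ids[i] is the index of the document that sentence i came from;
    sentence_topics[i] is the topic label assigned to sentence i.
    """
    if len(doc_ids) != len(sentence_topics):
        raise ValueError("doc_ids and sentence_topics must have the same length")

    counts = defaultdict(Counter)
    for doc_id, topic in zip(doc_ids, sentence_topics):
        counts[doc_id][topic] += 1

    # Normalize the counts so each document's topic shares sum to 1.
    return {
        doc_id: {topic: n / sum(c.values()) for topic, n in c.items()}
        for doc_id, c in counts.items()
    }


docs = ["Cats purr. Dogs bark. Stocks fell.", "Bonds rose. Stocks fell."]

# Flatten documents into sentences, remembering each sentence's source document.
# The split on ". " is deliberately naive; a real sentence tokenizer is assumed.
sentences, doc_ids = [], []
for i, doc in enumerate(docs):
    for sentence in doc.split(". "):
        if sentence.strip():
            sentences.append(sentence.strip())
            doc_ids.append(i)

# Hypothetical topic labels, one per sentence (stand-in for model output).
sentence_topics = [0, 0, 1, 1, 1]

distributions = document_topic_distributions(doc_ids, sentence_topics)
print(distributions)
```

The key design choice is keeping `doc_ids` aligned with the flattened sentence list: the clustering still runs on sentences, but the document-level view is recovered afterwards by grouping on that index.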
-
According to the best practices guide,
however, the provided code example just flattens all sentences of all documents, yet I would like to cluster documents, not sentences. What am I missing?