Hi, thank you for this awesome work!
I would like to use KDTM to generate topics and document-topic distributions for a corpus of 4.4M tweets (each tweet can be considered a document). Can you let me know how I can obtain the document-topic distributions? The closest method I can find is save_document_representations(), but I am not sure it's the same thing.
Also, my dataset does not have any labels, so I wanted to know if labels are a part of the training process or if they are optional.
Thanks in advance!
To your first question: that function does return document-topic distributions, but each one is just a single sample. For a later paper, we modified the function to sample multiple times and take the mean (if I recall correctly, the logistic-normal has no closed-form mean). You can see the modified code in this branch. In fact, if my commit history is to be trusted, you can view the exact changes here.
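To make the "sample multiple times and take the mean" idea concrete, here is a minimal plain-Python sketch. It is not the code from the branch: the function names are hypothetical, and it assumes the encoder gives a per-document mean and log-variance (`mu`, `log_var`) for a Gaussian that is pushed through a softmax. Note that `softmax(mu)` alone is generally not the mean of the distribution, which is why averaging samples is needed.

```python
import math
import random


def softmax(xs):
    """Map a real-valued vector onto the probability simplex."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_doc_topics(mu, log_var, rng):
    """Draw one document-topic vector: softmax(mu + sigma * eps), eps ~ N(0, I)."""
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
    return softmax(z)


def mean_doc_topics(mu, log_var, n_samples=100, seed=0):
    """Monte Carlo estimate of the logistic-normal mean for one document."""
    rng = random.Random(seed)
    totals = [0.0] * len(mu)
    for _ in range(n_samples):
        theta = sample_doc_topics(mu, log_var, rng)
        totals = [t + p for t, p in zip(totals, theta)]
    return [t / n_samples for t in totals]


# Hypothetical encoder outputs for one tweet over three topics (assumed values).
mu = [1.0, 0.0, -1.0]
log_var = [0.0, 0.0, 0.0]
theta_hat = mean_doc_topics(mu, log_var, n_samples=1000)
```

The averaged vector still sums to one, since it is a mean of points on the simplex. With 4.4M tweets you would want to batch this in whatever tensor library the model uses rather than loop in pure Python.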
Labels (as well as covariates) are optional and all reported results are unsupervised.
Not that you asked, but you should also note that we realized the NPMI implementation in this repo (ported from the original Scholar paper) is nonstandard, and I believe we calculate it during training. You should prefer implementations from Gensim, OCTIS, Palmetto, or us. Of course, the best bet is to forgo automated metrics altogether 😉
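For reference, the usual definition is NPMI(w_i, w_j) = log(p(w_i, w_j) / (p(w_i) p(w_j))) / (-log p(w_i, w_j)), averaged over the word pairs of a topic. The sketch below is a rough illustration only, not any of the implementations named above: `npmi_coherence` is a hypothetical name, probabilities come from whole-document co-occurrence (libraries often use sliding windows and smoothing instead), and the handling of the two edge cases is an assumed convention.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all word pairs of one topic, from document co-occurrence."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # assumed convention: the pair never co-occurs
        elif p12 == 1.0:
            scores.append(1.0)  # assumed convention: avoids dividing by -log(1) = 0
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy reference corpus of tokenized documents (made up for illustration).
docs = [["cat", "dog"], ["cat", "dog"], ["fish"], ["bird"]]
always_together = npmi_coherence(["cat", "dog"], docs)
never_together = npmi_coherence(["cat", "fish"], docs)
```

Here "cat" and "dog" only ever appear together, so their score sits at the upper bound of 1, while "cat" and "fish" never co-occur and get the lower bound of -1. Scores from this sketch will not match a library that uses a different window or smoothing.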
Another thing you didn't ask: we've found that MALLET works surprisingly well on tweets, in case you haven't tried it already and are looking for a good baseline.