Document-topic distribution #2

devanshrj · 2023-05-09T20:56:20Z

Hi, thank you for this awesome work!

I would like to use KDTM to generate topics and document-topic distribution on a corpus containing 4.4M tweets (each tweet can be considered a document). Can you let me know how I can obtain the document-topic distribution? The closest method I can find to this is save_document_representations(), but I am not sure if it's the same thing.

Also, my dataset does not have any labels, so I wanted to know if labels are a part of the training process or if they are optional.

Thanks in advance!

ahoho · 2023-05-09T22:16:55Z

Thanks for your interest!

To your first question, that function will get document-topic distributions, but it's just a single sample. For a later paper, we modified the function to sample multiple times and take the mean (if I recall correctly, there's no analytical mean for a logistic-normal). You can see the modified code in this branch. In fact, if my commit history is to be trusted, you can view the exact changes here.
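For illustration only, here is a minimal sketch of the "sample multiple times and take the mean" idea. It is not the code from the branch mentioned above. It assumes the encoder yields a per-document mean and log-variance over K topics (a diagonal Gaussian), and the names `estimate_doc_topic_mean`, `mu`, and `log_var` are hypothetical.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_doc_topic_mean(mu, log_var, n_samples=100, seed=0):
    """Monte Carlo estimate of the mean of a logistic-normal.

    Assumption: `mu` and `log_var` are the per-topic Gaussian parameters
    for one document (hypothetical names, not taken from the repo).
    """
    rng = random.Random(seed)
    k = len(mu)
    std = [math.exp(0.5 * lv) for lv in log_var]
    running = [0.0] * k
    for _ in range(n_samples):
        # Draw z ~ N(mu, diag(std^2)), then map it onto the simplex.
        z = [mu[i] + std[i] * rng.gauss(0.0, 1.0) for i in range(k)]
        theta = softmax(z)
        for i in range(k):
            running[i] += theta[i]
    # Average the simplex samples, not the Gaussian draws.
    return [r / n_samples for r in running]


mu = [1.0, 0.0, -1.0]
log_var = [0.0, 0.0, 0.0]
single = estimate_doc_topic_mean(mu, log_var, n_samples=1)       # one noisy draw
averaged = estimate_doc_topic_mean(mu, log_var, n_samples=1000)  # smoother estimate
```

With `n_samples=1` this reduces to the single-sample behavior described above. Because each sample lies on the simplex, the average also sums to one.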

Labels (as well as covariates) are optional and all reported results are unsupervised.

Not that you asked, but you should also note that we realized the NPMI implementation in this repo (ported from the original Scholar paper) is nonstandard, and I believe we calculate it during training. You should prefer implementations from Gensim, OCTIS, Palmetto, or us. Of course, the best bet is to forgo automated metrics altogether 😉
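As a point of comparison, the textbook NPMI definition can be sketched in a few lines. This is only an illustration that uses document-level co-occurrence on a toy corpus. The libraries named above differ in details such as windowing and smoothing, so prefer them for reported numbers. The function names and the conventions for the edge cases are assumptions.

```python
import math
from itertools import combinations


def doc_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, tokens in enumerate(docs):
        for word in set(tokens):
            index.setdefault(word, set()).add(doc_id)
    return index


def npmi_pair(w1, w2, index, n_docs):
    """NPMI = log(p12 / (p1 * p2)) / -log(p12), with document-level counts."""
    d1 = index.get(w1, set())
    d2 = index.get(w2, set())
    joint = len(d1 & d2)
    if joint == 0:
        return -1.0  # convention assumed here: never co-occur -> -1
    if joint == n_docs:
        return 0.0   # degenerate case (-log(1) == 0); convention assumed here
    p1, p2, p12 = len(d1) / n_docs, len(d2) / n_docs, joint / n_docs
    return math.log(p12 / (p1 * p2)) / -math.log(p12)


def topic_npmi(top_words, docs):
    """Mean pairwise NPMI over a topic's top words (needs at least two words)."""
    index = doc_index(docs)
    pairs = list(combinations(top_words, 2))
    return sum(npmi_pair(a, b, index, len(docs)) for a, b in pairs) / len(pairs)


toy_docs = [["a", "b"], ["a", "b"], ["c"], ["c", "a"]]
score = topic_npmi(["a", "b", "c"], toy_docs)
```

Scoring against a held-out or reference corpus, rather than inside the training loop, keeps the metric independent of the model being evaluated.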

ahoho · 2023-05-09T22:21:23Z

Another thing you didn't ask: we've found that MALLET works surprisingly well with tweets, in case you haven't tried it already and are looking for a good baseline.
