Repository for EACL 2021 Paper: Adaptive Mixed Component LDA for Low Resource Topic Modeling

suzyahyah/adaptive_mixture_topic_model

Basic Run

To compile the Cython files:

python code/setup.py build_ext --inplace

To preprocess the embeddings for faster loading:

python code/topicModelP/preprocess_en_embed.py

Basic 20 Newsgroups run with vanilla (discrete) LDA:

bash ./bin/runGaussian_py.sh $DATA $SHARED_PARAMS $INTERPOLATE $STAGGER $TRAIN_MODE $NTOPICS $TRAIN_SIZE $SPLIT

Batch runs for Sun Grid Engine:

bash ./bin/runNews.sh
bash ./bin/runWiki.sh

Other available configs

See ./bin/runGaussian_py.sh, which contains the full list of configs and their descriptions.

For the multilingual case, provide a config file with data_paths in configs/data_paths.yaml. See the example in configs/data_path.yaml.

External Evaluations

External evaluations (cv, npmi) rely on the Palmetto library. We conduct external scoring only against the monolingual dataset. The scoring script requires en_words.txt to be present in the DIR.

bash ./bin/external_score.sh
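
Because the scoring script depends on en_words.txt, a small pre-flight check can catch a missing word list before a run starts. The following is a minimal sketch only: `check_wordlist` is a hypothetical helper that is not part of this repository, and the directory passed to it is an assumption standing in for whichever DIR the script actually reads from.

```shell
# Hypothetical helper, not part of the repository: succeeds only if
# en_words.txt exists in the given directory.
check_wordlist() {
  dir="${1:-.}"   # directory to check; defaults to the current directory
  if [ -f "$dir/en_words.txt" ]; then
    return 0
  fi
  # Report the problem on stderr and signal failure to the caller.
  echo "en_words.txt not found in $dir" >&2
  return 1
}
```

With the function defined, `check_wordlist "$DIR" && bash ./bin/external_score.sh` would launch the scoring only when the word list is in place.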
