Counts words from input given by https://github.com/jukujala/wiki_markup_to_text
jukujala/wiki_word_count
* What?
  Calculates word counts of Wikipedia from input given by https://github.com/jukujala/wiki_markup_to_text
* Input
  Wiki-text format: each line has one Wiki article in format "title tab string-escaped content"
* Output
  Pickled dictionary mapping words to occurrences
* Usage
  cat corpus.txt | python decode_strings.py | python build_word_token_dict.py word_tokens.pickle
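
The input and output formats described above can be illustrated with a minimal sketch. This is not the code of decode_strings.py or build_word_token_dict.py; the function names, the tokenizer regex, and the handled escape sequences are all assumptions made for illustration.

```python
import pickle
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters, digits and apostrophes.
# The repository's real tokenization rules may differ.
TOKEN_RE = re.compile(r"[a-z0-9']+")

# Assumed subset of backslash escapes: newline, tab and a literal backslash.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def unescape(content):
    """Undo the assumed escapes; unknown escape sequences are left unchanged."""
    return re.sub(
        r"\\(.)",
        lambda m: ESCAPES.get(m.group(1), m.group(0)),
        content,
    )


def count_words(lines):
    """Count tokens in the content field of 'title<TAB>content' lines."""
    counts = Counter()
    for line in lines:
        # Split on the first tab only; the title itself is not counted.
        _title, _sep, content = line.rstrip("\n").partition("\t")
        counts.update(TOKEN_RE.findall(unescape(content).lower()))
    return counts


def write_counts(lines, path):
    """Pickle a plain dict mapping each word to its number of occurrences."""
    with open(path, "wb") as handle:
        pickle.dump(dict(count_words(lines)), handle)
```

Under these assumptions, a file written by `write_counts` can be read back with `pickle.load` on a file opened in binary mode, which yields an ordinary dictionary.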