This is an attempt at topic modelling on the top 100 papers from the GitHub repo awesome-deep-learning-papers.
The repo lists 100 papers, but during crawling with a script,
access to one of them (Human-level control through deep reinforcement learning) was blocked,
so only 99 PDFs were downloaded.
A script then calls pdftotext to
convert the PDFs to plain text.
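A minimal sketch of that conversion step, assuming the PDFs sit in a `pdfs/` directory and the plain texts go to `txts/` (both directory names are assumptions, not taken from the repo):

```python
import subprocess
from pathlib import Path

def txt_path_for(pdf_path: Path, txt_dir: Path) -> Path:
    """Map e.g. pdfs/lenet.pdf to txts/lenet.txt."""
    return txt_dir / (pdf_path.stem + ".txt")

def convert_pdfs(pdf_dir: Path, txt_dir: Path) -> None:
    """Run pdftotext on every PDF in pdf_dir, writing .txt files to txt_dir."""
    txt_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        # pdftotext <input.pdf> <output.txt> extracts the text layer of the PDF
        subprocess.run(["pdftotext", str(pdf), str(txt_path_for(pdf, txt_dir))],
                       check=True)
```

This requires the `pdftotext` binary (part of Poppler) to be on the PATH.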
In find_topics.py, all plain texts are concatenated into papers.txt,
which is about 4 MB,
i.e. roughly 4,000,000 characters of data.
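The concatenation step might look like the following sketch (the helper name and the `txts/` directory are assumptions, not the actual code of find_topics.py):

```python
from pathlib import Path

def concat_texts(txt_dir: Path, out_file: Path) -> int:
    """Join all per-paper .txt files into one corpus file; return its size in characters."""
    texts = [p.read_text(errors="ignore") for p in sorted(txt_dir.glob("*.txt"))]
    corpus = "\n".join(texts)
    out_file.write_text(corpus)
    return len(corpus)
```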
The gensim
library is used, as it is tailored for topic modelling. The findings are visualized with the pyLDAvis
library and stored as an .html file.