This repo contains the scripts used in my latest experiment, "Reliving Avengers: Infinity War with spaCy and Natural Language Processing".
Using spaCy, an open-source Python NLP library designed to help us process and understand large volumes of text, I analyzed the script of the movie to explore the following:
- Overall top 10 verbs, nouns, adverbs and adjectives from the film.
- Top verbs and nouns spoken by a particular character.
- Top 30 named entities from the film.
- The similarity between the lines spoken by each character pair, e.g., the similarity between Thor's and Thanos' lines.
Tools used:
- Python
- spaCy
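The word and entity counts listed above can be sketched roughly as follows. This is a minimal illustration rather than the exact code in this repo: the function names, the choice to count lemmas, the stop-word filter, and the `en_core_web_sm` model are all assumptions.

```python
from collections import Counter


def top_words_by_pos(doc, pos, n=10):
    """Return the n most frequent lemmas whose coarse part-of-speech tag is `pos`.

    `doc` is any iterable of spaCy tokens, e.g. a Doc built from the whole
    film or from a single character's lines.
    """
    counts = Counter(
        token.lemma_.lower()
        for token in doc
        # Assumption: stop words and non-alphabetic tokens are skipped.
        if token.pos_ == pos and token.is_alpha and not token.is_stop
    )
    return counts.most_common(n)


def top_entities(doc, n=30):
    """Return the n most frequent named-entity strings in a spaCy Doc."""
    return Counter(ent.text for ent in doc.ents).most_common(n)


def analyze(path="cleaned-script.txt", model="en_core_web_sm"):
    """Print the top words per part of speech and the top entities."""
    # Imported here so the helpers above can be used without spaCy installed.
    import spacy

    nlp = spacy.load(model)  # assumed model; any English pipeline with a tagger and NER works
    with open(path, encoding="utf-8") as f:
        doc = nlp(f.read())
    for pos in ("VERB", "NOUN", "ADV", "ADJ"):
        print(pos, top_words_by_pos(doc, pos))
    print("ENTITIES", top_entities(doc))
```

Calling `analyze()` from the repo root would print one list per part of speech plus the entity list. For per-character counts, the same helpers can be applied to a Doc built only from that character's lines.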
Besides the scripts, the repo contains the full movie script (raw_script.txt), a version with the comments, scene descriptions, and subjects removed (cleaned-script.txt), and a cleaned version that keeps the subjects (cleaned-script-subject.txt). The plots directory contains all the plots showing the top nouns, adverbs, adjectives, verbs, and entities per character.
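To compare characters, their lines first have to be grouped by speaker. The sketch below shows one way this could be done. It assumes, hypothetically, that each line of cleaned-script-subject.txt has the form `SUBJECT: spoken text`; the actual file layout may differ, and both function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def lines_by_character(lines, sep=":"):
    """Group dialogue lines by speaker.

    Assumption: each line looks like "SUBJECT: spoken text".
    Lines without the separator, or with no text after it, are skipped.
    """
    grouped = defaultdict(list)
    for line in lines:
        subject, found, text = line.partition(sep)
        if found and text.strip():
            grouped[subject.strip()].append(text.strip())
    return dict(grouped)


def pairwise_similarity(docs):
    """Map each unordered character pair to the similarity of their docs.

    `docs` maps a character name to a spaCy Doc built from that
    character's lines; Doc.similarity does the actual comparison.
    """
    return {
        (a, b): docs[a].similarity(docs[b])
        for a, b in combinations(sorted(docs), 2)
    }
```

With spaCy installed, each character's Doc could be built with something like `nlp(" ".join(texts))`. `Doc.similarity` relies on word vectors, so a pipeline that ships them (for example `en_core_web_md`) is likely to give more meaningful scores than a small model.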
Thanks to Manuel Romero (https://github.com/mrm8488) for writing the Jupyter notebook.