The Wiki Movies Scraper is a Scrapy project that collects movie information from Wikipedia: Title, Genre, Director, Country & Year, and IMDb Rating. The data is then stored in CSV format.
To use it with Wikipedia in your language, change the scraping parameters that select words from the HTML page.
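One way to keep the language-specific words in a single place is a lookup table keyed by language code. The sketch below is only an illustration: `INFOBOX_LABELS`, `pick_fields`, and every label string in it are hypothetical and are not taken from `movies_spider.py`. It assumes the spider has already extracted `(label, value)` pairs from the page.

```python
# Hypothetical label tables: the real strings depend on the page layout
# of the Wikipedia edition being scraped and must be checked by hand.
INFOBOX_LABELS = {
    "en": {"genre": "Genre", "director": "Directed by", "country": "Country"},
    "ru": {"genre": "Жанр", "director": "Режиссёр", "country": "Страна"},
}


def pick_fields(rows, lang):
    """Map (label, value) pairs to language-neutral field names."""
    # Invert the table for the chosen language: label -> field name.
    wanted = {label: field for field, label in INFOBOX_LABELS[lang].items()}
    result = {}
    for label, value in rows:
        field = wanted.get(label.strip())
        if field is not None:  # ignore labels we do not care about
            result[field] = value.strip()
    return result


# Placeholder data, not real scraped output.
rows = [("Directed by", "Jane Doe"), ("Running time", "120 minutes")]
print(pick_fields(rows, "en"))  # {'director': 'Jane Doe'}
```

With this layout, supporting another language means adding one more entry to the table rather than editing the parsing logic.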
- Scrape movie details from Wikipedia.
- Output the data in CSV format.
- Python 3.10
- Scrapy
Clone the repository and navigate to the project directory:
git clone https://github.com/RomiconEZ/Scrapy-Wiki-IMDb-Movie-Info.git
cd wiki_movies_scraper
To start scraping movies, run the following command:
scrapy crawl movies_spider
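Once the crawl has produced a CSV file, it can be inspected with the standard library. This is a minimal sketch, assuming the file has a header row with columns named `title` and `imdb_rating`; those names and the helper `load_movies` are assumptions, so adjust them to the actual header of your output.

```python
import csv
import io


def load_movies(csv_file, min_rating=0.0):
    """Read scraped rows and keep those with a rating >= min_rating."""
    movies = []
    for row in csv.DictReader(csv_file):
        try:
            rating = float(row["imdb_rating"])  # assumed column name
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric rating
        if rating >= min_rating:
            movies.append({**row, "imdb_rating": rating})
    return movies


# In-memory stand-in for open("movies.csv", newline="", encoding="utf-8");
# the rows are placeholders, not real scraped data.
sample = io.StringIO("title,imdb_rating\nMovie A,8.1\nMovie B,\nMovie C,6.4\n")
print([m["title"] for m in load_movies(sample, min_rating=7.0)])  # ['Movie A']
```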
The directory structure for this Scrapy project is as follows:
ScrapyParsers/
    wiki_movies_scraper/
        wiki_movies_scraper/
            spiders/
                __init__.py
                movies_spider.py
            __init__.py
            items.py
            middlewares.py
            pipelines.py
            settings.py
        scrapy.cfg
        movies_example.csv
    poetry.lock
    pyproject.toml
    README.md
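For orientation, the record that flows from the spider through `pipelines.py` to the CSV could be modelled roughly as follows. This is a sketch only: the class name `MovieItem` and its field names are assumptions based on the fields listed in the introduction, not the contents of `items.py`. A plain dataclass is used here so the example runs without Scrapy installed; recent Scrapy versions also accept dataclass objects as items.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MovieItem:
    # Hypothetical field names; the real definitions live in items.py.
    title: str
    genre: Optional[str] = None
    director: Optional[str] = None
    country: Optional[str] = None
    year: Optional[int] = None
    imdb_rating: Optional[float] = None


# Placeholder values, not real scraped data.
item = MovieItem(title="Example Movie", year=1999)
print(asdict(item))
```

Fields that could not be scraped stay `None`, which a CSV exporter would typically write as an empty cell.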