Scrapes 白水社中国語辞典 on Weblio and creates a yomichan dictionary

martanman/weblio-hakusuisha-scraper

Outline

A scraper for the Chinese-Japanese dictionary 白水社中国語辞典, hosted by Weblio. It includes a script that converts the scraped data into a Yomichan-compatible dictionary format for use in the Yomichan browser extension.

Requirements

The required Python libraries can be installed with

pip install scrapy html5lib regex

Pandoc is also required; it is used to convert HTML to plain text.
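For orientation, here is a minimal sketch of how an HTML fragment can be piped through Pandoc from Python. It is not necessarily how the scripts in this repository call Pandoc: the helper names `pandoc_args` and `html_to_plain` are hypothetical, and the sketch assumes the `pandoc` executable is on your `PATH`.

```python
import subprocess


def pandoc_args(wrap: str = "none") -> list[str]:
    """Build a pandoc command line that converts HTML (stdin) to plain text (stdout)."""
    # --wrap=none stops pandoc from hard-wrapping long lines in the output.
    return ["pandoc", "--from=html", "--to=plain", f"--wrap={wrap}"]


def html_to_plain(html: str) -> str:
    """Pipe an HTML fragment through pandoc and return the plain-text result.

    Raises subprocess.CalledProcessError if pandoc exits with an error,
    and FileNotFoundError if pandoc is not installed.
    """
    result = subprocess.run(
        pandoc_args(),
        input=html,
        capture_output=True,
        encoding="utf-8",  # dictionary entries contain CJK text
        check=True,
    )
    return result.stdout.strip()


# Example call (requires pandoc to be installed):
# html_to_plain("<p>你好 <b>こんにちは</b></p>")
```

Running `pandoc --version` in a terminal is a quick way to confirm that Pandoc is installed before starting a long scrape.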

Usage

python spider.py

runs the scraper. Scraping all the pages takes about a day, depending on the value of DOWNLOAD_DELAY in spider.py. Then run

python export.py

to create a Yomichan-compatible dictionary zip file from entries.jsonl.
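To give a rough idea of what the export step involves, the sketch below packs a JSON Lines file into a zip with the layout Yomichan expects for format 3 dictionaries: an `index.json` plus one or more `term_bank_N.json` files. It is an illustration only, not the logic of export.py. The function name `build_dictionary`, the entry field names `term`, `reading` and `definition`, the revision string, and the bank size are all assumptions; the real entries.jsonl may be structured differently.

```python
import json
import zipfile


def build_dictionary(jsonl_path, zip_path, title="白水社中国語辞典", bank_size=10000):
    """Pack JSONL entries into a Yomichan-style dictionary zip; return the entry count."""
    terms = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            # Term bank row layout (format 3):
            # [expression, reading, definition tags, rules, score,
            #  glossary list, sequence number, term tags]
            terms.append([
                entry["term"],             # hypothetical field name
                entry.get("reading", ""),  # hypothetical field name
                "",
                "",
                0,
                [entry["definition"]],     # hypothetical field name
                len(terms) + 1,
                "",
            ])

    index = {"title": title, "format": 3, "revision": "1"}
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("index.json", json.dumps(index, ensure_ascii=False))
        # Split the terms into numbered banks so no single file grows too large.
        for start in range(0, len(terms), bank_size):
            name = f"term_bank_{start // bank_size + 1}.json"
            chunk = terms[start:start + bank_size]
            zf.writestr(name, json.dumps(chunk, ensure_ascii=False))
    return len(terms)
```

The resulting zip can be imported from the Dictionaries section of the Yomichan settings page.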

Disclaimer

Please do not share copies of the scraped dictionary, for copyright reasons. These scripts are intended for individual use.
