All notable changes to this project are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning (as of version 2.0.11).
- Change log level of "Video at {url} has not yet been translated into {requested_lang_code}" messages from warning to debug (way too verbose)
- Disable preloading of subtitles in video.js
- Add `--language-threshold` CLI argument for considering languages that appear in at least specified percentage of videos in `compute_zim_languages` (#212)
- Restore functionality to resist temporary bad TED responses when parsing video pages (#209)
- Retry video data extraction if `videoData` is missing from page data (#226)
- Skip download of speaker image if URL is "-" (#224)
- Upgrade to zimscraperlib 3.4.0 (including new webm encoder presets to migrate to VP9 instead of VP8) (#204)
- Retry logic is still failing because req might be null when timeout occurs (#203)
- Typo in disable-metadata-checks arg in ted2zim-multi (#202)
- Change log level from ERROR to WARNING for missing translations (#197)
- Fix HTTP retries to consider any HTTP failure, not only bad HTTP status code (#162)
- New `--long-description` CLI argument to set the ZIM long description
- New `--disable-metadata-check` CLI argument to disable the metadata checks which are automated since zimscraperlib 3.x
- When `--languages` CLI argument is not passed, no filtering by language is done (#171)
- Changed default publisher metadata from 'Kiwix' to 'openZIM'
- Validate ZIM metadata as early as possible
- Migrate to zimscraperlib 3.3.2 (including new VideoLowWebm encoder preset version 2)
- Upgrade Python dependencies, including migration to Python 3.12
- Fix language metadata computation (#172)
- Fix computation of automatic description and long description
- Fix subtitles time offset (#177)
- Fix rare bug in display of videos title and description on video page
- Fix support for YouTube fallback when downloading video from TED CDN is not working (#164 + #182)
- Do not include videos which failed to be fetched / processed in the final list of videos on main page (#167, #169)
- Fix video not working on Safari iOS / iPad (#145)
- fixed search by topic to use new search API instead of broken web page scraping (#149)
- download_link is renamed request_url and can also perform POST requests (in addition to previous GET requests)
- upgrade to Python 3.11 from 3.8
- upgrade to zimscraperlib 2.1 + upgrade all other dependencies
- significant refactoring to adopt openZIM Python conventions
- activate stale bot + add convenient pull requests template
- download_link now retries all errors but 404
- Updated ogv.js to 1.8.9
- Fixed missing speaker photo (#144)
- Fixed crash on videos without speakers (#134)
- Adapted for no-namespace ZIM (#139)
- Fixed new video DOM change
- Fixed dependency issue (markupsafe)
- Don't fail on missing whoTheyAre
- Updated scraperlib (1.6.2) to fix mime guessing bug
- Removed inline JS to comply with some CSP (#128)
- Special handling for playlist 57 (#127)
- ZIM entries now have Titles (#126)
- Updated for new playlists DOM (#124)
- Updated for new video DOM (#129)
- updated scraperlib
- fixed bug in video URL finding if ted json has an h264 entry with None value
- use `eng` as default locale
- [multi] added retry over failure to get playlist slug
- use WebP images instead of JPEG for thumbnails and speaker images
- add multithreading support and ability to download videos hosted on youtube
- fixed usage on older browsers (without ES6 support)
- limited YoutubeDownloader threads to 1
- use slug instead of video ID to make urls meaningful
- fixed bug that required clicking the next page button twice
- add i18n support
- add translations for Hindi
- use pylibzim to create zim
- add variable "{slug}" in ted2zim-multi which will be replaced by the playlist/topic slug (with dashes) automatically
- fix layout on mobile devices
- ted2zim-multi forwards return-code from ted2zim process on failure
- Fixed clashing argument between `--name` and `--name-format` in ted2zim-multi
- added ted2zim-multi for multi zim creation
- added --tmp-dir to specify the path folder where the temporary build directory will be created
- `{period}` can be passed in `--zim-file` and be replaced with date as YYYY-MM
- now handling incorrect TED website responses with retries
- fixed crash on missing language details
- removed duplicated subtitles
- fixed auto-description when title is supplied
- removed --max-videos-per-topic option to decrease complexity
- refactoring of code
- fixed missing files in package
- Fixed docker recipe (zimwriterfs version)
- Rewritten the scraper in python3
- New command to run `ted2zim`
- Introduced changelog
- Topicwise scraping supported
- Support for playing webm files where not supported natively using videojs-ogvjs
- Web dependencies removed from repository
- New Dockerfile
- New project structure
- Added S3 based optimization cache support
- Add support for TED playlists
- Add support to filter videos available in a specific language based on audio and subtitles
- Add option to choose subtitle languages