Make sure to grab the newest version from there instead.
A python script for ETH students to download lecture videos from video.ethz.ch.
requests
tqdm
(optional, gives better progressbar)
Install with:
pip3 install requests tqdm
Download the file here and run with
python3 vo-scraper.py
python3 vo-scraper.py <arguments> <lecture link(s)>
To see a list of possible arguments check
python3 vo-scraper.py --help
For protected lectures the vo-scraper will ask for your login credentials before downloading the video(s).
You can either specify single episodes by typing their indices separated by space, or add ranges in Haskell syntax, like 1..5
for 1 2 3 4
.
Ranges are upper-bound-inclusive. Custom steps sizes are supported too, e.g. 1..3..10
You may find this example of ranges useful:
Range | Equivalent | In Words |
---|---|---|
1..4 |
1 2 3 4 |
Episode one to four |
..4 |
0 1 2 3 4 |
All episodes up to four (the fifth) |
3.. |
3 4 5 6 [...] |
All episodes starting from three (the fourth) |
.. |
0 1 2 3 [...] |
All episodes |
2..4..6 |
2 4 6 |
Every other episodes from two to six |
..2..6 |
0 2 4 6 |
Every other episodes until six (when I started paying attention) |
1..3.. |
1 3 5 [...] |
Every other episodes starting from the second (i.e.. all the second episodes of the week) |
..3.. |
0 3 6 [...] |
Every third episodes, starting from the beginning |
Downloading live streams is not supported.
Downloading is only supported for recorded lectures on video.ethz.ch. Other platforms such as Zoom, Moodle, and Polybox are not supported.
The file should only have one link per line. Lines starting with #
will be ignored and can be used for comments. Empty lines will also be ignored. It should look something like this:
https://video.ethz.ch/lectures/<department>/<year>/<spring/autumn>/XXX-XXXX-XXL.html
# This is a comment
https://video.ethz.ch/lectures/<department>/<year>/<spring/autumn>/XXX-XXXX-XXL.html
...
Additionally you can also add a username and password at the end of the link seperated by a single space:
https://video.ethz.ch/lectures/<department>/<year>/<spring/autumn>/XXX-XXXX-XXL.html username passw0rd1
...
Note: This is NOT recommended for your NETHZ account password for security reasons!
Q: I don't like having to pass all those parameters each time I download recordings. Is there a better way?
You can can create a file called parameters.txt
in which you put all your parameters. As long as you keep it in the same directory in which you call the scraper, it will automatically detect the file and read the parameters from there.
Example:
If you create a file called parameters.txt
with the following content
--all
--quality low
and then run python3 vo-scraper.py <some lecture link>
in that directory it will download all recordings (--all
) from that lecture in low quality (--quality low
) without you having to pass any parameters.
If you want to use a different name for the parameter file, you can pass the parameter --parameter-file <filename>
. Ironically, you cannot do this via parameters.txt
:P
Each lecture on video.ethz.ch has a JSON file with metadata associated with it.
So for example
https://video.ethz.ch/lectures/d-infk/2019/spring/252-0028-00L.html
has its JSON file under:
https://video.ethz.ch/lectures/d-infk/2019/spring/252-0028-00L.series-metadata.json
This JSON file contains a list of all "episodes" where the ids of all the videos of the lecture are located.
Using those ids we can access another JSON file with the video's metadata under
https://video.ethz.ch/.episode-video.json?recordId=<ID>
Example:
https://video.ethz.ch/.episode-video.json?recordId=3f6dee77-396c-4e2e-a312-a41a457b319f
This file contains links to all available video streams (usually 1080p, 720p, and 360p). Note that if a lecture requires a login, this file will only be accessible if you have a cookie with a valid login-token!
The link to the video stream looks something like this:
https://oc-vp-dist-downloads.ethz.ch/mh_default_org/oaipmh-mmp/<video id>/<video src id?>/presentation_XXXXXXXX_XXXX_XXXX_XXXX_XXXXXXXXXXXX.mp4
Example:
https://oc-vp-dist-downloads.ethz.ch/mh_default_org/oaipmh-mmp/3f6dee77-396c-4e2e-a312-a41a457b319f/2bd93636-e95d-4552-8722-332a95e1a0a6/presentation_c6539ed0_1af9_490d_aec0_a67688dad755.mp4
So what the vo-scraper does is getting the list of episodes from the lecture's metadata and then acquiring the links to the videos selected by the user by accessing the videos' JSON files. Afterwards it downloads the videos behind the links.
There exist three (known) types of protection for lecture videos:
-
NONE
requires no login at all -
ETH
requires logging in with a NETHZ account -
PWD
requires logging in with a custom user name and password
What kind of protection a series of lecture videos has, can be found in its metadata file under <lecture link>.series-metadata.json
, e.g.
https://video.ethz.ch/lectures/d-infk/2019/spring/252-0028-00L.series-metadata.json
This JSON file has a field called protection
with a value corresponding to one of the three protection types.
If the series is protected then a cookie containing an authentication token needs to be sent when requesting the individual videos' metadata file at https://video.ethz.ch/.episode-video.json?recordId=<ID>
Getting a cookie with a valid token differs between videos that require a NETHZ login and videos that use custom credentials.
For NETHZ logins we need to send a POST request to https://video.ethz.ch/j_security_check
with the following headers:
Referer: <lecture link>.html
User-Agent: Mozilla/5.0
as well as the following parametres:
__charset__: utf-8
j_validate: True
j_username: <NETHZ username>
j_password: <NETHZ password>
For logins with custom credentials we have to perforn a POST request to <lecture link>.series-login.json
, e.g.:
https://video.ethz.ch/lectures/d-infk/2020/spring/252-0220-00L.series-login.json
with the following headers:
Referer: <lecture link>.html
User-Agent: Mozilla/5.0
as well as the following parametres:
__charset__: utf-8
username: <custom username>
password: <custom password>
In both cases we get back a cookie which we then can include when requesting the individual video metdata files.
- Make sure you have connection to video.ethz.ch. The scraper should let you know when there's no connection.
- Try running it again. Sometimes random issues can throw it off.
- If the lecture is password protected, make sure you use the correct credentials. Most protected lectures require your NETHZ credentials while some use a custom username and password.
- Make sure you're running the newest version of the scraper by re-downloading the script from the repository. There might have been an update.
- Check whether other lectures still work. Maybe the site was updated which broke the scraper.
- Enable the debug flag with
--verbose
and see whether any of the additional information now provided is helpful. - Check "How does it acquire the videos?" and see whether you can manually reach the video in your browser following the steps described there.
- After having tried all that without success, feel free to open up a new issue. Make sure to explain what you have tried and what the results were. There is no guarantee I will respond within reasonable time as I'm a busy student myself. If you can fix the issue yourself, feel free to open a merge request with the fix.