Failed to dl my old site :/ #312
Comments
@afongemie you mean
Same here :-( It seems that the structure of the Wayback Machine archive service has changed a bit. In wayback_machine_downloader.rb (at /Users/user/.gem/ruby/2.6.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb if you installed it there), you can replace the function get_all_snapshots_to_consider with this:

    def get_all_snapshots_to_consider
      # Note: Passing a page index parameter allows us to get more snapshots,
      # but from a less fresh index
      print "Getting snapshot pages"
      snapshot_list_to_consider = []
      snapshot_list_to_consider += get_raw_list_from_api(@base_url, nil)
      print "."
      unless @exact_url
        # @maximum_pages.times do |page_index|
        #   snapshot_list = get_raw_list_from_api(@base_url + '/*', page_index)
        #   break if snapshot_list.empty?
        #   snapshot_list_to_consider += snapshot_list
        #   print "."
        # end
        page_index = 0
        snapshot_list = get_raw_list_from_api(@base_url + '/*', page_index)
        snapshot_list_to_consider += snapshot_list
        print "."
      end
      puts " found #{snapshot_list_to_consider.length} snapshots to consider."
      puts
      snapshot_list_to_consider
    end

It downloads everything, BUT THE LINKS ARE NOT PRESERVED!
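A variation on the workaround above, as a minimal sketch only: instead of dropping the paging loop, it keeps the loop and treats an HTTP error on a later page as the end of the list. `collect_snapshot_pages` and `fake_fetch` are hypothetical names that are not part of the gem; in the gem, the block would wrap the `get_raw_list_from_api` call. That the 400 response only means "no such page" is an assumption, not something confirmed in this thread.

```ruby
require 'open-uri'
require 'stringio'

# Hypothetical helper (not part of wayback_machine_downloader): collect
# snapshot rows page by page, stopping at the first empty page or HTTP error.
# The block stands in for a call such as get_raw_list_from_api(url, page_index).
def collect_snapshot_pages(maximum_pages)
  snapshots = []
  maximum_pages.times do |page_index|
    begin
      page = yield(page_index)
    rescue OpenURI::HTTPError => e
      # Assumption: an error such as "400 BAD REQUEST" means there are no more pages.
      warn "Stopping at page #{page_index}: #{e.message}"
      break
    end
    break if page.empty?
    snapshots.concat(page)
  end
  snapshots
end

# Stub fetcher for illustration only: page 0 has two rows, later pages fail with 400.
fake_fetch = lambda do |page_index|
  raise OpenURI::HTTPError.new('400 BAD REQUEST', StringIO.new) if page_index > 0
  [['20200101000000', 'http://example.com/'],
   ['20200102000000', 'http://example.com/a']]
end

result = collect_snapshot_pages(100) { |page_index| fake_fetch.call(page_index) }
puts "found #{result.length} snapshots to consider"
```

With the stub, the loop keeps the two rows from page 0 and stops quietly at page 1 instead of crashing with the OpenURI::HTTPError seen in the traces.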
jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader jnet.sytes.net
Downloading jnet.sytes.net to websites/jnet.sytes.net/ from Wayback Machine archives.
Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
    from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
    from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
    from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
    from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
    from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
    from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
    from /usr/local/bin/wayback_machine_downloader:25:in `load'
    from /usr/local/bin/wayback_machine_downloader:25:in
jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader http://jnet.sytes.net
Downloading http://jnet.sytes.net to websites/jnet.sytes.net/ from Wayback Machine archives.
Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
    from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
    from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
    from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
    from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
    from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
    from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
    from /usr/local/bin/wayback_machine_downloader:25:in `load'
    from /usr/local/bin/wayback_machine_downloader:25:in
jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader -d . http://jnet.sytes.net
Downloading http://jnet.sytes.net to ./ from Wayback Machine archives.
Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
    from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
    from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
    from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
    from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
    from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
    from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
    from /usr/local/bin/wayback_machine_downloader:25:in `load'
    from /usr/local/bin/wayback_machine_downloader:25:in
jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader -a -d /home/jay/Downloads/jnet_site/archive_dl http://jnet.sytes.net
Downloading http://jnet.sytes.net to /home/jay/Downloads/jnet_site/archive_dl/ from Wayback Machine archives.
Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
    from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
    from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
    from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
    from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
    from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
    from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
    from /usr/local/bin/wayback_machine_downloader:25:in `load'
    from /usr/local/bin/wayback_machine_downloader:25:in
jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader -a -d ./websites http://jnet.sytes.net
Downloading http://jnet.sytes.net to ./websites/ from Wayback Machine archives.
Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
    from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
    from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
    from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
    from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
    from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
    from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
    from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
    from /usr/local/bin/wayback_machine_downloader:25:in `load'
    from /usr/local/bin/wayback_machine_downloader:25:in