Failed to dl my old site :/ #312

Open
jamieduk opened this issue Nov 27, 2024 · 3 comments

Comments

@jamieduk

```
jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader jnet.sytes.net
Downloading jnet.sytes.net to websites/jnet.sytes.net/ from Wayback Machine archives.

Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
	from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
	from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
	from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
	from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
	from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
	from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
	from /usr/local/bin/wayback_machine_downloader:25:in `load'
	from /usr/local/bin/wayback_machine_downloader:25:in `<main>'
jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader http://jnet.sytes.net
Downloading http://jnet.sytes.net to websites/jnet.sytes.net/ from Wayback Machine archives.

Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
	from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
	from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
	from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
	from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
	from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
	from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
	from /usr/local/bin/wayback_machine_downloader:25:in `load'
	from /usr/local/bin/wayback_machine_downloader:25:in `<main>'
jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader -d . http://jnet.sytes.net
Downloading http://jnet.sytes.net to ./ from Wayback Machine archives.

Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
	from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
	from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
	from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
	from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
	from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
	from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
	from /usr/local/bin/wayback_machine_downloader:25:in `load'
	from /usr/local/bin/wayback_machine_downloader:25:in `<main>'
jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader -a -d /home/jay/Downloads/jnet_site/archive_dl http://jnet.sytes.net
Downloading http://jnet.sytes.net to /home/jay/Downloads/jnet_site/archive_dl/ from Wayback Machine archives.

Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
	from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
	from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
	from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
	from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
	from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
	from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
	from /usr/local/bin/wayback_machine_downloader:25:in `load'
	from /usr/local/bin/wayback_machine_downloader:25:in `<main>'
jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader -a -d ./websites http://jnet.sytes.net
Downloading http://jnet.sytes.net to ./websites/ from Wayback Machine archives.

Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
	from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
	from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
	from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
	from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
	from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
	from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
	from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
	from /usr/local/bin/wayback_machine_downloader:25:in `load'
	from /usr/local/bin/wayback_machine_downloader:25:in `<main>'
```
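Every attempt above fails at the same point: `get_raw_list_from_api` (archive_api.rb:13) asks the Wayback Machine for a snapshot list, open-uri receives 400 BAD REQUEST and raises `OpenURI::HTTPError`, and nothing in the gem rescues it, so the program aborts. To see which request is being rejected, it may help to reconstruct the query URL. The sketch below is an assumption based on a reading of the 2.3.1 source: the CDX endpoint `https://web.archive.org/cdx/search/xd` and the parameter list are not confirmed anywhere in this thread, and `cdx_request_url` is a hypothetical helper, not part of the gem. The `all:` keyword is meant to mirror the `-a` option used in two of the attempts, which in 2.3.1 appears to drop the status-code filter.

```ruby
require 'uri'

# Hypothetical helper (not part of the gem): builds a CDX query URL shaped
# like the one wayback_machine_downloader 2.3.1 appears to send from
# get_raw_list_from_api. Endpoint and parameters are assumptions.
def cdx_request_url(url, page_index = nil, all: false)
  request_url = URI("https://web.archive.org/cdx/search/xd")
  params = [["output", "json"], ["url", url]]
  params += [["fl", "timestamp,original"], ["collapse", "digest"], ["gzip", "false"]]
  # Without -a, only snapshots that returned HTTP 200 are requested (assumed).
  params << ["filter", "statuscode:200"] unless all
  # The paged requests are the ones made inside @maximum_pages.times.
  params << ["page", page_index] if page_index
  request_url.query = URI.encode_www_form(params)
  request_url.to_s
end

# Unpaged request, then the first paged request for the whole site.
puts cdx_request_url("jnet.sytes.net")
puts cdx_request_url("jnet.sytes.net/*", 0)
```

The second URL corresponds to the paged requests, which is where the backtraces place the failure (`block in get_all_snapshots_to_consider`, wayback_machine_downloader.rb:92). Opening such a URL with different `page` values in a browser or with curl should show which index the server starts rejecting.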

@dmikhaylov

@afongemie, do you mean `wayback_machine_downloader jnet.sytes.net`? Have you actually tried it? It raises the same error.

@fredericschmidt

fredericschmidt commented Dec 15, 2024

Same here :-(

It seems that the Wayback Machine archive service has changed a bit, so the paged snapshot requests made by get_all_snapshots_to_consider now fail with 400 BAD REQUEST...

In wayback_machine_downloader.rb (at /Users/user/.gem/ruby/2.6.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb if that is where you installed it), you can replace the get_all_snapshots_to_consider method with this:

```ruby
  def get_all_snapshots_to_consider
    # Note: Passing a page index parameter allows us to get more snapshots,
    # but from a less fresh index
    print "Getting snapshot pages"
    snapshot_list_to_consider = []
    snapshot_list_to_consider += get_raw_list_from_api(@base_url, nil)
    print "."
    unless @exact_url
      # Original paging loop, disabled because a paged request made inside it
      # now raises 400 BAD REQUEST (OpenURI::HTTPError):
      # @maximum_pages.times do |page_index|
      #   snapshot_list = get_raw_list_from_api(@base_url + '/*', page_index)
      #   break if snapshot_list.empty?
      #   snapshot_list_to_consider += snapshot_list
      #   print "."
      # end

      # Workaround: request only the first page (index 0).
      page_index = 0
      snapshot_list = get_raw_list_from_api(@base_url + '/*', page_index)
      snapshot_list_to_consider += snapshot_list
      print "."
    end
    puts " found #{snapshot_list_to_consider.length} snapshots to consider."
    puts
    snapshot_list_to_consider
  end
```
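The workaround above avoids the error by never asking for more than the first page. An alternative, sketched here under the assumption that the 400 only affects page indexes past the end of the available results, is to keep the original loop but treat an HTTP error on a later page as the end of the list. `collect_snapshot_pages` and its `fetch_page` argument are hypothetical names; `fetch_page` stands in for `get_raw_list_from_api(@base_url + '/*', page_index)`. An error on page 0 is re-raised so that a genuine failure is not silently turned into an empty result.

```ruby
require 'open-uri'
require 'stringio'

# Hypothetical helper: gathers paged snapshot lists, stopping at the first
# empty page or at an HTTP error on any page after the first.
# fetch_page is any callable that takes a page index and returns an array.
def collect_snapshot_pages(maximum_pages, fetch_page)
  snapshots = []
  maximum_pages.times do |page_index|
    begin
      page = fetch_page.call(page_index)
    rescue OpenURI::HTTPError => e
      # A failure on the very first page is a real problem: re-raise it.
      raise if page_index.zero?
      # Otherwise assume we asked past the last page and stop paging.
      warn "Stopping at page #{page_index}: #{e.message}"
      break
    end
    break if page.empty?
    snapshots += page
  end
  snapshots
end

# Usage with a fake fetcher (illustrative data, no network access):
fake_pages = [
  [["20050101000000", "http://example.com/"]],
  [["20060101000000", "http://example.com/about.html"]]
]
fake_fetch = lambda do |page_index|
  if page_index >= fake_pages.size
    raise OpenURI::HTTPError.new("400 BAD REQUEST", StringIO.new)
  end
  fake_pages[page_index]
end

puts collect_snapshot_pages(10, fake_fetch).length # prints 2
```

Inside the gem, this could take the place of the commented-out loop, e.g. `snapshot_list_to_consider += collect_snapshot_pages(@maximum_pages, ->(i) { get_raw_list_from_api(@base_url + '/*', i) })`. Unlike the single-page workaround, it keeps fetching further pages for as long as the server serves them.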

It downloads everything, BUT THE LINKS ARE NOT PRESERVED!

@acenturyandabit

Labels: None yet
Projects: None yet
Development: No branches or pull requests
4 participants