Hi! I caught your bot scraping the MR website and I wanted to take a minute to help you improve it. I found that it has a small implementation bug that causes it to resolve relative links incorrectly when it finds them outside the context of the page that uses them. It's kind of rare, but as you will see below, your bot discovered MP3 sound effects after scraping this JS file: https://www.macintoshrepository.org/assets/js/ben_chat_v2.js
The issue is that your bot thinks those MP3 files are located at /assets/assets/audio/logged_in.mp3, which does not exist. Your bot is missing the page context: the paths in that JS file are relative to the page that loads the script, not relative to the JS file itself, which is what your bot currently assumes. Since the JS file is loaded by pages at / or /applications/, ../assets/audio/logged_in.mp3 resolves to /assets/audio/logged_in.mp3 and not /assets/assets/audio/logged_in.mp3 😁
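To make the difference concrete, here is a minimal sketch that resolves the same relative path against both bases using the standard WHATWG `URL` constructor. The variable names are just for illustration, and the two page URLs are my assumption of what "loaded from / or /applications/" looks like as full URLs. It only demonstrates the resolution rule and is not a claim about how wpull handles links internally.

```javascript
// URL of the script in which the crawler found the relative paths.
const scriptUrl = "https://www.macintoshrepository.org/assets/js/ben_chat_v2.js";

// Pages that load the script (assumed full URLs for "/" and "/applications/").
const pageUrls = [
  "https://www.macintoshrepository.org/",
  "https://www.macintoshrepository.org/applications/",
];

const relativePath = "../assets/audio/logged_in.mp3";

// Resolving against the script's own URL reproduces the path that returns 404.
const resolvedAgainstScript = new URL(relativePath, scriptUrl).pathname;

// Resolving against each loading page gives the path a browser would request.
const resolvedAgainstPages = pageUrls.map(
  (page) => new URL(relativePath, page).pathname
);

console.log(resolvedAgainstScript); // /assets/assets/audio/logged_in.mp3
console.log(resolvedAgainstPages);  // both entries: /assets/audio/logged_in.mp3
```

A crawler usually cannot know which pages will load a given script, so one possible mitigation (only a suggestion) would be to also try the referring page as the base for links extracted from JS files.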
JS file:
audio_tick = new Howl({src:['../assets/audio/chatpost.mp3']});
audio_tear_short = new Howl({src:['../assets/audio/tear_short.mp3']});
audio_tear_long = new Howl({src:['../assets/audio/tear_long.mp3']});
audio_eep = new Howl({src:['../assets/audio/eep.mp3']});
audio_logged_in = new Howl({src:['../assets/audio/logged_in.mp3']});
audio_magnet_unlock = new Howl({src:['../assets/audio/magnet_unlock.mp3']});
audio_priv_msg = new Howl({src:['../assets/audio/svrmsg.mp3']});
This results in lots of 404 errors in the logs:
domlogs/macintoshrepository.org-ssl_log:152.53.39.37 - - [17/Sep/2024:06:10:37 -0400] "GET /assets/assets/audio/logged_in.mp3 HTTP/1.1" 404 100206 "https://www.macintoshrepository.org/assets/js/ben_chat_v2.js" "ArchiveTeam ArchiveBot/20231201.ad9703c (wpull 2.0.3) and not Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
domlogs/macintoshrepository.org-ssl_log:152.53.39.37 - - [17/Sep/2024:06:10:38 -0400] "GET /assets/assets/audio/magnet_unlock.mp3 HTTP/1.1" 404 100200 "https://www.macintoshrepository.org/assets/js/ben_chat_v2.js" "ArchiveTeam ArchiveBot/20231201.ad9703c (wpull 2.0.3) and not Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
domlogs/macintoshrepository.org-ssl_log:152.53.39.37 - - [17/Sep/2024:06:10:38 -0400] "GET /assets/assets/audio/eep.mp3 HTTP/1.1" 404 100196 "https://www.macintoshrepository.org/assets/js/ben_chat_v2.js" "ArchiveTeam ArchiveBot/20231201.ad9703c (wpull 2.0.3) and not Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
domlogs/macintoshrepository.org-ssl_log:152.53.39.37 - - [17/Sep/2024:06:10:39 -0400] "GET /assets/assets/audio/chatpost.mp3 HTTP/1.1" 404 100203 "https://www.macintoshrepository.org/assets/js/ben_chat_v2.js" "ArchiveTeam ArchiveBot/20231201.ad9703c (wpull 2.0.3) and not Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
domlogs/macintoshrepository.org-ssl_log:152.53.39.37 - - [17/Sep/2024:06:10:39 -0400] "GET /assets/assets/audio/tear_short.mp3 HTTP/1.1" 404 100206 "https://www.macintoshrepository.org/assets/js/ben_chat_v2.js" "ArchiveTeam ArchiveBot/20231201.ad9703c (wpull 2.0.3) and not Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
domlogs/macintoshrepository.org-ssl_log:152.53.39.37 - - [17/Sep/2024:06:10:40 -0400] "GET /assets/assets/audio/svrmsg.mp3 HTTP/1.1" 404 100194 "https://www.macintoshrepository.org/assets/js/ben_chat_v2.js" "ArchiveTeam ArchiveBot/20231201.ad9703c (wpull 2.0.3) and not Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
domlogs/macintoshrepository.org-ssl_log:152.53.39.37 - - [17/Sep/2024:06:10:40 -0400] "GET /assets/assets/audio/tear_long.mp3 HTTP/1.1" 404 100210 "https://www.macintoshrepository.org/assets/js/ben_chat_v2.js" "ArchiveTeam ArchiveBot/20231201.ad9703c (wpull 2.0.3) and not Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
BTW, I was very impressed by the real-time tracker on http://www.archivebot.com
It's crazy how much real-time DATA it pushes to the browser every 0.5s! Incredible work 👍