Truly dynamic ads #118
Here is some Python code for investigating this issue:

    import requests
    from urllib.parse import urlparse, parse_qs

    from critrolesync import get_podcast_feed_from_id

    episode_ids = ['C3E53']

    user_agents = {
        # my installed browsers
        # 'chrome': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36',
        # 'firefox': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/113.0',

        # examples from the fake_useragent library
        'chrome': 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.59 Safari/525.19',
        'firefox': 'Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:1.9.0.3) Gecko/2008092700 SUSE/3.0.3-2.2 Firefox/3.0.3',
    }


    def print_redirect_info(published_url, user_agent=None):
        # With headers=None, Requests sends its default User-Agent.
        headers = None
        if user_agent is not None:
            headers = {'User-Agent': user_agent}

        # stream=True follows the redirects without downloading the MP3 body.
        with requests.get(published_url, stream=True, headers=headers) as r:
            print(f'USER-AGENT: {r.request.headers["User-Agent"]}')
            print(f'REDIRECTED TO: {r.url}')
            print(f'FILE: {urlparse(r.url).path.split("/")[-1]}')
            # Parse only the query string, so the lookup also works when
            # x-total-bytes is the first parameter in the URL.
            print(f'BYTES: {parse_qs(urlparse(r.url).query)["x-total-bytes"][0]}')
            print()


    for episode_id in episode_ids:
        print(f'=== {episode_id} ===')
        published_url = get_podcast_feed_from_id(episode_id)['URL']
        print(f'PUBLISHED URL: {published_url}')
        print()

        print('--- PYTHON REQUESTS DEFAULT ---')
        print_redirect_info(published_url)

        print('--- CHROME ---')
        print_redirect_info(published_url, user_agents['chrome'])

        print('--- FIREFOX ---')
        print_redirect_info(published_url, user_agents['firefox'])

        print()

The result I obtain (as of right now) is the following:
Note that the file size for each result is different. If I repeat this using the commented-out User-Agent strings (corresponding to my locally installed browsers), or if I simply remove "en-US" from the Chrome User-Agent string, the Firefox result remains the same but the Chrome result changes to match the Firefox result. (EDIT: Just noticed that the example Firefox User-Agent string I was using here had "pl-PL" for Polish language.) When tested against most other episodes (e.g., C3E54), the Chrome and Firefox results are usually the same, but they always differ from the result for the Python Requests library default user agent.
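To make the "same version or not" comparison less error-prone than reading printed output, the redirected URLs can be grouped by the file name and `x-total-bytes` value that the script above prints. The following is only a sketch: `version_key` and `group_by_version` are hypothetical helper names, the example URLs are made up, and it assumes that file name plus byte count is enough to tell versions apart.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs


def version_key(redirected_url):
    """Return (file name, total bytes) for a redirected MP3 URL."""
    parts = urlparse(redirected_url)
    file_name = parts.path.split('/')[-1]
    # x-total-bytes is the query parameter read by the investigation script;
    # the size is None if the parameter is absent.
    sizes = parse_qs(parts.query).get('x-total-bytes')
    total_bytes = int(sizes[0]) if sizes else None
    return file_name, total_bytes


def group_by_version(redirected_urls):
    """Map each distinct version key to the labels that were served it."""
    groups = defaultdict(list)
    for label, url in redirected_urls.items():
        groups[version_key(url)].append(label)
    return dict(groups)


# Hypothetical redirect targets, for illustration only.
example = {
    'requests': 'https://cdn.example.com/audio/ep-a.mp3?x-total-bytes=1000&sig=abc',
    'chrome': 'https://cdn.example.com/audio/ep-b.mp3?x-total-bytes=1200&sig=def',
    'firefox': 'https://cdn.example.com/audio/ep-b.mp3?x-total-bytes=1200&sig=ghi',
}

for key, labels in group_by_version(example).items():
    print(key, labels)
```

Labels that land in the same group were served what appears to be the same file, even if other query parameters in the redirected URL differ.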
Test Results

When using the Python Requests library default user agent, the same version of each episode is always served, for all episodes, even when the IP address (and city) is changed using a VPN. The default user agent has been in use for all auto-syncing in the past, so these versions were likely used when obtaining timestamps.

When using a Chrome or Firefox user agent, different MP3 versions are served for all episodes (that we care about) relative to the Python Requests library default agent. This means that all timestamps obtained using auto-syncing in the past are now inaccurate relative to MP3 versions served in a browser (and presumably in most podcast managers too). Re-auto-syncing using the default user agent should not help (still needs to be tested).

When using a Chrome or Firefox user agent, keeping the User-Agent string the same but refreshing my VPN to obtain a new IP address, even in the same city, is enough for a random smattering of two dozen episodes to change versions in a matter of minutes. This means that even users using similar devices at the same time will, if their IP addresses differ, be served different versions for a minority of episodes. Consequently, there is no longer always a single solution to time conversion.

Changing the User-Agent string from one browser to another while keeping the IP address fixed has similar results. This means that even on the same device, different MP3 versions may be served for a minority of episodes if the app is changed.

In these tests with browser user agents, the minority of episodes affected seems to be randomly different each time.

Conclusion

CritRoleSync is sunk. OK, maybe it's not that grim, but I may have to accept significant syncing inaccuracies of up to a couple minutes going forward, which is both very unsatisfying and will make debugging issues much harder.
In the last few days, I've discovered a new major problem for CritRoleSync: Advertisements have become much more dynamic than before (compare #6). This issue seems to apply only to the newest podcast feed (Critical Role; C2E20 and later).
Every time a user tries to stream or download a podcast episode, their device uses a URL published in the podcast feed to access an MP3 file. I discovered that this published URL always redirects to a different address, and -- here's the rub -- the new target URL changes frequently and serves up a different version of the file when it does. Different versions contain different ads and have different durations, making CritRoleSync's synchronization timestamps inconsistent for users and very difficult to debug.
I have tried investigating what factors may influence which version of the MP3 file is served to the user. Some episodes appear to be affected more regularly than others. Changing the User-Agent string used in the header of the HTTP request (which contains information about the web browser and operating system of the device) frequently affects which file version is served. Even when the User-Agent string is fixed, the file version sometimes changes for other reasons. I am guessing this may be related to how much time has passed since the last request, or perhaps it depends on the user's IP address or geographic location (which can be tested using a VPN). It's possible that, on top of these factors, there is randomization as well. Without insider information on how the file serving algorithm works, this is very difficult to analyze.
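One way to probe these factors one at a time is to record which version each episode resolves to, change a single factor (User-Agent string, IP address, or simply the time of the request), record again, and compare the two snapshots. The sketch below covers only the comparison step. `changed_episodes` is a hypothetical helper, and each snapshot is assumed to be a dict mapping an episode ID to a (file name, byte count) pair taken from the redirected URL; the values shown are invented.

```python
def changed_episodes(before, after):
    """Return sorted episode IDs whose served version differs between snapshots.

    Episodes that appear in only one of the two snapshots are ignored.
    """
    shared = before.keys() & after.keys()
    return sorted(ep for ep in shared if before[ep] != after[ep])


# Hypothetical snapshots taken before and after refreshing a VPN connection.
snapshot_ip_1 = {
    'C3E53': ('ep53-a.mp3', 1000),
    'C3E54': ('ep54-a.mp3', 2000),
    'C3E55': ('ep55-a.mp3', 3000),
}
snapshot_ip_2 = {
    'C3E53': ('ep53-b.mp3', 1100),
    'C3E54': ('ep54-a.mp3', 2000),
    'C3E55': ('ep55-a.mp3', 3000),
}

changed = changed_episodes(snapshot_ip_1, snapshot_ip_2)
print(f'{len(changed)} of {len(snapshot_ip_1)} episodes changed: {changed}')
```

Repeating the comparison with no factor changed would give a baseline for how much of the variation is due to time or randomization alone.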
My automated GitHub Actions system for archiving the podcast feeds and checking for differences (archive-podcast-feeds.yml) can detect changes in the published URL. However, the URL redirection changes I'm describing here are entirely opaque to this automated system since they happen only when the user tries to access an episode.

All of this is very bad news for CritRoleSync. Unlike #6, where ads seemed to be changed systematically for all users, only on rare occasions, and always with podcast feed updates indicating the changes in published URLs and durations, this new issue is much more problematic for CritRoleSync. If different versions of the MP3 files with different ads and durations are being served to different users at the same time, there may be no way for CritRoleSync to predict which version a user is listening to, and so there will be no way to provide precise synchronization timestamps. Approximate timestamps could likely still be provided, with an inaccuracy of perhaps a couple minutes.
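For a rough sense of how far the versions of one episode drift apart, the byte counts reported in the redirected URLs can be converted to approximate durations. This is a sketch under a strong assumption: that the files are constant-bitrate MP3s at a known bitrate. The 128 kbps default below is a placeholder rather than a measured value, the function names are hypothetical, and the example sizes are invented.

```python
def estimated_duration_seconds(total_bytes, bitrate_kbps=128):
    """Estimate playback time from file size, assuming a constant bitrate.

    The 128 kbps default is an assumption; metadata tags are ignored.
    """
    return total_bytes * 8 / (bitrate_kbps * 1000)


def duration_spread_seconds(observed_sizes, bitrate_kbps=128):
    """Estimate the gap between the shortest and longest observed versions."""
    sizes = list(observed_sizes)
    if not sizes:
        return 0.0
    return estimated_duration_seconds(max(sizes) - min(sizes), bitrate_kbps)


# Hypothetical byte counts observed for three versions of one episode.
sizes = [100_000_000, 100_960_000, 101_920_000]
print(f'Estimated spread: {duration_spread_seconds(sizes):.0f} s')
```

The spread is an indicator of timestamp error, not a bound on it: two versions with ads of equal total length placed at different positions would have the same size yet still disagree on timestamps between those positions.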