Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When reloading a ZIM, images are broken if their paths in the ZIM have changed #1287

Open
audiodude opened this issue Nov 17, 2024 · 10 comments

Comments

@audiodude
Copy link
Collaborator

This might be a weird edge case that isn't a problem in practice.

Steps to reproduce:

  1. Open a ZIM file created with mwoffliner
  2. Re-create the ZIM file using mwoffliner, the image paths have changed
  3. Go to "Configure" and select the "new" ZIM, at the same path and load it

Expected result

The ZIM is reloaded

Actual result

The ZIM appears to reload, at least the main page is shown, but images are broken.

@Jaifroid
Copy link
Member

@audiodude Thanks for the report. I suppose the issue is that we do a lot of caching for speed of access (which was a real problem on older devices), and the caching of assets (usually only CSS and JS) is stored under the ZIM name. If I recall correctly, we're not relying on the name of the file, but on the name in the metadata. If you substitute a ZIM archive with the exact same name in the metadata, or possibly (see below) the exact same filename, then some content will be drawn from the cache.

In practice, I suppose we'd never have two ZIMs with the same name and different content (in production, at least)?

There could be some old code left over that uses the ZIM filename as a cache key (not good, I know, but in the app's history there was a long period where we hadn't coded the part of the JS emulation that accesses the metadata -- we now do). But I rather suspect the file has both the same filename and the same name defined in metadata, and for one of those reasons, cached content is being used.

It's curious that images are affected, since we don't have specific code for caching images, though there is a lot of browser automatic caching... But it may be something else cached that is causing this rather than the images themselves.

Let me check if we're using the file name or only name defined in metadata in some cache keys, because that would be a concrete issue we could fix.

@audiodude
Copy link
Collaborator Author

But wouldn't caching work in the opposite way? If we reload a ZIM from the same path, where the contents have changed, wouldn't caching "help" show the images even if the paths in the ZIM have changed?

@Jaifroid
Copy link
Member

Well as I mentioned, the images themselves are not cached, as this is not the type of data that is re-used between pages. We only cache CSS and (in some configurations) JS. Of course caching may be a red herring here... Let's think of other possibilities....

If paths are changing, is it because we're getting rid of the old namespaces in Wikimedia ZIMs and upgrading to no-namespace (or C-namespace) ZIM format? Kiwix JS works with no-namespace (a.ka. Type 1) ZIMs (the old namespace ZIMs are Type 0). But we do depend on minorVersion to be correctly set in the ZIM's header as defined in the ZIM spec.

Is your new ZIM setting the minorVersion header correctly?

The app displays in console the detected minorVeresion (with devTools open and logging level set to All / including debug logs). This is what it shows with the latest publicly available English Wikipedia ZIM (A-namespace or Type 0):

Image

This is what it shows with another C-namespace (Type 1) ZIM (Stackexchange):

Image

@Jaifroid
Copy link
Member

Does the ZIM work when it is not replacing an identically named older ZIM? If it does, then we are dealing with some kind of caching issue. If it's simply not working, then there could be an issue with the new format and/or interaction with KJS's detection of format.

@Jaifroid
Copy link
Member

Jaifroid commented Nov 24, 2024

PS App has a Reset button in Configuration (right at bottom) which can be helpful. Also a Bypass AppCache setting for use when developing, which works best in Chrome/Chromium and with Dev Tools open and caching disabled in the Network tab. With Firefox, it's harder to block the browser's own caching.

@audiodude
Copy link
Collaborator Author

The behavior I'm seeing would be more consistent with "The page HTML is cached, and it has the old image paths in it".

The same ZIM works fine if I close the extension and load it from scratch.

@audiodude
Copy link
Collaborator Author

Also the ZIMs are created from the same version of mwoffliner, just with different mechanisms for calculating image URLs.

@Jaifroid
Copy link
Member

Ah OK, so no change of namespaces. OK, it does look like a caching issue, then. Oddly, we don't consciously cache page HTML from the ZIM in the browser extension, but it's possible something is nevertheless getting cached in a transient low-level Cache. I am beginning to suspect the memory cache used for faster access to assets and/or the very low-level filecache used for speeding up access to file slices. That cache is evacuated on a least-recently-used basis, so maybe not getting properly reset if the "same" ZIM is being loaded.

@audiodude
Copy link
Collaborator Author

In all honesty, I only filed the bug because when I saw the broken images, I thought something was wrong with my ZIM, but reloading from scratch fixed it. As mentioned, I'm pretty sure this is a scenario that won't come up for actual users. Feel free to close.

@Jaifroid
Copy link
Member

No worries -- the report is useful because it may indicate a problem with clearing of volatile caches when a ZIM is re-loaded. So I'll keep open for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants