Cache ZIM metadata on library-gen? #209

rgaudin · 2024-06-26T08:38:02Z

Currently, the library generator script which is used both for library and dev-library (different source folders) spends most of its time reading metadata from ZIM files on the filesystem.

On library, this is ~6,800 files. This can be completed within ~6 min, but if the disk is busy (reminder: the server is using mechanical drives), this can take 3 hours.

This script is run every 30 min on library and every 10 min for dev-library.

While this will all be obsolete once the CMS takes over, a quick and easy improvement would be to cache this information and only read metadata for new files. It's actually already cached (in the previously written library XML), so it's just a matter of skipping/reusing data for existing entries.

The only drawback is that it won't update the metadata of a file that has been overwritten, but that's already a scenario we've excluded, and we could implement a simple file-flag that triggers a full re-read if present.
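To make the idea concrete, here is a rough sketch of the reuse logic, not the actual generator code. It assumes the previous library XML contains `<book>` elements carrying a `path` attribute; `read_metadata` stands in for whatever the script currently uses to read a ZIM file, and the flag file path is hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List
import xml.etree.ElementTree as ET

Entry = Dict[str, str]


def load_cached_entries(previous_xml: Path) -> Dict[str, Entry]:
    """Map each book's path to its attributes from the previous library XML.

    Assumption: books are <book> elements with a "path" attribute.
    """
    if not previous_xml.exists():
        return {}
    root = ET.parse(previous_xml).getroot()
    return {
        book.attrib["path"]: dict(book.attrib)
        for book in root.iter("book")
        if "path" in book.attrib
    }


def collect_entries(
    zim_paths: Iterable[Path],
    cached: Dict[str, Entry],
    read_metadata: Callable[[Path], Entry],  # hypothetical slow reader
    force_flag: Path,  # hypothetical flag file: if present, re-read all
) -> List[Entry]:
    """Reuse cached entries for known files; read metadata only for new ones."""
    full_reread = force_flag.exists()
    entries = []
    for path in zim_paths:
        key = str(path)
        if not full_reread and key in cached:
            entries.append(cached[key])  # no disk read of the ZIM file
        else:
            entries.append(read_metadata(path))
    return entries
```

Files missing from `zim_paths` simply drop out of the result, so deleted ZIMs would disappear from the regenerated library without extra handling.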

kelson42 · 2024-06-26T11:44:50Z

I don't think we should do anything in the meantime (before the CMS is published), but otherwise I would recommend saving in the library a kind of publishing date (see this comment: kiwix/libkiwix#702 (comment)), which would be the same as the ZIM file's last-modified date. Based on the comparison, I would use the last library.xml as a cache if the file has not been renewed.
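As an illustration only, the comparison could look something like the sketch below. It assumes the date is stored as an ISO 8601 UTC string truncated to the second; how (and under which attribute name) it would be stored in the library is undecided, so `cached_date` is just whatever string was saved for that entry.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def file_mtime_iso(path: Path) -> str:
    """Return the file's last-modified time as an ISO 8601 UTC string (seconds)."""
    mtime = int(path.stat().st_mtime)
    return datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()


def can_reuse(cached_date: Optional[str], zim_path: Path) -> bool:
    """True if the cached entry's date still matches the file on disk.

    cached_date is the (hypothetical) publishing date saved in the
    previous library.xml, or None if the entry had no such date.
    """
    return cached_date is not None and cached_date == file_mtime_iso(zim_path)
```

An entry without a stored date is never reused, so the first run after introducing the date would still read every file once.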

rgaudin · 2024-06-26T11:57:59Z

Yes. The problem is that the XML file is public, so we should not come up with anything ourselves and should wait for that libkiwix ticket first…

Let's keep that ticket open as an option until the CMS arrives or something else pressures us to do it.

rgaudin added the question Further information is requested label Jun 26, 2024
