Currently, the library generator script which is used both for library and dev-library (different source folders) spends most of its time reading metadata from ZIM files on the filesystem.
On library, this is ~6,800 files. A run can complete in about 6 minutes, but if the disk is busy (reminder: the server uses mechanical drives), it can take 3 hours.
This script runs every 30 minutes for library and every 10 minutes for dev-library.
While this will all be obsolete once the CMS takes over, a quick and easy improvement would be to cache this information and only read metadata for new files. It's actually already cached (in the previously written library XML), so it's just a matter of skipping/reusing data for existing entries.
The only drawback is that it won't update the metadata of a file that has been overwritten, but that's a scenario we've already excluded, and we could implement a simple flag file that triggers a full re-read when present.
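A minimal sketch of the caching idea, written in Python for illustration (the issue does not say what the generator script is written in). It assumes the previous library XML stores each entry as a `<book>` element carrying a `path` attribute. `build_library`, `read_metadata` and the flag file are hypothetical names, not existing parts of the script.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Dict, Iterable


def load_cached_books(library_xml: Path) -> Dict[str, ET.Element]:
    """Index the <book> entries of a previously written library XML by path.

    Assumption: each entry is a <book> element with a `path` attribute.
    """
    if not library_xml.exists():
        return {}
    root = ET.parse(library_xml).getroot()
    return {b.get("path"): b for b in root.iter("book") if b.get("path")}


def build_library(
    zim_paths: Iterable[str],
    previous_xml: Path,
    flag_file: Path,  # hypothetical flag file: if present, ignore the cache
    read_metadata: Callable[[str], ET.Element],  # hypothetical: reads one ZIM from disk
) -> ET.Element:
    """Build a new <library>, reading metadata only for files not already cached."""
    full_reread = flag_file.exists()
    cache = {} if full_reread else load_cached_books(previous_xml)

    root = ET.Element("library")
    for path in zim_paths:
        cached = cache.get(path)
        if cached is not None:
            root.append(cached)  # reuse the previous entry, no ZIM read
        else:
            root.append(read_metadata(path))  # new file: read it from disk

    if full_reread:
        # Design choice: clear the flag so the next run uses the cache again.
        flag_file.unlink(missing_ok=True)
    return root
```

Because the loop only iterates over the files currently on disk, entries for deleted ZIM files simply disappear from the new library. An overwritten file keeps its stale entry until someone creates the flag file, which matches the trade-off described above.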
I don't think we should do anything in the meantime (before the CMS is published), but otherwise I would recommend saving a kind of publishing date in the library (see this comment: kiwix/libkiwix#702 (comment)), which would be the same as the ZIM file's last-modified date. Based on that comparison, I would use the last library.xml as a cache if the file has not been replaced.
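The date comparison could look roughly like this in Python. This is a sketch only: `fileMtime` is a hypothetical attribute name (not part of the current library XML format, nor taken from the libkiwix#702 discussion), and `read_metadata` stands in for whatever actually reads a ZIM file.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Callable, Dict, Optional

# Hypothetical attribute holding the ZIM file's last-modified date.
MTIME_ATTR = "fileMtime"


def file_mtime(path: str) -> str:
    """Last-modified time as an ISO 8601 UTC string, truncated to whole seconds."""
    seconds = int(os.stat(path).st_mtime)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def entry_for(
    path: str,
    cache: Dict[str, ET.Element],  # <book> entries from the last library.xml, keyed by path
    read_metadata: Callable[[str], ET.Element],  # hypothetical ZIM metadata reader
) -> ET.Element:
    """Reuse the cached entry if its stored date matches the file, else re-read the ZIM."""
    current = file_mtime(path)  # a cheap stat() call, no ZIM read
    cached: Optional[ET.Element] = cache.get(path)
    if cached is not None and cached.get(MTIME_ATTR) == current:
        return cached
    book = read_metadata(path)  # new or replaced file
    book.set(MTIME_ATTR, current)  # record the date for the next run
    return book
```

Unlike the flag-file approach, this also picks up overwritten files automatically, at the cost of one `stat()` per file on each run.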