Properly handle test zim files with git-lfs #2

Open
mgautierfr opened this issue Jan 13, 2021 · 7 comments

@mgautierfr
Collaborator

For now we have a bunch of test/reference zim files directly in the libzim repository. Together these files take up around 33 MB, and about 1.4 MB of that is updated from time to time.

We don't want that:

  • We don't want the test data to change. It serves as a reference, and libzim must "correctly" handle previous versions of the test data.
  • We pollute our git history. Git is not good at managing binary data.
  • We will add more test data over time.

I'm open to ideas here, but I propose:

  • Create another repository containing some reference data and the scripts that generate the test/reference zims.
  • CI will run on this repository and generate an archive containing all the test data plus some metadata (which zims are valid and which are not, what kind of error to expect, reference data associated with the zim files to check reading, ...).
  • Other projects (libzim, but also zim-tools, kiwix-*, ...) will download this archive (potentially configurable to point to a specific data directory) and will run their tests against the zim files in the archive.
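To make the metadata idea more concrete, here is a minimal sketch of what such a manifest could look like and how a consuming project might read it. The JSON layout, the field names (`valid`, `expected_error`, `article_count`) and the file names are all assumptions made up for illustration; nothing here is an agreed format.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical manifest shipped next to the zim files in the archive.
# Field names and file names are illustrative only.
EXAMPLE_MANIFEST = """
{
  "files": [
    {"name": "small.zim", "valid": true, "article_count": 3},
    {"name": "truncated_header.zim", "valid": false,
     "expected_error": "invalid header"}
  ]
}
"""


@dataclass(frozen=True)
class ZimTestFile:
    name: str
    valid: bool
    expected_error: Optional[str] = None  # only meaningful when valid is False
    article_count: Optional[int] = None   # reference data used to check reading


def load_manifest(text: str) -> List[ZimTestFile]:
    """Parse the manifest JSON into one record per test zim file."""
    entries = json.loads(text)["files"]
    return [ZimTestFile(**entry) for entry in entries]


if __name__ == "__main__":
    for entry in load_manifest(EXAMPLE_MANIFEST):
        print(entry)
```

A test in libzim, zim-tools or kiwix-* could then loop over these records and check that valid files open and invalid ones fail with the expected error, without hard-coding any file name in the test itself.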

Tests would be designed to be extensible (add/change zim files) without having to change the tests themselves (auto-discover test files, from filename or metadata information).
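As a rough sketch of the filename-based variant of auto-discovery, the function below walks a data directory and derives the expected outcome from where each file lives. The convention used here (anything under an `invalid/` directory is expected to be rejected) is purely an assumption for the example, not something decided in this issue.

```python
from pathlib import Path
from typing import Iterator, Tuple


def discover_cases(data_dir: Path) -> Iterator[Tuple[Path, bool]]:
    """Yield (zim_path, expected_valid) for every .zim file under data_dir.

    Assumed convention (hypothetical): files located below a directory
    named `invalid` are expected to be rejected; all others are expected
    to open correctly.
    """
    for path in sorted(data_dir.rglob("*.zim")):
        relative_parts = path.relative_to(data_dir).parts
        expected_valid = "invalid" not in relative_parts
        yield path, expected_valid
```

With this kind of discovery, adding a new zim file to the data repository adds a new test case everywhere, and no test source has to be edited.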

@kelson42

kelson42 commented Jan 13, 2021

@mgautierfr git-lfs seems to be a good solution, but the free storage on GitHub is limited to 1 GB and the bandwidth to 1 GB/month. Any chance we could stay within the free plan limits?

@mgautierfr
Collaborator Author

mgautierfr commented Jan 20, 2021

Yes, we can store the generated data using git-lfs instead of using an archive.

The data is not intended to change. If we want to test something new, we will add data, not change the existing data. Unless we add big zim files, we should stay under 1 GB for a (long) time.
As for the bandwidth, it depends on several things:

  • The dev/user. Ideally, they should set up the data repository only once and make their dev repositories point to it. But they may do it differently and download the data several times.
  • The CI. The data repository will be set up by every job in our CI. If CI traffic is not counted against the bandwidth limit, we are fine. Otherwise we will hit the limit pretty quickly.
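A back-of-the-envelope calculation shows why the CI case matters. The 33 MB data size and the 1 GB/month quota are the figures quoted in this thread; the number of CI jobs per run and runs per month are invented for the example, and whether CI downloads actually count against the quota is exactly the open question.

```python
def monthly_bandwidth_gb(data_size_mb: float, jobs_per_run: int,
                         runs_per_month: int) -> float:
    """Download volume in GB if every CI job fetches the full data set."""
    return data_size_mb * jobs_per_run * runs_per_month / 1000


def max_full_downloads(quota_gb: float, data_size_mb: float) -> int:
    """How many complete downloads of the data set fit in the quota."""
    return int(quota_gb * 1000 // data_size_mb)


DATA_SIZE_MB = 33        # current test data size quoted in this thread
FREE_QUOTA_GB = 1.0      # free monthly bandwidth quoted in this thread
JOBS_PER_RUN = 10        # assumption, purely illustrative
RUNS_PER_MONTH = 30      # assumption, purely illustrative

usage = monthly_bandwidth_gb(DATA_SIZE_MB, JOBS_PER_RUN, RUNS_PER_MONTH)
print(f"Estimated CI usage: {usage:.1f} GB/month (quota: {FREE_QUOTA_GB} GB)")
print(f"Full downloads that fit in the quota: "
      f"{max_full_downloads(FREE_QUOTA_GB, DATA_SIZE_MB)}")
```

Under these assumed numbers the quota only covers a few dozen full downloads per month, so some form of caching in CI would likely be needed if CI traffic is counted.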

@kelson42

@mgautierfr OK, I think we should carry on.

@kelson42 changed the title from "Properly handle test zim files" to "Properly handle test zim files with git-lfs" on Feb 13, 2021

@Deburama1

Hello, I am new to open source contribution. How can I contribute to this code as a first issue?

@kelson42

kelson42 commented Mar 4, 2021

@Deburama1 Hi and welcome, unfortunately there is nothing much we can do if you don't know how to fix the issue.

@kelson42

@mgautierfr What should we do with this ticket? Close it? Move it to zim/zim-test-suite?

@mgautierfr
Collaborator Author

Yes, we can move it to zim-test-suite.

@mgautierfr transferred this issue from openzim/libzim on Apr 28, 2021

4 participants