
Experience when working on a fix for time-travel bug in xtradb operator #8

laphets opened this issue Jun 18, 2021 · 7 comments

@laphets
Collaborator

laphets commented Jun 18, 2021

While working on fixes for K8SPXC-725 and K8SPXC-763, I ran into the following two issues:

First, Sieve saves a snapshot of all the CRD config files for a given operator, and those config files can easily diverge from the upstream. For example, when I tried to run my fixed xtradb operator again with the same workload, my fix was checked out from the main branch (i.e., the latest version of the xtradb operator), but the CRD config files stored inside Sieve were from an older version. I had to spend some effort syncing the CRD files stored in Sieve with the upstream so that the xtradb operator could be set up with matching CRD configs.

This issue will also occur if developers want to replay bugs on their own versions of an operator.
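
For illustration, here is a small sketch of how one might check for that drift; the file paths are hypothetical placeholders for wherever Sieve keeps its CRD snapshot and wherever the upstream operator checkout lives:

```python
# Compare Sieve's snapshotted CRD with the upstream copy to spot drift.
# Both paths are assumptions; point them at the real snapshot and checkout.
import difflib
from pathlib import Path

sieve_crd = Path("sieve/xtradb-operator/deploy/crd.yaml")                 # Sieve's snapshot (assumed path)
upstream_crd = Path("percona-xtradb-cluster-operator/deploy/crd.yaml")    # fresh upstream checkout (assumed path)

diff = difflib.unified_diff(
    sieve_crd.read_text().splitlines(keepends=True),
    upstream_crd.read_text().splitlines(keepends=True),
    fromfile=str(sieve_crd),
    tofile=str(upstream_crd),
)
print("".join(diff) or "CRD snapshot matches upstream")
```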


Second, also for xtradb, the operator has some tricky logic for choosing an init image (the image for an init container) for the xtradbcluster. Basically, if the version specified for the xtradbcluster is the same as the operator's version, the init image is the same as the operator image, e.g. xxx/xtradb-operator:time-travel.

However, if the version for the xtradbcluster (e.g. 1.7) differs from the operator's, the init image becomes xxx/xtradb-operator:1.7.

And obviously we never build an image such as xxx/xtradb-operator:1.7.

In that case, we also need to update all the workload’s config files (which specify the xtradbcluster version) to match the operator version, so that the init image is assigned the same as the operator image; otherwise we may get an image-pull 404 error.
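
To make the behavior concrete, here is a minimal sketch of the selection logic described above (the function and parameter names are mine, not the operator's actual code):

```python
def pick_init_image(operator_image: str, operator_version: str,
                    cluster_version: str) -> str:
    """Sketch of the init-image selection described above (not the real operator code).

    If the xtradbcluster requests the same version as the operator, the operator
    reuses its own image; otherwise it derives a versioned tag, which Sieve never
    builds, so the init container hits an image-pull 404.
    """
    if cluster_version == operator_version:
        return operator_image                      # e.g. xxx/xtradb-operator:time-travel
    repo = operator_image.rsplit(":", 1)[0]
    return f"{repo}:{cluster_version}"             # e.g. xxx/xtradb-operator:1.7


# A workload pinned to 1.7 while the operator runs the time-travel build:
print(pick_init_image("xxx/xtradb-operator:time-travel", "time-travel", "1.7"))
# -> xxx/xtradb-operator:1.7, an image that was never pushed
```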

@tianyin
Member

tianyin commented Jun 18, 2021

First, Sieve saves a snapshot of all the CRD config files for a given operator

Yes. Basically you can only find bugs in a specific version, while the upstream is constantly evolving.

This issue will also occur if developers want to replay bugs on their own versions of an operator.

I think we should just tell them the version in which we found the bug.

It is possible that the new version does not even have the bug.

Basically, we don't have a fully automated way to test a given operator. As long as that's the case, the problem will remain.

@tianyin
Member

tianyin commented Jun 18, 2021

In that case, we also need to update all the workload’s config files

So, I guess the fundamental issue is the manual effort one has to put in to use Sieve.

Is there any way to quantify the work?

@lalithsuresh
Collaborator

lalithsuresh commented Jun 18, 2021

@laphets this is why I think the test cases, CRD/manifests, and workloads should absolutely /not/ be part of Sieve as they are right now. Developers should be able to keep these things under their own version control and run Sieve against their code bases as-is (even if the code is not from a commit).

@laphets
Collaborator Author

laphets commented Jun 19, 2021

In that case, we also need to update all the workload’s config files

So, I guess the fundamental issue is the manual effort one has to put in to use Sieve.

Is there any way to quantify the work?

Basically, they may need to go through the steps in port.md again for the different version/sha of the operator they want to test. They may also need to update the workload-related resource config files (we should clarify this) to fit the different operator version. This is tricky, because those files only need to be updated when there is a compatibility issue between the operator version and the workload resources, and such compatibility issues are not always obvious.

@marshtompsxd
Member

@laphets this is why I think the test cases, CRD/manifests, and workloads should absolutely /not/ be part of Sieve as they are right now. Developers should be able to keep these things under their own version control and run Sieve against their code bases as-is (even if the code is not from a commit).

Agree. We currently keep the test cases and configs in the project just to make it easier to reproduce the bugs (especially in the final artifact evaluation phase). The final release of Sieve should only provide a few interfaces for reusing/writing test workloads. The current test cases and configs can become part of an examples collection if we want.

@marshtompsxd
Member

In that case, we also need to update all the workload’s config files

So, I guess the fundamental issue is the manual effort one has to put in to use Sieve.

Is there any way to quantify the work?

To port a new operator, a developer basically needs to 1) add a few entries to controllers.py and 2) provide some necessary configs (like the CRD yaml files) and modify a few lines.
To write a new test, a developer needs to add a few lines to controllers.py and workloads.py.
We can simply count how many lines are added or modified during this process, and the number should be very reasonable. (A hypothetical sketch follows below.)
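
Roughly, the porting footprint might look like the sketch below; the entry format, field names, and pod names are hypothetical stand-ins, since the actual schema of controllers.py and workloads.py is not shown in this thread:

```python
# controllers.py (hypothetical entry; the real schema may differ)
xtradb_operator = {
    "name": "xtradb-operator",
    "repo": "https://github.com/percona/percona-xtradb-cluster-operator.git",
    "sha": "<commit under test>",
    "crd_files": ["deploy/crd.yaml"],          # must match the sha above
    "deploy_files": ["deploy/operator.yaml"],
}

# workloads.py (hypothetical test workload, expressed as plain shell steps)
recreate_xtradb_cluster = [
    "kubectl apply -f xtradb-cluster.yaml",
    "kubectl wait --for=condition=Ready pod/xtradb-cluster-pxc-0 --timeout=600s",
    "kubectl delete -f xtradb-cluster.yaml",
]
```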

@tianyin
Member

tianyin commented Jun 19, 2021

We currently keep the test cases and configs in the project just to make it easier to reproduce the bugs (especially in the final artifact evaluation phase).

I think the best way to do it is to move them to other repos in this org.

Ideally, this repo would contain only the Sieve tool, and each operator would have its own repo in which Sieve is a git submodule.
