
Create new commands that would build the docker images and run the e2e tests separately #9630

Open
tatilepizs opened this issue Nov 7, 2024 · 21 comments

@tatilepizs
Contributor

Describe the issue
When we are creating new e2e tests or debugging existing ones (and not modifying the cht-core code), it is really frustrating that every run of the e2e tests has to download and completely rebuild the docker images. It takes around 2 minutes for a test to start running, which adds up quickly when debugging, where each iteration only needs to try a small change.

Describe the improvement you'd like
We would like to create two new commands, one that builds the docker images and one that only executes the wdio e2e tests.

The first command, the one that builds the images, would be in charge of always deleting the images associated with the branch (if they exist) and then creating them. If we remove the ${buildTime} from this line, we can reuse the same images every time we run the tests on the same branch (e.g. medicmobile/cht-haproxy:4.14.0-dev instead of medicmobile/cht-haproxy:4.14.0-dev.1731328820240), and by deleting the images at the beginning we make sure we are using the correct images, in case something changes in the cht-core code.
The second command would be in charge of only executing the e2e tests, and since everything will already be built, it should take just seconds to start seeing the test execution.

We would also keep the wdio-local command, which builds and executes, just as it does right now.
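
For illustration, the split could look something like this in package.json (both script names and the build entry point are placeholders, not existing cht-core scripts; the run-only command is the same wdio invocation the suite already uses):

    "wdio-build-images": "node ./scripts/build/build-e2e-images.js",
    "wdio-run-default": "wdio run ./tests/e2e/default/wdio.conf.js"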


@tatilepizs added the Type: Technical issue label Nov 7, 2024
@tatilepizs self-assigned this Nov 7, 2024
@tatilepizs
Contributor Author

@dianabarsan, I would love to hear your thoughts about this idea. Do you think that the approach that I am describing is feasible? Is there something else that I should consider?

cc: @jkuester

@dianabarsan
Member

I think I already had a branch that did this. I'll share.

@dianabarsan
Member

Ok, it's not a branch, because you don't need new commands.

All you need to do is:

  • Run a local test so the images get created.
  • Open tests/cht-core-test.yml and find the image tag. It's going to look like image: medicmobile/cht-haproxy:4.14.0-dev.1731328820240. Take the tag and export it as VERSION. In my example:
    export VERISON=4.14.0-dev.1731328820240
  • Run the CI command for the same test instead of the local one (so, instead of npm run wdio-local you would run npm run ci-webdriver-default).
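
Putting the steps together, the whole sequence is roughly:

    npm run wdio-local            # first run: builds the images
    # copy the image tag from tests/cht-core-test.yml and export it as VERSION
    npm run ci-webdriver-default  # reuses the existing images instead of rebuilding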

@jkuester
Contributor

@dianabarsan is there a compelling reason that we need to timestamp the image version? The main benefit I can think of is that it helps ensure we don't locally run the e2e tests against a stale version of the docker image. Is there something else I am missing?

The downsides of timestamping the image version include:

  • As a developer, it is just continuously pumping new docker images onto my local system. I assume the docker layering allows these near-identical images to exist with very little extra disk space, but it is still annoying.
  • For QA engineers (or anyone else trying to re-run the e2e tests without rebuilding images), it requires the manual step of checking the current image version and then setting the VERSION envar.

IMHO, re-running the e2e tests without rebuilding the images should be a workflow with top-level support. As a developer, I am frequently re-running tests while writing them. It would be easier to script this if the image VERSION were deterministic (and not time-stamped). However, even if we keep the time-stamping, I still think it would be valuable to have an npm script that would programmatically set the VERSION when necessary.

@dianabarsan
Member

dianabarsan commented Nov 13, 2024

re-running the e2e tests without rebuilding the images should be a workflow with top-level support

Sure, I agree.

If we neither timestamp the images NOR rebuild the images every time we run tests, then the risk of running tests over unknown images is really high. How do you even know what's in the image? Do you start the container and inspect it to see what changes are in it? Especially if you run multiple git worktrees and run tests in parallel.

This is one argument for timestamping: if you use different worktrees (like I do), you get different images for each of them, which you can reuse with the VERSION env. Now, the worktrees don't necessarily need a timestamp (though that is most correct); we could instead use a hash of the local folder path to uniquely identify the reusable worktree images.
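
For example, something like this (just a sketch of the hash idea; the exact pipeline is illustrative):

    # hypothetical: derive a per-worktree image tag from the checkout path
    VERSION=$(echo -n "$PWD" | shasum | cut -c1-12) npm run integration-all-local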

It's a shame there is no "only build if changed" docker command :(

I still think it would be valuable to have an npm script that would programmatically set the VERSION when necessary.

What can be done is:

  • Set the VERSION env variable.
  • Run the local e2e test that builds images. Because of the VERSION env variable you set before, the image name will be deterministic.
  • For subsequent tests, run the ci e2e test commands.
export VERSION=mytest
npm run integration-all-local
npm run ci-integration-all
npm run ci-integration-all
npm run ci-integration-all
npm run ci-integration-all
npm run ci-integration-all

This way you don't need to fish around for timestamps in generated docker compose files.

@dianabarsan
Member

dianabarsan commented Nov 13, 2024

What if .... instead of timestamps we actually used the local checked-out branch name? That way the version is deterministic, we can include it in npm scripts, and it will also work with different worktrees or when working on multiple branches in parallel.

VERSION=$(git branch --show-current) npm run integration-all-local
VERSION=$(git branch --show-current) npm run ci-integration-all

And these can be our npm commands for local runs.

@jkuester
Contributor

If we neither timestamp the images NOR rebuild the images every time we run tests, then the risk of running tests over unknown images is really high.

Agreed. My vote would be that the behavior of wdio-local (and similar commands) remains the same, where the images are rebuilt for each run. Then, we can just use the ci-webdriver-default commands to run the tests without rebuilding images (or add new commands if there is some reason the ci ones will not work).

Ultimately it is up to the developer to be aware of the current state of their repo/images, but this seems to offer a good balance of a safe default happy path where everything gets built while allowing for a shortcut when someone knows what they are doing....

What if .... instead of timestamps we actually used the local checked-out branch name?

This is my thought exactly! Specifically, my proposal is that we update the get-version.js script code so that the default version value is the current git branch name. Then, we update the ci test commands to include:

    "ci-webdriver-default": "export VERSION=${VERSION:-$(node ./scripts/build/get-version.js)} wdio run ./tests/e2e/default/wdio.conf.js",

This way, when the VERSION envar is provided to the ci commands (e.g. by the GitHub actions), that value will be used. But when there is no VERSION, it will call the get-version script and use the default value (aka the branch name).
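
A minimal sketch of what that default could look like inside get-version.js (assuming the script just prints the version to stdout; the branch lookup and sanitization below are illustrative, not the actual cht-core code):

    const { execSync } = require('child_process');

    // Hypothetical default: fall back to the current branch name as the version.
    // Docker image tags cannot contain "/", so a branch like "feat/9630" needs sanitizing.
    const branch = execSync('git branch --show-current').toString().trim();
    console.log(branch.replace(/[^\w.-]/g, '-'));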

@tatilepizs what are your thoughts here?

@tatilepizs
Contributor Author

Thank you both for this discussion. I like the idea of using the branch name for the images; that would help a lot.
But I was trying the steps that Diana suggested here and they are not working for me. After I run npm run ci-webdriver-default it keeps retrying and never runs the tests; it just shows this message:

API check failed, trying again in 1 second
AggregateError
Checking API

Here is the video of the steps:

Screen.Recording.2024-11-13.at.11.03.09.am.mov

@dianabarsan is there something I am missing?

@dianabarsan
Member

It looks like your docker is trying to pull the images and failing?
Can you please post logs, @tatilepizs? They are easier to follow than a video. I wonder if this is related to Docker Desktop.

@tatilepizs
Contributor Author

tatilepizs commented Nov 13, 2024

This is all I can see:

npm run ci-webdriver-default

> [email protected] ci-webdriver-default
> wdio run ./tests/e2e/default/wdio.conf.js


Execution of 1 workers started at 2024-11-13T22:51:30.751Z

(node:12628) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
time="2024-11-13T16:51:32-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

time="2024-11-13T16:51:32-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."
time="2024-11-13T16:51:32-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

 Volume cht-e2e_cht-ssl  Removing

 Volume cht-e2e_cht-credentials  Removing

 Volume cht-e2e_cht-ssl  Removed
 Volume cht-e2e_cht-credentials  Removed

time="2024-11-13T16:51:32-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

time="2024-11-13T16:51:32-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."
time="2024-11-13T16:51:32-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

 nginx Pulling 
 api Pulling 
 couchdb-2.local Pulling 
 couchdb-3.local Pulling 

 couchdb-1.local Pulling 
 healthcheck Pulling 
 sentinel Pulling 
 haproxy Pulling 

 sentinel Error manifest for medicmobile/cht-sentinel:4.14.0-dev.1731538290620 not found: manifest unknown: manifest unknown

 haproxy Error context canceled
 nginx Error context canceled
 healthcheck Error context canceled
 couchdb-1.local Error context canceled
 api Error context canceled
 couchdb-2.local Error context canceled

 couchdb-3.local Error context canceled

Error response from daemon: manifest for medicmobile/cht-sentinel:4.14.0-dev.1731538290620 not found: manifest unknown: manifest unknown

time="2024-11-13T16:51:34-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

time="2024-11-13T16:51:34-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

time="2024-11-13T16:51:34-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

Checking API
(node:12628) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
API check failed, trying again in 1 second
AggregateError
Checking API
API check failed, trying again in 1 second
AggregateError
Checking API
API check failed, trying again in 1 second
...

@dianabarsan
Member

Oh, I think I saw what the problem was, in the video.

Your exported variable is "VERISON" not "VERSION" :)

@dianabarsan
Member

dianabarsan commented Nov 14, 2024

Excerpt from the video:

[screenshot]

@tatilepizs
Contributor Author

Oh no 🤦 I am sorry about this.
Thanks, Diana!

@dianabarsan
Member

Did it work without the typo?

@tatilepizs
Contributor Author

Yes, it didn't download the images, thank you 😊

I was wondering if there is a way to leave the containers up and running after the first run, so that the next time it just uses the containers that are already up and doesn't need to create them again. That would be awesome because the tests would start running in just a couple of seconds.

@dianabarsan
Member

@tatilepizs I used to leave containers running for further testing, but @jkuester pushed a commit that killed them :)

cf4a1dd

@tatilepizs
Contributor Author

😅 If I cancel the process, I will have the containers up, but when I run npm run ci-webdriver-default it stops and removes all the containers and then creates them again:

 npm run ci-webdriver-default             

> [email protected] ci-webdriver-default
> wdio run ./tests/e2e/default/wdio.conf.js


Execution of 1 workers started at 2024-11-15T17:16:40.386Z

(node:91316) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
time="2024-11-15T11:16:42-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."
time="2024-11-15T11:16:42-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."
time="2024-11-15T11:16:42-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

 Container cht-e2e-healthcheck-1  Stopping
 Container cht-e2e-couchdb-2.local-1  Stopping
 Container cht-e2e-sentinel-1  Stopping
 Container cht-e2e-couchdb-3.local-1  Stopping
 Container cht-e2e-couchdb-1.local-1  Stopping
 Container cht-e2e-nginx-1  Stopping
 Container cht-e2e-sentinel-1  Stopped
 Container cht-e2e-sentinel-1  Removing
 Container cht-e2e-sentinel-1  Removed
 Container cht-e2e-healthcheck-1  Stopped
 Container cht-e2e-healthcheck-1  Removing
 Container cht-e2e-healthcheck-1  Removed
 Container cht-e2e-couchdb-2.local-1  Stopped
 Container cht-e2e-couchdb-2.local-1  Removing
 Container cht-e2e-couchdb-2.local-1  Removed
 Container cht-e2e-couchdb-1.local-1  Stopped
 Container cht-e2e-couchdb-1.local-1  Removing
 Container cht-e2e-couchdb-1.local-1  Removed
 Container cht-e2e-couchdb-3.local-1  Stopped
 Container cht-e2e-couchdb-3.local-1  Removing
 Container cht-e2e-couchdb-3.local-1  Removed
 Container cht-e2e-nginx-1  Stopped
 Container cht-e2e-nginx-1  Removing
 Container cht-e2e-nginx-1  Removed
 Container cht-e2e-api-1  Stopping
 Container cht-e2e-api-1  Stopped
 Container cht-e2e-api-1  Removing
 Container cht-e2e-api-1  Removed
 Container cht-e2e-haproxy-1  Stopping
 Container cht-e2e-haproxy-1  Stopped
 Container cht-e2e-haproxy-1  Removing
 Container cht-e2e-haproxy-1  Removed
 Volume cht-e2e_cht-ssl  Removing
 Volume cht-e2e_cht-credentials  Removing
 Network cht-net-e2e  Removing
 Volume cht-e2e_cht-credentials  Removed
 Volume cht-e2e_cht-ssl  Removed
 Network cht-net-e2e  Removed

time="2024-11-15T11:16:47-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."
time="2024-11-15T11:16:47-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."
time="2024-11-15T11:16:47-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

 Network cht-net-e2e  Creating
 Network cht-net-e2e  Created
 Volume "cht-e2e_cht-ssl"  Creating
 Volume "cht-e2e_cht-ssl"  Created
 Volume "cht-e2e_cht-credentials"  Creating
 Volume "cht-e2e_cht-credentials"  Created
 Container cht-e2e-haproxy-1  Creating
 Container cht-e2e-couchdb-1.local-1  Creating
 Container cht-e2e-healthcheck-1  Creating
 Container cht-e2e-couchdb-2.local-1  Creating
 Container cht-e2e-couchdb-3.local-1  Creating
 Container cht-e2e-haproxy-1  Created
 Container cht-e2e-sentinel-1  Creating
 Container cht-e2e-api-1  Creating
 Container cht-e2e-healthcheck-1  Created
 Container cht-e2e-couchdb-2.local-1  Created
 Container cht-e2e-couchdb-3.local-1  Created
 Container cht-e2e-couchdb-1.local-1  Created
 Container cht-e2e-api-1  Created
 Container cht-e2e-nginx-1  Creating
 Container cht-e2e-sentinel-1  Created
 Container cht-e2e-nginx-1  Created
 Container cht-e2e-couchdb-2.local-1  Starting
 Container cht-e2e-couchdb-1.local-1  Starting
 Container cht-e2e-healthcheck-1  Starting
 Container cht-e2e-couchdb-3.local-1  Starting
 Container cht-e2e-haproxy-1  Starting
 Container cht-e2e-healthcheck-1  Started
 Container cht-e2e-couchdb-2.local-1  Started
 Container cht-e2e-couchdb-1.local-1  Started
 Container cht-e2e-haproxy-1  Started
 Container cht-e2e-sentinel-1  Starting
 Container cht-e2e-api-1  Starting
 Container cht-e2e-couchdb-3.local-1  Started
 Container cht-e2e-sentinel-1  Started
 Container cht-e2e-api-1  Started
 Container cht-e2e-nginx-1  Starting
 Container cht-e2e-nginx-1  Started
...

@jkuester
Contributor

@jkuester pushed a commit that killed them :)

😅 💀

I was wondering if there is a way to leave the containers up and running after the first run, so that the next time it just uses the containers that are already up and doesn't need to create them again

This is where I would draw the line and say it is not a good idea to add first-class support for this workflow! 😅 IMHO, you need to drop all the containers and reset things between test runs to better guarantee the reproducibility of test results. The docker images are static and cannot be affected by things that happen during a test run. But the containers are not! It is possible (or even likely) that the state of the CHT instance in the containers is slightly different after a test run, so subsequent runs would not be executing in the exact same environment as a fresh run.
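
Put differently, the teardown only needs to happen at the container level. Something like the command below (using the cht-e2e project name from the logs above) throws away all container state and volumes while leaving the built images cached for the next run:

    # removes the e2e containers, network and named volumes,
    # but keeps the medicmobile/cht-* images in the local cache
    docker compose -p cht-e2e down --volumes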

@tatilepizs
Contributor Author

🤔 hmm, you are right Josh, I didn't think about that

@dianabarsan
Member

@tatilepizs are you happy with the following solution?

VERSION=$(git branch --show-current) npm run integration-all-local
VERSION=$(git branch --show-current) npm run ci-integration-all

@tatilepizs
Contributor Author

Yes, I think it will be a great improvement to the way we execute the tests.
