Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parallel restraint #218

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

parallel restraint #218

wants to merge 12 commits into from

Conversation

qqfang97
Copy link

In short, let the tests run in parallel.

qqfang97 and others added 12 commits March 17, 2021 11:58
There are some tasks have to run in order at first or at the end.
So I introduce three lists, pre/parallel/post in recipe running phase.
Add PARALLEL parameter to put tasks in corresponding list.
The default value to pre, so xml without any this param will no change.
Round 1, the pre list tasks will run in order. like config/setup tasks.
Round 2, the parallel list tasks will parallel execution.
Round 3, the post list tasks run in order too. like reservesys, tasks
cannot run in parallel

Signed-off-by: Ping Fang <[email protected]>
Use glib thread pool launch nproc thread worker to execution
task_handler. Build per-thread data structure pass it to handler
 nstead of app_data.

Signed-off-by: Ping Fang <[email protected]>
Once tasks parallel execute, we can't use single app_data to track
all tasks. The per-task index used to transfer between restraintd
and rstrnt_xxx.

Signed-off-by: Ping Fang <[email protected]>
load_eval function will compute currently system load through
/proc/loadavg, /proc/stat, /proc/meminfo. This is used to prevent
local watchdog timeout caused by other parallel tasks.

Signed-off-by: Ping Fang <[email protected]>
The local server need task index to identify requester.
Add get_nproc get max available thread number.

Signed-off-by: Ping Fang <[email protected]>
Replace all internal interface, pass per-thread task data structure.
Instead of use a single default main loop, create their own main loop
foreach thread. The thread loop remove and reattach event source until no
more tasks, then stop loop.

Signed-off-by: Ping Fang <[email protected]>
Sometimes parallel execution will produce some presure or some stress testing
parallel running. By evaluating system load, extend local timer needed 15 min
at most 3 times to avoid unexpected timeout. There is only one external/remote
watchdog in beaker side. We update it only when requested time longer
then currently remain time.

Signed-off-by: Ping Fang <[email protected]>
There maybe some tasks will conflict with others. For example,
those tasks need to modify sysfs. The restraint will check this
keyword and only allow one conflicted task execute at a time.
There is a reserved keyword zreboot used for zstream.

Signed-off-by: Ping Fang <[email protected]>
free these three lists at the recipe cleanup

Signed-off-by: Ping Fang <[email protected]>
Add mutex to protect race condition of conflict keywords

Signed-off-by: Ping Fang <[email protected]>
@lgtm-com
Copy link

lgtm-com bot commented May 18, 2021

This pull request introduces 2 alerts when merging b2d368a into cf36cbc - view on LGTM.com

new alerts:

  • 1 for Local variable hides global variable
  • 1 for FIXME comment

@StykMartin
Copy link
Contributor

In short, this PR should be REJECTED.

Long version now

  • PR is not descriptive at all. Hard to tell what is the intention of this change in depth.
  • Code doesn't compile
  • You added 0 new tests for this functionality and this is a game changer for Restraint

Basically right now this patch would do more harm than good to Restraint maintainers.
Feel free to address the issues and we will gladly review them again. And as always, feel free to ask us any questions, I enabled discussions.

Copy link
Contributor

@StykMartin StykMartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment above ^

@StykMartin StykMartin requested a review from cbouchar May 18, 2021 08:19
@qqfang97
Copy link
Author

The original intention of this improvement is to improve the test efficiency.
To achieve parallel restraint, I made the following changes.

  1. Introduce three task lists: pre(run first in order), parallel(parallel execution), post(run last in order).
  2. Create multiple woker threads and introduce a new data structure to save thread info.
  3. The worker thread fetch task from the list, initialize thread info.
  4. Change all interface pass thread info instead of app_data(can only track one task).
  5. Go through the original process, check config, install pkg, parse metadata.....
  6. Repeat three times for three list, pre 1 thread, parallel N threads, post 1 thread.
  7. Add new enviroment variable RSTRNT_INDEX to specific task during comunicate with outside like rstrnt_xxxx.
  8. Add new xml task parameter RSTRNT_PARALLEL to specific which list it belongs to.
    image
    The timestamp shows that tasks executed sametime.

@cbouchar
Copy link
Contributor

I agree with Martin. This is a high risk change which I voiced concerns about a long time ago. Even the smallest of changes may work fine for one user but fail miserably for other users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants