
Regenerated jcstress playlist for jcstress and pushed its generator #5317

Merged: 41 commits into adoptium:master from judovana:jcstressGeneratorExperiemnt, Jul 31, 2024

Conversation

judovana (Contributor)

The initial set of tests is obtained by `jcstress.jar -l`. The generator:
  • merges targets that contain too few tests
  • ensures no test is run twice
  • keeps targets as short as possible while still fully descriptive
  • makes the regexes for the -t runtime option fully qualified

Just as there is an exclude list to prevent shortening groups where a misleading prefix would remain, an include list could be added to connect the remaining tests that have only one or two targets into an artificial group. But that was just 23 tests at the time of this commit.
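
As a reference, this is a minimal sketch of the workflow the generator builds on, using the -l and -t options named above; the regex is illustrative, not one of the generated targets:

```sh
# List every test jcstress knows about; this listing is the generator's input.
java -jar jcstress.jar -l > all-tests.txt
wc -l all-tests.txt

# Run one generated group by passing its fully qualified regex to -t.
java -jar jcstress.jar -t "org\.openjdk\.jcstress\.tests\.atomics\..*"
```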

@judovana judovana marked this pull request as draft May 14, 2024 13:05
@judovana (Contributor, Author)

Will resolve #5278 if (ever) agreed upon.

@judovana judovana force-pushed the jcstressGeneratorExperiemnt branch from 97156ff to ff5c02f on May 14, 2024 13:09
@judovana judovana changed the title "Regeenrated jcstress palylist for jcstress and pushed its generator" to "Regenerated jcstress palylist for jcstress and pushed its generator" on May 14, 2024
@judovana judovana marked this pull request as ready for review May 28, 2024 16:35
@judovana (Contributor, Author)

It seems the jcstress issue has gone a bit stale. Moving out of draft for review.

@smlambert smlambert left a comment (Contributor)

Please add a .md file into the system/jcstress directory to describe how to run the generator, and capture the general gist of the tests created in the playlist.

Also please spellcheck all of your comments for 'typing too fast' mistakes like memebrs, namesapces, etc.

@smlambert smlambert changed the title "Regenerated jcstress palylist for jcstress and pushed its generator" to "Regenerated jcstress playlist for jcstress and pushed its generator" on Jun 6, 2024
The initial set of tests is obtained by `jcstress.jar -l`
The generator:
  merges targets that contain too few tests
  ensures no test is run twice
  keeps targets as short as possible while still fully descriptive
  makes regexes for the -t runtime fully qualified

Just as there is an exclude list to prevent shortening groups where a
misleading prefix would remain, an include list could be added to
connect the remaining tests that have only one or two targets into an
artificial group. But that was just 23 tests at the time of this commit.
@judovana judovana force-pushed the jcstressGeneratorExperiemnt branch from ff5c02f to 0498a0f on June 6, 2024 13:39
judovana added 10 commits June 6, 2024 17:24
* stricter checksum
* merging of groups continues until there is really nothing left to do
* besides natural groups, artificial ones are also created
* some control env variables like verbose, limit and just_regex to check
  times
* rearranged code
* it can launch jcstress on all the targets it generated, and measure
  times
* on my machine, in single-core mode, it is 20±1 minutes per target; in
  all-cores mode (8 cores) it is 13 hours per target (not even one
  finished yet)
* fixed selector generator to always contain 'or'
Now splitting works as expected. On two cores:
splitting by 100: 54 groups, 4 hours each
splitting by 2000: 7 groups, 1.5 days each
no splitting: 11.5 days total

Oscillation to be calculated.
@judovana (Contributor, Author)

judovana commented Jun 9, 2024

Hello!

Over the past week I played a lot with the generator, and before we proceed with playlist generation, I think there are quite a few factors to consider.

  • experience: in my runs at RH, the standard runtime of the full suite got approximately 100x longer between jcstress 2020 and 2024
    • possible solutions are to not run some targets and/or to limit the number of cores
    • an important note: limiting to 1 core is a no-go, because it eliminates too many tests.
  • The runtime rises steeply with the number of allowed cores, so keeping the default (using all machine cores) leads to incredibly unpredictable results.
    • So I have added an optional -c switch to generate for a predictable number of cores. E.g. -c 2 means that all machines will use two cores.
    • See e.g. these results I generated:
 1 core : Total time: 9 minutes [0d+00:09:35]
 2 cores: Total time: 16458 minutes [11d+10:18:34]
 3 cores: Total time: 33695 minutes [23d+09:35:21]
 4 cores: Total time: 149339 minutes [103d+16:59:21]
 8 cores: Total time: 149339 minutes [103d+16:59:20] 

Not sure why 4 and 8 are the same. My machine has 4 cores with hyperthreading, so 8 virtual. Maybe jcstress is using only the real ones, or it capped the estimate at 103 days as too much. Anyway, imagine it on a 100-core Arm...
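
A minimal sketch of what a fixed-core run looks like at the jcstress level, assuming the jar name used later in this thread; the -t regex is illustrative:

```sh
# Pin jcstress to 2 CPUs so time estimates are comparable across machines.
java -jar jcstress-20240222.jar -c 2 -t "org\.openjdk\.jcstress\.tests\.locks\..*"
```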

  • a single jcstress test does not have a constant length.
  • When I did some basic grouping, I realised that a group with e.g. 100 tests can run much longer than a group with 2000 tests.
    • it is deterministic
    • It seems that the generated tests are the fast ones
    • It is clearly visible in the following figure:
Results gathered: 14; 100% time of longest group, n% time of ideal group
small.groups.1 with 853tests took 117791s [1d+08:43:11] (100%)(+66%)
org.openjdk.jcstress.tests.memeffects.basic-001 with 816tests took 117504s [1d+08:38:24] (99%)(+66%)
org.openjdk.jcstress.tests.atomics-001 with 694tests took 98783s [1d+03:26:23] (83%)(+40%)
org.openjdk.jcstress.tests.acqrel.varHandles.byteBuffer-001 with 672tests took 96767s [1d+02:52:47] (82%)(+37%)
org.openjdk.jcstress.tests.locks-001 with 646tests took 93023s [1d+01:50:23] (78%)(+31%)
org.openjdk.jcstress.tests.acqrel-001 with 588tests took 84671s [0d+23:31:11] (71%)(+20%)
small.groups.2 with 597tests took 83807s [0d+23:16:47] (71%)(+18%)
org.openjdk.jcstress.tests.acqrel.varHandles-001 with 504tests took 72575s [0d+20:09:35] (61%)(+2%)
org.openjdk.jcstress.tests.atomicity.varHandles-001 with 714tests took 68543s [0d+19:02:23] (58%)(-3%)
small.groups.3 with 528tests took 65915s [0d+18:18:35] (55%)(-7%)
org.openjdk.jcstress.tests.atomicity-001 with 612tests took 64800s [0d+18:00:00] (55%)(-9%)
org.openjdk.jcstress.tests.seqcst.volatiles with 2131tests took 8928s [0d+02:28:48] (7%)(-88%)
org.openjdk.jcstress.tests.seqcst.sync with 2131tests took 8927s [0d+02:28:47] (7%)(-88%)
small.groups.4 with 57tests took 5471s [0d+01:31:11] (4%)(-93%)
Total time: 16458 minutes [11d+10:18:25]
Ideal avg time: 1175 minutes [0d+19:35:36] (100%)
Max seen  time: 1963 minutes [1d+08:43:11] (166%)
Min seen  time: 91 minutes [0d+01:31:11] (7%)
Avg difference from longest: 59%
Avg difference from ideal: 60%

You can see that:

org.openjdk.jcstress.tests.seqcst.volatiles with 2131tests took 8928s [0d+02:28:48] (7%)(-88%)
org.openjdk.jcstress.tests.seqcst.sync with 2131tests took 8927s [0d+02:28:47] (7%)(-88%)

are much shorter than the others, despite having many more tests. My generator can split them, but how depends on how many targets we wish to have. This example is with LIMIT=500 CORES=2 and without splitting the huge groups.
It is also visible that splitting those groups only makes sense above a certain time per group (in this case, with 2 cores, that would be 2.5 hours).

All those times are based on the estimations jcstress itself makes. I have verified that those estimations are precise enough. The worst I have seen was ±25% of the real run compared to the estimated one, but usually it is less than 10%.

So the questions:

  • how many cores to use?
  • How long can the basic split time be?
    • long story short, estimations (with two cores) are as follows (group times are in the last brackets):
* split_exl Limit 10 - 603 groups, from those  7 "small groups" (0.5hours each. %like longest/ideal %17%/? (6m-2.5h)
* split_all Limit 10 - 603 groups, from those  7 "small groups" (0.5hours each. %like longest/ideal %68%/81 (26m-38m)
* split_exl Limit 50 - 128 groups, from those  6 "small groups" (~2.5hhours each. %like longest/ideal %60/85% (45m-3.5h)
* split_all Limit 50 - 206 groups, from those  7 "small groups" (~1.1 hours each. %like longest/ideal %37/27% (6s-3.5h)   
   (there was an error (e.g. for rg.openjdk.jcstress.tests.seqcst.sync-028), "3 actors: No scheduling is possible, these tests would not run.", which I need to investigate; maybe fall back to simpler class counting, or also run the -l listing with -c (which seems most correct, as -l is indeed counting with -c))
 the real min time would be some 1 hour.
* split_exl Limit 100 - 60 groups, from those  7 "small groups" (~4.5hours each. %like longest/ideal %63/79% (2.5h-7h)
* split_all Limit 100 - 99 groups, from those  7 "small groups" (~2.5hours each . %like longest/ideal %38/21% (7s-7h)
   (same error, so real min time would be again some 2.5 hours)

Although I will provide full statistics later, it seems that having more than 150 groups is impractical, and having more than 3 hours per group is impractical too. Thus the number of tests per group (limit) should be around 50. In this area sits the glorified 2.5 hours for the "unsplittable" groups; their split depends on the selected cores and limits. I recommend not splitting them. (A hypothetical generator invocation is sketched below.)
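
Purely for illustration, a run of the generator under those settings might look as follows; the jar name is hypothetical, while LIMIT, CORES and VERBOSE are the control env variables mentioned earlier in this thread:

```sh
# Hypothetical generator invocation: cap groups at ~50 tests, assume 2 cores.
LIMIT=50 CORES=2 VERBOSE=true java -jar jcstress-playlist-generator.jar > playlist.xml
```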

The generator uses test enumeration with 'or', so command-line length limits may be hit:

  • on Cygwin this is ~32000,
  • on the BSD and Linux systems I use it is anywhere from 131072 to 2621440,
  • on cmd.exe it is just 8192 (I guess that is unused in any automation),
  • Windows internally it is 32768.

But those limits may be hit only in setups with LIMIT=500 tests or more, which are unlikely to be used. (A quick way to check a system's limit is shown below.)
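
To see the actual limit on a given POSIX machine, getconf can be queried; the figures above come from the comment, not from this command:

```sh
# Maximum combined length of command-line arguments and environment, in bytes.
getconf ARG_MAX
```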

@judovana (Contributor, Author)

judovana commented Jun 9, 2024

jcstress-20240222.jar -c 1 -l | wc -l
4489
jcstress-20240222.jar -c 2 -l | wc -l
4489
jcstress-20240222.jar -c 4 -l | wc -l
4489

Bad luck, -l does not honour -c.

Filed: https://bugs.openjdk.org/browse/CODETOOLS-7903748

@judovana (Contributor, Author)

Full statistics of cores x runs with split possibilities:
https://jvanek.fedorapeople.org/jcstress/details.html
The smallest (10) groups with large namespaces split are still running, and will run till the end of today, but I doubt that group will be used. Will update it once finished.

@judovana (Contributor, Author)

judovana commented Jul 1, 2024

@smlambert (Contributor)

@judovana - as we wait for your suggested upstream changes to jcstress, we could also consider disabling the jcstress.all target so we can merge this PR. Once the upstream changes are accepted, we can re-enable the jcstress.all target with appropriate settings so it can be run in a timely manner.

@judovana (Contributor, Author)

judovana commented Jul 2, 2024

10:30:56  (Time: overtime 16:28:58, 2 tests in flight, 30 ms per test)
10:30:56  (Sampling Rate: 135.63 K/sec)
10:30:56  (JVMs: 0 starting, 1 running, 0 finishing)
10:30:56  (CPUs: 2 configured, 2 allocated)
10:30:56  (Results: 337416 planned; 88910 passed, 0 failed, 0 soft errs, 0 hard errs)

Looks better than usual.

My upstream changes are not going to solve it. They work around it, trying to be a bit more precise, but without lowering the total time. Give me a bit more time please. I am still experimenting on top of it, and maybe it will go somewhere.

@judovana (Contributor, Author)

judovana commented Jul 2, 2024

Killed by a bug in infra:

11:13:16  (Time: overtime 17:11:18, 2 tests in flight, 30 ms per test)
11:13:16  (Sampling Rate: 135.71 K/sec)
11:13:16  (JVMs: 0 starting, 1 running, 0 finishing)
11:13:16  (CPUs: 2 configured, 2 allocated)
11:13:16  (Results: 337416 planned; 92563 passed, 0 failed, 0 soft errs, 0 hard errs)
11:13:16  
11:13:22  Cannot contact test-docker-ubi9-x64-1: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
11:19:55  wrapper script does not seem to be touching the log file in /home/jenkins/workspace/Grinder@tmp/durable-82ef8d76
11:19:55  (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)

restart: https://ci.adoptium.net/view/Test_grinder/job/Grinder/10505/

@judovana (Contributor, Author)

judovana commented Jul 4, 2024

10:47:24  (Time: 1d+overtime 21:19:37, 1 tests in flight, 30 ms per test)
10:47:24  (Sampling Rate: 78.73 K/sec)
10:47:24  (JVMs: 0 starting, 1 running, 0 finishing)
10:47:24  (CPUs: 2 configured, 2 allocated)
10:47:24  (Results: 337416 planned; 162967 passed, 0 failed, 0 soft errs, 0 hard errs)

Looks good for the test, but not good for the timeout. Anyway, I think this is the way to go: limit -c to 2 and give the smallest possible time budget. But we must still count on approximately 3 days on the slowest machines. (A sketch of such an invocation follows.)
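
As a sketch of that recommendation (the -tb time-budget flag is assumed to exist in recent jcstress builds; the 4h value is illustrative):

```sh
# Two cores for predictable estimates, plus an explicit time budget.
java -jar jcstress-20240222.jar -c 2 -tb 4h
```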

@judovana (Contributor, Author)

judovana commented Jul 4, 2024

@judovana judovana force-pushed the jcstressGeneratorExperiemnt branch from 2a7f553 to 624aea8 on July 31, 2024 17:38
judovana added 4 commits July 31, 2024 20:01
Despite upstream disregarding this, it is currently the only way to cut
the runtime to an at least somewhat reasonable duration
@judovana judovana force-pushed the jcstressGeneratorExperiemnt branch from 267d58c to 3fdeacb on July 31, 2024 18:45
@judovana (Contributor, Author)

All targets are now disabled, and cores are set to a default of 2.

https://ci.adoptium.net/view/Test_grinder/job/Grinder/10679/console

@judovana (Contributor, Author)

Will also add -Djcstress.console.printIntervalMs handling.
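
For context, the property goes straight to the jcstress JVM; a minimal sketch, with the 15000 ms default quoted later in this thread:

```sh
# Print console status every 15 s (the default interval reported below).
java -Djcstress.console.printIntervalMs=15000 -jar jcstress-20240222.jar -c 2
```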

@smlambert smlambert left a comment (Contributor)

Thanks @judovana - I will merge it now with all targets disabled. That way it can be used by running with TARGET=disabled.jcstress.xxxx if needed, and returned to in the future when some options are addressed in the upstream jcstress repo.

@judovana (Contributor, Author)

wait a sec, this is funny

@smlambert smlambert requested a review from sophia-guo July 31, 2024 19:27
@judovana judovana force-pushed the jcstressGeneratorExperiemnt branch from a6d02bc to 99b291f on July 31, 2024 19:35
@judovana (Contributor, Author)

https://ci.adoptium.net/view/Test_grinder/job/Grinder/10680/ had just -Djcstress.console.printIntervalMs=3600000 set instead of the default (-Djcstress.console.printIntervalMs=15000), and it had some unexpected impact:

Original ETA and sampling rate:

20:38:07  (Time: 00:59:59 left, 1 tests in flight, 30 ms per test)
20:38:07  (Sampling Rate: 147.79 K/sec)
20:38:07  (JVMs: 0 starting, 1 running, 0 finishing)
20:38:07  (CPUs: 2 configured, 2 allocated)
20:38:07  (Results: 337416 planned; 1 passed, 0 failed, 0 soft errs, 0 hard errs)

new ETA and sampling rate:

21:28:01  (Time: 11d+17:05:01 left, 1 tests in flight, 2969 ms per test)
21:28:01  (Sampling Rate: 7.62 M/sec)
21:28:01  (JVMs: 0 starting, 1 running, 0 finishing)
21:28:01  (CPUs: 2 configured, 2 allocated)
21:28:01  (Results: 337416 planned; 94 passed, 0 failed, 0 soft errs, 0 hard errs)

The sampling rate is much higher, but the ETA is absolutely off. And counting the passed tests, the delay is real.

@judovana (Contributor, Author)

Gosh, I had reset the time budget. Fixing.

@judovana (Contributor, Author)

ok, fixed. https://ci.adoptium.net/view/Test_grinder/job/Grinder/10681/console looks good.

I'm not aware of any more issues (except the time needed). Feel free to merge. I will now move to the next step in #5261: improve https://github.com/adoptium/TKG/blob/master/scripts/getDependencies.xml so it uses jcstress-20240222.jar from the dependency pipeline. (The playlist.xml will need to be regenerated once that is done.)

@sophia-guo sophia-guo merged commit c595428 into adoptium:master Jul 31, 2024
3 checks passed
@judovana (Contributor, Author)

judovana commented Aug 1, 2024

hmm.. that https://ci.adoptium.net/view/Test_grinder/job/Grinder/10681/console looks good! It is at the halfway point after twelve hours!

@judovana (Contributor, Author)

judovana commented Aug 6, 2024

Just for the record, https://ci.adoptium.net/view/Test_grinder/job/Grinder/10681/console finished in 1 day and 20 hours :)

I think that one can be deployed under some circumstances.
