Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spurious CI simtest failures #359

Open
morxa opened this issue Apr 7, 2020 · 5 comments
Open

Spurious CI simtest failures #359

morxa opened this issue Apr 7, 2020 · 5 comments
Labels
bug Something isn't working

Comments

@morxa
Copy link
Member

morxa commented Apr 7, 2020

As noted in #355, the CI simtest sometimes fails. One example is build 2052. At least in this case, one fawkes instance did not start at all: This is the full debug3_latest.log:

W 11:06:43.141097 FawkesMainThread: Disabling IPv6 support

The other instances seem to start fine, but then CLIPS locks up, possibly due to MongoDB not responding. Here is the end of debug1_latest.log:

D 11:06:57.351652 CLIPS (executive): FIRE    3 wm-robmem-register-trigger-done: f-1807
D 11:06:57.351661 CLIPS (executive): <== f-1807  (mutex-trigger-register-feedback SUCCESS "wm-robmem-sync-trigger" 0x7fdabc000b80)
I 11:06:57.351688 CLIPS (executive): Registered trigger 'wm-robmem-sync-trigger'
D 11:06:57.351695 CLIPS (executive): ==> f-1811  (wm-robmem-sync-initialized (trigger-ptr 0x7fdabc000b80))
W 11:07:05.134372 FawkesMainThread: Thread time limit exceeded while waiting for syncpoint '/worldstate/end'. Time limit: 0.133333 sec.
W 11:07:05.134412 FawkesMainThread: bad component: NavGraphVisualizationThread
W 11:07:05.134796 FawkesMainThread: Loop time exceeded, desired: 0.066333 sec (66333 usec),  actual: 0.134095 sec
W 11:10:37.134518 FawkesMainThread: Thread time limit exceeded while waiting for syncpoint '/skill/end'. Time limit: 0.133333 sec.
W 11:10:37.134540 FawkesMainThread: bad component: SkillerSimulatorExecutionThread
W 11:10:37.134736 FawkesMainThread: Loop time exceeded, desired: 0.066333 sec (66333 usec),  actual: 0.134141 sec
W 11:15:38.134087 FawkesMainThread: Thread time limit exceeded while waiting for syncpoint '/postloop/end'. Time limit: 0.133333 sec.
W 11:15:38.134109 FawkesMainThread: bad component: ROSNodeThread
W 11:15:38.134137 FawkesMainThread: Loop time exceeded, desired: 0.066333 sec (66333 usec),  actual: 0.133981 sec
W 11:20:39.134378 FawkesMainThread: Thread time limit exceeded while waiting for syncpoint '/agent/end'. Time limit: 0.133333 sec.
W 11:20:39.134401 FawkesMainThread: bad component: ClipsExecutiveThread
W 11:20:39.134673 FawkesMainThread: Loop time exceeded, desired: 0.066333 sec (66333 usec),  actual: 0.134126 sec
W 11:21:39.134308 FawkesMainThread: Thread time limit exceeded while waiting for syncpoint '/worldstate/end'. Time limit: 0.133333 sec.
W 11:21:39.134329 FawkesMainThread: bad component: NavGraphVisualizationThread
W 11:21:39.134789 FawkesMainThread: Loop time exceeded, desired: 0.066333 sec (66333 usec),  actual: 0.134072 sec
W 11:22:39.135074 FawkesMainThread: Thread time limit exceeded while waiting for syncpoint '/worldstate/end'. Time limit: 0.133333 sec.
W 11:22:39.135095 FawkesMainThread: bad component: NavGraphVisualizationThread
W 11:22:39.135551 FawkesMainThread: Loop time exceeded, desired: 0.066333 sec (66333 usec),  actual: 0.134109 sec
W 11:23:40.134246 FawkesMainThread: Thread time limit exceeded while waiting for syncpoint '/agent/end'. Time limit: 0.133333 sec.
W 11:23:40.134268 FawkesMainThread: bad component: ClipsExecutiveThread
W 11:23:40.134424 FawkesMainThread: Loop time exceeded, desired: 0.066333 sec (66333 usec),  actual: 0.133956 sec
W 11:29:12.668292 FawkesMainThread: Thread time limit exceeded while waiting for syncpoint '/agent/end'. Time limit: 0.133333 sec.
W 11:29:12.668328 FawkesMainThread: bad component: ClipsExecutiveThread
W 11:29:12.668595 FawkesMainThread: Loop time exceeded, desired: 0.066333 sec (66333 usec),  actual: 0.134130 sec
W 11:29:43.134995 FawkesMainThread: Thread time limit exceeded while waiting for syncpoint '/agent/end'. Time limit: 0.133333 sec.
W 11:29:43.135036 FawkesMainThread: bad component: ClipsExecutiveThread
W 11:29:43.135391 FawkesMainThread: Loop time exceeded, desired: 0.066333 sec (66333 usec),  actual: 0.134207 sec
W 11:31:12.670452 FawkesMainThread: Thread time limit exceeded while waiting for syncpoint '/worldstate/end'. Time limit: 0.133333 sec.
W 11:31:12.670482 FawkesMainThread: bad component: NavGraphVisualizationThread
W 11:31:12.670853 FawkesMainThread: Loop time exceeded, desired: 0.066333 sec (66333 usec),  actual: 0.134045 sec

Robot 2 complains about not being able to reach mongodb on robot 3 (debug2_latest.log):

W 11:34:01.533838 MongoDBReplicaSet|robot-memory-distributed: Fail: No suitable servers found (`serverSelectionTryOnce` set): [connection refused calling ismaster on 'localhost:27033']: generic server error
@morxa morxa added the bug Something isn't working label Apr 7, 2020
@TarikViehmann
Copy link
Contributor

I encounter the same issue sometimes (maybe once every 20th game, 3 times so far). In all cases it was robotino3 and restarting the agents did not help, i had to clear the mongodb data in /tmp/. Was already curious if this is the same issue that you ran into with the sim tests.

@morxa
Copy link
Member Author

morxa commented Apr 9, 2020

@vmatare
Copy link
Contributor

vmatare commented Apr 9, 2020

Is it possible to investigate the situation live in any way?

@morxa
Copy link
Member Author

morxa commented Apr 9, 2020

I guess you can by running the simulation over and over again, with:

$ ./gazsim.bash -o -r -m m-skill-sim-clips-exec -k -n 3

@vmatare
Copy link
Contributor

vmatare commented Apr 9, 2020

Well, makes me think: Shouldn't there be some kind of error message from RobotMemory if it suddenly stops being able to do anything?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants