Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding timeout functionality to action nodes for ineffectual work functions #62

Merged
merged 4 commits into from
Nov 8, 2024

Conversation

joshc-slac
Copy link
Collaborator

@joshc-slac joshc-slac commented Oct 19, 2024

Description

Want to allow work function of ActionNode's ActionWorker to timeout.

Specifically if the action performed in the work function is ineffectual. I.e. the work done in the work function never satisfies the completion condition. This does not guard against the case in which the work function itself hangs.

This motivated:

  • Moving the wrapped_action_work to the ActionWorker, keeps file sizes and contents reasonable. This did lead to what may be considered a pythonic sin of a pass-through definition in ActionNode; however for generally cleanliness of code base I would strongly argue this is best.
  • Creating a helper Timer class. One can imagine this will be used elsewhere. This can also extend the time.perf_counter method to be a code timer across the codebase

Motivation and Context

We want our work functions to be able to timeout. Notably this does not address the very real issue #56, but it is critical our trees be intelligent enough to stop work on a meaningful time horizon and report failure should they be unable to execute their task . This does not address #66

How Has This Been Tested?

Only in unit tests

Where Has This Been Documented?

We should do that more. I will do a little here.

Our work wrapper's wrapper now takes an additional, optional work_function_timeout_period_sec

def wrapped_action_work(loop_period_sec: float = 0.1, work_function_timeout_period_sec: float = 2):

We initialize the Timer object (within the spawned process), using this new parameter

work_loop_timeout_timer = Timer(name=name,
timer_period_seconds=work_function_timeout_period_sec,
auto_start=False,
is_periodic=True)

We use the timer as another mechanism to exit the work loop:

while not completion_condition() and not work_loop_timeout_timer.is_elapsed():

The final completion condition check is now more than perfunctory

# check if we exited loop because we timed out or we succeeded at task
if completion_condition():
logger.debug(f"Worker for node ({name}) completed.")
volatile_status.set_value(py_trees.common.Status.SUCCESS)
else:
logger.debug(f"Worker for node ({name}) failed.")
volatile_status.set_value(py_trees.common.Status.FAILURE)

Pre-merge checklist

  • Code works interactively
  • Code follows the style guide
  • Code contains descriptive docstrings, including context and API
  • New/changed functions and methods are covered in the test suite where possible
  • Test suite passes locally
  • Test suite passes on GitHub Actions
  • Ran docs/pre-release-notes.sh and created a pre-release documentation page
  • Pre-release docs include context, functional descriptions, and contributors as appropriate

@joshc-slac joshc-slac changed the base branch from master to joshc-slac/new-multi-shot October 19, 2024 01:29
@joshc-slac
Copy link
Collaborator Author

Closes #45

@joshc-slac joshc-slac force-pushed the joshc-slac/new-multi-shot branch 2 times, most recently from 573ab0b to dcefc01 Compare October 19, 2024 02:56
@joshc-slac joshc-slac force-pushed the joshc-slac/timeout-action-node branch from d49a5ee to 8df553d Compare October 19, 2024 03:12
Base automatically changed from joshc-slac/new-multi-shot to master October 23, 2024 00:44
@joshc-slac joshc-slac force-pushed the joshc-slac/timeout-action-node branch from 8df553d to a2c1edb Compare October 28, 2024 19:19
Copy link
Contributor

@tangkong tangkong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a lot of questions, but maybe I'm misunderstanding things.

self.is_periodic = is_periodic
self.auto_start = auto_start
if (self.auto_start):
self.timer_start_time = time.time()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and throughout we should use time.monotonic, since we don't actually care about the wall clock.

volatile_status.set_value(py_trees.common.Status.RUNNING)
# Start timer
work_loop_timeout_timer.start_timer()
while not completion_condition() and not work_loop_timeout_timer.is_elapsed():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we only check the timer at the top of the while loop, if a work function never completes / hangs forever, this won't actually timeout will it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, at present this doesn't have a mechanism to send a kill signal to a hung function. Within Worker we can tie these functionalities together... it is a good, deeper idea. The intention here was if futile work was being done, i.e. the function is returning but having no effect on the success condition, to time out that work

work_loop_timeout_timer = Timer(name=name,
timer_period_seconds=work_function_timeout_period_sec,
auto_start=False,
is_periodic=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intent behind the periodic timer here? It looks like we just reset the start time if the elapsed time is greater than the timer period. But the way the timer is called, the loop will never repeat within one work wrapper call (while loop starts -> work loops a couple times -> timer elapses -> while loop broken (even though timer started again))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we want the (do_work) while loop, the one that launches this "thread" as a persistent process across the life cycle of the program to persist.

I was adding a timer to the actual "behavior tree load" this can be seen as, "I got initialized, I did my task for [loop_timer] amount of time, I never found my success condition to be true, let me signal fail and wait for the BT to try me again should it do so"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's no disagreement that there needs to be a timer
The question here is: why is the timer periodic?
After all, on line 63 we manually restart the timer anyway. What does periodicity do here?

Status.FAILURE,
):
time.sleep(0.01)
action.tick_once()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this testing the right thing? The timeout is supposed to trigger within a tick right? Should the sleep be within the work function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test is correct, the sleep here is more like the tree's tick rate

Comment on lines +29 to +31
if (self.is_periodic):
self.timer_start_time = time.time()
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what this periodic clause would be used for?
It's unintuitive that a timer would restart itself when checked.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great point. It was more out of habit from previously implemented timers that I found the pattern of "one shot" vs "periodic" timer distinction helpful. Here I am not sure if there is much (or may even be negative) value add. Let me reflect for a moment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay on further reflection. You are 100% right that the timer within ActionWorker does not need to be periodic. I would however like to keep the notion of periodic timer in its current implementation for this use case:

A periodic timer when checked if elapsed (and it is) will emit a rising edge signal that is lowered thereafter which is convenient for breaking out of loops at fixed frequency, if not set as periodic the elapsed signal will stay high

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it I might advocate for adding this when we need it, if we later discover we need it.

Is the distinction between a stays-high signal and a periodic-break signal even relevant here in python land? We should design our "signals" in the way that works for our use case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8 says module names should be lowercase

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you don't have to change it because we need to do a mass update later to bring the package into compliance

import time


class Timer():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed feelings about this class name because it collides with threading.Timer from the standard library

ZLLentz
ZLLentz previously approved these changes Oct 31, 2024
Copy link
Member

@ZLLentz ZLLentz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to approve because the core seems fully correct to me and I'm quibbling on some secondary things

@joshc-slac joshc-slac changed the title Adding timeout functionality to action nodes Adding timeout functionality to action nodes for ineffectual work functions Nov 4, 2024
@joshc-slac joshc-slac force-pushed the joshc-slac/timeout-action-node branch from a2c1edb to 331cd0f Compare November 4, 2024 20:08
Comment on lines 8 to 36
class TestTimer():
def test_elapsed(self):
t = Timer(name="test_elapsed",
timer_period_seconds=0.1,
is_periodic=False)
t.start_timer()
assert t.is_elapsed() is False
time.sleep(0.5)
assert t.is_elapsed() is True

def test_timer_error(self):
t = Timer(name="test_error_not_started",
timer_period_seconds=0.1,
is_periodic=False)
with pytest.raises(RuntimeError):
t.get_elapsed()
with pytest.raises(RuntimeError):
t.is_elapsed()

def test_periodic(self):
t = Timer(name="test_error_not_started",
timer_period_seconds=0.1,
auto_start=True,
is_periodic=True)
time.sleep(0.2)
assert t.is_elapsed() is True
assert t.is_elapsed() is False
time.sleep(0.1)
assert t.is_elapsed() is True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 layer Nitpick:

  • This class structure appears unnecessary to me
  • The assert statements are not pythonic. is True can be entirely omitted for truthiness checks and is False should use not instead.
Suggested change
class TestTimer():
def test_elapsed(self):
t = Timer(name="test_elapsed",
timer_period_seconds=0.1,
is_periodic=False)
t.start_timer()
assert t.is_elapsed() is False
time.sleep(0.5)
assert t.is_elapsed() is True
def test_timer_error(self):
t = Timer(name="test_error_not_started",
timer_period_seconds=0.1,
is_periodic=False)
with pytest.raises(RuntimeError):
t.get_elapsed()
with pytest.raises(RuntimeError):
t.is_elapsed()
def test_periodic(self):
t = Timer(name="test_error_not_started",
timer_period_seconds=0.1,
auto_start=True,
is_periodic=True)
time.sleep(0.2)
assert t.is_elapsed() is True
assert t.is_elapsed() is False
time.sleep(0.1)
assert t.is_elapsed() is True
def test_elapsed(self):
t = Timer(name="test_elapsed",
timer_period_seconds=0.1,
is_periodic=False)
t.start_timer()
assert not t.is_elapsed()
time.sleep(0.5)
assert t.is_elapsed()
def test_timer_error(self):
t = Timer(name="test_error_not_started",
timer_period_seconds=0.1,
is_periodic=False)
with pytest.raises(RuntimeError):
t.get_elapsed()
with pytest.raises(RuntimeError):
t.is_elapsed()
def test_periodic(self):
t = Timer(name="test_error_not_started",
timer_period_seconds=0.1,
auto_start=True,
is_periodic=True)
time.sleep(0.2)
assert t.is_elapsed()
assert not t.is_elapsed()
time.sleep(0.1)
assert t.is_elapsed()

ZLLentz
ZLLentz previously approved these changes Nov 4, 2024
Copy link
Member

@ZLLentz ZLLentz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left my nit and approving again

@joshc-slac joshc-slac force-pushed the joshc-slac/timeout-action-node branch from 331cd0f to f069edb Compare November 7, 2024 00:10
@joshc-slac joshc-slac changed the base branch from master to joshc-slac/better-conditions November 7, 2024 00:10
ZLLentz
ZLLentz previously approved these changes Nov 7, 2024
Copy link
Member

@ZLLentz ZLLentz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approval post nit

Base automatically changed from joshc-slac/better-conditions to master November 7, 2024 03:57
@joshc-slac joshc-slac dismissed ZLLentz’s stale review November 7, 2024 03:57

The base branch was changed.

@joshc-slac
Copy link
Collaborator Author

joshc-slac commented Nov 7, 2024

Hmmm @tangkong, casual ghub thought that our approval dismissing settings may be a tad over eager (or maybe it is I that is a tad overeager)

I just went down a small rabbit hole, our settings are probably sufficient and err in the correct direciton

@ZLLentz
Copy link
Member

ZLLentz commented Nov 7, 2024

I think the diff visualization is broken, it includes most of the elements from the refactor PR.

@joshc-slac joshc-slac force-pushed the joshc-slac/timeout-action-node branch from 3a9badf to 277f99c Compare November 8, 2024 00:06
@joshc-slac joshc-slac merged commit dd2395f into master Nov 8, 2024
14 checks passed
@joshc-slac joshc-slac deleted the joshc-slac/timeout-action-node branch November 8, 2024 00:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants