
Fix patch_off_object #114

Merged
merged 18 commits into thousandbrainsproject:main from patch_off_object on Dec 20, 2024

Conversation

@scottcanoe (Contributor) commented Dec 14, 2024

Problem Statement

In a small percentage of runs (<1%), Monty gets stuck bouncing between off-object observations early in the run. These experiments time out after max_monty_steps with no matching steps performed. The timeouts occur mostly, but not exclusively, in experiments with distant agents, so I've focused on distant agents so far. Here's a quick overview of what I've observed and done.

In distant agents experiments, get_good_view is called at the beginning of an episode. It moves the agent to a reasonable distance from the object and orients the agent so that the view finder's central pixel is on-object. However, the view finder's central area is not a perfect proxy for sensor module 0 (a patch), and this method does not guarantee that the patch will be on-object, even if the view finder is.

  1. Because of the differences in resolution/zoom between the patch and the view finder, the view finder's central pixel is covered by about 10 pixels in the patch's image. If the view finder is aimed directly at an object's edge, the patch's central pixel can end up slightly off-object. If the agent then moves the wrong way while trying to find the object and ends up even further off-object, the system gets stuck in a loop, bouncing back and forth between these two off-object observations and repeatedly calling fixme_undo_last_action. Example observed with the "banana" object.

  2. A semantic map can contain "holes", which is problematic if the agent orients directly onto one. For example, the "chain" object has (real) topological holes in its links. If the agent orients onto a hole, it may jump to an off-object location, reverse that action, and ultimately get stuck in a fixme_undo_last_action loop. In another case, a small (spurious) hole in the knife's semantic map caused the system to get stuck in the same kind of loop: the patch's semantic map showed holes on that part of the knife.
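To make the failure mode concrete, here is a toy sketch of the bounce loop (illustrative only, not Monty code; the pose labels and step counter are invented for this example):

```python
def bounce_loop(max_monty_steps: int) -> int:
    """Toy model of a stuck episode: every observation is off-object, so
    each fixme_undo_last_action flips the agent between two poses."""
    pose = "off_object_A"
    matching_steps = 0  # incremented only by on-object observations
    for _ in range(max_monty_steps):
        # Undoing the last action just lands on the other off-object pose.
        pose = "off_object_B" if pose == "off_object_A" else "off_object_A"
        # Neither pose is on-object, so no matching step is ever performed.
    return matching_steps
```

Such an episode runs for max_monty_steps and ends with zero matching steps, which is exactly the timeout signature described above.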

Solutions

I've made several changes here.

  1. Where get_good_view was called (at the beginning of an episode and after a jump), it is now called twice: first with the view finder and then with sensor module 0. The second call allows finer orienting movements that ensure the central sensor patch ends up on-object.
  2. Allowing the agent to move forward on the second pass with the sensor patch caused problems, so I added a parameter to get_good_view that disables move-forward actions. Together, changes (1) and (2) are the primary fix for problem (1).
  3. I shrunk the smoothing kernel in find_location_to_look_at to reduce the likelihood of orienting onto a semantic hole. The previous kernel was quite large, and the resulting smearing of semantic information made it difficult to discourage orientation toward semantic holes. This fixed problem (2).
  4. I've also added a fix that resets an LM's current_mlh between episodes. The stale value wasn't affecting the system's behavior, but it did cause incorrect logging of these patch_off_object episodes.
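As a rough sketch of how changes (1) and (2) compose (the class context and sensor IDs here are assumptions for illustration; `get_good_view` and `allow_translation` are the names from this PR):

```python
class MotorSystemSketch:
    """Minimal stand-in for the motor system; only the call pattern of the
    two-pass good-view procedure is shown, not the real movement logic."""

    def __init__(self):
        self.calls = []  # record of (sensor_id, allow_translation) calls

    def get_good_view(self, sensor_id, allow_translation=True):
        # Stub: the real method orients (and optionally moves) the agent
        # until the named sensor's central pixel is on-object.
        self.calls.append((sensor_id, allow_translation))

    def get_good_view_with_patch_refinement(self):
        # Pass 1: coarse positioning with the wide view finder; the agent
        # may translate (move forward) to reach a good distance.
        self.get_good_view("view_finder")
        # Pass 2: fine orientation with the zoomed-in sensor patch; forward
        # motion is disabled because translating with the narrow patch
        # caused problems (change 2).
        self.get_good_view("patch", allow_translation=False)
```

The view-finder pass gets the agent close and roughly aimed; the patch pass then guarantees the higher-zoom sensor itself is on-object.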

These changes eliminated patch_off_object episodes for all benchmark experiments tested, except for the multi-object experiment and the surface agent experiments. Those will have to be tackled in a future PR.

Benchmark updates for the 10- and 77-object experiments have been included in this PR with no substantial changes to report.

scottcanoe and others added 15 commits December 12, 2024 08:13
Enforce stricter criteria for considering a view on-object, and improve the ability to locate a good movement target when it is not.
Added `allow_translation` for get_good_view to make moving closer to the object optional.

Also changed callers of get_good_view to use the double check (view finder first, then a patch)
@scottcanoe scottcanoe marked this pull request as ready for review December 16, 2024 21:18
@scottcanoe (Contributor Author):

Tagging @hlee9212 for visibility.

@scottcanoe scottcanoe added bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request labels Dec 16, 2024
@scottcanoe (Contributor Author):

Also noting here that I don't know what's causing the spurious semantic holes to sometimes show up (like in the "knife" example above), but I'll plan to take a closer look at that while working on the upcoming semantic sensor PR.

@scottcanoe scottcanoe changed the title WIP: Fix patch_off_object Fix patch_off_object Dec 16, 2024
@tristanls tristanls added the triaged This issue or pull request was triaged label Dec 17, 2024
@nielsleadholm (Contributor) left a comment:

Really nice, thanks for the fix and the great documentation. Just added a function suggestion.

@@ -715,6 +735,10 @@ def handle_successful_jump(self):

else:
self.get_good_view("view_finder")
@nielsleadholm (Contributor) commented Dec 18, 2024:

I wonder if this could be made a bit "DRYer", and the underlying assumptions about the central sensor patch made a bit more visible, by having an outer function, maybe called something like multi_step_get_good_view (or a better name), which takes no arguments (since we are hard-coding these anyway). This function would then call:

self.get_good_view("view_finder")
for patch_id in ("patch", "patch_0"):
    if patch_id in self._observation["agent_id_0"].keys():
        self.get_good_view(patch_id, allow_translation=False)
        break

which could be used on both lines 487 and 738.

Contributor Author:

How about get_good_view_with_patch_refinement? It's a long name, but multi-step to me sounds like it could refer to multiple steps taken with the same sensor (as is partly done already with move forward).

Contributor:

Yeah good point, I like it!

Contributor Author:

Updated.

@nielsleadholm (Contributor):

@vkakerbeck I'm curious whether, when you've examined YCB objects in the past, you ever encountered holes in objects like the knife blade? I'm assuming this is a defect in the underlying mesh that emerged during the original 3D scanning of the YCB objects, since it's being picked up by the ground-truth semantic sensor.

@vkakerbeck (Contributor) left a comment:

LGTM, thanks for the fix! Just a few questions so I understand all the changes.

Oh and yes, the YCB objects have tons of weird artifacts like that...

@@ -724,7 +725,7 @@ def find_location_to_look_at(self, sem3d_obs, image_shape, target_semantic_id):
# as expected, which can otherwise break if e.g. on_object_image is passed
# as an int or boolean rather than float
smoothed_on_object_image = scipy.ndimage.gaussian_filter(
-    on_object_image, sem3d_obs.shape[0] / 10, mode="constant"
+    on_object_image, 2, mode="constant"
Contributor:

Why is this number hardcoded now?

Contributor Author:

I just landed on a kernel size of 2 because it made the smoothed images look "reasonably" smooth and solved the issue with holes on the object. I can see why we would want the kernel size to scale with the resolution, but what we really want to scale it by is the physical distance on the object between pixels, since holes in the object are presumably pretty small (in centimeters). This is something I can come back to, but I left it as-is since it gave good results in all the images I looked at during debugging.
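For concreteness, the two sigma rules under discussion can be written out as follows (the 64-pixel image height is an assumed example value, not something from the PR):

```python
def old_sigma(image_height: int) -> float:
    # Previous rule: sigma scaled with resolution (sem3d_obs.shape[0] / 10),
    # so e.g. a 64-pixel image was smoothed with sigma 6.4 -- large enough
    # to smear small semantic holes across much of the patch.
    return image_height / 10

def new_sigma(image_height: int) -> float:
    # New rule in this PR: a fixed sigma of 2, independent of resolution.
    return 2.0
```

A resolution-independent alternative, as discussed above, would scale sigma by the physical inter-pixel distance on the object instead of the pixel count.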

self.current_mlh["location"] = [0, 0, 0]
self.current_mlh["rotation"] = Rotation.from_euler("xyz", [0, 0, 0])
self.current_mlh["scale"] = 1
self.current_mlh["evidence"] = 0
Contributor:

What was the reason these needed to be added now?

Contributor Author:

Since not resetting current_mlh was causing incorrect logging for patch_off_object episodes, I thought it made sense to include it here. More specifically, an episode could end up with patch_off_object but it would report that the most likely object was whatever the MLH was for the previous episode.
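A dependency-free sketch of the reset being discussed (the real code stores a scipy `Rotation` object; plain Euler angles and the helper name are assumptions used here to keep the sketch self-contained):

```python
def make_neutral_mlh() -> dict:
    """Return a neutral most-likely-hypothesis dict, used to reset an LM's
    current_mlh between episodes so that a patch_off_object episode does
    not log the previous episode's most likely object."""
    return {
        "location": [0, 0, 0],
        "rotation_euler_xyz": [0, 0, 0],  # identity rotation
        "scale": 1,
        "evidence": 0,
    }
```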

Contributor:

ah okay, good catch!

| base_77obj_surf_agent | 98.27% | 4.33% | 52 | 0.18 | 42m | 123s |
| randrot_noise_77obj_dist_agent | 87.01% | 29.00% | 151 | 0.63 | 2h10m | 468s |
| randrot_noise_77obj_surf_agent | 94.37% | 21.65% | 113 | 0.61 | 1h31m | 339s |
| randrot_noise_77obj_5lms_dist_agent | 90.91% | 5.19% | 70 | 1.01 | 1h7m | 1439s |
Contributor:

Do you have an idea why performance is worse now in terms of % correct and % used mlh?

Contributor Author:

I was hoping to come up with a good answer for this, but ultimately I don't really know. I reran with a handful of different random seeds, sometimes recovering roughly the same accuracy as before. Somehow the very slightly different starting conditions had a noticeable effect on accuracy, but I don't have much to say beyond that. It's something I'd like to investigate further in the near future. I'll be redoing all benchmarks for the semantic sensor PR, so maybe there's more I can learn then.

@scottcanoe scottcanoe merged commit f610702 into thousandbrainsproject:main Dec 20, 2024
13 checks passed
@scottcanoe scottcanoe deleted the patch_off_object branch December 20, 2024 17:33