
Don't use the semantic sensor by default #116

Merged
merged 42 commits into thousandbrainsproject:main from semantic_sensor
Jan 15, 2025

Conversation

scottcanoe
Contributor

@scottcanoe scottcanoe commented Dec 17, 2024

Problem Statement

The environment data loader returns observations for each sensor, such as an RGBA image or depth image. By default, it also returns a semantic map for each sensor which indicates the object each pixel is "looking at"1. However, these semantic maps contain a lot of information that is delivered to Monty "for free", and we would like to remove this dependence on privileged information by having Monty compute its own semantic maps from sensor data by default.

Most of the groundwork for this has already been laid. The DepthTo3DLocations transform currently has a method for estimating semantic maps using depth data. When DepthTo3DLocations.use_semantic_sensor=False, the ground-truth semantic map is not used during the normal course of an episode. However, it is still used when initializing an episode and after executing hypothesis-driven jumps. The goal is to replace usage of the ground-truth semantic map with the estimated semantic map in these contexts. We would also like to prevent the ground-truth semantic map from being returned by the environment data loader at all, to guarantee that Monty doesn't use it when we don't want it to.

Here's a quick overview of the settings that affect the usage of ground-truth semantic maps2.

  1. The MountConfig.semantics value for a sensor determines whether the ground-truth semantic map is returned by the data loader. More specifically, it creates the item observations[agent_id][sensor_module_id]["semantic"].
  2. DepthTo3DLocations.use_semantic_sensor determines whether the ground-truth semantic map is used by this transform.
  3. InformedPolicy's move_close_enough, orient_to_object, and find_location_to_look_at methods currently use (and require) the existence of observations[agent_id][sensor_module_id]["semantic"], which is controlled by the mount config only. Setting DepthTo3DLocations.use_semantic_sensor=False has no effect here.
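
To make the interaction between these settings concrete, here is a rough Python sketch (the config classes, field names, and check function below are simplified stand-ins invented for illustration, not the repo's actual definitions), including the kind of validation footnote 2 alludes to:

from dataclasses import dataclass, field
from typing import List


@dataclass
class MountConfigSketch:
    """Simplified stand-in for a sensor mount config."""

    sensor_ids: List[str] = field(default_factory=lambda: ["patch"])
    # One flag per sensor: if False, the data loader never creates
    # observations[agent_id][sensor_module_id]["semantic"].
    semantics: List[bool] = field(default_factory=lambda: [False])


@dataclass
class DepthTo3DLocationsSketch:
    """Simplified stand-in for the transform's semantic-sensor switch."""

    # If False, the transform estimates a binary on-/off-object map from
    # depth instead of consuming the ground-truth semantic map.
    use_semantic_sensor: bool = False


def check_semantic_settings(mount: MountConfigSketch,
                            transform: DepthTo3DLocationsSketch) -> None:
    """Reject the invalid combination mentioned in footnote 2."""
    if transform.use_semantic_sensor and not any(mount.semantics):
        raise ValueError(
            "use_semantic_sensor=True requires the mount config to return "
            "a ground-truth semantic map."
        )


# Default after this PR: no ground-truth semantics anywhere.
check_semantic_settings(MountConfigSketch(), DepthTo3DLocationsSketch())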

Important: the estimated semantic map created by DepthTo3DLocations is binary (off- or on-object), whereas the ground-truth semantic maps are zero when off-object and a particular object ID otherwise (e.g., 3 wherever a pixel is on the potted_meat_can). The presence of object IDs in the semantic map is what InformedPolicy's movement/orientation methods need. While we can work around binary semantic maps for single-object episodes (see below), there is currently no way to avoid using ground-truth semantic maps for multi-object experiments.

Solutions

I've adapted the InformedPolicy.move_close_enough, InformedPolicy.orient_to_object, and InformedPolicy.find_location_to_look_at methods to use the semantic maps embedded into the observations dict by DepthTo3DLocations. These methods have been augmented with primary_target_id and multiple_objects_present parameters. If multiple_objects_present is False, the nonzero elements of the estimated semantic map are temporarily replaced with the ID of the target object. The nice thing about this approach is that the semantic data contained in "semantic_3d" reflects the choice of DepthTo3DLocations.use_semantic_sensor, and the modifications have no effect when the semantic sensor is in use. These changes are specific to the distant agent.
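
As a rough illustration of that relabeling step (the helper below is invented for illustration; only "semantic_3d", primary_target_id, and multiple_objects_present come from the PR itself), a minimal sketch:

import numpy as np


def relabel_binary_semantic(semantic: np.ndarray, primary_target_id: int,
                            multiple_objects_present: bool) -> np.ndarray:
    """Hypothetical helper mirroring the workaround described above.

    With a single object in the scene, the estimated semantic map is binary
    (0 = off-object, 1 = on-object), so its nonzero entries can safely be
    replaced with the target's object ID before the policy methods go
    looking for that ID.
    """
    if multiple_objects_present:
        # A binary estimate cannot recover per-pixel object IDs when several
        # objects are present, so leave the map untouched.
        return semantic
    relabeled = semantic.copy()
    relabeled[relabeled != 0] = primary_target_id
    return relabeled


# Example: a 2x3 binary estimate where the primary target is object ID 3.
estimate = np.array([[0, 1, 1], [0, 0, 1]])
print(relabel_binary_semantic(estimate, primary_target_id=3,
                              multiple_objects_present=False))
# [[0 3 3]
#  [0 0 3]]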

Quite a few changes have also been made to DepthTo3DLocations. The basic reason is that the ground-truth semantic map was used to zero out off-object regions on an estimated surface, so I introduced a simple, temporary semantic mask derived from depth data that performs the same function. There was a decent amount of reorganization to get this to work, including changes to parameters and moving some behavior between functions. Side note: the surface estimation behavior of DepthTo3DLocations isn't obvious to a code reader, and I'd be happy to contribute some documentation for it somewhere.
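
A minimal sketch of that depth-derived mask, following the snippet discussed in the review comments below (and assuming, as noted there, that MissingToMaxDepth has already mapped missing or off-object depth readings to 1):

import numpy as np


def semantic_mask_from_depth(depth_patch: np.ndarray) -> np.ndarray:
    """Binary on-/off-object mask derived purely from depth.

    Treating depth >= 1 as background only holds because MissingToMaxDepth
    runs first and sets missing readings to 1.
    """
    mask = np.ones_like(depth_patch, dtype=int)
    mask[depth_patch >= 1] = 0
    return mask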

Beyond these changes, some tests were updated and some typos were fixed (num_distractors was often spelled num_distactors, and semantic was sometimes written as sematic). I have also added a pre-episode check that ensures we start an episode on the target object (with the exception of multi-object experiments).

Dataset configs were also changed so that they neither use the semantic sensor in DepthTo3DLocations nor have ground-truth semantic data returned by Habitat. Multi-object dataset configs were added and should be used when performing multi-object experiments. Benchmark pretraining and object recognition experiments reflect these modifications.

Finally, I re-pretrained all models (we're now up to pretrained_ycb_v10) and ran all benchmarks.

Footnotes

  1. Clarification for future readers: these maps are not used for object recognition -- they're used to select a location in space toward which Monty should move and orient its sensors when initializing an episode.

  2. The user should be aware that certain combinations of these two attributes are invalid. For example, setting the MountConfig.semantics value for a sensor to False while setting DepthTo3DLocations.use_semantic_sensor=True will trigger an error. This is something we can warn against when we implement config checking.

scottcanoe and others added 18 commits December 20, 2024 19:16
 - Explicitly use semantic sensors for habitat_transform_test.py
 - Drop more non-matching columns between parallel and non-parallel experiments for run_parallel_test.py since not having a semantic sensor modifies reporting of stepwise_performance and stepwise_target_object.
Don't use ycb data path for multi-object dataset args.
Add return values to get_good_view and get_good_view_with_patch_refinement so we can raise an assertion error if we start an episode off-object.
Prepare to pretrain new models.

github-actions bot commented Jan 9, 2025

📚 Documentation Preview

✅ A preview of the documentation changes in this PR is available for maintainers at:
https://thousandbrainsproject.readme.io/v0.0-semantic-sensor

Last updated: 2025-01-15 21:32 UTC

@scottcanoe scottcanoe marked this pull request as ready for review January 9, 2025 23:52
@scottcanoe scottcanoe changed the title WIP: Don't use the semantic sensor by default Don't use the semantic sensor by default Jan 10, 2025
Contributor

@nielsleadholm nielsleadholm left a comment

Really nice work! Just added some comments which we can also discuss in the team meeting, it might be worth starting with looking at my last comment (re. semantic_mask and get_semantic_from_depth).

tests/unit/policy_test.py
tests/unit/run_parallel_test.py
src/tbp/monty/frameworks/environment_utils/transforms.py
src/tbp/monty/frameworks/environment_utils/transforms.py
else:
    semantic_added = True
    semantic_mask = np.ones_like(depth_obs, dtype=int)
    semantic_mask[depth_obs >= 1] = 0
Contributor

Should this be hard-coded based on 1? Would it be sufficient if we just initialize it as ones and then call self.clip instead?

Contributor Author

I don't think we can initialize with ones and defer clipping to self.clip since this is only used with surface agents (i.e., when self.depth_clip_sensors is non-empty and the sensor index is in self.depth_clip_sensors). I hard-coded 1 since the MissingToMaxDepth transform applied before DepthTo3DLocations clips missing/off-object values to 1, so I figured that was, at least for now, a safe value to treat as the background off-object value.
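
For readers unfamiliar with that transform, a minimal sketch of the behavior described above (assuming missing readings arrive as zeros; this is an illustration, not the transform's actual code):

import numpy as np


def missing_to_max_depth(depth_patch: np.ndarray,
                         max_depth: float = 1.0) -> np.ndarray:
    """Pixels with no depth reading are pushed to the max depth value, so
    downstream code can treat depth >= 1 as off-object background."""
    patched = depth_patch.copy()
    patched[patched == 0] = max_depth
    return patched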

Contributor

could you add a comment on this in the code? something like
# MissingToMaxDepth will set observations that go into the empty void to 1

Contributor Author

Done.

src/tbp/monty/frameworks/environment_utils/transforms.py

default_on_surface_th: default threshold to use if no bimodal distribution
is found
semantic_mask: binary mask indicating on-object locations
Contributor

I think the purpose of semantic_mask and get_semantic_from_depth is still a bit confusing in terms of what they actually do vs. what's documented (a lot of which, I appreciate, was inherited before this PR). I think it would be clearer if we renamed get_semantic_from_depth to get_surfaces_from_depth and added a comment here that semantic_mask is based on a simpler depth heuristic. You could almost create a new, mini function for the lines below and name that function get_semantic_from_depth. This would then be used on line 322:

semantic_mask = np.ones_like(depth_obs, dtype=int)
semantic_mask[depth_obs >= 1] = 0
agent_obs["semantic"] = semantic_mask

Then it would also hopefully be clearer that our final semantic map is typically determined by the element-wise product of these two.

Contributor Author

@scottcanoe scottcanoe Jan 14, 2025

I've added a bunch of extra comments in __call__ and get_surface_from_depth (which was get_semantic_from_depth) that I think help clarify what's going on with surface vs. semantic.

Contributor Author

I kept the referenced code in __call__ though. I'm cool with changing it around, but I left it because I didn't want to obscure that the observations dict was being added to (and that we'd possibly have to remove an item from it later).

@tristanls tristanls added the triaged This issue or pull request was triaged label Jan 13, 2025
@scottcanoe
Contributor Author

Hey @nielsleadholm, thanks for the feedback! It needed fresh eyes. I've made a bunch of edits and added a fair number of comments. Let me know what you think.

Contributor

@vkakerbeck vkakerbeck left a comment

Thanks for taking this incredible deep dive and figuring out this jungle of complex entanglements and issues. I just left a few minor questions for my own understanding.

benchmarks/configs/ycb_experiments.py
# according to depth. However, the threshold we use to separate the two
# parts is dependent on distant vs surface agents. The parameter
# `default_on_surface_th` changes based on whether we are using a distant
# or surface agent.
Contributor

Could we turn this big block into a docstring? Ideally of the function so it also shows up in our API docs.

Contributor Author

Good idea, got it.

else:
    semantic_mask = np.ones_like(depth_obs, dtype=int)
    semantic_mask[depth_obs >= 1] = 0
    agent_obs["semantic"] = semantic_mask
Contributor

Do we have to add this mask to the agent_obs if we are passing it as a parameter to get_surface_from_depth? Could we just do self.get_surface_from_depth(..., semantic_mask=estimated_semantic_mask)?

Contributor Author

@scottcanoe scottcanoe Jan 14, 2025

There was a reason for it, but I went ahead and refactored the code such that we no longer modify the observations dict. I'm outlining the changes in a comment below.

# loaders only have one object in parallel experiments.
for col in ["time", "stepwise_performance", "stepwise_target_object"]:
    scsv.drop(columns=col, inplace=True)
    pcsv.drop(columns=col, inplace=True)
Contributor

Why did this issue only come up now? How was this related to the semantic sensor?

Contributor Author

This issue is due to the fact that stepwise_performance and stepwise_target_object are derived from the semantic map stored by DepthTo3DLocations. When the semantic sensor is in use, the semantic map contains actual object IDs, so stepwise_performance and stepwise_target_object were always "correct". However, without the semantic sensor, semantic maps are binary, so the sensed object ID is always 1.

The reason this is different between parallel and serial runs is that in parallel runs there is always only one object in the data loader, so the mapping from object ID to object name is always 1 -> the_object. These columns are therefore always fine in parallel runs, but they are now wrong in serial runs without the semantic sensor, where the values in the columns are always "mug" (or whatever the object for ID 1 is).

So that's the short answer. I hope that makes sense.

Contributor

Ah thanks, yes that makes sense.

Contributor

@nielsleadholm nielsleadholm left a comment

Thanks for those clarifying comments Scott, this is great!

src/tbp/monty/frameworks/environment_utils/transforms.py
@scottcanoe
Contributor Author

@nielsleadholm and @vkakerbeck Thanks for the review, great suggestions.

I've made a few changes to DepthTo3DLocations worth mentioning.

  • I restructured things so that we don't have a semantic mask being temporarily added to and later removed from the observations dict. I never liked that solution (it's pretty hacky IMO). The updated version creates a temporary variable semantic_patch and passes it around to the relevant functions, such as clip; clip now takes depth and semantic matrices directly rather than pulling them from an agent_obs dictionary (see the sketch after this list).
  • In the spirit of Always Be Refactoring, I went ahead and cleaned up inconsistent variable naming that was bothering me. For example, the depth observations were sometimes named depth_obs and other times depth_patch, depending on which function was using them. Now, *_obs is reserved for dictionaries and *_patch for matrices, consistently within and across functions. Not a big change, but it improves readability.
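
A purely illustrative sketch of the new calling convention (the parameter names and the clipping behavior shown are assumptions, not the actual implementation):

import numpy as np
from typing import Tuple


def clip(depth_patch: np.ndarray, semantic_patch: np.ndarray,
         clip_value: float) -> Tuple[np.ndarray, np.ndarray]:
    """The patches are passed in directly instead of being read from (and
    written back into) an agent_obs dictionary."""
    clipped_depth = np.clip(depth_patch, 0.0, clip_value)
    clipped_semantic = np.where(depth_patch >= clip_value, 0, semantic_patch)
    return clipped_depth, clipped_semantic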

Other than that, I've improved documentation and added more type annotations. Though nothing should have functionally changed since the last review (changes were mostly stylistic or documentation-related), I also reran some benchmarks to make sure everything looks normal.

Contributor

@vkakerbeck vkakerbeck left a comment

Great updates and refactors! Thanks :) Looks all good to me now. I just added one more minor comment but nothing blocking.

semantic_mask=semantic_mask,
semantic_patch,
0.01,
default_on_surface_th,
Contributor

Is there a reason you are not using keyword arguments here anymore? I prefer it looking like min_depth_range=0.01, instead of just 0.01, since it is more readable and less error-prone.

Contributor Author

No problem, I'll change it back. I agree it's more readable to use keywords here. I had switched it to the above to make the calling style uniform (within this class, at least).

Contributor

@nielsleadholm nielsleadholm left a comment

Great stuff, thanks Scott, this looks very clean now!

@scottcanoe scottcanoe merged commit 2d843e1 into thousandbrainsproject:main Jan 15, 2025
14 checks passed
@scottcanoe scottcanoe deleted the semantic_sensor branch January 15, 2025 21:54