How to add visual distractions? #685
Is this your own custom environment? What environment is this exactly? And are you planning to use the GPU sim + rendering?
Hi @StoneT2000, I am using SimplerEnv and TorchRL.

```python
import torch
import numpy as np
from tensordict import TensorDict, TensorDictBase
from torchrl.envs import EnvBase
from torchrl.data import Composite, Unbounded, Bounded
from sapien.pysapien.render import RenderTexture2D
import sapien


class SimplerEnvWrapper(EnvBase):
    def __init__(self, base_env, **kwargs):
        super().__init__(**kwargs)
        self._device = torch.device(kwargs.get("device", "cpu"))
        self.base_env = base_env
        self.numpy_to_torch_dtype_dict = {
            bool: torch.bool,
            np.uint8: torch.uint8,
            np.int8: torch.int8,
            np.int16: torch.int16,
            np.int32: torch.int32,
            np.int64: torch.int64,
            np.float16: torch.float16,
            np.float32: torch.float32,
            np.float64: torch.float64,
        }
        self._make_specs()

    def _make_specs(self):
        # Build TorchRL specs from the base environment's observation/action spaces.
        raw_observation_spec = self.get_image_from_maniskill3_obs_dict(
            self.base_env, self.base_env.observation_space.spaces
        )
        height = raw_observation_spec.shape[-3]
        width = raw_observation_spec.shape[-2]
        self.channels = raw_observation_spec.shape[-1]
        shape = (height, width, self.channels)
        observation_spec = {
            "pixels": Bounded(
                low=torch.from_numpy(
                    raw_observation_spec.low[0, :, :, : self.channels]
                ).to(self._device),
                high=torch.from_numpy(
                    raw_observation_spec.high[0, :, :, : self.channels]
                ).to(self._device),
                shape=shape,
                dtype=torch.uint8,
                device=self._device,
            )
        }
        self.observation_spec = Composite(**observation_spec)
        action_space = self.base_env.action_space
        self.action_spec = Bounded(
            low=torch.from_numpy(action_space.low).to(self._device),
            high=torch.from_numpy(action_space.high).to(self._device),
            shape=action_space.shape,
            dtype=self.numpy_to_torch_dtype_dict[action_space.dtype.type],
            device=self._device,
        )
        self.reward_spec = Unbounded(
            shape=(1,), dtype=torch.float32, device=self._device
        )
        self.done_spec = Unbounded(shape=(1,), dtype=torch.bool, device=self._device)

    def get_image_from_maniskill3_obs_dict(self, env, obs, camera_name=None):
        # Pick the camera matching the robot embodiment and return its RGB image.
        if camera_name is None:
            if "google_robot" in env.unwrapped.robot_uids.uid:
                camera_name = "overhead_camera"
            elif "widowx" in env.unwrapped.robot_uids.uid:
                camera_name = "3rd_view_camera"
            else:
                raise NotImplementedError()
        img = obs["sensor_data"][camera_name]["rgb"]
        return img

    def _reset(self, tensordict: TensorDictBase = None):
        # Apply a base color texture to every render part of every actor before resetting.
        base_color_texture = RenderTexture2D(
            "/home/user/Downloads/cliff_side_4k.blend/textures/cliff_side_diff_4k.jpg"
        )
        for actor_name in self.base_env.unwrapped.scene.actors.keys():
            for part in self.base_env.unwrapped.scene.actors[actor_name]._objs:
                for triangle in (
                    part.find_component_by_type(sapien.render.RenderBodyComponent)
                    .render_shapes[0]
                    .parts
                ):
                    # triangle.material.set_base_color([0.8, 0.1, 0.1, 1.0])
                    triangle.material.set_base_color_texture(base_color_texture)
        obs_dict, _ = self.base_env.reset()
        rgb_obs = (
            self.get_image_from_maniskill3_obs_dict(self.base_env, obs_dict)[
                0, :, :, : self.channels
            ]
            .to(torch.uint8)
            .squeeze(0)
        )
        text_instruction = self.base_env.unwrapped.get_language_instruction()
        done = torch.tensor(False, dtype=torch.bool, device=self._device)
        terminated = torch.tensor(False, dtype=torch.bool, device=self._device)
        return TensorDict(
            {
                "pixels": rgb_obs,
                "text_instruction": text_instruction,
                "done": done,
                "terminated": terminated,
            },
            batch_size=[],
            device=self._device,
        )

    def _step(self, tensordict: TensorDictBase):
        action = tensordict["action"]
        obs_dict, reward, done, _, info = self.base_env.step(action)
        rgb_obs = (
            self.get_image_from_maniskill3_obs_dict(self.base_env, obs_dict)[
                0, :, :, : self.channels
            ]
            .to(torch.uint8)
            .squeeze(0)
        )
        text_instruction = self.base_env.unwrapped.get_language_instruction()
        return TensorDict(
            {
                "pixels": rgb_obs,
                "text_instruction": text_instruction,
                "reward": reward,
                "done": done,
            },
            batch_size=[],
            device=self._device,
        )

    def _set_seed(self, seed: int):
        self.base_env.seed(seed)
```

PS: I am not sure I am doing this right. Should I apply the changes before the environment reset? Where:

```python
from mani_skill.envs.sapien_env import BaseEnv
...
env_name = cfg["env"]["name"]
sensor_configs = dict()
sensor_configs["shader_pack"] = "default"
base_env: BaseEnv = gym.make(
    env_name,
    max_episode_steps=max_episode_steps,
    obs_mode="rgb+segmentation",
    num_envs=1,
    sensor_configs=sensor_configs,
    render_mode="rgb_array",
    sim_backend=cfg["env"]["device"],
)
```

I am testing the following existing environments from ManiSkill3 (using SimplerEnv):
My goal is to leverage the flexibility of ManiSkill3/SimplerEnv and be able to:
The more I can achieve from this list, the better. Ideally I would like to apply these randomizations at the start of the episode. Yes, I plan on using the GPU to improve simulation performance (fps).
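For reference, a minimal smoke test of the wrapper above might look like the following. This is only a sketch based on the constructor and `_reset`/`_step` shown in this comment; the device string is an assumption, and standard TorchRL reset/step semantics are used.

```python
# Hypothetical usage of SimplerEnvWrapper as defined above.
wrapped_env = SimplerEnvWrapper(base_env, device="cuda")

td = wrapped_env.reset()                        # TensorDict with "pixels", "text_instruction", ...
td["action"] = wrapped_env.action_spec.rand()   # sample a random action for a quick check
td = wrapped_env.step(td)                       # TorchRL nests the step result under td["next"]
print(td["next", "pixels"].shape, td["next", "reward"])
```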
Thanks for the extensive notes, all of what you suggest is possible, but it depends a little bit on which models you actually want to evaluate.
There are two ways forward. The easiest option is to build a new table-top environment (take one of the templates or e.g. the pick cube environment) and add the parallelizations/randomizations you want for a custom environment. Only choose this option if you don't need to verify real2sim alignment and simply want a controllable robot and objects. Alternatively, you can copy the code for the bridge dataset digital twins and modify the attributes in there to change the default RGB overlays, swap the overlay at each timestep when using video, modify the scene loader to add distractor objects, etc. Let me know which option you think is needed and I can suggest the relevant docs/code to do what you want.
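If the first option (a custom table-top task) is the route taken, a rough sketch of subclassing the pick cube task to spawn a distractor object could look like this. This is not the maintainer's code; the module path, the `build_cube` signature, the task name, and the pose ranges are assumptions that may differ across ManiSkill3 versions.

```python
import torch

from mani_skill.envs.tasks.tabletop.pick_cube import PickCubeEnv  # module path may differ by version
from mani_skill.utils.registration import register_env
from mani_skill.utils.building import actors
from mani_skill.utils.structs.pose import Pose


@register_env("PickCubeDistract-v1", max_episode_steps=50)
class PickCubeDistractEnv(PickCubeEnv):
    def _load_scene(self, options: dict):
        super()._load_scene(options)
        # Extra object that plays no role in the task reward.
        self.distractor = actors.build_cube(
            self.scene, half_size=0.02, color=[0, 1, 0, 1], name="distractor"
        )

    def _initialize_episode(self, env_idx: torch.Tensor, options: dict):
        super()._initialize_episode(env_idx, options)
        # Randomize the distractor position per parallel environment (placeholder ranges).
        b = len(env_idx)
        pos = torch.zeros((b, 3))
        pos[:, :2] = torch.rand((b, 2)) * 0.2 - 0.1
        pos[:, 2] = 0.02
        self.distractor.set_pose(Pose.create_from_pq(p=pos, q=[1, 0, 0, 0]))
```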
Thanks a lot @StoneT2000 for the amazing reply!
I plan on training and evaluating models (training from scratch).
Yes. I want to train in simulation using an environment that is as visually realistic as possible, but if this hinders training time I'm open to a hybrid approach: environments that are still realistic but slightly less so (e.g. without ray tracing) to boost collection speed during training, with a more realistic (and slower) setup for the visual generalization benchmark.
This sounds interesting, as I also want not just one environment but at least 2-3 that show increasing levels of difficulty (e.g. easy to hard).
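As a side note, that hybrid idea maps naturally onto the `shader_pack` option already used in the `gym.make` call earlier in this thread. A sketch, assuming `env_name` is defined as above and that the ray-tracing shader pack ("rt") is available in the installed ManiSkill3 build:

```python
import gymnasium as gym

# Fast rasterized rendering for data collection (the "default" shader, as in the snippet above).
train_env = gym.make(env_name, obs_mode="rgb+segmentation", num_envs=16,
                     sensor_configs=dict(shader_pack="default"), sim_backend="gpu")

# Slower, more realistic rendering for the visual-generalization evaluation
# (assumption: the "rt" ray-tracing shader pack is supported in this setup).
eval_env = gym.make(env_name, obs_mode="rgb+segmentation", num_envs=1,
                    sensor_configs=dict(shader_pack="rt"), sim_backend="gpu")
```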
@StoneT2000 After looking at the docs for ManiSkill3, I'm tempted to use ManiSkill3 directly instead of SimplerEnv. Would it be feasible to use ManiSkill3 directly while also being able to add the visual distractions? Any help is appreciated!
Yes, using ManiSkill3 directly is likely better. SIMPLER's setup is based on ManiSkill2 and not well parallelized, whereas ManiSkill3 is. Moreover, SIMPLER is real2sim only; it is not designed for sim2real. What kind of visual distractions were you thinking of? Like spawning irrelevant objects on a table? (That is possible, just follow the custom tasks tutorial on loading objects and initializing them.) As for more parallel domain randomizations, https://maniskill.readthedocs.io/en/latest/user_guide/tutorials/domain_randomization.html details some of them. I will mark this issue as a request to add more docs (e.g. randomizing textures, robot controller PD parameters, and more).
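To make the texture-randomization part concrete, a per-episode variant of the reset-time texture swap from the wrapper earlier in the thread might look roughly like this. The file paths are placeholders and the render-component traversal mirrors the code posted above; this is not an official ManiSkill3 texture-randomization API.

```python
import random

import sapien
from sapien.pysapien.render import RenderTexture2D

# Placeholder texture files; any local image files could be listed here.
TEXTURE_FILES = [
    "/home/user/textures/cliff_side_diff_4k.jpg",
    "/home/user/textures/wood_diff_4k.jpg",
    "/home/user/textures/concrete_diff_4k.jpg",
]


def randomize_textures(env):
    """Assign a randomly chosen base color texture to every render part of every actor."""
    texture = RenderTexture2D(random.choice(TEXTURE_FILES))
    for actor in env.unwrapped.scene.actors.values():
        for obj in actor._objs:
            body = obj.find_component_by_type(sapien.render.RenderBodyComponent)
            if body is None:
                continue
            for part in body.render_shapes[0].parts:
                part.material.set_base_color_texture(texture)


# Called before each reset, e.g.:
# randomize_textures(base_env)
# obs, _ = base_env.reset()
```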
Thanks for the detailed answer @StoneT2000, I will use ManiSkill3 directly then and focus on the simple/default rendering quality, which is already fairly good anyway.
I would like the following visual randomizations:
Makes sense, these might already be possible and all that is needed is some documentation improvement.
Hi,
I would like to change the textures (randomly or via a PNG file) of the various objects in the scene (e.g. before every new episode).
I managed to change the `base_color`, but when I change the textures, nothing happens. Any pointers are appreciated.
The objective is to change textures, camera FOV, and lighting, and if possible add new objects, to evaluate methods for visual generalization.
PS: I do not know much about textures; I downloaded a sample file from https://polyhaven.com/a/cliff_side.
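For the camera FOV and lighting part of that objective, one possible starting point in a custom ManiSkill3 task is overriding the sensor and lighting hooks. The hook names below follow the ManiSkill3 task API as documented, but the base class (the hypothetical `PickCubeDistractEnv` from the sketch earlier in the thread) and all numeric ranges are placeholders.

```python
import numpy as np

from mani_skill.sensors.camera import CameraConfig
from mani_skill.utils import sapien_utils


class MyRandomizedEnv(PickCubeDistractEnv):  # hypothetical task from the earlier sketch
    @property
    def _default_sensor_configs(self):
        # Randomize the field of view once per reconfiguration (placeholder range).
        fov = np.random.uniform(np.pi / 4, np.pi / 2)
        pose = sapien_utils.look_at(eye=[0.3, 0, 0.6], target=[-0.1, 0, 0.1])
        return [
            CameraConfig("base_camera", pose=pose, width=128, height=128,
                         fov=fov, near=0.01, far=100)
        ]

    def _load_lighting(self, options: dict):
        # Simple randomized lighting instead of the default setup (placeholder ranges).
        self.scene.set_ambient_light(np.random.uniform(0.2, 0.6, size=3).tolist())
        self.scene.add_directional_light(
            [0, 0.5, -1], color=np.random.uniform(0.5, 1.0, size=3).tolist()
        )
```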