Robotic Grasping with 3D Visual Observations #362
-
@AndrejOrsula thanks a lot for the report and, first off, congrats on the great work! It's really rewarding to see projects like yours working this well both in simulation and in the real world 🚀 Sorry for commenting this late, but this discussion somehow slipped under my radar and I just found it.
This is the type of feedback we are looking for from our community. The use cases of RL, even when limited to the robot learning domain, are quite varied, and posts like this are very helpful to us and to external contributors for focusing the development. I'd encourage all downstream users to provide similar feedback, it's much appreciated!
This is definitely our current biggest limitation, and we are well aware of it (#199, #249, #287). Unfortunately, the first attempt to overcome it, #249, was too hacky and performance was really bad. At the moment, there is no clean solution beyond what you implemented, which is pretty smart and somewhat similar to what @FirefoxMetzger prototyped in #287 (comment) and FirefoxMetzger/ropy. Some fresh air on the problem could come from gazebosim/gz-sim#793, but it's too early to tell.
Using a middleware, and IPC in general, is a valid solution. ROS has nice resources ready to be used. In general, I tend to prefer solutions that do not involve any network transport due to reproducibility problems (and it's just easier doing everything in the same Python code). For motion control, using Ignition plugins and custom controllers (similar to
I would have taken a very similar approach. I believe that this is the most straightforward path, and kudos for all the model processing that is often a pretty tedious task!
This is a shared curse, I relate very much. Welcome to the club 😄 I just commented on #363, which you might find interesting. Currently, enabling the contact system makes the simulation much slower. Contact-rich scenarios like yours, or like mine for bipedal locomotion, are currently quite slow with DART. Considering that manipulation is a much lighter task, I really envy your RTF! Mine is around 30% 😅 Concurrent execution definitely mitigates the problem, but it's a workaround rather than a solution. We cannot do much here; maybe new physics backends (gazebosim/gz-physics#153, leggedrobotics/raisimLib#40) might help.
I'm really happy that sim2real was smooth in your case; not every researcher in this domain is so lucky :) I'm not sure how you implemented the real-time execution; our approach, not yet fully finalized nor tested, is #94. In general, guaranteeing safety is still a pretty open research question in robot learning.
Allowing the user to specify the number of iterations is a relatively simple feature to add. Right now, in order to simplify the understanding of the different rates (physics, controllers, simulator), we decided to keep the number of steps per run fixed to 1.
It is not yet supported, but there is some recent related upstream activity in gazebosim/gz-sim#515. Of course, being able to do it from the APIs would be better, but using transport is a workaround that already works.
Yes, this is more related to upstream; I don't have enough knowledge of the sensors / rendering stack to comment. I'm not sure if these parameters can be changed dynamically, but if it's possible, a user-commands approach similar to what was done with the lights could be a possible implementation.
The randomizer is just a […]

To conclude, and for the record, for those reading this comment: this month (June 2021) @AndrejOrsula will present his work at the community meeting (good luck!), and the presentation will complement his description above and the repos.
-
... actually, this overhead could be a lot larger than you'd initially expect. I've noticed that Ignition's communication layer is, at least for camera images (I assume RGB-D is similar, but take this with some salt), quite horrible. The (protobuf) message contains a raw, uncompressed image. The exact handling is machine-specific, but it involves at least one copy of the data. Certainly not ideal for raw image data, even at low framerates like 30 Hz. To put this into context, the render of a ~10 s simulation takes about 35 s on my desktop (AMD Ryzen 7 5800X, RTX 3070), of which 30 s are spent on rendering and a bit less than 5 s on physics. If you send that data through a bridge to ROS 2, you essentially double the message overhead, because the bridge has to deserialize every Ignition Transport (zmq + protobuf) message and re-serialize it for ROS 2. Aka, Ignition makes a copy as the message is sent off. This could be solved elegantly by Ignition and/or ROS letting us reconfigure the communication to use zmq's zero-copy mechanisms.
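To illustrate what I mean by zero-copy, here is a minimal pyzmq sketch (the endpoint and image size are made up, and this is of course not Ignition's actual code): the send hands over the existing buffer instead of duplicating it, and the inproc transport skips the network stack entirely.

```python
import numpy as np
import zmq

ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
pub.bind("inproc://camera")  # in-process transport: no network stack involved

sub = ctx.socket(zmq.SUB)
sub.connect("inproc://camera")  # inproc sockets must share the same Context
sub.setsockopt(zmq.SUBSCRIBE, b"")

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # raw RGB image buffer
# copy=False hands the existing buffer to zmq instead of copying it into
# the outgoing message.
pub.send(frame, copy=False)
received = sub.recv()
```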
@AndrejOrsula I may have missed a feature here, but is it possible to do "on-demand rendering" of sensors in Ignition?
Random sidenote: ropy.transform offers tf2-like functionality in Python. It started from my desire to express coordinate transformations, and in particular projections, in a style similar to graph computations in frameworks like TensorFlow or PyTorch. Now ropy has a module to do tf2-style coordinate transformations in N dimensions. The cool thing is that, since it builds on top of numpy, you get full interoperability with the entire scientific Python stack (e.g., dask for multi-threading or cluster use).
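As a plain-numpy illustration of the idea (this is just the underlying math, not ropy's actual API): frames compose as 4x4 homogeneous matrices, and a chain of them pushes a point from one frame into another.

```python
import numpy as np

def transform(rotation, translation):
    """Build a 4x4 homogeneous matrix from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

# Two frames in a tf2-style chain: world <- robot <- camera.
world_T_robot = transform(np.eye(3), [1.0, 0.0, 0.0])
robot_T_camera = transform(np.eye(3), [0.0, 0.2, 0.5])

# Express a camera-frame point in the world frame by composing the chain.
point_camera = np.array([0.1, 0.0, 1.0, 1.0])  # homogeneous coordinates
point_world = world_T_robot @ robot_T_camera @ point_camera
```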
In my (somewhat limited) experience, this is either because the environment is slow or because the chosen RL algorithm is not computationally efficient. It would be interesting to see some profiling runs of the full pipeline to get a sense of it. My experience with A3C is that filling the replay buffer with "fresh" actions tends to be the most time-consuming part, either because the environment is slow by default or because the rollout loop is not efficiently implemented. While toying with Expert Iteration I found the same thing: the rollout was the bottleneck, even though in this case I can rule out the environment, because it was small and rather optimized (iirc it accounted for less than 10% of the runtime in my profiles).

Another factor is the choice of TF vs torch. Theoretically, it doesn't matter much, but in practice I see a lot of people shooting themselves in the foot with torch, artificially bottlenecking their pipeline by constantly synchronizing the GPU and the CPU. (Disclaimer: We are only one robotics group/lab, and the majority of our division does research on supervised image processing, which biases my experience. That said, you can equally shoot yourself in the foot with this in supervised learning or RL.)
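To make that foot-gun concrete, here is a minimal torch sketch: calling .item() on a GPU tensor inside a tight loop forces a device synchronization on every iteration, while batching the readback synchronizes only once.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Anti-pattern: .item() blocks until the GPU finishes, every single iteration.
values = []
for _ in range(1000):
    v = torch.randn(256, device=device).sum()
    values.append(v.item())  # CPU waits for the GPU here, 1000 times

# Better: keep intermediate results on-device and read back once at the end.
vs = [torch.randn(256, device=device).sum() for _ in range(1000)]
total = torch.stack(vs).sum().item()  # single synchronization point
```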
@AndrejOrsula I'm a bit curious about this as well. Last I checked, increasing the RTF above 100% would speed up physics but keep all the sensors at the original speed, which would cause synchronization issues between controllers, sensors, and the environment. Has this been a problem for you? (Maybe the behavior has changed since then.)
-
Sorry, my comment wasn't meant as a criticism of your choice of algorithm. It's more a general statement that most (all?) RL algorithms come with quite high big-O complexity, low sample efficiency, and lots of blocks with sequential dependencies.
@AndrejOrsula Did you write the octree code for the CPU while the rest of the network runs on the GPU? If so, that could explain the high execution time.
This is indeed one of the ways to shoot yourself in the foot with torch, though it mainly applies to supervised learning and not so much to RL, unless you save the replay buffer to disk instead of keeping it in memory.
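A minimal sketch of the in-memory alternative (the sizes and names here are made up): keep the buffer as plain CPU tensors and copy only each sampled batch to the GPU.

```python
import torch

capacity, obs_dim, batch_size = 100_000, 64, 256
observations = torch.empty(capacity, obs_dim)  # lives in RAM, never touches disk

def sample_batch(device="cuda"):
    # Draw a random minibatch and move just that batch to the GPU.
    idx = torch.randint(0, capacity, (batch_size,))
    return observations[idx].to(device)  # one host-to-device copy per batch
```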
Hm, I always thought JAX was part of TensorFlow. I will have to double-check that.
That sounds really good. I guess I will try to bump my RTF/physics speed again and see how it performs. Thanks for the information.
-
Hello everyone. First of all, thank you very much for your effort that made gym-ignition possible. Here is how I used it in my latest project for my Master's Thesis (GitHub repo).
Short Description
I investigated the applicability of DRL for vision-based robotic grasping of diverse objects. However, instead of the traditionally used RGB/RGB-D images (2D/2.5D), I tried employing octrees (3D) to learn an end-to-end policy, with the aim of seeing whether they bring any benefits. The goal of the agent is to solve a very simple episodic task, which involves grasping any object from the workspace through continuous actions in Cartesian space, i.e. translational gripper displacement, yaw rotation, and closing/opening of the gripper. For RL, I utilised model-free off-policy actor-critic algorithms from stable-baselines3 (TD3, SAC and TQC). All training was performed inside simulation, but Sim2Real transfer to a real robot was also tested.
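For reference, the continuous action space described above could be expressed in gym roughly as follows (the exact bounds and ordering are my assumptions, not necessarily what the repo uses):

```python
import numpy as np
from gym import spaces

# [dx, dy, dz, yaw, gripper]: translational displacement, yaw rotation,
# and a close/open command for the gripper.
action_space = spaces.Box(
    low=np.array([-1.0, -1.0, -1.0, -np.pi, -1.0], dtype=np.float32),
    high=np.array([1.0, 1.0, 1.0, np.pi, 1.0], dtype=np.float32),
)
```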
I will just mention some specific parts related to Ignition/gym-ignition that could be discussed further. For many of these, there might be a much better solution that I have not found and/or did not think of. Some of these also relate to the approach of fully using Python, which was selected to simplify integration with different modules due to the limited time I had for the project and my lack of prior experience with DL/RL. I believe it would be much easier to handle many of these issues with a lower-level language.
Sensors (RGB-D Camera)
Due to the current limitations of #249, I decided to try a different approach to getting data from sensors. First, I attempted to parse the output of an `ign topic -e -t ...` subprocess, but that was just too hacky. Instead, I use ROS 2 and convert all messages between Ignition Transport and ROS 2 via ros_ign_bridge. Although it causes some overhead, it is simple to implement and provides additional benefits such as the use of RViz 2 and other useful tools like tf2. Besides the reduced determinism of the simulation, the inability to trigger image capture is by far the largest disadvantage. Therefore, the camera framerate needs to be set higher than the update rate of the agent (e.g. 4x), while also making sure that a new observation is received after each update (extra steps that are potentially not necessary). Not a huge deal, but definitely very far from ideal.
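A minimal rclpy sketch of this wait-for-a-fresh-observation pattern (the topic and method names are made up, and this is not necessarily how the repo implements it):

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image


class CameraSubscriber(Node):
    """Subscribes to the bridged camera topic and can block until a frame
    captured after a given simulation time arrives."""

    def __init__(self, topic="/rgbd_camera/image"):
        super().__init__("camera_subscriber")
        self._last_msg = None
        self.create_subscription(Image, topic, self._on_image, 1)

    def _on_image(self, msg):
        self._last_msg = msg

    def wait_for_fresh_image(self, min_stamp_ns):
        # Spin until an image newer than the last environment update arrives.
        while True:
            rclpy.spin_once(self, timeout_sec=0.1)
            msg = self._last_msg
            if msg is not None:
                stamp_ns = msg.header.stamp.sec * 10**9 + msg.header.stamp.nanosec
                if stamp_ns >= min_stamp_ns:
                    return msg


rclpy.init()
camera = CameraSubscriber()
# After each env update:
#   image = camera.wait_for_fresh_image(min_stamp_ns=last_step_time_ns)
```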
Robot Controller and Motion Planning
For motion planning, I just used MoveIt 2 to generate joint trajectories based on the selected actions. To execute these trajectories, the JointTrajectoryController system plugin is used. Once I have time to try out ign_ros2_control, I will probably transition to it in order to simplify the setup for arbitrary robot models and to make it possible to employ the same interface on real robots (once their ros2_control implementation is ready). Similar to the sensors, ROS 2 with ros_ign_bridge is used to facilitate the communication.
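A rough rclpy sketch of this communication path (the topic, joint names, and values are placeholders): publish a trajectory_msgs/JointTrajectory over ROS 2 and let ros_ign_bridge relay it to the JointTrajectoryController plugin.

```python
import rclpy
from rclpy.node import Node
from trajectory_msgs.msg import JointTrajectory, JointTrajectoryPoint
from builtin_interfaces.msg import Duration

rclpy.init()
node = Node("trajectory_publisher")
pub = node.create_publisher(JointTrajectory, "/joint_trajectory", 10)

# A single-waypoint trajectory: reach the target positions within 2 seconds.
msg = JointTrajectory()
msg.joint_names = ["joint_1", "joint_2"]
point = JointTrajectoryPoint()
point.positions = [0.0, 0.5]
point.time_from_start = Duration(sec=2)
msg.points = [point]

pub.publish(msg)  # bridged to Ignition Transport by ros_ign_bridge
```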
More on ROS 2
I did not create any Runtime for ROS 2 because I was not exactly sure how that would look. For me, it was easier to just create ROS 2 nodes as individual submodules that are part of the Task object, e.g. one for the camera subscriber and another for MoveIt 2 requests.
Dataset
The Google Scanned Objects collection from Fuel and a bunch of free PBR textures for the ground plane are used. For all the different models, a single RandomObject class is used (I am not sure whether there is a better way). Because these models do not have inertial properties associated with them, they are estimated from the mesh and a random mass. Their collision geometry also needs to be decimated, otherwise stepping the simulation is unbearably slow.
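For illustration, such an estimate can be computed with a mesh library like trimesh (the library choice, mass range, and file name are my assumptions, not necessarily what the repo uses):

```python
import numpy as np
import trimesh

mesh = trimesh.load("model.obj", force="mesh")
mass = np.random.uniform(0.05, 0.5)   # random mass in kg (assumed range)
scale = mass / mesh.mass              # trimesh defaults to density = 1
inertia = mesh.moment_inertia * scale # 3x3 inertia tensor, scaled to the mass
center_of_mass = mesh.center_mass
```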
Performance
Training is definitely a very time-consuming process. On my laptop (130 W), 500k time steps take approximately three days to complete, albeit a large portion of that is the DL itself. I am barely approaching ~200% RTF when stepping the full environment (4 objects, rendering, random actions) with a step size of 4 ms. It is a bit better for a single object, at ~350% RTF. That is with low-poly collision geometry, which is actually disabled for the lower links of the robot. Such slow training makes hyperparameter tuning especially painful (both manual and automatic). Parallelised environments would definitely help and make the environments more scalable!
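For reference, this is roughly what parallelised environments look like with stable-baselines3's SubprocVecEnv, assuming the environment could run as multiple concurrent simulator instances (the env id is hypothetical):

```python
import gym
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_env(rank):
    def _init():
        return gym.make("Grasping-v0")  # hypothetical env id
    return _init

if __name__ == "__main__":
    # Four environment instances, each stepping in its own subprocess.
    vec_env = SubprocVecEnv([make_env(i) for i in range(4)])
    obs = vec_env.reset()
```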
Sim2Real
This part was relatively painless. I added a very simple runtime for it. It is a quick-and-dirty solution that allows evaluation of trained agents on a real robot (no training). Most aspects of the RL loop are manual, including determining success and resetting the environment. It also allows manual stepping as a first step when testing the transfer, to make sure nothing breaks. I cannot guarantee its safety, though.
Other Thoughts
It would be useful to be able to specify the number of simulation iterations performed per environment step, i.e. a configurable `stepsPerRun`. I can see several use-cases for it, such as tasks with actions that trigger action primitives (e.g. a discrete pixel-wise action space that determines the grasp pose) and all tasks defined as semi-MDPs. It is also useful during environment reset whenever the agent needs to wait for the simulation to reach a certain condition before continuing (something that should not/cannot be set manually). Related to Moving code from Python to C++: high-level overview #304.
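As a rough gym-level approximation of the idea (a hypothetical wrapper, not gym-ignition's API): repeat the underlying simulation step a configurable number of times per agent action.

```python
import gym

class MultiStepWrapper(gym.Wrapper):
    """Advance the environment several iterations per agent action."""

    def __init__(self, env, steps_per_run=4):
        super().__init__(env)
        self.steps_per_run = steps_per_run

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.steps_per_run):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info
```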
The source code of this project is kinda messy right now. If I have time in the future, I might refactor it, split it into separate submodules, and possibly rewrite it in a lower-level language (I do not normally use Python and I still kinda dislike it :D). Let me know if there is something of interest from this project that you would like to include in gym-ignition. I could then extract it and open a PR here, if desired.