
Remote control car example with OpenCV UI #46

Open · wants to merge 30 commits into base: dev

Conversation

@davinellulinvega commented Mar 25, 2021

A simple script showing how to use OpenCV to remotely control Cozmo, while also streaming its camera feed in either color or grayscale. This is linked to issue #37.

@davinellulinvega (Author)

I added the image resize and the sharpening for two main reasons:

  1. Nobody likes to drive while squinting at a small patch on their screen.
  2. The sharpening is in preparation for other upcoming contributions focusing on head, cube, and pet detection/tracking. The sharper the image, the easier that detection work will be down the line.

@gimait (Contributor) left a comment

I just tried this and it was super fun! Great work :)

Just one thing that I thought could make driving a bit more intuitive: I think you could change the controls so that the robot moves only while certain keys are pressed. You could implement something like this:

  • Have 'w', 'a', 's' and 'd' control the robot movement while pressed. You don't need to send the motor commands every time, just whenever a new key is pressed or unpressed (you can calculate this on each loop: if no keys are pressed, I believe the value of key will be None and you can stop all motors).
  • Then, have two extra keys to change the linear and angular speed of the robot, like you are doing now. This way you can control the speed of the robot, and also move it using the direction keys. If you want to get fancy, you could also add a couple of trackbars so that you can visualize the current speed (and control it with the mouse too).

Another thing you could try is to use pynput instead of OpenCV's getkey to be able to detect multiple key presses and releases. It might actually simplify things a bit.

Let me know what you think and if you need help ;)

Edit:
I forgot to mention: you could convert the key to a char once and do all the key checks against that, instead of converting all the comparison characters. That will look nicer and involve far fewer operations: one translation when the key is read instead of several on each if statement.
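The press/release scheme described above can be sketched as a small pure function, independent of the keyboard library: a pynput listener (as suggested) would simply add and remove characters from the `pressed` set in its on_press/on_release callbacks. All names and speed values below are illustrative, not taken from the PR:

```python
def wheel_speeds(pressed, linear, angular):
    """Map the set of currently pressed keys to (left, right) wheel speeds.

    `pressed` is a set of characters maintained by the keyboard listener;
    `linear` and `angular` are the speeds set via the extra speed keys.
    """
    lin = 0.0
    ang = 0.0
    if 'w' in pressed:
        lin += linear
    if 's' in pressed:
        lin -= linear
    if 'a' in pressed:
        ang += angular
    if 'd' in pressed:
        ang -= angular
    # Differential drive: turning speeds up one track and slows the other.
    # With no keys pressed, both wheels get 0.0, i.e. all motors stop.
    return lin - ang, lin + ang
```

The motor command would only be sent when the returned pair differs from the previous one, which matches the "send only when a key is pressed or released" idea above.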

# Exit the program
break

# Losing a bit of computational time to prevent sending motor
Contributor:

I don't think you are losing any computational time anywhere, you are just implementing a feature!

(I just wanted to say that there is no loss here that I can see at least, just fine code ;))

cli.add_handler(pc.event.EvtNewRawCameraImage, on_camera_img)

# Handle cliff and pick-up detection
cli.add_handler(pc.event.EvtCliffDetectedChange, stop_all)
Contributor:

👍🏼

HEAD_LIGHT), end='\r')
finally:
# This is to make sure that whatever happens during execution, the
# robot will always stop driving before exiting
Contributor:

👍🏼

@davinellulinvega (Author)

@gimait Thanks a lot for the feedback.
I like your idea for the control scheme a lot more than what I implemented so far. I'll see what I can cook up with pynput and some OpenCV trackbars.
The main reason why I stuck with OpenCV's waitKey() method is that I was not aware of the existence of pynput, pure and simple. All the other keyboard-related libraries I have tried so far either were not cross-platform or required root privileges to work.
Concerning your last comment, again you are entirely correct. Here I was being a smartass in my comment trying to save compute time while overlooking all those useless conversions.
Well, time to get back to work and modify all this.
If you have some time to spare and an interest in the subject, might I have your opinion on issue #47?
Again thanks a lot for being awesome and for your ideas.

@gimait (Contributor) commented Apr 20, 2021

Cool! Looking forward to seeing what you come up with. I hope it works with pynput; I have only used another package, getkey, in the past, but I don't think it's multiplatform, and it stopped being maintained long ago. pynput looks like it gets regular updates though, so I thought you might be interested.

I'll have a look at #47, we were planning on implementing object recognition features little by little, so it's great that you already implemented something for face recognition. We can talk about it over the ticket.

@davinellulinvega (Author) commented Apr 21, 2021

@gimait So yeah, I am sorry for the single catch-all commit, but after your feedback I had to reorganize my code into a class, which meant basically rewriting everything.
I am still not satisfied with the result, since the video feed is now all laggy, and I do not know how to fix this issue. If you or anyone else has a suggestion, I am all ears (keep in mind that displaying the image cannot be put into another thread).
Edit: As mentioned in my commits, it turns out I just did not read OpenCV's documentation hard enough. You can in fact put the display function (cv.imshow()) inside a thread, if it is followed by a cv.waitKey(). So I have done that in an attempt to make the video feed less laggy, but to no avail. pynput seems to be hogging all the CPU time for itself whenever a key is pressed for a long time.

…() function, putting its content in the rawCameraImage callback, and replacing step() by a simple sleep in the main loop.
…in loop from blocking the execution of other threads while it is asleep.
@davinellulinvega (Author) commented Apr 21, 2021

After some more reflection on the laggy video feed, it seems pretty clear that the GIL is to blame (unless I have overlooked something). This raises the question: are we tied to using only threads, or can we go multi-process? The idea would be to dedicate a single, simple process to displaying the video, while leaving the rest of the code intact. Any thoughts on that?

@gimait (Contributor) left a comment

I looked a bit around and I have a couple of comments.

  1. About the lagging and the multithreading: You are using event callbacks in your application, so I don't see where you are having issues with multithreading. Actually, you had it running nicely before with one thread, I think you should come back to that (using a queue to store new frames and displaying them on the screen in the main thread, no need for anything fancy). If it was working fine before, it will work again ;)

  2. I like that you moved things into a class, but I believe you are putting too much functionality in one place, and this is leading to some confusion. I think you should separate the code into more objects. For example, use a class for the controls/key handling and a separate one to manage and display the images. If you then want to encapsulate everything in one class, create a third one that uses these two. Give it a thought and let me know if you need help structuring it. If this works nicely we could add it as part of the package and not just as example code.

I think you need to work this out a little more. It's looking nice, but we need to figure out what went wrong with the lagging.
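The split suggested in point 2 might look something like the skeleton below (class and method names are illustrative, not necessarily those that ended up in the PR):

```python
class Display:
    """Owns the OpenCV window and consumes frames from a queue."""

    def __init__(self, frame_queue):
        self._queue = frame_queue

    def step(self):
        # Pop the latest frame and show it (cv.imshow + cv.waitKey would
        # live here, in the main thread).
        ...


class RemoteController:
    """Owns the keyboard listener and turns key events into motor commands."""

    def __init__(self, client):
        self._client = client


class OpencvRC:
    """Thin entry point wiring the display and the controller together."""

    def __init__(self, client, frame_queue):
        self.display = Display(frame_queue)
        self.controller = RemoteController(client)
```

Each piece can then be reused or replaced independently, e.g. swapping the OpenCV display for another UI without touching the controls.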



# Declare a flag telling the program's main loop to stop
GO_ON = True
Contributor:

Why not declare this as a member of OpencvRC?

Author:

To be honest, there is no reason for that to be here; some oversight on my end. Since I have to rework the class structure anyway, I will be sure to put it where it needs to be.


# Start the keyboard event listener
self._kbd_listener.start()
self._kbd_listener.wait()
Contributor:

Won't this wait statement stop the main thread here?

Author:

This is a bit confusing indeed, but according to the documentation this will only stop the main thread until the listener is ready to do its job. In essence, it has the same purpose as the self._cozmo_clt.wait_for_robot() function above.

Contributor:

Sounds good then! I wasn't sure so I thought I'd ask ;)

# This might seem odd, but is actually required by OpenCV to perform GUI
# housekeeping. See OpenCV's documentation for imshow() for more
# "in-depth" information.
cv.waitKey(25)
Contributor:

_on_new_image is being called whenever a new frame is available. The frames are streamed as recorded, which means that you don't need to wait here. I don't think the problem is here, but the lagging issues you are seeing could be related to this.

@davinellulinvega (Author), Apr 22, 2021

As I mentioned in the comment on lines 205-207, this wait is actually required by OpenCV. As per the documentation, a call to cv.imshow() should always be followed by a call to cv.waitKey(), otherwise the image will not be displayed. The best I can do here is to set the wait to only 1 ms. But as you suggested in your overall comment, I will try putting the actual display function back into the main thread and have it communicate via a queue with the callback.
Edit: As an addendum, I should specify that simply using time.sleep() or a blocking call to Queue.get() instead of cv.waitKey() does not work. If my understanding is correct, it seems that it is cv.waitKey() that is doing the actual work of displaying the image. cv.imshow() is only fast because it does not do much.
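The handoff discussed here, with the camera callback feeding a queue and the main thread consuming it, can be sketched as follows. `render` stands in for the cv.imshow()/cv.waitKey(1) pair, and all names are illustrative rather than taken from the PR:

```python
import queue

# Bounded queue between the camera event callback and the main thread.
img_queue = queue.Queue(maxsize=1)


def on_camera_img(cli, image):
    # Called from pycozmo's event thread: if the main thread has not yet
    # consumed the previous frame, drop this one instead of building a
    # backlog (a growing backlog is exactly what makes the feed lag).
    try:
        img_queue.put_nowait(image)
    except queue.Full:
        pass


def display_loop(render, go_on):
    # Runs in the main thread; `render` would be cv.imshow + cv.waitKey(1),
    # and `go_on` is a callable checking the program's stop flag.
    while go_on():
        try:
            frame = img_queue.get(timeout=0.2)
        except queue.Empty:
            continue
        render(frame)
```

Keeping the queue at `maxsize=1` means the display always shows the most recent frame the consumer could keep up with.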

Contributor:

Yes, you are right. It might be causing trouble here because of the callback, though. I think this should go separately from the frame retrieval.

# Set both actions to NONE, which will stop the motors
self._set_action(Direction.NONE, Direction.NONE)

def _on_head_tilt_change(self, value):
Contributor:

I would suggest sending the command for the new head tilt whenever this changes. Otherwise, it will only be applied when you attempt to move the robot.

Author:

I am not sure I understand the problem here. Within the _on_head_tilt_change() function I assign the new value for the head tilt using self.head_tilt = value. This in fact calls the property setter that I defined on line 450. This setter in turn sends an immediate command to Cozmo to move its head to the new angle. Maybe not the most straightforward way of doing this, but it works.
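The setter pattern described in this reply can be illustrated in isolation. The angle limits and the recorded command below are placeholders standing in for pycozmo's MIN_HEAD_ANGLE/MAX_HEAD_ANGLE and the actual head-angle command; they are not the PR's real values:

```python
# Placeholder limits, in radians (the real ones come from pycozmo).
MIN_HEAD_ANGLE = -0.44
MAX_HEAD_ANGLE = 0.78


class Robot:
    def __init__(self):
        self._head_tilt = 0.0
        self.sent = []  # stands in for commands actually sent to Cozmo

    @property
    def head_tilt(self):
        return self._head_tilt

    @head_tilt.setter
    def head_tilt(self, value):
        # Clamp to the physical limits, then immediately issue the command,
        # so a plain `robot.head_tilt = x` both stores and applies the angle.
        self._head_tilt = max(MIN_HEAD_ANGLE, min(value, MAX_HEAD_ANGLE))
        self.sent.append(('set_head_angle', self._head_tilt))
```

This is why the trackbar callback only needs the assignment: the command dispatch is hidden behind the property.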

Contributor:

😱 You are right! I completely missed the setters! Good work with that, keep it!

self.head_tilt = value * pc.MAX_HEAD_ANGLE.radians + \
(1 - value) * pc.MIN_HEAD_ANGLE.radians

def _on_lift_height_change(self, value):
Contributor:

Same as for the head tilt.

Author:

The same comment as above applies here. I am using a property setter to make sure the new value stays within Cozmo's physical constraints and send a command for Cozmo to move its lift to new heights. The code corresponding to the property setter is located on line 469.

Comment on lines 497 to 499
self._velocity['angular'] = max(0,
min(value, pc.MAX_WHEEL_SPEED.mmps /
pc.TRACK_WIDTH.mm))
Contributor:

Just so you know, we are using 120 characters per line in pycozmo.
I don't know if it's very important, but I personally like longer lines better ;)

If you want, you can check the flake8 configuration

Author:

Thank you for pointing that out, I was not aware of it. I thought I was stuck with 80 characters per line.
I 100% agree with you on longer lines, especially on modern screens where there is so much space. I will configure my IDE to put the cut-off at 120 characters from now on (that is going to be liberating).


try:
# Instantiate a dummy event
evt = Event()
Contributor:

I don't think you need this. You are creating an event to do the work of sleep. I think you can do better ;)

Author:

You are right. If I put the display function back into the main thread, this will be useless. I did not like it anyway, so good riddance.

@davinellulinvega (Author)

Once again thanks a lot for your overall (and more specific) feedback on this new implementation.
Regarding the video feed, as I already commented above, I will indeed put it back into the main thread since it makes more sense to have it separated from the image retrieval/processing thread.
As for the code structure, as you suggested it would be better (or more reusable?) to have two classes: one in charge of the display, and the other responsible for keyboard events as well as the different trackbars. I think I will still keep the OpencvRC class as the main entry point, at least for the example, so that everything looks a bit tidier. If you have any advice or specific requirements for structuring the code I would gladly take them. Otherwise, I will simply take what I already have and split it into two classes (plus make all the corrections mentioned above).

@davinellulinvega (Author) commented Apr 22, 2021

@gimait As mentioned in my previous comment, I have simply split the existing code into three classes (and fixed the issues discussed above as well). I am still open to suggestions for a different structure, but I wanted to test whether putting the display process back into the main thread would fix the video lag issue. Well, it did not, so that's that. More digging is required to solve this particular problem. The rest, though, works like a charm. As always, any feedback is welcome.
Thanks a lot in advance for your time and help.

@davinellulinvega (Author)

@gimait Eureka! I have found why the video was laggy. I do not have the full explanation, but here is the main idea. The program uses pynput to listen to keyboard events. However, if any of OpenCV's windows is focused, it will also listen for keyboard events. Since we are forced to call waitKey() after cv.imshow(), any keyboard event will interrupt the GUI "housekeeping" that waitKey() is doing. Therefore, the video will not display properly.
Two solutions are available:

  1. Tell people to remove the focus from all OpenCV windows when remote controlling Cozmo.
  2. Have pynput suppress key events. This means that only pynput will see the key presses and releases; they will not propagate to other programs.

To be honest I do not like solution 1, but, and this is where things get ironic, it just so happens that under Linux (more specifically within a Xorg environment) passing suppress=True to the keyboard listener randomly crashes X and logs people out of their sessions (this is a known issue).
So we are in a bit of a pickle so to speak. The third option would be to look for another keyboard library, but to be honest I do like pynput and have not found anything else that could replace it.
Any suggestions as to how we might proceed from here?

@gimait (Contributor) commented Apr 22, 2021

Oh, I see. Of course, these two are fighting to get the keys. Have you thought of using a different method to display the video? From a quick search, I found you can probably do it with matplotlib or tkinter, but there might be other options. I looked into whether you could do it directly with PIL, but I don't think it's possible, unfortunately.

I think any keyboard library will interfere with waitKey(). It is quite a bad design decision from OpenCV to have only that function as an event manager.

… the video or other issues related to key events.
@davinellulinvega (Author)

@gimait Well OpenCV's documentation strikes again. To avoid any problems and the "requirement" of calling waitKey() after each imshow() the solution is to put the window in its own thread. For more information refer to the "extensive" documentation. All joking aside, it now all works beautifully.

Comment on lines 136 to 152
"""
Create the window OpenCV will use to display the different track bars, as well as start the keyboard listener.
:return: None
"""

# Create a new thread for the window containing the task bars to prevent any lag issues due to key events
cv.startWindowThread()
# Create a window for the control panel
cv.namedWindow(self._win_name)

# Create the different trackbars controlling the robot's velocities,
# head tilt, and lift height
cv.createTrackbar('Linear velocity', self._win_name, 0, 100, self._on_linear_velocity_change)
cv.createTrackbar('Angular velocity', self._win_name, 0, 100, self._on_angular_velocity_change)
cv.createTrackbar('Head tilt', self._win_name, 0, 100, self._on_head_tilt_change)
cv.createTrackbar('Lift height', self._win_name, 0, 100, self._on_lift_height_change)
cv.createTrackbar('Head light', self._win_name, 0, 1, self._on_head_light_change)
Contributor:

All the display code should be in the Display class.
You can pass references to the controller's change callbacks.

Author:

I put that code here since in my mind it was not really part of the display per se. However, it can easily be moved into the Display class as you said. I will have a look at that later today.

Author:

As you suggested I have moved the window and track bars creation into the display class. The RemoteController's callbacks are passed as argument to the Display.init() function. It is not pretty but it works, and it is more flexible that way.

Contributor:

I put that code here since in my mind it was not really part of the display per se.

But this is what creates the display window and the sliders in it; without this you don't have a display, right? :) I think it's better to have it separate from the controls. What if you'd like to use another interface for the video streaming instead of OpenCV?

@gimait (Contributor) left a comment

So, from what I've seen, the key was just to get rid of the waitKey() and run the window in a separate thread? (Sorry, I didn't see much explanation about this in the documentation haha)

@davinellulinvega (Author)

That's the irony of this situation. Everywhere in the documentation you are being told that imshow() has to be followed by either waitKey() or pollKey() (and I have tried, it is indeed required). However, if you manage to stumble upon this one line of documentation, with no explanation to what the function actually does, it turns out that calling cv.startWindowThread() before creating a named window to display the video feed solves all our problems.

…s references to callbacks in the RemoteController class.
@gimait (Contributor) commented Apr 26, 2021

I think it's good now, let's see if we can have it merged soon.

@zayfod, when you have time, could you have a look and see what you think about the example?

@Stilmant commented Mar 30, 2023

Hello all. Starting to work on pycozmo.
Thanks for your work.

I wanted to play a bit with your script, but I had no luck either under Windows or under Linux:
(cozmo) michael@Lux-mstilma-lt:~/cozmo/pycozmo/examples$ python3 opencv_rc.py
NvStorageOpResult(tag=NvEntryTag.NVEntry_CameraCalib, length=0, op=NvOperation.NVOP_READ, result=NvResult.NV_OKAY, data=b'6\xcf\x92C\x88\xed\x92Ca+\x1fC\xa7"\xdcB\x00\x00\x00\x00\xf0\x00@\x01\xd7y\x14\xbce\x97]>Q+\t\xbbub\xb7\xbaK&\xe1\xbd\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
NvStorageOpResult(tag=NvEntryTag.NVEntry_SavedCubeIDs, length=0, op=NvOperation.NVOP_READ, result=NvResult.NV_OKAY, data=b'OMZC\x00\x00\x00\x00\x0c\x00\x00\x00\x00\x00\xe5>d\xce\x99aG3\x9e\xb8\xe4\xce\xc4\xb2')
[1637469796, 3097375559, 2999242468]
0x6199ce64
0xb89e3347
0xb2c4cee4
WARNING:root:Did not get any image from the camera. So not displaying anything new.
WARNING:root:Did not get any image from the camera. So not displaying anything new.
WARNING:root:Did not get any image from the camera. So not displaying anything new.

Just hoping this rings some bells.

@davinellulinvega (Author)

Have you tried to get an image from Cozmo's camera by other means?
The only reason for this script to display this warning is that the queue between the main process and the display thread was empty for more than 0.2 seconds. You could try to either increase the timeout, or remove it completely, making the call to _img_queue.get() blocking.
Unfortunately, I do not have a Cozmo on hand anymore. Therefore, I will not be able to test if the code is still functioning. For information, though, the script was developed and tested on Linux. I have never tried it on other OSs.
Finally, I have opened the issue section on my fork of this repository. It might be a good idea to move this discussion over there.

@Stilmant
Thanks for your answer and issues section opening.
I did indeed get an image from Cozmo's camera (a still photo, however).
I will play around more with the original stack's features and see if I missed something.
