Subtask of #15.
Depends on #16.
High Level Overview
We want to integrate our target matching model into the OBC CV pipeline. This model is an alternative to the segmentation and classification stages of the pipeline. The idea is that instead of outright guessing what characteristics a target has, we can find which one of the competition objective targets it closely resembles. Remember, at the start of the mission the competition tells us which targets to aim our five water bottles at. Knowing this information, we can compare any arbitrary target we take a picture of to these known targets.
One problem we will run into is how to obtain a reference image of the competition targets. We can't just compare any target we take a picture of to a textual description of one of the competition targets. The way we will do the comparison is by comparing the features of the query image (from our camera) and the reference image (the ones the competition tells us to go for). Features are an abstract representation of a given image that highlights various notable aspects. You can think of them as a way to simplify an image and only represent the characteristics we care about. So, we have the query image since we just took the picture and saliency cropped the target out of the full-size image. However, we need an image that matches the competition target descriptions. To do this we can make use of our existing work done with not-stolen. not-stolen is a Python application that we can either shell out to or reimplement in C++. That's a discussion for another issue (#77). But for the completion of this issue, assume that the images of the competition objectives can be acquired. You can use any cropped images of targets as reference images.
Assuming you have your reference images, we can start matching. You will take a query image that comes from the output of saliency. This image will be cropped from the original large aerial image, so it should only contain the target. You will then pass this image into the target matching model and get out the features of the image. Once you have the features of the query image, you can compute the difference between its features and the features of all the reference images. This difference can be thought of as a measure of similarity between the two. The smaller the difference in features, the more similar the targets are. Whichever reference image has the smallest difference in comparison to the query image can be considered our match. Once we have this match, we can assign the query image to the bottle ID of the matched reference image.
Data Flow Outline
Here I will outline how the inputs are processed until we get the desired output. This closely follows the Python code in this Jupyter Notebook. I would highly recommend reading that first (especially the last cell).
You will need to do some converting to get the image into a state that the model can accept as input. The model does not take an OpenCV matrix (we can check the signature of the match function and see that it takes a CroppedTarget, which embeds a cv::Mat: https://github.com/tritonuas/obcpp/blob/feat/cv-orchestrator-matching/include/cv/matching.hpp#L40) but instead various libtorch/PyTorch types. To do the conversion, I would copy over the functions from this comment. You may need to call ToTensor first and then pass the output into ToInput, which creates a std::vector<torch::jit::IValue> that the model can accept. The types get pretty weird, I know. @hashreds (Igor on Discord) has been working on related work with saliency, so I would reach out to see what he has working. Right now he's working through an issue with the types the model expects, but let's hope that this model doesn't run into the same issue.
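As a rough sketch (not the actual code from that comment), the conversion could look something like this, assuming the cropped target arrives as an 8-bit, 3-channel cv::Mat. Treat the [0, 1] scaling and the channel ordering as assumptions to check against whatever preprocessing the model was trained with.

```cpp
#include <opencv2/core.hpp>
#include <torch/script.h>

#include <vector>

// Convert an 8-bit, 3-channel cv::Mat (HxWxC) into a float CHW tensor.
at::Tensor ToTensor(const cv::Mat& img) {
    // A cropped ROI may not be continuous in memory; from_blob needs it to be.
    cv::Mat contiguous = img.isContinuous() ? img : img.clone();
    at::Tensor tensor = torch::from_blob(
        contiguous.data, {contiguous.rows, contiguous.cols, 3},
        torch::kByte).clone();                 // clone so the tensor owns its data
    tensor = tensor.permute({2, 0, 1});        // HWC -> CHW
    return tensor.to(torch::kFloat32).div(255.0);  // scale to [0, 1]
}

// Wrap the tensor in the IValue vector that Module::forward expects.
// unsqueeze(0) adds the batch dimension, giving 1x3xHxW.
std::vector<torch::jit::IValue> ToInput(const at::Tensor& tensor) {
    return {torch::jit::IValue(tensor.unsqueeze(0))};
}
```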
Once you have the appropriate type, you will have to resize the tensor to match what the model expects. The model requires an input of shape 3x128x128. You should probably do this before converting to a std::vector<torch::jit::IValue> and operate on the at::Tensor type. This interpolate function seems like it would do the trick (see discussion here). Feel free to find another way of doing the resizing/interpolation. We could also do the resize while the image is still an OpenCV matrix (before turning it into a tensor). The OpenCV API seems a lot easier to use, so this might be a better option, but it's up to you and whatever works.
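Hedged sketches of both options (whether bilinear interpolation matches the model's training preprocessing is an assumption worth verifying):

```cpp
#include <opencv2/imgproc.hpp>
#include <torch/torch.h>

namespace F = torch::nn::functional;

// Option 1: resize while the image is still a cv::Mat, before ToTensor.
cv::Mat ResizeMat(const cv::Mat& img) {
    cv::Mat resized;
    cv::resize(img, resized, cv::Size(128, 128));
    return resized;
}

// Option 2: resize the tensor itself. interpolate wants a batched input,
// so this expects a 1x3xHxW float tensor (e.g. after the unsqueeze in
// ToInput, or unsqueeze here and squeeze afterwards).
at::Tensor ResizeTensor(const at::Tensor& batched) {
    return F::interpolate(
        batched,
        F::InterpolateFuncOptions()
            .size(std::vector<int64_t>{128, 128})
            .mode(torch::kBilinear)
            .align_corners(false));
}
```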
Ok, once you have the right type (should be std::vector<torch::jit::IValue>) and size (should be 3x128x128), you can finally feed it into the model as input. To do so, you can use the model's forward method. Here's an example of how it works. You should then get a tensor as output (you might need to call the .toTensor() function from the example code).
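That step might look roughly like this, following the standard libtorch pattern; the function and variable names are placeholders:

```cpp
#include <torch/script.h>

#include <vector>

// Run the loaded TorchScript module on the prepared input and pull the
// feature tensor out of the returned IValue.
at::Tensor ExtractFeatures(torch::jit::script::Module& model,
                           std::vector<torch::jit::IValue>& input) {
    torch::NoGradGuard noGrad;  // inference only, no autograd bookkeeping
    return model.forward(input).toTensor();
}
```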
Next, we need to calculate the distance in features between this query and the references from the competition objectives. Let's assume we've computed the features for all the reference images beforehand (maybe in the constructor of Matching; see the implementation notes section for details). These features would be computed just as we did for our query in the last few steps. To calculate the distance, you can follow the code from this Jupyter Notebook below and convert it to C++ (torch.norm has a C++ equivalent; there's also the pairwise_distance function used here, which has a C++ version).
```python
# Compute distance between all images and the query
distances = []
for i in range(NUM_NEGATIVES + 1):
    distances.append(torch.norm(query_feat - possible_targets[i], p=2))
```
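A possible C++ translation of that loop, assuming the reference features already live in a std::vector<at::Tensor> (the function and parameter names are illustrative, not existing obcpp code):

```cpp
#include <torch/torch.h>

#include <vector>

// L2 distance between the query features and each precomputed reference
// feature tensor (one per competition objective), mirroring the Python loop.
std::vector<double> ComputeDistances(const at::Tensor& queryFeat,
                                     const std::vector<at::Tensor>& refFeats) {
    std::vector<double> distances;
    distances.reserve(refFeats.size());
    for (const at::Tensor& refFeat : refFeats) {
        // torch::norm with p=2 is the C++ equivalent of torch.norm(..., p=2)
        distances.push_back(torch::norm(queryFeat - refFeat, 2).item<double>());
    }
    return distances;
}
```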
Now we should have distances between our query image features and the features of all the reference competition objectives. It should be as simple as choosing the smallest one and assigning that as the match. With this information, you can populate the bottleDropIdx field of the MatchResult struct. To populate the foundMatch boolean field, you can check whether the computed distance is within the matchThreshold private member.
Then you should be good to return the MatchResult!
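Putting those last two steps together, the selection logic might look like the sketch below. The bottleDropIdx/foundMatch fields and the matchThreshold member come from this issue and matching.hpp, but the stand-in struct definition, its field types, and the assumption that reference index i maps directly to bottle i are only for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Stand-in for the real MatchResult declared in include/cv/matching.hpp;
// the issue names these two fields, but the actual types may differ.
struct MatchResult {
    int bottleDropIdx;
    bool foundMatch;
};

// Pick the reference with the smallest distance and check it against the
// threshold. Assumes distances is non-empty and that reference index i
// corresponds to bottle index i.
MatchResult PickMatch(const std::vector<double>& distances,
                      double matchThreshold) {
    auto minIt = std::min_element(distances.begin(), distances.end());

    MatchResult result;
    result.bottleDropIdx = static_cast<int>(std::distance(distances.begin(), minIt));
    result.foundMatch = (*minIt <= matchThreshold);
    return result;
}
```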
Implementation Notes
First focus on loading the model with libtorch and C++, following the sample code here. We also have it working in one of the integration tests. You'll want to use the model weights from here. When loading the model file, be wary of the paths: if you use a relative path, it will likely be relative to the build folder, since that's where we run all our commands. As for where this goes, I think it makes sense to put this in the constructor of the Matching class.
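For reference, the loading step following the libtorch tutorial pattern might look like this (the function name is a placeholder; pass in whatever path the weights actually end up at):

```cpp
#include <torch/script.h>

#include <iostream>
#include <string>

// Standard libtorch loading pattern. Note that a relative modelPath would be
// resolved against the build/ folder, since that's where we run everything.
torch::jit::script::Module LoadMatchingModel(const std::string& modelPath) {
    try {
        return torch::jit::load(modelPath);
    } catch (const c10::Error& e) {
        std::cerr << "Failed to load matching model from " << modelPath
                  << ": " << e.what() << std::endl;
        throw;
    }
}
```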
As mentioned above, it might make sense to precompute the features of all the reference/competition objective targets. We will get these when the mission starts up and they won't change from query to query. This means that the constructor needs to load the model from its weights and then pass the competition target images as input. Finally, the reference image feature tensors need to be stored as a private member of the class.
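Tying the notes together, the constructor could be shaped roughly like this. The constructor signature, the helper declarations, and the member names other than matchThreshold are hypothetical; the real interface lives in include/cv/matching.hpp.

```cpp
#include <opencv2/imgproc.hpp>
#include <torch/script.h>

#include <string>
#include <vector>

// Conversion helpers sketched earlier in this issue (hypothetical names).
at::Tensor ToTensor(const cv::Mat& img);
std::vector<torch::jit::IValue> ToInput(const at::Tensor& tensor);

// Rough shape of the constructor: load the model once, push every reference
// (competition objective) image through it, and cache the feature tensors in
// a private member so match() only has to run the model on the query image.
class Matching {
 public:
    Matching(const std::string& modelPath,
             const std::vector<cv::Mat>& referenceImages,
             double matchThreshold)
        : torchModel(torch::jit::load(modelPath)),
          matchThreshold(matchThreshold) {
        for (const cv::Mat& ref : referenceImages) {
            cv::Mat resized;
            cv::resize(ref, resized, cv::Size(128, 128));
            std::vector<torch::jit::IValue> input = ToInput(ToTensor(resized));
            referenceFeatures.push_back(torchModel.forward(input).toTensor());
        }
    }

 private:
    torch::jit::script::Module torchModel;
    std::vector<at::Tensor> referenceFeatures;  // one feature tensor per reference/bottle
    double matchThreshold;
};
```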