An input to an ADD node keeps getting cast to float32 despite being float16 in the ONNX file, causing issues with the ADD op #681
Comments
onnx2tf has worked according to the same specification since it was first created: it does not reliably convert models whose inputs are anything other than Float32. If you simply want to generate a Float16 tflite file, it will work just fine if you first generate a Float32 ONNX file. If you have a strong reason why you cannot generate a Float32 ONNX file, please explain that reason in detail before we go any further on this issue. I know your R&D is very interesting, but I am concerned that you may have made a mistake in how you set up the tflite generation in the first place. |
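If re-exporting a Float32 ONNX from the original framework is not an option, one possible (untested) workaround is to promote the float16 ONNX back to float32 directly. A rough sketch using the onnx Python API; it does not cover every case (for example, float16 tensors embedded in Constant nodes):

import numpy as np
import onnx
from onnx import numpy_helper, TensorProto

model = onnx.load("model_fp16.onnx")

# Promote every float16 initializer to float32.
for init in model.graph.initializer:
    if init.data_type == TensorProto.FLOAT16:
        arr = numpy_helper.to_array(init).astype(np.float32)
        init.CopyFrom(numpy_helper.from_array(arr, init.name))

# Promote float16 tensor type declarations (inputs, outputs, value_info).
for vi in list(model.graph.input) + list(model.graph.output) + list(model.graph.value_info):
    tt = vi.type.tensor_type
    if tt.elem_type == TensorProto.FLOAT16:
        tt.elem_type = TensorProto.FLOAT

# Cast nodes that target float16 also need their "to" attribute updated.
for node in model.graph.node:
    if node.op_type == "Cast":
        for attr in node.attribute:
            if attr.name == "to" and attr.i == TensorProto.FLOAT16:
                attr.i = TensorProto.FLOAT

onnx.save(model, "model_fp32.onnx")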
Ah, that makes sense. I will try it out on a float32 file then. I would have preferred float16 given that it is for embedded devices, but the tool does output float16 tflite files as you said. |
Please share the Float32 model. As you surmise, it's an ONNX issue, but without the model files I have no idea. Looking only at the partial image you shared, it is only natural that such an error would occur if the input resolution of the model is dynamic. |
Right, my bad. Please find the file on Hugging Face. The input resolution is indeed dynamic; there is no height or width it needs to be resized to in preprocessing, but I am not sure how this comes into play in the conversion process. I have a lot to learn from your codebase, which I am doing as we speak. |
Can you tell me how you customized the onnx file? The onnx modifier does not allow changes to the format version, and a static shape does not get reflected downstream. |
Please wait a moment. I am in the process of examining what is wrong with the design of this model. I found the process very redundant by design. |
If changing the input resolution of the model to a fixed resolution and optimizing the overall model structure is not a problem, the easiest way to generate tflite is described below. I don't know how you generated your onnx file, but I don't recommend …
onnx2json \
--input_onnx_file_path metric3d-vit-small.onnx \
--output_json_path metric3d-vit-small.json \
--json_indent 2
nano metric3d-vit-small.json
Before the edit:

{
  "irVersion": "10",
  "producerName": "pytorch",
  "producerVersion": "2.0.1",
  "domain": "",
  "graph": {
    "node": [

After changing "irVersion" to "9":

{
  "irVersion": "9",
  "producerName": "pytorch",
  "producerVersion": "2.0.1",
  "domain": "",
  "graph": {
    "node": [
json2onnx \
--input_json_path metric3d-vit-small.json \
--output_onnx_file_path metric3d-vit-small.onnx
onnxsim metric3d-vit-small.onnx metric3d-vit-small.onnx --overwrite-input-shape "pixel_values:1,3,480,640"
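The same steps can also be done programmatically; a minimal Python sketch assuming the onnx and onnxsim Python APIs (the keyword for fixed input shapes may differ between onnxsim versions):

import onnx
from onnxsim import simplify

model = onnx.load("metric3d-vit-small.onnx")

# Equivalent of editing "irVersion" in the JSON dump: lower the IR version.
model.ir_version = 9

# Fix the dynamic input resolution and let onnx-simplifier fold the
# shape-dependent subgraphs away.
model_simp, ok = simplify(
    model,
    overwrite_input_shapes={"pixel_values": [1, 3, 480, 640]},
)
assert ok, "onnx-simplifier could not validate the simplified model"
onnx.save(model_simp, "metric3d-vit-small.onnx")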
|
Wow, this is brilliant! Thanks so much for the help. I still have to test it with the tflite API and validate the model results, but you clearly have amazing skills. |
Just curious though, why does it not support variable sizes? In every node the input sizes are written relative to the original height and width, so it should be simple enough to plug those values in, right? Of course, it might very well be very complicated since I am not aware of the entire scope of the project, but if you could tell me why, I could learn something more from you again. |
You are diligent. It is now 2:00 AM in Japan, so I will explain tomorrow morning. Good night. |
I will explain step by step. The reasons why converting this model is so difficult are simply as follows:
A fairly complex conversion operation is implemented to accommodate automatic conversion of any model, including models whose input data is not an image. You are the only one who knows that the input data for this model is an image. |
Thanks for the detailed explanation. I get what you are saying. However, in each node:
^ This is from the JSON file. The dim_param is always an arithmetic expression of the original variables height and width. The way I understand your comment, onnx2tf builds a numpy array from it. Since the values of height and width are not known, we perhaps cannot create the appropriate array, but then how does ONNX manage to keep a model graph whose shapes are defined by variables that are only known at runtime? Referring to point 6, if onnx2tf is told that the input variables height and width are dynamic, and every node downstream is always defined relative to these variables fixed at runtime, could there not be a workaround to define the nodes in tflite with a similar variable in the entry node?
Right, so in the conversion steps H and W are recognized as None, and onnx2tf sets them to 1, which then causes the issue with the ConvTranspose. But if values are not provided, could we perhaps create a map between the variable names (as strings) and integer values, and expect the map to be filled in at runtime? Until then, the strings could be used as placeholders to create the other structures, with the ops on them resolved at runtime. It might not work with how tflite works, but ONNX supports it (though likely not in the way I have described), so I have a feeling it could be done with tflite as well. |
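For reference, the symbolic dimensions that H and W correspond to can be listed straight from the ONNX graph; a rough sketch with the onnx Python API, using the file name from this issue:

import onnx

model = onnx.load("model_fp16.onnx")

# Print each graph input and whether its dimensions are fixed integers or
# symbolic names such as "height" / "width" that are only known at runtime.
for inp in model.graph.input:
    dims = [d.dim_param if d.dim_param else str(d.dim_value)
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)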
Look closely at the names of the attributes. The way …
No. That's not possible. (I'm thinking I probably can't do it.) It is already done precisely, but the TFLite runtime is not flexible. To add a little more, the APIs for Keras, TensorFlow, and TensorFlow Lite are not flexible.
I did not say that ONNX with dynamic tensors cannot be converted to TFLite. In fact, I am able to convert models with some dynamic inputs other than yours. The tutorial below explains how to generate a TFLite model with a dynamic tensor and how to perform inference. https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#14-inference-with-dynamic-tensors-in-tflite
However, TensorFlow is very annoying because it cannot define a flexible model structure like ONNX, and this has been troubling me for more than two years. TensorFlow, the TFLite runtime, and Keras completely ignore the name given to an OP. LOL. For your reference, I'll post a JSON image of TFLite's model structure. It is not possible to name each element. All undefined elements can only be declared as …
Just to be clear, there is a function to customize the behavior of onnx2tf using a parameter replacement file. https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#parameter-replacement
Rather than spending the effort on such things, the following very useful tool can be used to generate tflite models in a more flexible, straightforward, and efficient manner. https://github.com/AlexanderLutsenko/nobuco
Alternatively, some tools, such as the one below, preserve channel order by extrapolating a large number of Transpose OPs. https://github.com/onnx/onnx-tensorflow
Since onnx2tf is not designed simply to convert models accurately, but rather to convert them while greatly optimizing the redundant structure of the source model, it deliberately removes or fuses unnecessary OPs. Therefore, defining a simple OP-name mapping or a mapping by element name is itself quite difficult. Statically and accurately tying all the elements together during model transformation and dynamically estimating the shape during inference are completely different behaviors. |
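A minimal sketch of the dynamic-tensor TFLite inference flow described in the tutorial linked above, assuming a hypothetical model file name and an NHWC float input:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_float16.tflite")
input_details = interpreter.get_input_details()

# Resize the dynamic input to the resolution used at runtime, then allocate.
interpreter.resize_tensor_input(input_details[0]["index"], [1, 480, 640, 3])
interpreter.allocate_tensors()

dummy = np.zeros([1, 480, 640, 3], dtype=np.float32)
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

output_details = interpreter.get_output_details()
print(interpreter.get_tensor(output_details[0]["index"]).shape)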
Oh, so any fix could not be generalized to all models. But in my case (i.e., this particular model), if I write an adapter that could parse the string into an arithmetic operation and create some data structure to store it, should it theoretically be possible?
Ah, so the transpose ops mess with the rank ordering of the dimensions. Turning them off (if they are redundant) or simplifying them away (using onnxsim) resolves the issue. Got it.
Was not aware of this, will try it out soon! The onnx-tensorflow module is not maintained though, and is not really compatible with anything, or its compatibility is poorly defined. I spent a few hours dealing with that before I landed on your project (thank God for that). It is odd that they stopped maintaining an official converter, but maybe the update to TensorFlow 2.x made them give up, I suppose, lol. |
I think you're right, if you work super hard you can do it. 😸 |
Issue Type
Documentation Feature Request
OS
Linux
onnx2tf version number
1.25.7
onnx version number
1.16.2
onnxruntime version number
1.18.1
onnxsim (onnx_simplifier) version number
0.4.36
tensorflow version number
2.17.0
Download URL for ONNX
https://huggingface.co/onnx-community/metric3d-vit-small/resolve/main/onnx/model_fp16.onnx
Parameter Replacement JSON
Description
Trying to deploy a monocular depth estimation model to an autonomous drone flight controller using Snapdragon SoCs. It is both for research (an academic curiosity about running a large model on the edge) and for product development. I have spent a long time trying to make these models run on Snapdragon SoCs, and TFLite is the only framework that interacts well with the Snapdragon neural engines, so it is crucial that I get this converted. Solving this problem will finally bring hours of study and research into deploying large models on the edge to a conclusion, and I can move on to the next stage of development.
When I run
onnx2tf -i model_fp16.onnx
it runs up to layer 69/1875, at which point it throws an exception. The entire log output is attached in a txt file. I went through the model graph in netron, and both inputs to the ADD op seem to be float16, so I am not sure why exactly one got converted to float32.
I tried to debug by going over the netron graph and finding the node where it was breaking. It seemed fine to me. I tried going through the parameter replacement documentation, but I couldn't figure out how to typecast the input back to float16. I did some experiments (which I added in the parameter replacement field), but they didn't go anywhere. I then went through the source code to try to understand the issue, but it just seems to be an implicit type cast somewhere.
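For reference, the declared dtypes feeding each Add node can also be listed programmatically; a rough sketch with the onnx Python API, assuming shape inference succeeds on this model:

import onnx
from onnx import shape_inference, TensorProto

model = shape_inference.infer_shapes(onnx.load("model_fp16.onnx"))

# Map every known tensor name to its declared element type.
elem_types = {}
for vi in list(model.graph.input) + list(model.graph.output) + list(model.graph.value_info):
    elem_types[vi.name] = vi.type.tensor_type.elem_type
for init in model.graph.initializer:
    elem_types[init.name] = init.data_type

# Print the declared dtypes of the tensors feeding each Add node.
for node in model.graph.node:
    if node.op_type == "Add":
        dtypes = [TensorProto.DataType.Name(elem_types.get(name, 0)) for name in node.input]
        print(node.name, dtypes)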
These SoCs do not really work with any framework other than TFLite, hence it is crucial that I get this converted to TF. Moreover, the model itself is only available in the ONNX format. I have spent many hours trying to make this work, and this project is the closest I have gotten after all the dependency hell I have been going through for so long. Having this problem solved would be a massive relief and would allow me to move on to further stages of my project.
The other converter scripts are either deprecated or not compatible with the latest ONNX models, especially given the constantly changing API. This project seems to be the best in terms of dependency management, and it would be really great to have this resolved.
log.txt