Integrating GOT-OCR2.0 in Transformers 🤗 #137

Open
yonigozlan opened this issue Oct 18, 2024 · 7 comments
Labels
good first issue Good for newcomers


@yonigozlan

Hi!
First of all, congrats on such a great model!
I am an MLE at Hugging Face, and given the popularity and performance of your model, we are eager to integrate it into the Transformers 🤗 library. If you are interested in working with us (mostly helping us debug if needed, or clarifying certain aspects of the model), that would be great!
Looking forward to hearing back from you!

Best,
Yoni

@Ucas-HaoranWei
Owner

Hi Yoni,
It's an honor to integrate GOT into Transformers.
If you need any help, please feel free to contact me anytime.
My email is [email protected]

Best,
Haoran Wei

@Ucas-HaoranWei added the good first issue label on Oct 18, 2024
@yonigozlan
Author

yonigozlan commented Nov 14, 2024

Hi again Haoran,
The PR is up here :).
For now, I intend to support only inference in Transformers. In your experience since the model came out, have many users been fine-tuning it? I am guessing at least for the stage-3 post-training described in the paper?
I am trying to gauge how useful it would be to support this last post-training stage in Transformers. In any case, it can always be added later.
Thank you!

@Ucas-HaoranWei
Owner


Hi Yoni,
Great job! I will review this PR as soon as possible.
There are still quite a few users who want to fine-tune the model, but I suggest waiting to see what users need going forward before deciding whether to add training and fine-tuning support later.
Thank you.

@yonigozlan
Author

Hi Haoran!
The PR is almost ready to go! I just wanted to ask for some clarification on the box fine-grained option. I saw that formats such as [x, y] or [x, y, x, y] can be added to the queries, but I was wondering what x and y are precisely in both cases. Are they coordinates in image pixels? In resized-image pixels? And in the first case, I am confused about how you can define a box with just two coordinates. Thank you in advance!

@Ucas-HaoranWei
Owner

Hi Yoni,
Thank you for your work!
[x, y] is a single point and can be ignored: GOT does not use single-point text positioning (it was an initial idea that we did not adopt). [x0, y0, x1, y1] is a box, where (x0, y0) is the top-left corner and (x1, y1) is the bottom-right corner. The coordinates are in image pixels: they are first normalized and then multiplied by 1000, which eliminates decimal points.
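For readers following along, here is a minimal sketch of the coordinate convention described above: pixel-space boxes are normalized by the image size and scaled by 1000, then rounded so the query contains only integers. The helper name is hypothetical.

```python
def to_got_box(x0, y0, x1, y1, width, height):
    """Convert a pixel-space box into GOT's [x0 y0 x1 y1] query format:
    normalize by the image dimensions, scale by 1000, round to integers."""
    return [
        round(1000 * x0 / width),
        round(1000 * y0 / height),
        round(1000 * x1 / width),
        round(1000 * y1 / height),
    ]

# e.g. a 200x100 crop at (50, 30) in a 1000x800 image:
print(to_got_box(50, 30, 250, 130, 1000, 800))  # [50, 38, 250, 162]
```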

@yonigozlan
Author

Thanks for the explanation!
By the way, would you like us to transfer the weights of the Transformers implementation to your Hub organization page? And if so, what would you like us to name them? I was thinking of stepfun-ai/GOT-OCR2.0-hf, as we usually use the suffix "-hf" to signal a Transformers model.

@Ucas-HaoranWei
Owner

Of course, '-hf' is fine.
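With the naming settled, a minimal inference sketch is shown below, assuming the integration follows Transformers' usual image-text-to-text pattern. The classes and the checkpoint id are assumptions based on this thread; the merged PR may differ.

```python
# Hedged sketch only: the checkpoint id follows the "-hf" naming proposed
# above, and the classes follow Transformers' image-text-to-text convention.
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

checkpoint = "stepfun-ai/GOT-OCR2.0-hf"  # name proposed in this thread
processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModelForImageTextToText.from_pretrained(checkpoint)

image = Image.open("page.png")  # any document image
inputs = processor(images=image, return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```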
