Support Kosmos-2.5 #31711

tic-top · 2024-06-29T15:48:17Z

What does this PR do?

Implements Kosmos-2.5 in transformers (#30877).
https://huggingface.co/kirp/kosmos2_5/blob/main/README.md

Usage

from PIL import Image
import requests
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq, AutoConfig
import re

repo = "kirp/kosmos2_5"
device = "cuda:0"
config = AutoConfig.from_pretrained(repo)

NAME = {
    "f" : "flash_attention_2",
    "s" : "sdpa",
    "e" : "eager",
}

# all sdpa fp16
dtype = torch.float16
config._attn_implementation = NAME["s"]
config.vision_config._attn_implementation = NAME["s"]
config.text_config._attn_implementation = NAME["s"]

# # all sdpa fp16
# dtype = torch.float16
# config._attn_implementation = NAME["s"]
# config.text_config._attn_implementation = NAME["s"]
# config.vision_config._attn_implementation = NAME["s"]

# # all eager bf16
# dtype = torch.bfloat16
# config._attn_implementation = NAME["e"]
# config.text_config._attn_implementation = NAME["e"]
# config.vision_config._attn_implementation = NAME["e"]


model = AutoModelForVision2Seq.from_pretrained(repo, device_map=device, torch_dtype=dtype, config=config)
processor = AutoProcessor.from_pretrained(repo)

url = "https://huggingface.co/kirp/kosmos2_5/resolve/main/receipt_00008.png"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<ocr>"  # use "<md>" for the Markdown task

inputs = processor(text=prompt, images=image, return_tensors="pt")
height, width = inputs.pop("height"), inputs.pop("width")
raw_width, raw_height = image.size
scale_height = raw_height / height
scale_width = raw_width / width

inputs = {k: v.to(device) if v is not None else None for k, v in inputs.items()}
inputs["flattened_patches"] = inputs["flattened_patches"].to(dtype)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=1024,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)

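def postprocess(y, scale_height, scale_width):
    # Remove the task prompt from the decoded text.
    y = y.replace(prompt, "")
    # Markdown task: return the text as-is.
    if "<md>" in prompt:
        return y
    # OCR task: every text line is preceded by a <bbox>...</bbox> tag.
    pattern = r"<bbox><x_\d+><y_\d+><x_\d+><y_\d+></bbox>"
    bboxs_raw = re.findall(pattern, y)
    # Drop the text before the first tag so that lines[i] pairs with bboxs[i].
    lines = re.split(pattern, y)[1:]
    bboxs = [[int(j) for j in re.findall(r"\d+", i)] for i in bboxs_raw]
    info = ""
    for (x0, y0, x1, y1), line in zip(bboxs, lines):
        # Skip degenerate boxes (zero or negative width or height).
        if x0 >= x1 or y0 >= y1:
            continue
        # Rescale the coordinates to the original image size.
        x0 = int(x0 * scale_width)
        y0 = int(y0 * scale_height)
        x1 = int(x1 * scale_width)
        y1 = int(y1 * scale_height)
        # Four corners, clockwise from top-left, followed by the text.
        info += f"{x0},{y0},{x1},{y0},{x1},{y1},{x0},{y1},{line}"
    return info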

output_text = postprocess(generated_text[0], scale_height, scale_width)
print(output_text)
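
To see what the OCR branch of `postprocess` does without loading the model, here is a minimal, self-contained sketch. The sample string, the scale factors, and the helper name `parse_ocr` are made up for illustration; only the tag pattern is taken from the snippet above. Unlike `postprocess`, it returns tuples instead of a comma-joined string and strips whitespace around each line.

```python
import re

# Tag format taken from the usage snippet; the capture groups extract the coordinates.
BBOX = re.compile(r"<bbox><x_(\d+)><y_(\d+)><x_(\d+)><y_(\d+)></bbox>")


def parse_ocr(text, scale_height=1.0, scale_width=1.0):
    """Return [(x0, y0, x1, y1, line), ...] rescaled to the original image size."""
    results = []
    matches = list(BBOX.finditer(text))
    for idx, match in enumerate(matches):
        x0, y0, x1, y1 = (int(group) for group in match.groups())
        # The text for this box runs until the next tag (or the end of the string).
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
        line = text[match.end():end].strip()
        # Skip degenerate boxes, as the original snippet does.
        if x0 >= x1 or y0 >= y1:
            continue
        results.append((
            int(x0 * scale_width),
            int(y0 * scale_height),
            int(x1 * scale_width),
            int(y1 * scale_height),
            line,
        ))
    return results


# Hypothetical decoded output, not real model output.
sample = (
    "<bbox><x_10><y_20><x_110><y_40></bbox>TOTAL 12.50\n"
    "<bbox><x_5><y_5><x_5><y_9></bbox>bad\n"
    "<bbox><x_10><y_50><x_90><y_70></bbox>CASH"
)
boxes = parse_ocr(sample, scale_height=2.0, scale_width=1.5)
for box in boxes:
    print(box)
```

The second box in the sample has zero width, so it is dropped, which mirrors the `x0 >= x1 or y0 >= y1` check in `postprocess`.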

amyeroberts · 2024-07-01T10:10:04Z

cc @ydshieh

ydshieh · 2024-12-17T17:35:12Z

Hi @zucchini-nlp, great reviews! Could you check the changes shown below? I think all comments have been addressed.

(and also the removed generate)

https://github.com/tic-top/transformers/compare/0ec499a841b35ce9c77e072362a09a20a287a2f5..8fc9699655c5b80b0c5cbb1fadfc4837daf1ad90

zucchini-nlp

Thanks a lot for iterating and removing the overwritten generate()!

ydshieh · 2024-12-18T12:57:42Z

run slow

github-actions · 2024-12-18T12:58:35Z

This comment contains run-slow, running the specified jobs: ['models/kosmos2_5'] ...

stevhliu

Thanks!

stevhliu · 2025-01-13T16:14:14Z

docs/source/en/model_doc/kosmos-2.5.md

+**Markdown Task:** For usage instructions, please refer to [md.py](https://huggingface.co/microsoft/kosmos-2.5/blob/main/md.py).
+
+**OCR Task:** For usage instructions, please refer to [ocr.py](https://huggingface.co/microsoft/kosmos-2.5/blob/main/ocr.py).


It would be nice to include the code snippets here so users don't have to click on another link.

src/transformers/models/kosmos2_5/configuration_kosmos2_5.py

Co-authored-by: Steven Liu <[email protected]>

[email protected] added 14 commits June 30, 2024 09:22

.

532b1e0

.

234149a

import sort

05c9943

.

7d8783b

format

2de836d

format

9eece30

reformat

3a0cfaa

reformat

b72fe0a

reformat

589e9ef

Merge remote-tracking branch 'upstream/main' into main

fe51247

fixup

241b0bf

init test

ba8b3dd

init weight

9c74c61

modeling_test in progress

363180b

ydshieh self-assigned this Jul 1, 2024

ydshieh added the run-slow label Jul 1, 2024

[email protected] added 13 commits July 1, 2024 17:39

model test

29d7cff

better initilization

42dd2ea

model test

9046ec5

restore ks2_test; update ks25 test

b64e300

load from the config

916781a

processor test

578acce

run slow-prepare some test

c306325

skip sdpa test

b7d5ec9

test finish

f05e361

duplicate import

f19b06c

add mean

73dddc5

std

cd8ac6e

fixup

35ef655

ydshieh added 7 commits December 17, 2024 16:33

temp

ce222a6

temp

876cb6b

temp

a3638ea

temp

30f927a

temp

a65a9b1

temp

7c99fd0

fix

ec9ea0c

ydshieh added 5 commits December 17, 2024 18:37

fix

8fc9699

fix

22cb70d

fix

001fd70

fix

d1116f5

fix

6f09a51

zucchini-nlp approved these changes Dec 18, 2024

Merge branch 'main' into main

7d0b827

Merge branch 'main' into kosmos25

cd018b0

ydshieh requested review from Rocketknight1, qubvel, molbap, yonigozlan and stevhliu as code owners January 10, 2025 13:24

stevhliu reviewed Jan 13, 2025

ydshieh and others added 6 commits January 21, 2025 15:49

Merge branch 'temp' into kosmos25

a5b23f8

no more copied

d1debcc

fix

1279316

Apply suggestions from code review

69aec2e

Co-authored-by: Steven Liu <[email protected]>

fix default values in docstrings

8c579a9

update doc

af813ce
