Support Kosmos-2.5 #31711

tic-top · 2024-06-29T15:48:17Z

What does this PR do?

Implementation of Kosmos-2.5 in transformers (see #30877).
Model card: https://huggingface.co/kirp/kosmos2_5/blob/main/README.md

Usage

from PIL import Image
import requests
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq, AutoConfig
import re

repo = "kirp/kosmos2_5"
device = "cuda:0"
config = AutoConfig.from_pretrained(repo)

# Short aliases for the supported attention implementations.
NAME = {
    "f": "flash_attention_2",
    "s": "sdpa",
    "e": "eager",
}

# all sdpa fp16
dtype = torch.float16
config._attn_implementation = NAME["s"]
config.vision_config._attn_implementation = NAME["s"]
config.text_config._attn_implementation = NAME["s"]

# # all eager bf16
# dtype = torch.bfloat16
# config._attn_implementation = NAME["e"]
# config.text_config._attn_implementation = NAME["e"]
# config.vision_config._attn_implementation = NAME["e"]


model = AutoModelForVision2Seq.from_pretrained(repo, device_map=device, torch_dtype=dtype, config=config)
processor = AutoProcessor.from_pretrained(repo)

url = "https://huggingface.co/kirp/kosmos2_5/resolve/main/receipt_00008.png"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<ocr>"  # use "<md>" for markdown output instead of text lines with bounding boxes

inputs = processor(text=prompt, images=image, return_tensors="pt")
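# The processor also returns the resized image height and width. They are not
# model inputs, so pop them and use them to map boxes back to the original image.
height, width = inputs.pop("height"), inputs.pop("width")
raw_width, raw_height = image.size
scale_height = raw_height / height
scale_width = raw_width / width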

# Move every tensor to the device, then cast only the image patches to the model dtype.
inputs = {k: v.to(device) if v is not None else None for k, v in inputs.items()}
inputs["flattened_patches"] = inputs["flattened_patches"].to(dtype)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=1024,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)

def postprocess(y, scale_height, scale_width):
    # Strip the prompt; "<md>" output is returned as plain text.
    y = y.replace(prompt, "")
    if "<md>" in prompt:
        return y
    # "<ocr>" output: each text line follows a <bbox> tag holding x0, y0, x1, y1.
    pattern = r"<bbox><x_\d+><y_\d+><x_\d+><y_\d+></bbox>"
    bboxs_raw = re.findall(pattern, y)
    lines = re.split(pattern, y)[1:]
    bboxs = [[int(j) for j in re.findall(r"\d+", i)] for i in bboxs_raw]
    info = ""
    for (x0, y0, x1, y1), line in zip(bboxs, lines):
        # Skip degenerate boxes.
        if x0 >= x1 or y0 >= y1:
            continue
        # Rescale from the resized image to the original image.
        x0 = int(x0 * scale_width)
        y0 = int(y0 * scale_height)
        x1 = int(x1 * scale_width)
        y1 = int(y1 * scale_height)
        # Emit the four corners (clockwise from top-left), then the text.
        info += f"{x0},{y0},{x1},{y0},{x1},{y1},{x0},{y1},{line}"
    return info

output_text = postprocess(generated_text[0], scale_height, scale_width)
print(output_text)
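
The bounding-box parsing in postprocess can be tried without loading the model. The following is a minimal sketch, not part of this PR: the sample string is a hypothetical decoded output that follows the tag format matched by the regex above, and parse_ocr is an illustrative helper name. It returns structured (box, text) pairs instead of a single string.

```python
import re

# Hypothetical decoded output for the "<ocr>" prompt (an assumption for
# illustration): each text line is preceded by a <bbox> tag holding
# x0, y0, x1, y1 in resized-image pixels.
sample = (
    "<ocr>"
    "<bbox><x_10><y_20><x_110><y_40></bbox>TOTAL 12.50\n"
    "<bbox><x_10><y_50><x_90><y_70></bbox>CASH 20.00\n"
)

BBOX = re.compile(r"<bbox><x_(\d+)><y_(\d+)><x_(\d+)><y_(\d+)></bbox>")


def parse_ocr(text, scale_width=1.0, scale_height=1.0):
    """Return [((x0, y0, x1, y1), line_text), ...] in original-image pixels."""
    matches = list(BBOX.finditer(text))
    results = []
    for idx, m in enumerate(matches):
        # The text of a line runs from the end of its tag to the next tag.
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
        line = text[m.end():end].strip()
        x0, y0, x1, y1 = (int(g) for g in m.groups())
        if x0 >= x1 or y0 >= y1:
            continue  # skip degenerate boxes, as postprocess does
        box = (
            int(x0 * scale_width),
            int(y0 * scale_height),
            int(x1 * scale_width),
            int(y1 * scale_height),
        )
        results.append((box, line))
    return results


parsed = parse_ocr(sample, scale_width=2.0, scale_height=1.5)
for box, line in parsed:
    print(box, line)
```

With real model output, scale_width and scale_height would be the ratios computed from the processor's height and width as in the usage snippet.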

amyeroberts · 2024-07-01T10:10:04Z

cc @ydshieh

[email protected] added 30 commits June 16, 2024 14:22

kosmos2_5 basic

661ea9c

.

a6636c1

.

21b0ecc

.

3aa802c

.

d9cf290

.

f1be589

.

352e678

image processor

cedd7d3

init

4ceb5c8

v1

cab16ce

.

c433374

eager attention supported, flash_attn2 is not completed

cd55891

remove the hardcode dtype

02f21a7

remove hardcode dtype

ce839cc

sdpa, flash attn2 supported

fdc6142

remove cache and inference code

0d7a273

flash2 & spa

b1d373b

sdpa/flash_attn2/eager supported

ef94db2

new configuration

4ee3d7e

.

c8aaa35

v1

93a7dc3

new config

80c29c5

segment_emb is needed

0ec4d44

remove ckpt, default to flash_attn

be5b0f9

add some document

5f51a7d

support sdpa

fdc28b7

textspda

b85d5d7

new processor

3ed0d66

copyright

775bec3

default configuration become eager

fcf17a6

[email protected] added 7 commits June 30, 2024 10:20

reformat

b72fe0a

reformat

589e9ef

Merge remote-tracking branch 'upstream/main' into main

fe51247

fixup

241b0bf

init test

ba8b3dd

init weight

9c74c61

modeling_test in progress

363180b

ydshieh self-assigned this Jul 1, 2024

ydshieh added the run-slow label Jul 1, 2024

[email protected] added 20 commits July 1, 2024 17:39

model test

29d7cff

better initilization

42dd2ea

model test

9046ec5

restore ks2_test; update ks25 test

b64e300

load from the config

916781a

processor test

578acce

run slow-prepare some test

c306325

skip sdpa test

b7d5ec9

test finish

f05e361

duplicate import

f19b06c

add mean

73dddc5

std

cd8ac6e

fixup

35ef655

remove tmp img

9379458

hi

2e398f7

init test

40b4e98

fix format

303e918

initialization test passed

d5ad957

update readme

e81b7fe

Merge remote-tracking branch 'upstream/main' into main

eb2b93c
