fix ten line token convert bug #65

UserWangZz · 2024-09-06T03:50:29Z

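In the convert_token method in PPOCRLabel/libs/utils.py, the value n returned by col.split() is a str. In the token_list.append call below it, only the first character of n is formatted, so a span of ten or more loses every digit after the first.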
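
A minimal illustration of the truncation, using the same "rowspan=31" cell string that appears in the test input later in this thread. It only shows what indexing a str with [0] does; it is not the project's code.

```python
# Cell string in the same form as the test input in this thread.
col = "rowspan=31"

# str.split returns a list of str, so n is the text after "rowspan=".
_, n = col.split("rowspan=")

print(repr(n))     # '31'
print(repr(n[0]))  # '3'  (only the first character survives)
```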

GreatV · 2024-09-06T04:29:47Z

libs/utils.py

@@ -233,10 +233,10 @@ def convert_token(html_list):
                token_list.append("<td")
                if "colspan" in col:
                    _, n = col.split("colspan=")
-                    token_list.append(' colspan="{}"'.format(n[0]))
+                    token_list.append(' colspan="{}"'.format(str(int(n[0]))))


Would changing n[0] to n also work?

I wrote the code in a hurry this morning; it should just use n directly.

GreatV · 2024-09-06T04:33:44Z

def convert_token(html_list):
    """
    Convert raw html to label format
    """
    token_list = ["<tbody>"]
    # final html list:
    for row in html_list:
        token_list.append("<tr>")
        for col in row:
            if col == None:
                continue
            elif col == "td":
                token_list.extend(["<td>", "</td>"])
            else:
                token_list.append("<td")
                if "colspan" in col:
                    _, n = col.split("colspan=")
                    token_list.append(' colspan="{}"'.format(str(int(n[0]))))
                if "rowspan" in col:
                    _, n = col.split("rowspan=")
                    token_list.append(' rowspan="{}"'.format(str(int(n[0]))))
                token_list.extend([">", "</td>"])
        token_list.append("</tr>")
    token_list.append("</tbody>")

    return token_list

input_html_list = [
            ["td", "rowspan=31"],
            ["td", "td"],
            ["td", None]
        ]
print(convert_token(input_html_list))

With this input, both modifications give:

['<tbody>', '<tr>', '<td>', '</td>', '<td', ' rowspan="3"', '>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '</tr>', '</tbody>']
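
The output above still shows rowspan="3" for an input of rowspan=31. One possible way to keep every digit is sketched below. It assumes each cell string carries its span as `colspan=<digits>` and/or `rowspan=<digits>`; the function name `convert_token_multidigit` and the use of a regular expression are illustrative choices, not what the PR or the repository necessarily uses.

```python
import re


def convert_token_multidigit(html_list):
    """Hypothetical variant of convert_token that keeps all digits of a span."""
    token_list = ["<tbody>"]
    for row in html_list:
        token_list.append("<tr>")
        for col in row:
            if col is None:
                # Cell covered by a neighbouring span: emit nothing.
                continue
            if col == "td":
                token_list.extend(["<td>", "</td>"])
                continue
            token_list.append("<td")
            # Assumption: spans are written as "<attr>=<digits>" in the cell string.
            for attr in ("colspan", "rowspan"):
                match = re.search(attr + r"=(\d+)", col)
                if match:
                    token_list.append(' {}="{}"'.format(attr, match.group(1)))
            token_list.extend([">", "</td>"])
        token_list.append("</tr>")
    token_list.append("</tbody>")
    return token_list


result = convert_token_multidigit([["td", "rowspan=31"], ["td", "td"], ["td", None]])
print(result[5])  # ' rowspan="31"'
```

Capturing `\d+` reads the whole number and stops at the first non-digit, so it would also cope with a cell string that holds both attributes, should such strings occur.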

fix ten line token convert bug

63069ff

GreatV reviewed Sep 6, 2024

UserWangZz closed this Sep 6, 2024
