UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 #61

Open
perttikellomaki opened this issue Dec 15, 2021 · 1 comment

@perttikellomaki

I'm trying to read a database produced by an ancient version of the ACDSee photo manager program (don't ask).
When I try to read it the straightforward way:

```python
from dbfread import DBF

table = DBF('asset.dbf')
for record in table:
    print(record)
```

I get ValueError: Unknown field type: '7'.
Following the advice in another issue, I created a custom field parser:

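```python
from dbfread import DBF, FieldParser

class TestFieldParser(FieldParser):
    def parse7(self, field, data):
        # Return the raw bytes for the unrecognised field type '7'
        return data

table = DBF('asset.dbf', parserclass=TestFieldParser)
for record in table:
    print(record)
```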
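
The `parse7` workaround appears to work because dbfread's FieldParser chooses a parse method from the field's one-character type code (the traceback further down shows `parseC` handling a character field), so an unrecognised code surfaces as `ValueError: Unknown field type: '7'`. The classes below are a hypothetical, simplified stand-in written only to illustrate that dispatch idea; they are not dbfread's implementation, and the `Field` tuple and the field name `thumb` are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for a DBF field descriptor (name, one-letter type, length).
Field = namedtuple("Field", ["name", "type", "length"])


class MiniFieldParser:
    """Simplified illustration of type-code dispatch; not dbfread's real class."""

    def parse(self, field, data):
        # Look up a method named "parse" + type code, e.g. parseC or parse7.
        func = getattr(self, "parse" + field.type, None)
        if func is None:
            raise ValueError("Unknown field type: {!r}".format(field.type))
        return func(field, data)

    def parseC(self, field, data):
        # Character field: strip NUL/space padding, then decode.
        return data.rstrip(b"\0 ").decode("ascii")


class WithType7(MiniFieldParser):
    def parse7(self, field, data):
        # Unknown type: hand back the raw bytes untouched.
        return data


field7 = Field("thumb", "7", 4)
try:
    MiniFieldParser().parse(field7, b"\x01\x02\x03\x04")
except ValueError as err:
    print(err)  # Unknown field type: '7'
print(WithType7().parse(field7, b"\x01\x02\x03\x04"))  # b'\x01\x02\x03\x04'
```

Returning `data` unchanged keeps the unknown field as raw bytes, which is a safe default until the meaning of type '7' is known.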

This produces the stack trace below. Googling the error suggests that the file may be decoded with the wrong encoding. Is there an easy way to try reading it with, e.g., UTF-8?

```
Traceback (most recent call last):
  File "/mnt/acdsee/ACDsee/./dumpdb.py", line 10, in <module>
    for record in table:
  File "/mnt/acdsee/ACDsee/venv/lib/python3.9/site-packages/dbfread/dbf.py", line 314, in _iter_records
    items = [(field.name,
  File "/mnt/acdsee/ACDsee/venv/lib/python3.9/site-packages/dbfread/dbf.py", line 315, in <listcomp>
    parse(field, read(field.length))) \
  File "/mnt/acdsee/ACDsee/venv/lib/python3.9/site-packages/dbfread/field_parser.py", line 79, in parse
    return func(field, data)
  File "/mnt/acdsee/ACDsee/venv/lib/python3.9/site-packages/dbfread/field_parser.py", line 87, in parseC
    return self.decode_text(data.rstrip(b'\0 '))
  File "/mnt/acdsee/ACDsee/venv/lib/python3.9/site-packages/dbfread/field_parser.py", line 45, in decode_text
    return decode_text(text, self.encoding, errors=self.char_decode_errors)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.9/lib/python3.9/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 35: character maps to <undefined>
```
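
The last frame shows the value being decoded with Python's `cp1252` codec, which leaves five single-byte values unassigned; 0x81 is one of them. A minimal, standard-library-only sketch that reproduces the same error without dbfread (the sample bytes and the helper name `decodes` are made up for illustration):

```python
def decodes(data, encoding):
    """Return True if `data` decodes under `encoding` with strict error handling."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Made-up field contents whose fourth byte is 0x81.
sample = b"abc\x81def"

try:
    sample.decode("cp1252")  # 'strict' is Python's default error handler
except UnicodeDecodeError as err:
    # 'charmap' codec can't decode byte 0x81 in position 3: character maps to <undefined>
    print(err)

# Single byte values that cp1252 leaves unassigned.
gaps = [hex(b) for b in range(256) if not decodes(bytes([b]), "cp1252")]
print(gaps)  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```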
@shawnbrown
Contributor

I'm not sure I have an answer for you, but have you tried specifying the encoding?

If you know which encoding the file uses, you might be able to get it to work this way. The DBF() constructor's optional second argument is the encoding:

```python
from dbfread import DBF

# Try UTF-8...
table = DBF('asset.dbf', 'UTF-8')
for record in table:
    print(record)

# ...or Latin-1
table = DBF('asset.dbf', 'Latin-1')
for record in table:
    print(record)
```
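
To try several encodings without re-reading the whole table each time, one option is to check a few raw field values against a list of candidates. This is a stdlib-only sketch; the helper name `encodings_that_decode` and the sample bytes are hypothetical. dbfread also documents a `raw=True` option on DBF() that returns field values as undecoded bytes, which is one way to collect such samples.

```python
def encodings_that_decode(data, candidates):
    """Return the candidate encodings that decode `data` without raising."""
    ok = []
    for enc in candidates:
        try:
            data.decode(enc)  # strict error handling
        except UnicodeDecodeError:
            continue
        ok.append(enc)
    return ok


sample = b"abc\x81def"  # made-up bytes containing the offending 0x81
print(encodings_that_decode(sample, ["utf-8", "cp1252", "latin-1"]))  # ['latin-1']
```

Because Latin-1 assigns a character to every byte value, it always passes this check. A clean decode only rules other encodings out; it does not prove the text is correct, so inspect a few decoded values by eye.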

Another option is to set the char_decode_errors argument, which selects the error handler used when decoding text. It defaults to 'strict' when unspecified.

So this...

```python
table = DBF('asset.dbf')
```

Is the same as...

```python
table = DBF('asset.dbf', char_decode_errors='strict')
```

But you could relax this requirement by specifying a more forgiving error handler (see Python's Error Handlers docs for more options):

```python
table = DBF('asset.dbf', char_decode_errors='replace')
```
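
The traceback shows dbfread passing `errors=self.char_decode_errors` down to the decode call, so the handler names are the same ones Python's `bytes.decode()` accepts. A stdlib-only sketch of what a few standard handlers do to the same made-up bytes:

```python
sample = b"abc\x81def"  # made-up bytes; 0x81 is unassigned in cp1252

decoded = {}
for handler in ("replace", "ignore", "backslashreplace"):
    # 'replace' substitutes U+FFFD, 'ignore' drops the byte,
    # 'backslashreplace' keeps it visible as the text \x81.
    decoded[handler] = sample.decode("cp1252", errors=handler)
    print(handler, ascii(decoded[handler]))
```

'backslashreplace' can be handy while investigating, since it shows exactly which byte values were unmappable.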

You might settle on some combination of the two, defining an expected encoding and loosening the error handling:

```python
table = DBF('asset.dbf', 'Latin-1', char_decode_errors='replace')
for record in table:
    print(record)
```
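
One thing to keep in mind when combining the two: Latin-1 maps every byte value to a character, so with 'Latin-1' the 'replace' handler never has anything to replace, and a stray 0x81 comes through as the control character U+0081. A stdlib-only sketch comparing a few encodings on the same made-up bytes, with and without 'replace':

```python
sample = b"abc\x81def"  # made-up bytes containing 0x81

results = {}
for encoding in ("latin-1", "cp1252", "utf-8"):
    try:
        sample.decode(encoding)  # strict decoding
        strict_ok = True
    except UnicodeDecodeError:
        strict_ok = False
    relaxed = sample.decode(encoding, errors="replace")
    results[encoding] = (strict_ok, relaxed)
    print(encoding, "strict ok:", strict_ok, "replace ->", ascii(relaxed))
```

In other words, the choice of encoding decides what the text looks like; the error handler only matters for encodings that have gaps, such as cp1252 or UTF-8.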
