Firebird database with malformed strings cannot be accessed [PYFB76] #91

firebird-automations · 2018-11-22T15:22:40Z

Submitted by: Joern Ungermann (joernu76)

We have a firebird database containing a few malformed ASCII strings. These are byte streams, presumably ASCII coded, stored as is from a rather unreliable wireless transmission. Thus, there are bit flips, which causes the strings to contain some strange characters.
Using python 2, we had no problem accessing such databases. Using python 3, everything needs to be decoded/encoded.
Using the default access to the database using sqlalchemy and fdb causes a UnicodeDecodingError upon accessing an entry with a malformed string:

[...]
File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 2659, in __xsqlda2tuple
value = b2u(value, self.__python_charset)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 480, in b2u
return st.decode(charset)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 1: invalid start byte

Playing around with the "charset" option of sqlalchemy could open some files, where the malformed strings contained only values available in the given charset, but introduced other problems. Looking into the code, there seems to be an "OCTETS" option, which supposedly parse the bytestream through, but fails to work (the python-translated-charset is None, which is not a valid option for encode/decode)

I'd prefer either of two options:
a) implement (optionally?) an option to influence the error behavious of decoding (either replace or ignore would suit our usecase)
b) implement a pass-through option for the byte-stream

If I oversaw some option to solve my use-case wth the given code-base, I'd be very happy to hear about it.

firebird-automations · 2018-11-22T17:24:51Z

Commented by: @pcisar

It's strange, because client charset OCTETS should work as pass-through for all strings as b2u() returns the string (bytes in this case) unchanged when passed charset is None (which should be according to ibase.charset_map). But P3 application or upper layer has to be prepared to handle bytes instead unicode string on output.

Anyway, this is tricky problem. I don't like the idea to hide decode errors (especially if it's easy to use this option on global scale) because then such errors can slip unnoticed to cause problems elsewhere, that are then much harder to detect. A pass-through solution that would return byte-stream instead unicode when error is encountered is the same problem, as it means that users must expect various result types.

But optional connection option (disabled by default) for decoding error resolution appears to be good enough solution, and I'll implement it for next release. However, usability depends on how this would be exposed by other layers like sqlalchemy.

As a hotfix, you can change the b2u() function in http://fbcore.py to replace or ignore bad characters.

firebird-automations · 2018-11-22T20:28:19Z

Commented by: Joern Ungermann (joernu76)

Thanks, solving this would be marvelous, but I do not see an easy solution for this including all the stacks, either. The quick fix works as intended, but doesn't scale very well, as it breaks our automatic deployment. We really should switch to docker or somesuch.

If I try to use the charset=OCTETS option in sqlalchemy to bypass the coding, I get the following error:

File "/home/icg173/bin/gloripy_V01781_R7633e89/gloripy/data/database.py", line 792, in __init__
self.metadata.reflect(self.engine)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/sql/schema.py", line 3908, in reflect
with bind.connect() as conn:
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2102, in connect
return self._connection_cls(self, **kwargs)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 90, in __init__
if connection is not None else engine.raw_connection()
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2188, in raw_connection
self.pool.unique_connection, _connection)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2158, in _wrap_pool_connect
return fn()
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 342, in unique_connection
return _ConnectionFairy._checkout(self)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 788, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 529, in checkout
rec = pool._do_get()
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1193, in _do_get
self._dec_overflow()
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 249, in reraise
raise value
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1190, in _do_get
return self._create_connection()
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 347, in _create_connection
return _ConnectionRecord(self)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 474, in __init__
self.__connect(first_connect_check=True)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 671, in __connect
connection = pool._invoke_creator(self)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 106, in connect
return dialect.connect(*cargs, **cparams)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 412, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 827, in connect
no_reserve, db_key_scope, no_gc, no_db_triggers, no_linger)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 760, in build_dpb
dpb.add_string_parameter(isc_dpb_user_name, user)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 625, in add_string_parameter
value = value.encode(charset_map.get(self.charset, self.charset))
TypeError: encode() argument 1 must be str, not None

The "OCTETS" charset from "firebird+fdb://{}:{}@{}/{}?charset=OCTETS" is translated to "None" in fdb.ibase.charset_map.
The None is then passed as argument to the encode method, which doesn't like it. In this case, the value should be directly returned as bytes object?
I am not sure what fdbcore is doing there, but it would certainly break later as well, when None is passed to the decode function, which, probably, should be bypassed as well in the case of OCTETS?

firebird-automations · 2018-11-23T16:15:08Z

Commented by: @pcisar

I see, FDB clearly does not handle OCTETS properly. Using it as client charset is very wild idea that should nobody even try, but anyway. The problem with OCTESTS handling is that it adds another level of complexity to the code that is already trying to handle P2/P3 string difference in split-brain fashion (i,.e. P3 support in FDB is somewhat crude and unclean thanks to its history coming back to early P3 era). I worry that it would not get better anytime soon (it's planned as part of transition from P2 centered to P3 centered development of FDB over 2.x series).

Ok, I'll try to add optional support for replace/ignore option into bytes2unicode conversions in FDB as soon as possible. However, it happens at several places in various contexts, so it may take some time as I'd like do it properly.

firebird-automations added affect-version: 2.0 priority: minor type: bug labels Apr 25, 2021

firebird-automations assigned pcisar Apr 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Firebird database with malformed strings cannot be accessed [PYFB76] #91

Firebird database with malformed strings cannot be accessed [PYFB76] #91

firebird-automations commented Nov 22, 2018

firebird-automations commented Nov 22, 2018

firebird-automations commented Nov 22, 2018

firebird-automations commented Nov 23, 2018

Firebird database with malformed strings cannot be accessed [PYFB76] #91

Firebird database with malformed strings cannot be accessed [PYFB76] #91

Comments

firebird-automations commented Nov 22, 2018

firebird-automations commented Nov 22, 2018

firebird-automations commented Nov 22, 2018

firebird-automations commented Nov 23, 2018