I am not asking for a fix. Just explaining an issue.
Here is the command I ran on bash:
$ ./tools/pycrate_asn1compile.py -i DSRC_instances_asn1_specs/EN15509/ -o DSRC_instances_asn1_specs/EN15509 -j
./tools/pycrate_asn1compile.py, args error: unable to read input file DSRC_instances_asn1_specs/EN15509/ISO14906Amd(2014)EfcDsrcGenericv5.asn
'utf-8' codec can't decode byte 0x93 in position 10503: invalid start byte
So the ASN.1 file contains bytes that are not valid UTF-8.
In one of the comments in the ISO14906Amd(2014)EfcDsrcGenericv5.asn file, the words UNIX time are surrounded by left (“ = 0x93) and right (” = 0x94) double quotation marks, as encoded in Windows-1252 (often loosely called Latin1), instead of plain ASCII quotation marks (" = 0x22), like so:
“UNIX time”
These bytes make the file invalid UTF-8.
The same kind of issue appears in EfcDsrcApplicationv5 and AVIAEINumberingAndDataStructures, whether with double quotation marks or with other such characters, such as single quotation marks and dashes (’ = 0x92 and – = 0x96).
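To make the failure concrete, here is a minimal sketch that reproduces the same kind of error on a made-up comment line (the sample bytes are an assumption, not copied from the ISO file) and shows what the bytes decode to under Windows-1252:

```python
# Hypothetical ASN.1 comment line containing the single-byte "smart quotes".
raw = b"-- \x93UNIX time\x94 since 1970"

try:
    raw.decode("utf-8")
    message = ""
except UnicodeDecodeError as err:
    # 0x93 can only be a continuation byte in UTF-8, never a start byte.
    message = str(err)

# The same bytes are perfectly valid Windows-1252 (Python codec "cp1252").
text = raw.decode("cp1252")

# The other characters mentioned above, decoded the same way.
single_quote = b"\x92".decode("cp1252")  # right single quotation mark
en_dash = b"\x96".decode("cp1252")       # en dash

print(message)
print(text)
```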
As is often recommended, one should manually remove the comments from the ASN.1 specs.
Instead, I will simply change these characters by hand to their ASCII equivalents, so that the files become valid UTF-8, and compile the ASN.1 specs from there.
They are just comments after all...
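The manual replacement could also be scripted. The sketch below assumes the file really is in a single-byte encoding (on a file that is already UTF-8 it would corrupt multi-byte characters); the function name and the choice of mappings are illustrative, not part of pycrate:

```python
# Windows-1252 punctuation bytes mapped to their closest ASCII equivalents.
ASCII_EQUIVALENTS = {
    0x91: ord("'"),  # left single quotation mark
    0x92: ord("'"),  # right single quotation mark
    0x93: ord('"'),  # left double quotation mark
    0x94: ord('"'),  # right double quotation mark
    0x96: ord("-"),  # en dash
    0x97: ord("-"),  # em dash
}

def to_ascii_punctuation(data: bytes) -> bytes:
    """Replace Windows-1252 quotes and dashes with plain ASCII characters."""
    table = bytearray(range(256))  # identity mapping for every other byte
    for src, dst in ASCII_EQUIVALENTS.items():
        table[src] = dst
    return data.translate(bytes(table))

print(to_ascii_punctuation(b"-- \x93UNIX time\x94 \x96 see clause 5"))
```

Reading the file in binary mode, passing the bytes through this function and writing the result back would give a file that the compiler can read, as long as no other non-ASCII bytes remain.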
It is impossible to detect 8-bit encodings programmatically, right? Only if the encoding is kept as metadata or noted down somewhere.
If the encoding could be determined, we could then simply do open("myfile", encoding=determined_encoding).
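A single-byte encoding cannot be identified with certainty, but UTF-8 is strict enough that a failed decode is a reliable sign the file is something else. A common heuristic, sketched here with a hypothetical helper name and an assumed list of candidates, is to try UTF-8 first and then fall back:

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding_name) for the first encoding that decodes data.

    The order matters: utf-8 is strict, cp1252 rejects a handful of
    undefined bytes, and latin-1 accepts every byte, so it acts as a
    last resort. This is a guess, not a real detection.
    """
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")

text, used = decode_with_fallback(b"-- \x93UNIX time\x94")
print(used, text)
```

The returned name could then be passed to open("myfile", encoding=used), or the decoded text could be used directly.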
rmwesley changed the title from "Cannot compile ASN1 specs with ISO-8859 (Latin or Western) encoding" to "Cannot compile ASN1 specs with ISO-8859 (Latin or Western) encoding characters present (often in ASN.1 comments)" on Oct 4, 2024.
I agree that many ASN.1 specs provided here and there contain misencoded (or sometimes simply invalid) characters. This is generally the result of how the work is organized when building a technical standard or specification: contributions from different companies and regions of the world are all merged into one big Word document, which is eventually converted to PDF. This is error-prone!
On the other hand, the current pycrate ASN.1 compiler tries to decode any input as UTF8 and breaks if it contains a non-UTF8 byte. What could be done is:
convert wrongly encoded but meaningful characters to their expected UTF8 encoding.
drop invalid bytes when they just break the UTF8 decoding.
This could lead to better acceptance of ASN.1 specs in the end.
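The two steps proposed above could be sketched with a custom decoding error handler. This is only an illustration of the idea, not pycrate's implementation; the handler name and the choice of Windows-1252 as the source of "meaningful" characters are assumptions:

```python
import codecs

def _cp1252_or_drop(err):
    """Reinterpret bytes that break UTF-8 as cp1252, dropping the rest."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    recovered = []
    for byte in err.object[err.start:err.end]:
        try:
            # Step 1: a meaningful cp1252 character becomes proper Unicode.
            recovered.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # Step 2: bytes undefined in cp1252 as well are simply dropped.
            pass
    # Resume decoding right after the offending bytes.
    return "".join(recovered), err.end

codecs.register_error("cp1252_or_drop", _cp1252_or_drop)

def lenient_utf8(data: bytes) -> str:
    """Decode as UTF-8, repairing or dropping the bytes that are invalid."""
    return data.decode("utf-8", errors="cp1252_or_drop")

print(lenient_utf8(b"-- \x93UNIX time\x94"))
```

Valid UTF-8 input passes through unchanged, since the handler is only called for bytes the UTF-8 decoder rejects.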
Just to note, I downloaded the original ASN.1 specs directly from the official ISO site.
Some of the specifications using ISO-8859-1 (Latin1) encoding are https://standards.iso.org/iso/14906/ed-2/ISO14906Amd(2014)EfcDsrcGenericv5.asn, https://standards.iso.org/iso/14906/ed-2/ISO14906Amd(2014)EfcDsrcApplicationv5.asn and https://standards.iso.org/iso/14816/ISO14816%20ASN.1%20repository/ISO14816_AVIAEINumberingAndDataStructures.asn.