We are using CoBrix with PySpark and executing it on AWS EMR.
We have the EBCDIC file and its corresponding copybook in an AWS S3 bucket. While trying to parse the EBCDIC file using the copybook, we are getting an error.
Error message :
py4j.protocol.Py4jJavaError : An error occurred while calling o2021.loa : za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException : Syntax error in the copybook at line 29 : Invalid input 'BBBB' at position 29:45
We expected Cobrix to successfully parse the EBCDIC file record column using the copybook, which contains a PIC clause with 'BBBB'.
Context
PySpark Jar dependencies :
cobol-parser_2.12-2.6.7.jar
hadoop-lzo-0.4.3.jar
scodec-bits_2.12-1.1.12.jar
scodec-core_2.12-1.11.4.jar
spark-cobol_2.12-2.6.7.jar
Operating system: AWS EMR (Linux Image)
Copybook (if possible)
15 EL02-267-COLNAME-A
20 EL02-267-COLNAME-B
PIC X(19).
.........
.........
.........
20 EL02-267-COLNAME-C REDEFINES
EL02-267-COLNAME-D
PIC 9(06)BBBB. (This is what is causing the issue we suppose)
GP5WHB 20 FILLER pic X(285). CLEAN-UP
Attach a small data file that can help reproduce the issue, if possible : Need to check the feasibility due to confidentiality of the data. Will get back.
Yes, 'BBBB' is something Cobrix does not support mainly because we are not sure at the moment how to properly handle it.
This might be a relevant issue: #505
Does it work if you remove 'BBBB'? Does it produce the expected output in this case?
One query: could you advise on what could be a replacement for 'BBBB'? I mean, is there any other COBOL datatype definition that is analogous to the use case of 'BBBB' and works with Cobrix too?
Please note, I have yet to try out your advice on removing the 'BBBB'. Sorry for the delay, will get back on that asap!
Since 'B' means just inserting spaces in the data representation of the number, and because Cobrix converts numbers to Spark native binary formats, 'B' should not need a replacement. We may eventually implement it so Cobrix ignores all 'B' in numbers. We haven't done it yet since we haven't encountered such PICs in our organization so we can't confirm that ignoring 'B's would be an expected behavior.
Once you confirm that removing 'B's from PICs produces correct output in numeric fields, we are going to implement support for 'B's natively.
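To try the suggestion above without editing the copybook by hand, the 'B's could be stripped from numeric PIC strings before the copybook is handed to Cobrix. The following is only a minimal sketch: `strip_b_insertion` is a hypothetical helper, not part of Cobrix, and it assumes each PIC string is written without embedded spaces, as in the copybook excerpt above.

```python
import re

# Hypothetical helper pattern: a PIC/PICTURE keyword (not part of a longer
# hyphenated name) followed by its picture string.
_PIC_RE = re.compile(r"(?<![\w-])(PIC(?:TURE)?\s+(?:IS\s+)?)(\S+)", re.IGNORECASE)


def strip_b_insertion(copybook_text: str) -> str:
    """Remove 'B' insertion characters from numeric PIC strings.

    Pictures without a '9' (for example PIC X(285)) are left untouched.
    """
    def fix(match):
        prefix, pic = match.group(1), match.group(2)
        has_dot = pic.endswith(".")
        body = pic[:-1] if has_dot else pic
        if "9" in body and "B" in body.upper():
            # Drop both plain 'B' and the repeated form 'B(n)'.
            body = re.sub(r"B(\(\d+\))?", "", body, flags=re.IGNORECASE)
        return prefix + body + ("." if has_dot else "")

    return _PIC_RE.sub(fix, copybook_text)


if __name__ == "__main__":
    print(strip_b_insertion("PIC 9(06)BBBB."))
```

The cleaned text could then be passed to the reader as a string (Cobrix documents a `copybook_contents` option for this) instead of pointing at the copybook file in S3. One thing to verify when checking the output: in COBOL each 'B' occupies one character position in the stored data, so dropping it shortens the field. Here the field is a REDEFINES, but in other layouts the offsets of the following fields may shift.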