Add support for COMP-3 numbers without the sign nibble #701
Comments
Could you add
and send the HEX value of the field that is incorrectly decoded, and I'll take a look |
Sorry for so many questions, but we have been trying for a long time. |
Yes, Cobrix supports packed decimal data. 'debug' is not supposed to change anything; it only creates debug columns. I'm asking you to send an example of HEX values that Cobrix didn't convert properly. |
Transaction code is coming as 601, while in mainframe I see the value as 791. The field type is PIC S9(3)V COMP-3. Another is date field, which is PIC S9(7)V COMP-3, it is coming as null in dataframe, while it should come as 240802 actually in mainframe. Cobrix version installed is spark_cobol_2_12_2_7_3_bundle.jar. We have spark version 2.12 in Databricks. |
When E.g. field1 = 601, field1_debug=? |
Got it thanks. |
Makes sense. Yes, I think we can add support for |
These are the specs for |
Transaction code is defined as PIC S9(3)V COMP-3. If I see this value in mainframe is 791, while in dataframe coming as 601. This is also coming incorrect :-( . Thanks for the quick revert back. Date is defined as below. |
Thanks for the field definition. We can add support for COMP-3 numbers without a sign nibble. Just keep in mind that this definition:
implies 9 digits. while |
:( |
Checked - parsing of |
When I tried this while keeping the field in the copybook unchanged, i.e. PIC S9(3)V COMP-3, the debug value was 601C. After changing the data from COMP-3 to COMP-3U,
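To illustrate the difference being discussed, here is a minimal Python sketch of packed decimal decoding. It is not Cobrix's implementation. It assumes the usual COMP-3 layout (two decimal digits per byte, with the last nibble holding the sign: A, C, E or F for positive, B or D for negative) and assumes that COMP-3U means the same layout with no sign nibble. The function names are hypothetical, and the unsigned example bytes are made up for illustration.

```python
def _nibbles(raw: bytes):
    # Split every byte into its high and low 4-bit halves.
    return [n for b in raw for n in (b >> 4, b & 0x0F)]


def decode_comp3(raw: bytes):
    """Decode packed decimal whose last nibble is a sign nibble."""
    if not raw:
        return None
    *digits, sign = _nibbles(raw)
    positive, negative = {0x0A, 0x0C, 0x0E, 0x0F}, {0x0B, 0x0D}
    if any(d > 9 for d in digits) or sign not in positive | negative:
        return None  # malformed for this layout
    value = int("".join(str(d) for d in digits))
    return -value if sign in negative else value


def decode_comp3_unsigned(raw: bytes):
    """Decode packed decimal where every nibble is a digit (no sign nibble)."""
    digits = _nibbles(raw)
    if not digits or any(d > 9 for d in digits):
        return None
    return int("".join(str(d) for d in digits))


print(decode_comp3(bytes.fromhex("601C")))           # 601
print(decode_comp3(bytes.fromhex("1234")))           # None (4 is not a sign nibble)
print(decode_comp3_unsigned(bytes.fromhex("1234")))  # 1234
```

Under these assumptions, bytes without a sign nibble fail the signed decoder, which may be why such fields show up as null when the copybook declares plain COMP-3.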
Maybe you can do something like df.select("failure_field1", "failure_field1_debug").show(false) and send here the table, for each field that is failing for you. |
I will double check the 601 with user, if he is sending me wrong snapshots. I can't upload the table due to data privacy, here are the values. I printed first 10, all are coming as below Trans_code Acct_open_dt Tran_date Trans_time Tran Date and time are together under a level field called trans-date-time, if it makes any difference.
Looks good. The only issue left then is |
Other than the wrong values above, Value: 00829 , Debug: 203030383239 |
From this example I realized that your data is ASCII, not EBCDIC. The EBCDIC encoding of 0082918 is F0F0F8F2F9F1F8
It is straightforward. One character is 1 byte. |
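A quick way to check which encoding a debug HEX value uses is to decode it outside of Spark. The sketch below is plain Python; `cp037` is one common EBCDIC code page and is an assumption here, since the thread does not say which code page applies.

```python
# Debug value quoted in the comment above.
debug_hex = "203030383239"
raw = bytes.fromhex(debug_hex)

# Bytes in the 0x20-0x39 range decode cleanly as ASCII: a space followed by digits.
as_ascii = raw.decode("ascii")
print(repr(as_ascii))  # ' 00829'

# The same kind of digit string in EBCDIC uses 0xF0-0xF9, one byte per character.
ebcdic_hex = "0082918".encode("cp037").hex().upper()
print(ebcdic_hex)  # F0F0F8F2F9F1F8
```

If the debug column shows bytes like `30`-`39` for digits, the file is most likely ASCII text; bytes like `F0`-`F9` point to EBCDIC.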
Hi @yruslan |
Hi @noumanhk34 , please provide a small example of such a file with a copybook. Even one field copybook is sufficient. |
AMXTCB02-CB-OPEN -DTE PIC S9(09) COMP-3 @yruslan |
Hi @noumanhk34 , what does the data look like for this field? You can use
as above to extract HEX dump of raw values. Please, send a couple of examples where values are coming as |
@yruslan Please find below code and output field_1 field_1_debug |
I see. Indeed,
Also, Cobrix does not support date fields since there is no date type in copybooks. Dates in copybooks are represented by numbers, and each mainframe system stores dates in a different way. What you can do is post-processing. Once you have the dataframe, you can convert numeric fields to dates by applying the date format pattern. It looks like your copybook and your data file do not match. If you want, you can attach a data file and the copybook and I can take a look. Keep in mind that this is a public chat, so the data you attach is going to be visible to everyone. |
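One way to sanity-check whether a copybook field matches the data is to compare the number of bytes the field should occupy with the length of its debug HEX value. A minimal sketch, assuming the standard packed decimal layout (one nibble per digit, plus one sign nibble unless the field has none); the function name is hypothetical:

```python
def comp3_length_bytes(num_digits: int, has_sign_nibble: bool = True) -> int:
    """Bytes needed for a packed decimal field with the given digit count."""
    nibbles = num_digits + (1 if has_sign_nibble else 0)
    return (nibbles + 1) // 2  # round up to whole bytes


print(comp3_length_bytes(3))   # PIC S9(3) COMP-3     -> 2
print(comp3_length_bytes(9))   # PIC S9(09) COMP-3    -> 5
print(comp3_length_bytes(17))  # PIC S9(15)V99 COMP-3 -> 9
```

For example, a PIC S9(09) COMP-3 field would be expected to show 10 hex characters in its debug column. A different length, or text bytes instead of packed digits, suggests the copybook and the file are out of step.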
No I can not attach |
its date in 09282005 format |
You can convert numbers to dates in the specified format with post-processing such as:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
// Assumes a SparkSession is already available as 'spark' (as in spark-shell or a notebook)
import spark.implicits._
// Let's assume we get these numbers from a file (column is 'n')
val dfInput = List(9282005, 10282005).toDF("n")
// Convert the column 'n' to a date and output it to the column 'd'.
// lpad restores the leading zero, so 9282005 is parsed as "09282005".
val dfOutput = dfInput.withColumn("d", to_date(lpad(col("n"), 8, "0"), "MMddyyyy"))
dfOutput.show()
For this example the output is
|
@yruslan I mean that the packed field contains a date in this format |
@yruslan this debug data is a date in the given format; it might be stored as a string data type |
@yruslan I have an ASCII file which has multiple segments such as HEADER, BASE SEGMENT, J1 SEGMENT, J2 SEGMENT, K4 SEGMENT, L1 SEGMENT and TRAILER, |
Yes, multi-segment ASCII files are supported. And yes, you can use multiple copybooks. But in practice, since copybooks usually have parts common across segments as well as segment-specific parts, we usually create a single copybook for parsing multi-segment files. In multi-segment copybooks each segment is a GROUP, and each segment GROUP uses REDEFINES to ensure only one group is used per record. You can take a look at multi-segment examples in README: |
We are using cobrix to convert the mainframe EBCDIC file. Below are the problematic data fields:
XXX-TRANSACTION-AMOUNT PIC S9(15) V99 COMP-3
We are not able to convert these fields correctly. I suspect we are running into issues because of the sign field, and the values are coming as NULL.
All the other fields are coming correctly.
# Parentheses let the chained calls span several lines in Python
cobolDataframe = (
    spark.read.format("za.co.absa.cobrix.spark.cobol.source")
    .option("copybook", "dbfs:/FileStore/Optis Test/copybook.txt")
    .option("record_format", "D")
    .option("is_rdw_big_endian", "true")
    .option("rdw_adjustment", -4)
    .load("dbfs:/FileStore/Optis Test/inputfile.txt")
)
thanks for the help