Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix json inputs for drill windowing tests #15148

Merged
merged 10 commits into from
Oct 19, 2023

Conversation

kgyrtkirk
Copy link
Member

@kgyrtkirk kgyrtkirk commented Oct 13, 2023

Apparently Drill had some issue with parquet files up to 1.10 ; they have added a feature to fix those corruptions.

Unfortunately - we are working with parquet files which do had that issue...

This PR:

  • adds a flag to JsonToParquet to do the fix during conversion
  • updates the json files to more correct conents
  • some resultset mismatches were fixed by this
  • updates parquet to 1.13.1

@kgyrtkirk kgyrtkirk marked this pull request as ready for review October 13, 2023 10:02
@kgyrtkirk
Copy link
Member Author

@imply-cheddar , @rohangarg could you please take a look?

*/
@Nullable
private static Object convertField(Group g, String fieldName, boolean binaryAsString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking, this binaryAsString argument. That was being passed around but never actually used?

Any changes you want to make to ParquetToJson are good, that code is just for our own dev purposes, but these changes in ParquetGroupConverter is changing the code that our current parquet-based file ingestions end up using. So, I just want to double check that this cleanup is truly a redundant argument and not just something that wasn't needed for ParquetToJson but used for the other production code paths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binaryAsString was passed around in private static methods in a class on which an instance method was called first....so I've choosen to remove them (and use the implict class access - to get it where needed) ; instead of adding another boolean to every static method


public ParquetGroupConverter(boolean binaryAsString)
public ParquetGroupConverter(boolean binaryAsString, boolean convertCorruptDates)
{
this.binaryAsString = binaryAsString;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is binaryAsString still used somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes; it should be used around here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a design nit: with the change to remove the static label from the methods, I would expect them to start coming after the constructor. That is, we tend to follow the code flow of static first, then constructor then class methods. I noticed that it was inverted because I kept searching for the constructor at the top of the file and didn't see it, and then realized it was at the bottom and that's because the methods used to be static but now are not.

The current structure reads really nicely for the diff though, I hope that the whitespace change that I'm nit picking doesn't screw up the diff...

@abhishekagarwal87 abhishekagarwal87 merged commit 9fb0dbf into apache:master Oct 19, 2023
81 checks passed
@kgyrtkirk
Copy link
Member Author

Thank you @imply-cheddar and @abhishekagarwal87 for reviewing and merging the changes!

kgyrtkirk added a commit to kgyrtkirk/druid that referenced this pull request Oct 19, 2023
This PR:

adds a flag to JsonToParquet to do the fix during conversion
updates the json files to more correct conents
some resultset mismatches were fixed by this
updates parquet to 1.13.1

(cherry picked from commit 9fb0dbf)
rohangarg pushed a commit that referenced this pull request Oct 20, 2023
This PR:

adds a flag to JsonToParquet to do the fix during conversion
updates the json files to more correct conents
some resultset mismatches were fixed by this
updates parquet to 1.13.1

(cherry picked from commit 9fb0dbf)
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
This PR:

adds a flag to JsonToParquet to do the fix during conversion
updates the json files to more correct conents
some resultset mismatches were fixed by this
updates parquet to 1.13.1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants