-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-31584 Add ECL Record information to Parquet #18514
HPCC-31584 Add ECL Record information to Parquet #18514
Conversation
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31584 Jirabot Action Result: |
* @param currTable The index of the Table to read columns from. | ||
* @return __int64 The number of rows in the current Table. | ||
*/ | ||
__int64 ParquetReader::readColumns(__int64 currTable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a public method but relies on expectedRecord to be non-null to avoid crashing. The one caller of this function in this class does check for that, but since the method is public it could be called from elsewhere. Is this a concern? Should the method be protected or private instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a concern at the moment because it is only called in one place. I think it would be best to make the method private because it probably won't ever need to be called outside of that one codepath where expectedRecord is used to decide which method to call for reading the columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were some additional methods that should not be called anywhere else. I have made them private as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question. Also, probably needs to be merged after your other change has been upmerged.
@@ -293,6 +321,7 @@ std::shared_ptr<parquet::arrow::RowGroupReader> ParquetReader::queryCurrentTable | |||
tables += fileTableCounts[i]; | |||
if (currTable < tables) | |||
{ | |||
currentTableMetadata = parquetFileReaders[i]->parquet_reader()->metadata()->Subset({static_cast<int>(currTable - offset)}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Does this assume that a subset of the data is always the leading fields? I couldn't find the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean by leading fields, but the field information for each subset should be the same. The only thing that could change is the number of rows in the RowGroup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ok. That makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackdelv please squash
99e8033
to
174f8ea
Compare
@ghalliday Squashed. |
Type of change:
Checklist:
Smoketest:
Testing: