-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-31496 Allow field translation that only removes fields - and does not add blank values #18496
Conversation
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31496 Jirabot Action Result: |
@ghalliday Is this the behaviour that you were imagining? Is this the best place to add the PayloadRemoveOnly logic? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are other references to RecordTranslationMode::Payload that you might want to change, if the intention was to change the default.
But I don't think as coded this actually works as intended.
common/thorhelper/thorcommon.cpp
Outdated
if (publishedCrc && expectedCrc && (publishedCrc != expectedCrc)) | ||
{ | ||
DBGLOG("Overriding stored record layout reading file %s", tracing); | ||
if (expectedFormat->queryRecordAccessor(true).getNumFields() > publishedFormat->queryRecordAccessor(true).getNumFields()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is supposed to check that fields were only removed and not added, but how would it handle a case where fields were both added and removed (but more were removed than added)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the check to use match_none. Now it strictly checks for added fields.
roxie/ccd/ccdfile.cpp
Outdated
if (formatCrcs.item(idx) && expectedFormatCrc && (formatCrcs.item(idx) != expectedFormatCrc)) | ||
{ | ||
DBGLOG("Overriding stored record layout reading file %s", subname); | ||
if (expected->queryRecordAccessor(true).getNumFields() > diskTypeInfo.item(idx)->queryRecordAccessor(true).getNumFields()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as before.
rtl/eclrtl/rtldynfield.hpp
Outdated
@@ -110,7 +110,8 @@ enum class RecordTranslationMode : byte | |||
Payload = 2, // Translate all fields in datasets, and only payload fields in indexes | |||
AlwaysDisk = 3, // Always translate - even if wouldn't normally (e.g. csv/xml source read as binary), or crcs happen to match | |||
AlwaysECL = 4, // Ignore the published format - can make sense to force no translation e.g. when field names have changed | |||
Unspecified = 5 | |||
Unspecified = 5, | |||
PayloadRemoveOnly = 6 // Allow fields to be removed from the incoming dataset, but not allow fields to be missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have made Unspecified be 6... these values are not persisted anywhere AFAIK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed Unspecified to 6.
@jackdelv following on from Richard's comments. You need to add a method to IDynamicTransform e.g. hasNewFields(). Take a look at match_none in FieldMatchType which should be set if that is the case. |
@richardkchapman @ghalliday I addressed your comments and made appropriate changes. However, I am uncertain how to verify that I have covered all the cases where hasNewFields needs to be called. I looked in the repo for uses of "RecordTranslationMode::Payload", "payload", "needsTranslate". Should there be a check for added fields everywhere needsTranslate is used? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it is close to being ready
- needs to be a non-draft PR
- I think it is missing a check in slavmain.cpp(~355)
- Missing some places in hthorkey.cpp (search for canTranslate())
I am not quite sure what version to target. Probably 9.6.x. An alternative, is to target the change to 9.4.x, by the change in defaults to 9.6..x.
@jakesmith thoughts?
helm/hpcc/values.schema.json
Outdated
"default": "payload", | ||
"enum": ["false", "true", "payload"], | ||
"default": "PayloadRemoveOnly", | ||
"enum": ["false", "true", "PayloadRemoveOnly"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
payload still needs to be in the list as a valid option. It would be better to have consistent casing with the other options - a leading lower case letter.
hpcc:tooltip="Enables translation (where possible) of mismatched index layouts on-the-fly. Specify 'payload' to attempt to translate payload fields only"> | ||
<xs:simpleType> | ||
<xs:restriction base="xs:string"> | ||
<xs:enumeration value="false" hpcc:displayName="False" hpcc:desciption=""/> | ||
<xs:enumeration value="true" hpcc:displayName="true" hpcc:description=""/> | ||
<xs:enumeration value="payload" hpcc:displayName="Payload" hpcc:description=""/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enumeration needs to still include payload. Same for all other instances.
rtl/eclrtl/rtldynfield.cpp
Outdated
@@ -60,6 +62,7 @@ extern ECLRTL_API const char *getTranslationModeText(RecordTranslationMode val) | |||
case RecordTranslationMode::AlwaysDisk: return "alwaysDisk"; | |||
case RecordTranslationMode::AlwaysECL: return "alwaysECL"; | |||
case RecordTranslationMode::Payload: return "payload"; | |||
case RecordTranslationMode::PayloadRemoveOnly: return "PayloadRemoveOnly"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leading lower case for consistency
@ghalliday I add payload back as an option and changed the casing to be more consistent. I also added checks in slavmain.cpp and hthorkey.cpp. Back to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackdelv - added a couple of minor comments.
common/thorhelper/thorread.cpp
Outdated
@@ -176,6 +176,9 @@ void DiskReadMapping::ensureTranslators() const | |||
if (!translator->canTranslate()) | |||
throw MakeStringException(0, "Untranslatable record layout mismatch detected for file %s", filename); | |||
|
|||
if (mode == RecordTranslationMode::PayloadRemoveOnly && translator->hasNewFields()) | |||
throw MakeStringException(0, "Untranslatable record layout mismatch detected for file %s. Expected fields were not found in source.", filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatting: indentation off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
ecl/hthor/hthorkey.cpp
Outdated
@@ -700,6 +700,8 @@ const IDynamicTransform * CHThorIndexReadActivityBase::getLayoutTranslator(IDist | |||
Owned<const IDynamicTransform> payloadTranslator = createRecordTranslator(projectedFormat->queryRecordAccessor(true), actualFormat->queryRecordAccessor(true)); | |||
if (!payloadTranslator->canTranslate()) | |||
throw MakeStringException(0, "Untranslatable key layout mismatch reading index %s", f->queryLogicalName()); | |||
if (getLayoutTranslationMode() == RecordTranslationMode::PayloadRemoveOnly && payloadTranslator->hasNewFields()) | |||
throw MakeStringException(0, "Untranslatable record layout mismatch detected for file %s. Expected fields were not found in source.", f->queryLogicalName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatting: indentation off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
common/thorhelper/thorcommon.cpp
Outdated
@@ -2148,6 +2148,9 @@ static bool getTranslators(Owned<const IDynamicTransform> &translator, Owned<con | |||
if (!translator->canTranslate()) | |||
throw MakeStringException(0, "Untranslatable record layout mismatch detected for file %s", tracing); | |||
|
|||
if (mode == RecordTranslationMode::PayloadRemoveOnly && translator->hasNewFields()) | |||
throw MakeStringException(0, "Untranslatable record layout mismatch detected for file %s. Expected fields were not found in source.", tracing); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: It is translatable, but mode prohibits it. Consider different message here (and similarly elsewhere).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed message.
@@ -68,7 +68,7 @@ | |||
enableKeyDiff="true" | |||
enableSysLog="true" | |||
fastLaneQueue="true" | |||
fieldTranslationEnabled="payload" | |||
fieldTranslationEnabled="payloadRemoveOnly" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be consistent with testing/regress/environment.xml.in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes between initfiles/etc/DIR_NAME/environment.xml.in and testing/regress/environment.xml.in look consistent to me. I'm probably misunderstanding what you mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, I missed the testing/regress change.
I think it's safe enough to go into 9.4 if we leave the default as 'payload'. And if many are going to stick with 9.4 for a while, and want this feature, they'll have it there. So I'd vote for this targeting 9.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
@jackdelv the regression tests will need to be updated to set the option in the work unit to override the default |
@ghalliday There are still some regression tests failing like sqtrans.ecl that hadn't previously set a value for the translationMode. Is overriding the default in the ecl files correct, or did you mean something different? |
@jackdelv yes, please add a similar #option into those files to force the old payload option on. |
@ghalliday I believe all the necessary regression tests have been fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackdelv - looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just re-running k8s tests.
Also debating whether this should go in 9.6.x. It is a bit late - because it will potentially cause existing queries to fail, which shouldn't happen in a point release. However, without it there is no safe way of enabiling field translation if you do not want missing fields to be allowed.
Possibly one way forward would be to merge it to 9.6.x with payload as the default, and then change the default for master/9.8.x.
Please squash, and lets talk about the target build. |
…s not add blank values
@ghalliday Squashed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackdelv please can you split this PR in two.
One set of changes that targets 9.6.x which adds the code to process everything, but does not change the default. Then a second PR targeting master that changes the default.
Type of change:
Checklist:
Smoketest:
Testing: