Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-31496 Allow field translation that only removes fields - and does not add blank values #18496

Closed
wants to merge 1 commit into from

Conversation

jackdelv
Copy link
Contributor

@jackdelv jackdelv commented Apr 5, 2024

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copy link

github-actions bot commented Apr 5, 2024

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31496

Jirabot Action Result:
Workflow Transition: Merge Pending
Updated PR

@jackdelv
Copy link
Contributor Author

jackdelv commented Apr 5, 2024

@ghalliday Is this the behaviour that you were imagining? Is this the best place to add the PayloadRemoveOnly logic?

@jackdelv jackdelv requested a review from ghalliday April 5, 2024 12:12
Copy link
Member

@richardkchapman richardkchapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other references to RecordTranslationMode::Payload that you might want to change, if the intention was to change the default.

But I don't think as coded this actually works as intended.

if (publishedCrc && expectedCrc && (publishedCrc != expectedCrc))
{
DBGLOG("Overriding stored record layout reading file %s", tracing);
if (expectedFormat->queryRecordAccessor(true).getNumFields() > publishedFormat->queryRecordAccessor(true).getNumFields())
Copy link
Member

@richardkchapman richardkchapman Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is supposed to check that fields were only removed and not added, but how would it handle a case where fields were both added and removed (but more were removed than added)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the check to use match_none. Now it strictly checks for added fields.

if (formatCrcs.item(idx) && expectedFormatCrc && (formatCrcs.item(idx) != expectedFormatCrc))
{
DBGLOG("Overriding stored record layout reading file %s", subname);
if (expected->queryRecordAccessor(true).getNumFields() > diskTypeInfo.item(idx)->queryRecordAccessor(true).getNumFields())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before.

@@ -110,7 +110,8 @@ enum class RecordTranslationMode : byte
Payload = 2, // Translate all fields in datasets, and only payload fields in indexes
AlwaysDisk = 3, // Always translate - even if wouldn't normally (e.g. csv/xml source read as binary), or crcs happen to match
AlwaysECL = 4, // Ignore the published format - can make sense to force no translation e.g. when field names have changed
Unspecified = 5
Unspecified = 5,
PayloadRemoveOnly = 6 // Allow fields to be removed from the incoming dataset, but not allow fields to be missing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have made Unspecified be 6... these values are not persisted anywhere AFAIK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed Unspecified to 6.

@ghalliday
Copy link
Member

@jackdelv following on from Richard's comments. You need to add a method to IDynamicTransform e.g. hasNewFields(). Take a look at match_none in FieldMatchType which should be set if that is the case.

@jackdelv
Copy link
Contributor Author

jackdelv commented Apr 10, 2024

@richardkchapman @ghalliday I addressed your comments and made appropriate changes. However, I am uncertain how to verify that I have covered all the cases where hasNewFields needs to be called. I looked in the repo for uses of "RecordTranslationMode::Payload", "payload", "needsTranslate". Should there be a check for added fields everywhere needsTranslate is used?

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is close to being ready

  • needs to be a non-draft PR
  • I think it is missing a check in slavmain.cpp(~355)
  • Missing some places in hthorkey.cpp (search for canTranslate())

I am not quite sure what version to target. Probably 9.6.x. An alternative, is to target the change to 9.4.x, by the change in defaults to 9.6..x.
@jakesmith thoughts?

"default": "payload",
"enum": ["false", "true", "payload"],
"default": "PayloadRemoveOnly",
"enum": ["false", "true", "PayloadRemoveOnly"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

payload still needs to be in the list as a valid option. It would be better to have consistent casing with the other options - a leading lower case letter.

hpcc:tooltip="Enables translation (where possible) of mismatched index layouts on-the-fly. Specify 'payload' to attempt to translate payload fields only">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="false" hpcc:displayName="False" hpcc:desciption=""/>
<xs:enumeration value="true" hpcc:displayName="true" hpcc:description=""/>
<xs:enumeration value="payload" hpcc:displayName="Payload" hpcc:description=""/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enumeration needs to still include payload. Same for all other instances.

@@ -60,6 +62,7 @@ extern ECLRTL_API const char *getTranslationModeText(RecordTranslationMode val)
case RecordTranslationMode::AlwaysDisk: return "alwaysDisk";
case RecordTranslationMode::AlwaysECL: return "alwaysECL";
case RecordTranslationMode::Payload: return "payload";
case RecordTranslationMode::PayloadRemoveOnly: return "PayloadRemoveOnly";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leading lower case for consistency

@ghalliday ghalliday requested review from jakesmith and removed request for richardkchapman April 11, 2024 09:38
@ghalliday ghalliday dismissed richardkchapman’s stale review April 11, 2024 09:38

Issue has been addressed

@ghalliday ghalliday requested review from richardkchapman and removed request for richardkchapman April 11, 2024 09:39
@jackdelv jackdelv marked this pull request as ready for review April 11, 2024 13:52
@jackdelv
Copy link
Contributor Author

@ghalliday I add payload back as an option and changed the casing to be more consistent. I also added checks in slavmain.cpp and hthorkey.cpp. Back to you.

@jackdelv jackdelv requested a review from ghalliday April 11, 2024 13:54
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - added a couple of minor comments.

@@ -176,6 +176,9 @@ void DiskReadMapping::ensureTranslators() const
if (!translator->canTranslate())
throw MakeStringException(0, "Untranslatable record layout mismatch detected for file %s", filename);

if (mode == RecordTranslationMode::PayloadRemoveOnly && translator->hasNewFields())
throw MakeStringException(0, "Untranslatable record layout mismatch detected for file %s. Expected fields were not found in source.", filename);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting: indentation off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -700,6 +700,8 @@ const IDynamicTransform * CHThorIndexReadActivityBase::getLayoutTranslator(IDist
Owned<const IDynamicTransform> payloadTranslator = createRecordTranslator(projectedFormat->queryRecordAccessor(true), actualFormat->queryRecordAccessor(true));
if (!payloadTranslator->canTranslate())
throw MakeStringException(0, "Untranslatable key layout mismatch reading index %s", f->queryLogicalName());
if (getLayoutTranslationMode() == RecordTranslationMode::PayloadRemoveOnly && payloadTranslator->hasNewFields())
throw MakeStringException(0, "Untranslatable record layout mismatch detected for file %s. Expected fields were not found in source.", f->queryLogicalName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting: indentation off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -2148,6 +2148,9 @@ static bool getTranslators(Owned<const IDynamicTransform> &translator, Owned<con
if (!translator->canTranslate())
throw MakeStringException(0, "Untranslatable record layout mismatch detected for file %s", tracing);

if (mode == RecordTranslationMode::PayloadRemoveOnly && translator->hasNewFields())
throw MakeStringException(0, "Untranslatable record layout mismatch detected for file %s. Expected fields were not found in source.", tracing);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: It is translatable, but mode prohibits it. Consider different message here (and similarly elsewhere).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed message.

@jackdelv jackdelv requested a review from jakesmith April 15, 2024 12:41
@@ -68,7 +68,7 @@
enableKeyDiff="true"
enableSysLog="true"
fastLaneQueue="true"
fieldTranslationEnabled="payload"
fieldTranslationEnabled="payloadRemoveOnly"
Copy link
Member

@jakesmith jakesmith Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be consistent with testing/regress/environment.xml.in.

Copy link
Contributor Author

@jackdelv jackdelv Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes between initfiles/etc/DIR_NAME/environment.xml.in and testing/regress/environment.xml.in look consistent to me. I'm probably misunderstanding what you mean.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I missed the testing/regress change.

@jakesmith
Copy link
Member

I am not quite sure what version to target. Probably 9.6.x. An alternative, is to target the change to 9.4.x, by the change in defaults to 9.6..x.
@jakesmith thoughts?

I think it's safe enough to go into 9.4 if we leave the default as 'payload'. And if many are going to stick with 9.4 for a while, and want this feature, they'll have it there. So I'd vote for this targeting 9.4

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@ghalliday
Copy link
Member

@jackdelv the regression tests will need to be updated to set the option in the work unit to override the default

@jackdelv
Copy link
Contributor Author

@ghalliday There are still some regression tests failing like sqtrans.ecl that hadn't previously set a value for the translationMode. Is overriding the default in the ecl files correct, or did you mean something different?

@jackdelv jackdelv requested a review from ghalliday April 23, 2024 14:20
@ghalliday
Copy link
Member

@jackdelv yes, please add a similar #option into those files to force the old payload option on.

@jackdelv
Copy link
Contributor Author

@ghalliday I believe all the necessary regression tests have been fixed.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - looks good.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just re-running k8s tests.
Also debating whether this should go in 9.6.x. It is a bit late - because it will potentially cause existing queries to fail, which shouldn't happen in a point release. However, without it there is no safe way of enabiling field translation if you do not want missing fields to be allowed.

Possibly one way forward would be to merge it to 9.6.x with payload as the default, and then change the default for master/9.8.x.

@ghalliday
Copy link
Member

Please squash, and lets talk about the target build.

@jackdelv
Copy link
Contributor Author

@ghalliday Squashed.

@jackdelv jackdelv requested a review from ghalliday April 29, 2024 12:03
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv please can you split this PR in two.
One set of changes that targets 9.6.x which adds the code to process everything, but does not change the default. Then a second PR targeting master that changes the default.

@jackdelv jackdelv closed this May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants