Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-30368 Add packagemap support for remoteStorage property #19148

Conversation

asselitx
Copy link
Contributor

@asselitx asselitx commented Sep 24, 2024

Similar to how the daliip property is supported in the packagemap file, add support for a remoteStorage property that mirrors the --remote-storage CLI option.

  • Apply the same scoping rules as for daliip- values defined deeper in the hierarchy take precedence, and anything defined in the file takes precedence over a CLI option.
  • Disallow use of both daliip and remoteStorage in the same element.

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Manually ran seven tests on constellation of three clusters:

  • A local k8s roxiecluster that is the target of packagemap and query deployments, configured to copy files locally.
  • A second local k8s thorcluster configured as a remote storage location for the first.
  • The Alpha30 used as a daliip source.

Each test verified that the nonlocal files were copied to local storage:

  1. Package File <SuperFile> remoteStorage overrides CLI daliip. This is the case needed by the requester.
  2. Package File <Package> remoteStorage overrides <PackageMaps> remoteStorage
  3. Package File <SuperFile> remoteStorage overrides <PackageMaps> remoteStorage
  4. Package File <SuperFile> daliip overrides CLI remote-storage, but there is another <SuperFile> in the package file that correctly resolves with the CLI remote-storage value.
  5. CLI daliip
  6. CLI remote-storage
  7. Package File <SuperFile remoteStorage> overrides <PackageMap daliip>

@asselitx asselitx requested a review from afishbeck September 24, 2024 22:04
@asselitx
Copy link
Contributor Author

asselitx commented Sep 24, 2024

Before moving out of draft, there are a few tasks to complete:

  • Test more combinations of options for using the daliip and remoteStorage together and at different levels of the packagemap file.
  • Ensure I'm targeting the right branch and rebase if required.
  • Create Jira for documentation change

Copy link

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-30368

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

@asselitx
Copy link
Contributor Author

@afishbeck My biggest questions are:

  1. Is the method of storing the 'effective' remoteStorage name in the ReferencedFile object sound, or is there a cleaner approach I didn't see?
  2. Is it reasonable to keep a separate function (splitDerivedDfsLocationOrRemote) for the purposes of processing the packagemap files (as opposed to where splitDerivedDfsLocation is used elsewhere)? It seemed that adding more features to the existing function would be over-complicating something that I would already have a hard time explaining what it did.

else
remoteStorage.set(remoteLocation);
file.resolveLocalOrRemote(locations, srcCluster, user, remoteStorage, remotePrefix, checkLocalFirst, expandSuperFiles ? &subfiles : NULL, trackSubFiles, resolveLFNForeign);
remoteStorage.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the ability to clear this here mean that remoteStorage is a class member variable that is just being used here for temporary storage? Is that necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will re-confirm that this is the case, but I recall that resolveLocalOrRemote uses the class member variable for its work. But then I was concerned about side-effects since it wasn't previously set at this location. Perhaps it isn't necessary and should be a local temp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't correct and is adjusted in the next commit.

const char *defaultCluster, const char *baseCluster, const char *baseIP, const char *basePrefix,
const char *currentRemoteStorage, StringBuffer& effectiveRemoteStorage, const char *baseRemoteStorage)
{
if (!isEmptyString(address) && !isEmptyString(currentRemoteStorage))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that it shouldn't happen at the same level. I think you do support what Ananth wants which is to for example have the packagemap define a remoteStorage, but some packages inside it point to a daliIP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it should support that case, but I need to confirm with a test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested behavior where a CLI setting of either daliip or remote-storage is overridden by either daliip or remoteStorage set in the packagemap file. Also, those properties defined in the file have a precedence over each other- a property defined in an element will override the value defined in its ancestors.

@afishbeckln
Copy link
Contributor

@afishbeck My biggest questions are:

  1. Is the method of storing the 'effective' remoteStorage name in the ReferencedFile object sound, or is there a cleaner approach I didn't see?
  2. Is it reasonable to keep a separate function (splitDerivedDfsLocationOrRemote) for the purposes of processing the packagemap files (as opposed to where splitDerivedDfsLocation is used elsewhere)? It seemed that adding more features to the existing function would be over-complicating something that I would already have a hard time explaining what it did.
  1. Putting it in ReferencedFile is likely the least complicated option without doing some refactoring over all. If huge numbers of ReferencedFiles were being used you could find ways of reducing memory usage but I don't think that's a huge concern?

  2. Yes I think the way you organized the change to the splitDerivedABC calls makes sense.

@ghalliday ghalliday removed the request for review from afishbeck September 25, 2024 15:52
@asselitx asselitx requested a review from ghalliday September 25, 2024 16:12
@asselitx asselitx marked this pull request as ready for review September 25, 2024 16:13
@asselitx
Copy link
Contributor Author

@ghalliday Finalizing is still pending on running a few tests.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see any logic flaws, but a couple of instances where the variable names are confusing.

{
StringBuffer ip;
StringBuffer cluster;
StringBuffer prefix;
splitDerivedDfsLocation(superFile.queryProp("@daliip"), cluster, ip, prefix, NULL, _cluster, _ip, _prefix);
StringBuffer remoteStorageName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing having a variable remoteStorageName and _remoteStorageName. Can you rename one (or both) to clarify the difference.
Same applies to the next function.

@@ -126,13 +126,38 @@ void splitDerivedDfsLocation(const char *address, StringBuffer &cluster, StringB
prefix.append(basePrefix);
}

void splitDerivedDfsLocationOrRemote(const char *address, StringBuffer &cluster, StringBuffer &ip, StringBuffer &prefix,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't suggest it for this PR (or even as a priority), but this function has too many parameters, with little logical ordering - if it was refactored it would be good to think about which parameters should be grouped together, and possibly moved into a class.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see any problems.
@afishbeckln please take another look

Copy link
Contributor

@afishbeckln afishbeckln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@ghalliday
Copy link
Member

@asselitx please squash.

Similar to how the daliip property is supported in the packagemap file, add
support for a remoteStorage property that mirrors the `--remote-storage` CLI
option.

 - Apply the same scoping rules as for daliip- values defined deeper in the
hierarchy take precedence, and anything defined in the file takes precedence
over a CLI option.
 - Disallow use of both daliip and remoteStorage in the same element.

Signed-off-by: Terrence Asselin <[email protected]>
@asselitx asselitx force-pushed the packagemap-override-hpcc-30368 branch from c400b9b to a189286 Compare October 14, 2024 14:20
@ghalliday ghalliday merged commit 260f93e into hpcc-systems:candidate-9.8.x Oct 14, 2024
52 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants