HPCC-30446 esp components fail to start in cloud due to invalid metrics #17880

Merged: 1 commit merged into hpcc-systems:candidate-9.4.x on Oct 24, 2023

@kenrowland kenrowland commented Oct 9, 2023

Added code to metrics manager to remove illegal characters from metric name

Signed-Off-By: Kenneth Rowland [email protected]

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@kenrowland kenrowland requested a review from afishbeck October 9, 2023 21:17
@github-actions

github-actions bot commented Oct 9, 2023

Member

@afishbeck afishbeck left a comment

@kenrowland one comment / question.

system/jlib/jmetrics.cpp Outdated Show resolved Hide resolved
@kenrowland kenrowland requested a review from afishbeck October 11, 2023 17:43
Member

@afishbeck afishbeck left a comment

@kenrowland approved

@kenrowland
Contributor Author

@ghalliday Please merge

Member

@ghalliday ghalliday left a comment

@kenrowland
I think the net effect of this change is to prevent the metric being flagged as invalid, but it will not actually affect the name of the metric that is reported.
Are you sure this code should not be in the caller?

@@ -153,6 +153,13 @@ bool MetricsManager::addMetric(const std::shared_ptr<IMetric> &pMetric)
bool rc = false;
std::string name = pMetric->queryName();

// Remove unwanted characters from input name
Member

A couple of comments on this code:
i) It would possibly be better to skip the removal entirely if the string doesn't contain any of those characters
ii) I think you could remove all characters in one go and avoid multiple passes/copies of the name. Something like
name.erase(std::remove_if(name.begin(), name.end(), [](unsigned char x) { return strchr(removeChars, x) != nullptr; }), name.end());
iii) Shouldn't this be the responsibility of the caller to pass a clean name?
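
For reference, points (i) and (ii) together amount to the standard erase-remove idiom with an early exit. The following is only a sketch: `stripChars` is a hypothetical helper name, not something that exists in jlib, and the character list simply mirrors the one in the diff.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// Characters to strip; mirrors the list in the diff under review.
static constexpr char removeChars[] = "_-* ";

// Hypothetical helper (not part of jlib): removes every character found in
// removeChars from name in a single pass, without intermediate copies.
inline void stripChars(std::string &name)
{
    // Point (i): do nothing when the name contains none of the characters.
    if (name.find_first_of(removeChars) == std::string::npos)
        return;

    // Point (ii): std::remove_if shifts the kept characters forward and
    // returns the new logical end; the two-iterator erase trims the tail.
    name.erase(std::remove_if(name.begin(), name.end(),
                              [](unsigned char x) {
                                  // Guard against '\0', which strchr would
                                  // match against the list's terminator.
                                  return x != '\0' && std::strchr(removeChars, x) != nullptr;
                              }),
               name.end());
}
```

Note that the two-iterator form of `erase` matters here: passing only the iterator returned by `std::remove_if` would erase a single character rather than the whole leftover tail.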

Contributor Author

@ghalliday

The bug was most likely raised because the ESP name from a helm chart (probably values.yaml) has a dash in it. The generated ESP service code for execution profiling uses the ESP process name when constructing metric names.

I considered adding code to the generated code to clean up the name prior to registering the metric; however, a comment by Tony in the Jira suggested cleaning it up in the metrics code. The advantage of cleaning in the metrics code is that it relieves every component that uses names sourced from a configuration file (where a dash or other character may be legal) of having to clean the name prior to registering a metric.

If you prefer the caller clean the name first, I can add a utility function to the metrics framework and call it in the generated code prior to registration.

Your thoughts on pushing the responsibility to the caller?

Member

I think @afishbeck's concern was that the service name should not be constrained by the rules of the metrics.

However it would make sense for registerServiceMethodProfilingMetric() to ensure that it is generating a valid metric name before it tries to register it.

It would make sense for the metrics framework either to require that a metric name is valid or to clean it, but with this change it is doing both.
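
A rough sketch of that split of responsibilities, for illustration only. Everything here is hypothetical: `SketchRegistry` stands in for the metrics framework, `makeProfilingName` stands in for the caller, the `<process>.<method>` layout is invented, and the validity rule (letters, digits and periods) is an assumption rather than the framework's actual regex.

```cpp
#include <algorithm>
#include <cctype>
#include <cstring>
#include <set>
#include <string>

// Assumed validity rule for illustration: letters, digits and periods only.
inline bool isValidMetricName(const std::string &name)
{
    return !name.empty() &&
           std::all_of(name.begin(), name.end(), [](unsigned char c) {
               return std::isalnum(c) != 0 || c == '.';
           });
}

// Hypothetical stand-in for the framework: it validates but never rewrites,
// so the name that is registered is always the name that is reported.
class SketchRegistry
{
public:
    bool add(const std::string &name)
    {
        if (!isValidMetricName(name))
            return false;                  // reject, do not clean
        return names.insert(name).second;  // false if already registered
    }

private:
    std::set<std::string> names;
};

// Hypothetical caller-side step: code that builds a name from configuration
// values cleans those values before attempting registration.
inline std::string makeProfilingName(std::string processName, const std::string &method)
{
    static constexpr char strip[] = "_-* ";
    processName.erase(std::remove_if(processName.begin(), processName.end(),
                                     [](unsigned char c) {
                                         return c != '\0' && std::strchr(strip, c) != nullptr;
                                     }),
                      processName.end());
    return processName + "." + method;
}
```

With this arrangement a process named `my-esp` would be rejected if passed through unchanged, while the caller-cleaned name is accepted, and the framework never silently reports a name different from the one it was given.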

Contributor Author

Moved the removal to code in ESPCommon that registers the method profiling metric.

@@ -153,6 +153,13 @@ bool MetricsManager::addMetric(const std::shared_ptr<IMetric> &pMetric)
bool rc = false;
std::string name = pMetric->queryName();

// Remove unwanted characters from input name
char removeChars[] = "_-* ";
Member

constexpr preferred.

Contributor Author

Added

metricName.erase(no_, metricName.end());

// Remove unwanted characters from new metric name
constexpr char removeChars[] = "_-* ";
Member

minor: Where was this list derived from? How about ! : or other characters which will be rejected later?

Contributor Author

@ghalliday
We could add more characters here; however, this method is specific to the creation of a profiling metric for ESP service methods. The intent is to remove characters that the code cannot control because they come from sources where they are allowed, even though the metrics framework forbids them. The sources in this case are the SCM files and the config itself (the name of the ESP process).

For other uses of the framework, the component should be setting the metric name and ensuring it does not have illegal characters. In these cases, it might be the right thing to reject the metric lest the component owner think a metric name contains a character that the final registered name does not. I can see the need for a separate function for cleaning a string that came from the config, which components could use when building metric names.

Certainly we can take the approach that all metric names get scrubbed of illegal characters and add more to this list, but it would grow quite long since the regex for a valid name essentially only allows upper and lower case letters, numbers, and periods (for hierarchical naming).

Thoughts?
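
If the scrub-everything approach were taken, one way to avoid a long list of forbidden characters would be to keep only the allowed ones. A minimal sketch, assuming the allowed set is exactly letters, digits and periods as described above; `scrubToAllowed` is a hypothetical name, not an existing framework function.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: drops every character that is not a letter, a digit
// or a period, instead of enumerating the characters to remove.
inline std::string scrubToAllowed(std::string name)
{
    name.erase(std::remove_if(name.begin(), name.end(),
                              [](unsigned char c) {
                                  return !(std::isalnum(c) != 0 || c == '.');
                              }),
               name.end());
    return name;
}
```

The trade-off noted above still applies: a component owner could end up with a registered name that differs from the one they supplied.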

Member

squash the commits and open a new issue if we see any real life problems.

Member

@ghalliday ghalliday left a comment

@kenrowland please squash
I'll merge as is since it solves the immediate problem, but it is not clear that the list is sufficient.

…trics

Added code to metrics manager to remove illegal characters from metric name

Signed-Off-By: Kenneth Rowland [email protected]
@kenrowland
Contributor Author

@ghalliday Please merge

@ghalliday ghalliday merged commit 0c0a576 into hpcc-systems:candidate-9.4.x Oct 24, 2023
46 of 49 checks passed