Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASSSIDECAR-162: Sidecar endpoint to GET sstable's preemptive interval value #152

Open
wants to merge 6 commits into
base: trunk
Choose a base branch
from

Conversation

skoppu22
Copy link

@skoppu22 skoppu22 commented Dec 3, 2024

A new endpoint is developed in Sidecar, to retrieve value of C*'s sstable_preemptive_open_interval value. Currently end users have to invoke C*'s JMX call, which requires C* superuser role to invoke any JMX operation. With this change, end users can just use this endpoint, which abstracts underlying C*'s version dependencies/changes. We can enable authorization checks for these endpoints once we have authorization developed in Sidecar

Circle CI run: https://app.circleci.com/pipelines/github/skoppu22/cassandra-sidecar?branch=jmxep1

/**
* @return metrics related to invocation of C* JMX operations
*/
JmxOperationsMetrics jmxOperationsMetrics();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these Jmx operations metrics be part of a different PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first Sidecar endpoint for a jmx operation, we need this, we will use the same for upcoming endpoints as well which invoke C* JMX operations. I understand there are existing calls as well in Sidecar invoking C* JMX, we will need to modify them to publish metrics. I prefer to have JmxOperationsMetrics accessible in JmxClient.java and record metrics there, that way each endpoint doesn't have to worry about publishing JMX calls metrics, instead JmxClient will take care of this for every JMX call being made, but I need to see the possibility of this.

On another thought, do we really need to track metrics of JMX calls? Does JVM (or something in Java world) provide this already? If so, we don't need this.

Note: this is different from Sidecar endpoints invocation metrics, which are already provided by vertx.

throw cassandraServiceUnavailable();
}


Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space

StorageOperations storageOperations = delegate == null ? null : delegate.storageOperations();
if (storageOperations == null)
{
throw cassandraServiceUnavailable();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add some text to this error?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cassandraServiceUnavailable currently says "Cassandra service is unavailable". The same message goes back in client's response. As this is an internal error, adding further info may leak internal info to the client. We use the same cassandraServiceUnavailable almost everywhere without adding further details, must be the same reason I am assuming.

Any other kind of exception is caught in AbstractHandler's handle() and logged and updated in the response.

{
if (result.succeeded())
{
jmxOperationsMetrics.recordTimeTaken(operationName + "Succeeded",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record time taken method, I think we could explore the possibility of splitting out the Succeeded/Failed into its own Status parameter.

Copy link
Author

@skoppu22 skoppu22 Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we split the status, the metric would be like (, , ), i.e, that will be triplet. And also we may need count of failures to trigger escalation if needed, i.e, count after filtering on first two in this triplet. I am not sure is it feasible with metric system and dashboards we have now (I am not very good in these).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember doing the same in C* as well, creating metric name on the fly based on the status, something like operationSucceeded or operationFailed and publishing the time which also gives us the failure count indirectly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed metrics code from this PR, will open separate PR just for JMX calls metrics

// Endpoint to retrieve sstable's preemptiveOpenInterval value.
// Value returned is in MB, may return negative value when disabled
private static final String SSTABLE = "/sstable";
public static final String SSTABLE_PREEMPTIVE_OPEN_INTERVAL_ROUTE = API_V1 + CASSANDRA + SSTABLE +
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we are reusing the getSsTablePreemtiveOpenIntervalInMB JMX operation, but, from a pure API standpoint, we are omitting the MB part in the endpoint, but returning it as part of the response (SSTablePreemptiveOpenIntervalInMB). My suggestion for this would be to either:

  • We add the MB part to the endpoint (/preemptive-open-interval-in-mb)
    Or
  • We just support an optional unit filtering parameter in the url, that can be extended in the future with other unit types if needed. That way, the url would end in /preemptive-open-interval?unit=mb (remember that the unit is optional with MB as a default, so no need to pass it from the client).

That would make us have to modify the response to something like:

{
  "SSTablePreemptiveOpenInterval": 1234,
  "unit": "MB"
}

What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We want to make endpoint generic, so underlying changes to C* shouldn't impact the endpoint usage for the end user, for example C* 6.0 returning value in different unit, while older versions continue to return the value in MB, the same endpoint should work regardless of the running c* version. Hence not including MB in the endpoint name. Whereas JMX operation name need to match exactly the JMX endpoint we are invoking. The same with passing option param like unit=MB, better we avoid supporting new unit param in the endpoint whenever C* changes for different versions.

  • In the response we need to somehow tell the end user that value returned is in in MB. I preferred the name SSTablePreemptiveOpenIntervalInMB as it matches the JMX operation currently the end users are calling. Your suggestion of separating the unit into another field sounds good, but that may complicate callers's code, as they need to parse unit as a string and convert value into those units accordingly. C* latest code changed this yaml param to for example "60MiB", let me see which one is simpler for the calling code "60MiB" or {60, "MiB"} and will get back to you

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did explore this a little, I believe the one you suggested /preemptive-open-interval?unit=mb is more easy way to implement and also is in-line with current JMX call semantics. Other alternatives unnecessarily complicates conversions from human readable format to byte size at the client.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added unit=MiB param, unit names matches with names C* is using.

}));
}

private void verifyResponse(VertxTestContext context, HttpResponse<Buffer> response, String expectedValue)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: verifyValidResponse?

@@ -213,4 +216,17 @@ public void outOfRangeDataCleanup(@NotNull String keyspace, @NotNull String tabl
jmxClient.proxy(StorageJmxOperations.class, STORAGE_SERVICE_OBJ_NAME)
.forceKeyspaceCleanup(concurrency, keyspace, table);
}

@Override
public GetPreemptiveOpenIntervalResponse getSSTablePreemptiveOpenInterval(DataStorageUnit unit)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed 'inMB' from StorageOperations function names to make them generic for supporting other units later on.

* Invokes C* StorageServiceMBean's JMX function getSSTablePreemptiveOpenIntervalInMB
* @return the same value returned by the JMX operation
*/
int getSSTablePreemptiveOpenIntervalInMB();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only place we use 'inMB', this should exactly match C* JMX function name

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants