-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CASSSIDECAR-162: Sidecar endpoint to GET sstable's preemptive interval value #152
base: trunk
Are you sure you want to change the base?
Conversation
Integration test using dtest Sidecar client changes
/** | ||
* @return metrics related to invocation of C* JMX operations | ||
*/ | ||
JmxOperationsMetrics jmxOperationsMetrics(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these Jmx operations metrics be part of a different PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first Sidecar endpoint for a jmx operation, we need this, we will use the same for upcoming endpoints as well which invoke C* JMX operations. I understand there are existing calls as well in Sidecar invoking C* JMX, we will need to modify them to publish metrics. I prefer to have JmxOperationsMetrics accessible in JmxClient.java and record metrics there, that way each endpoint doesn't have to worry about publishing JMX calls metrics, instead JmxClient will take care of this for every JMX call being made, but I need to see the possibility of this.
On another thought, do we really need to track metrics of JMX calls? Does JVM (or something in Java world) provide this already? If so, we don't need this.
Note: this is different from Sidecar endpoints invocation metrics, which are already provided by vertx.
throw cassandraServiceUnavailable(); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra space
StorageOperations storageOperations = delegate == null ? null : delegate.storageOperations(); | ||
if (storageOperations == null) | ||
{ | ||
throw cassandraServiceUnavailable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add some text to this error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cassandraServiceUnavailable currently says "Cassandra service is unavailable". The same message goes back in client's response. As this is an internal error, adding further info may leak internal info to the client. We use the same cassandraServiceUnavailable almost everywhere without adding further details, must be the same reason I am assuming.
Any other kind of exception is caught in AbstractHandler's handle() and logged and updated in the response.
{ | ||
if (result.succeeded()) | ||
{ | ||
jmxOperationsMetrics.recordTimeTaken(operationName + "Succeeded", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record time taken method, I think we could explore the possibility of splitting out the Succeeded
/Failed
into its own Status parameter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we split the status, the metric would be like (, , ), i.e, that will be triplet. And also we may need count of failures to trigger escalation if needed, i.e, count after filtering on first two in this triplet. I am not sure is it feasible with metric system and dashboards we have now (I am not very good in these).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember doing the same in C* as well, creating metric name on the fly based on the status, something like operationSucceeded or operationFailed and publishing the time which also gives us the failure count indirectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed metrics code from this PR, will open separate PR just for JMX calls metrics
// Endpoint to retrieve sstable's preemptiveOpenInterval value. | ||
// Value returned is in MB, may return negative value when disabled | ||
private static final String SSTABLE = "/sstable"; | ||
public static final String SSTABLE_PREEMPTIVE_OPEN_INTERVAL_ROUTE = API_V1 + CASSANDRA + SSTABLE + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we are reusing the getSsTablePreemtiveOpenIntervalInMB
JMX operation, but, from a pure API standpoint, we are omitting the MB
part in the endpoint, but returning it as part of the response (SSTablePreemptiveOpenIntervalInMB
). My suggestion for this would be to either:
- We add the
MB
part to the endpoint (/preemptive-open-interval-in-mb
)
Or - We just support an optional
unit
filtering parameter in the url, that can be extended in the future with other unit types if needed. That way, the url would end in/preemptive-open-interval?unit=mb
(remember that the unit is optional with MB as a default, so no need to pass it from the client).
That would make us have to modify the response to something like:
{
"SSTablePreemptiveOpenInterval": 1234,
"unit": "MB"
}
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
We want to make endpoint generic, so underlying changes to C* shouldn't impact the endpoint usage for the end user, for example C* 6.0 returning value in different unit, while older versions continue to return the value in MB, the same endpoint should work regardless of the running c* version. Hence not including MB in the endpoint name. Whereas JMX operation name need to match exactly the JMX endpoint we are invoking. The same with passing option param like unit=MB, better we avoid supporting new unit param in the endpoint whenever C* changes for different versions.
-
In the response we need to somehow tell the end user that value returned is in in MB. I preferred the name SSTablePreemptiveOpenIntervalInMB as it matches the JMX operation currently the end users are calling. Your suggestion of separating the unit into another field sounds good, but that may complicate callers's code, as they need to parse unit as a string and convert value into those units accordingly. C* latest code changed this yaml param to for example "60MiB", let me see which one is simpler for the calling code "60MiB" or {60, "MiB"} and will get back to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did explore this a little, I believe the one you suggested /preemptive-open-interval?unit=mb is more easy way to implement and also is in-line with current JMX call semantics. Other alternatives unnecessarily complicates conversions from human readable format to byte size at the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added unit=MiB param, unit names matches with names C* is using.
})); | ||
} | ||
|
||
private void verifyResponse(VertxTestContext context, HttpResponse<Buffer> response, String expectedValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: verifyValidResponse?
@@ -213,4 +216,17 @@ public void outOfRangeDataCleanup(@NotNull String keyspace, @NotNull String tabl | |||
jmxClient.proxy(StorageJmxOperations.class, STORAGE_SERVICE_OBJ_NAME) | |||
.forceKeyspaceCleanup(concurrency, keyspace, table); | |||
} | |||
|
|||
@Override | |||
public GetPreemptiveOpenIntervalResponse getSSTablePreemptiveOpenInterval(DataStorageUnit unit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed 'inMB' from StorageOperations function names to make them generic for supporting other units later on.
* Invokes C* StorageServiceMBean's JMX function getSSTablePreemptiveOpenIntervalInMB | ||
* @return the same value returned by the JMX operation | ||
*/ | ||
int getSSTablePreemptiveOpenIntervalInMB(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only place we use 'inMB', this should exactly match C* JMX function name
A new endpoint is developed in Sidecar, to retrieve value of C*'s sstable_preemptive_open_interval value. Currently end users have to invoke C*'s JMX call, which requires C* superuser role to invoke any JMX operation. With this change, end users can just use this endpoint, which abstracts underlying C*'s version dependencies/changes. We can enable authorization checks for these endpoints once we have authorization developed in Sidecar
Circle CI run: https://app.circleci.com/pipelines/github/skoppu22/cassandra-sidecar?branch=jmxep1