Is your feature request related to a problem?
For performance testing, it is important to understand the CPU/Memory/Disk/Network tradeoffs.
These metrics are presently available for nodes in the OpenSearch cluster via the Nodes stats API.
However, these metrics are not available for extensions running on a remote node. While EC2 provides some metrics, it does not provide sufficient detail for performance measurement. In particular:
- From an external OS perspective, the Java process is consuming all the memory it has claimed for its heap, while the JVM has its own stats on the portion of the heap it has actually used, along with GC stats and other useful details.
- Whole-server CPU can be measured, and if the only significant process consuming CPU is the extension, this is an acceptable proxy. However, this won't work for extension nodes hosting multiple extensions.
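To illustrate the kind of per-process detail that the OS and EC2 cannot see, here is a minimal sketch that reads heap, GC, and process CPU figures from inside the JVM using the standard `java.lang.management` beans. The class name `JvmStatsSnapshot` and the stat key names are hypothetical and chosen only for illustration; they are not part of any OpenSearch or extensions SDK API. The process CPU time relies on `com.sun.management.OperatingSystemMXBean`, which is assumed to be available (it is on HotSpot-based JDKs) and is skipped otherwise.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: gathers stats that only the JVM itself can report.
class JvmStatsSnapshot {

    static Map<String, Long> collect() {
        Map<String, Long> stats = new LinkedHashMap<>();

        // Heap actually in use vs. heap committed (claimed from the OS).
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        stats.put("heap_used_bytes", heap.getUsed());
        stats.put("heap_committed_bytes", heap.getCommitted());
        stats.put("heap_max_bytes", heap.getMax()); // -1 if no maximum is defined

        // Sum collection counts and times across all collectors.
        // A collector reports -1 when a value is undefined, so clamp at 0.
        long gcCount = 0;
        long gcTimeMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcCount += Math.max(0, gc.getCollectionCount());
            gcTimeMillis += Math.max(0, gc.getCollectionTime());
        }
        stats.put("gc_collection_count", gcCount);
        stats.put("gc_collection_time_millis", gcTimeMillis);

        // CPU time consumed by this process only, not the whole server.
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            long cpuNanos = ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
            stats.put("process_cpu_time_nanos", cpuNanos); // -1 if unsupported
        }
        return stats;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Because each extension would run this inside its own JVM, the numbers stay meaningful even when several extensions share one host.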
What solution would you like?
Option 1: A whole new extension-specific stats API could be created, e.g., GET /_extensions/_uniqueId/stats. This may be preferable particularly if the extension only returns a subset of stats available on the OpenSearch API. One problem is this imposes a restriction on APIs that the extension itself may implement; we may want to reserve some prefix that implementers can avoid stepping on.
Option 2: The existing API GET /_nodes/<node_id>/stats could be modified to permit returning stats from an extension. If the extension has a node id that would be preferable, or we could use a specific text like "extension:uniqueid".
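As a rough sketch of how Option 2 might tell an extension apart from a regular node, the snippet below checks the `<node_id>` path segment for the "extension:" prefix suggested above. The class `StatsTargetResolver` and its method are hypothetical; nothing here reflects how OpenSearch actually resolves node ids.

```java
import java.util.Optional;

// Hypothetical helper for Option 2: decides whether the <node_id> path
// segment of GET /_nodes/<node_id>/stats refers to an extension.
class StatsTargetResolver {

    // Assumed marker text, taken from the "extension:uniqueid" suggestion.
    static final String EXTENSION_PREFIX = "extension:";

    // Returns the extension's unique id if the segment names an extension,
    // or an empty Optional if it should be treated as a regular node id.
    static Optional<String> extensionId(String nodeIdSegment) {
        if (nodeIdSegment != null
                && nodeIdSegment.startsWith(EXTENSION_PREFIX)
                && nodeIdSegment.length() > EXTENSION_PREFIX.length()) {
            return Optional.of(nodeIdSegment.substring(EXTENSION_PREFIX.length()));
        }
        return Optional.empty();
    }
}
```

For example, `extensionId("extension:hello-world")` would yield `hello-world`, while an ordinary node id would fall through to the existing node lookup.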
What alternatives have you considered?
The Java process for the extension could have JMX enabled. This opens a port that other processes can query for information. See, for example, this post, which runs a daemon on the server to query the JMX port and send the values to CloudWatch. Unfortunately, this creates potential security issues because of the open port on the extension.
The Java process could be modified internally to create a daemon thread that writes (configured) stats to a log file and/or inserts them into an OpenSearch index.
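The daemon-thread alternative could look roughly like the following sketch. It assumes a hypothetical `StatsReporter` class that samples heap usage at a fixed period and hands each line to a caller-supplied sink; the sink could write to a logger or, with more work, index a document into OpenSearch. Only standard JDK classes are used, and which stats to report is an illustrative choice.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical reporter: periodically emits JVM stats from a daemon thread.
class StatsReporter {

    // Starts reporting immediately and then every periodMillis milliseconds.
    // The returned scheduler can be shut down to stop reporting.
    static ScheduledExecutorService start(long periodMillis, Consumer<String> sink) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "extension-stats-reporter");
            thread.setDaemon(true); // never keeps the extension process alive
            return thread;
        });
        // Note: if the task throws, scheduleAtFixedRate suppresses later runs,
        // so a real sink should handle its own failures.
        scheduler.scheduleAtFixedRate(() -> {
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            sink.accept("heap_used_bytes=" + heap.getUsed()
                    + " heap_committed_bytes=" + heap.getCommitted());
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

A call such as `StatsReporter.start(60_000, System.out::println)` would print one line per minute. Unlike the JMX approach, this opens no port, at the cost of the stats being pushed on a fixed schedule rather than queried on demand.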
Do you have any additional context?
I'm currently pursuing the first alternative (JMX + collectd -> CloudWatch) to keep performance testing unblocked, but I don't think it's the best long-term solution, hence this issue.