soma-monitoring-toolbox

SOMA-1

SOMA is an open-source and freely available toolbox for composing distributed, high-performance computing (HPC) monitoring, analysis, and visualization services out of microservice components. SOMA employs the principle of composability to design, build, maintain, deploy, and extend monitoring capabilities. SOMA is inspired by and reliant on the Mochi framework from Argonne National Laboratory (https://mochi.readthedocs.io/) to build portable HPC microservices.

SOMA's design is guided by the following principles ("CPAIR"):

  1. Composability: Distinct, new functionality should go into a new microservice or library component --- that means no mixing of "related" functionality. We want to adopt the principles of microservice design as fully as possible, with few or no exceptions. The rationale is to offer the user improved maintainability and scalability for monitoring service components while exploring the performance trade-offs that result from a clean separation into services.
  2. Performance: While SOMA can be deployed on commodity clusters, it is designed to truly come to life on RDMA-enabled HPC clusters. By relying on an HPC-optimized service framework like Mochi, SOMA can transparently use high-performance RDMA networks and high-end computing accelerators such as GPUs to move and process data efficiently. These capabilities become essential if your monitoring service lies on the critical path of your distributed workflow.
  3. Accessibility: We intend to make high-performance monitoring capabilities accessible to all --- especially HPC application domain specialists. That means reducing the barrier-to-entry for new users and balancing the knobs that are hidden and the knobs exposed to the power user.
  4. Inclusion: SOMA intends to address the monitoring needs of existing traditional MPI applications as well as the ML- or AI-centric HPC workflows on the horizon. A service-based design allows SOMA's capabilities to be extended when required, while simultaneously increasing the degree of code reuse. Service-based designs also allow SOMA to be "cloud-ready" and deployable on your favorite cloud service provider (CSP).
  5. Reuse: It cannot be overstated that SOMA's goal is to not reinvent the wheel. Therefore, SOMA does not promise to provide all the functionality required for monitoring --- especially when established, free-to-use software for such functionality is already available elsewhere. For example, there exist several state-of-the-art HPC performance tools such as TAU, Score-P, and HPCToolkit that offer sophisticated application performance measurement capabilities. SOMA's API would offer a thin adapter for the measurement APIs of these tools, allowing for reuse of existing instrumentation wherever possible. Similarly, SOMA does not intend to provide a full-blown monitoring dashboard solution --- rather, SOMA's output would be converted to a file format that existing tools such as Grafana or Zipkin can ingest directly.
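
To make the Reuse principle a little more concrete, here is a minimal sketch of what converting a measurement into something Zipkin can ingest might look like. It is not SOMA's actual API or output format. The `TimedRegion` struct and the functions `to_zipkin_span_json` and `to_zipkin_payload` are hypothetical names introduced only for illustration. The JSON fields follow Zipkin's v2 span format (`traceId`, `id`, `name`, `timestamp` and `duration` in microseconds, `localEndpoint.serviceName`). The sketch assumes region and service names contain no characters that need JSON escaping.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one timed code region, roughly what a measurement
// tool could hand to an adapter. Not part of SOMA's real API.
struct TimedRegion {
    std::string name;          // region name, e.g. a function or phase
    std::string service;       // logical service or application name
    std::uint64_t start_us;    // start time, microseconds since the epoch
    std::uint64_t duration_us; // elapsed time in microseconds
    std::string trace_id;      // 16 or 32 lowercase hex characters
    std::string span_id;       // 16 lowercase hex characters
};

// Serialize one region as a Zipkin v2 span object.
// Assumption: strings are already JSON-safe (no quotes or backslashes).
std::string to_zipkin_span_json(const TimedRegion& r) {
    std::ostringstream out;
    out << "{\"traceId\":\"" << r.trace_id << "\""
        << ",\"id\":\"" << r.span_id << "\""
        << ",\"name\":\"" << r.name << "\""
        << ",\"timestamp\":" << r.start_us
        << ",\"duration\":" << r.duration_us
        << ",\"localEndpoint\":{\"serviceName\":\"" << r.service << "\"}}";
    return out.str();
}

// Zipkin expects a JSON array of spans, so wrap the regions in one.
std::string to_zipkin_payload(const std::vector<TimedRegion>& regions) {
    std::string payload = "[";
    for (std::size_t i = 0; i < regions.size(); ++i) {
        if (i > 0) payload += ",";
        payload += to_zipkin_span_json(regions[i]);
    }
    payload += "]";
    return payload;
}
```

The conversion is kept as a pure function with no dependency on any particular measurement tool, so the same record could in principle be filled from different instrumentation sources and written to a file for a dashboard to load.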

Repositories

  • LULESH Public

    This is our example version of LULESH, used to showcase SOMA's functionality

    C++, updated Oct 16, 2024
  • soma-collector Public

    soma-collector is the component exposing the core SOMA measurement API

    C++, updated Oct 15, 2024
  • .github Public

    Updated Dec 21, 2022
