Cluster Insights with ClusterWatch

By Raj Marni. March 28, 2025. Revised. Version: 0.0.11

1. Overview

In the k8or Orbit ecosystem, ClusterWatch is the cluster monitoring component. It continuously collects and stores time-series data from sources within Kubernetes clusters and makes that data queryable, giving insight into resource usage, application performance, and system health. ClusterWatch integrates with other observability components such as HelloScope (for visualization) and K8rix (Cluster Matrix), so administrators and developers have real-time data and alerts for proactive maintenance and troubleshooting.


2. Key Functions

  1. Metrics Collection & Storage

    • Data Scraping: ClusterWatch collects metrics from nodes, pods, and exporters (e.g., Node Exporter, cAdvisor, or custom application exporters) running in each K3s cluster.

    • Time-Series Database: Stores historical data for performance analysis, trend tracking, and capacity planning.

  2. Alerting & Threshold Management

    • Alert Rules: Define conditions (e.g., high CPU/memory usage, pod restarts) that trigger alerts.

    • Alert Dispatching: Integrates with Alertmanager to notify teams via email, Slack, or other communication channels when critical thresholds are exceeded.

  3. Data Querying & Analysis

    • PromQL: Enables powerful querying of metrics data to identify performance bottlenecks or anomalous behavior.

    • Integration with HelloScope: Feeds data into HelloScope dashboards for visual analysis and real-time monitoring.

  4. Scalability & Performance

    • Efficient Data Storage: Optimized for handling high volumes of metrics data across multiple clusters.

    • Horizontal Scalability: Can be scaled out using techniques such as federation or remote storage integrations when needed.
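
The alert-rule idea described under Alerting & Threshold Management can be sketched in a few lines. This is a minimal illustration, not ClusterWatch's actual rule engine: the `ThresholdRule` class, the `firing_since` function, and the sample values are all hypothetical. It assumes a rule fires only after a metric has stayed above its threshold for a continuous hold period, which is how "pending, then firing" alert rules commonly behave.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


@dataclass
class ThresholdRule:
    """Hypothetical rule: fire when the value stays above `threshold`
    for at least `hold_seconds` without interruption."""
    name: str
    threshold: float
    hold_seconds: float


def firing_since(rule: ThresholdRule, samples: List[Sample]) -> Optional[float]:
    """Return the timestamp at which the rule starts firing, or None.

    Samples must be sorted by timestamp. Any sample at or below the
    threshold resets the pending window.
    """
    breach_start: Optional[float] = None
    for ts, value in samples:
        if value > rule.threshold:
            if breach_start is None:
                breach_start = ts  # breach begins (pending)
            if ts - breach_start >= rule.hold_seconds:
                return ts  # breach has lasted long enough to fire
        else:
            breach_start = None  # metric recovered, reset the window
    return None


# Illustrative CPU usage samples (percent), one per minute.
rule = ThresholdRule(name="HighCpuUsage", threshold=90.0, hold_seconds=120.0)
cpu_percent = [
    (0.0, 95.0), (60.0, 97.0), (120.0, 85.0),    # short spike, then recovery
    (180.0, 93.0), (240.0, 96.0), (300.0, 99.0),  # sustained breach
]
print(firing_since(rule, cpu_percent))
```

The first spike lasts only 60 seconds and is ignored. The second breach begins at 180 seconds and has lasted the full 120 seconds by the sample at 300 seconds, so that is when the rule would hand the alert to a dispatcher.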


3. Architecture & Interactions

3.1 Internal Components & Data Flow

  • Scraping Agents:

    • Node Exporter & cAdvisor: Deployed on each K3s node to expose hardware and container metrics.

    • Custom Exporters: Optionally, additional exporters gather application-specific metrics.

  • ClusterWatch Server:

    • Periodically scrapes metrics endpoints from these agents.

    • Aggregates and stores metrics in its time-series database.
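
The scrape-and-store flow above can be illustrated with a small in-memory sketch. It assumes the exporters expose metrics in the Prometheus text exposition format, as Node Exporter does. The `parse_exposition` function and the `InMemoryStore` class are hypothetical names used only for illustration. The parser is deliberately simplified: it ignores timestamps and does not handle escaped quotes inside label values.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Matches: metric_name{label="value",...} sample_value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

# A series is identified by its metric name plus its sorted label pairs.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def parse_exposition(text: str) -> List[Tuple[SeriesKey, float]]:
    """Parse a simplified subset of the text exposition format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser cannot read
        name, raw_labels, raw_value = match.groups()
        labels = tuple(sorted(LABEL_RE.findall(raw_labels or "")))
        samples.append(((name, labels), float(raw_value)))
    return samples


class InMemoryStore:
    """Hypothetical stand-in for a time-series database."""

    def __init__(self) -> None:
        self.series: Dict[SeriesKey, List[Tuple[float, float]]] = defaultdict(list)

    def ingest(self, scrape_time: float, payload: str) -> int:
        """Append every parsed sample to its series; return the sample count."""
        parsed = parse_exposition(payload)
        for key, value in parsed:
            self.series[key].append((scrape_time, value))
        return len(parsed)


# Two simulated scrapes of the same endpoint, 15 seconds apart.
SCRAPE_1 = """# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""
SCRAPE_2 = """node_load1 0.5
node_cpu_seconds_total{cpu="0",mode="idle"} 1249.0
"""

store = InMemoryStore()
store.ingest(0.0, SCRAPE_1)
store.ingest(15.0, SCRAPE_2)
for key, points in store.series.items():
    print(key, points)
```

Keying each series by metric name plus sorted labels means repeated scrapes of the same endpoint append to the same series, which is what makes later range queries and trend analysis possible.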

3.2 Interactions with k8or Orbit Components

  • Integration with Cluster Matrix (K8rix, C84):

    • ClusterWatch data is used by K8rix to display real-time resource utilization and cluster health directly on the Cluster Matrix dashboards.

  • Visualization with HelloScope (Part of InsightHub):

    • ClusterWatch serves as the primary data source for HelloScope, enabling the creation of custom dashboards that provide detailed insights into cluster performance, historical trends, and operational alerts.

  • Access & Security via AccessPoint (C52):

    • Metrics collection and API calls from ClusterWatch are routed through AccessPoint, ensuring that data flows are secure and authenticated.

  • Alerting via Alertmanager:

    • ClusterWatch works with Alertmanager to deliver alerts to appropriate channels, which can then trigger automated responses or manual interventions.

  • Continuous Delivery Feedback (ArgoCD, C108):

    • Deployment events (e.g., rollouts, scaling) trigger changes in cluster state that are reflected in ClusterWatch metrics, closing the loop in the CD pipeline.
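
To make the Alertmanager interaction above more concrete, the sketch below builds the kind of JSON body that a monitoring server could submit to Alertmanager. How ClusterWatch actually hands alerts to Alertmanager is not described here, so this is an assumption. The field names (`labels`, `annotations`, `startsAt`, `endsAt`) are intended to follow Alertmanager's v2 alerts API and should be checked against the deployed version. The `build_alert` helper, the label values, and the cluster name are hypothetical, and nothing is sent over the network.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Optional


def build_alert(
    labels: Dict[str, str],
    annotations: Dict[str, str],
    starts_at: datetime,
    ends_at: Optional[datetime] = None,
) -> dict:
    """Build one alert object with RFC 3339 timestamps."""
    alert = {
        "labels": labels,            # identify and route the alert
        "annotations": annotations,  # human-readable context
        "startsAt": starts_at.isoformat(),
    }
    if ends_at is not None:
        alert["endsAt"] = ends_at.isoformat()
    return alert


# Illustrative alert; every value here is made up for the example.
alerts = [
    build_alert(
        labels={
            "alertname": "HighCpuUsage",
            "severity": "critical",
            "cluster": "example-cluster",
        },
        annotations={"summary": "CPU usage above 90% for 2 minutes"},
        starts_at=datetime(2025, 3, 28, 12, 0, tzinfo=timezone.utc),
    )
]

body = json.dumps(alerts, indent=2)
print(body)
```

Labels such as `severity` are what a routing configuration would typically match on to choose between email, Slack, or another channel.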


4. Benefits & Impact

  1. Real-Time Operational Visibility

    • Provides administrators with immediate insights into cluster performance, resource usage, and potential issues.

  2. Proactive Issue Resolution

    • Automated alerting helps teams quickly identify and address anomalies before they impact production.

  3. Historical Data Analysis

    • Time-series data enables trend analysis, capacity planning, and performance forecasting.

  4. Scalability

    • Designed to handle high volumes of data across multiple clusters, ensuring monitoring keeps pace with growth.

  5. Seamless Integration

    • Works in concert with other k8or Orbit components (e.g., HelloScope, K8rix, AccessPoint) to provide a unified monitoring solution that covers both the management and runtime aspects of the infrastructure.
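
As an illustration of the trend analysis and capacity planning mentioned under Historical Data Analysis, the sketch below fits a straight line to historical usage samples and estimates when a limit would be reached. It is a simplified example under stated assumptions, not a ClusterWatch feature: the `fit_line` and `days_until_limit` functions and the sample data are hypothetical, and real workloads rarely grow perfectly linearly.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (day number, usage percent)


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("timestamps must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_limit(points: List[Point], limit: float) -> Optional[float]:
    """Estimate days from the last sample until usage reaches `limit`.

    Returns None when usage is flat or decreasing, because the fitted
    line then never reaches the limit.
    """
    slope, intercept = fit_line(points)
    if slope <= 0:
        return None
    crossing_day = (limit - intercept) / slope
    return max(crossing_day - points[-1][0], 0.0)


# Illustrative disk usage growing two percentage points per day.
disk_usage = [(0.0, 50.0), (1.0, 52.0), (2.0, 54.0), (3.0, 56.0)]
print(days_until_limit(disk_usage, limit=80.0))
```

With this made-up data the fitted line reaches 80% on day 15, which is 12 days after the last sample.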
