Component C132 – NATS

By Raj Marni. March 27, 2025. Revised. Version: 0.0.09

1. Overview

NATS (C132) is a lightweight, high-performance publish/subscribe messaging system that k8or Orbit uses for asynchronous, decoupled communication across its microservices and components (Portal, Manifestor, SyncMaster, etc.) and cluster-plane agents. With subject-based routing, components publish events on specific “subjects” and other components subscribe to the subjects they care about, which supports an event-driven architecture that scales with minimal coupling.

2. Internal Modules & Responsibilities

2.1 NATS Server/Broker

Core Routing Engine:
- Maintains subject-based subscriptions and ensures published messages are delivered to all matching subscribers.
- Implements the client wire protocol over TCP, optionally secured with TLS.
Connection Manager:
- Tracks active client sessions from orbit-plane services or cluster-plane agents.
- Manages heartbeats, keep-alives, and auto-disconnects if a client goes silent or fails to authenticate.
Message Buffer & Delivery:
- Core NATS delivers at most once: a message reaches only the subscribers connected when it is published, with brief per-connection buffering for subscribers that fall slightly behind. Where guaranteed (at-least-once) delivery is needed, NATS JetStream persists messages until they are consumed.
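
As a rough illustration of the text-based client protocol the broker speaks over TCP, the sketch below frames PUB and SUB commands by hand. The helper names encode_pub and encode_sub are hypothetical and exist only for this example; orbit-plane services would normally rely on a NATS client library rather than building frames themselves.

```python
def encode_pub(subject: str, payload: bytes) -> bytes:
    # PUB <subject> <#bytes> CRLF <payload> CRLF
    header = f"PUB {subject} {len(payload)}\r\n".encode("ascii")
    return header + payload + b"\r\n"


def encode_sub(subject: str, sid: int) -> bytes:
    # SUB <subject> <sid> CRLF, where sid is a client-chosen subscription id
    return f"SUB {subject} {sid}\r\n".encode("ascii")


# Example frames for subjects mentioned in this document.
pub_frame = encode_pub("transfer.completed", b'{"image":"demo"}')
sub_frame = encode_sub("deployment.status", 1)
print(pub_frame)
print(sub_frame)
```

The byte count in the PUB header lets the server read the payload without scanning for a delimiter, so payloads can contain arbitrary bytes.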

2.2 Security & Policy Layer

Authentication:
- May rely on token-based or user/pass credentials for each client.
- Could integrate with orbit-plane’s IAM or AccessPoint for dynamic credential provisioning.
Authorization:
- Subject-level authorization ensures only specific microservices can publish or subscribe to certain subjects (e.g., “deploy.*”, “transfer.completed”).
- This layer might check an external config or policy store to grant or deny pub/sub actions.
Encryption in Transit:
- Typically uses TLS so that messages stay confidential and cannot be altered undetected in transit.

2.3 Orbit-Plane and Cluster-Plane Integrations

Orbit-Plane Services:
- Each microservice/component (Portal Deploy Logic, Manifestor, etc.) includes a NATS client.
- Publishes events (e.g., “image.uploaded”) or subscribes to subjects (e.g., “deployment.status”) to react in real time.
Cluster-Plane Agents:
- Agents or sidecars in the K3s clusters can publish operational events (“node.scaled”, “pod.crashloop”) or logs, which orbit-plane subscribers can act upon.
- They might also subscribe to commands or configuration updates from orbit-plane components.

3. Data Flow & Process IDs

Below is a generic example of how messages might flow:

Publishing
- A microservice (Portal Transfer Logic) finishes transferring an image. It publishes a message on subject transfer.completed with relevant metadata.
- This call might be labeled with a PID like c8bmsXX-c132bus-e20, indicating the message was sent from the Portal back-end (c8bmsXX) to NATS (C132).
Routing & Delivery
- The NATS server finds multiple subscribers to transfer.completed (for example, a logging service, a metrics aggregator, and a Slack integration).
- It delivers the message to each subscribed client. The message is held only briefly in memory, or persisted if NATS JetStream is used.
Consumption
- Each subscriber processes the event in its own way (logging it, updating the UI, etc.).
- If an error occurs, the subscriber can handle it locally or publish a new message (like “transfer.error”) that other components might watch.
Cluster-Plane Interaction
- If a cluster-plane agent wants to signal a new node addition, it publishes “cluster.dev.nodeAdded”. The orbit-plane’s management microservice, subscribed to “cluster.*.nodeAdded”, receives it and updates the UI or triggers a new environment config.
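
The flow above can be sketched with a toy in-memory bus. This is only an illustration of subject matching and fan-out, not how the NATS server is implemented; InMemoryBus, subject_matches, and the subscriber labels are names made up for this example. The matching follows the NATS wildcard rules, where “*” stands for exactly one token and “>” for one or more trailing tokens.

```python
from typing import Callable

Handler = Callable[[str, bytes], None]


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if subject matches pattern under NATS wildcard rules."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


class InMemoryBus:
    """Toy stand-in for the broker: delivers to every matching subscriber."""

    def __init__(self) -> None:
        self._subs: list[tuple[str, Handler]] = []

    def subscribe(self, pattern: str, handler: Handler) -> None:
        self._subs.append((pattern, handler))

    def publish(self, subject: str, payload: bytes) -> int:
        delivered = 0
        for pattern, handler in self._subs:
            if subject_matches(pattern, subject):
                handler(subject, payload)
                delivered += 1
        return delivered  # number of subscribers that received the message


bus = InMemoryBus()
received: list[tuple[str, str]] = []
bus.subscribe("transfer.completed", lambda s, p: received.append(("logger", s)))
bus.subscribe("transfer.completed", lambda s, p: received.append(("metrics", s)))
bus.subscribe("cluster.*.nodeAdded", lambda s, p: received.append(("mgmt", s)))

bus.publish("transfer.completed", b"{}")     # reaches logger and metrics
bus.publish("cluster.dev.nodeAdded", b"{}")  # reaches mgmt via the wildcard
bus.publish("transfer.complete", b"{}")      # misnamed subject: nobody receives it
```

The last publish shows why naming matters: a misspelled subject is accepted but silently matches no subscription.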

4. Error Handling & Observability

Client Connection Failures
- If a microservice loses connectivity or fails to authenticate, NATS logs the event and the microservice might attempt reconnection.
- The orbit-plane monitoring stack can watch for high disconnection rates or failed auth attempts.
Subject Overlaps or Collisions
- In subject-based routing, well-defined naming conventions help avoid confusion or collisions, e.g., distinct prefixes such as “deploy.prod.” and “deploy.dev.”.
- If a publisher misnames a subject, the publish still succeeds but no subscriber receives the message; a subscriber that misnames a subject may receive nothing or be denied for lacking permission on that subject.
Performance Monitoring
- NATS provides metrics (message rates, latencies, queue sizes) which can feed into the orbit-plane’s observability stack (Prometheus).
- High message volume or slow consumers can lead to backpressure, so the system might require additional NATS servers or a cluster for scaling.
Message Persistence (as required by a use case)
- If ephemeral messages suffice, core NATS (without JetStream) is used.
- If guaranteed delivery is needed, JetStream or another persistence layer can store messages until consumed or for replay.
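
To make the "store until consumed" idea concrete, here is a minimal sketch of acknowledgement-based redelivery. It is a toy model and not the JetStream API; DurableStream and its methods are hypothetical names used only for illustration.

```python
from collections import OrderedDict


class DurableStream:
    """Toy log that keeps each message until a consumer acknowledges it."""

    def __init__(self) -> None:
        self._next_seq = 1
        self._pending: "OrderedDict[int, bytes]" = OrderedDict()

    def publish(self, payload: bytes) -> int:
        seq = self._next_seq
        self._pending[seq] = payload
        self._next_seq += 1
        return seq

    def fetch(self) -> list[tuple[int, bytes]]:
        # Every message that has not been acked is delivered again.
        return list(self._pending.items())

    def ack(self, seq: int) -> None:
        # Acknowledging removes the message; unknown sequences are ignored.
        self._pending.pop(seq, None)


stream = DurableStream()
stream.publish(b"transfer.completed #1")
stream.publish(b"transfer.completed #2")

first_try = stream.fetch()
stream.ack(first_try[0][0])   # consumer acks #1, then crashes before acking #2
second_try = stream.fetch()   # only the unacked message is redelivered
```

Because an unacknowledged message is delivered again, consumers in this model can see duplicates and should process events idempotently.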

5. Security & Policy Enforcement

Subject-Level ACL:
- Administrators define which microservices can publish or subscribe to each subject. For example, only the Portal back-end can publish “deploy.request” while certain cluster-plane agents subscribe to it.
Integration with AccessPoint:
- If used, AccessPoint might handle or distribute short-lived credentials for NATS connections, ensuring that only authorized microservices get valid tokens to connect.
Auditing:
- Audit logs can record who published which message, from which IP, and on which subject. These records can be stored for compliance or forensic analysis.
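
The subject-level ACL described above could be modeled roughly as follows. This is a sketch under assumptions: the client names "portal-backend" and "cluster-agent", the Permissions type, and the is_allowed helper are all hypothetical, and only the “deploy.request” rule is taken from the text. A real deployment would express these rules in the NATS server's own authorization settings.

```python
from dataclasses import dataclass


def pattern_allows(pattern: str, subject: str) -> bool:
    """Match a permission pattern ('*' = one token, trailing '>' = the rest)."""
    p, s = pattern.split("."), subject.split(".")
    if p[-1] == ">":
        head = p[:-1]
        return len(s) > len(head) and all(a in ("*", b) for a, b in zip(head, s))
    return len(p) == len(s) and all(a in ("*", b) for a, b in zip(p, s))


@dataclass(frozen=True)
class Permissions:
    publish: tuple[str, ...] = ()
    subscribe: tuple[str, ...] = ()


# Hypothetical policy: only the Portal back-end publishes deploy.request,
# while cluster-plane agents may subscribe to it.
ACL = {
    "portal-backend": Permissions(publish=("deploy.request",)),
    "cluster-agent": Permissions(subscribe=("deploy.request",)),
}


def is_allowed(client: str, action: str, subject: str) -> bool:
    perms = ACL.get(client)
    if perms is None or action not in ("publish", "subscribe"):
        return False  # unknown clients and unknown actions are denied
    return any(pattern_allows(p, subject) for p in getattr(perms, action))


print(is_allowed("portal-backend", "publish", "deploy.request"))
print(is_allowed("cluster-agent", "publish", "deploy.request"))
```

Denying by default keeps a newly added service from publishing or subscribing anywhere until an administrator grants it explicit subjects.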

6. Outcomes & Benefits

Decoupled Event-Driven Architecture
- Encourages each component to act on events it cares about, without direct coupling or synchronous calls.
Scalability & Resilience
- As new services come online, they simply subscribe to existing subjects or create new ones. The messaging layer can scale horizontally if needed.
Faster Development
- Teams can add features (like logging or analytics) that just subscribe to relevant events, with minimal changes to the original publisher code.
Real-Time Updates
- The entire orbit-plane or cluster-plane can respond in near real-time to events, enabling dynamic scaling, immediate logging, or user notifications.
