Component C132 – NATS
By Raj Marni. March 27, 2025. Revised. Version: 0.0.09
1. Overview
NATS (C132) is a lightweight, high-performance publish/subscribe messaging system that k8or Orbit uses for asynchronous, decoupled communication between its microservices and components (Portal, Manifestor, SyncMaster, etc.) and cluster-plane agents. With subject-based routing, a component publishes events on a named “subject” and every component subscribed to that subject receives them, which supports an event-driven architecture that scales with minimal coupling.

2. Internal Modules & Responsibilities
2.1 NATS Server/Broker
Core Routing Engine:
Maintains subject-based subscriptions and ensures that published messages are delivered to all relevant subscribers.
Implements the wire protocol (TCP, possibly TLS) for client connections.
Connection Manager:
Tracks active client sessions from orbit-plane services or cluster-plane agents.
Manages heartbeats, keep-alives, and auto-disconnects if a client goes silent or fails to authenticate.
Message Buffer & Delivery:
Core NATS delivers at most once: a message reaches the subscribers connected at publish time, and the server holds only a small per-connection buffer for clients that briefly fall behind. Where guaranteed (at-least-once) delivery is required, NATS JetStream adds persistence.
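To make the routing engine's fan-out behaviour concrete, here is a minimal in-process sketch. It is not the NATS server or any NATS client API; the class name TinyBroker and its methods are hypothetical, and it matches subjects exactly, with no wildcards. It only illustrates that every current subscriber gets a copy and that a message with no subscriber is dropped.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, bytes], None]


class TinyBroker:
    """Toy stand-in for the routing engine (hypothetical, exact-match subjects only)."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, subject: str, handler: Handler) -> None:
        self._subs[subject].append(handler)

    def publish(self, subject: str, payload: bytes) -> int:
        """Deliver to every current subscriber; return how many received it."""
        handlers = self._subs.get(subject, [])
        for handler in handlers:
            handler(subject, payload)
        return len(handlers)


broker = TinyBroker()
log_seen: List[bytes] = []
metrics_seen: List[bytes] = []

# Published before anyone subscribes: nothing is stored, so it is dropped.
dropped = broker.publish("transfer.completed", b"image-1")

broker.subscribe("transfer.completed", lambda s, p: log_seen.append(p))
broker.subscribe("transfer.completed", lambda s, p: metrics_seen.append(p))
delivered = broker.publish("transfer.completed", b"image-2")
```

Both handlers see only the second message, which is the at-most-once behaviour to keep in mind when deciding whether a subject needs persistence.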
2.2 Security & Policy Layer
Authentication:
May rely on token-based or user/pass credentials for each client.
Could integrate with orbit-plane’s IAM or AccessPoint for dynamic credential provisioning.
Authorization:
Subject-level authorization ensures only specific microservices can publish or subscribe to certain topics (e.g., “deploy.*”, “transfer.completed”).
This layer might check an external config or policy store to grant or deny pub/sub actions.
Encryption in Transit:
Typically uses TLS so that messages stay confidential and any tampering in transit is detected.
2.3 Orbit-Plane and Cluster-Plane Integrations
Orbit-Plane Services:
Each microservice/component (Portal Deploy Logic, Manifestor, etc.) includes a NATS client.
Publishes events (e.g., “image.uploaded”) or subscribes to subjects (e.g., “deployment.status”) to react in real time.
Cluster-Plane Agents:
Agents or sidecars in the K3s clusters can publish operational events (“node.scaled”, “pod.crashloop”) or logs, which orbit-plane subscribers can act upon.
They also might subscribe to commands or configuration updates from orbit-plane components.
3. Data Flow & Process IDs
Below is a generic example of how messages might flow:
Publishing
A microservice (Portal Transfer Logic) finishes transferring an image and publishes a message on the subject “transfer.completed” with relevant metadata. This call might be labeled with a PID such as “c8bmsXX-c132bus-e20”, indicating that the message was sent from the Portal back-end (c8bmsXX) to NATS (C132).
Routing & Delivery
The NATS server sees that “transfer.completed” has multiple subscribers (perhaps a logging service, a metrics aggregator, and a Slack integration) and delivers the message to each subscribed client. The message may sit briefly in an in-memory buffer, or be persisted if NATS JetStream is in use.
Consumption
Each subscriber processes the event in their own way (logging it, updating UI, etc.).
If an error occurs, the subscriber can handle it locally or publish a new message (like “transfer.error”) that other components might watch.
Cluster-Plane Interaction
If a cluster-plane agent wants to signal a new node addition, it publishes “cluster.dev.nodeAdded”. The orbit-plane’s management microservice, subscribed to “cluster.*.nodeAdded”, receives it and updates the UI or triggers a new environment config.
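The wildcard subscription in the last step relies on NATS subject matching, where “*” stands for exactly one dot-separated token and “>” stands for one or more trailing tokens. The sketch below re-implements that rule in plain Python for illustration only; the function name subject_matches is hypothetical, and a real deployment would rely on the server's own matching.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches a NATS-style `pattern`.

    `*` matches exactly one token; `>` matches one or more trailing tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # `>` is only valid as the last token and needs at least one token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    # Without `>`, pattern and subject must have the same number of tokens.
    return len(p_tokens) == len(s_tokens)


node_added = subject_matches("cluster.*.nodeAdded", "cluster.dev.nodeAdded")
too_deep = subject_matches("cluster.*.nodeAdded", "cluster.dev.eu.nodeAdded")
```

Here node_added is True and too_deep is False, because “*” never spans more than one token.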
4. Error Handling & Observability
Client Connection Failures
If a microservice loses connectivity or fails to authenticate, NATS logs the event and the microservice might attempt reconnection.
The orbit-plane monitoring stack can watch for high disconnection rates or failed auth attempts.
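NATS client libraries generally ship their own reconnect handling, so the following is only a sketch of the idea behind a reconnection attempt: exponential backoff with jitter. The function names, the 0.5 s base, and the 30 s cap are assumptions chosen for illustration, and connect is a placeholder callable that returns True on success.

```python
import random
import time
from typing import Callable, List, Optional


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def connect_with_retry(
    connect: Callable[[], bool],
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> bool:
    """Call `connect` up to `attempts` times, pausing between failed tries."""
    rng = rng or random.Random()
    delays = backoff_delays(attempts)
    for n, delay in enumerate(delays):
        if connect():
            return True
        if n < len(delays) - 1:
            # Full jitter: wait a random fraction of the delay so that many
            # clients do not all reconnect at the same instant.
            sleep(rng.uniform(0, delay))
    return False
```

Passing a custom sleep makes the helper easy to exercise in tests without real waiting, and the number of failed attempts it reports is the kind of signal the monitoring stack could alert on.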
Subject Overlaps or Collisions
In subject-based routing, well-defined naming conventions help avoid confusion and collisions, e.g., prefixes such as “deploy.prod.” or “deploy.dev.” that scope subjects by environment.
If a subject is misnamed, no subscriber receives the message, or the client may be denied because its permissions do not cover the misspelled subject.
Performance Monitoring
NATS provides metrics (message rates, latencies, queue sizes) which can feed into the orbit-plane’s observability stack (Prometheus).
High message volume or slow consumers can lead to backpressure, so the system might require additional NATS servers or a cluster for scaling.
Message Persistence (as required by a use case)
If ephemeral messages suffice, NATS in standard mode is used.
If guaranteed delivery is needed, JetStream or another persistence layer can store messages until consumed or for replay.
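The difference between ephemeral delivery and store-until-consumed can be sketched with a toy append-only log and per-consumer acknowledgements. This is not the JetStream API; the class ReplayableStream and its methods are hypothetical and exist only to show why a late or restarted consumer can still catch up when messages are persisted.

```python
from typing import Dict, List, Tuple


class ReplayableStream:
    """Toy append-only log with per-consumer acks (hypothetical, not JetStream)."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, bytes]] = []
        self._acked: Dict[str, int] = {}  # consumer name -> highest acked sequence

    def publish(self, subject: str, payload: bytes) -> int:
        self._log.append((subject, payload))
        return len(self._log)  # sequence numbers start at 1

    def pending(self, consumer: str) -> List[Tuple[int, str, bytes]]:
        """Everything this consumer has not acknowledged yet, oldest first."""
        start = self._acked.get(consumer, 0)
        return [
            (seq, subj, data)
            for seq, (subj, data) in enumerate(self._log[start:], start + 1)
        ]

    def ack(self, consumer: str, seq: int) -> None:
        self._acked[consumer] = max(seq, self._acked.get(consumer, 0))


stream = ReplayableStream()
stream.publish("transfer.completed", b"image-1")
stream.publish("transfer.completed", b"image-2")

first = stream.pending("audit")      # both messages, although "audit" arrived late
stream.ack("audit", 1)
after_ack = stream.pending("audit")  # only sequence 2 is still outstanding
```

Until a message is acknowledged it keeps showing up as pending, which is the at-least-once behaviour that ephemeral delivery does not provide.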
5. Security & Policy Enforcement
Subject-Level ACL:
Administrators define which microservices can publish or subscribe to each subject. For example, only the Portal back-end can publish “deploy.request” while certain cluster-plane agents subscribe to it.
Integration with AccessPoint:
If used, AccessPoint might handle or distribute short-lived credentials for NATS connections, ensuring that only authorized microservices get valid tokens to connect.
Auditing:
Potential logs: who published which message, from which IP, with which subject. This can be stored for compliance or forensic analysis.
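As a rough illustration of the subject-level ACL described above, the check below uses a deny-by-default lookup table. The client names, the table layout, and the function is_allowed are all hypothetical; in practice the rules would live in the NATS server's own permission configuration, and wildcard subjects are left out here for brevity.

```python
from typing import Dict, Set

# Hypothetical clients and permissions, for illustration only.
ACL: Dict[str, Dict[str, Set[str]]] = {
    "portal-backend": {
        "publish": {"deploy.request"},
        "subscribe": {"deployment.status"},
    },
    "cluster-agent": {
        "publish": {"deployment.status"},
        "subscribe": {"deploy.request"},
    },
}


def is_allowed(client: str, action: str, subject: str) -> bool:
    """Deny by default; allow only subjects listed for this client and action."""
    return subject in ACL.get(client, {}).get(action, set())


portal_can_publish = is_allowed("portal-backend", "publish", "deploy.request")
agent_can_publish = is_allowed("cluster-agent", "publish", "deploy.request")
```

With this table, portal_can_publish is True and agent_can_publish is False, and an unknown client is denied for every subject.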
6. Outcomes & Benefits
Decoupled Event-Driven Architecture
Encourages each component to act on events it cares about, without direct coupling or synchronous calls.
Scalability & Resilience
As new services come online, they simply subscribe to existing subjects or create new ones. The messaging layer can scale horizontally if needed.
Faster Development
Teams can add features (like logging or analytics) that just subscribe to relevant events, with minimal changes to the original publisher code.
Real-Time Updates
The entire orbit-plane or cluster-plane can respond in near real-time to events, enabling dynamic scaling, immediate logging, or user notifications.