1. Architecture Overview
Universal mesh AI platform. Every node type, every protocol, zero-trust security, minimal configuration.
┌──────────────────────────────────────────────────────────────────────┐
│ LINUS-AI Node │
│ │
│ ┌─────────┐ ┌──────────┐ ┌─────────┐ ┌──────────┐ ┌────────┐ │
│ │linus_ai-│ │linus_ai- │ │linus_ai-│ │linus_ai- │ │linus_ai│ │
│ │ http │ │ ai │ │ net │ │ vault │ │guardian│ │
│ │(axum API│ │(inference│ │(mDNS+ │ │(encrypted│ │(2FA + │ │
│ │ + SSE) │ │ engine) │ │ TCP) │ │ SQLite) │ │ TOTP) │ │
│ └────┬────┘ └────┬─────┘ └────┬────┘ └────┬─────┘ └───┬────┘ │
│ └────────────┴─────────────┴─────────────┴────────────┘ │
│ linus-ai-core │
│ (EventBus · Config · Crypto · Types) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────────┐ │
│ │linus_ai- │ │linus_ai- │ │linus_ai- │ │ linus-ai-bin │ │
│ │ thermal │ │ task │ │blockchain│ │ (main.rs entrypoint) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
│ HTTP :9480 │ TCP :9479
▼ ▼
Browser / API clients Mesh peers (mDNS + Tailscale)
Core Design Principles
- Zero-config discovery — UDP multicast on port 9479, auto-role from hardware score
- Zero-trust security — every mesh message HMAC-SHA256 signed; vault data ChaCha20-AEAD encrypted
- Single integration point — one HTTP API (`/infer`, `/jobs`, `/settings`) for all connected software
- Role-aware routing — tasks automatically route to the best available hardware
- py2c scripting — any automation expressible in a type-annotated `.py` file, compiled to native
2. Node Taxonomy & Scoring
| # | Type | Examples | Typical Role |
|---|---|---|---|
| 1 | IoT Device | Smart sensors, soil probes | edge |
| 2 | Edge Gateway | Raspberry Pi, Jetson Nano | spoke |
| 3 | Cloud VM | AWS EC2, GCP CE, Azure VM | hub / master_hub |
| 4 | Workstation | Mac Studio, Threadripper | hub / master_hub |
| 5 | Mobile | Android ARM64, iOS | edge |
| 6 | 5G Device | gNB, CPE, mmWave AP | spoke |
| 7 | Industrial | PLC, SCADA server | spoke |
| 8 | Browser | WASM node | edge |
Hardware Scoring (automatic)
composite = RAM_score + GPU_score + Model_score − Thermal_penalty − Load_penalty

RAM: ≥64 GB = 40 · ≥32 GB = 30 · ≥16 GB = 20 · ≥8 GB = 10 · <8 GB = 5
GPU: CUDA = 35 · Metal = 30 · ROCm = 25 · Vulkan = 15 · CPU-only = 0
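The scoring rule above can be sketched as follows. Only the RAM and GPU tiers are tabulated here, so `Model_score` and the two penalties are taken as plain inputs; the function and constant names are illustrative, not the crate's actual API.

```python
def ram_score(gb: float) -> int:
    """RAM tier per the table: >=64 GB = 40 ... <8 GB = 5."""
    if gb >= 64:
        return 40
    if gb >= 32:
        return 30
    if gb >= 16:
        return 20
    if gb >= 8:
        return 10
    return 5

# GPU backend tiers per the table; anything unrecognized scores as CPU-only.
GPU_SCORES = {"cuda": 35, "metal": 30, "rocm": 25, "vulkan": 15, "cpu": 0}

def composite(ram_gb: float, gpu: str, model_score: int = 0,
              thermal_penalty: int = 0, load_penalty: int = 0) -> int:
    # composite = RAM + GPU + Model − Thermal − Load
    return (ram_score(ram_gb) + GPU_SCORES.get(gpu, 0) + model_score
            - thermal_penalty - load_penalty)
```

For example, a 64 GB CUDA box with no loaded models and no penalties scores 75, which lands it in the hub tier of the role hierarchy below.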
3. Role Hierarchy
master_hub (score ≥ 80)
├── hub (score ≥ 50)
│   ├── worker (score ≥ 20)
│   │   └── edge (score < 20)
│   └── student (learning mode, any score)
└── spoke (remote/WAN node)
Role assignments are automatic on peer connect. Override via:
4. Module Reference
linus-ai-core
- `EventBus` — async pub/sub backbone (tokio broadcast channels)
- `LinusAIConfig` — single config struct; all settings + env var overrides
- Crypto primitives: `chacha20_encrypt/decrypt`, `aead_encrypt/decrypt`, `pbkdf2_key`, `sign_message`, `verify_signature`, `merkle_root`
- Types: `NodeRole`, `Priority`, `TaskState`, `ThermalState`
linus-ai-net
- `MeshNetwork` — Tailscale peer discovery + mDNS multicast (224.0.0.251:5353) + TCP wire
- Wire protocol: 4-byte length-prefixed frames, HMAC-SHA256 auth, heartbeats every 30 s
- Auto-role assignment on new peer connect
- Auto-push largest fitting GGUF to peers with 0 models and ≥ 4 GB RAM
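A minimal sketch of the framing and signing described above. The 4-byte length prefix's byte order and the HMAC encoding are assumptions (big-endian and hex here); check the actual wire code before relying on them.

```python
import hmac
import hashlib
import struct

def frame(payload: bytes) -> bytes:
    # 4-byte length prefix, then the payload (big-endian is an assumption)
    return struct.pack(">I", len(payload)) + payload

def read_frame(buf: bytes) -> tuple[bytes, bytes]:
    # Returns (payload, remaining bytes) for one length-prefixed frame
    (n,) = struct.unpack(">I", buf[:4])
    return buf[4:4 + n], buf[4 + n:]

def sign(mesh_key: bytes, payload: bytes) -> str:
    # HMAC-SHA256 over the payload with a shared mesh key
    return hmac.new(mesh_key, payload, hashlib.sha256).hexdigest()
```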
linus-ai-inference
- `ModelManager` — GGUF format parser (zero deps), scan/load/hot-swap
- `InferenceEngine` — thermal-aware request batching + SSE streaming
- Backend selected at compile time via Cargo feature flags (set by `build.sh`):
  - `llama-bundled` (default) — statically linked llama.cpp; Metal/CUDA GPU auto-enabled
  - `candle-only` — pure-Rust HuggingFace candle; safe for all cross-compilation targets
  - (no features) — subprocess fallback: llama-server HTTP → ollama HTTP → llama-cli → error
linus-ai-http
- Axum HTTP server on `:9480`
- SSE streaming for `/infer/stream` and `/agent/stream`
- Hardware detection (`/hardware`)
- Multi-part upload with PDF/DOCX/TXT extraction (`/upload`)
linus-ai-vault
- `VaultDB` — per-row ChaCha20+HMAC-SHA256 AEAD encrypted SQLite
- Key hierarchy: OS keychain → DPAPI (Win) → Secret Service (Linux) → PBKDF2+file (0600)
- Audit hash chain; optional Guardian 2FA gate per operation
linus-ai-guardian
- `Guardian` — 2FA, TOTP (RFC 6238), sessions, bank-style auth gates
- `create_user`, `login`, `verify_totp`, `enroll_totp`
- Auth gates: `create_gate` → `approve_gate(gate_id, totp_code)`
linus-ai-thermal
`ThermalGovernor` — 5-stage state machine:
| State | Temp | Action |
|---|---|---|
| NOMINAL | < 70 °C | Normal operation |
| THROTTLE | 70–79 °C | Reduce batch size |
| HOT | 80–84 °C | Route new requests to peers |
| CRITICAL | 85–89 °C | Queue only, no new inference |
| EMERGENCY | ≥ 90 °C | HTTP 503, drain queue |
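The temperature thresholds in the table map to states as in this sketch (the function name is illustrative; the real governor is a Rust state machine):

```python
def thermal_state(temp_c: float) -> str:
    """Map a core temperature to the 5-stage governor state."""
    if temp_c >= 90:
        return "EMERGENCY"   # HTTP 503, drain queue
    if temp_c >= 85:
        return "CRITICAL"    # queue only, no new inference
    if temp_c >= 80:
        return "HOT"         # route new requests to peers
    if temp_c >= 70:
        return "THROTTLE"    # reduce batch size
    return "NOMINAL"
```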
linus-ai-task
- `TaskScheduler` — distributed job scheduler (LINUS-AI native format)
- Job types: `service`, `batch`, `system`
- Allocation scoring: hardware composite + thermal state + current load
linus-ai-blockchain
- `TransparencyLedger` — append-only SHA-256 hash chain in SQLite
- Every inference request, model load, and peer event logged
- `GET /blockchain/entries?limit=100` — audit trail
linus_ai.compliance (Python)
- Domain-specific compliance enforcement for 14 industry profiles across 4 tiers (OPEN/AUDIT/REGULATED/RESTRICTED)
- `PIIScanner` — 12 PII types; CREDIT_CARD/CVV/SSN/PAN_LIKE block; others redact+warn
- `InjectionDetector` — 8 rule families; RESTRICTED hard-blocks; REGULATED warns
- `ConsentManager` — REGULATED/RESTRICTED require explicit user consent; stored in `~/.linus-ai/consent.json`
- `AuditLogger` — HMAC-SHA256 chained monthly `.jsonl` files; `verify_chain()` detects tampering; `seal_completed_months()` applies OS-level immutability
- Env vars: `LINUS_AI_AUDIT_DIR` (primary), `LINUS_AI_AUDIT_EXPORT_DIRS` (colon-separated SIEM sinks)
linus_ai.rag_access (Python)
- Fine-grained RAG document access control with 5 classification levels (PUBLIC → TOP_SECRET)
- `DocumentACL` — allow/deny lists at user, role, department, division, company scope
- `RAGAccessController` — 7-step decision algorithm; explicit DENY always wins
- `filter_rag_chunks()` — filters a list of RAG chunks to only those the principal may read
- `RAGAuditLogger` — same HMAC-chain format as compliance audit; propagates to export dirs
- `DocumentRegistry`, `PrincipalRegistry` — JSON-backed persistent stores
5. HTTP API Reference
All endpoints on http://localhost:9480 (port configurable via LINUS_AI_API_PORT).
Inference
Agent (ReAct loop)
Models
Mesh
Vault & Guardian
Status & Monitoring
Compliance & Security
RAG Document Access Control
Agent Profiles
6. Mesh & Overlay Protocol
LAN Discovery (mDNS)
Peers announce via UDP multicast to 224.0.0.251:5353. Announcement payload (JSON):
WAN Overlay Relay
For nodes that cannot reach each other directly (NAT, firewall):
Node A ──TCP──► Relay Server ──TCP──► Node B
:9777
Node → relay message types: register, heartbeat, peers, relay
Relay → node message types: registered, pong, peers, relay, relay_ack, error
7. Security Model
Mesh Authentication
Every message between mesh peers carries an HMAC-SHA256 signature:
Peers reject messages with: wrong signature · timestamp delta > 60 s · unknown node_id.
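The three rejection rules can be sketched as one check. The field names (`node_id`, `timestamp`, `sig`) and the canonical-JSON signing convention are illustrative assumptions, not the actual wire schema; a shared mesh key is assumed.

```python
import hashlib
import hmac
import json
import time

def verify_mesh_message(msg: dict, mesh_key: bytes, known_nodes: set,
                        now: float = None) -> bool:
    """Reject on unknown node_id, timestamp skew > 60 s, or bad HMAC.
    Field names here are illustrative, not the real message schema."""
    now = time.time() if now is None else now
    if msg.get("node_id") not in known_nodes:
        return False
    if abs(now - msg.get("timestamp", 0)) > 60:
        return False
    # Sign everything except the signature field itself (assumed convention)
    body = {k: v for k, v in msg.items() if k != "sig"}
    expect = hmac.new(mesh_key,
                      json.dumps(body, sort_keys=True).encode(),
                      hashlib.sha256).hexdigest()
    return hmac.compare_digest(expect, msg.get("sig", ""))
```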
Vault Encryption
Each vault record is encrypted independently:
The vault passphrase is stored in the OS keychain (macOS Keychain · Windows DPAPI · Linux Secret Service). Falls back to a 0600-permission file if no keychain is available.
Guardian 2FA
Guardian wraps any sensitive operation in an auth gate:
8. Inference Pipeline
Request
│
▼
ThermalGovernor ──EMERGENCY?──► HTTP 503
│
▼ (NOMINAL / THROTTLE / HOT)
MeshRouter
├─ score peers (composite − load − thermal)
├─ HOT state? → prefer peers
├─ THROTTLE? → reduce batch_size
└─ local? → InferenceEngine
│
▼
Backend selection
llama-bundled (default) ← compile-time feature
candle-only ← compile-time feature
subprocess fallback:
llama-server → ollama → llama-cli → error
│
▼
SSE token stream → HTTP client
│
▼
TransparencyLedger (log event)
Agent ReAct Loop
user message
      │
      ▼
fan-out to ≤ 3 peers (Connected, load < 80%, models > 0, score ≥ 35)
      │                                  │
      ▼                                  ▼
local ReAct: thought → action →     peer results
observation → ... → local_result         │
      │                                  │
      └────────────────┬─────────────────┘
                       ▼
synthesis (hub combines all peer results + local result)
                       │
                       ▼
              SSE token stream
Tools available in agent mode: search (3-tier: Google News RSS → DDG HTML → DDG Instant), infer_peer, vault_read, vault_write, thermal_status.
9. py2c User Scripts
py2c translates a Python-subset to C17, compiles with the platform toolchain, and links against nxrt.h (zero-dependency C17 runtime). The same .py file runs interpreted (python3 script.py) and compiles to any of 11 targets.
Language Subset
Supported:
- Type annotations on all function parameters and local variables (required for compiled path)
- `int`, `float`, `bool`, `str`, `bytes`, `list[T]`, `Optional[T]`
- All arithmetic, bitwise, comparison, and boolean operators
- `if/elif/else`, `while`, `for` with `range()`
- `@dataclass` classes, `match/case` (Python 3.10+), f-strings, walrus operator `:=`
- `import linus_ai` — resolves to `#include "nxrt.h"`
- `@linus_ai.native` decorator — raw C passthrough (escape hatch)
Not supported (compile path): `*args`, `**kwargs`, closures capturing mutable outer variables, `try/except`, dynamic attribute access, third-party imports.
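A tiny self-contained sketch of the subset (written here for this document, not taken from the project): every parameter and local carries a type annotation, and only the constructs listed above appear, so the same file runs interpreted and should be eligible for the compiled path.

```python
from dataclasses import dataclass

@dataclass
class Stats:
    total: int
    count: int

def mean_of_squares(n: int) -> float:
    # Accumulate 0² + 1² + ... + (n−1)² with subset-only constructs
    s: Stats = Stats(total=0, count=0)
    i: int = 0
    while i < n:
        s.total = s.total + i * i
        s.count = s.count + 1
        i = i + 1
    if s.count == 0:
        return 0.0
    return s.total / s.count
```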
Example: LINUS-AI API Poller
10. Tensor Parallelism
Tensor parallelism (TP) splits individual weight matrices across N nodes. All nodes process the same token simultaneously — each holds 1/N of every layer's weights and computes a partial result; results are reduced (summed) across nodes after each transformer block.
Backends
| Backend | Mechanism | Requirement |
|---|---|---|
| RPC (default) | llama.cpp --rpc + llama-rpc-server | brew install llama.cpp (≥ b2373) |
| Native | LINUS-AI AllReduce over HTTP (TNSR wire format) | No extra tools — Phase 2 |
RPC Mode Operation
Coordinator (rank 0) Workers (rank 1…N-1)
───────────────────── ─────────────────────
POST /tensor/plan ──────────────────▶ POST /tensor/plan
POST /tensor/rpc/start → llama-rpc-server :50052
POST /tensor/infer
└─ llama-server --rpc w1:50052,w2:50052
└─ weight tensors split by RAM proportion
└─ GPU layers offloaded to workers
└─ token returned to client
Native AllReduce (Megatron-LM style)
For each transformer block:
┌── Q/K/V projection (column-parallel) ──────────────────────┐
│ Each rank computes: partial = X @ W_col_slice │
│ No communication needed (outputs concatenated logically) │
└──────────────────────────────────────────────────────────────┘
↓
┌── Output projection (row-parallel) + AllReduce ────────────┐
│ Each rank computes: partial = partial_in @ W_row_slice │
│ POST partial to /tensor/allreduce on coordinator │
│ Coordinator sums all world_size partials → full activation │
└──────────────────────────────────────────────────────────────┘
↓
┌── FFN layer (column then row, same pattern) ───────────────┐
└──────────────────────────────────────────────────────────────┘
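The AllReduce step in the row-parallel boxes above amounts to an element-wise sum of the `world_size` partial activation vectors on the coordinator; a minimal sketch:

```python
def allreduce_sum(partials: list) -> list:
    """Sum world_size partial activation vectors element-wise,
    as the coordinator does after each row-parallel projection."""
    return [sum(col) for col in zip(*partials)]
```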
TNSR Wire Format (native AllReduce frames)
| Offset | Size | Field |
|---|---|---|
| 0 | 4 B | magic `b"TNSR"` |
| 4 | 1 B | rank (u8, 0 = coordinator) |
| 5 | 1 B | world_size (u8) |
| 6 | 16 B | request_id (`[u8; 16]`, UUID v4) |
| 22 | 4 B | element_count (u32 LE) |
| 26 | N×4 B | f32 elements (LE), N = element_count |
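A sketch of packing and parsing a TNSR frame from the layout above (function names are mine; offsets and little-endian widths follow the table):

```python
import struct

def pack_tnsr(rank: int, world_size: int, request_id: bytes,
              elements: list) -> bytes:
    """Build a TNSR frame: magic, rank, world_size, 16-byte request_id,
    u32 LE element count, then f32 LE elements."""
    assert len(request_id) == 16
    header = (b"TNSR"
              + struct.pack("<BB", rank, world_size)
              + request_id
              + struct.pack("<I", len(elements)))
    return header + struct.pack(f"<{len(elements)}f", *elements)

def unpack_tnsr(frame: bytes):
    """Parse a TNSR frame back into (rank, world_size, request_id, elements)."""
    assert frame[:4] == b"TNSR"
    rank, world_size = frame[4], frame[5]
    request_id = frame[6:22]
    (n,) = struct.unpack("<I", frame[22:26])
    elements = list(struct.unpack(f"<{n}f", frame[26:26 + 4 * n]))
    return rank, world_size, request_id, elements
```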
API
| Method | Path | Body | Description |
|---|---|---|---|
| GET | /tensor/plan | — | Return active plan or {"plan": null} |
| POST | /tensor/plan | TensorParallelPlan JSON | Set tensor parallel plan |
| GET | /tensor/status | — | Plan summary + RPC worker health |
| POST | /tensor/infer | {prompt, max_tokens?, temperature?} | Run TP inference (coordinator only) |
| POST | /tensor/allreduce | binary TNSR frame | Submit partial tensor; returns reduced frame |
| POST | /tensor/rpc/start | {rpc_port?} | Spawn llama-rpc-server on this node |
| POST | /tensor/rpc/stop | — | Stop llama-rpc-server |
Pipeline vs Tensor vs Hybrid
| Strategy | Split axis | Activation flow | Best for |
|---|---|---|---|
| Pipeline | Layer depth | Sequential A→B→C | Models too large for any single node |
| Tensor | Weight matrix width | Parallel + AllReduce | Lowest latency on same-size nodes |
| Hybrid | Both | Pipeline groups with TP inside | Largest models (70B+) on heterogeneous mesh |
11. Compliance & Security
The compliance layer (linus_ai/compliance.py) enforces domain-specific governance before every inference request.
Profile Tiers
| Tier | Profiles | PII blocking | Injection | Consent required |
|---|---|---|---|---|
| OPEN | general, creative, reasoning, code, engineering | Warn | Warn | No |
| AUDIT | education, support, sales, data_science | Block critical | Warn | No |
| REGULATED | medical, legal, finance, hr | Block critical | Warn | Yes |
| RESTRICTED | security | Block all | Hard block | Yes |
Preflight Check Flow
text + profile
  │
  ▼
PIIScanner
  ├─ CREDIT_CARD / CVV / SSN / PAN_LIKE → BLOCK
  └─ other types → warn + redact
  ▼
InjectionDetector
  ├─ RESTRICTED profile → BLOCK
  └─ REGULATED profile → WARN
  ▼
ConsentManager
  └─ REGULATED/RESTRICTED and no consent → BLOCK
  ▼
AuditLogger.log()
  ├─ write to primary dir (LINUS_AI_AUDIT_DIR or ~/.linus-ai/audit)
  └─ propagate to LINUS_AI_AUDIT_EXPORT_DIRS (colon-separated)
  ▼
ALLOW / BLOCK / WARN result
Audit Immutability
Completed monthly log files are sealed by AuditLogger.seal_completed_months():
- `chmod 0o400` — read-only at OS level
- `os.chflags(UF_IMMUTABLE)` — macOS `schg` flag (requires root to unset)
- `chattr +i` — Linux immutable attribute (requires root to unset)
HMAC-SHA256 chaining allows tamper detection without immutability at the OS level: verify_chain() returns False if any record was altered or deleted.
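The chaining idea can be sketched as follows. The record format here (each record's MAC covering the previous MAC plus the current payload) is an illustrative assumption, not the actual `.jsonl` schema; note that this simple scheme detects alteration and deletion of interior records but not truncation of the tail.

```python
import hashlib
import hmac
import json

def chain_records(key: bytes, payloads: list) -> list:
    """Append payloads to an HMAC chain (illustrative format)."""
    out, prev = [], ""
    for p in payloads:
        msg = prev + json.dumps(p, sort_keys=True)
        mac = hmac.new(key, msg.encode(), hashlib.sha256).hexdigest()
        out.append({"payload": p, "mac": mac})
        prev = mac
    return out

def verify_chain(key: bytes, records: list) -> bool:
    """Recompute each MAC from the previous one; any mismatch fails."""
    prev = ""
    for r in records:
        msg = prev + json.dumps(r["payload"], sort_keys=True)
        mac = hmac.new(key, msg.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(mac, r["mac"]):
            return False
        prev = r["mac"]
    return True
```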
12. RAG Document Access Control
The RAG access layer (linus_ai/rag_access.py) provides fine-grained per-document access control for retrieval-augmented generation.
Classification Levels
| Value | Name | Default access |
|---|---|---|
| 0 | PUBLIC | Anyone |
| 1 | INTERNAL | Authenticated company members via ACL |
| 2 | CONFIDENTIAL | Explicit ACL permit required |
| 3 | SECRET | Clearance ≥ 3 + explicit ACL permit |
| 4 | TOP_SECRET | Clearance 4 + named on owner's explicit list |
Access Decision Algorithm (7 Steps)
1. Is principal the document owner? → PERMIT
2. Is principal in doc's deny_users list? → DENY
3. Is document PUBLIC? → PERMIT
4. principal.clearance < doc.min_clearance? → DENY
5. TOP_SECRET and principal not in allow list? → DENY
6. ACL permits at any scope? → PERMIT (company → division → department → role → user)
7. Default → DENY
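The seven steps can be sketched as a single function. This is a reduced model — it checks the ACL at user scope only and uses illustrative field names, whereas the real `RAGAccessController` walks company → division → department → role → user scopes:

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    owner: str
    classification: int          # 0=PUBLIC ... 4=TOP_SECRET
    min_clearance: int
    deny_users: set = field(default_factory=set)
    allow_users: set = field(default_factory=set)

@dataclass
class Principal:
    user_id: str
    clearance: int

def decide(p: Principal, d: Doc) -> str:
    if p.user_id == d.owner:                    # 1. owner
        return "PERMIT"
    if p.user_id in d.deny_users:               # 2. explicit deny always wins
        return "DENY"
    if d.classification == 0:                   # 3. PUBLIC
        return "PERMIT"
    if p.clearance < d.min_clearance:           # 4. clearance floor
        return "DENY"
    if d.classification == 4 and p.user_id not in d.allow_users:
        return "DENY"                           # 5. TOP_SECRET needs named allow
    if p.user_id in d.allow_users:              # 6. ACL permit (user scope only here)
        return "PERMIT"
    return "DENY"                               # 7. default deny
```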
All decisions are HMAC-chained in ~/.linus-ai/rag-audit-<YYYY-MM>.jsonl and propagated to any export dirs configured via LINUS_AI_AUDIT_EXPORT_DIRS.
13. Changelog
v1.4.0 — current
- Compliance & Security Layer — 14 domain-specific profiles across 4 tiers (OPEN/AUDIT/REGULATED/RESTRICTED); PII scanner (12 types, 4 blocking); injection detector (8 rule families); consent manager; immutable HMAC-chained audit logs with OS-level sealing
- RAG Document Access Control — 5-level document classification (PUBLIC → TOP_SECRET); fine-grained ACL at user/role/department/division/company scope; 7-step access decision algorithm; `filter_rag_chunks()` helper; HMAC-chained RAG audit log
- Enterprise audit routing — `LINUS_AI_AUDIT_DIR` and `LINUS_AI_AUDIT_EXPORT_DIRS` env vars for real-time propagation to SIEM systems; `seal_completed_months()` and `export_to()` on `AuditLogger`
- New REST API — `/compliance/*` (8 endpoints) and `/rag/*` (10 endpoints) wired into `main.py`
- Test suite expanded — 62 new compliance tests + 53 new RAG access tests; total 263 tests (484 including Rust)
- Control panel UI — compliance profile card, RAG document registry with ACL editor, RAG audit viewer in `linus_ai_control_panel.html`
v1.2.0
- py2c: Python-subset → C17 → native binary compiler (11 cross-compilation targets). Replaces VOLT language — same `.py` files run interpreted AND compile to native
- VOLT retired: `linus_ai-volt` crate and `linus_ai/volt/` module removed
- Agent ReAct: fan-out to ≤ 3 peers, synthesis pass on hub
- Privacy scopes: `private|lan|open` (replaces `allow_web_search` bool)
- Auto-behaviors: auto-push GGUF to peers with 0 models; auto-assign roles on peer connect
- nxrt.h: `nanosleep` replaces `usleep` for full musl/POSIX compatibility
v0.8.x
- Rust binary replaces Nuitka-compiled Python (Phase 2 complete)
- linus-ai-vault: OS keychain key storage
- linus-ai-guardian: RFC 6238 TOTP, bank-style auth gates
- linus-ai-thermal: 5-stage governor, HOT → peer-priority routing
- linus-ai-blockchain: SQLite-backed SHA-256 hash chain
- Overlay relay: WAN mesh without Tailscale