Technical Specification — LINUS-AI v1.4.0

INIA — Integrated Node Integration Architecture. Architecture overview, module reference, security model, mesh protocol, inference pipeline, tensor parallelism, compliance layer, and full changelog.

1. Architecture Overview

Universal mesh AI platform. Every node type, every protocol, zero-trust security, minimal configuration.

┌──────────────────────────────────────────────────────────────────────┐
│  LINUS-AI Node                                                       │
│                                                                      │
│  ┌─────────┐  ┌──────────┐  ┌─────────┐  ┌──────────┐  ┌────────┐  │
│  │linus_ai-│  │linus_ai- │  │linus_ai-│  │linus_ai- │  │linus_ai│  │
│  │  http   │  │   ai     │  │  net    │  │  vault   │  │guardian│  │
│  │(axum API│  │(inference│  │(mDNS+   │  │(encrypted│  │(2FA +  │  │
│  │ + SSE)  │  │ engine)  │  │ TCP)    │  │ SQLite)  │  │ TOTP)  │  │
│  └────┬────┘  └────┬─────┘  └────┬────┘  └────┬─────┘  └───┬────┘  │
│       └────────────┴─────────────┴─────────────┴────────────┘       │
│                            linus-ai-core                             │
│                  (EventBus · Config · Crypto · Types)                │
│                                                                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────────┐ │
│  │linus_ai- │  │linus_ai- │  │linus_ai- │  │   linus-ai-bin       │ │
│  │ thermal  │  │  task    │  │blockchain│  │ (main.rs entrypoint) │ │
│  └──────────┘  └──────────┘  └──────────┘  └──────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
         │ HTTP :9480                           │ TCP :9479
         ▼                                      ▼
   Browser / API clients              Mesh peers (mDNS + Tailscale)

Core Design Principles

  • Zero-config discovery — mDNS multicast (224.0.0.251:5353) plus mesh TCP on 9479, auto-role from hardware score
  • Zero-trust security — every mesh message HMAC-SHA256 signed; vault data ChaCha20-AEAD encrypted
  • Single integration point — one HTTP API (/infer, /jobs, /settings) for all connected software
  • Role-aware routing — tasks automatically route to the best available hardware
  • py2c scripting — any automation expressible in a type-annotated .py file, compiled to native

2. Node Taxonomy & Scoring

#   Type          Examples                     Typical Role
1   IoT Device    Smart sensors, soil probes   edge
2   Edge Gateway  Raspberry Pi, Jetson Nano    spoke
3   Cloud VM      AWS EC2, GCP CE, Azure VM    hub / master_hub
4   Workstation   Mac Studio, Threadripper     hub / master_hub
5   Mobile        Android ARM64, iOS           edge
6   5G Device     gNB, CPE, mmWave AP          spoke
7   Industrial    PLC, SCADA server            spoke
8   Browser       WASM node                    edge

Hardware Scoring (automatic)

composite = RAM_score + GPU_score + Model_score − Thermal_penalty − Load_penalty

RAM:  ≥64 GB = 40  ·  ≥32 GB = 30  ·  ≥16 GB = 20  ·  ≥8 GB = 10  ·  <8 GB = 5
GPU:  CUDA = 35  ·  Metal = 30  ·  ROCm = 25  ·  Vulkan = 15  ·  CPU-only = 0
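
The scoring rules above can be sketched directly. This is a minimal sketch: the RAM and GPU tiers are taken from this section, while the model score and the two penalties are left as caller-supplied inputs because their exact weights are not specified here.

```python
# Composite hardware score — sketch of the rules above. RAM/GPU tiers are
# from the spec; model_score and the penalties are caller-supplied because
# their weights are not given in this section.

def ram_score(ram_gb: float) -> int:
    if ram_gb >= 64: return 40
    if ram_gb >= 32: return 30
    if ram_gb >= 16: return 20
    if ram_gb >= 8:  return 10
    return 5

GPU_SCORES = {"cuda": 35, "metal": 30, "rocm": 25, "vulkan": 15, "cpu": 0}

def composite(ram_gb: float, gpu: str, model_score: int = 0,
              thermal_penalty: int = 0, load_penalty: int = 0) -> int:
    return (ram_score(ram_gb) + GPU_SCORES.get(gpu, 0)
            + model_score - thermal_penalty - load_penalty)
```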

3. Role Hierarchy

master_hub  (score ≥ 80)
├── hub     (score ≥ 50)
│   ├── worker   (score ≥ 20)
│   │   └── edge (score < 20)
│   └── student  (learning mode, any score)
└── spoke   (remote/WAN node)

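The hierarchy above maps to a simple threshold function. A sketch, using only the score cutoffs shown; student (learning mode) and spoke (remote/WAN) are assigned by mode and topology rather than by score, so they are omitted:

```python
def role_for_score(score: int) -> str:
    # Thresholds from the role hierarchy above. student and spoke are
    # assigned by mode/topology, not score, so they are not derived here.
    if score >= 80: return "master_hub"
    if score >= 50: return "hub"
    if score >= 20: return "worker"
    return "edge"
```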
Role assignments are automatic on peer connect. Override via:

Role assignment
$ curl -X POST http://localhost:9480/mesh/assign-roles
# or set model roles manually:
$ curl -X POST http://localhost:9480/models/roles \
    -d '{"main":"qwen2.5-7b-instruct.gguf","router":"llama-3.2-3b.gguf"}'

4. Module Reference

linus-ai-core

  • EventBus — async pub/sub backbone (tokio broadcast channels)
  • LinusAIConfig — single config struct; all settings + env var overrides
  • Crypto primitives: chacha20_encrypt/decrypt, aead_encrypt/decrypt, pbkdf2_key, sign_message, verify_signature, merkle_root
  • Types: NodeRole, Priority, TaskState, ThermalState

linus-ai-net

  • MeshNetwork — Tailscale peer discovery + mDNS multicast (224.0.0.251:5353) + TCP wire
  • Wire protocol: 4-byte length-prefixed frames, HMAC-SHA256 auth, heartbeats every 30 s
  • Auto-role assignment on new peer connect
  • Auto-push largest fitting GGUF to peers with 0 models and ≥ 4 GB RAM
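
The wire protocol above can be sketched in a few lines. Assumptions are flagged in the comments: the big-endian length prefix and any payload keys beyond node_id/ts/auth are illustrative, not confirmed by the spec.

```python
import hashlib
import hmac
import json
import struct
import time

# Sketch of the linus-ai-net framing: a 4-byte length prefix around a JSON
# payload carrying the HMAC-SHA256 auth field. Big-endian length and the
# extra payload keys are assumptions.

def sign(mesh_secret: bytes, node_id: str, ts: int) -> str:
    return hmac.new(mesh_secret, f"{node_id}:{ts}".encode(),
                    hashlib.sha256).hexdigest()

def encode_frame(mesh_secret: bytes, node_id: str, body: dict) -> bytes:
    ts = int(time.time())
    msg = dict(body, node_id=node_id, ts=ts,
               auth=sign(mesh_secret, node_id, ts))
    payload = json.dumps(msg).encode()
    return struct.pack(">I", len(payload)) + payload

def decode_frame(buf: bytes) -> dict:
    (length,) = struct.unpack(">I", buf[:4])
    return json.loads(buf[4:4 + length].decode())
```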

linus-ai-inference

  • ModelManager — GGUF format parser (zero deps), scan/load/hot-swap
  • InferenceEngine — thermal-aware request batching + SSE streaming
  • Backend selected at compile time via Cargo feature flags (set by build.sh):
    • llama-bundled (default) — statically linked llama.cpp; Metal/CUDA GPU auto-enabled
    • candle-only — pure-Rust HuggingFace candle; safe for all cross-compilation targets
    • (no features) — subprocess fallback: llama-server HTTP → ollama HTTP → llama-cli → error

linus-ai-http

  • Axum HTTP server on :9480
  • SSE streaming for /infer/stream and /agent/stream
  • Hardware detection (/hardware)
  • Multi-part upload with PDF/DOCX/TXT extraction (/upload)

linus-ai-vault

  • VaultDB — per-row ChaCha20+HMAC-SHA256 AEAD encrypted SQLite
  • Key hierarchy: OS keychain → DPAPI (Win) → Secret Service (Linux) → PBKDF2+file (0600)
  • Audit hash chain; optional Guardian 2FA gate per operation

linus-ai-guardian

  • Guardian — 2FA, TOTP (RFC 6238), sessions, bank-style auth gates
  • create_user, login, verify_totp, enroll_totp
  • Auth gates: create_gate(operation) → approve_gate(gate_id, totp_code)

linus-ai-thermal

ThermalGovernor — 5-stage state machine:

State      Temp      Action
NOMINAL    < 70 °C   Normal operation
THROTTLE   70–79 °C  Reduce batch size
HOT        80–84 °C  Route new requests to peers
CRITICAL   85–89 °C  Queue only, no new inference
EMERGENCY  ≥ 90 °C   HTTP 503, drain queue
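
The stage boundaries translate directly into a classifier; a minimal sketch using only the temperatures from the table:

```python
def thermal_state(temp_c: float) -> str:
    # Stage boundaries taken directly from the governor table above.
    if temp_c >= 90: return "EMERGENCY"
    if temp_c >= 85: return "CRITICAL"
    if temp_c >= 80: return "HOT"
    if temp_c >= 70: return "THROTTLE"
    return "NOMINAL"
```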

linus-ai-task

  • TaskScheduler — distributed job scheduler (LINUS-AI native format)
  • Job types: service, batch, system
  • Allocation scoring: hardware composite + thermal state + current load

linus-ai-blockchain

  • TransparencyLedger — append-only SHA-256 hash chain in SQLite
  • Every inference request, model load, and peer event logged
  • GET /blockchain/entries?limit=100 — audit trail

linus_ai.compliance (Python)

  • Domain-specific compliance enforcement for 14 industry profiles across 4 tiers (OPEN/AUDIT/REGULATED/RESTRICTED)
  • PIIScanner — 12 PII types; CREDIT_CARD/CVV/SSN/PAN_LIKE block; others redact+warn
  • InjectionDetector — 8 rule families; RESTRICTED hard-blocks; REGULATED warns
  • ConsentManager — REGULATED/RESTRICTED require explicit user consent; stored in ~/.linus-ai/consent.json
  • AuditLogger — HMAC-SHA256 chained monthly .jsonl files; verify_chain() detects tampering; seal_completed_months() applies OS-level immutability
  • Env vars: LINUS_AI_AUDIT_DIR (primary), LINUS_AI_AUDIT_EXPORT_DIRS (colon-separated SIEM sinks)

linus_ai.rag_access (Python)

  • Fine-grained RAG document access control with 5 classification levels (PUBLIC → TOP_SECRET)
  • DocumentACL — allow/deny lists at user, role, department, division, company scope
  • RAGAccessController — 7-step decision algorithm; explicit DENY always wins
  • filter_rag_chunks() — filters a list of RAG chunks to only those the principal may read
  • RAGAuditLogger — same HMAC-chain format as compliance audit; propagates to export dirs
  • DocumentRegistry, PrincipalRegistry — JSON-backed persistent stores

5. HTTP API Reference

All endpoints on http://localhost:9480 (port configurable via LINUS_AI_API_PORT).

Inference

POST /infer
POST /infer/stream   (SSE — streams tokens)

Body: { "prompt": "...", "max_tokens": 512, "temperature": 0.7, "model": "optional-override.gguf" }

SSE events: {"token":"..."} ... {"type":"done","tokens":N}

Agent (ReAct loop)

POST /agent/stream   (SSE — streams thought/action/observation/tokens)

Body: { "message": "...", "profile": "general", "history": [],
        "session_id": "optional", "allow_web_search": false }

SSE event types:
  {"type":"peer_start","peer":"addr:port","hostname":"X","role":"worker"}
  {"type":"peer_result","peer":"addr","text":"...","tokens":N}
  {"type":"thought","text":"..."}
  {"type":"action","tool":"search","args":"..."}
  {"type":"observation","text":"..."}
  {"type":"local_result","text":"..."}
  {"type":"synthesis_start","sources":N}
  {"token":"..."}
  {"type":"stats","tokens":N,"sources":N,"routed_to":"multi"}

Models

GET  /models             list loaded models + metadata
POST /models/roles       set {main, router, student, worker}
GET  /models/roles       get current role assignments
POST /models/load        {"model":"name.gguf"}
POST /models/unload      {"model":"name.gguf"}
GET  /models/recommend   hardware-aware catalog (22 curated models)
POST /models/auto-push   push best GGUF to peers with 0 models

Mesh

GET  /peers               connected peer list
POST /mesh/assign-roles   score peers, assign hub/worker/student
POST /mesh/push-model     {"peer":"addr:port","model":"name.gguf"}

Vault & Guardian

POST /vault/store      {"key":"...", "value":{...}}
GET  /vault/get?key=...
POST /guardian/login   {"username":"...","password":"..."}
POST /guardian/totp    {"gate_id":"...","code":"123456"}

Status & Monitoring

GET /status               node overview (mode, peers, models, uptime)
GET /stats                inference stats (requests, tokens, latency p50/p95/p99)
GET /thermal              thermal state + temperature
GET /hardware             AI capability score, RAM/GPU/NPU detail
GET /blockchain/entries   audit trail
GET /jobs                 running scheduled jobs

Compliance & Security

GET  /compliance/preflight?text=...&profile=...   run PII + injection checks
GET  /compliance/profiles                         list 14 profiles with metadata
POST /compliance/consent                          {user_id, action, profile}
GET  /compliance/consent?user_id=...
GET  /compliance/audit?limit=N&profile=...
GET  /compliance/audit/verify
POST /compliance/audit/seal
POST /compliance/audit/export                     {dest_dir}

RAG Document Access Control

GET    /rag/documents
POST   /rag/documents/register              {title, path, owner_user_id, classification, ...}
PUT    /rag/documents/{id}/acl              {allow_users[], deny_users[], allow_companies[], ...}
PUT    /rag/documents/{id}/classification   {classification}
DELETE /rag/documents/{id}
POST   /rag/access-check                    {user_id, doc_id}
GET    /rag/principals
POST   /rag/principals                      {user_id, name, clearance, company, division, department, roles[]}
DELETE /rag/principals/{user_id}
GET    /rag/audit?doc_id=...&user_id=...&denied_only=true&limit=N

Agent Profiles

GET /agent/profiles   list all 14 profiles

Profiles:
  OPEN:       general · creative · reasoning · code · engineering
  AUDIT:      education · support · sales · data_science
  REGULATED:  medical · legal · finance · hr
  RESTRICTED: security

6. Mesh & Overlay Protocol

LAN Discovery (mDNS)

Peers announce via UDP multicast to 224.0.0.251:5353. Announcement payload (JSON):

{
  "node_id": "0c98b072-23b1-47d6-a1e3-...",
  "hostname": "macstudio",
  "api_port": 9480,
  "mesh_port": 9479,
  "ts": 1741600000,
  "auth": "HMAC_SHA256(mesh_secret, node_id + ':' + ts)"
}

WAN Overlay Relay

For nodes that cannot reach each other directly (NAT, firewall):

Node A ──TCP──► Relay Server ──TCP──► Node B
                 :9777
WAN relay setup
# Start a relay server (no Go, no Python):
$ ./linus_ai --mode relay --listen 0.0.0.0:9777 --mesh-secret YOUR_SHARED_SECRET
 
# Connect through a relay:
$ ./linus_ai --overlay --overlay-server relay.example.com:9777 --mesh-secret YOUR_SHARED_SECRET

Node → relay message types: register, heartbeat, peers, relay

Relay → node message types: registered, pong, peers, relay, relay_ack, error

7. Security Model

Mesh Authentication

Every message between mesh peers carries an HMAC-SHA256 signature:

auth = HMAC-SHA256(key=mesh_secret, msg=node_id + ':' + unix_timestamp)

Peers reject messages with: wrong signature · timestamp delta > 60 s · unknown node_id.
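
Those three rejection rules can be sketched as a single check. A minimal sketch: the signature recomputation matches the formula above; the function signature itself is illustrative.

```python
import hashlib
import hmac
import time

# Peer-side verification per the spec: recompute the HMAC, enforce the
# 60 s timestamp window, reject unknown node_ids.

MAX_SKEW_S = 60

def verify_peer(mesh_secret: bytes, node_id: str, ts: int, auth: str,
                known_nodes: set, now=None) -> bool:
    if node_id not in known_nodes:
        return False
    now = time.time() if now is None else now
    if abs(now - ts) > MAX_SKEW_S:
        return False
    expected = hmac.new(mesh_secret, f"{node_id}:{ts}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, auth)
```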

Change mesh secret
$ ./linus_ai --mesh-secret "$(openssl rand -hex 32)"
# or via env:
$ export LINUS_AI_MESH_SECRET="$(openssl rand -hex 32)"

Vault Encryption

Each vault record is encrypted independently:

key        = PBKDF2-HMAC-SHA256(passphrase, salt, 260 000 iterations) → 32 bytes
nonce      = random 12 bytes (stored with record)
ciphertext = ChaCha20(key, nonce, plaintext)
tag        = HMAC-SHA256(key, nonce + ciphertext)

The vault passphrase is stored in the OS keychain (macOS Keychain · Windows DPAPI · Linux Secret Service). Falls back to a 0600-permission file if no keychain is available.
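
A stdlib-only sketch of the per-record layout above: the key derivation and encrypt-then-MAC structure match the spec, but the ChaCha20 step itself needs a real crypto library, so the cipher is passed in as an opaque callable here.

```python
import hashlib
import hmac
import os

# Per-record vault scheme sketch. `cipher` stands in for ChaCha20 (which
# requires an external crypto library); key derivation and the
# encrypt-then-MAC tag follow the spec.

ITERATIONS = 260_000

def derive_key(passphrase: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt,
                               ITERATIONS, dklen=32)

def seal_record(key: bytes, cipher, plaintext: bytes):
    nonce = os.urandom(12)                      # stored alongside the record
    ciphertext = cipher(key, nonce, plaintext)  # ChaCha20 in the real vault
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce, ciphertext, tag

def verify_record(key: bytes, nonce: bytes, ciphertext: bytes,
                  tag: bytes) -> bool:
    expect = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return hmac.compare_digest(expect, tag)
```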

Guardian 2FA

Guardian wraps any sensitive operation in an auth gate:

TOTP auth gates
# Enroll TOTP (works with Google Authenticator, Authy, etc.)
$ curl -X POST http://localhost:9480/guardian/enroll-totp -d '{"username": "admin"}'
→ {"qr_uri": "otpauth://totp/LINUS-AI:admin?secret=BASE32SECRET&issuer=LINUS-AI"}
 
# Create an auth gate
$ curl -X POST http://localhost:9480/guardian/gate/create -d '{"operation": "model_push"}'
→ {"gate_id": "abc123"}
 
# Approve the gate
$ curl -X POST http://localhost:9480/guardian/gate/approve -d '{"gate_id": "abc123", "totp_code": "123456"}'
→ {"approved": true}

8. Inference Pipeline

Request
  │
  ▼
ThermalGovernor ──EMERGENCY?──► HTTP 503
  │
  ▼ (NOMINAL / THROTTLE / HOT)
MeshRouter
  ├─ score peers (composite − load − thermal)
  ├─ HOT state? → prefer peers
  ├─ THROTTLE?  → reduce batch_size
  └─ local?     → InferenceEngine
                      │
                      ▼
              Backend selection
              llama-bundled (default) ← compile-time feature
              candle-only             ← compile-time feature
              subprocess fallback:
                llama-server → ollama → llama-cli → error
                      │
                      ▼
              SSE token stream → HTTP client
                      │
                      ▼
              TransparencyLedger (log event)

Agent ReAct Loop

user message
  │
  ▼
fan-out to ≤ 3 peers (Connected, load < 80%, models > 0, score ≥ 35)
  │                  │
  ▼ peer results     ▼ local ReAct
thought → action → observation → thought → ... → local_result
  │
  ▼
synthesis (hub combines all peer results + local result)
  │
  ▼
SSE token stream

Tools available in agent mode: search (3-tier: Google News RSS → DDG HTML → DDG Instant), infer_peer, vault_read, vault_write, thermal_status.

9. py2c User Scripts

py2c translates a Python subset to C17, compiles it with the platform toolchain, and links against nxrt.h (a zero-dependency C17 runtime). The same .py file runs interpreted (python3 script.py) and compiles to any of 11 targets.

Language Subset

Supported:

  • Type annotations on all function parameters and local variables (required for compiled path)
  • int, float, bool, str, bytes, list[T], Optional[T]
  • All arithmetic, bitwise, comparison, and boolean operators
  • if / elif / else, while, for with range()
  • @dataclass classes, match / case (Python 3.10+), f-strings, walrus operator :=
  • import linus_ai — resolves to #include "nxrt.h"
  • @linus_ai.native decorator — raw C passthrough (escape hatch)

Not supported (compile path): *args, **kwargs, closures capturing mutable outer variables, try/except, dynamic attribute access, third-party imports.

Example: LINUS-AI API Poller

# poll_health.py
import linus_ai

ENDPOINT: str = "http://127.0.0.1:9480/status"
INTERVAL_MS: int = 5000

def check() -> int:
    resp: str = linus_ai.http_get(ENDPOINT)
    if linus_ai.str_contains(resp, "\"ok\":true"):
        print("LINUS-AI OK")
        return 0
    print("LINUS-AI NOT OK: " + resp)
    return 1

def main() -> int:
    while True:
        check()
        linus_ai.sleep_ms(INTERVAL_MS)
    return 0

if __name__ == "__main__":
    main()
Compile and run
# Run interpreted
$ python3 poll_health.py
 
# Compile to all targets
$ linus_ai compile poll_health.py -o poll_health --target all

10. Tensor Parallelism

Tensor parallelism (TP) splits individual weight matrices across N nodes. All nodes process the same token simultaneously — each holds 1/N of every layer's weights and computes a partial result; results are reduced (summed) across nodes after each transformer block.

Backends

Backend        Mechanism                                    Requirement
RPC (default)  llama.cpp --rpc + llama-rpc-server           brew install llama.cpp (≥ b2373)
Native         LINUS-AI AllReduce over HTTP (TNSR format)   No extra tools — Phase 2

RPC Mode Operation

Coordinator (rank 0)                    Workers (rank 1…N-1)
─────────────────────                   ─────────────────────
POST /tensor/plan  ──────────────────▶  POST /tensor/plan
                                        POST /tensor/rpc/start  → llama-rpc-server :50052
POST /tensor/infer
  └─ llama-server --rpc w1:50052,w2:50052
       └─ weight tensors split by RAM proportion
       └─ GPU layers offloaded to workers
       └─ token returned to client

Native AllReduce (Megatron-LM style)

For each transformer block:

  ┌── Q/K/V projection  (column-parallel) ──────────────────────┐
  │  Each rank computes:  partial = X @ W_col_slice              │
  │  No communication needed (outputs concatenated logically)    │
  └──────────────────────────────────────────────────────────────┘
              ↓
  ┌── Output projection  (row-parallel) + AllReduce ────────────┐
  │  Each rank computes:  partial = partial_in @ W_row_slice     │
  │  POST partial to /tensor/allreduce on coordinator            │
  │  Coordinator sums all world_size partials → full activation  │
  └──────────────────────────────────────────────────────────────┘
              ↓
  ┌── FFN layer  (column then row, same pattern) ───────────────┐
  └──────────────────────────────────────────────────────────────┘
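
The reason the row-parallel AllReduce is a plain sum: if W is split by rows across ranks and the activation X is split by columns to match, each rank's partial X_slice @ W_slice contributes additively to the full X @ W. A pure-Python illustration with world_size = 2:

```python
# Row-parallel matmul + AllReduce, world_size = 2, no numpy needed.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

X = [[1.0, 2.0, 3.0, 4.0]]        # 1×4 activation row
W = [[1.0], [2.0], [3.0], [4.0]]  # 4×1 row-parallel weight

partial0 = matmul([[X[0][0], X[0][1]]], W[:2])   # rank 0: rows 0–1 of W
partial1 = matmul([[X[0][2], X[0][3]]], W[2:])   # rank 1: rows 2–3 of W

# AllReduce: coordinator sums the world_size partials elementwise.
full = [[partial0[0][0] + partial1[0][0]]]
assert full == matmul(X, W)       # both equal [[30.0]]
```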

TNSR Wire Format (native AllReduce frames)

Offset  Size    Field
──────  ──────  ─────────────────────────────────
0       4 B     magic  b"TNSR"
4       1 B     rank   (u8,  0 = coordinator)
5       1 B     world_size (u8)
6       16 B    request_id ([u8; 16], UUID v4)
22      4 B     element_count (u32 LE)
26      N×4 B   f32 elements (LE), N = element_count
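
The layout above packs and parses cleanly with `struct`; a sketch of an encoder/decoder pair (the helper names are illustrative):

```python
import struct

# TNSR frame: magic, rank (u8), world_size (u8), 16-byte request id,
# u32 LE element count at offset 22, then count little-endian f32 values
# starting at offset 26. "<4sBB16sI" is exactly 26 bytes with no padding.

MAGIC = b"TNSR"

def encode_tnsr(rank: int, world_size: int, request_id: bytes,
                elements: list) -> bytes:
    header = struct.pack("<4sBB16sI", MAGIC, rank, world_size,
                         request_id, len(elements))
    return header + struct.pack(f"<{len(elements)}f", *elements)

def decode_tnsr(frame: bytes):
    magic, rank, world, rid, n = struct.unpack_from("<4sBB16sI", frame, 0)
    assert magic == MAGIC, "bad magic"
    elements = list(struct.unpack_from(f"<{n}f", frame, 26))
    return rank, world, rid, elements
```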

API

Method  Path                Body                                  Description
GET     /tensor/plan        —                                     Return active plan or {"plan": null}
POST    /tensor/plan        TensorParallelPlan JSON               Set tensor parallel plan
GET     /tensor/status      —                                     Plan summary + RPC worker health
POST    /tensor/infer       {prompt, max_tokens?, temperature?}   Run TP inference (coordinator only)
POST    /tensor/allreduce   binary TNSR frame                     Submit partial tensor; returns reduced frame
POST    /tensor/rpc/start   {rpc_port?}                           Spawn llama-rpc-server on this node
POST    /tensor/rpc/stop    —                                     Stop llama-rpc-server

Pipeline vs Tensor vs Hybrid

Strategy  Split axis           Activation flow                  Best for
Pipeline  Layer depth          Sequential A→B→C                 Models too large for any single node
Tensor    Weight matrix width  Parallel + AllReduce             Lowest latency on same-size nodes
Hybrid    Both                 Pipeline groups with TP inside   Largest models (70B+) on heterogeneous mesh

11. Compliance & Security

The compliance layer (linus_ai/compliance.py) enforces domain-specific governance before every inference request.

Profile Tiers

Tier        Profiles                                          PII blocking     Injection    Consent required
OPEN        general, creative, reasoning, code, engineering   Warn             Warn         No
AUDIT       education, support, sales, data_science           Block critical   Warn         No
REGULATED   medical, legal, finance, hr                       Block critical   Warn         Yes
RESTRICTED  security                                          Block all        Hard block   Yes

Preflight Check Flow

text + profile
  │
  ▼ PIIScanner
  │  ├─ CREDIT_CARD / CVV / SSN / PAN_LIKE → BLOCK
  │  └─ other types → warn + redact
  ▼ InjectionDetector
  │  ├─ RESTRICTED profile → BLOCK
  │  └─ REGULATED profile → WARN
  ▼ ConsentManager
  │  └─ REGULATED/RESTRICTED and no consent → BLOCK
  ▼ AuditLogger.log()
  │  ├─ write to primary dir (LINUS_AI_AUDIT_DIR or ~/.linus-ai/audit)
  │  └─ propagate to LINUS_AI_AUDIT_EXPORT_DIRS (colon-separated)
  ▼
ALLOW / BLOCK / WARN result
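
A condensed sketch of the decision flow above. The real PIIScanner and InjectionDetector are pattern-based; here their verdicts arrive as already-computed inputs so only the tier policy (as drawn in the flow) is visible:

```python
# Preflight tier policy sketch. pii_types/injection/has_consent stand in
# for the scanners' outputs; the four critical PII types block per the
# flow diagram above.

BLOCKING_PII = {"CREDIT_CARD", "CVV", "SSN", "PAN_LIKE"}

def preflight(tier: str, pii_types: set, injection: bool,
              has_consent: bool) -> str:
    if pii_types & BLOCKING_PII:
        return "BLOCK"          # critical PII always blocks
    if tier == "RESTRICTED" and (pii_types or injection):
        return "BLOCK"          # RESTRICTED blocks all PII, hard-blocks injection
    if tier in ("REGULATED", "RESTRICTED") and not has_consent:
        return "BLOCK"          # consent gate
    if pii_types or injection:
        return "WARN"           # non-critical PII redacted + warned
    return "ALLOW"
```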

Audit Immutability

Completed monthly log files are sealed by AuditLogger.seal_completed_months():

  1. chmod 0o400 — read-only at OS level
  2. os.chflags(UF_IMMUTABLE) — BSD/macOS uchg immutable flag
  3. chattr +i — Linux immutable attribute (requires root to unset)

HMAC-SHA256 chaining allows tamper detection without immutability at the OS level: verify_chain() returns False if any record was altered or deleted.
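
The chaining idea in miniature: each record's MAC covers the previous record's MAC plus its own payload, so altering or deleting any record invalidates every MAC after it. A sketch with illustrative field names (the real logger writes monthly .jsonl files):

```python
import hashlib
import hmac
import json

# HMAC-SHA256 chain sketch: mac_i = HMAC(key, mac_{i-1} + payload_i).

def append(chain: list, key: bytes, payload: dict) -> None:
    prev = chain[-1]["mac"] if chain else ""
    body = json.dumps(payload, sort_keys=True)
    mac = hmac.new(key, (prev + body).encode(), hashlib.sha256).hexdigest()
    chain.append({"payload": payload, "mac": mac})

def verify_chain(chain: list, key: bytes) -> bool:
    prev = ""
    for rec in chain:
        body = json.dumps(rec["payload"], sort_keys=True)
        expect = hmac.new(key, (prev + body).encode(),
                          hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expect, rec["mac"]):
            return False
        prev = rec["mac"]
    return True
```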

12. RAG Document Access Control

The RAG access layer (linus_ai/rag_access.py) provides fine-grained per-document access control for retrieval-augmented generation.

Classification Levels

Value  Name          Default access
0      PUBLIC        Anyone
1      INTERNAL      Authenticated company members via ACL
2      CONFIDENTIAL  Explicit ACL permit required
3      SECRET        Clearance ≥ 3 + explicit ACL permit
4      TOP_SECRET    Clearance 4 + named on owner's explicit list

Access Decision Algorithm (7 Steps)

1. Is principal the document owner?          → PERMIT
2. Is principal in doc's deny_users list?    → DENY
3. Is document PUBLIC?                       → PERMIT
4. principal.clearance < doc.min_clearance?  → DENY
5. TOP_SECRET and principal not in allow?    → DENY
6. ACL permits at any scope?                 → PERMIT
   (company → division → department → role → user)
7. Default                                   → DENY

All decisions are HMAC-chained in ~/.linus-ai/rag-audit-<YYYY-MM>.jsonl and propagated to any export dirs configured via LINUS_AI_AUDIT_EXPORT_DIRS.
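
The seven steps above read as straight-line code. A sketch: field names are assumptions, and `acl_permits` stands in for the scope walk (company → division → department → role → user) of step 6.

```python
from dataclasses import dataclass, field

# 7-step access decision sketch; classification values 0 (PUBLIC) … 4
# (TOP_SECRET) follow the table above.

@dataclass
class Doc:
    owner: str
    classification: int
    min_clearance: int
    deny_users: set = field(default_factory=set)
    allow_users: set = field(default_factory=set)

def decide(user_id: str, clearance: int, doc: Doc, acl_permits: bool) -> str:
    if user_id == doc.owner:                    # 1 — owner
        return "PERMIT"
    if user_id in doc.deny_users:               # 2 — explicit DENY wins
        return "DENY"
    if doc.classification == 0:                 # 3 — PUBLIC
        return "PERMIT"
    if clearance < doc.min_clearance:           # 4 — clearance floor
        return "DENY"
    if doc.classification == 4 and user_id not in doc.allow_users:
        return "DENY"                           # 5 — TOP_SECRET named list
    if acl_permits:                             # 6 — ACL at any scope
        return "PERMIT"
    return "DENY"                               # 7 — default deny
```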

13. Changelog

v1.4.0 — current

  • Compliance & Security Layer — 14 domain-specific profiles across 4 tiers (OPEN/AUDIT/REGULATED/RESTRICTED); PII scanner (12 types, 4 blocking); injection detector (8 rule families); consent manager; immutable HMAC-chained audit logs with OS-level sealing
  • RAG Document Access Control — 5-level document classification (PUBLIC → TOP_SECRET); fine-grained ACL at user/role/department/division/company scope; 7-step access decision algorithm; filter_rag_chunks() helper; HMAC-chained RAG audit log
  • Enterprise audit routing — LINUS_AI_AUDIT_DIR and LINUS_AI_AUDIT_EXPORT_DIRS env vars for real-time propagation to SIEM systems; seal_completed_months() and export_to() on AuditLogger
  • New REST API — /compliance/* (8 endpoints) and /rag/* (10 endpoints) wired into main.py
  • Test suite expanded — 62 new compliance tests + 53 new RAG access tests; total 263 tests (484 including Rust)
  • Control panel UI — compliance profile card, RAG document registry with ACL editor, RAG audit viewer in linus_ai_control_panel.html

v1.2.0

  • py2c: Python-subset → C17 → native binary compiler (11 cross-compilation targets). Replaces VOLT language — same .py files run interpreted AND compile to native
  • VOLT retired: linus_ai-volt crate and linus_ai/volt/ module removed
  • Agent ReAct: fan-out to ≤ 3 peers, synthesis pass on hub
  • Privacy scopes: private | lan | open (replaces allow_web_search bool)
  • Auto-behaviors: auto-push GGUF to peers with 0 models; auto-assign roles on peer connect
  • nxrt.h: nanosleep replaces usleep for full musl/POSIX compatibility

v0.8.x

  • Rust binary replaces Nuitka-compiled Python (Phase 2 complete)
  • linus-ai-vault: OS keychain key storage
  • linus-ai-guardian: RFC 6238 TOTP, bank-style auth gates
  • linus-ai-thermal: 5-stage governor, HOT → peer-priority routing
  • linus-ai-blockchain: SQLite-backed SHA-256 hash chain
  • Overlay relay: WAN mesh without Tailscale