# API Reference

LINUS-AI exposes a complete REST API on `http://localhost:9480` (port configurable via `LINUS_AI_API_PORT`). All endpoints accept and return JSON unless noted; streaming endpoints use Server-Sent Events (`text/event-stream`).

**Base URL:** `http://localhost:9480` (default port; no auth required on LAN)

### Error format

All errors use JSON:

```json
{"error": "model not loaded", "code": "MODEL_UNAVAILABLE"}
```

| Code | Meaning |
|------|---------|
| 200 | Success |
| 400 | Bad request: invalid body or missing fields |
| 402 | Payment required: insufficient billing credits |
| 422 | Compliance block: PII detected or injection attempt |
| 503 | Thermal emergency: node at EMERGENCY state |
### SSE streaming

Stream endpoints set `Content-Type: text/event-stream`. Each event is a JSON object on a `data:` line:

```
data: {"token":"Hello"}
data: {"token":" world"}
data: {"type":"done","tokens":42,"model":"llama3.2-3b"}
```

```javascript
const es = new EventSource('/infer/stream?prompt=Hello');
es.onmessage = ({data}) => {
  const ev = JSON.parse(data);
  if (ev.token) process.stdout.write(ev.token);
  if (ev.type === 'done') es.close();
};
```
## 🧠 Inference

Run completions locally. All inference is performed on your own hardware; no data is sent externally.

### POST /infer

Single-shot completion.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `prompt` | string | required | Input text |
| `system` | string | optional | System message prepended to the prompt |
| `max_tokens` | integer | optional | Max output tokens (default: 512) |
| `temperature` | float | optional | Sampling temperature, 0.0–2.0 (default: 0.7) |
| `model` | string | optional | Override the active model filename |
| `profile` | string | optional | Compliance profile ID (e.g. `medical`) |

```bash
curl -X POST http://localhost:9480/infer \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"Explain quantum entanglement","max_tokens":256}'
```

```json
{"text": "Quantum entanglement is…", "tokens": 187, "model": "llama3.2-3b.gguf", "latency_ms": 1240}
```
### POST /infer/stream (SSE)

Streaming token-by-token output. Same body as `POST /infer`. Returns `text/event-stream` with one `{"token":"…"}` event per generated token, followed by a final `{"type":"done","tokens":N}` event.
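The browser `EventSource` API only issues GET requests, so a POST streaming client has to read the response body itself. A minimal sketch, assuming Node 18+ (built-in `fetch`, async-iterable response body); the event shapes match the SSE format documented above, but the buffering and parsing choices are ours:

```javascript
// Extract JSON events from the single-line `data: {...}` format this API emits.
function parseSSE(text) {
  const events = [];
  for (const line of text.split('\n')) {
    if (line.startsWith('data: ')) events.push(JSON.parse(line.slice(6)));
  }
  return events;
}

async function streamInfer(prompt) {
  const res = await fetch('http://localhost:9480/infer/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt, max_tokens: 256 }),
  });
  const decoder = new TextDecoder();
  let buf = '';
  for await (const chunk of res.body) {
    buf += decoder.decode(chunk, { stream: true });
    const lines = buf.split('\n');
    buf = lines.pop();                      // keep any partial trailing line
    for (const ev of parseSSE(lines.join('\n'))) {
      if (ev.token) process.stdout.write(ev.token);
      if (ev.type === 'done') return ev.tokens;
    }
  }
}
```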

### POST /agent/stream (SSE)

ReAct agent loop streaming thought/action/observation events.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `message` | string | required | User message |
| `profile` | string | optional | Agent compliance profile (default: `general`) |
| `session_id` | string | optional | Conversation session for history |
| `allow_web_search` | boolean | optional | Enable the web search tool (requires `scope=open`) |
| `history` | array | optional | Prior `[{role, content}]` turns |

```
{"type":"thought","text":"I should search for…"}
{"type":"action","tool":"search","args":"quantum computing"}
{"type":"observation","text":"Found: …"}
{"token":"Based on my research…"}
{"type":"done","tokens":312,"sources":2}
```
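The stream above mixes typed control events with raw answer tokens, so a client can dispatch on event shape. One way to render them in a CLI, keyed on the fields shown above (the labels and layout are our choices, not the API's):

```javascript
// Map one /agent/stream event to a printable string.
function renderAgentEvent(ev) {
  if (ev.token) return ev.token;                          // answer tokens, printed raw
  switch (ev.type) {
    case 'thought':     return `\n[thought] ${ev.text}\n`;
    case 'action':      return `\n[action] ${ev.tool}(${ev.args})\n`;
    case 'observation': return `\n[observation] ${ev.text}\n`;
    case 'done':        return `\n(${ev.tokens} tokens, ${ev.sources ?? 0} sources)\n`;
    default:            return '';                        // ignore unknown event types
  }
}
```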
### GET /agent/profiles

List all 14 compliance profiles.

```json
{"profiles": [
  {"id": "general",  "tier": "OPEN",       "pii_scan": true, "injection_check": true, "consent_required": false},
  {"id": "medical",  "tier": "REGULATED",  "pii_scan": true, "injection_check": true, "consent_required": true},
  {"id": "security", "tier": "RESTRICTED", "pii_scan": true, "injection_check": true, "consent_required": true},
  …
]}
```
## 📦 Models

Manage GGUF models. LINUS-AI scans `~/models` and `~/.linus-ai/models` on startup.

### GET /models

List available models.

```json
{"models": [
  {"name": "llama3.2-3b.gguf", "size_gb": 2.1,  "fits_in_ram": true,  "loaded": true,  "quant": "Q4_K_M"},
  {"name": "qwen2.5-72b.gguf", "size_gb": 41.2, "fits_in_ram": false, "loaded": false, "quant": "Q4_K_M"}
]}
```
### POST /models/select

Set the active model. `force: true` overrides the 85% RAM guard.

```json
{"model": "llama3.2-3b.gguf", "force": false}
```
### POST /models/load · POST /models/unload

Load or unload a model in memory.

```
POST /models/load   {"model": "phi-4.gguf"}
POST /models/unload {"model": "phi-4.gguf"}
```
### GET · POST /models/roles

Assign models to inference roles.

```
POST /models/roles {"main": "llama3.3-70b.gguf", "router": "llama3.2-3b.gguf", "student": null}
GET  /models/roles → {"main": "llama3.3-70b.gguf", "router": "llama3.2-3b.gguf"}
```
### GET /models/recommend

Hardware-aware model catalog. Returns 22 curated models ranked by fit for this node's RAM and GPU. Each entry includes a direct download URL and quantization advice.

### POST /models/pull

Download a model from a URL.

```json
{"url": "https://huggingface.co/…/llama3.2-3b-Q4_K_M.gguf", "name": "llama3.2-3b.gguf"}
```

## Tensor & Pipeline Parallelism

Split large models across multiple GPUs or machines. Tensor parallelism splits weight matrices; pipeline parallelism splits transformer layers.

### POST /pipeline/plan

Configure the pipeline-parallel layout. Returns `{"plan": [{node, layers_start, layers_end}, …]}`.

```json
{"model_path": "/models/llama3.3-70b.gguf", "use_mesh_peers": true}
```
### POST /pipeline/infer

Run inference across the pipeline plan.

```json
{"prompt": "Hello", "max_tokens": 256, "temperature": 0.7}
```
### GET · POST /tensor/plan

Get or set the tensor-parallel plan. `backend` is `"Rpc"` or `"Native"`.

```json
{
  "plan_id": "tp-plan-1",
  "world_size": 2,
  "local_rank": 0,
  "backend": "Rpc",
  "rpc_port": 50052,
  "model_path": "/models/70b.gguf",
  "peers": [
    {"rank": 0, "node_id": "node-a", "address": "node-a:9480", "rpc_address": "node-a:50052", "ram_mb": 65536},
    {"rank": 1, "node_id": "node-b", "address": "node-b:9480", "rpc_address": "node-b:50052", "ram_mb": 32768}
  ]
}
```
### POST /tensor/infer

Run tensor-parallel inference (coordinator node).

```json
{"prompt": "Hello", "max_tokens": 128}
```
### POST /tensor/allreduce

Submit a partial tensor (native AllReduce). The body is a binary TNSR frame: `b"TNSR"` magic + rank (u8) + world_size (u8) + request_id (16-byte UUID) + element_count (u32 LE) + f32 elements. Returns the reduced-sum frame.
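The frame layout above can be packed with a `DataView`. A sketch: the field order follows the spec, but the float byte order is our assumption (little-endian, matching the u32 element count), not something the docs state:

```javascript
// Build a TNSR frame for POST /tensor/allreduce.
// requestId must be a 16-byte Uint8Array (e.g. a UUID's raw bytes).
function buildTnsrFrame(rank, worldSize, requestId, elements) {
  const buf = new ArrayBuffer(4 + 1 + 1 + 16 + 4 + 4 * elements.length);
  const view = new DataView(buf);
  const bytes = new Uint8Array(buf);
  bytes.set([0x54, 0x4e, 0x53, 0x52], 0);       // b"TNSR" magic
  view.setUint8(4, rank);                       // rank (u8)
  view.setUint8(5, worldSize);                  // world_size (u8)
  bytes.set(requestId, 6);                      // request_id (16-byte UUID)
  view.setUint32(22, elements.length, true);    // element_count (u32 LE)
  elements.forEach((x, i) => view.setFloat32(26 + 4 * i, x, true)); // f32 LE (assumed)
  return bytes;
}
```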

## ⚖️ Compliance & Security

14 domain-specific compliance profiles across 4 tiers, with PII scanning, injection detection, consent management, and immutable HMAC-chained audit logs. See Admin Guide → Compliance.

### GET /compliance/preflight

Check text against a compliance profile. `decision` is `allow`, `block`, or `warn`.

| Param | Type | Description |
|-------|------|-------------|
| `text` | string | Text to check (URL-encoded) |
| `profile` | string | Profile ID (default: `general`) |
| `user_id` | string | Optional user identifier for the audit log |

```json
{"decision": "allow", "issues": [], "redacted_text": "…", "profile": "general", "reasons": []}
```

```json
{"decision": "block", "issues": ["CREDIT_CARD detected at position 42"], "reasons": ["Blocking PII type: CREDIT_CARD"]}
```
### GET /compliance/profiles

List all 14 profiles with metadata.

```json
{"profiles": [
  {"id": "general",  "tier": "OPEN",       "regulations": [],                        "consent_required": false, "injection_block": false},
  {"id": "medical",  "tier": "REGULATED",  "regulations": ["HIPAA","HITECH"],        "consent_required": true,  "injection_block": false},
  {"id": "legal",    "tier": "REGULATED",  "regulations": ["ABA","GDPR"],            "consent_required": true,  "injection_block": false},
  {"id": "finance",  "tier": "REGULATED",  "regulations": ["SOX","PCI-DSS","FINRA"], "consent_required": true,  "injection_block": false},
  {"id": "security", "tier": "RESTRICTED", "regulations": ["SOC2","ISO27001"],       "consent_required": true,  "injection_block": true}
]}
```
### GET /compliance/audit

Query the HMAC-chained audit log.

| Param | Description |
|-------|-------------|
| `limit` | Max records (default: 100) |
| `profile` | Filter by profile ID |
| `decision` | `allow` \| `block` \| `warn` |
| `user_id` | Filter by user |
| `since` | ISO 8601 timestamp lower bound |
### GET /compliance/audit/verify

Verify HMAC chain integrity. `ok: false` means at least one record was tampered with or deleted.

```json
{"ok": true, "records": 1847, "months": ["2026-01","2026-02","2026-03"]}
```
### POST /compliance/audit/seal

Seal completed monthly log files (OS-level immutable). No body required.

```json
{"sealed": ["audit-2026-01.jsonl", "audit-2026-02.jsonl"]}
```
### POST /compliance/audit/export

Export an audit snapshot to a directory.

```
{"dest_dir": "/mnt/siem/linus-ai-export"}
→ {"ok": true, "records": 1847, "dest": "/mnt/siem/linus-ai-export/audit-2026-03.jsonl"}
```
## 🗂 RAG Document Access Control

Fine-grained document access control for RAG pipelines: 5 classification levels (PUBLIC → TOP_SECRET), with ACLs at user, role, department, division, and company scope. See Admin Guide → RAG.

### GET /rag/documents

List registered documents.

```json
{"documents": [
  {"id": "doc-001", "title": "Q4 Report.pdf", "classification": "CONFIDENTIAL", "owner_user_id": "alice", "mime_type": "application/pdf", "size_bytes": 204800}
]}
```
### POST /rag/documents/register

Register a document in the access registry. `classification` is an integer (2 = CONFIDENTIAL in the example below):

| Value | Name | Default access |
|-------|------|----------------|
| 0 | PUBLIC | Everyone |
| 1 | INTERNAL | Company members via ACL |
| 2 | CONFIDENTIAL | Explicit ACL permit |
| 3 | SECRET | Clearance ≥ 3 + ACL |
| 4 | TOP_SECRET | Clearance 4 + named explicitly |

```json
{"title": "Q4 Report.pdf", "path": "/docs/q4-report.pdf", "owner_user_id": "alice", "owner_name": "Alice Smith", "classification": 2, "mime_type": "application/pdf", "size_bytes": 204800, "content_hash": "sha256hex…"}
```
### PUT /rag/documents/{id}/acl

Update the document ACL.

```json
{"allow_users": ["bob", "carol"], "deny_users": [],
 "allow_companies": ["ACME Corp"], "deny_companies": [],
 "allow_divisions": ["Engineering"], "deny_divisions": [],
 "allow_departments": [], "deny_departments": [],
 "allow_roles": ["analyst"], "deny_roles": []}
```

Deny always overrides allow at the same scope. Decision algorithm: owner → deny → PUBLIC → clearance → TOP_SECRET list → ACL permit → default DENY.
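The decision order can be sketched as a function. This is a simplified model, not the server's implementation: it only checks user-scope denies, and the principal and document records are reduced to the fields the algorithm needs. The `PERMIT`/`DENY` strings match what `/rag/access-check` returns:

```javascript
// Evaluate access in the documented order:
// owner → deny → PUBLIC → clearance → TOP_SECRET list → ACL permit → default DENY.
function checkAccess(user, doc) {
  const acl = doc.acl || {};
  if (doc.owner_user_id === user.user_id) return 'PERMIT';          // owner
  if ((acl.deny_users || []).includes(user.user_id)) return 'DENY'; // deny overrides
  if (doc.classification === 0) return 'PERMIT';                    // PUBLIC
  if ((user.clearance || 0) < doc.classification) return 'DENY';    // clearance gate
  if (doc.classification === 4)                                     // TOP_SECRET: named only
    return (acl.allow_users || []).includes(user.user_id) ? 'PERMIT' : 'DENY';
  const allowed =                                                   // ACL permit, any scope
    (acl.allow_users || []).includes(user.user_id) ||
    (acl.allow_roles || []).some(r => (user.roles || []).includes(r)) ||
    (acl.allow_departments || []).includes(user.department) ||
    (acl.allow_divisions || []).includes(user.division) ||
    (acl.allow_companies || []).includes(user.company);
  return allowed ? 'PERMIT' : 'DENY';                               // default DENY
}
```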

### PUT /rag/documents/{id}/classification

Update the document classification level (3 = SECRET).

```json
{"classification": 3}
```
### DELETE /rag/documents/{id}

Remove a document from the registry.

```
DELETE /rag/documents/doc-001 → {"ok": true}
```
### POST /rag/access-check

Check access and write an audit record.

```
{"user_id": "bob", "doc_id": "doc-001"}
→ {"decision": "PERMIT", "reason": "ACL allow_users match", "doc_id": "doc-001", "user_id": "bob"}
```
### GET · POST /rag/principals

List or create/update principals. `clearance` ranges 0–4.

```
POST /rag/principals
{"user_id": "bob", "name": "Bob Jones", "email": "bob@acme.com",
 "company": "ACME Corp", "division": "Engineering", "department": "Backend",
 "clearance": 2, "roles": ["analyst", "reviewer"]}

DELETE /rag/principals/bob → {"ok": true}
```
### GET /rag/audit

Query the RAG access audit log.

| Param | Description |
|-------|-------------|
| `doc_id` | Filter by document |
| `user_id` | Filter by user |
| `denied_only` | `true` to show only denials |
| `stats` | `true` to return aggregate statistics instead of records |
| `limit` | Max records (default: 100) |
## 🕸 Mesh & Peers

### GET /peers

Active peer list.

```json
{"peers": [
  {"node_id": "abc123", "hostname": "macstudio", "role": "hub", "address": "192.168.1.10:9480", "ram_gb": 192, "gpu": "metal", "thermal": "NOMINAL", "models": ["llama3.3-70b.gguf"], "latency_ms": 4}
]}
```
### POST /mesh/push-model

Send a model to a peer node.

```json
{"peer": "192.168.1.11:9480", "model": "llama3.2-3b.gguf"}
```
### POST /mesh/assign-roles

Re-score all connected peers and assign hub/worker/spoke roles. No body required.

```json
{"assigned": {"abc123": "master_hub", "def456": "hub", "ghi789": "worker"}}
```
## 🛡 Vault & Guardian

ChaCha20 + HMAC-SHA256 encrypted key-value store. Guardian 2FA wraps sensitive operations in TOTP gates.

### POST /vault/store · GET /vault/get

Encrypted store and retrieve.

```
POST /vault/store {"key": "api_secret", "value": {"token": "…"}}
GET  /vault/get?key=api_secret → {"value": {"token": "…"}}
```
### POST /guardian/login · POST /guardian/gate/approve

Authenticate and approve 2FA gates.

```
POST /guardian/login {"username": "admin", "password": "…"}
POST /guardian/enroll-totp {"username": "admin"}
  → {"qr_uri": "otpauth://totp/LINUS-AI:admin?secret=…"}
POST /guardian/gate/create {"operation": "model_push", "session_id": "…"}
  → {"gate_id": "abc123"}
POST /guardian/gate/approve {"gate_id": "abc123", "totp_code": "123456"}
  → {"approved": true}
```
## 💳 Billing

LAN peers are free. Internet peers pay per inference unit (1 unit = 1,000 output tokens).
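The unit arithmetic is simple but worth pinning down. A sketch; whether a partial thousand is rounded up is our assumption (ceiling), not stated by the docs:

```javascript
// Units billed for one response: 1 unit per 1,000 output tokens,
// partial units rounded up (assumption).
const unitsFor = (outputTokens) => Math.ceil(outputTokens / 1000);

// e.g. a 256-token completion plus a 2,300-token completion:
const total = unitsFor(256) + unitsFor(2300);
```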

### GET /billing

All node accounts and balances.

```json
{"accounts": [
  {"node_id": "abc123", "address": "203.0.113.5", "balance": 47, "total_calls": 53}
]}
```
### POST /billing/topup

Add inference credits to a node.

```json
{"node_id": "abc123", "address": "203.0.113.5", "units": 100}
```

## System

### GET /status · /stats · /health · /thermal

Node telemetry endpoints.

```
GET /status  → {mode, peers, models, uptime_s, active_model}
GET /stats   → {requests, tokens, latency_p50, latency_p95, latency_p99}
GET /health  → {"ok": true}
GET /thermal → {"state": "NOMINAL", "temp_c": 62, "throttle_level": 0}
```
### GET · POST /settings

Read or write runtime settings.

```
GET /settings → {"mode": "full", "scope": "private", "log_level": "warn", …}
POST /settings {"scope": "open", "log_level": "info"}
```
### POST /shell/exec

Run a shell command (Launch tab). Destructive patterns (`rm -rf /`, `mkfs`, fork bombs) are blocked by the safety filter, and the timeout is capped at 120 s.

```
{"command": "df -h /", "timeout_s": 10}
→ {"stdout": "…", "stderr": "", "exit_code": 0, "duration_ms": 45}
```