LINUS-AI exposes a complete REST API on http://localhost:9480 (port configurable via LINUS_AI_API_PORT).
All endpoints accept and return JSON unless noted. SSE endpoints stream text/event-stream.
Base URL
http://localhost:9480
· default port · no auth required on LAN
Error responses
Failed requests return a JSON body with a human-readable error message and a machine-readable code:
{"error": "model not loaded", "code": "MODEL_UNAVAILABLE"}
HTTP status codes
| Code | Meaning |
|------|---------|
| 200 | Success |
| 400 | Bad request: invalid body or missing fields |
| 402 | Payment required: insufficient billing credits |
| 422 | Compliance block: PII detected or injection attempt |
| 503 | Thermal emergency: node at EMERGENCY state |
Stream endpoints set Content-Type: text/event-stream. Each event is a JSON object on a data: line.
data: {"token":"Hello"}
data: {"token":" world"}
data: {"type":"done","tokens":42,"model":"llama3.2-3b"}
JavaScript example
// Node 22+ ships a global EventSource; older Node versions need a polyfill.
// In a browser, replace process.stdout.write with your own output handler.
const es = new EventSource('http://localhost:9480/infer/stream?prompt=Hello');
es.onmessage = ({ data }) => {
  const ev = JSON.parse(data);
  if (ev.token) process.stdout.write(ev.token);
  if (ev.type === 'done') es.close();
};
Run completions locally. All inference is performed on your own hardware; no data is sent externally.
Request body
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| prompt | string | required | Input text |
| system | string | optional | System message prepended to the prompt |
| max_tokens | integer | optional | Max output tokens (default: 512) |
| temperature | float | optional | Sampling temperature, 0.0–2.0 (default: 0.7) |
| model | string | optional | Override the active model filename |
| profile | string | optional | Compliance profile ID (e.g. medical) |
Example
curl -X POST http://localhost:9480/infer \
-H 'Content-Type: application/json' \
-d '{"prompt":"Explain quantum entanglement","max_tokens":256}'
Response
{"text": "Quantum entanglement is…", "tokens": 187, "model": "llama3.2-3b.gguf", "latency_ms": 1240}
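A minimal Python helper that builds a POST /infer request body, applying the documented defaults (max_tokens 512, temperature 0.7) and omitting unset optional fields; the helper name and structure are this guide's own, not part of the API:

```python
import json

def infer_body(prompt, system=None, max_tokens=512, temperature=0.7,
               model=None, profile=None):
    """Build a JSON body for POST /infer; optional fields are omitted when unset."""
    body = {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
    for key, val in (("system", system), ("model", model), ("profile", profile)):
        if val is not None:
            body[key] = val
    return json.dumps(body)

# POST the returned string to http://localhost:9480/infer with
# Content-Type: application/json (e.g. via urllib.request or requests).
```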
Same body as POST /infer. Returns text/event-stream with one {"token":"…"} event per generated token, followed by a final {"type":"done","tokens":N} event.
Request body
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| message | string | required | User message |
| profile | string | optional | Agent compliance profile (default: general) |
| session_id | string | optional | Conversation session used for history |
| allow_web_search | boolean | optional | Enable the web search tool (requires scope=open) |
| history | array | optional | Prior turns as [{role, content}] objects |
SSE event types
{"type":"thought","text":"I should search for…"}
{"type":"action","tool":"search","args":"quantum computing"}
{"type":"observation","text":"Found: …"}
{"token":"Based on my research…"}
{"type":"done","tokens":312,"sources":2}
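The stream can be demultiplexed on the type field. A sketch of a line-level parser (HTTP transport omitted), assuming each event arrives as a single data: line as shown above:

```python
import json

def parse_agent_events(lines):
    """Yield (kind, payload) pairs from raw SSE lines of the agent stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        ev = json.loads(line[len("data:"):])
        if "token" in ev and "type" not in ev:
            yield "token", ev["token"]            # a chunk of answer text
        else:
            yield ev.get("type", "unknown"), ev   # thought/action/observation/done

stream = [
    'data: {"type":"thought","text":"I should search"}',
    'data: {"token":"Hi"}',
    'data: {"type":"done","tokens":2,"sources":0}',
]
events = list(parse_agent_events(stream))
```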
Response
{"profiles": [
{"id": "general", "tier": "OPEN", "pii_scan": true, "injection_check": true, "consent_required": false},
{"id": "medical", "tier": "REGULATED", "pii_scan": true, "injection_check": true, "consent_required": true},
{"id": "security", "tier": "RESTRICTED", "pii_scan": true, "injection_check": true, "consent_required": true},
…
]}
Manage GGUF models. LINUS-AI scans ~/models and ~/.linus-ai/models on startup.
{"models": [
{"name": "llama3.2-3b.gguf", "size_gb": 2.1, "fits_in_ram": true, "loaded": true, "quant": "Q4_K_M"},
{"name": "qwen2.5-72b.gguf", "size_gb": 41.2, "fits_in_ram": false, "loaded": false, "quant": "Q4_K_M"}
]}
{"model": "llama3.2-3b.gguf", "force": false}
// force:true overrides the 85% RAM guard
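The guard can be pictured as a simple budget check. A sketch, assuming the 85% threshold applies to total system RAM and that force skips only this check (neither detail is specified):

```python
def can_load(model_size_gb, total_ram_gb, used_ram_gb, force=False):
    """Permit a load unless it would push RAM usage past the 85% guard."""
    if force:
        return True  # force:true skips the guard entirely
    return used_ram_gb + model_size_gb <= 0.85 * total_ram_gb

can_load(2.1, 64, 16)               # small model fits within the budget
can_load(41.2, 64, 16)              # 57.2 GB needed vs a 54.4 GB budget
can_load(41.2, 64, 16, force=True)  # override, at the operator's risk
```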
POST /models/load {"model": "phi-4.gguf"}
POST /models/unload {"model": "phi-4.gguf"}
POST /models/roles
{"main": "llama3.3-70b.gguf", "router": "llama3.2-3b.gguf", "student": null}
GET /models/roles → {"main": "llama3.3-70b.gguf", "router": "llama3.2-3b.gguf"}
Returns 22 curated models ranked by fit for this node's RAM and GPU. Each includes a direct download URL and quantization advice.
{"url": "https://huggingface.co/…/llama3.2-3b-Q4_K_M.gguf", "name": "llama3.2-3b.gguf"}
Split large models across multiple GPUs or machines. Tensor parallelism splits weight matrices; pipeline parallelism splits transformer layers.
{"model_path": "/models/llama3.3-70b.gguf", "use_mesh_peers": true}
// Returns: {"plan": [{node, layers_start, layers_end}, …]}
{"prompt": "Hello", "max_tokens": 256, "temperature": 0.7}
POST body
{
"plan_id": "tp-plan-1",
"world_size": 2,
"local_rank": 0,
"backend": "Rpc", // "Rpc" | "Native"
"rpc_port": 50052,
"model_path": "/models/70b.gguf",
"peers": [
{"rank": 0, "node_id": "node-a", "address": "node-a:9480", "rpc_address": "node-a:50052", "ram_mb": 65536},
{"rank": 1, "node_id": "node-b", "address": "node-b:9480", "rpc_address": "node-b:50052", "ram_mb": 32768}
]
}
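One plausible way a pipeline plan could assign contiguous layer ranges is proportionally to each peer's RAM. This sketch is illustrative only; the actual planner's strategy is not documented, and the 80-layer count is an assumption:

```python
def plan_layers(n_layers, peers):
    """Split [0, n_layers) into contiguous ranges proportional to peer RAM."""
    total = sum(p["ram_mb"] for p in peers)
    plan, start = [], 0
    for i, p in enumerate(peers):
        # The last peer takes the remainder so every layer is assigned exactly once.
        if i == len(peers) - 1:
            count = n_layers - start
        else:
            count = round(n_layers * p["ram_mb"] / total)
        plan.append({"node": p["node_id"], "layers_start": start,
                     "layers_end": start + count})
        start += count
    return plan

peers = [{"node_id": "node-a", "ram_mb": 65536},
         {"node_id": "node-b", "ram_mb": 32768}]
plan_layers(80, peers)  # node-a gets about two thirds of the layers
```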
{"prompt": "Hello", "max_tokens": 128}
Binary TNSR frame: b"TNSR" magic + rank (u8) + world_size (u8) + request_id (16 B UUID) + element_count (u32 LE) + f32 elements. Returns reduced sum frame.
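The frame layout above maps directly onto Python's struct module. A sketch of a packer following that byte layout:

```python
import struct
import uuid

def pack_tnsr(rank, world_size, request_id, elements):
    """Serialize a TNSR frame: magic, rank, world_size, UUID, count, f32 payload."""
    header = b"TNSR" + struct.pack("<BB", rank, world_size) + request_id.bytes
    header += struct.pack("<I", len(elements))  # element_count, u32 little-endian
    return header + struct.pack(f"<{len(elements)}f", *elements)

frame = pack_tnsr(0, 2, uuid.uuid4(), [1.0, 2.0, 3.0])
len(frame)  # 4 + 1 + 1 + 16 + 4 + 3*4 = 38 bytes
```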
14 domain-specific compliance profiles across 4 tiers. PII scanning, injection detection, consent management, and immutable HMAC-chained audit logs. See Admin Guide → Compliance.
Query params
| Param | Type | Description |
|-------|------|-------------|
| text | string | Text to check (URL-encoded) |
| profile | string | Profile ID (default: general) |
| user_id | string | Optional user identifier for the audit log |
Response
{"decision": "allow", // "allow" | "block" | "warn"
"issues": [],
"redacted_text": "…",
"profile": "general",
"reasons": []}
{"decision": "block",
"issues": ["CREDIT_CARD detected at position 42"],
"reasons": ["Blocking PII type: CREDIT_CARD"]}
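A toy illustration of the kind of scan that produces the block above, using a digit-run regex plus a Luhn checksum for credit-card numbers; the real detector's rules are not documented:

```python
import re

def luhn_ok(digits):
    """Luhn checksum, used to filter random digit runs from real card numbers."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_credit_cards(text):
    """Return issues in the same shape as the /compliance response."""
    issues = []
    for m in re.finditer(r"\b\d{13,19}\b", text):
        if luhn_ok(m.group()):
            issues.append(f"CREDIT_CARD detected at position {m.start()}")
    return issues

scan_credit_cards("card: 4111111111111111")  # Visa test number passes Luhn
```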
{"profiles": [
{"id": "general", "tier": "OPEN", "regulations": [], "consent_required": false, "injection_block": false},
{"id": "medical", "tier": "REGULATED", "regulations": ["HIPAA","HITECH"], "consent_required": true, "injection_block": false},
{"id": "legal", "tier": "REGULATED", "regulations": ["ABA","GDPR"], "consent_required": true, "injection_block": false},
{"id": "finance", "tier": "REGULATED", "regulations": ["SOX","PCI-DSS","FINRA"], "consent_required": true, "injection_block": false},
{"id": "security", "tier": "RESTRICTED", "regulations": ["SOC2","ISO27001"], "consent_required": true, "injection_block": true}
]}
POST /compliance/consent
{"user_id": "alice", "profile": "medical", "action": "grant"}
GET /compliance/consent?user_id=alice
→ {"consents": {"medical": {"granted": true, "ts": "2026-03-15T10:00:00Z"}}}
Query params
| Param | Description |
|-------|-------------|
| limit | Max records (default: 100) |
| profile | Filter by profile ID |
| decision | Filter by decision: allow, block, or warn |
| user_id | Filter by user |
| since | ISO 8601 timestamp lower bound |
{"ok": true, "records": 1847, "months": ["2026-01","2026-02","2026-03"]}
// ok:false means at least one record was tampered with or deleted
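Conceptually, each audit record carries an HMAC over its content plus the previous record's MAC, so editing or deleting any record breaks every MAC after it. A sketch of that verification, assuming this chaining scheme (the on-disk format is not documented):

```python
import hashlib
import hmac
import json

def chain_mac(key, prev_mac, record):
    """MAC over the previous record's MAC plus the canonicalized record."""
    payload = prev_mac + json.dumps(record, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_chain(key, entries):
    """entries: [{"record": {...}, "mac": "..."}] in append order."""
    prev = b""
    for entry in entries:
        if not hmac.compare_digest(chain_mac(key, prev, entry["record"]), entry["mac"]):
            return False  # this record, or one before it, was altered or removed
        prev = entry["mac"].encode()
    return True

# Build a tiny two-record chain to demonstrate.
key = b"audit-key"
log = []
for rec in ({"decision": "allow"}, {"decision": "block"}):
    prev = log[-1]["mac"].encode() if log else b""
    log.append({"record": rec, "mac": chain_mac(key, prev, rec)})
```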
// No body required
→ {"sealed": ["audit-2026-01.jsonl", "audit-2026-02.jsonl"]}
{"dest_dir": "/mnt/siem/linus-ai-export"}
→ {"ok": true, "records": 1847, "dest": "/mnt/siem/linus-ai-export/audit-2026-03.jsonl"}
Fine-grained document access control for RAG pipelines. 5 classification levels (PUBLIC → TOP_SECRET). ACL at user, role, department, division, and company scope. See Admin Guide → RAG.
{"documents": [
{"id": "doc-001", "title": "Q4 Report.pdf", "classification": "CONFIDENTIAL",
"owner_user_id": "alice", "mime_type": "application/pdf", "size_bytes": 204800}
]}
Classification levels
| Value | Name | Default access |
|-------|------|----------------|
| 0 | PUBLIC | Everyone |
| 1 | INTERNAL | Company members via ACL |
| 2 | CONFIDENTIAL | Explicit ACL permit |
| 3 | SECRET | Clearance ≥ 3 plus ACL permit |
| 4 | TOP_SECRET | Clearance 4 and explicitly named |
Request body
{"title": "Q4 Report.pdf", "path": "/docs/q4-report.pdf",
"owner_user_id": "alice", "owner_name": "Alice Smith",
"classification": 2, // CONFIDENTIAL
"mime_type": "application/pdf", "size_bytes": 204800,
"content_hash": "sha256hex…"}
{"allow_users": ["bob", "carol"],
"deny_users": [],
"allow_companies": ["ACME Corp"],
"deny_companies": [],
"allow_divisions": ["Engineering"],
"deny_divisions": [],
"allow_departments": [],
"deny_departments": [],
"allow_roles": ["analyst"],
"deny_roles": []}
Deny always overrides allow at the same scope. Decision algorithm: owner → deny → PUBLIC → clearance → TOP_SECRET list → ACL permit → default DENY.
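That precedence order can be written out directly. A sketch simplified to user and role scope, assuming a principal's clearance must meet or exceed the classification value (the per-level rules may differ):

```python
def decide(doc, acl, principal):
    """Walk the documented precedence: owner, deny, PUBLIC, clearance,
    TOP_SECRET list, ACL permit, default DENY. First matching rule wins."""
    uid = principal["user_id"]
    if doc["owner_user_id"] == uid:
        return "PERMIT"                                   # owner always wins
    if uid in acl["deny_users"]:
        return "DENY"                                     # deny overrides allow
    if doc["classification"] == 0:
        return "PERMIT"                                   # PUBLIC
    if principal["clearance"] < doc["classification"]:
        return "DENY"                                     # insufficient clearance
    if doc["classification"] == 4 and uid not in acl["allow_users"]:
        return "DENY"                                     # TOP_SECRET: named only
    if uid in acl["allow_users"] or set(principal.get("roles", [])) & set(acl["allow_roles"]):
        return "PERMIT"                                   # explicit ACL permit
    return "DENY"                                         # default deny

doc = {"owner_user_id": "alice", "classification": 2}
acl = {"allow_users": ["bob"], "deny_users": ["mallory"], "allow_roles": ["analyst"]}
```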
{"classification": 3} // SECRET
DELETE /rag/documents/doc-001
→ {"ok": true}
{"user_id": "bob", "doc_id": "doc-001"}
→ {"decision": "PERMIT", "reason": "ACL allow_users match", "doc_id": "doc-001", "user_id": "bob"}
POST /rag/principals
{"user_id": "bob", "name": "Bob Jones", "email": "bob@acme.com",
"company": "ACME Corp", "division": "Engineering", "department": "Backend",
"clearance": 2, // 0-4
"roles": ["analyst", "reviewer"]}
DELETE /rag/principals/bob → {"ok": true}
Query params
| Param | Description |
|-------|-------------|
| doc_id | Filter by document |
| user_id | Filter by user |
| denied_only | true to show only denials |
| stats | true to return aggregate statistics instead of records |
| limit | Max records (default: 100) |
{"peers": [
{"node_id": "abc123", "hostname": "macstudio", "role": "hub",
"address": "192.168.1.10:9480", "ram_gb": 192, "gpu": "metal",
"thermal": "NOMINAL", "models": ["llama3.3-70b.gguf"], "latency_ms": 4}
]}
{"peer": "192.168.1.11:9480", "model": "llama3.2-3b.gguf"}
// No body — scores all connected peers and assigns roles
→ {"assigned": {"abc123": "master_hub", "def456": "hub", "ghi789": "worker"}}
ChaCha20+HMAC-SHA256 encrypted key-value store. Guardian 2FA wraps sensitive operations in TOTP gates.
POST /vault/store {"key": "api_secret", "value": {"token": "…"}}
GET /vault/get?key=api_secret → {"value": {"token": "…"}}
POST /guardian/login {"username": "admin", "password": "…"}
POST /guardian/enroll-totp {"username": "admin"}
→ {"qr_uri": "otpauth://totp/LINUS-AI:admin?secret=…"}
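The otpauth URI carries a Base32 secret, and the code that gate/approve expects is standard RFC 6238 TOTP. A sketch of generating it, assuming the default parameters (SHA-1, 30-second step); the sample secret below is the RFC 6238 test key, not a LINUS-AI value:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, t=None, digits=6, step=30):
    """RFC 6238 TOTP with RFC 4226 dynamic truncation."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(t if t is not None else time.time()) // step
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test secret "12345678901234567890", Base32-encoded:
totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", t=59, digits=8)  # -> "94287082"
```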
POST /guardian/gate/create {"operation": "model_push", "session_id": "…"}
→ {"gate_id": "abc123"}
POST /guardian/gate/approve {"gate_id": "abc123", "totp_code": "123456"}
→ {"approved": true}
LAN peers are free. Internet peers pay per inference unit (1 unit = 1,000 output tokens).
{"accounts": [
{"node_id": "abc123", "address": "203.0.113.5", "balance": 47, "total_calls": 53}
]}
{"node_id": "abc123", "address": "203.0.113.5", "units": 100}
GET /status → {mode, peers, models, uptime_s, active_model}
GET /stats → {requests, tokens, latency_p50, latency_p95, latency_p99}
GET /health → {"ok": true}
GET /thermal → {state: "NOMINAL", temp_c: 62, throttle_level: 0}
GET /settings → {"mode": "full", "scope": "private", "log_level": "warn", …}
POST /settings {"scope": "open", "log_level": "info"}
{"command": "df -h /", "timeout_s": 10}
→ {"stdout": "…", "stderr": "", "exit_code": 0, "duration_ms": 45}
Destructive patterns (rm -rf /, mkfs, fork bomb) are blocked by the safety filter. Timeout is capped at 120 s.
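A toy illustration of that kind of pattern filter; the real filter's rule set is not documented, so these patterns and the clamping behavior are examples only:

```python
import re

# Illustrative destructive patterns of the kind the safety filter blocks.
BLOCKED = [
    re.compile(r"\brm\s+-[a-z]*r[a-z]*f[a-z]*\s+/(\s|$)"),  # rm -rf /
    re.compile(r"\bmkfs(\.\w+)?\b"),                         # filesystem wipe
    re.compile(r":\(\)\s*\{\s*:\|:&\s*\}\s*;\s*:"),          # classic fork bomb
]

def check(command, timeout_s):
    """Return an allow/deny verdict, clamping the timeout to the 120 s cap."""
    for pat in BLOCKED:
        if pat.search(command):
            return {"allowed": False, "reason": f"matched {pat.pattern}"}
    return {"allowed": True, "timeout_s": min(timeout_s, 120)}
```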