# API Reference

LINUS-AI exposes a complete REST API on `http://localhost:9480` (port configurable via `LINUS_AI_API_PORT`). All endpoints accept and return JSON unless noted; streaming endpoints use Server-Sent Events (`text/event-stream`).

**Base URL:** `http://localhost:9480` (default port; no auth required on LAN)

### Error format

All errors use JSON:

```json
{"error": "model not loaded", "code": "MODEL_UNAVAILABLE"}
```

| Code | Meaning |
|------|---------|
| 200 | Success |
| 400 | Bad request: invalid body or missing fields |
| 402 | Payment required: insufficient billing credits |
| 422 | Compliance block: PII detected or injection attempt |
| 503 | Thermal emergency: node at EMERGENCY state |
### SSE streaming

Stream endpoints set `Content-Type: text/event-stream`. Each event is a JSON object on a `data:` line:

```
data: {"token":"Hello"}
data: {"token":" world"}
data: {"type":"done","tokens":42,"model":"llama3.2-3b"}
```

```javascript
const es = new EventSource('/infer/stream?prompt=Hello');
es.onmessage = ({data}) => {
  const ev = JSON.parse(data);
  if (ev.token) process.stdout.write(ev.token);
  if (ev.type === 'done') es.close();
};
```
## 🧠 Inference

Run completions locally. All inference is performed on your own hardware; no data is sent externally.

### POST /infer

Single-shot completion.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `prompt` | string | required | Input text |
| `system` | string | optional | System message prepended to the prompt |
| `max_tokens` | integer | optional | Max output tokens (default: 512) |
| `temperature` | float | optional | Sampling temperature, 0.0–2.0 (default: 0.7) |
| `model` | string | optional | Override the active model filename |
| `profile` | string | optional | Compliance profile ID (e.g. `medical`) |

```bash
curl -X POST http://localhost:9480/infer \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"Explain quantum entanglement","max_tokens":256}'
```

```json
{"text": "Quantum entanglement is…", "tokens": 187, "model": "llama3.2-3b.gguf", "latency_ms": 1240}
```
### POST /infer/stream (SSE)

Streaming token-by-token output. Same body as `POST /infer`. Returns `text/event-stream` with one `{"token":"…"}` event per generated token, followed by a final `{"type":"done","tokens":N}` event.
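The browser `EventSource` API only issues GET requests, so a POST streaming client has to read the response body itself. A minimal sketch, assuming Node 18+ (built-in `fetch`, async-iterable response body); the event shapes match the SSE format documented above, but the buffering and parsing choices are ours:

```javascript
// Extract JSON events from the single-line `data: {...}` format this API emits.
function parseSSE(text) {
  const events = [];
  for (const line of text.split('\n')) {
    if (line.startsWith('data: ')) events.push(JSON.parse(line.slice(6)));
  }
  return events;
}

async function streamInfer(prompt) {
  const res = await fetch('http://localhost:9480/infer/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt, max_tokens: 256 }),
  });
  const decoder = new TextDecoder();
  let buf = '';
  for await (const chunk of res.body) {
    buf += decoder.decode(chunk, { stream: true });
    const lines = buf.split('\n');
    buf = lines.pop();                      // keep any partial trailing line
    for (const ev of parseSSE(lines.join('\n'))) {
      if (ev.token) process.stdout.write(ev.token);
      if (ev.type === 'done') return ev.tokens;
    }
  }
}
```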

### POST /agent/stream (SSE)

ReAct agent loop streaming thought/action/observation events.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `message` | string | required | User message |
| `profile` | string | optional | Agent compliance profile (default: `general`) |
| `session_id` | string | optional | Conversation session for history |
| `allow_web_search` | boolean | optional | Enable the web search tool (requires `scope=open`) |
| `history` | array | optional | Prior `[{role, content}]` turns |

```
{"type":"thought","text":"I should search for…"}
{"type":"action","tool":"search","args":"quantum computing"}
{"type":"observation","text":"Found: …"}
{"token":"Based on my research…"}
{"type":"done","tokens":312,"sources":2}
```
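The stream above mixes typed control events with raw answer tokens, so a client can dispatch on event shape. One way to render them in a CLI, keyed on the fields shown above (the labels and layout are our choices, not the API's):

```javascript
// Map one /agent/stream event to a printable string.
function renderAgentEvent(ev) {
  if (ev.token) return ev.token;                          // answer tokens, printed raw
  switch (ev.type) {
    case 'thought':     return `\n[thought] ${ev.text}\n`;
    case 'action':      return `\n[action] ${ev.tool}(${ev.args})\n`;
    case 'observation': return `\n[observation] ${ev.text}\n`;
    case 'done':        return `\n(${ev.tokens} tokens, ${ev.sources ?? 0} sources)\n`;
    default:            return '';                        // ignore unknown event types
  }
}
```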
### GET /agent/profiles

List all 14 compliance profiles.

```json
{"profiles": [
  {"id": "general",  "tier": "OPEN",       "pii_scan": true, "injection_check": true, "consent_required": false},
  {"id": "medical",  "tier": "REGULATED",  "pii_scan": true, "injection_check": true, "consent_required": true},
  {"id": "security", "tier": "RESTRICTED", "pii_scan": true, "injection_check": true, "consent_required": true},
  …
]}
```
## 📦 Models

Manage GGUF models. LINUS-AI scans `~/models` and `~/.linus-ai/models` on startup.

### GET /models

List available models.

```json
{"models": [
  {"name": "llama3.2-3b.gguf", "size_gb": 2.1,  "fits_in_ram": true,  "loaded": true,  "quant": "Q4_K_M"},
  {"name": "qwen2.5-72b.gguf", "size_gb": 41.2, "fits_in_ram": false, "loaded": false, "quant": "Q4_K_M"}
]}
```
### POST /models/select

Set the active model. `force: true` overrides the 85% RAM guard.

```json
{"model": "llama3.2-3b.gguf", "force": false}
```
### POST /models/load · POST /models/unload

Load or unload a model in memory.

```
POST /models/load   {"model": "phi-4.gguf"}
POST /models/unload {"model": "phi-4.gguf"}
```
### GET · POST /models/roles

Assign models to inference roles.

```
POST /models/roles {"main": "llama3.3-70b.gguf", "router": "llama3.2-3b.gguf", "student": null}
GET  /models/roles → {"main": "llama3.3-70b.gguf", "router": "llama3.2-3b.gguf"}
```
### GET /models/recommend

Hardware-aware model catalog. Returns 22 curated models ranked by fit for this node's RAM and GPU. Each entry includes a direct download URL and quantization advice.

### POST /models/pull

Download a model from a URL.

```json
{"url": "https://huggingface.co/…/llama3.2-3b-Q4_K_M.gguf", "name": "llama3.2-3b.gguf"}
```

## Tensor & Pipeline Parallelism

Split large models across multiple GPUs or machines. Tensor parallelism splits weight matrices; pipeline parallelism splits transformer layers.

### POST /pipeline/plan

Configure the pipeline-parallel layout. Returns `{"plan": [{node, layers_start, layers_end}, …]}`.

```json
{"model_path": "/models/llama3.3-70b.gguf", "use_mesh_peers": true}
```
### POST /pipeline/infer

Run inference across the pipeline plan.

```json
{"prompt": "Hello", "max_tokens": 256, "temperature": 0.7}
```
### GET · POST /tensor/plan

Get or set the tensor-parallel plan. `backend` is `"Rpc"` or `"Native"`.

```json
{
  "plan_id": "tp-plan-1",
  "world_size": 2,
  "local_rank": 0,
  "backend": "Rpc",
  "rpc_port": 50052,
  "model_path": "/models/70b.gguf",
  "peers": [
    {"rank": 0, "node_id": "node-a", "address": "node-a:9480", "rpc_address": "node-a:50052", "ram_mb": 65536},
    {"rank": 1, "node_id": "node-b", "address": "node-b:9480", "rpc_address": "node-b:50052", "ram_mb": 32768}
  ]
}
```
### POST /tensor/infer

Run tensor-parallel inference (coordinator node).

```json
{"prompt": "Hello", "max_tokens": 128}
```
### POST /tensor/allreduce

Submit a partial tensor (native AllReduce). The body is a binary TNSR frame: `b"TNSR"` magic + rank (u8) + world_size (u8) + request_id (16-byte UUID) + element_count (u32 LE) + f32 elements. Returns the reduced-sum frame.
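The frame layout above can be packed with a `DataView`. A sketch: the field order follows the spec, but the float byte order is our assumption (little-endian, matching the u32 element count), not something the docs state:

```javascript
// Build a TNSR frame for POST /tensor/allreduce.
// requestId must be a 16-byte Uint8Array (e.g. a UUID's raw bytes).
function buildTnsrFrame(rank, worldSize, requestId, elements) {
  const buf = new ArrayBuffer(4 + 1 + 1 + 16 + 4 + 4 * elements.length);
  const view = new DataView(buf);
  const bytes = new Uint8Array(buf);
  bytes.set([0x54, 0x4e, 0x53, 0x52], 0);       // b"TNSR" magic
  view.setUint8(4, rank);                       // rank (u8)
  view.setUint8(5, worldSize);                  // world_size (u8)
  bytes.set(requestId, 6);                      // request_id (16-byte UUID)
  view.setUint32(22, elements.length, true);    // element_count (u32 LE)
  elements.forEach((x, i) => view.setFloat32(26 + 4 * i, x, true)); // f32 LE (assumed)
  return bytes;
}
```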

## ⚖️ Compliance & Security

14 domain-specific compliance profiles across 4 tiers, with PII scanning, injection detection, consent management, and immutable HMAC-chained audit logs. See Admin Guide → Compliance.

### GET /compliance/preflight

Check text against a compliance profile. `decision` is `allow`, `block`, or `warn`.

| Param | Type | Description |
|-------|------|-------------|
| `text` | string | Text to check (URL-encoded) |
| `profile` | string | Profile ID (default: `general`) |
| `user_id` | string | Optional user identifier for the audit log |

```json
{"decision": "allow", "issues": [], "redacted_text": "…", "profile": "general", "reasons": []}
```

```json
{"decision": "block", "issues": ["CREDIT_CARD detected at position 42"], "reasons": ["Blocking PII type: CREDIT_CARD"]}
```
### GET /compliance/profiles

List all 14 profiles with metadata.

```json
{"profiles": [
  {"id": "general",  "tier": "OPEN",       "regulations": [],                        "consent_required": false, "injection_block": false},
  {"id": "medical",  "tier": "REGULATED",  "regulations": ["HIPAA","HITECH"],        "consent_required": true,  "injection_block": false},
  {"id": "legal",    "tier": "REGULATED",  "regulations": ["ABA","GDPR"],            "consent_required": true,  "injection_block": false},
  {"id": "finance",  "tier": "REGULATED",  "regulations": ["SOX","PCI-DSS","FINRA"], "consent_required": true,  "injection_block": false},
  {"id": "security", "tier": "RESTRICTED", "regulations": ["SOC2","ISO27001"],       "consent_required": true,  "injection_block": true}
]}
```
### GET /compliance/audit

Query the HMAC-chained audit log.

| Param | Description |
|-------|-------------|
| `limit` | Max records (default: 100) |
| `profile` | Filter by profile ID |
| `decision` | `allow` \| `block` \| `warn` |
| `user_id` | Filter by user |
| `since` | ISO 8601 timestamp lower bound |
### GET /compliance/audit/verify

Verify HMAC chain integrity. `ok: false` means at least one record was tampered with or deleted.

```json
{"ok": true, "records": 1847, "months": ["2026-01","2026-02","2026-03"]}
```
### POST /compliance/audit/seal

Seal completed monthly log files (OS-level immutable). No body required.

```json
{"sealed": ["audit-2026-01.jsonl", "audit-2026-02.jsonl"]}
```
### POST /compliance/audit/export

Export an audit snapshot to a directory.

```
{"dest_dir": "/mnt/siem/linus-ai-export"}
→ {"ok": true, "records": 1847, "dest": "/mnt/siem/linus-ai-export/audit-2026-03.jsonl"}
```
## 🗂 RAG Document Access Control

Fine-grained document access control for RAG pipelines: 5 classification levels (PUBLIC → TOP_SECRET), with ACLs at user, role, department, division, and company scope. See Admin Guide → RAG.

### GET /rag/documents

List registered documents.

```json
{"documents": [
  {"id": "doc-001", "title": "Q4 Report.pdf", "classification": "CONFIDENTIAL", "owner_user_id": "alice", "mime_type": "application/pdf", "size_bytes": 204800}
]}
```
### POST /rag/documents/register

Register a document in the access registry. `classification` is an integer (2 = CONFIDENTIAL in the example below):

| Value | Name | Default access |
|-------|------|----------------|
| 0 | PUBLIC | Everyone |
| 1 | INTERNAL | Company members via ACL |
| 2 | CONFIDENTIAL | Explicit ACL permit |
| 3 | SECRET | Clearance ≥ 3 + ACL |
| 4 | TOP_SECRET | Clearance 4 + named explicitly |

```json
{"title": "Q4 Report.pdf", "path": "/docs/q4-report.pdf", "owner_user_id": "alice", "owner_name": "Alice Smith", "classification": 2, "mime_type": "application/pdf", "size_bytes": 204800, "content_hash": "sha256hex…"}
```
### PUT /rag/documents/{id}/acl

Update the document ACL.

```json
{"allow_users": ["bob", "carol"], "deny_users": [],
 "allow_companies": ["ACME Corp"], "deny_companies": [],
 "allow_divisions": ["Engineering"], "deny_divisions": [],
 "allow_departments": [], "deny_departments": [],
 "allow_roles": ["analyst"], "deny_roles": []}
```

Deny always overrides allow at the same scope. Decision algorithm: owner → deny → PUBLIC → clearance → TOP_SECRET list → ACL permit → default DENY.
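The decision order can be sketched as a function. This is a simplified model, not the server's implementation: it only checks user-scope denies, and the principal and document records are reduced to the fields the algorithm needs. The `PERMIT`/`DENY` strings match what `/rag/access-check` returns:

```javascript
// Evaluate access in the documented order:
// owner → deny → PUBLIC → clearance → TOP_SECRET list → ACL permit → default DENY.
function checkAccess(user, doc) {
  const acl = doc.acl || {};
  if (doc.owner_user_id === user.user_id) return 'PERMIT';          // owner
  if ((acl.deny_users || []).includes(user.user_id)) return 'DENY'; // deny overrides
  if (doc.classification === 0) return 'PERMIT';                    // PUBLIC
  if ((user.clearance || 0) < doc.classification) return 'DENY';    // clearance gate
  if (doc.classification === 4)                                     // TOP_SECRET: named only
    return (acl.allow_users || []).includes(user.user_id) ? 'PERMIT' : 'DENY';
  const allowed =                                                   // ACL permit, any scope
    (acl.allow_users || []).includes(user.user_id) ||
    (acl.allow_roles || []).some(r => (user.roles || []).includes(r)) ||
    (acl.allow_departments || []).includes(user.department) ||
    (acl.allow_divisions || []).includes(user.division) ||
    (acl.allow_companies || []).includes(user.company);
  return allowed ? 'PERMIT' : 'DENY';                               // default DENY
}
```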

### PUT /rag/documents/{id}/classification

Update the document classification level (3 = SECRET).

```json
{"classification": 3}
```
### DELETE /rag/documents/{id}

Remove a document from the registry.

```
DELETE /rag/documents/doc-001 → {"ok": true}
```
### POST /rag/access-check

Check access and write an audit record.

```
{"user_id": "bob", "doc_id": "doc-001"}
→ {"decision": "PERMIT", "reason": "ACL allow_users match", "doc_id": "doc-001", "user_id": "bob"}
```
### GET · POST /rag/principals

List or create/update principals. `clearance` ranges 0–4.

```
POST /rag/principals
{"user_id": "bob", "name": "Bob Jones", "email": "bob@acme.com",
 "company": "ACME Corp", "division": "Engineering", "department": "Backend",
 "clearance": 2, "roles": ["analyst", "reviewer"]}

DELETE /rag/principals/bob → {"ok": true}
```
### GET /rag/audit

Query the RAG access audit log.

| Param | Description |
|-------|-------------|
| `doc_id` | Filter by document |
| `user_id` | Filter by user |
| `denied_only` | `true` to show only denials |
| `stats` | `true` to return aggregate statistics instead of records |
| `limit` | Max records (default: 100) |
## 🕸 Mesh & Peers

### GET /peers

Active peer list.

```json
{"peers": [
  {"node_id": "abc123", "hostname": "macstudio", "role": "hub", "address": "192.168.1.10:9480", "ram_gb": 192, "gpu": "metal", "thermal": "NOMINAL", "models": ["llama3.3-70b.gguf"], "latency_ms": 4}
]}
```
### POST /mesh/push-model

Send a model to a peer node.

```json
{"peer": "192.168.1.11:9480", "model": "llama3.2-3b.gguf"}
```
### POST /mesh/assign-roles

Re-score all connected peers and assign hub/worker/spoke roles. No body required.

```json
{"assigned": {"abc123": "master_hub", "def456": "hub", "ghi789": "worker"}}
```
## 🛡 Vault & Guardian

ChaCha20 + HMAC-SHA256 encrypted key-value store. Guardian 2FA wraps sensitive operations in TOTP gates.

### POST /vault/store · GET /vault/get

Encrypted store and retrieve.

```
POST /vault/store {"key": "api_secret", "value": {"token": "…"}}
GET  /vault/get?key=api_secret → {"value": {"token": "…"}}
```
### POST /guardian/login · POST /guardian/gate/approve

Authenticate and approve 2FA gates.

```
POST /guardian/login {"username": "admin", "password": "…"}
POST /guardian/enroll-totp {"username": "admin"}
  → {"qr_uri": "otpauth://totp/LINUS-AI:admin?secret=…"}
POST /guardian/gate/create {"operation": "model_push", "session_id": "…"}
  → {"gate_id": "abc123"}
POST /guardian/gate/approve {"gate_id": "abc123", "totp_code": "123456"}
  → {"approved": true}
```
## 💳 Billing

LAN peers are free. Internet peers pay per inference unit (1 unit = 1,000 output tokens).
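The unit arithmetic is simple but worth pinning down. A sketch; whether a partial thousand is rounded up is our assumption (ceiling), not stated by the docs:

```javascript
// Units billed for one response: 1 unit per 1,000 output tokens,
// partial units rounded up (assumption).
const unitsFor = (outputTokens) => Math.ceil(outputTokens / 1000);

// e.g. a 256-token completion plus a 2,300-token completion:
const total = unitsFor(256) + unitsFor(2300);
```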

### GET /billing

All node accounts and balances.

```json
{"accounts": [
  {"node_id": "abc123", "address": "203.0.113.5", "balance": 47, "total_calls": 53}
]}
```
### POST /billing/topup

Add inference credits to a node.

```json
{"node_id": "abc123", "address": "203.0.113.5", "units": 100}
```

## System

### GET /status · /stats · /health · /thermal

Node telemetry endpoints.

```
GET /status  → {mode, peers, models, uptime_s, active_model}
GET /stats   → {requests, tokens, latency_p50, latency_p95, latency_p99}
GET /health  → {"ok": true}
GET /thermal → {"state": "NOMINAL", "temp_c": 62, "throttle_level": 0}
```
### GET · POST /settings

Read or write runtime settings.

```
GET /settings → {"mode": "full", "scope": "private", "log_level": "warn", …}
POST /settings {"scope": "open", "log_level": "info"}
```
### POST /shell/exec

Run a shell command (Launch tab). Destructive patterns (`rm -rf /`, `mkfs`, fork bombs) are blocked by the safety filter, and the timeout is capped at 120 s.

```
{"command": "df -h /", "timeout_s": 10}
→ {"stdout": "…", "stderr": "", "exit_code": 0, "duration_ms": 45}
```