Quickstart — 60 Seconds
Download the pre-built binary for your platform, activate your license, pull a model, and run. No Python, no pip, no Cargo required.
Platform Binaries
Pre-built binaries are published to the public releases page for every tagged version.
| Platform | Asset name | Format |
|---|---|---|
| macOS arm64 (Apple Silicon) | linus-ai-v4.0.0-headless-macos-arm64 | Binary |
| macOS x86_64 (Intel) | linus-ai-v4.0.0-headless-macos-x86_64 | Binary |
| macOS Universal | LINUS-AI-v4.0.0.dmg | Signed DMG (GUI) |
| Linux x86_64 (static) | linus-ai-v4.0.0-headless-linux-x86_64 | Binary / .deb |
| Linux arm64 (static) | linus-ai-v4.0.0-headless-linux-arm64 | Binary |
| Windows 10+ x86_64 | LINUS-AI-Setup-v4.0.0.exe | NSIS installer / MSIX |
All releases include SHA256SUMS.txt. Verify: sha256sum -c SHA256SUMS.txt
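
On hosts where `sha256sum` is not installed, the same check can be done with a few lines of standard-library Python. This is a minimal sketch, assuming SHA256SUMS.txt uses the usual `sha256sum` layout of `<hex digest>  <file name>` per line; the helper names `sha256_of` and `verify_asset` are illustrative and not part of LINUS-AI.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_asset(sums_text: str, asset: Path) -> bool:
    """Check one downloaded asset against the text of SHA256SUMS.txt.

    Assumption: each line is "<hex digest>  <file name>".
    """
    for line in sums_text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue
        expected, name = parts
        # sha256sum prefixes binary-mode entries with '*'.
        if name.strip().lstrip("*") == asset.name:
            return sha256_of(asset) == expected.lower()
    return False  # asset is not listed in the checksum file
```

Usage: `verify_asset(Path("SHA256SUMS.txt").read_text(), Path("linus-ai-v4.0.0-headless-linux-x86_64"))` returns `True` only when the file is listed and its digest matches.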
Modes
Control what runs via CLI flags or the LINUS_AI_MODE and LINUS_AI_SCOPE environment variables.
| Mode | What runs |
|---|---|
| standalone | Inference engine + HTTP API |
| mesh | + LAN peer discovery (mDNS multicast) |
| hive | + Federated learning across peers |
| task | + Distributed job scheduler |
| full | Everything. Recommended. |
Key Environment Variables
| Variable | Default | Description |
|---|---|---|
| LINUS_AI_MODE | full | Operating mode |
| LINUS_AI_API_PORT | 9480 | HTTP API + GUI port |
| LINUS_AI_MODEL_DIR | ~/models,~/.linus_ai/models | Model directories (comma-separated) |
| LINUS_AI_MESH_SECRET | linus-ai-default-mesh-secret | Shared HMAC secret for mesh auth — change this |
| LINUS_AI_SCOPE | private | private / lan / open |
| LINUS_AI_NODE_NAME | system hostname | Name shown in peer cards |
| LINUS_AI_LOG_LEVEL | warn | trace / debug / info / warn / error |
| LINUS_AI_TAILSCALE | false | Enable Tailscale peer routing |
| LINUS_AI_OVERLAY_SERVER | — | WAN relay address, e.g. relay.example.com:9777 |
| LINUS_AI_DATA_DIR | ~/.linus_ai/data | Vault, ledger, session storage |
| LINUS_AI_AUDIT_DIR | ~/.linus-ai/audit | Primary directory for compliance + RAG audit logs |
| LINUS_AI_AUDIT_EXPORT_DIRS | — | Colon-separated secondary audit export dirs (enterprise SIEM routing) |
| RUST_LOG | — | Overrides --log-level when set |
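
Note that the two directory-list variables use different separators: commas for LINUS_AI_MODEL_DIR and colons for LINUS_AI_AUDIT_EXPORT_DIRS. The sketch below shows how a wrapper script might resolve a few of these variables with the defaults from the table. `read_config` and `split_dirs` are hypothetical helpers, not LINUS-AI functions, and the binary's own parsing may differ.

```python
import os
from pathlib import Path
from typing import Mapping


def split_dirs(value: str, sep: str) -> list[Path]:
    """Split a separator-delimited directory list and expand '~'."""
    return [Path(p.strip()).expanduser() for p in value.split(sep) if p.strip()]


def read_config(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve a few LINUS_AI_* variables, using the defaults from the table."""
    return {
        "mode": env.get("LINUS_AI_MODE", "full"),
        "api_port": int(env.get("LINUS_AI_API_PORT", "9480")),
        "scope": env.get("LINUS_AI_SCOPE", "private"),
        # Model directories are comma-separated...
        "model_dirs": split_dirs(
            env.get("LINUS_AI_MODEL_DIR", "~/models,~/.linus_ai/models"), ","
        ),
        # ...while audit export directories are colon-separated.
        "audit_export_dirs": split_dirs(
            env.get("LINUS_AI_AUDIT_EXPORT_DIRS", ""), ":"
        ),
    }
```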
Inference Backend
The active backend is selected at compile time via build.sh engine flags — not at runtime.
| Engine flag | Backend | GPU | Performance | Cross-compile |
|---|---|---|---|---|
| --llama-cpp2 (default) | Bundled llama.cpp (static) | Metal / CUDA auto | Fastest | macOS→macOS only |
| --candle-only | Pure-Rust candle | Metal / CUDA auto | ~20–30% slower | Any host → any target |
| (no features) | Subprocess fallback | Depends on external tool | Slowest | Any |
Subprocess Fallback Priority
Used when the binary is built without a bundled engine:
| Priority | Backend | How to activate |
|---|---|---|
| 1 | llama-server HTTP on :8181 | Start llama-server --model model.gguf --port 8181 |
| 2 | ollama HTTP on :11434 | Install Ollama and run ollama serve |
| 3 | Auto-spawn llama-server | brew install llama.cpp — spawned for models ≤ 20 GB |
| 4 | llama-cli subprocess | brew install llama.cpp — GPU first, CPU fallback |
Model RAM guard: Models that exceed 85% of physical RAM are automatically marked unavailable and shown greyed-out in the GUI. Use the ⚠ Force button to override.
GPU → CPU fallback: On GPU error (e.g. Metal API mismatch) LINUS-AI retries with CPU-only automatically.
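
The RAM guard rule above reduces to a single comparison. The following is a sketch only: how LINUS-AI measures model size (file size versus runtime footprint) is not stated here, so `model_bytes` is an assumed input, and `force` stands in for the Force override.

```python
RAM_GUARD_FRACTION = 0.85  # threshold stated in the RAM guard note


def fits_in_ram(model_bytes: int, physical_ram_bytes: int, force: bool = False) -> bool:
    """Return True when a model may be loaded under the 85% RAM guard."""
    if force:
        # The user explicitly overrode the guard.
        return True
    return model_bytes <= RAM_GUARD_FRACTION * physical_ram_bytes
```

For example, a 14 GiB model on a 16 GiB machine is at 87.5% of RAM and would be marked unavailable unless forced.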
Pipeline Parallelism — Run 70B+ Models Across Multiple Machines
LINUS-AI ships pipeline-parallel inference: a model too large for one machine's RAM is automatically split across N mesh nodes, each holding a contiguous range of transformer layers.
Request
│
▼ Head node (layers 0–15)
│ embed tokens → run layers → send activations
▼ Mid node (layers 16–47)
│ receive activations → run layers → forward
▼ Tail node (layers 48–79)
receive activations → compute logits → sample → return token
How it works
- The head node reads the GGUF tensor index (no weights loaded yet)
- LayerPlanner assigns layer ranges to each node proportional to its RAM
- Each node loads only its assigned layers from the GGUF file
- Activations flow node-to-node over HTTP /pipeline/forward using a compact binary wire format (LNSP magic, f16/f32, little-endian)
- KV cache is maintained per-node per-request
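
To make the RAM-proportional split concrete, here is one way such a planner could work. It is a sketch under stated assumptions, not LayerPlanner's actual algorithm: it treats every layer as the same size, ignores embedding and output tensors, and settles rounding with a largest-remainder rule. `plan_layers` is a hypothetical name.

```python
def plan_layers(n_layers: int, node_ram: list[int]) -> list[range]:
    """Split layers 0..n_layers-1 into contiguous ranges proportional to RAM."""
    total = sum(node_ram)
    if total <= 0:
        raise ValueError("total RAM must be positive")
    # Ideal fractional share per node, floored to whole layers.
    ideal = [n_layers * ram / total for ram in node_ram]
    counts = [int(x) for x in ideal]
    # Hand out layers lost to rounding, largest fractional part first.
    leftovers = n_layers - sum(counts)
    order = sorted(range(len(node_ram)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftovers]:
        counts[i] += 1
    # Convert counts to contiguous ranges in node order (head first, tail last).
    ranges, start = [], 0
    for count in counts:
        ranges.append(range(start, start + count))
        start += count
    return ranges
```

With 80 layers and three nodes holding 16, 32, and 32 GB, `plan_layers(80, [16, 32, 32])` yields layers 0–15, 16–47, and 48–79, matching the split in the diagram above.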
Internet Node Payment (Blockchain Billing)
LAN peers are always free. Internet peers must pay per inference unit.
| Rule | Detail |
|---|---|
| LAN / loopback | Free — no account needed |
| New internet node | 5 free welcome units on first contact |
| Cost | 1 unit per inference call |
| Token rate | 1 unit = 1,000 output tokens |
| Denied | 402 Payment Required + JSON reason |
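
The rules in this table can be read as a small state machine per node. The sketch below models only the LAN exemption, the welcome units, the per-call cost, and the 402 denial. It does not attempt to model the token-rate row, and the class and reason strings are illustrative assumptions, not the actual ledger implementation.

```python
from dataclasses import dataclass, field

WELCOME_UNITS = 5   # free units for a new internet node
COST_PER_CALL = 1   # units charged per inference call


@dataclass
class Billing:
    """Hypothetical in-memory stand-in for the billing ledger."""
    balances: dict[str, int] = field(default_factory=dict)

    def charge(self, node_id: str, is_lan: bool) -> tuple[int, str]:
        """Return (HTTP status, reason) for one inference call."""
        if is_lan:
            return 200, "LAN / loopback peers are free"
        # First contact credits the welcome units.
        balance = self.balances.setdefault(node_id, WELCOME_UNITS)
        if balance < COST_PER_CALL:
            return 402, "insufficient units"
        self.balances[node_id] = balance - COST_PER_CALL
        return 200, "charged 1 unit"
```

Under these assumptions a new internet node gets five successful calls and is refused on the sixth until it is topped up.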
GUI Tabs
Open http://localhost:9480/app after starting. The control panel includes these tabs:
| Tab | What it does |
|---|---|
| Chat | Streaming tokens, markdown, document upload (.pdf/.docx/.txt), export as .md |
| Agent | ReAct loop with tools — thought/action/observation steps inline. Web search when scope=open. |
| Models | List GGUF files, load/unload, assign roles. RAM-unfit models shown disabled. |
| Setup | Node role, model role assignments, compliance profile selection, RAG document access control |
| Mesh | Peer topology map, hub-spoke visualisation, manual peer connect |
| Chorus | Federated learning controls, gradient aggregation dashboard |
| Tasks | Task submit/status/cancel |
| Thermal | 5-stage thermal state (NOMINAL → THROTTLE → HOT → CRITICAL → EMERGENCY) |
| Ledger | Blockchain transparency audit trail (SHA-256 hash chain + Merkle proofs) |
| Hardware | AI capability score, RAM/GPU/NPU detection |
| Compliance | Industry compliance profiles (14), PII scanning, injection detection, consent management, immutable audit log |
| Permissions | Node access controls, mesh auth, billing, model sharing |
| Launch | In-app shell terminal with command history and quick-launch buttons |
API Reference
All endpoints on http://localhost:9480 (configurable via LINUS_AI_API_PORT).
Inference
| Method | Path | Body | Description |
|---|---|---|---|
| POST | /infer | {prompt, system?, max_tokens?, temperature?} | Single inference |
| POST | /infer/stream | same | SSE token stream |
| POST | /agent/stream | {message, profile?, max_tokens?, session_id?} | Agent ReAct loop, SSE |
| POST | /pipeline/plan | {model_path, use_mesh_peers?} | Configure pipeline plan |
| POST | /pipeline/infer | {prompt, max_tokens?, temperature?} | Pipeline inference |
| POST | /pipeline/forward | binary frame | Node-to-node activation forwarding |
| POST | /pipeline/clear | {request_id} | Clear KV cache |
| GET/POST | /tensor/plan | TensorParallelPlan JSON | Get / set tensor parallel plan |
| GET | /tensor/status | — | TP plan + RPC worker health |
| POST | /tensor/infer | {prompt, max_tokens?, temperature?} | Tensor-parallel inference (coordinator) |
| POST | /tensor/allreduce | binary TNSR frame | Submit partial activation; returns reduced tensor |
| POST | /tensor/rpc/start | {rpc_port?} | Spawn llama-rpc-server on this node |
| POST | /tensor/rpc/stop | — | Stop llama-rpc-server |
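
As an illustration of the /infer row, the sketch below builds and sends a request using only the Python standard library. The body fields come from the table. The JSON content type and a JSON response are assumptions, and the response schema is not documented here, so the reply is returned as a plain dict.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:9480"  # default LINUS_AI_API_PORT


def build_infer_request(prompt: str, max_tokens: Optional[int] = None,
                        temperature: Optional[float] = None,
                        system: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /infer request; optional fields are omitted when unset."""
    body = {"prompt": prompt}
    for key, value in (("system", system), ("max_tokens", max_tokens),
                       ("temperature", temperature)):
        if value is not None:
            body[key] = value
    return urllib.request.Request(
        f"{BASE_URL}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def infer(prompt: str, **options) -> dict:
    """Send the request to a running node and decode the (assumed JSON) reply."""
    request = build_infer_request(prompt, **options)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read())
```

Calling `infer("Hello", max_tokens=64)` requires a node listening on the configured port.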
Models
| Method | Path | Description |
|---|---|---|
| GET | /models | List models with fits_in_ram flag per model |
| POST | /models/select | Set active model — blocked if too large for RAM (pass force:true to override) |
| POST | /models/load | Load model — blocked if too large for RAM (pass force:true to override) |
| POST | /models/unload | Unload current model |
| GET/POST | /models/roles | Get/set role assignments |
| POST | /models/pull | Pull model from URL |
| PUT | /models/push | Receive model binary from a peer |
| GET | /models/recommend | Model recommendations for this node |
Mesh & Peers
| Method | Path | Description |
|---|---|---|
| GET | /peers | Active peer list with roles, RAM, thermal |
| POST | /mesh/push-model | Push a model to a specific peer |
| POST | /mesh/assign-roles | Reassign hub/spoke roles across the mesh |
| GET | /benchmark | Routing table and latency scores |
Billing
| Method | Path | Description |
|---|---|---|
| GET | /billing | All node accounts with balances |
| POST | /billing/topup | Add inference credits: {node_id, address, units} |
Compliance & Security
| Method | Path | Body / Params | Description |
|---|---|---|---|
| GET | /compliance/preflight | ?text=...&profile=... | Check text against active compliance profile |
| GET | /compliance/profiles | — | List all 14 compliance profiles |
| POST | /compliance/consent | {user_id, action, profile} | Record user consent |
| GET | /compliance/consent | ?user_id=... | Check consent status |
| GET | /compliance/audit | ?limit=N&profile=... | Query immutable audit log |
| GET | /compliance/audit/verify | — | Verify HMAC chain integrity |
| POST | /compliance/audit/seal | — | Seal all completed monthly log files |
| POST | /compliance/audit/export | {dest_dir} | Export audit snapshot to directory |
RAG Document Access Control
| Method | Path | Body / Params | Description |
|---|---|---|---|
| GET | /rag/documents | — | List registered documents with classifications |
| POST | /rag/documents/register | {title, path, owner_user_id, classification, ...} | Register document in registry |
| PUT | /rag/documents/{id}/acl | {allow_users, deny_users, allow_companies, ...} | Update document ACL |
| PUT | /rag/documents/{id}/classification | {classification} | Update document classification level |
| DELETE | /rag/documents/{id} | — | Remove document from registry |
| POST | /rag/access-check | {user_id, doc_id} | Check access and log decision |
| GET | /rag/principals | — | List registered principals |
| POST | /rag/principals | {user_id, name, clearance, company, ...} | Create or update principal |
| DELETE | /rag/principals/{user_id} | — | Remove principal |
| GET | /rag/audit | ?doc_id=...&user_id=...&denied_only=true&limit=N | Query RAG access audit log |
Shell & System
| Method | Path | Description |
|---|---|---|
| POST | /shell/exec | Run a shell command: {command, timeout_s?} |
| GET | /status | Node status, uptime, active model, backend |
| GET | /stats | Full stats: thermal, blockchain, mesh |
| GET | /health | {"ok": true} liveness probe |
| GET | /blockchain | Ledger stats, chain validity |
| GET | /thermal | Thermal state and throttle level |
| GET/POST | /settings | Runtime settings |
| GET | /endpoints | Full documented endpoint index |
Launch Shell (In-App Terminal)
The Launch tab embeds a terminal in the browser — no external terminal needed.
- Type any shell command and press Run ▶ or Enter
- Arrow keys for command history
- Quick-launch buttons: List models, Disk usage, RAM, GPU, llama procs, API status
- Safety filter blocks destructive patterns (rm -rf /, mkfs.*, fork bomb)
- Timeout capped at 120 s per command
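
A filter of this kind can be approximated with a handful of regular expressions plus a timeout clamp. The patterns below are illustrative guesses covering only the three examples named above; the real filter's rule set is not documented here. The 30-second default timeout is likewise an assumption.

```python
import re

MAX_TIMEOUT_S = 120  # cap stated in the list above

# Illustrative patterns only; not the actual rule set.
BLOCKED_PATTERNS = [
    # rm with both -r and -f flags aimed at the filesystem root
    re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+\s+/(?:\s|$)"),
    # mkfs and its variants such as mkfs.ext4
    re.compile(r"\bmkfs(?:\.\w+)?\b"),
    # the classic shell fork bomb  :(){ :|:& };:
    re.compile(r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:"),
]


def check_command(command: str, timeout_s: float = 30) -> tuple[bool, float]:
    """Return (allowed, effective_timeout) for a shell command."""
    allowed = not any(p.search(command) for p in BLOCKED_PATTERNS)
    return allowed, min(timeout_s, MAX_TIMEOUT_S)
```

Pattern matching like this is a guard rail rather than a sandbox, since a determined user can rephrase a destructive command.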
CLI Reference
Compliance & Security Layer
LINUS-AI includes a built-in compliance engine (linus_ai/compliance.py) that enforces domain-specific governance before every inference request.
14 Compliance Profiles
| Tier | Profiles |
|---|---|
| OPEN (permissive) | general, creative, reasoning, code, engineering |
| AUDIT (logged) | education, support, sales, data_science |
| REGULATED (strict) | medical, legal, finance, hr |
| RESTRICTED (hard-block) | security |
What it checks
- PII scanning — 12 types (email, phone, SSN, credit card, CVV, passport, IP, MAC, NHS number, IBAN, PAN-like, medical record). Credit card / CVV / SSN / PAN-like inputs are blocked outright; others are redacted.
- Injection detection — 8 rule families (role override, prompt leak, jailbreak, DAN, system override, token smuggling, goal hijacking, multi-turn extraction). RESTRICTED profile hard-blocks; REGULATED profiles warn.
- Consent gating — REGULATED/RESTRICTED profiles require explicit user consent before proceeding.
- Immutable audit log — every request and decision recorded in HMAC-chained monthly .jsonl files; completed months sealed with chmod 0o400 + macOS UF_IMMUTABLE + Linux chattr +i.
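
The idea behind an HMAC-chained log is that each record's MAC covers both its own content and the previous record's MAC, so editing or removing any earlier record invalidates everything after it. The sketch below shows that principle. The record layout, the all-zero genesis value, and the use of HMAC-SHA256 are assumptions, not the actual on-disk format.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def _mac(key: bytes, prev: str, event: dict) -> str:
    """HMAC over the previous MAC and the canonicalised event."""
    payload = json.dumps({"prev": prev, "event": event}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_record(log: list[dict], key: bytes, event: dict) -> dict:
    """Append an event linked to the record before it."""
    prev = log[-1]["mac"] if log else GENESIS
    record = {"prev": prev, "event": event, "mac": _mac(key, prev, event)}
    log.append(record)
    return record


def verify_chain(log: list[dict], key: bytes) -> bool:
    """Recompute every MAC and check that each record links to its predecessor."""
    prev = GENESIS
    for rec in log:
        expected = _mac(key, rec["prev"], rec["event"])
        if rec["prev"] != prev or not hmac.compare_digest(rec["mac"], expected):
            return False
        prev = rec["mac"]
    return True
```

In a .jsonl file each record would be one JSON line, and a check such as GET /compliance/audit/verify would correspond to `verify_chain` in this model.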
Enterprise Audit Routing
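Audit records are written in real time to the primary audit directory (LINUS_AI_AUDIT_DIR) and to every directory listed in LINUS_AI_AUDIT_EXPORT_DIRS. Use POST /compliance/audit/seal to seal completed months and POST /compliance/audit/export for point-in-time snapshots.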
RAG Document Access Control
LINUS-AI implements fine-grained access control for Retrieval-Augmented Generation documents (linus_ai/rag_access.py).
Classification Levels
| Level | Name | Access |
|---|---|---|
| 0 | PUBLIC | Anyone |
| 1 | INTERNAL | Authenticated company members |
| 2 | CONFIDENTIAL | Explicit ACL permit required |
| 3 | SECRET | Clearance ≥ 3 + explicit permit |
| 4 | TOP_SECRET | Clearance 4 + named on explicit list |
ACL Scopes
Access rules can allow or deny at five scopes: company, division, department, role, user. Deny rules always override allow rules.
7-Step Access Decision
- Owner override (always permit)
- Explicit user DENY
- PUBLIC bypass (always permit)
- Clearance gate
- TOP_SECRET explicit list
- ACL permit (company → division → dept → role → user)
- Default DENY
All decisions are written to a tamper-evident HMAC-chained audit log.
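
Read as a first-match-wins cascade, the seven steps can be sketched as follows. This is an interpretation, not the code in linus_ai/rag_access.py: the `Doc` and `Principal` types are hypothetical, the clearance gate is assumed to mean clearance ≥ classification level, ACL entries are modelled as (scope, value) pairs, and only user-level deny rules are modelled.

```python
from dataclasses import dataclass, field

PUBLIC, TOP_SECRET = 0, 4


@dataclass
class Doc:
    owner: str
    classification: int
    deny_users: set = field(default_factory=set)
    allow: set = field(default_factory=set)            # e.g. {("role", "auditor")}
    top_secret_list: set = field(default_factory=set)  # user ids named explicitly


@dataclass
class Principal:
    user_id: str
    clearance: int
    scopes: set = field(default_factory=set)  # (scope, value) pairs held by the user


def decide(p: Principal, d: Doc) -> tuple[bool, str]:
    """Return (permit, reason); the first matching step wins."""
    if p.user_id == d.owner:
        return True, "owner override"
    if p.user_id in d.deny_users:
        return False, "explicit user deny"
    if d.classification == PUBLIC:
        return True, "public bypass"
    if p.clearance < d.classification:  # assumed meaning of the clearance gate
        return False, "clearance gate"
    if d.classification == TOP_SECRET and p.user_id not in d.top_secret_list:
        return False, "not on TOP_SECRET list"
    if p.scopes & d.allow:
        return True, "ACL permit"
    return False, "default deny"
```

Returning the reason alongside the decision makes it straightforward to log each outcome to the audit trail.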
License
LINUS-AI Source License v2.0 — free for personal / academic / small business (< $100K revenue). Buy once, own forever — updates are optional, not forced.
| Tier | Annual (updates incl.) | Perpetual |
|---|---|---|
| Community | Free forever | Free |
| Professional (1 seat) | $99/yr | $499 |
| Team (5 seats) | $199/yr | $1,499 |
| Enterprise | $7,999/yr | — |
| Enterprise Plus | $14,999/yr | — |
See Pricing page for full tier details, or LICENSE.md for license terms.