README — LINUS-AI

Getting Started

Quickstart — 60 Seconds

Download the pre-built binary for your platform, activate your license, pull a model, and run. No Python, no pip, no Cargo required.

60-Second Quickstart — Linux x86_64

# 1. Download the binary (or use the GUI installer on macOS/Windows)

$ curl -Lo linus-ai https://github.com/miryala3/linus-ai-public/releases/latest/download/linus-ai-v4.0.0-headless-linux-x86_64

$ chmod +x linus-ai && sudo mv linus-ai /usr/local/bin/

# 2. Activate your license key (sent to your email after purchase)

$ linus-ai --activate LNAI-XXXX-XXXX-XXXX-XXXX

✓ License activated · Professional · 1/1 seats

# 3. Pull a model and start the server

$ linus-ai --pull-model llama3.2 && linus-ai --serve

Pulling llama3.2 (4.1 GB) ████████████████ 100%

✓ API server running → http://localhost:9480

# 4. Open the control panel

$ xdg-open http://localhost:9480/app # on macOS: open http://localhost:9480/app

The binary auto-detects your OS, RAM, GPU, and inference backend. No Python. No pip. No configuration file required. macOS and Windows installers are available on the Download page.

Download

Platform Binaries

Pre-built binaries are published to the public releases page for every tagged version.

Platform	Asset name	Format
macOS arm64 (Apple Silicon)	`linus-ai-v4.0.0-headless-macos-arm64`	Binary
macOS x86_64 (Intel)	`linus-ai-v4.0.0-headless-macos-x86_64`	Binary
macOS Universal	`LINUS-AI-v4.0.0.dmg`	Signed DMG (GUI)
Linux x86_64 (static)	`linus-ai-v4.0.0-headless-linux-x86_64`	Binary / .deb
Linux arm64 (static)	`linus-ai-v4.0.0-headless-linux-arm64`	Binary
Windows 10+ x86_64	`LINUS-AI-Setup-v4.0.0.exe`	NSIS installer / MSIX

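All releases include SHA256SUMS.txt. To verify a download, run sha256sum -c --ignore-missing SHA256SUMS.txt in the directory that contains both the asset and the checksum file (--ignore-missing skips the assets you did not download).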
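Where sha256sum is not available, the same check can be sketched in Python. This is a minimal illustration that assumes SHA256SUMS.txt uses the usual `<hex digest>  <file name>` layout written by sha256sum; the helper names `sha256_of` and `verify` are made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(asset_path, sums_path):
    """Check one downloaded asset against its line in the checksum file.

    Assumes each line looks like '<hex>  <name>' (sha256sum format).
    Returns False when the asset is not listed or the digest differs.
    """
    name = Path(asset_path).name
    for line in Path(sums_path).read_text().splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == name:
            return parts[0].lower() == sha256_of(asset_path)
    return False
```

Calling `verify("linus-ai-v4.0.0-headless-linux-x86_64", "SHA256SUMS.txt")` returns True only when the listed digest matches the file on disk.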

Configuration

Modes

Control what runs via CLI flags or the LINUS_AI_MODE and LINUS_AI_SCOPE environment variables.

Mode & Scope flags

$ ./linus_ai # full mode (default) — everything on

$ ./linus_ai --mode standalone # inference only, no mesh

$ ./linus_ai --mode mesh # LAN peer discovery, no scheduler

$ ./linus_ai --scope private # no web search, no outbound (default)

$ ./linus_ai --scope lan # LAN mesh + local models only

$ ./linus_ai --scope open # allow web search in agent mode

$ RUST_LOG=info ./linus_ai # verbose startup log

$ ./linus_ai 2>/dev/null # completely silent terminal

Mode	What runs
standalone	Inference engine + HTTP API
mesh	+ LAN peer discovery (mDNS multicast)
hive	+ Federated learning across peers
task	+ Distributed job scheduler
full	Everything. Recommended.

Configuration

Key Environment Variables

Variable	Default	Description
LINUS_AI_MODE	`full`	Operating mode
LINUS_AI_API_PORT	`9480`	HTTP API + GUI port
LINUS_AI_MODEL_DIR	`~/models,~/.linus_ai/models`	Model directories (comma-separated)
LINUS_AI_MESH_SECRET	`linus-ai-default-mesh-secret`	Shared HMAC secret for mesh auth — change this
LINUS_AI_SCOPE	`private`	`private` / `lan` / `open`
LINUS_AI_NODE_NAME	system hostname	Name shown in peer cards
LINUS_AI_LOG_LEVEL	`warn`	`trace` / `debug` / `info` / `warn` / `error`
LINUS_AI_TAILSCALE	`false`	Enable Tailscale peer routing
LINUS_AI_OVERLAY_SERVER	—	WAN relay address, e.g. `relay.example.com:9777`
LINUS_AI_DATA_DIR	`~/.linus_ai/data`	Vault, ledger, session storage
LINUS_AI_AUDIT_DIR	`~/.linus-ai/audit`	Primary directory for compliance + RAG audit logs
LINUS_AI_AUDIT_EXPORT_DIRS	—	Colon-separated secondary audit export dirs (enterprise SIEM routing)
RUST_LOG	—	Overrides `--log-level` when set
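To make the table concrete, here is a rough sketch of how a wrapper script might resolve a few of these variables. It is not the binary's actual parser. The variable names, defaults, and separators come from the table above; the function name `resolve_settings` is hypothetical, and letting RUST_LOG take precedence over LINUS_AI_LOG_LEVEL is an assumption (the table only says it overrides `--log-level`).

```python
import os

# Defaults copied from the table above (subset only).
DEFAULTS = {
    "LINUS_AI_MODE": "full",
    "LINUS_AI_API_PORT": "9480",
    "LINUS_AI_MODEL_DIR": "~/models,~/.linus_ai/models",
    "LINUS_AI_SCOPE": "private",
    "LINUS_AI_LOG_LEVEL": "warn",
    "LINUS_AI_AUDIT_EXPORT_DIRS": "",
}


def resolve_settings(env=None):
    """Resolve a subset of settings from an environment mapping."""
    env = os.environ if env is None else env

    def get(key):
        return env.get(key, DEFAULTS[key])

    return {
        "mode": get("LINUS_AI_MODE"),
        "api_port": int(get("LINUS_AI_API_PORT")),
        # Model directories are comma-separated.
        "model_dirs": [
            os.path.expanduser(p)
            for p in get("LINUS_AI_MODEL_DIR").split(",") if p
        ],
        "scope": get("LINUS_AI_SCOPE"),
        # Assumption: RUST_LOG wins over LINUS_AI_LOG_LEVEL when set.
        "log_level": env.get("RUST_LOG") or get("LINUS_AI_LOG_LEVEL"),
        # Secondary audit export directories are colon-separated.
        "audit_export_dirs": [
            p for p in get("LINUS_AI_AUDIT_EXPORT_DIRS").split(":") if p
        ],
    }
```

Note the two different separators: commas for LINUS_AI_MODEL_DIR and colons for LINUS_AI_AUDIT_EXPORT_DIRS.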

Architecture

Inference Backend

The active backend is selected at compile time via build.sh engine flags — not at runtime.

Engine flag	Backend	GPU	Performance	Cross-compile
--llama-cpp2 (default)	Bundled llama.cpp (static)	Metal / CUDA auto	Fastest	macOS→macOS only
--candle-only	Pure-Rust candle	Metal / CUDA auto	~20–30% slower	Any host → any target
(no features)	Subprocess fallback	Depends on external tool	Slowest	Any

Subprocess Fallback Priority

Used when the binary is built without a bundled engine:

Priority	Backend	How to activate
1	`llama-server` HTTP on `:8181`	Start `llama-server --model model.gguf --port 8181`
2	`ollama` HTTP on `:11434`	Install Ollama and run `ollama serve`
3	Auto-spawn `llama-server`	`brew install llama.cpp` — spawned for models ≤ 20 GB
4	`llama-cli` subprocess	`brew install llama.cpp` — GPU first, CPU fallback

Model RAM guard: Models that exceed 85% of physical RAM are automatically marked unavailable and shown greyed-out in the GUI. Use the ⚠ Force button to override.
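The RAM guard rule reduces to a single comparison. The sketch below assumes the model's file size is used as its memory estimate, which the text above does not confirm; `fits_in_ram` is named after the `fits_in_ram` flag returned by `/models`, but the function itself is illustrative.

```python
RAM_GUARD_FRACTION = 0.85  # threshold stated in the README


def fits_in_ram(model_bytes, physical_ram_bytes, force=False):
    """Return True if the model may be loaded.

    A model is rejected when it exceeds 85% of physical RAM,
    unless the caller forces the load (the GUI's Force button).
    Assumption: model_bytes is the GGUF file size.
    """
    if force:
        return True
    return model_bytes <= RAM_GUARD_FRACTION * physical_ram_bytes
```

On a 16 GiB machine the cut-off is 13.6 GiB, so a 13 GiB model passes and a 14 GiB model is greyed out unless forced.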

GPU → CPU fallback: On GPU error (e.g. Metal API mismatch) LINUS-AI retries with CPU-only automatically.

Distributed Inference

Pipeline Parallelism — Run 70B+ Models Across Multiple Machines

LINUS-AI ships pipeline-parallel inference: a model too large for one machine's RAM is automatically split across N mesh nodes, each holding a contiguous range of transformer layers.

Request
  │
  ▼ Head node (layers 0–15)
  │  embed tokens → run layers → send activations
  ▼ Mid node  (layers 16–47)
  │  receive activations → run layers → forward
  ▼ Tail node (layers 48–79)
     receive activations → compute logits → sample → return token

How it works

1. The head node reads the GGUF tensor index (no weights loaded yet).
2. LayerPlanner assigns layer ranges to each node proportional to its RAM.
3. Each node loads only its assigned layers from the GGUF file.
4. Activations flow node-to-node over HTTP /pipeline/forward using a compact binary wire format (LNSP magic, f16/f32, little-endian).
5. KV cache is maintained per-node per-request.
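The RAM-proportional split can be sketched as follows. This is not the actual LayerPlanner code: the function name `plan_layers` and the largest-remainder rounding are assumptions, chosen so that layer counts always sum to the model's total.

```python
def plan_layers(n_layers, node_ram_gb):
    """Assign contiguous, inclusive layer ranges proportional to node RAM.

    Rounding uses the largest-remainder method (an assumption), so every
    layer is assigned exactly once. Nodes are kept in the order given:
    head first, tail last. A node with very little RAM can end up with
    zero layers, which a real planner would need to handle.
    """
    total_ram = sum(node_ram_gb)
    exact = [n_layers * ram / total_ram for ram in node_ram_gb]
    counts = [int(x) for x in exact]

    # Hand out layers lost to truncation, largest fractional part first.
    leftover = n_layers - sum(counts)
    by_remainder = sorted(
        range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1

    ranges, start = [], 0
    for count in counts:
        ranges.append((start, start + count - 1))
        start += count
    return ranges
```

With an 80-layer model and three nodes holding 16, 32, and 32 GB, `plan_layers(80, [16, 32, 32])` yields `[(0, 15), (16, 47), (48, 79)]`, which matches the head, mid, and tail ranges in the diagram above.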

Pipeline API

# Via API

$ curl -X POST http://localhost:9480/pipeline/plan \
    -H 'Content-Type: application/json' \
    -d '{"model_path": "/path/to/70B.gguf", "use_mesh_peers": true}'

$ curl -X POST http://localhost:9480/pipeline/infer \
    -H 'Content-Type: application/json' \
    -d '{"prompt": "Hello", "max_tokens": 256}'

Billing

Internet Node Payment (Blockchain Billing)

LAN peers are always free. Internet peers must pay per inference unit.

Rule	Detail
LAN / loopback	Free — no account needed
New internet node	5 free welcome units on first contact
Cost	1 unit per inference call
Token rate	1 unit = 1,000 output tokens
Denied	`402 Payment Required` + JSON reason
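The admission rules in the table can be read as a small state machine. The sketch below models only the per-call cost and the welcome grant; it ignores the token rate row, whose interaction with the per-call cost is not spelled out here. The function names and the in-memory `balances` dict are hypothetical and stand in for the real ledger.

```python
WELCOME_UNITS = 5   # free units on first contact (from the table)
COST_PER_CALL = 1   # units charged per inference call (from the table)


def admit(balances, node_id, is_lan):
    """Return the HTTP status an inference request would receive.

    LAN and loopback peers are always admitted and never tracked.
    Internet nodes get WELCOME_UNITS on first contact and pay
    COST_PER_CALL per request; an empty balance yields 402.
    """
    if is_lan:
        return 200
    balance = balances.setdefault(node_id, WELCOME_UNITS)
    if balance < COST_PER_CALL:
        return 402  # Payment Required
    balances[node_id] = balance - COST_PER_CALL
    return 200


def top_up(balances, node_id, units):
    """Add inference credits to a node, as POST /billing/topup would."""
    balances[node_id] = balances.get(node_id, 0) + units
```

Under these assumptions a new internet node is served five times, then receives 402 until it is topped up.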

Billing API

# Check balances

$ curl http://localhost:9480/billing

# Top up a node

$ curl -X POST http://localhost:9480/billing/topup \
    -H 'Content-Type: application/json' \
    -d '{"node_id": "abc123", "address": "203.0.113.5", "units": 50}'

Interface

GUI Tabs

Open http://localhost:9480/app after starting. The control panel includes these tabs:

Tab	What it does
Chat	Streaming tokens, markdown, document upload (.pdf/.docx/.txt), export as .md
Agent	ReAct loop with tools — thought/action/observation steps inline. Web search when scope=open.
Models	List GGUF files, load/unload, assign roles. RAM-unfit models shown disabled.
Setup	Node role, model role assignments, compliance profile selection, RAG document access control
Mesh	Peer topology map, hub-spoke visualisation, manual peer connect
Chorus	Federated learning controls, gradient aggregation dashboard
Tasks	Task submit/status/cancel
Thermal	5-stage thermal state (NOMINAL → THROTTLE → HOT → CRITICAL → EMERGENCY)
Ledger	Blockchain transparency audit trail (SHA-256 hash chain + Merkle proofs)
Hardware	AI capability score, RAM/GPU/NPU detection
Compliance	Industry compliance profiles (14), PII scanning, injection detection, consent management, immutable audit log
Permissions	Node access controls, mesh auth, billing, model sharing
Launch	In-app shell terminal with command history and quick-launch buttons

REST API

API Reference

All endpoints on http://localhost:9480 (configurable via LINUS_AI_API_PORT).

Inference

Method	Path	Body	Description
POST	`/infer`	`{prompt, system?, max_tokens?, temperature?}`	Single inference
POST	`/infer/stream`	same	SSE token stream
POST	`/agent/stream`	`{message, profile?, max_tokens?, session_id?}`	Agent ReAct loop, SSE
POST	`/pipeline/plan`	`{model_path, use_mesh_peers?}`	Configure pipeline plan
POST	`/pipeline/infer`	`{prompt, max_tokens?, temperature?}`	Pipeline inference
POST	`/pipeline/forward`	binary frame	Node-to-node activation forwarding
POST	`/pipeline/clear`	`{request_id}`	Clear KV cache
GET/POST	`/tensor/plan`	`TensorParallelPlan` JSON	Get / set tensor parallel plan
GET	`/tensor/status`	—	TP plan + RPC worker health
POST	`/tensor/infer`	`{prompt, max_tokens?, temperature?}`	Tensor-parallel inference (coordinator)
POST	`/tensor/allreduce`	binary TNSR frame	Submit partial activation; returns reduced tensor
POST	`/tensor/rpc/start`	`{rpc_port?}`	Spawn llama-rpc-server on this node
POST	`/tensor/rpc/stop`	—	Stop llama-rpc-server
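As an illustration of the `/infer` body shape, the following sketch builds the request with Python's standard library. The field names come from the table above; the helper name `build_infer_request` is made up, and the response format is not documented here, so the sketch stops before parsing it.

```python
import json
import urllib.request


def build_infer_request(prompt, system=None, max_tokens=None,
                        temperature=None,
                        base_url="http://localhost:9480"):
    """Build (but do not send) a POST /infer request.

    Only `prompt` is required; optional fields are omitted from the
    JSON body when left as None.
    """
    body = {"prompt": prompt}
    optional = (
        ("system", system),
        ("max_tokens", max_tokens),
        ("temperature", temperature),
    )
    for key, value in optional:
        if value is not None:
            body[key] = value
    return urllib.request.Request(
        base_url + "/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it against a running server, pass the result to `urllib.request.urlopen(req)` and read the response body.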

Models

Method	Path	Description
GET	`/models`	List models with `fits_in_ram` flag per model
POST	`/models/select`	Set active model — blocked if too large for RAM (pass `force:true` to override)
POST	`/models/load`	Load model — blocked if too large for RAM (pass `force:true` to override)
POST	`/models/unload`	Unload current model
GET/POST	`/models/roles`	Get/set role assignments
POST	`/models/pull`	Pull model from URL
PUT	`/models/push`	Receive model binary from a peer
GET	`/models/recommend`	Model recommendations for this node

Mesh & Peers

Method	Path	Description
GET	`/peers`	Active peer list with roles, RAM, thermal
POST	`/mesh/push-model`	Push a model to a specific peer
POST	`/mesh/assign-roles`	Reassign hub/spoke roles across the mesh
GET	`/benchmark`	Routing table and latency scores

Billing

Method	Path	Description
GET	`/billing`	All node accounts with balances
POST	`/billing/topup`	Add inference credits: `{node_id, address, units}`

Compliance & Security

Method	Path	Body / Params	Description
GET	`/compliance/preflight`	`?text=...&profile=...`	Check text against active compliance profile
GET	`/compliance/profiles`	—	List all 14 compliance profiles
POST	`/compliance/consent`	`{user_id, action, profile}`	Record user consent
GET	`/compliance/consent`	`?user_id=...`	Check consent status
GET	`/compliance/audit`	`?limit=N&profile=...`	Query immutable audit log
GET	`/compliance/audit/verify`	—	Verify HMAC chain integrity
POST	`/compliance/audit/seal`	—	Seal all completed monthly log files
POST	`/compliance/audit/export`	`{dest_dir}`	Export audit snapshot to directory

RAG Document Access Control

Method	Path	Body / Params	Description
GET	`/rag/documents`	—	List registered documents with classifications
POST	`/rag/documents/register`	`{title, path, owner_user_id, classification, ...}`	Register document in registry
PUT	`/rag/documents/{id}/acl`	`{allow_users, deny_users, allow_companies, ...}`	Update document ACL
PUT	`/rag/documents/{id}/classification`	`{classification}`	Update document classification level
DELETE	`/rag/documents/{id}`	—	Remove document from registry
POST	`/rag/access-check`	`{user_id, doc_id}`	Check access and log decision
GET	`/rag/principals`	—	List registered principals
POST	`/rag/principals`	`{user_id, name, clearance, company, ...}`	Create or update principal
DELETE	`/rag/principals/{user_id}`	—	Remove principal
GET	`/rag/audit`	`?doc_id=...&user_id=...&denied_only=true&limit=N`	Query RAG access audit log

Shell & System

Method	Path	Description
POST	`/shell/exec`	Run a shell command: `{command, timeout_s?}`
GET	`/status`	Node status, uptime, active model, backend
GET	`/stats`	Full stats: thermal, blockchain, mesh
GET	`/health`	`{"ok": true}` liveness probe
GET	`/blockchain`	Ledger stats, chain validity
GET	`/thermal`	Thermal state and throttle level
GET/POST	`/settings`	Runtime settings
GET	`/endpoints`	Full documented endpoint index

Interface

Launch Shell (In-App Terminal)

The Launch tab embeds a terminal in the browser — no external terminal needed.

Type any shell command and press Run ▶ or Enter
Arrow keys for command history
Quick-launch buttons: List models, Disk usage, RAM, GPU, llama procs, API status
Safety filter blocks destructive patterns (rm -rf /, mkfs.*, fork bomb)
Timeout capped at 120 s per command
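A pattern-based safety filter of the kind described above might look like the sketch below. The actual patterns LINUS-AI uses are not documented here, so these three regular expressions are illustrative guesses covering only the listed examples, and the function names are hypothetical.

```python
import re

MAX_TIMEOUT_S = 120  # per-command cap stated above

# Illustrative patterns only; the real filter may differ.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),        # rm -rf / (root only)
    re.compile(r"\bmkfs(\.\w+)?\b"),            # mkfs, mkfs.ext4, ...
    re.compile(r":\(\)\s*\{\s*:\|:&\s*\};:"),   # classic bash fork bomb
]


def is_blocked(command):
    """Return True if the command matches a destructive pattern."""
    return any(p.search(command) for p in BLOCKED_PATTERNS)


def clamp_timeout(timeout_s):
    """Cap a requested timeout at MAX_TIMEOUT_S seconds."""
    return min(timeout_s, MAX_TIMEOUT_S)
```

A deny-list like this is easy to bypass (for example with `rm -fr /`), so it should be treated as a guard against accidents, not as a security boundary.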

CLI

CLI Reference

linus_ai CLI

$ linus_ai # start server (full mode)

$ linus_ai --mode standalone # inference only

$ linus_ai --scope open # enable web search in agent mode

$ linus_ai info # hardware info and exit

$ linus_ai models # list AI models found on disk

$ linus_ai compile <script.py> # compile Python script to native binary

$ linus_ai run <script.py> # run Python script interpreted

$ linus_ai emit <script.py> # print generated C source

$ linus_ai targets # list cross-compilation targets

Governance

Compliance & Security Layer

LINUS-AI includes a built-in compliance engine (linus_ai/compliance.py) that enforces domain-specific governance before every inference request.

14 Compliance Profiles

Tier	Profiles
OPEN (permissive)	`general`, `creative`, `reasoning`, `code`, `engineering`
AUDIT (logged)	`education`, `support`, `sales`, `data_science`
REGULATED (strict)	`medical`, `legal`, `finance`, `hr`
RESTRICTED (hard-block)	`security`

What it checks

PII scanning — 12 types (email, phone, SSN, credit card, CVV, passport, IP, MAC, NHS number, IBAN, PAN-like, medical record). Credit card / CVV / SSN / PAN-like inputs are blocked outright; others are redacted.
Injection detection — 8 rule families (role override, prompt leak, jailbreak, DAN, system override, token smuggling, goal hijacking, multi-turn extraction). RESTRICTED profile hard-blocks; REGULATED profiles warn.
Consent gating — REGULATED/RESTRICTED profiles require explicit user consent before proceeding.
Immutable audit log — every request and decision recorded in HMAC-chained monthly .jsonl files; completed months sealed with chmod 0o400 + macOS UF_IMMUTABLE + Linux chattr +i.
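The block-versus-redact split for PII can be sketched with two of the twelve types. This is a simplified illustration, not the contents of linus_ai/compliance.py: the regular expressions, the `[EMAIL]` placeholder, and the function name `preflight` are assumptions.

```python
import re

# Simplified patterns for two of the twelve PII types.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def preflight(text):
    """Return ("block", None) or ("allow", redacted_text).

    SSNs belong to the block-outright group, so a match rejects the
    whole input. Emails belong to the redact group, so they are
    replaced with a placeholder and the request proceeds.
    """
    if SSN.search(text):
        return ("block", None)
    return ("allow", EMAIL.sub("[EMAIL]", text))
```

Real detectors for credit cards or IBANs would also need checksum validation, which pattern matching alone does not provide.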

Enterprise Audit Routing

SIEM routing

$ export LINUS_AI_AUDIT_DIR=/mnt/compliance/audit

$ export LINUS_AI_AUDIT_EXPORT_DIRS=/siem/linus:/backup/audit

Records are written to all locations in real time. Use POST /compliance/audit/seal to seal completed months and POST /compliance/audit/export for point-in-time snapshots.
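The HMAC chaining that makes the audit log tamper-evident can be illustrated in a few lines. This sketch assumes each record's MAC covers the previous record's MAC plus the canonical JSON of the payload; the on-disk record layout and key handling in LINUS-AI are not documented here, so the field names `payload`, `prev`, and `mac` are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous MAC" for the first record


def _mac(key, prev, payload):
    """HMAC-SHA256 over the previous MAC plus the canonical payload."""
    body = json.dumps(payload, sort_keys=True)
    return hmac.new(key, (prev + body).encode("utf-8"),
                    hashlib.sha256).hexdigest()


def append_record(chain, payload, key):
    """Append a record whose MAC depends on the one before it."""
    prev = chain[-1]["mac"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev,
                  "mac": _mac(key, prev, payload)})


def verify_chain(chain, key):
    """Recompute every MAC; any edit, insert, or delete breaks the chain."""
    prev = GENESIS
    for record in chain:
        expected = _mac(key, prev, record["payload"])
        if record["prev"] != prev or not hmac.compare_digest(
                record["mac"], expected):
            return False
        prev = record["mac"]
    return True
```

Because each MAC feeds into the next, changing one historical record invalidates every record after it, which is the property `GET /compliance/audit/verify` is described as checking.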

Security

RAG Document Access Control

LINUS-AI implements fine-grained access control for Retrieval-Augmented Generation documents (linus_ai/rag_access.py).

Classification Levels

Level	Name	Access
0	PUBLIC	Anyone
1	INTERNAL	Authenticated company members
2	CONFIDENTIAL	Explicit ACL permit required
3	SECRET	Clearance ≥ 3 + explicit permit
4	TOP_SECRET	Clearance 4 + named on explicit list

ACL Scopes

Access rules can allow or deny at five scopes: company, division, department, role, user. Deny rules always override allow rules.

7-Step Access Decision

1. Owner override (always permit)
2. Explicit user DENY
3. PUBLIC bypass (always permit)
4. Clearance gate
5. TOP_SECRET explicit list
6. ACL permit (company → division → dept → role → user)
7. Default DENY

All decisions are written to a tamper-evident HMAC-chained audit log.
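The seven steps can be sketched as an ordered chain of checks. This is an illustration, not the contents of linus_ai/rag_access.py. Field names such as `owner_user_id`, `allow_users`, `deny_users`, `allow_companies`, `clearance`, and `company` are taken from the API bodies listed earlier; `top_secret_users` is a hypothetical name. Two simplifications are assumed: the clearance gate applies only to levels 3 and 4 (as the classification table suggests), and the ACL step checks only the company and user scopes.

```python
from dataclasses import dataclass, field


@dataclass
class Principal:
    user_id: str
    clearance: int = 0
    company: str = ""


@dataclass
class Document:
    owner_user_id: str
    classification: int  # 0 = PUBLIC ... 4 = TOP_SECRET
    allow_users: set = field(default_factory=set)
    deny_users: set = field(default_factory=set)
    allow_companies: set = field(default_factory=set)
    top_secret_users: set = field(default_factory=set)  # hypothetical


def decide(p, d):
    """Return (permitted, reason), evaluating the steps in order."""
    if p.user_id == d.owner_user_id:                 # 1. owner override
        return True, "owner"
    if p.user_id in d.deny_users:                    # 2. explicit DENY
        return False, "explicit deny"
    if d.classification == 0:                        # 3. PUBLIC bypass
        return True, "public"
    if d.classification >= 3 and p.clearance < d.classification:
        return False, "clearance"                    # 4. clearance gate
    if d.classification == 4 and p.user_id not in d.top_secret_users:
        return False, "not on top-secret list"       # 5. explicit list
    if p.user_id in d.allow_users or p.company in d.allow_companies:
        return True, "acl"                           # 6. ACL permit
    return False, "default deny"                     # 7. default DENY
```

The ordering matters: because the explicit deny check runs before the ACL permit, a user listed in both `deny_users` and `allow_users` is denied.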

Licensing

License

LINUS-AI Source License v2.0 — free for personal / academic / small business (< $100K revenue). Buy once, own forever — updates are optional, not forced.

Tier	Annual (updates incl.)	Perpetual
Community	Free forever	Free
Professional (1 seat)	$99/yr	$499
Team (5 seats)	$199/yr	$1,499
Enterprise	$7,999/yr	—
Enterprise Plus	$14,999/yr	—

See the Pricing page for full tier details, or LICENSE.md for the license terms.
