LINUS-AI Documentation

Everything you need to deploy, configure, and extend LINUS-AI. From single-node chat to multi-GPU distributed inference.

User Guide

Install, configure, and chat. Start here if you're new to LINUS-AI. Covers setup, model loading, chat modes, and the CLI reference.

Open User Guide →

Admin Guide

Deploy, manage, and secure LINUS-AI in production. Covers multi-user deployments, access control, monitoring, and updates.

Open Admin Guide →

Developer Guide

APIs, integrations, and extensions. Build on top of LINUS-AI with the REST API, WebSocket streaming, and plugin system.

Open Developer Guide →

Architect Guide

Distributed topologies, tensor parallelism, pipeline parallelism, and mesh networking. For engineers running large-scale private AI.

Open Architect Guide →

README

Quickstart, platform binaries, modes, API reference, CLI reference, compliance overview, and licensing — the complete README as a searchable web page.

Open README →

Technical Specification

Architecture overview, module reference, security model, mesh protocol, inference pipeline, tensor parallelism, compliance layer, and full changelog.

Open Spec →

API Endpoint Reference

Complete REST API reference: all endpoints, request/response formats, and examples for inference, models, compliance, RAG, mesh, billing, and more.

Open API Ref →

System Diagrams

Interactive flow diagrams for every subsystem: inference pipeline, payment flow, mesh networking, shell handler, blockchain ledger, build system, and test infrastructure.

Open Diagrams →

Popular Topics

Jump directly to the most-referenced documentation pages.

Getting Started

From zero to running your first private AI conversation in under 5 minutes.

Quick Start — 3 steps
# 1. Download (Linux x86_64 example)
$ curl -Lo linus-ai https://github.com/miryala3/linus-ai/releases/latest/download/linus-ai-linux-x86_64
$ chmod +x linus-ai && sudo mv linus-ai /usr/local/bin/
 
# 2. Activate your license
$ linus-ai --activate LNAI-XXXX-XXXX-XXXX-XXXX
✓ License activated (Professional · 1/1 seats)
 
# 3. Pull a model and start chatting
$ linus-ai --pull-model llama3.2
$ linus-ai --chat "Explain quantum entanglement simply"
LINUS-AI> Quantum entanglement is when two particles become…

config.toml Reference

The primary configuration file lives at ~/.linus_ai/config.toml. All settings can also be passed as CLI flags or environment variables.

~/.linus_ai/config.toml
# ~/.linus_ai/config.toml — full example
 
[license]
key = "LNAI-XXXX-XXXX-XXXX-XXXX"
email = "you@example.com"
 
[inference]
backend = "auto" # auto | cpu | cuda | metal | rocm
model = "llama3.2" # default model
context_len = 8192 # max context window
threads = 8 # CPU threads (-1 = auto)
gpu_layers = -1 # GPU layers (-1 = all)
quantize = "Q4_K_M" # default quant level
 
[server]
host = "127.0.0.1"
port = 8080
cors = ["*"]
 
[vault]
enabled = true
encrypt = true
key_path = "~/.linus_ai/vault.key"
 
[mesh]
enabled = false
role = "coordinator" # coordinator | worker
listen_port = 9090
peers = []
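
Any key in the file can be overridden at launch. The flag and environment-variable names below are hypothetical illustrations of the usual key-mirroring convention, not confirmed names:

```shell
# Three equivalent ways to change the server port (names are illustrative):
$ linus-ai --serve --port 9000                 # CLI flag
$ LINUS_AI_SERVER_PORT=9000 linus-ai --serve   # environment variable
# ...or edit port under [server] in ~/.linus_ai/config.toml
```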

API Reference

LINUS-AI exposes an OpenAI-compatible REST API at http://localhost:8080 when running in server mode, making it a drop-in replacement for applications built against the OpenAI API.

POST    /v1/chat/completions   OpenAI-compatible chat completions
POST    /v1/completions        Legacy text completions
POST    /v1/embeddings         Text embeddings (768 or 1536 dims)
GET     /v1/models             List available models
POST    /v1/models/pull        Download a model from registry
DELETE  /v1/models/{name}      Delete a local model
GET     /health                Health check + version info
GET     /metrics               Prometheus-compatible metrics
POST    /v1/vault/store        Store encrypted memory entry
GET     /v1/vault/query        Semantic search over vault
curl example
$ curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "llama3.2",
          "stream": true,
          "messages": [{"role":"user","content":"Hello!"}]
        }'
data: {"id":"...","choices":[{"delta":{"content":"Hi!"}}]}
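
Because the API is OpenAI-compatible, any OpenAI client can target it by changing the base URL. The sketch below uses only the Python standard library; the endpoint and payload shape come from the reference above, while the response field access assumes the standard OpenAI chat-completions response schema:

```python
import json
import urllib.request

# Payload in the OpenAI chat-completions format, as accepted by
# POST /v1/chat/completions above.
payload = {
    "model": "llama3.2",
    "stream": False,  # set True for server-sent-event streaming
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

def chat(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat(req) requires a running LINUS-AI server, so it is not called here.
```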

Distributed Inference

Scale LINUS-AI across GPUs and machines. Tensor and pipeline parallelism allow running models larger than any single device's memory.
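
As a back-of-the-envelope check on why parallelism is needed, the sketch below estimates per-GPU weight memory for a large model split across devices (the figures are illustrative arithmetic, not measured numbers):

```python
def per_gpu_weight_bytes(params: float, bytes_per_param: float, tp_degree: int) -> float:
    """Model weight memory per GPU under tensor parallelism.

    Weights only: KV cache and activations add more on top of this.
    """
    return params * bytes_per_param / tp_degree

# 70B parameters at ~0.5 bytes/param (4-bit quantization), split across 4 GPUs:
per_gpu = per_gpu_weight_bytes(70e9, 0.5, 4)
print(f"{per_gpu / 1e9:.2f} GB per GPU")  # 8.75 GB per GPU
```

At FP16 (2 bytes/param) the same model needs ~35 GB per GPU across 4 devices, which is why quantization and parallelism are usually combined.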

Tensor Parallelism Setup

4× GPU tensor parallel
# config.toml
[inference]
backend = "cuda"
tensor_parallel = 4 # split across 4 GPUs
gpu_ids = [0,1,2,3]
model = "llama3-70b"
 
$ linus-ai --serve --tensor-parallel 4
✓ Tensor parallel initialized: 4 shards (70B model @ ~17.5B params/GPU)
✓ API server ready: http://0.0.0.0:8080

Mesh Networking Setup

Coordinator + 2 Workers
# Node 1 — Coordinator (192.168.1.10)
$ linus-ai --serve --mesh-role coordinator --mesh-port 9090
 
# Node 2 — Worker (192.168.1.11)
$ linus-ai --mesh-role worker --mesh-join 192.168.1.10:9090
 
# Node 3 — Worker (192.168.1.12)
$ linus-ai --mesh-role worker --mesh-join 192.168.1.10:9090
 
✓ Mesh cluster: 1 coordinator + 2 workers
✓ Pipeline parallel: 3 stages across 3 nodes
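
The same cluster can be pinned in each node's config.toml instead of CLI flags, using the [mesh] table shown earlier. The worker-side peers value is an assumption about how join targets are configured:

```toml
# --- config.toml on Node 1, the coordinator (192.168.1.10) ---
[mesh]
enabled = true
role = "coordinator"
listen_port = 9090

# --- config.toml on Nodes 2 and 3, the workers ---
# [mesh]
# enabled = true
# role = "worker"
# peers = ["192.168.1.10:9090"]
```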

Security & Privacy

LINUS-AI is built private-first. Here's what that means in practice.

🔒 Zero Telemetry

No usage data, no model outputs, no conversation logs are ever sent anywhere. All processing is local.

🛡 Encrypted Vault

All stored conversations and embeddings are AES-256-GCM encrypted. The key is derived from your hardware and never exported.
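
The exact derivation scheme is not documented here, but the general pattern (stretch a machine-local identifier into a 256-bit AES key) can be sketched with the standard library. The identifier source, salt, and iteration count below are purely illustrative assumptions:

```python
import hashlib

def derive_vault_key(machine_id: bytes, salt: bytes = b"illustrative-salt") -> bytes:
    """Stretch a hardware identifier into a 256-bit key usable with AES-256-GCM.

    Illustrative only: LINUS-AI's real scheme, salt, and iteration count
    are not public. The key never needs to leave the machine it was derived on.
    """
    return hashlib.pbkdf2_hmac("sha256", machine_id, salt, 200_000, dklen=32)

key = derive_vault_key(b"example-machine-id")
print(len(key))  # 32 bytes = 256 bits
```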

🔐 mTLS Mesh Transport

All inter-node mesh communication uses mutual TLS with auto-generated certificates. Traffic is encrypted end-to-end.

🏠 Air-Gap Ready

After license activation, LINUS-AI operates fully offline. No external dependencies, no model API calls, no internet required.

👤 Access Control

API key authentication, per-key rate limiting, and IP allowlisting in server mode. LDAP integration is available for Enterprise.

📋 Audit Logging

Optional structured audit logs (JSON) for all API requests. Each entry includes timestamps, model, and token counts, but never prompt content.
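
Structured logs are easy to post-process. The sketch below parses one line of a hypothetical audit entry; the field names are assumptions for illustration, not the documented schema:

```python
import json

# One hypothetical audit-log line; the real field names may differ.
line = ('{"ts": "2025-01-15T10:32:00Z", "endpoint": "/v1/chat/completions", '
        '"model": "llama3.2", "prompt_tokens": 12, "completion_tokens": 48}')

entry = json.loads(line)
total_tokens = entry["prompt_tokens"] + entry["completion_tokens"]
print(entry["model"], total_tokens)  # llama3.2 60
```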

Get Help & Contribute

LINUS-AI is developed in the open. Join the community, report bugs, suggest features, or contribute code.