LINUS-AI Documentation

Everything you need to deploy, configure, and extend LINUS-AI. From single-node chat to multi-GPU distributed inference.

User Guide

Install, configure, and chat. Start here if you're new to LINUS-AI. Covers setup, model loading, chat modes, and the CLI reference.

Open User Guide →

Admin Guide

Deploy, manage, and secure LINUS-AI in production. Covers multi-user deployments, access control, monitoring, and updates.

Open Admin Guide →

Developer Guide

APIs, integrations, and extensions. Build on top of LINUS-AI with the REST API, WebSocket streaming, and plugin system.

Open Developer Guide →

Architect Guide

Distributed topologies, tensor parallelism, pipeline parallelism, and mesh networking. For engineers running large-scale private AI.

Open Architect Guide →

README

Quickstart, platform binaries, modes, API reference, CLI reference, compliance overview, and licensing — the complete README as a searchable web page.

Open README →

Technical Specification

Architecture overview, module reference, security model, mesh protocol, inference pipeline, tensor parallelism, compliance layer, and full changelog.

Open Spec →

API Endpoint Reference

Complete REST API reference: all endpoints, request/response formats, and examples for inference, models, compliance, RAG, mesh, billing, and more.

Open API Ref →

System Diagrams

Interactive flow diagrams for every subsystem: inference pipeline, payment flow, mesh networking, shell handler, blockchain ledger, build system, and test infrastructure.

Open Diagrams →

Popular Topics

Jump directly to the most-referenced documentation pages.

Getting Started

From zero to running your first private AI conversation in under 5 minutes.

Quick Start — 3 steps
# 1. Download (Linux x86_64 example)
$ curl -Lo linus-ai https://github.com/miryala3/linus-ai/releases/latest/download/linus-ai-linux-x86_64
$ chmod +x linus-ai && sudo mv linus-ai /usr/local/bin/
 
# 2. Activate your license
$ linus-ai --activate LNAI-XXXX-XXXX-XXXX-XXXX
✓ License activated (Professional · 1/1 seats)
 
# 3. Pull a model and start chatting
$ linus-ai --pull-model llama3.2
$ linus-ai --chat "Explain quantum entanglement simply"
LINUS-AI> Quantum entanglement is when two particles become…

config.toml Reference

The primary configuration file lives at ~/.linus_ai/config.toml. All settings can also be passed as CLI flags or environment variables.

~/.linus_ai/config.toml
# ~/.linus_ai/config.toml — full example
 
[license]
key = "LNAI-XXXX-XXXX-XXXX-XXXX"
email = "you@example.com"
 
[inference]
backend = "auto" # auto | cpu | cuda | metal | rocm
model = "llama3.2" # default model
context_len = 8192 # max context window
threads = 8 # CPU threads (-1 = auto)
gpu_layers = -1 # GPU layers (-1 = all)
quantize = "Q4_K_M" # default quant level
 
[server]
host = "127.0.0.1"
port = 8080
cors = ["*"]
 
[vault]
enabled = true
encrypt = true
key_path = "~/.linus_ai/vault.key"
 
[mesh]
enabled = false
role = "coordinator" # coordinator | worker
listen_port = 9090
peers = []
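
Any key in the file can be overridden at launch. The flag and environment-variable names below are hypothetical illustrations of the usual key-mirroring convention, not confirmed names:

```shell
# Three equivalent ways to change the server port (names are illustrative):
$ linus-ai --serve --port 9000                 # CLI flag
$ LINUS_AI_SERVER_PORT=9000 linus-ai --serve   # environment variable
# ...or edit port under [server] in ~/.linus_ai/config.toml
```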

API Reference

LINUS-AI exposes an OpenAI-compatible REST API at http://localhost:8080 when running in server mode, making it a drop-in replacement for applications built against the OpenAI API.

POST    /v1/chat/completions   OpenAI-compatible chat completions
POST    /v1/completions        Legacy text completions
POST    /v1/embeddings         Text embeddings (768 or 1536 dims)
GET     /v1/models             List available models
POST    /v1/models/pull        Download a model from registry
DELETE  /v1/models/{name}      Delete a local model
GET     /health                Health check + version info
GET     /metrics               Prometheus-compatible metrics
POST    /v1/vault/store        Store encrypted memory entry
GET     /v1/vault/query        Semantic search over vault
curl example
$ curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "llama3.2",
          "stream": true,
          "messages": [{"role":"user","content":"Hello!"}]
        }'
data: {"id":"...","choices":[{"delta":{"content":"Hi!"}}]}
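
Because the API is OpenAI-compatible, any OpenAI client can target it by changing the base URL. The sketch below uses only the Python standard library; the endpoint and payload shape come from the reference above, while the response field access assumes the standard OpenAI chat-completions response schema:

```python
import json
import urllib.request

# Payload in the OpenAI chat-completions format, as accepted by
# POST /v1/chat/completions above.
payload = {
    "model": "llama3.2",
    "stream": False,  # set True for server-sent-event streaming
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

def chat(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat(req) requires a running LINUS-AI server, so it is not called here.
```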

Distributed Inference

Scale LINUS-AI across GPUs and machines. Tensor and pipeline parallelism allow running models larger than any single device's memory.
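
As a back-of-the-envelope check on why parallelism is needed, the sketch below estimates per-GPU weight memory for a large model split across devices (the figures are illustrative arithmetic, not measured numbers):

```python
def per_gpu_weight_bytes(params: float, bytes_per_param: float, tp_degree: int) -> float:
    """Model weight memory per GPU under tensor parallelism.

    Weights only: KV cache and activations add more on top of this.
    """
    return params * bytes_per_param / tp_degree

# 70B parameters at ~0.5 bytes/param (4-bit quantization), split across 4 GPUs:
per_gpu = per_gpu_weight_bytes(70e9, 0.5, 4)
print(f"{per_gpu / 1e9:.2f} GB per GPU")  # 8.75 GB per GPU
```

At FP16 (2 bytes/param) the same model needs ~35 GB per GPU across 4 devices, which is why quantization and parallelism are usually combined.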

Tensor Parallelism Setup

4× GPU tensor parallel
# config.toml
[inference]
backend = "cuda"
tensor_parallel = 4 # split across 4 GPUs
gpu_ids = [0,1,2,3]
model = "llama3-70b"
 
$ linus-ai --serve --tensor-parallel 4
✓ Tensor parallel initialized: 4 shards (70B model @ ~17.5B params/GPU)
✓ API server ready: http://0.0.0.0:8080

Mesh Networking Setup

Coordinator + 2 Workers
# Node 1 — Coordinator (192.168.1.10)
$ linus-ai --serve --mesh-role coordinator --mesh-port 9090
 
# Node 2 — Worker (192.168.1.11)
$ linus-ai --mesh-role worker --mesh-join 192.168.1.10:9090
 
# Node 3 — Worker (192.168.1.12)
$ linus-ai --mesh-role worker --mesh-join 192.168.1.10:9090
 
✓ Mesh cluster: 1 coordinator + 2 workers
✓ Pipeline parallel: 3 stages across 3 nodes
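
The same cluster can be pinned in each node's config.toml instead of CLI flags, using the [mesh] table shown earlier. The worker-side peers value is an assumption about how join targets are configured:

```toml
# --- config.toml on Node 1, the coordinator (192.168.1.10) ---
[mesh]
enabled = true
role = "coordinator"
listen_port = 9090

# --- config.toml on Nodes 2 and 3, the workers ---
# [mesh]
# enabled = true
# role = "worker"
# peers = ["192.168.1.10:9090"]
```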

Security & Privacy

LINUS-AI is built private-first. Here's what that means in practice.

🔒 Zero Telemetry

No usage data, no model outputs, no conversation logs are ever sent anywhere. All processing is local.

🛡 Encrypted Vault

All stored conversations and embeddings are AES-256-GCM encrypted. The key is derived from your hardware and never exported.
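
The exact derivation scheme is not documented here, but the general pattern (stretch a machine-local identifier into a 256-bit AES key) can be sketched with the standard library. The identifier source, salt, and iteration count below are purely illustrative assumptions:

```python
import hashlib

def derive_vault_key(machine_id: bytes, salt: bytes = b"illustrative-salt") -> bytes:
    """Stretch a hardware identifier into a 256-bit key usable with AES-256-GCM.

    Illustrative only: LINUS-AI's real scheme, salt, and iteration count
    are not public. The key never needs to leave the machine it was derived on.
    """
    return hashlib.pbkdf2_hmac("sha256", machine_id, salt, 200_000, dklen=32)

key = derive_vault_key(b"example-machine-id")
print(len(key))  # 32 bytes = 256 bits
```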

🔐 mTLS Mesh Transport

All inter-node mesh communication uses mutual TLS with auto-generated certificates. Traffic is encrypted end-to-end.

🏠 Air-Gap Ready

After license activation, LINUS-AI operates fully offline. No external dependencies, no model API calls, no internet required.

👤 Access Control

API key authentication, per-key rate limiting, and IP allowlisting in server mode. LDAP integration is available for Enterprise.

📋 Audit Logging

Optional structured audit logs (JSON) for all API requests. Each entry includes timestamps, model, and token counts, but never prompt content.
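
Structured logs are easy to post-process. The sketch below parses one line of a hypothetical audit entry; the field names are assumptions for illustration, not the documented schema:

```python
import json

# One hypothetical audit-log line; the real field names may differ.
line = ('{"ts": "2025-01-15T10:32:00Z", "endpoint": "/v1/chat/completions", '
        '"model": "llama3.2", "prompt_tokens": 12, "completion_tokens": 48}')

entry = json.loads(line)
total_tokens = entry["prompt_tokens"] + entry["completion_tokens"]
print(entry["model"], total_tokens)  # llama3.2 60
```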

Get Help & Contribute

LINUS-AI is developed in the open. Join the community, report bugs, suggest features, or contribute code.