User Guide
Install, configure, and chat. Start here if you're new to LINUS-AI. Covers setup, model loading, chat modes, and the CLI reference.
Admin Guide
Deploy, manage, and secure LINUS-AI in production. Covers multi-user deployments, access control, monitoring, and updates.
Developer Guide
APIs, integrations, and extensions. Build on top of LINUS-AI with the REST API, WebSocket streaming, and plugin system.
Architect Guide
Distributed topologies, tensor parallelism, pipeline parallelism, and mesh networking. For engineers running large-scale private AI.
README
Quickstart, platform binaries, modes, API reference, CLI reference, compliance overview, and licensing — the complete README as a searchable web page.
Technical Specification
Architecture overview, module reference, security model, mesh protocol, inference pipeline, tensor parallelism, compliance layer, and full changelog.
API Endpoint Reference
Complete REST API reference: all endpoints, request/response formats, and examples for inference, models, compliance, RAG, mesh, billing, and more.
System Diagrams
Interactive flow diagrams for every subsystem: inference pipeline, payment flow, mesh networking, shell handler, blockchain ledger, build system, and test infrastructure.
Popular Topics
Jump directly to the most-referenced documentation pages.
Getting Started
From zero to running your first private AI conversation in under 5 minutes.
Installation
Download and install the binary. Set up PATH, shell completions, and systemd service.
License Activation
Activate via CLI, environment variable, or config file. Manage seats and machine binding.
Model Management
Pull, list, delete, and configure models. Supported: Llama, Mistral, Phi, Qwen, Gemma, and more.
Inference Modes
Backends: CPU, CUDA, Metal, and ROCm, with automatic detection. Quantization levels from Q2_K through F16.
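The practical effect of the quantization level is the memory footprint of the weights. The sketch below uses approximate llama.cpp-style bits-per-weight figures (the exact values vary by model and build, so treat these numbers as illustrative assumptions, not LINUS-AI's documented sizes):

```python
# Rough memory estimate for model weights at different quantization levels.
# Bits-per-weight figures are approximate llama.cpp-style values — an
# illustrative assumption, not exact sizes for any specific model.
BITS_PER_WEIGHT = {
    "Q2_K": 2.6,
    "Q4_K_M": 4.8,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def weight_bytes(n_params: int, quant: str) -> int:
    """Approximate bytes needed to hold the weights of an n_params model."""
    return int(n_params * BITS_PER_WEIGHT[quant] / 8)

# Example: a 7B-parameter model at three quantization levels.
for quant in ("Q2_K", "Q4_K_M", "F16"):
    gib = weight_bytes(7_000_000_000, quant) / 2**30
    print(f"{quant:>7}: ~{gib:.1f} GiB")
```

At F16 a 7B model needs roughly 13 GiB for weights alone, which is why the lower K-quants exist for smaller GPUs.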
linus_ai.toml Reference
The primary configuration file lives at ~/.linus_ai/config.toml.
All settings can also be passed as CLI flags or environment variables.
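To give a feel for the file's shape, here is an illustrative sketch — the section and key names (`model`, `server`, `inference`, etc.) are assumptions for illustration, not the documented schema; consult the full reference for the actual keys:

```toml
# Hypothetical ~/.linus_ai/config.toml sketch — key names are illustrative
# assumptions, not the documented schema.
[model]
default = "llama3:8b"
quantization = "Q4_K_M"

[server]
host = "127.0.0.1"
port = 8080

[inference]
backend = "auto"   # cpu | cuda | metal | rocm
```

Per the note above, any setting in the file can also be supplied as a CLI flag or environment variable.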
API Reference
LINUS-AI exposes an OpenAI-compatible REST API on http://localhost:8080 when running in server mode.
Drop-in replacement for applications built against the OpenAI API.
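Because the server is OpenAI-compatible, requests follow the standard chat-completions shape. A minimal sketch using only the Python standard library — the model name is a placeholder, and `/v1/chat/completions` is the conventional OpenAI path assumed to be what the local server exposes:

```python
import json
import urllib.request

# Standard OpenAI-compatible chat-completions request body. The model name
# is a placeholder; substitute one you have pulled locally.
payload = {
    "model": "llama3:8b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment with the server running
```

Existing OpenAI SDK clients should work unchanged by pointing their base URL at `http://localhost:8080`.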
Distributed Inference
Scale LINUS-AI across GPUs and machines. Tensor and pipeline parallelism allow running models larger than any single device's memory.
Tensor Parallelism
Split model weight tensors horizontally across N GPUs. Each GPU holds a shard. Supports up to 8 GPUs per node with NVLink or PCIe. Automatic AllReduce synchronization.
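The sharding idea can be shown in plain Python: split a weight matrix row-wise across two simulated devices, have each compute a partial matrix product, and sum the partials — the element-wise sum stands in for the AllReduce step that real hardware performs over NVLink or PCIe. This is an illustration of the general technique, not LINUS-AI's implementation:

```python
def matmul(A, B):
    """Tiny dense matmul on nested lists (illustration, not performance code)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    """Element-wise sum of two matrices — our stand-in for AllReduce."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

# Full weight matrix W (4 inputs x 2 outputs) and an input row vector x.
W = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [[1, 1, 1, 1]]

# Row-parallel split: "GPU 0" holds the top half of W, "GPU 1" the bottom,
# and the input activations are split to match.
W0, W1 = W[:2], W[2:]
x0, x1 = [x[0][:2]], [x[0][2:]]

# Each shard computes a partial result; summing them reconstructs the
# single-device answer — this sum is what AllReduce synchronizes.
partial0 = matmul(x0, W0)
partial1 = matmul(x1, W1)
y = add(partial0, partial1)

assert y == matmul(x, W)  # sharded result matches the unsharded result
```

Each device only ever stores its shard of W, which is what lets N GPUs hold a model none of them could hold alone.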
Pipeline Parallelism
Distribute transformer layers vertically across machines or GPUs. Micro-batching hides inter-stage latency. Supports heterogeneous hardware.
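The latency-hiding claim can be sanity-checked with simple arithmetic: with S stages and M micro-batches of uniform cost, a pipelined pass finishes in S + M − 1 stage-times, versus S × M when each batch runs all stages before the next starts. An idealized sketch (uniform stage times assumed; real pipelines also pay communication and bubble overheads):

```python
def pipelined_time(stages: int, micro_batches: int) -> int:
    """Stage-times for a pipelined pass: fill the pipe, then one batch
    completes per stage-time (idealized, uniform stages)."""
    return stages + micro_batches - 1

def serial_time(stages: int, micro_batches: int) -> int:
    """Stage-times when each micro-batch waits for the previous one to
    finish every stage before it starts."""
    return stages * micro_batches

# 4 pipeline stages, 8 micro-batches: 11 stage-times instead of 32.
print(pipelined_time(4, 8), serial_time(4, 8))
```

The more micro-batches in flight, the closer utilization gets to 100%, which is why micro-batching is what makes pipeline parallelism pay off.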
Mesh Networking
P2P encrypted overlay network for multi-node inference clusters. Auto-discovery via mDNS. Each node can be coordinator or worker. Supports TCP, QUIC, and Unix socket transports.
Performance Tuning
KV cache sizing, batch size, NUMA pinning, huge pages, GPU memory fraction, and continuous batching configuration for maximum throughput.
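As an example of why KV cache sizing is the first knob listed, its footprint follows directly from model shape. The dimensions below are those commonly quoted for a Llama-2-7B-style model and are assumptions for illustration:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """KV cache footprint: K and V tensors per layer, per token —
    2 * n_kv_heads * head_dim elements each."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Llama-2-7B-style shape (32 layers, 32 KV heads, head_dim 128),
# 4096-token context, batch 1, F16 cache entries (2 bytes each).
b = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096, batch=1)
print(f"~{b / 2**30:.1f} GiB")  # → ~2.0 GiB
```

Doubling context length or batch size doubles this figure, which is why KV cache sizing, batch size, and GPU memory fraction have to be tuned together.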
Tensor Parallelism Setup
Mesh Networking Setup
Security & Privacy
LINUS-AI is built private-first. Here's what that means in practice.
Zero Telemetry
No usage data, no model outputs, no conversation logs are ever sent anywhere. All processing is local.
Encrypted Vault
All stored conversations and embeddings are AES-256-GCM encrypted. The key is derived from your hardware and never exported.
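The derivation scheme itself is not specified here; as a general illustration of hardware-bound key derivation, the sketch below stretches a stable machine identifier into an AES-256-sized key with PBKDF2. The identifier source, salt, and parameters are all assumptions — this is not LINUS-AI's actual scheme:

```python
import hashlib

def derive_vault_key(machine_id: bytes, salt: bytes) -> bytes:
    """Derive a 32-byte (AES-256) key from a stable machine identifier.
    Illustrative only — not LINUS-AI's actual derivation scheme."""
    return hashlib.pbkdf2_hmac("sha256", machine_id, salt,
                               iterations=100_000, dklen=32)

key = derive_vault_key(b"example-machine-id", b"example-salt")
assert len(key) == 32  # AES-256 key size
```

Because the key is recomputed from local hardware identifiers rather than stored, there is nothing to export — the property the vault design relies on.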
mTLS Mesh Transport
All inter-node mesh communication uses mutual TLS with auto-generated certificates. Traffic is encrypted end-to-end.
Air-Gap Ready
After license activation, LINUS-AI operates fully offline. No external dependencies, no model API calls, no internet required.
Access Control
API key authentication, per-key rate limiting, and IP allowlisting in server mode. LDAP integration is available for Enterprise.
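Per-key rate limiting is commonly implemented as a token bucket: each key gets a burst allowance that refills at a fixed rate. A minimal sketch of the general technique — illustrative only, not LINUS-AI's implementation:

```python
import time

class TokenBucket:
    """Per-API-key token-bucket rate limiter sketch. Illustrative of the
    general technique, not LINUS-AI's actual implementation."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per API key: burst of 5 requests, refilling 1 token/second.
buckets = {"key-abc": TokenBucket(rate=1.0, capacity=5)}
results = [buckets["key-abc"].allow() for _ in range(6)]
# Back-to-back, the first five requests pass and the sixth is throttled.
```

Keying the buckets by API key is what makes the limit per-key rather than global.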
Audit Logging
Optional structured audit logs (JSON) for all API requests. Includes timestamps, model, token counts — never prompt content.
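A structured entry of the kind described might look like the following. The field names are illustrative, not the documented schema — the point is that only metadata is recorded, never the prompt or completion text:

```python
import json
from datetime import datetime, timezone

def audit_entry(model: str, prompt_tokens: int, completion_tokens: int) -> str:
    """Build one JSON audit-log line. Field names are illustrative —
    metadata only, never prompt or completion content."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
    }
    return json.dumps(entry)

line = audit_entry("llama3:8b", prompt_tokens=42, completion_tokens=128)
parsed = json.loads(line)
assert "prompt" not in parsed and "content" not in parsed
```

One JSON object per line keeps the log greppable and easy to ship to any log aggregator.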
Get Help & Contribute
LINUS-AI is developed in the open. Join the community, report bugs, suggest features, or contribute code.
GitHub Repository
Browse the source code, view open issues, submit pull requests, and track the roadmap.
Issue Tracker
Report bugs, request features, or ask questions. Use the provided templates for best results.
Discord Community
Chat with the team and other users. Channels for help, showcases, model testing, and development.
GitHub Discussions
Long-form discussions, RFCs, and community Q&A. Best for architecture and design questions.
Email Support
Licensed users get priority email support. Include your license key reference and system details.
Contributing
Read the contribution guide. We welcome PRs for bug fixes, new model support, and documentation improvements.