LINUS-AI™ is a self-hosted inference engine that runs powerful language models entirely on your hardware. One binary, no cloud dependencies, no telemetry, no API keys. Your data never leaves your machine.
LINUS-AI packs enterprise-grade inference capabilities into a single binary. No runtime, no containers, no external services.
Your prompts, responses, and embeddings never leave your hardware. No telemetry, no model training on your data, no remote logging.
One self-contained executable. No Python, no Node, no Docker required. Just download, chmod +x, and run on macOS, Linux, or Windows.
Split 70B+ models across up to 8 GPUs automatically. Shard weight tensors horizontally with NVLink-optimized AllReduce synchronization.
Distribute inference across machines on your private network. Peer auto-discovery via mDNS. Encrypted transport. No Kubernetes needed.
AES-256-GCM encrypted storage for conversation history, embeddings, and sensitive context. The vault key never leaves your machine.
Drop-in replacement for OpenAI's API. Works with LangChain, LlamaIndex, Open WebUI, Continue.dev, and any OpenAI SDK without code changes.
Auto backend selection: CUDA (NVIDIA), Metal (Apple Silicon), ROCm (AMD), and optimized CPU with AVX2 and AVX-512 acceleration.
Llama 3.3, Mistral 3, Phi-4, Qwen 2.5, Gemma 3, DeepSeek-R1, and more. GGUF quantization Q2_K through F16. Custom model support.
Spread transformer layers across multiple machines. Micro-batching hides inter-stage latency for near-linear throughput scaling.
Built-in tool calling, function execution, RAG pipeline, and multi-step reasoning. Run autonomous agents entirely on your hardware.
Serve multiple concurrent users efficiently with dynamic batching and priority queue scheduling for production deployments.
Built-in /metrics endpoint with token throughput, latency percentiles, GPU utilization, and queue depth. Grafana dashboards included.
HIPAA, GDPR, SOX, PCI-DSS, FINRA, EEOC, FERPA compliance tiers. HMAC-chained tamper-evident audit logs. Automatic PII scanning with blocking and redaction. Prompt injection detection. One-time consent management for regulated profiles.
Document-level access control enforced per user, department, division, company, and clearance level. PUBLIC to TOP_SECRET classification. Denied documents are silently withheld. Full tamper-evident RAG access audit trail.
Whether you're a solo developer or a security-conscious enterprise, LINUS-AI fits your deployment model.
HIPAA-compliant AI for clinical notes, research analysis, and internal documentation — with zero PHI exposure risk.
Analyze contracts, draft documents, and research case law without sending confidential client data to third-party AI providers.
Run AI on trading data, internal reports, and customer analytics in an environment satisfying SOC 2 and regulatory requirements.
Air-gapped deployments on classified networks. No internet dependency after activation. Supports FIPS-adjacent encryption configs.
Run a local coding assistant via Continue.dev or Cursor on your MacBook Pro or Linux workstation. Instant responses, no API costs.
Deploy a company-wide ChatGPT alternative on your servers. Connect to your internal knowledge base. No data leaves your VPC.
vs. cloud AI APIs and other self-hosted solutions.
Free forever for individuals. No per-token charges, ever.
Actively maintained and regularly updated. Full changelog on GitHub.
mDNS peer discovery, live thermal throttle rerouting, distributed audit ledger, encrypted vault, Tauri 2.0 packaging, Llama 3.3 + Phi-4 support.
Built-in tool calling, multi-step reasoning agent, RAG with local vector store, improved Windows support.
Full /v1/chat/completions compatibility, SSE streaming. Works with LangChain, LlamaIndex, and Open WebUI out of the box.
Native Metal acceleration for Apple Silicon (M1–M4). ROCm backend for AMD GPUs. Auto-backend detection.
Single binary, CPU + CUDA inference, GGUF model support, basic REST API, CLI chat, license activation system.
Everything you might want to know before getting started.
linus-ai --serve on any machine and point your team at it. Team plan covers 5 seats. Enterprise is unlimited. See the admin guide for NGINX, API key management, and Prometheus monitoring setup.Download the free Community edition and have private AI running on your machine in under 5 minutes. No account required. No credit card. No telemetry.
Available for macOS, Linux, and Windows · Source-available · LINUS-AI License v2.0