🎉 LINUS-AI v2.5.0 is out — Multi-node Mesh, Thermal Routing & Blockchain Audit Log. Download now →

Private AI.
Your Hardware.
Zero Compromise.

LINUS-AI™ is a self-hosted inference engine that runs powerful language models entirely on your hardware. One binary, no cloud dependencies, no telemetry, no API keys. Your data never leaves your machine.

macOS · Linux · Windows | ⚡ Single Binary | 🔒 100% Offline | 🦙 Llama · Mistral · Phi · Qwen | 🎛 CUDA · Metal · ROCm · CPU
linus-ai — quick start
$ linus-ai --activate LNAI-XXXX-XXXX-XXXX-XXXX
✓ License activated · Professional · 1/1 seats
 
$ linus-ai --pull-model llama3.2 && linus-ai --serve
Pulling llama3.2 (4.1 GB) ████████████████ 100%
✓ API server running → http://localhost:8080
✓ OpenAI-compatible · Zero telemetry · Fully offline
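
Once the server is up, any OpenAI-style client can talk to it. A minimal sketch using only Python's standard library — the endpoint and response shape assume the standard OpenAI-compatible schema the server advertises:

```python
import json
from urllib import request

BASE_URL = "http://localhost:8080/v1"  # local LINUS-AI server from the quick start

def build_request(prompt, model="llama3.2"):
    """Build an OpenAI-style chat completion payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt):
    """POST the payload to the local server and return the assistant's reply."""
    body = json.dumps(build_request(prompt)).encode()
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, the official OpenAI SDK also works unchanged by pointing its `base_url` at `http://localhost:8080/v1`.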
50+ Supported Models
8 Max GPUs (Tensor Parallel)
0 External API Calls
1 Binary, No Dependencies
Source-Available (LINUS-AI License v2.0)

Everything you need for private AI

LINUS-AI packs enterprise-grade inference capabilities into a single binary. No runtime, no containers, no external services.

🔒

100% Private by Design

Your prompts, responses, and embeddings never leave your hardware. No telemetry, no model training on your data, no remote logging.

Single Binary

One self-contained executable. No Python, no Node, no Docker required. Just download, chmod +x, and run on macOS, Linux, or Windows.

Tensor Parallelism

Split 70B+ models across up to 8 GPUs automatically. Shard weight tensors horizontally with NVLink-optimized AllReduce synchronization.

🕸

Mesh Networking

Distribute inference across machines on your private network. Peer auto-discovery via mDNS. Encrypted transport. No Kubernetes needed.

🛡

Encrypted Vault

AES-256-GCM encrypted storage for conversation history, embeddings, and sensitive context. The vault key never leaves your machine.

🔌

OpenAI-Compatible API

Drop-in replacement for OpenAI's API. Works with LangChain, LlamaIndex, Open WebUI, Continue.dev, and any OpenAI SDK without code changes.

🎛

Multi-Backend Inference

Auto backend selection: CUDA (NVIDIA), Metal (Apple Silicon), ROCm (AMD), and optimized CPU with AVX2 and AVX-512 acceleration.

🤖

50+ Models Supported

Llama 3.3, Mistral 3, Phi-4, Qwen 2.5, Gemma 3, DeepSeek-R1, and more. GGUF quantization Q2_K through F16. Custom model support.

🔗

Pipeline Parallelism

Spread transformer layers across multiple machines. Micro-batching hides inter-stage latency for near-linear throughput scaling.

📦

Agentic Mode

Built-in tool calling, function execution, RAG pipeline, and multi-step reasoning. Run autonomous agents entirely on your hardware.

🔄

Continuous Batching

Serve multiple concurrent users efficiently with dynamic batching and priority queue scheduling for production deployments.

📊

Prometheus Metrics

Built-in /metrics endpoint with token throughput, latency percentiles, GPU utilization, and queue depth. Grafana dashboards included.

⚖️

Compliance & Audit

HIPAA, GDPR, SOX, PCI-DSS, FINRA, EEOC, FERPA compliance tiers. HMAC-chained tamper-evident audit logs. Automatic PII scanning with blocking and redaction. Prompt injection detection. One-time consent management for regulated profiles.
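
The tamper-evidence property comes from HMAC chaining: each entry's MAC covers both the entry and the previous entry's MAC, so altering or deleting any record breaks every subsequent link. An illustrative sketch of the technique (not LINUS-AI's actual implementation):

```python
import hashlib
import hmac
import json

def append_entry(log, entry, key):
    """Append an audit entry whose MAC chains over the previous entry's MAC."""
    prev_mac = log[-1]["mac"] if log else "genesis"
    payload = json.dumps(entry, sort_keys=True)
    mac = hmac.new(key, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()
    log.append({"entry": entry, "mac": mac})
    return log

def verify(log, key):
    """Recompute the chain from the start; any tampered link fails the check."""
    prev_mac = "genesis"
    for item in log:
        payload = json.dumps(item["entry"], sort_keys=True)
        expect = hmac.new(key, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expect, item["mac"]):
            return False
        prev_mac = item["mac"]
    return True
```
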

🗂

RAG Access Control

Document-level access control enforced per user, department, division, company, and clearance level. PUBLIC to TOP_SECRET classification. Denied documents are silently withheld. Full tamper-evident RAG access audit trail.

Built for teams that can't afford data exposure

Whether you're a solo developer or a security-conscious enterprise, LINUS-AI fits your deployment model.

🏥

Healthcare & Life Sciences

HIPAA-compliant AI for clinical notes, research analysis, and internal documentation — with zero PHI exposure risk.

⚖️

Legal & Compliance

Analyze contracts, draft documents, and research case law without sending confidential client data to third-party AI providers.

🏦

Financial Services

Run AI on trading data, internal reports, and customer analytics in an environment satisfying SOC 2 and regulatory requirements.

🛡

Defense & Government

Air-gapped deployments on classified networks. No internet dependency after activation. Supports FIPS-adjacent encryption configs.

💻

Developer Workstation

Run a local coding assistant via Continue.dev or Cursor on your MacBook Pro or Linux workstation. Instant responses, no API costs.

🏢

Enterprise Private Chat

Deploy a company-wide ChatGPT alternative on your servers. Connect to your internal knowledge base. No data leaves your VPC.

How LINUS-AI compares

How it stacks up against cloud AI APIs and other self-hosted solutions.

☁️ Cloud AI APIs

Privacy: None
Data Retention: 30+ days
Offline Use: Impossible
Per-Token Cost: $0.002–0.06
Latency: Variable
Setup Time: Minutes
Dependencies: SDK only

★ LINUS-AI

Privacy: Total
Data Retention: You control
Offline Use: Full support
Per-Token Cost: $0
Latency: Local speed
Setup Time: 60 seconds
Dependencies: None

🐳 Other Self-Hosted

Privacy: Good
Data Retention: You control
Offline Use: Yes
Per-Token Cost: $0
Latency: Local speed
Setup Time: Hours (Docker)
Dependencies: Python, Docker

Simple, honest pricing

Free forever for individuals. No per-token charges, ever.

Community — Free forever
Single user · all inference backends · 50+ models · OpenAI-compatible API
Download
Choose a paid plan
Professional
1 seat · Vault, agentic mode, RAG pipeline, API auth
$99/yr
Team
5 seats · mesh networking, tensor & pipeline parallelism
$199/yr
Secure checkout via PayPal  ·  30-day money-back  ·  Cancel updates anytime
Enterprise & Enterprise Plus
Unlimited seats · OEM rights · air-gap · dedicated SLA · SSO/LDAP
Enterprise — $7,999/yr · Enterprise Plus — $14,999/yr
All paid plans include: no per-token costs, full offline operation after activation, and 12 months of updates from purchase. Buy once, own forever — updates are optional, not forced.

Recent Releases

Actively maintained and regularly updated. Full changelog on GitHub.

v2.5.0

Multi-node Mesh, Thermal Routing & Blockchain Audit (Latest)

mDNS peer discovery, live thermal throttle rerouting, distributed audit ledger, encrypted vault, Tauri 2.0 packaging, Llama 3.3 + Phi-4 support.

v1.3.0

Agentic Mode & RAG Pipeline

Built-in tool calling, multi-step reasoning agent, RAG with local vector store, improved Windows support.

v1.2.0

OpenAI API Compatibility

Full /v1/chat/completions compatibility, SSE streaming. Works with LangChain, LlamaIndex, and Open WebUI out of the box.
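
Streamed completions arrive as server-sent events: one `data:` line per token chunk, terminated by a `[DONE]` sentinel. A sketch of the client-side parsing, assuming the standard OpenAI streaming chunk shape (`choices[0].delta.content`):

```python
import json

def parse_sse_line(line):
    """Extract the content delta from one SSE line of a /v1/chat/completions stream.

    Returns None for non-data lines, keep-alive comments, and the [DONE] sentinel.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")
```

A client would iterate over the response body line by line, printing each non-None delta as it arrives.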

v1.1.0

Apple Metal & AMD ROCm Support

Native Metal acceleration for Apple Silicon (M1–M4). ROCm backend for AMD GPUs. Auto-backend detection.

v1.0.0

Initial Release

Single binary, CPU + CUDA inference, GGUF model support, basic REST API, CLI chat, license activation system.

Frequently Asked Questions

Everything you might want to know before getting started.

Is LINUS-AI truly private? Where does my data go?
Yes — completely private. All inference happens on your local hardware. We never receive your prompts, model outputs, or conversation history. License activation (one-time) sends only your license key and machine fingerprint. After that, the software operates fully offline with zero outbound connections.
What models are supported? Can I use my own fine-tuned model?
LINUS-AI supports any GGUF-format model: Llama 3.x, Mistral 3, Phi-4, Qwen 2.5, Gemma 3, DeepSeek-R1, Falcon, StarCoder, and many more. You can load custom fine-tuned models by pointing to the GGUF file in your config. We maintain an official model registry with pre-tested quantization profiles.
Do I need a GPU? What hardware do I need?
No GPU required. LINUS-AI runs on any CPU with AVX2 support (most CPUs since 2013). A modern laptop with 16 GB RAM can comfortably run 7B–13B parameter models. GPUs dramatically improve performance — an RTX 3090 runs a 70B model in Q4 quantization at 20–30 tokens/second. Apple Silicon is particularly well-supported via Metal.
How does it compare to Ollama, LM Studio, or llama.cpp?
LINUS-AI goes further in the enterprise direction: tensor parallelism, mesh networking, encrypted vault, access control, and commercial SLAs. If you just need a simple local chatbot, Ollama is simpler. If you need production-grade private AI infrastructure, LINUS-AI is built for that.
Is the source code open?
LINUS-AI is source-available under the LINUS-AI Source License v2.0 — not MIT or open source. Community Edition is free for personal use and companies under $100K/yr revenue. Commercial use requires a paid license. See the pricing page for tiers.
Can I self-host it for my whole team?
Absolutely. Run linus-ai --serve on any machine and point your team at it. Team plan covers 5 seats. Enterprise is unlimited. See the admin guide for NGINX, API key management, and Prometheus monitoring setup.
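
For the NGINX setup mentioned in the admin guide, a minimal reverse-proxy sketch — the hostname and certificate paths are illustrative assumptions; port 8080 matches the quick-start default:

```nginx
server {
    listen 443 ssl;
    server_name ai.internal.example.com;   # illustrative hostname

    ssl_certificate     /etc/nginx/certs/ai.crt;
    ssl_certificate_key /etc/nginx/certs/ai.key;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        # SSE streaming: disable buffering so tokens reach clients as generated
        proxy_buffering off;
        proxy_read_timeout 300s;
    }
}
```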

Ready to take AI off the cloud?

Download the free Community edition and have private AI running on your machine in under 5 minutes. No account required. No credit card. No telemetry.

Available for macOS, Linux, and Windows · Source-available · LINUS-AI License v2.0