🎉 LINUS-AI v2.5.0 is out — Multi-node Mesh, Thermal Routing & Blockchain Audit Log. Download now →

Private AI.
Your Hardware.
Zero Compromise.

LINUS-AI™ is a self-hosted inference engine that runs powerful language models entirely on your hardware. One binary, no cloud dependencies, no telemetry, no API keys. Your data never leaves your machine.

macOS · Linux · Windows | ⚡ Single Binary | 🔒 100% Offline | 🦙 Llama · Mistral · Phi · Qwen | 🎛 CUDA · Metal · ROCm · CPU
linus-ai — quick start
$ linus-ai --activate LNAI-XXXX-XXXX-XXXX-XXXX
✓ License activated · Professional · 1/1 seats
 
$ linus-ai --pull-model llama3.2 && linus-ai --serve
Pulling llama3.2 (4.1 GB) ████████████████ 100%
✓ API server running → http://localhost:8080
✓ OpenAI-compatible · Zero telemetry · Fully offline
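
Once the server is up, any OpenAI-style client can talk to it. A minimal sketch using only Python's standard library — the endpoint and response shape assume the standard OpenAI-compatible schema the server advertises:

```python
import json
from urllib import request

BASE_URL = "http://localhost:8080/v1"  # local LINUS-AI server from the quick start

def build_request(prompt, model="llama3.2"):
    """Build an OpenAI-style chat completion payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt):
    """POST the payload to the local server and return the assistant's reply."""
    body = json.dumps(build_request(prompt)).encode()
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, the official OpenAI SDK also works unchanged by pointing its `base_url` at `http://localhost:8080/v1`.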
50+ Supported Models
8 Max GPUs (Tensor Parallel)
0 External API Calls
1 Binary, No Dependencies
Source-Available (LINUS-AI License v2.0)

Everything you need for private AI

LINUS-AI packs enterprise-grade inference capabilities into a single binary. No runtime, no containers, no external services.

🔒

100% Private by Design

Your prompts, responses, and embeddings never leave your hardware. No telemetry, no model training on your data, no remote logging.

Single Binary

One self-contained executable. No Python, no Node, no Docker required. Just download, chmod +x, and run on macOS, Linux, or Windows.

Tensor Parallelism

Split 70B+ models across up to 8 GPUs automatically. Shard weight tensors horizontally with NVLink-optimized AllReduce synchronization.

🕸

Mesh Networking

Distribute inference across machines on your private network. Peer auto-discovery via mDNS. Encrypted transport. No Kubernetes needed.

🛡

Encrypted Vault

AES-256-GCM encrypted storage for conversation history, embeddings, and sensitive context. The vault key never leaves your machine.

🔌

OpenAI-Compatible API

Drop-in replacement for OpenAI's API. Works with LangChain, LlamaIndex, Open WebUI, Continue.dev, and any OpenAI SDK without code changes.

🎛

Multi-Backend Inference

Auto backend selection: CUDA (NVIDIA), Metal (Apple Silicon), ROCm (AMD), and optimized CPU with AVX2 and AVX-512 acceleration.

🤖

50+ Models Supported

Llama 3.3, Mistral 3, Phi-4, Qwen 2.5, Gemma 3, DeepSeek-R1, and more. GGUF quantization Q2_K through F16. Custom model support.

🔗

Pipeline Parallelism

Spread transformer layers across multiple machines. Micro-batching hides inter-stage latency for near-linear throughput scaling.

📦

Agentic Mode

Built-in tool calling, function execution, RAG pipeline, and multi-step reasoning. Run autonomous agents entirely on your hardware.

🔄

Continuous Batching

Serve multiple concurrent users efficiently with dynamic batching and priority queue scheduling for production deployments.

📊

Prometheus Metrics

Built-in /metrics endpoint with token throughput, latency percentiles, GPU utilization, and queue depth. Grafana dashboards included.

⚖️

Compliance & Audit

HIPAA, GDPR, SOX, PCI-DSS, FINRA, EEOC, FERPA compliance tiers. HMAC-chained tamper-evident audit logs. Automatic PII scanning with blocking and redaction. Prompt injection detection. One-time consent management for regulated profiles.
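
The tamper-evidence property comes from HMAC chaining: each entry's MAC covers both the entry and the previous entry's MAC, so altering or deleting any record breaks every subsequent link. An illustrative sketch of the technique (not LINUS-AI's actual implementation):

```python
import hashlib
import hmac
import json

def append_entry(log, entry, key):
    """Append an audit entry whose MAC chains over the previous entry's MAC."""
    prev_mac = log[-1]["mac"] if log else "genesis"
    payload = json.dumps(entry, sort_keys=True)
    mac = hmac.new(key, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()
    log.append({"entry": entry, "mac": mac})
    return log

def verify(log, key):
    """Recompute the chain from the start; any tampered link fails the check."""
    prev_mac = "genesis"
    for item in log:
        payload = json.dumps(item["entry"], sort_keys=True)
        expect = hmac.new(key, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expect, item["mac"]):
            return False
        prev_mac = item["mac"]
    return True
```
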

🗂

RAG Access Control

Document-level access control enforced per user, department, division, company, and clearance level. PUBLIC to TOP_SECRET classification. Denied documents are silently withheld. Full tamper-evident RAG access audit trail.

Built for teams that can't afford data exposure

Whether you're a solo developer or a security-conscious enterprise, LINUS-AI fits your deployment model.

🏥

Healthcare & Life Sciences

HIPAA-compliant AI for clinical notes, research analysis, and internal documentation — with zero PHI exposure risk.

⚖️

Legal & Compliance

Analyze contracts, draft documents, and research case law without sending confidential client data to third-party AI providers.

🏦

Financial Services

Run AI on trading data, internal reports, and customer analytics in an environment satisfying SOC 2 and regulatory requirements.

🛡

Defense & Government

Air-gapped deployments on classified networks. No internet dependency after activation. Supports FIPS-adjacent encryption configs.

💻

Developer Workstation

Run a local coding assistant via Continue.dev or Cursor on your MacBook Pro or Linux workstation. Instant responses, no API costs.

🏢

Enterprise Private Chat

Deploy a company-wide ChatGPT alternative on your servers. Connect to your internal knowledge base. No data leaves your VPC.

How LINUS-AI compares

How it stacks up against cloud AI APIs and other self-hosted solutions.

☁️ Cloud AI APIs

Privacy: None
Data Retention: 30+ days
Offline Use: Impossible
Per-Token Cost: $0.002–0.06
Latency: Variable
Setup Time: Minutes
Dependencies: SDK only

★ LINUS-AI

Privacy: Total
Data Retention: You control
Offline Use: Full support
Per-Token Cost: $0
Latency: Local speed
Setup Time: 60 seconds
Dependencies: None

🐳 Other Self-Hosted

Privacy: Good
Data Retention: You control
Offline Use: Yes
Per-Token Cost: $0
Latency: Local speed
Setup Time: Hours (Docker)
Dependencies: Python, Docker

Simple, honest pricing

Free forever for individuals. No per-token charges, ever.

Community — Free forever
Single user · all inference backends · 50+ models · OpenAI-compatible API
Download
Choose a paid plan
Professional
1 seat · Vault, agentic mode, RAG pipeline, API auth
$99/yr
Team
5 seats · mesh networking, tensor & pipeline parallelism
$199/yr
Secure checkout via PayPal  ·  30-day money-back  ·  Cancel updates anytime
Enterprise & Enterprise Plus
Unlimited seats · OEM rights · air-gap · dedicated SLA · SSO/LDAP
Enterprise — $7,999/yr · Enterprise Plus — $14,999/yr
All paid plans include: no per-token costs, full offline operation after activation, and 12 months of updates from purchase. Buy once, own forever — updates are optional, not forced.

Recent Releases

Actively maintained and regularly updated. Full changelog on GitHub.

v2.5.0

Multi-node Mesh, Thermal Routing & Blockchain Audit (Latest)

mDNS peer discovery, live thermal throttle rerouting, distributed audit ledger, encrypted vault, Tauri 2.0 packaging, Llama 3.3 + Phi-4 support.

v1.3.0

Agentic Mode & RAG Pipeline

Built-in tool calling, multi-step reasoning agent, RAG with local vector store, improved Windows support.

v1.2.0

OpenAI API Compatibility

Full /v1/chat/completions compatibility, SSE streaming. Works with LangChain, LlamaIndex, and Open WebUI out of the box.
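
Streamed completions arrive as server-sent events: one `data:` line per token chunk, terminated by a `[DONE]` sentinel. A sketch of the client-side parsing, assuming the standard OpenAI streaming chunk shape (`choices[0].delta.content`):

```python
import json

def parse_sse_line(line):
    """Extract the content delta from one SSE line of a /v1/chat/completions stream.

    Returns None for non-data lines, keep-alive comments, and the [DONE] sentinel.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")
```

A client would iterate over the response body line by line, printing each non-None delta as it arrives.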

v1.1.0

Apple Metal & AMD ROCm Support

Native Metal acceleration for Apple Silicon (M1–M4). ROCm backend for AMD GPUs. Auto-backend detection.

v1.0.0

Initial Release

Single binary, CPU + CUDA inference, GGUF model support, basic REST API, CLI chat, license activation system.

Frequently Asked Questions

Everything you might want to know before getting started.

Is LINUS-AI truly private? Where does my data go?
Yes — completely private. All inference happens on your local hardware. We never receive your prompts, model outputs, or conversation history. License activation (one-time) sends only your license key and machine fingerprint. After that, the software operates fully offline with zero outbound connections.
What models are supported? Can I use my own fine-tuned model?
LINUS-AI supports any GGUF-format model: Llama 3.x, Mistral 3, Phi-4, Qwen 2.5, Gemma 3, DeepSeek-R1, Falcon, StarCoder, and many more. You can load custom fine-tuned models by pointing to the GGUF file in your config. We maintain an official model registry with pre-tested quantization profiles.
Do I need a GPU? What hardware do I need?
No GPU required. LINUS-AI runs on any CPU with AVX2 support (most CPUs since 2013). A modern laptop with 16 GB RAM can comfortably run 7B–13B parameter models. GPUs dramatically improve performance — an RTX 3090 runs a 70B model in Q4 quantization at 20–30 tokens/second. Apple Silicon is particularly well-supported via Metal.
How does it compare to Ollama, LM Studio, or llama.cpp?
LINUS-AI goes further in the enterprise direction: tensor parallelism, mesh networking, encrypted vault, access control, and commercial SLAs. If you just need a simple local chatbot, Ollama is simpler. If you need production-grade private AI infrastructure, LINUS-AI is built for that.
Is the source code open?
LINUS-AI is source-available under the LINUS-AI Source License v2.0 — not MIT or open source. Community Edition is free for personal use and companies under $100K/yr revenue. Commercial use requires a paid license. See the pricing page for tiers.
Can I self-host it for my whole team?
Absolutely. Run linus-ai --serve on any machine and point your team at it. Team plan covers 5 seats. Enterprise is unlimited. See the admin guide for NGINX, API key management, and Prometheus monitoring setup.
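
For the NGINX setup mentioned in the admin guide, a minimal reverse-proxy sketch — the hostname and certificate paths are illustrative assumptions; port 8080 matches the quick-start default:

```nginx
server {
    listen 443 ssl;
    server_name ai.internal.example.com;   # illustrative hostname

    ssl_certificate     /etc/nginx/certs/ai.crt;
    ssl_certificate_key /etc/nginx/certs/ai.key;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        # SSE streaming: disable buffering so tokens reach clients as generated
        proxy_buffering off;
        proxy_read_timeout 300s;
    }
}
```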

Ready to take AI off the cloud?

Download the free Community edition and have private AI running on your machine in under 5 minutes. No account required. No credit card. No telemetry.

Available for macOS, Linux, and Windows · Source-available · LINUS-AI License v2.0