Developer Guide — LINUS-AI

Integrate LINUS-AI into your applications using the REST API, WebSocket streaming, and Python SDK.

API Overview

LINUS-AI exposes an OpenAI-compatible REST API. Any application built against the OpenAI API can point to LINUS-AI with a single base URL change.

🌐 Base URL

http://localhost:8080 — configurable via server.host and server.port in config.toml.

🔄 OpenAI-Compatible

Drop-in replacement for the OpenAI API. Change only the base URL and API key — no other code changes required.

🔒 Authentication

When auth is enabled, include the header: Authorization: Bearer YOUR_API_KEY. Configure keys in config.toml.

📡 Streaming

Server-Sent Events (text/event-stream) for all streaming endpoints. Set "stream": true in your request.

🌍 CORS

Configurable in config.toml via server.cors. Default: ["*"] for local development.

📄 Content-Type

All requests must use Content-Type: application/json. Streaming responses use text/event-stream.
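
Pulling these settings together, a minimal config.toml might look like the sketch below. The keys server.host, server.port, and server.cors come from the descriptions above; the commented [auth] table is purely illustrative, since this guide does not document the exact auth key schema.

```toml
[server]
host = "127.0.0.1"  # base URL host
port = 8080         # base URL port
cors = ["*"]        # default, suitable for local development only

# Illustrative only — check your installed version for the real auth schema.
# [auth]
# api_keys = ["YOUR_API_KEY"]
```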

All Endpoints at a Glance

POST /v1/chat/completions — Chat with a model (streaming or non-streaming)
POST /v1/completions — Legacy text completions
POST /v1/embeddings — Generate text embeddings
GET /v1/models — List all available models
POST /v1/models/pull — Download a model from the registry
DELETE /v1/models/{name} — Remove a model from disk
GET /v1/models/{name} — Get model info and metadata
POST /agent/stream — Agentic multi-turn inference (SSE)
GET /agent/profiles — List all 14 vertical agent profiles
GET /v1/vault/query — Semantic search over encrypted vault
POST /v1/vault/store — Store entry in encrypted vault
WS ws://localhost:8080/v1/stream — WebSocket streaming interface
GET /health — Health check and version info
GET /metrics — Prometheus-compatible metrics

Compliance & Security API

GET /compliance/profiles — All 14 profile compliance configs (level, regulations, PII scan flag, disclaimer)
GET /compliance/status — Audit chain integrity status + monthly statistics
GET /compliance/consents — Consents granted on this machine (?machine_id=…)
GET /compliance/audit — Query HMAC-chained audit log (?profile_id, ?blocked_only, ?pii_only, ?since, ?limit)
POST /compliance/consent — Grant or revoke consent for a regulated profile — body: {profile_id, action:"grant"|"revoke", machine_id}
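
The consent endpoint can be driven with a few lines of Python. This is a sketch: the body fields come from the table above, while the helper names (consent_body, set_consent) are illustrative, and httpx is simply the HTTP client used elsewhere in this guide.

```python
def consent_body(profile_id: str, action: str, machine_id: str) -> dict:
    """Build the POST /compliance/consent body (fields from the table above)."""
    if action not in ("grant", "revoke"):
        raise ValueError('action must be "grant" or "revoke"')
    return {"profile_id": profile_id, "action": action, "machine_id": machine_id}

def set_consent(base_url: str, profile_id: str, action: str, machine_id: str) -> dict:
    """POST the consent change and return the server's JSON response."""
    import httpx  # third-party; imported only when actually calling the server
    r = httpx.post(f"{base_url}/compliance/consent",
                   json=consent_body(profile_id, action, machine_id))
    r.raise_for_status()
    return r.json()

# Usage (requires a running server):
# set_consent("http://localhost:8080", "medical", "grant", "machine-001")
```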

RAG Document Access Control API

GET /rag/documents — List documents (?user_id filters to accessible subset; ?all=true for full registry)
POST /rag/documents/register — Register a document with classification and ACL
PUT /rag/documents/{id}/acl — Replace ACL for a document (allow/deny at user/dept/division/company/role scope)
PUT /rag/documents/{id}/classification — Update classification level (0=PUBLIC … 4=TOP_SECRET)
DELETE /rag/documents/{id} — Remove document from registry
POST /rag/access-check — Test access decision for a principal + document (returns PERMIT/DENY + rule)
GET /rag/audit — Query RAG access audit log (?doc_id, ?principal_id, ?decision, ?denied_only, ?stats=true)
GET /rag/principals — List all registered principals
POST /rag/principals — Register or update a principal (user_id, company, division, department, roles, clearance)
DELETE /rag/principals/{id} — Remove a principal
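
As a sketch of the principal-registration and access-check flow: the principal fields are listed in the table above, but note that the access-check body field names (principal_id, doc_id) are an assumption inferred from the audit-log query parameters, and both helper names are illustrative.

```python
def principal_body(user_id, company, division, department, roles, clearance):
    """Build the POST /rag/principals body (fields from the table above)."""
    return {
        "user_id": user_id,
        "company": company,
        "division": division,
        "department": department,
        "roles": list(roles),
        "clearance": clearance,  # 0=PUBLIC … 4=TOP_SECRET
    }

def access_check(base_url, principal_id, doc_id):
    """POST /rag/access-check. Field names principal_id/doc_id are assumed
    from the audit query parameters; verify against your server version."""
    import httpx  # third-party; imported only when actually calling the server
    r = httpx.post(f"{base_url}/rag/access-check",
                   json={"principal_id": principal_id, "doc_id": doc_id})
    r.raise_for_status()
    return r.json()  # expected to contain the PERMIT/DENY decision and rule
```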

Chat Completions

The primary inference endpoint. OpenAI-compatible with LINUS-AI extensions for profile selection and custom system prompts.

POST /v1/chat/completions — OpenAI-compatible chat completions

Request Body Parameters

Field Type Description
model required string Model name, e.g. "llama3.2", "mistral-7b". Must be a locally available model.
messages required array Array of message objects with role ("system", "user", "assistant") and content (string).
stream optional boolean If true, returns a Server-Sent Events stream. Default: false.
temperature optional float Sampling temperature between 0.0 and 2.0. Higher values increase randomness. Default: 0.7.
max_tokens optional integer Maximum number of tokens to generate. Default: 2048. Capped by the model's context length.
top_p optional float Nucleus sampling probability. Only tokens in the top-p probability mass are considered. Default: 0.9.
top_k optional integer Limits sampling to the top-k most probable tokens. LINUS-AI extension. Default: 40.
stop optional string | array One or more stop sequences. Generation halts when any stop string is produced.
system optional string LINUS-AI extension. Overrides the system prompt for this request, taking precedence over the model's default and profile system prompts.
profile optional string LINUS-AI extension. Agent profile ID (e.g. "medical", "legal", "general"). Loads profile-specific system context and model hints.

Non-Streaming Response

Response JSON
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "llama3.2",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21 }
}

Streaming Response (SSE)

SSE stream — text/event-stream
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"role":"assistant"},"index":0}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"!"},"index":0}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
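
Client code reassembles a stream like the one above by concatenating the delta contents. A minimal Python sketch (the field names follow the sample chunks; collect_sse_text is an illustrative name):

```python
import json

def collect_sse_text(lines):
    """Join streamed delta chunks into the full assistant reply."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```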

Code Examples

Non-streaming chat
$ curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the speed of light?"}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
Streaming chat
$ curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  --no-buffer \
  -d '{
    "model": "llama3.2",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about privacy."}]
  }'
data: {"choices":[{"delta":{"content":"Quiet"}}]}
data: {"choices":[{"delta":{"content":" servers hum,"}}]}
data: [DONE]
Python — openai library
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_API_KEY",  # or "none" if auth disabled
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the speed of light?"},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
JavaScript — fetch streaming
const response = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY",
  },
  body: JSON.stringify({
    model: "llama3.2",
    stream: true,
    messages: [{ role: "user", content: "Hello!" }],
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const lines = decoder.decode(value).split("\n");
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const text = line.slice(6);
    if (text === "[DONE]") break;
    const chunk = JSON.parse(text);
    process.stdout.write(chunk.choices[0].delta.content ?? "");
  }
}

Embeddings

Generate vector embeddings for semantic search, RAG pipelines, clustering, and similarity tasks — all computed locally.

POST /v1/embeddings — Generate text embeddings (768 or 1536 dims)

Request Parameters

Field Type Description
model required string Embedding model name, e.g. "nomic-embed-text", "mxbai-embed-large".
input required string | array Text string or array of strings to embed. Batched inputs are processed in a single pass.
Response
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0021, -0.1432, 0.0087, ...]  // 768 or 1536 floats
    }
  ],
  "usage": { "prompt_tokens": 8, "total_tokens": 8 }
}
curl — batch embeddings
$ curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": ["Private AI inference", "Local LLM deployment"]
  }'
Use Cases: Semantic search, Retrieval-Augmented Generation (RAG), document clustering, duplicate detection, and recommendation systems. Pair with /v1/vault/store to build a fully private semantic memory store.
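
For the semantic-search and duplicate-detection use cases above, cosine similarity is the usual way to compare two embedding vectors. A pure-Python sketch that assumes nothing about the server beyond the response shape shown (vectors must be non-zero):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Scores near 1.0 mean semantically similar texts; near 0.0, unrelated ones.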

Models API

List, download, inspect, and remove local models programmatically.

GET /v1/models — List all installed models
POST /v1/models/pull — Download a model (streams progress)
GET /v1/models/{name} — Get model info
DELETE /v1/models/{name} — Remove model from disk
GET /v1/models — response
{
  "object": "list",
  "data": [
    {
      "id": "llama3.2",
      "object": "model",
      "created": 1710000000,
      "size_gb": 4.7,
      "parameters": "3.2B",
      "quantization": "Q4_K_M",
      "context_length": 128000
    },
    { "id": "mistral-7b", "parameters": "7B", "quantization": "Q5_K_M", ... }
  ]
}
POST /v1/models/pull — stream download progress
$ curl http://localhost:8080/v1/models/pull \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{"name": "llama3.2"}'
data: {"status":"downloading","progress":12,"total_mb":4812}
data: {"status":"downloading","progress":48,"total_mb":4812}
data: {"status":"verifying","progress":100}
data: {"status":"done","model":"llama3.2"}
data: [DONE]
DELETE and GET /v1/models/{name}
# Get model info
$ curl http://localhost:8080/v1/models/llama3.2
{"id":"llama3.2","size_gb":4.7,"parameters":"3.2B","quantization":"Q4_K_M"}
 
# Remove model from disk
$ curl -X DELETE http://localhost:8080/v1/models/llama3.2
{"deleted": true, "id": "llama3.2"}

Agent / Agentic Stream

LINUS-AI's native agentic inference endpoints. Supports multi-turn reasoning, tool use, 14 vertical profiles, and an encrypted semantic vault.

POST /agent/stream — Multi-turn agentic inference (SSE)
GET /agent/profiles — List all 14 vertical profiles
GET /v1/vault/query?q=QUERY — Semantic search over encrypted vault
POST /v1/vault/store — Store entry in encrypted vault

POST /agent/stream — Request Parameters

Field Type Description
model required string Model to use for inference, e.g. "llama3.2".
message required string The current user message (latest turn).
profile optional string One of the 14 vertical profile IDs: general, medical, legal, financial, research, coding, education, creative, security, data, hr, executive, customer, compliance.
history optional array Prior conversation turns. Same format as messages in chat completions.
session_id optional string Session identifier for vault-backed memory retrieval. Omit to start a stateless session.
system optional string Custom system prompt. Overrides the profile system prompt when set.
max_turns optional integer Maximum agentic reasoning turns before returning. Default: 8.
temperature optional float Sampling temperature. Default: 0.7.

SSE Event Types

Agent SSE stream events
# Token delta — streamed text content
data: {"type":"delta","content":"Based on the latest research..."}
 
# Tool call — agent invoking a registered tool
data: {"type":"tool_call","tool":"web_search","args":{"query":"privacy laws 2025"}}
 
# Tool result — result returned to agent
data: {"type":"tool_result","tool":"web_search","result":"...search results..."}
 
# Done — inference complete
data: {"type":"done","usage":{"prompt_tokens":421,"completion_tokens":187}}
data: [DONE]
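
A client consuming this stream typically dispatches on the "type" field of each event. A minimal Python sketch (event shapes follow the samples above; parse_agent_events is an illustrative name):

```python
import json

def parse_agent_events(lines):
    """Parse agent SSE lines into (type, event) tuples, stopping at [DONE]."""
    events = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        events.append((event.get("type"), event))
    return events
```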

GET /agent/profiles — Response

Profile list
{
  "profiles": [
    {
      "id": "medical",
      "name": "Medical Assistant",
      "icon": "🏥",
      "description": "Clinical decision support, drug interactions, diagnostics.",
      "model_hints": ["llama3-70b", "meditron-7b"],
      "suggested_model": "meditron-7b"
    },
    { "id": "legal", "name": "Legal Assistant", "icon": "⚖", ... },
    { "id": "coding", "name": "Code Assistant", "icon": "⌨", ... },
    { "id": "security", "name": "Security Analyst", "icon": "🛡", ... },
    // ... 14 profiles total
  ]
}

WebSocket Streaming

For real-time bidirectional communication — ideal for chat UIs and applications that need low-latency token streaming without managing SSE connections.

WS ws://localhost:8080/v1/stream — WebSocket streaming interface

Message Protocol

Direction Type Fields
Client → Server chat type, model, messages, temperature, max_tokens, profile, session_id
Server → Client delta type, content — streamed token chunk
Server → Client done type, usage — inference complete with token counts
Server → Client error type, code, message — error condition
Client → Server auth type, api_key — send immediately after connection when auth is enabled
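
The client-to-server frames in this table are plain JSON strings. A small Python sketch of the frame builders (field names come from the table; the helper names are illustrative, and any WebSocket library can send the resulting strings):

```python
import json

def auth_frame(api_key: str) -> str:
    """Client → Server auth frame; send immediately after connecting when auth is enabled."""
    return json.dumps({"type": "auth", "api_key": api_key})

def chat_frame(model: str, messages: list, **opts) -> str:
    """Client → Server chat frame; opts may carry temperature, max_tokens,
    profile, and session_id as listed in the protocol table."""
    return json.dumps({"type": "chat", "model": model, "messages": messages, **opts})
```
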
JavaScript — WebSocket client with reconnection
class LinusAISocket {
  constructor(url = "ws://localhost:8080/v1/stream", apiKey = null) {
    this.url = url;
    this.apiKey = apiKey;
    this.reconnectDelay = 1000;
    this.connect();
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.onopen = () => {
      this.reconnectDelay = 1000; // reset backoff on success
      if (this.apiKey) {
        this.ws.send(JSON.stringify({ type: "auth", api_key: this.apiKey }));
      }
    };

    this.ws.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === "delta") process.stdout.write(msg.content);
      if (msg.type === "done") console.log("\n[done]", msg.usage);
      if (msg.type === "error") console.error("Error:", msg.message);
    };

    this.ws.onclose = () => {
      setTimeout(() => this.connect(), this.reconnectDelay);
      this.reconnectDelay = Math.min(this.reconnectDelay * 2, 30000);
    };
  }

  chat(model, messages) {
    this.ws.send(JSON.stringify({ type: "chat", model, messages }));
  }
}

// Usage
const ai = new LinusAISocket("ws://localhost:8080/v1/stream", "YOUR_API_KEY");
ai.chat("llama3.2", [{ role: "user", content: "Hello from WebSocket!" }]);
Reconnection: The client above implements exponential backoff (1s → 2s → 4s … up to 30s). The server retains no in-flight state between connections — resend the full message history on reconnect.

Python SDK

LINUS-AI works with the standard openai Python library for all OpenAI-compatible endpoints; use direct HTTP (e.g. httpx) for the LINUS-AI extensions.

Install
$ pip install openai
Successfully installed openai-1.x.x
linus_ai_example.py — complete working script
#!/usr/bin/env python3
"""LINUS-AI Python integration example."""

import httpx
import json
from openai import OpenAI

BASE_URL = "http://localhost:8080"
API_KEY = "YOUR_API_KEY"  # or "none" if auth is disabled

# ── 1. OpenAI-compatible client ──────────────────────────
client = OpenAI(base_url=f"{BASE_URL}/v1", api_key=API_KEY)

# ── 2. Standard chat completion ──────────────────────────
def chat(prompt: str, model: str = "llama3.2") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=512,
    )
    return resp.choices[0].message.content

# ── 3. Streaming chat ────────────────────────────────────
def chat_stream(prompt: str, model: str = "llama3.2"):
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()  # newline at end

# ── 4. Embeddings ────────────────────────────────────────
def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(
        model="nomic-embed-text",
        input=texts,
    )
    return [d.embedding for d in resp.data]

# ── 5. Agent stream (LINUS-AI extension) ─────────────────
def agent_stream(message: str, profile: str = "general"):
    with httpx.stream(
        "POST",
        f"{BASE_URL}/agent/stream",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": "llama3.2", "message": message, "profile": profile},
    ) as r:
        for line in r.iter_lines():
            if not line.startswith("data: "):
                continue
            payload = line[6:]
            if payload == "[DONE]":
                break
            event = json.loads(payload)
            if event["type"] == "delta":
                print(event["content"], end="", flush=True)
    print()

# ── 6. List agent profiles ───────────────────────────────
def list_profiles():
    r = httpx.get(f"{BASE_URL}/agent/profiles")
    return r.json()["profiles"]

# ── Demo ─────────────────────────────────────────────────
if __name__ == "__main__":
    print("Chat:", chat("What is entropy?"))
    print("\nStream: ", end="")
    chat_stream("Explain quantum computing in 3 sentences.")
    vecs = embed(["Local AI is private", "Cloud AI shares data"])
    print(f"\nEmbedding dim: {len(vecs[0])}")
    print("\nAgent (medical profile):", end=" ")
    agent_stream("What are the symptoms of hyponatremia?", profile="medical")

Node.js / JavaScript SDK

Use the official openai npm package for OpenAI-compatible endpoints, and fetch for LINUS-AI extensions.

Install
$ npm install openai
added 1 package, audited 1 package in 0.8s
linus-ai-example.mjs — complete working script
import OpenAI from "openai";

const BASE_URL = "http://localhost:8080";
const API_KEY = "YOUR_API_KEY";

const client = new OpenAI({
  baseURL: `${BASE_URL}/v1`,
  apiKey: API_KEY,
});

// ── Standard chat ───────────────────────────────────────
async function chat(prompt, model = "llama3.2") {
  const resp = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
    temperature: 0.7,
  });
  return resp.choices[0].message.content;
}

// ── Streaming chat ──────────────────────────────────────
async function chatStream(prompt, model = "llama3.2") {
  const stream = await client.chat.completions.create({
    model,
    stream: true,
    messages: [{ role: "user", content: prompt }],
  });
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    process.stdout.write(delta);
  }
  process.stdout.write("\n");
}

// ── Agent stream (LINUS-AI extension via fetch) ─────────
async function agentStream(message, profile = "general") {
  const resp = await fetch(`${BASE_URL}/agent/stream`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({ model: "llama3.2", message, profile }),
  });
  const reader = resp.body.getReader();
  const dec = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const line of dec.decode(value).split("\n")) {
      if (!line.startsWith("data: ")) continue;
      const payload = line.slice(6);
      if (payload === "[DONE]") return;
      const event = JSON.parse(payload);
      if (event.type === "delta") process.stdout.write(event.content);
    }
  }
}

// ── Demo ────────────────────────────────────────────────
console.log(await chat("What is differential privacy?"));
await chatStream("Explain zero-knowledge proofs simply.");
await agentStream("Summarize GDPR for a developer.", "legal");

Plugin System

Extend the LINUS-AI agent with custom tools by dropping Python modules into the plugin directory. No server restart required — plugins are hot-loaded.

📁 Plugin Directory

Place plugin packages in ~/.linus_ai/plugins/. Each plugin is a directory containing plugin.json and a Python module.

🔌 Entry Point

Every plugin must expose a register(app) function in its main module. Called at load time with the LINUS-AI app context.

🛠 Tool Registration

Use the @tool decorator to register functions as agent tools. The agent discovers and invokes them during agentic inference.

📄 Plugin Manifest

Each plugin declares its identity and tools in plugin.json: name, version, description, author, and tools array.

Plugin Manifest — plugin.json

~/.linus_ai/plugins/weather/plugin.json
{
  "name": "weather",
  "version": "1.0.0",
  "description": "Fetches current weather for any city.",
  "author": "Your Name",
  "main": "weather_plugin.py",
  "tools": ["get_weather", "get_forecast"]
}

Complete Plugin Example — Weather Tool

~/.linus_ai/plugins/weather/weather_plugin.py
"""LINUS-AI weather plugin — fetches weather data locally."""
import httpx
from linus_ai.plugin import tool  # register() below is this plugin's own entry point

@tool(
    name="get_weather",
    description="Get current weather conditions for a city.",
    params={
        "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
        "units": {"type": "string", "enum": ["metric", "imperial"], "default": "metric"},
    },
)
def get_weather(city: str, units: str = "metric") -> dict:
    """Fetch real-time weather from Open-Meteo (no API key required)."""
    geo = httpx.get(
        "https://geocoding-api.open-meteo.com/v1/search",
        params={"name": city, "count": 1},
    ).json()["results"][0]
    weather = httpx.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": geo["latitude"],
            "longitude": geo["longitude"],
            "current": "temperature_2m,wind_speed_10m,weather_code",
        },
    ).json()
    cur = weather["current"]
    return {
        "city": city,
        "temperature": cur["temperature_2m"],
        "wind_speed": cur["wind_speed_10m"],
        "weather_code": cur["weather_code"],
    }

def register(app):
    """Called by LINUS-AI at plugin load time."""
    app.tools.register(get_weather)
    app.logger.info("weather plugin loaded — tools: get_weather")
Plugin API import: the @tool decorator is imported from linus_ai.plugin; each plugin defines its own register(app) entry point rather than importing one. Tools are automatically discovered by the agent and included in its reasoning context. No server restart is required — LINUS-AI hot-loads plugins from the plugins directory on startup and when a SIGHUP is received.

Webhook Events

Configure LINUS-AI to push structured event notifications to any HTTP endpoint. Useful for monitoring, audit pipelines, and workflow automation.

Privacy note: Webhook payloads never include prompt content or model outputs. Only metadata is transmitted: event type, timestamp, session ID, model name, and token counts.

Configuration

~/.linus_ai/config.toml
[webhooks]
url = "https://your-server.example.com/linus-ai-events"
secret = "your-hmac-secret-for-signature-verification"
events = ["chat.start", "chat.complete", "model.load", "model.unload", "error"]
timeout_ms = 5000
retry_count = 3

Event Types

chat.start chat.complete model.load model.unload error

Payload Format

POST to your webhook URL
# chat.complete event payload
{
"event_type": "chat.complete",
"timestamp": "2026-03-15T14:23:07Z",
"session_id": "sess_a1b2c3d4",
"model": "llama3.2",
"token_counts": {
"prompt_tokens": 142,
"completion_tokens": 87,
"total_tokens": 229
},
"latency_ms": 1240,
"profile": "general"
// NOTE: no prompt content, no response content — metadata only
}

Signature Verification

Python — HMAC-SHA256 verification
import hmac, hashlib
from flask import Flask, request, abort

WEBHOOK_SECRET = b"your-hmac-secret-for-signature-verification"

app = Flask(__name__)

@app.post("/linus-ai-events")
def webhook():
    signature = request.headers.get("X-LinusAI-Signature", "")
    expected = hmac.new(WEBHOOK_SECRET, request.data, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(f"sha256={expected}", signature):
        abort(401)
    event = request.json
    print(f"Event: {event['event_type']} | tokens: {event['token_counts']['total_tokens']}")
    return "", 204
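
To test a receiver like this locally, the matching signer is just the same HMAC computed over the raw body. A sketch (sign_payload/verify_payload are illustrative names; the "sha256=" prefix follows the Flask example above):

```python
import hashlib
import hmac

def sign_payload(secret: bytes, body: bytes) -> str:
    """Produce the X-LinusAI-Signature header value the verifier expects."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_payload(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time check of a received signature against a recomputed one."""
    return hmac.compare_digest(sign_payload(secret, body), signature)
```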

Rate Limits & Error Codes

HTTP status codes, error response format, rate-limit headers, and a reference table of all common error codes.

HTTP Status Codes

Code Meaning When it occurs
200 OK Request succeeded. Response body contains the result or SSE stream.
400 Bad Request Malformed JSON, missing required fields, or invalid parameter values.
401 Unauthorized API key missing or invalid when authentication is enabled.
429 Too Many Requests Rate limit exceeded. Check X-RateLimit-Reset header for retry time.
500 Internal Server Error Unexpected error in the inference engine. Check server logs.
503 Service Unavailable Model is loading, GPU OOM condition, or server is at capacity.

Error Response Format

Error JSON
{
  "error": {
    "code": "model_not_found",
    "message": "Model 'gpt-4' is not installed. Run: linus-ai --pull-model llama3.2",
    "type": "invalid_request_error"
  }
}

Rate-Limit Headers

Header Type Description
X-RateLimit-Limit integer Maximum number of requests allowed per minute for this API key.
X-RateLimit-Remaining integer Number of requests remaining in the current rate-limit window.
X-RateLimit-Reset unix timestamp UTC timestamp (seconds) when the rate-limit window resets.

Common Error Codes

Error Code HTTP Description & Resolution
model_not_found 400 The requested model is not installed locally. Pull it with POST /v1/models/pull or linus-ai --pull-model <name>.
context_length_exceeded 400 The combined token count of your messages exceeds the model's context window. Reduce message history or max_tokens.
license_required 401 The requested feature requires an active LINUS-AI license. Activate with linus-ai --activate <key>.
out_of_memory 503 GPU or RAM exhausted. Try a smaller model, reduce gpu_layers, lower context_len, or switch to a more aggressively quantized model build.
model_loading 503 Model is currently loading into memory. Retry after a few seconds. Check Retry-After header.
invalid_api_key 401 The provided API key is not recognized. Check your Authorization header and config.
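
When a request hits 429 or 503, back off before retrying. A small sketch that honors the Retry-After header mentioned above and falls back to exponential delays (the helper names are illustrative; pair them with whatever HTTP client you use):

```python
def backoff_delays(retries=5, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped."""
    return [min(base * (2 ** i), cap) for i in range(retries)]

def retry_delay(headers, fallback):
    """Prefer the server's Retry-After value (seconds) when present and numeric."""
    try:
        return float(headers.get("Retry-After", fallback))
    except (TypeError, ValueError):
        return fallback
```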

Local Development

Set up a development environment to contribute to LINUS-AI, run tests, or build custom inference backends.

Clone and Python setup
# Clone the repository
$ git clone https://github.com/LINUS-AI-PRO/linus-ai.git
$ cd linus-ai
 
# Create a virtual environment and install in editable mode with dev extras
$ python -m venv .venv && source .venv/bin/activate
$ pip install -e ".[dev]"
Successfully installed linus-ai-0.x.x (editable)
 
# Run the test suite
$ pytest tests/ -v
collected 142 items
tests/test_api.py::test_chat_completions PASSED
tests/test_api.py::test_streaming PASSED
tests/test_embeddings.py::test_batch PASSED
...
142 passed in 18.4s
Rust backend (linus-ai-rs)
# Build the Rust inference engine in release mode
$ cd linus-ai-rs/
$ cargo build --release
Compiling linus-ai-rs v0.x.x
Finished release [optimized] target(s) in 42.1s
 
# Run Rust unit tests
$ cargo test
running 38 tests ... ok
🧪 Running Tests

Python: pytest tests/ -v
Rust: cargo test
Integration: pytest tests/integration/ --live (requires a running server).

🔍 Linting & Formatting

Python: ruff check . and ruff format .
Rust: cargo clippy and cargo fmt

🤝 Contributing

Read CONTRIBUTING.md before submitting a PR. We welcome bug fixes, new model support, and documentation improvements.

🌿 Branch Conventions

Features: feat/short-description
Bug fixes: fix/short-description
Docs: docs/short-description