Developer Guide — LINUS-AI

Integrate LINUS-AI into your applications using the REST API, WebSocket streaming, and Python SDK.

API Overview

LINUS-AI exposes an OpenAI-compatible REST API. Any application built against the OpenAI API can point to LINUS-AI with a single base URL change.

🌐 Base URL

http://localhost:8080 — configurable via server.host and server.port in config.toml.

🔄 OpenAI-Compatible

Drop-in replacement for the OpenAI API. Change only the base URL and API key — no other code changes required.

🔒 Authentication

When auth is enabled, include the header: Authorization: Bearer YOUR_API_KEY. Configure keys in config.toml.

📡 Streaming

Server-Sent Events (text/event-stream) for all streaming endpoints. Set "stream": true in your request.

🌍 CORS

Configurable in config.toml via server.cors. Default: ["*"] for local development.

📄 Content-Type

All requests must use Content-Type: application/json. Streaming responses use text/event-stream.
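
Pulling these settings together, a minimal config.toml might look like the sketch below. The keys server.host, server.port, and server.cors come from the descriptions above; the commented [auth] table is purely illustrative, since this guide does not document the exact auth key schema.

```toml
[server]
host = "127.0.0.1"  # base URL host
port = 8080         # base URL port
cors = ["*"]        # default, suitable for local development only

# Illustrative only — check your installed version for the real auth schema.
# [auth]
# api_keys = ["YOUR_API_KEY"]
```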

All Endpoints at a Glance

POST /v1/chat/completions — Chat with a model (streaming or non-streaming)
POST /v1/completions — Legacy text completions
POST /v1/embeddings — Generate text embeddings
GET /v1/models — List all available models
POST /v1/models/pull — Download a model from the registry
DELETE /v1/models/{name} — Remove a model from disk
GET /v1/models/{name} — Get model info and metadata
POST /agent/stream — Agentic multi-turn inference (SSE)
GET /agent/profiles — List all 14 vertical agent profiles
GET /v1/vault/query — Semantic search over encrypted vault
POST /v1/vault/store — Store entry in encrypted vault
WS ws://localhost:8080/v1/stream — WebSocket streaming interface
GET /health — Health check and version info
GET /metrics — Prometheus-compatible metrics

Compliance & Security API

GET /compliance/profiles — All 14 profile compliance configs (level, regulations, PII scan flag, disclaimer)
GET /compliance/status — Audit chain integrity status + monthly statistics
GET /compliance/consents — Consents granted on this machine (?machine_id=…)
GET /compliance/audit — Query HMAC-chained audit log (?profile_id, ?blocked_only, ?pii_only, ?since, ?limit)
POST /compliance/consent — Grant or revoke consent for a regulated profile — body: {profile_id, action:"grant"|"revoke", machine_id}
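
The consent endpoint can be driven with a few lines of Python. This is a sketch: the body fields come from the table above, while the helper names (consent_body, set_consent) are illustrative, and httpx is simply the HTTP client used elsewhere in this guide.

```python
def consent_body(profile_id: str, action: str, machine_id: str) -> dict:
    """Build the POST /compliance/consent body (fields from the table above)."""
    if action not in ("grant", "revoke"):
        raise ValueError('action must be "grant" or "revoke"')
    return {"profile_id": profile_id, "action": action, "machine_id": machine_id}

def set_consent(base_url: str, profile_id: str, action: str, machine_id: str) -> dict:
    """POST the consent change and return the server's JSON response."""
    import httpx  # third-party; imported only when actually calling the server
    r = httpx.post(f"{base_url}/compliance/consent",
                   json=consent_body(profile_id, action, machine_id))
    r.raise_for_status()
    return r.json()

# Usage (requires a running server):
# set_consent("http://localhost:8080", "medical", "grant", "machine-001")
```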

RAG Document Access Control API

GET /rag/documents — List documents (?user_id filters to accessible subset; ?all=true for full registry)
POST /rag/documents/register — Register a document with classification and ACL
PUT /rag/documents/{id}/acl — Replace ACL for a document (allow/deny at user/dept/division/company/role scope)
PUT /rag/documents/{id}/classification — Update classification level (0=PUBLIC … 4=TOP_SECRET)
DELETE /rag/documents/{id} — Remove document from registry
POST /rag/access-check — Test access decision for a principal + document (returns PERMIT/DENY + rule)
GET /rag/audit — Query RAG access audit log (?doc_id, ?principal_id, ?decision, ?denied_only, ?stats=true)
GET /rag/principals — List all registered principals
POST /rag/principals — Register or update a principal (user_id, company, division, department, roles, clearance)
DELETE /rag/principals/{id} — Remove a principal
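
As a sketch of the principal-registration and access-check flow: the principal fields are listed in the table above, but note that the access-check body field names (principal_id, doc_id) are an assumption inferred from the audit-log query parameters, and both helper names are illustrative.

```python
def principal_body(user_id, company, division, department, roles, clearance):
    """Build the POST /rag/principals body (fields from the table above)."""
    return {
        "user_id": user_id,
        "company": company,
        "division": division,
        "department": department,
        "roles": list(roles),
        "clearance": clearance,  # 0=PUBLIC … 4=TOP_SECRET
    }

def access_check(base_url, principal_id, doc_id):
    """POST /rag/access-check. Field names principal_id/doc_id are assumed
    from the audit query parameters; verify against your server version."""
    import httpx  # third-party; imported only when actually calling the server
    r = httpx.post(f"{base_url}/rag/access-check",
                   json={"principal_id": principal_id, "doc_id": doc_id})
    r.raise_for_status()
    return r.json()  # expected to contain the PERMIT/DENY decision and rule
```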

Chat Completions

The primary inference endpoint. OpenAI-compatible with LINUS-AI extensions for profile selection and custom system prompts.

POST /v1/chat/completions — OpenAI-compatible chat completions

Request Body Parameters

Field Type Description
model required string Model name, e.g. "llama3.2", "mistral-7b". Must be a locally available model.
messages required array Array of message objects with role ("system", "user", "assistant") and content (string).
stream optional boolean If true, returns a Server-Sent Events stream. Default: false.
temperature optional float Sampling temperature between 0.0 and 2.0. Higher values increase randomness. Default: 0.7.
max_tokens optional integer Maximum number of tokens to generate. Default: 2048. Capped by the model's context length.
top_p optional float Nucleus sampling probability. Only tokens in the top-p probability mass are considered. Default: 0.9.
top_k optional integer Limits sampling to the top-k most probable tokens. LINUS-AI extension. Default: 40.
stop optional string | array One or more stop sequences. Generation halts when any stop string is produced.
system optional string LINUS-AI extension. Overrides the system prompt for this request, taking precedence over the model's default and profile system prompts.
profile optional string LINUS-AI extension. Agent profile ID (e.g. "medical", "legal", "general"). Loads profile-specific system context and model hints.

Non-Streaming Response

Response JSON
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "llama3.2",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21 }
}

Streaming Response (SSE)

SSE stream — text/event-stream
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"role":"assistant"},"index":0}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"!"},"index":0}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
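
Client code reassembles a stream like the one above by concatenating the delta contents. A minimal Python sketch (the field names follow the sample chunks; collect_sse_text is an illustrative name):

```python
import json

def collect_sse_text(lines):
    """Join streamed delta chunks into the full assistant reply."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```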

Code Examples

Non-streaming chat
$ curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the speed of light?"}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
Streaming chat
$ curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  --no-buffer \
  -d '{
    "model": "llama3.2",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about privacy."}]
  }'
data: {"choices":[{"delta":{"content":"Quiet"}}]}
data: {"choices":[{"delta":{"content":" servers hum,"}}]}
data: [DONE]
Python — openai library
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_API_KEY",  # or "none" if auth disabled
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the speed of light?"},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
JavaScript — fetch streaming
const response = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY",
  },
  body: JSON.stringify({
    model: "llama3.2",
    stream: true,
    messages: [{ role: "user", content: "Hello!" }],
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const lines = decoder.decode(value).split("\n");
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const text = line.slice(6);
    if (text === "[DONE]") break;
    const chunk = JSON.parse(text);
    process.stdout.write(chunk.choices[0].delta.content ?? "");
  }
}

Embeddings

Generate vector embeddings for semantic search, RAG pipelines, clustering, and similarity tasks — all computed locally.

POST /v1/embeddings — Generate text embeddings (768 or 1536 dims)

Request Parameters

Field Type Description
model required string Embedding model name, e.g. "nomic-embed-text", "mxbai-embed-large".
input required string | array Text string or array of strings to embed. Batched inputs are processed in a single pass.
Response
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0021, -0.1432, 0.0087, ...]  // 768 or 1536 floats
    }
  ],
  "usage": { "prompt_tokens": 8, "total_tokens": 8 }
}
curl — batch embeddings
$ curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": ["Private AI inference", "Local LLM deployment"]
  }'
Use Cases: Semantic search, Retrieval-Augmented Generation (RAG), document clustering, duplicate detection, and recommendation systems. Pair with /v1/vault/store to build a fully private semantic memory store.
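
For the semantic-search and duplicate-detection use cases above, cosine similarity is the usual way to compare two embedding vectors. A pure-Python sketch that assumes nothing about the server beyond the response shape shown (vectors must be non-zero):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Scores near 1.0 mean semantically similar texts; near 0.0, unrelated ones.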

Models API

List, download, inspect, and remove local models programmatically.

GET /v1/models — List all installed models
POST /v1/models/pull — Download a model (streams progress)
GET /v1/models/{name} — Get model info
DELETE /v1/models/{name} — Remove model from disk
GET /v1/models — response
{
  "object": "list",
  "data": [
    {
      "id": "llama3.2",
      "object": "model",
      "created": 1710000000,
      "size_gb": 4.7,
      "parameters": "3.2B",
      "quantization": "Q4_K_M",
      "context_length": 128000
    },
    { "id": "mistral-7b", "parameters": "7B", "quantization": "Q5_K_M", ... }
  ]
}
POST /v1/models/pull — stream download progress
$ curl http://localhost:8080/v1/models/pull \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{"name": "llama3.2"}'
data: {"status":"downloading","progress":12,"total_mb":4812}
data: {"status":"downloading","progress":48,"total_mb":4812}
data: {"status":"verifying","progress":100}
data: {"status":"done","model":"llama3.2"}
data: [DONE]
DELETE and GET /v1/models/{name}
# Get model info
$ curl http://localhost:8080/v1/models/llama3.2
{"id":"llama3.2","size_gb":4.7,"parameters":"3.2B","quantization":"Q4_K_M"}
 
# Remove model from disk
$ curl -X DELETE http://localhost:8080/v1/models/llama3.2
{"deleted": true, "id": "llama3.2"}

Agent / Agentic Stream

LINUS-AI's native agentic inference endpoints. Supports multi-turn reasoning, tool use, 14 vertical profiles, and an encrypted semantic vault.

POST /agent/stream — Multi-turn agentic inference (SSE)
GET /agent/profiles — List all 14 vertical profiles
GET /v1/vault/query?q=QUERY — Semantic search over encrypted vault
POST /v1/vault/store — Store entry in encrypted vault

POST /agent/stream — Request Parameters

Field Type Description
model required string Model to use for inference, e.g. "llama3.2".
message required string The current user message (latest turn).
profile optional string One of the 14 vertical profile IDs: general, medical, legal, financial, research, coding, education, creative, security, data, hr, executive, customer, compliance.
history optional array Prior conversation turns. Same format as messages in chat completions.
session_id optional string Session identifier for vault-backed memory retrieval. Omit to start a stateless session.
system optional string Custom system prompt. Overrides the profile system prompt when set.
max_turns optional integer Maximum agentic reasoning turns before returning. Default: 8.
temperature optional float Sampling temperature. Default: 0.7.

SSE Event Types

Agent SSE stream events
# Token delta — streamed text content
data: {"type":"delta","content":"Based on the latest research..."}
 
# Tool call — agent invoking a registered tool
data: {"type":"tool_call","tool":"web_search","args":{"query":"privacy laws 2025"}}
 
# Tool result — result returned to agent
data: {"type":"tool_result","tool":"web_search","result":"...search results..."}
 
# Done — inference complete
data: {"type":"done","usage":{"prompt_tokens":421,"completion_tokens":187}}
data: [DONE]
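
A client consuming this stream typically dispatches on the "type" field of each event. A minimal Python sketch (event shapes follow the samples above; parse_agent_events is an illustrative name):

```python
import json

def parse_agent_events(lines):
    """Parse agent SSE lines into (type, event) tuples, stopping at [DONE]."""
    events = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        events.append((event.get("type"), event))
    return events
```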

GET /agent/profiles — Response

Profile list
{
  "profiles": [
    {
      "id": "medical",
      "name": "Medical Assistant",
      "icon": "🏥",
      "description": "Clinical decision support, drug interactions, diagnostics.",
      "model_hints": ["llama3-70b", "meditron-7b"],
      "suggested_model": "meditron-7b"
    },
    { "id": "legal", "name": "Legal Assistant", "icon": "⚖", ... },
    { "id": "coding", "name": "Code Assistant", "icon": "⌨", ... },
    { "id": "security", "name": "Security Analyst", "icon": "🛡", ... },
    // ... 14 profiles total
  ]
}

WebSocket Streaming

For real-time bidirectional communication — ideal for chat UIs and applications that need low-latency token streaming without managing SSE connections.

WS ws://localhost:8080/v1/stream — WebSocket streaming interface

Message Protocol

Direction Type Fields
Client → Server chat type, model, messages, temperature, max_tokens, profile, session_id
Server → Client delta type, content — streamed token chunk
Server → Client done type, usage — inference complete with token counts
Server → Client error type, code, message — error condition
Client → Server auth type, api_key — send immediately after connection when auth is enabled
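
The client-to-server frames in this table are plain JSON strings. A small Python sketch of the frame builders (field names come from the table; the helper names are illustrative, and any WebSocket library can send the resulting strings):

```python
import json

def auth_frame(api_key: str) -> str:
    """Client → Server auth frame; send immediately after connecting when auth is enabled."""
    return json.dumps({"type": "auth", "api_key": api_key})

def chat_frame(model: str, messages: list, **opts) -> str:
    """Client → Server chat frame; opts may carry temperature, max_tokens,
    profile, and session_id as listed in the protocol table."""
    return json.dumps({"type": "chat", "model": model, "messages": messages, **opts})
```
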
JavaScript — WebSocket client with reconnection
class LinusAISocket {
  constructor(url = "ws://localhost:8080/v1/stream", apiKey = null) {
    this.url = url;
    this.apiKey = apiKey;
    this.reconnectDelay = 1000;
    this.connect();
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.onopen = () => {
      this.reconnectDelay = 1000; // reset backoff on success
      if (this.apiKey) {
        this.ws.send(JSON.stringify({ type: "auth", api_key: this.apiKey }));
      }
    };

    this.ws.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === "delta") process.stdout.write(msg.content);
      if (msg.type === "done") console.log("\n[done]", msg.usage);
      if (msg.type === "error") console.error("Error:", msg.message);
    };

    this.ws.onclose = () => {
      setTimeout(() => this.connect(), this.reconnectDelay);
      this.reconnectDelay = Math.min(this.reconnectDelay * 2, 30000);
    };
  }

  chat(model, messages) {
    this.ws.send(JSON.stringify({ type: "chat", model, messages }));
  }
}

// Usage
const ai = new LinusAISocket("ws://localhost:8080/v1/stream", "YOUR_API_KEY");
ai.chat("llama3.2", [{ role: "user", content: "Hello from WebSocket!" }]);
Reconnection: The client above implements exponential backoff (1s → 2s → 4s … up to 30s). The server retains no in-flight state between connections — resend the full message history on reconnect.

Python SDK

LINUS-AI works with the standard openai Python library for all OpenAI-compatible endpoints; use direct HTTP (e.g. httpx) for the LINUS-AI extensions.

Install
$ pip install openai
Successfully installed openai-1.x.x
linus_ai_example.py — complete working script
#!/usr/bin/env python3
"""LINUS-AI Python integration example."""

import httpx
import json
from openai import OpenAI

BASE_URL = "http://localhost:8080"
API_KEY = "YOUR_API_KEY"  # or "none" if auth is disabled

# ── 1. OpenAI-compatible client ──────────────────────────
client = OpenAI(base_url=f"{BASE_URL}/v1", api_key=API_KEY)

# ── 2. Standard chat completion ──────────────────────────
def chat(prompt: str, model: str = "llama3.2") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=512,
    )
    return resp.choices[0].message.content

# ── 3. Streaming chat ────────────────────────────────────
def chat_stream(prompt: str, model: str = "llama3.2"):
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()  # newline at end

# ── 4. Embeddings ────────────────────────────────────────
def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(
        model="nomic-embed-text",
        input=texts,
    )
    return [d.embedding for d in resp.data]

# ── 5. Agent stream (LINUS-AI extension) ─────────────────
def agent_stream(message: str, profile: str = "general"):
    with httpx.stream(
        "POST",
        f"{BASE_URL}/agent/stream",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": "llama3.2", "message": message, "profile": profile},
    ) as r:
        for line in r.iter_lines():
            if not line.startswith("data: "):
                continue
            payload = line[6:]
            if payload == "[DONE]":
                break
            event = json.loads(payload)
            if event["type"] == "delta":
                print(event["content"], end="", flush=True)
    print()

# ── 6. List agent profiles ───────────────────────────────
def list_profiles():
    r = httpx.get(f"{BASE_URL}/agent/profiles")
    return r.json()["profiles"]

# ── Demo ─────────────────────────────────────────────────
if __name__ == "__main__":
    print("Chat:", chat("What is entropy?"))
    print("\nStream: ", end="")
    chat_stream("Explain quantum computing in 3 sentences.")
    vecs = embed(["Local AI is private", "Cloud AI shares data"])
    print(f"\nEmbedding dim: {len(vecs[0])}")
    print("\nAgent (medical profile):", end=" ")
    agent_stream("What are the symptoms of hyponatremia?", profile="medical")

Node.js / JavaScript SDK

Use the official openai npm package for OpenAI-compatible endpoints, and fetch for LINUS-AI extensions.

Install
$ npm install openai
added 1 package, audited 1 package in 0.8s
linus-ai-example.mjs — complete working script
import OpenAI from "openai";

const BASE_URL = "http://localhost:8080";
const API_KEY = "YOUR_API_KEY";

const client = new OpenAI({
  baseURL: `${BASE_URL}/v1`,
  apiKey: API_KEY,
});

// ── Standard chat ───────────────────────────────────────
async function chat(prompt, model = "llama3.2") {
  const resp = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
    temperature: 0.7,
  });
  return resp.choices[0].message.content;
}

// ── Streaming chat ──────────────────────────────────────
async function chatStream(prompt, model = "llama3.2") {
  const stream = await client.chat.completions.create({
    model,
    stream: true,
    messages: [{ role: "user", content: prompt }],
  });
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    process.stdout.write(delta);
  }
  process.stdout.write("\n");
}

// ── Agent stream (LINUS-AI extension via fetch) ─────────
async function agentStream(message, profile = "general") {
  const resp = await fetch(`${BASE_URL}/agent/stream`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({ model: "llama3.2", message, profile }),
  });
  const reader = resp.body.getReader();
  const dec = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const line of dec.decode(value).split("\n")) {
      if (!line.startsWith("data: ")) continue;
      const payload = line.slice(6);
      if (payload === "[DONE]") return;
      const event = JSON.parse(payload);
      if (event.type === "delta") process.stdout.write(event.content);
    }
  }
}

// ── Demo ────────────────────────────────────────────────
console.log(await chat("What is differential privacy?"));
await chatStream("Explain zero-knowledge proofs simply.");
await agentStream("Summarize GDPR for a developer.", "legal");

Plugin System

Extend the LINUS-AI agent with custom tools by dropping Python modules into the plugin directory. No server restart required — plugins are hot-loaded.

📁 Plugin Directory

Place plugin packages in ~/.linus_ai/plugins/. Each plugin is a directory containing plugin.json and a Python module.

🔌 Entry Point

Every plugin must expose a register(app) function in its main module. Called at load time with the LINUS-AI app context.

🛠 Tool Registration

Use the @tool decorator to register functions as agent tools. The agent discovers and invokes them during agentic inference.

📄 Plugin Manifest

Each plugin declares its identity and tools in plugin.json: name, version, description, author, and tools array.

Plugin Manifest — plugin.json

~/.linus_ai/plugins/weather/plugin.json
{
  "name": "weather",
  "version": "1.0.0",
  "description": "Fetches current weather for any city.",
  "author": "Your Name",
  "main": "weather_plugin.py",
  "tools": ["get_weather", "get_forecast"]
}

Complete Plugin Example — Weather Tool

~/.linus_ai/plugins/weather/weather_plugin.py
"""LINUS-AI weather plugin — fetches weather data locally."""
import httpx
from linus_ai.plugin import tool  # register() below is this plugin's own entry point

@tool(
    name="get_weather",
    description="Get current weather conditions for a city.",
    params={
        "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
        "units": {"type": "string", "enum": ["metric", "imperial"], "default": "metric"},
    },
)
def get_weather(city: str, units: str = "metric") -> dict:
    """Fetch real-time weather from Open-Meteo (no API key required)."""
    geo = httpx.get(
        "https://geocoding-api.open-meteo.com/v1/search",
        params={"name": city, "count": 1},
    ).json()["results"][0]
    weather = httpx.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": geo["latitude"],
            "longitude": geo["longitude"],
            "current": "temperature_2m,wind_speed_10m,weather_code",
        },
    ).json()
    cur = weather["current"]
    return {
        "city": city,
        "temperature": cur["temperature_2m"],
        "wind_speed": cur["wind_speed_10m"],
        "weather_code": cur["weather_code"],
    }

def register(app):
    """Called by LINUS-AI at plugin load time."""
    app.tools.register(get_weather)
    app.logger.info("weather plugin loaded — tools: get_weather")
Plugin API import: the @tool decorator is imported from linus_ai.plugin; each plugin defines its own register(app) entry point rather than importing one. Tools are automatically discovered by the agent and included in its reasoning context. No server restart is required — LINUS-AI hot-loads plugins from the plugins directory on startup and when a SIGHUP is received.

Webhook Events

Configure LINUS-AI to push structured event notifications to any HTTP endpoint. Useful for monitoring, audit pipelines, and workflow automation.

Privacy note: Webhook payloads never include prompt content or model outputs. Only metadata is transmitted: event type, timestamp, session ID, model name, and token counts.

Configuration

~/.linus_ai/config.toml
[webhooks]
url = "https://your-server.example.com/linus-ai-events"
secret = "your-hmac-secret-for-signature-verification"
events = ["chat.start", "chat.complete", "model.load", "model.unload", "error"]
timeout_ms = 5000
retry_count = 3

Event Types

chat.start chat.complete model.load model.unload error

Payload Format

POST to your webhook URL
# chat.complete event payload
{
"event_type": "chat.complete",
"timestamp": "2026-03-15T14:23:07Z",
"session_id": "sess_a1b2c3d4",
"model": "llama3.2",
"token_counts": {
"prompt_tokens": 142,
"completion_tokens": 87,
"total_tokens": 229
},
"latency_ms": 1240,
"profile": "general"
// NOTE: no prompt content, no response content — metadata only
}

Signature Verification

Python — HMAC-SHA256 verification
import hmac, hashlib
from flask import Flask, request, abort

WEBHOOK_SECRET = b"your-hmac-secret-for-signature-verification"

app = Flask(__name__)

@app.post("/linus-ai-events")
def webhook():
    signature = request.headers.get("X-LinusAI-Signature", "")
    expected = hmac.new(WEBHOOK_SECRET, request.data, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(f"sha256={expected}", signature):
        abort(401)
    event = request.json
    print(f"Event: {event['event_type']} | tokens: {event['token_counts']['total_tokens']}")
    return "", 204
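
To test a receiver like this locally, the matching signer is just the same HMAC computed over the raw body. A sketch (sign_payload/verify_payload are illustrative names; the "sha256=" prefix follows the Flask example above):

```python
import hashlib
import hmac

def sign_payload(secret: bytes, body: bytes) -> str:
    """Produce the X-LinusAI-Signature header value the verifier expects."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_payload(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time check of a received signature against a recomputed one."""
    return hmac.compare_digest(sign_payload(secret, body), signature)
```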

Rate Limits & Error Codes

HTTP status codes, error response format, rate-limit headers, and a reference table of all common error codes.

HTTP Status Codes

Code Meaning When it occurs
200 OK Request succeeded. Response body contains the result or SSE stream.
400 Bad Request Malformed JSON, missing required fields, or invalid parameter values.
401 Unauthorized API key missing or invalid when authentication is enabled.
429 Too Many Requests Rate limit exceeded. Check X-RateLimit-Reset header for retry time.
500 Internal Server Error Unexpected error in the inference engine. Check server logs.
503 Service Unavailable Model is loading, GPU OOM condition, or server is at capacity.

Error Response Format

Error JSON
{
  "error": {
    "code": "model_not_found",
    "message": "Model 'gpt-4' is not installed. Run: linus-ai --pull-model llama3.2",
    "type": "invalid_request_error"
  }
}

Rate-Limit Headers

Header Type Description
X-RateLimit-Limit integer Maximum number of requests allowed per minute for this API key.
X-RateLimit-Remaining integer Number of requests remaining in the current rate-limit window.
X-RateLimit-Reset unix timestamp UTC timestamp (seconds) when the rate-limit window resets.

Common Error Codes

Error Code HTTP Description & Resolution
model_not_found 400 The requested model is not installed locally. Pull it with POST /v1/models/pull or linus-ai --pull-model <name>.
context_length_exceeded 400 The combined token count of your messages exceeds the model's context window. Reduce message history or max_tokens.
license_required 401 The requested feature requires an active LINUS-AI license. Activate with linus-ai --activate <key>.
out_of_memory 503 GPU or RAM exhausted. Try a smaller model, reduce gpu_layers, lower context_len, or switch to a more aggressively quantized model build.
model_loading 503 Model is currently loading into memory. Retry after a few seconds. Check Retry-After header.
invalid_api_key 401 The provided API key is not recognized. Check your Authorization header and config.
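
When a request hits 429 or 503, back off before retrying. A small sketch that honors the Retry-After header mentioned above and falls back to exponential delays (the helper names are illustrative; pair them with whatever HTTP client you use):

```python
def backoff_delays(retries=5, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped."""
    return [min(base * (2 ** i), cap) for i in range(retries)]

def retry_delay(headers, fallback):
    """Prefer the server's Retry-After value (seconds) when present and numeric."""
    try:
        return float(headers.get("Retry-After", fallback))
    except (TypeError, ValueError):
        return fallback
```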

Local Development

Set up a development environment to contribute to LINUS-AI, run tests, or build custom inference backends.

Clone and Python setup
# Clone the repository
$ git clone https://github.com/LINUS-AI-PRO/linus-ai.git
$ cd linus-ai
 
# Create a virtual environment and install in editable mode with dev extras
$ python -m venv .venv && source .venv/bin/activate
$ pip install -e ".[dev]"
Successfully installed linus-ai-0.x.x (editable)
 
# Run the test suite
$ pytest tests/ -v
collected 142 items
tests/test_api.py::test_chat_completions PASSED
tests/test_api.py::test_streaming PASSED
tests/test_embeddings.py::test_batch PASSED
...
142 passed in 18.4s
Rust backend (linus-ai-rs)
# Build the Rust inference engine in release mode
$ cd linus-ai-rs/
$ cargo build --release
Compiling linus-ai-rs v0.x.x
Finished release [optimized] target(s) in 42.1s
 
# Run Rust unit tests
$ cargo test
running 38 tests ... ok
🧪 Running Tests

Python: pytest tests/ -v
Rust: cargo test
Integration: pytest tests/integration/ --live (requires a running server).

🔍 Linting & Formatting

Python: ruff check . and ruff format .
Rust: cargo clippy and cargo fmt

🤝 Contributing

Read CONTRIBUTING.md before submitting a PR. We welcome bug fixes, new model support, and documentation improvements.

🌿 Branch Conventions

Features: feat/short-description
Bug fixes: fix/short-description
Docs: docs/short-description