NEXUS Runtime Documentation

NEXUS Runtime is the reliability control plane and AI Runtime Operations Platform for teams shipping LLM-powered features to production. It sits between your application and your AI providers (OpenAI, Anthropic, Google Gemini, AWS Bedrock, Mistral AI, Cohere, Groq, and DeepSeek) and adds routing, failover, a Reliability Score, root cause analysis, observability, policy enforcement, versioned prompt management, prompt repository workflows, AI Release Manager releases, and AI change management.

5-minute quickstart: Create an account → add a provider API key → create a NEXUS API key → change your client base URL to https://runtime.nexusai.run/v1. No other code changes are required.

Quickstart

Step 1 — Add a gateway provider

In the portal, navigate to Gateway → Providers. Add your provider credential for OpenAI, Anthropic, Google Gemini, AWS Bedrock, Mistral AI, Cohere, Groq, or DeepSeek. NEXUS stores provider credentials centrally so application teams do not spread raw provider keys across services.

Step 2 — Create an API key

Navigate to API Keys → New key. Select the completions:write scope. Copy the full key immediately; it is shown only once, at creation time. Afterwards the portal displays only the key prefix (the first 8 characters) for identification.

Step 3 — Send your first request

curl https://runtime.nexusai.run/v1/chat/completions \
  -H "Authorization: Bearer nxk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'

The response includes routing metadata in headers:

x-nexus-provider: openai
x-nexus-attempt: 1
x-nexus-latency-ms: 342
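As a rough Python counterpart to the curl call above, the sketch below builds the same request with the standard library and picks the x-nexus-* routing headers out of a response. The helper names build_chat_request and routing_metadata are illustrative only and are not part of any NEXUS SDK; the key is a placeholder.

```python
import json
import urllib.request

BASE_URL = "https://runtime.nexusai.run/v1"


def build_chat_request(api_key: str, model: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def routing_metadata(headers) -> dict:
    """Collect the x-nexus-* entries from any mapping of header names to values."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("x-nexus-")
    }
```

To send the request, pass it to urllib.request.urlopen and hand the response's headers attribute to routing_metadata.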

Core Concepts

Tenants

A tenant is an isolated workspace. All resources — providers, API keys, prompts, policies — are scoped to a tenant. You can be a member of multiple tenants.

API Keys

NEXUS API keys authenticate requests to the gateway. Keys have scopes that restrict what operations they can perform. Only completions:write is required for basic gateway use.

Providers

A provider is a configured AI backend such as OpenAI, Anthropic, Gemini, Bedrock, Mistral AI, Cohere, Groq, or DeepSeek. NEXUS routes requests to providers based on the requested model. Each provider has its own circuit breaker state.

Authentication

All gateway requests require a NEXUS API key as a Bearer token:

Authorization: Bearer nxk_your_key_here

Session-based authentication (Google OAuth, GitHub OAuth) is used for portal UI access only. The /v1/chat/completions endpoint accepts only API key auth.

Chat Completions

NEXUS proxies the OpenAI chat completions format. Any client that supports a custom base URL works without modification.

Parameter	Type	Required	Description
model	string	required	Model identifier (e.g. gpt-4o, claude-opus-4-7)
messages	array	required	OpenAI-format messages array
stream	boolean	optional	Enable SSE streaming. Default: false
max_tokens	integer	optional	Maximum tokens in the completion
temperature	number	optional	Sampling temperature 0–2
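The parameter table can be mirrored in a small client-side builder. This is a minimal sketch: build_payload is a hypothetical helper, and the non-empty and positive-value checks are assumptions added for illustration. How the gateway itself validates these fields is not documented here.

```python
def build_payload(model, messages, stream=False, max_tokens=None, temperature=None):
    """Assemble a chat completions body, checking the documented constraints."""
    if not isinstance(model, str) or not model:
        raise ValueError("model must be a non-empty string")
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")  # assumed rule

    payload = {"model": model, "messages": messages, "stream": bool(stream)}

    if max_tokens is not None:
        if not isinstance(max_tokens, int) or max_tokens <= 0:  # assumed rule
            raise ValueError("max_tokens must be a positive integer")
        payload["max_tokens"] = max_tokens

    if temperature is not None:
        if not 0 <= temperature <= 2:  # documented range: 0 to 2
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature

    return payload
```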

Circuit Breaker

How it works: Each provider has an independent circuit breaker with three states: closed (normal operation), open (provider bypassed for 30 s), and half-open (probing for recovery). The circuit opens after 5 consecutive 5xx errors or timeouts, and two consecutive successes in the half-open state close it again. 4xx errors never trip the circuit.

Parameter	Default	Description
Failure threshold	5	Consecutive 5xx/timeout errors before opening
Open window	30 s	How long the circuit stays open
Success threshold	2	Consecutive successes in half-open to close
Trip statuses	500, 502, 503, 504	HTTP codes that increment the failure counter
Non-trip statuses	4xx	Never trip the circuit
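The state machine described by the table can be sketched as follows. This is an illustration under assumptions, not the NEXUS implementation: the class name, the injectable clock, treating non-trip error responses as neutral (they neither count as failures nor reset the counter), and reopening on any failure in half-open are all choices made for this example.

```python
import time

TRIP_STATUSES = frozenset({500, 502, 503, 504})


class CircuitBreaker:
    def __init__(self, failure_threshold=5, open_window_s=30.0,
                 success_threshold=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_window_s = open_window_s
        self.success_threshold = success_threshold
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        """Return False while open; move to half-open once the window has elapsed."""
        if self.state == "open":
            if self._clock() - self._opened_at < self.open_window_s:
                return False
            self.state = "half-open"
            self._successes = 0
        return True

    def record(self, status=None, timed_out=False):
        """Feed one outcome into the breaker."""
        if timed_out or status in TRIP_STATUSES:
            self._on_failure()
        elif status is not None and 200 <= status < 400:
            self._on_success()
        # Anything else (notably 4xx) leaves the breaker untouched (assumption).

    def _on_failure(self):
        self._successes = 0
        self._failures += 1
        # Assumption: a single failure while half-open reopens the circuit.
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
            self._failures = 0

    def _on_success(self):
        self._failures = 0
        if self.state == "half-open":
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = "closed"
                self._successes = 0
```

Passing a fake clock makes the 30 s window easy to exercise in unit tests without sleeping.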

Automatic Failover

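When a provider's circuit is open, NEXUS automatically routes to the next available provider by weight. A retryable status (408, 429, 500, 502, 503, 504) from one provider triggers a retry on the next provider before the request is failed. The retryable list is broader than the circuit breaker's trip statuses: 408 and 429 are retried but, as 4xx responses, never trip a circuit.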
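One way to picture the failover loop is the sketch below. It assumes "by weight" means trying available providers in descending weight order; the gateway may instead use a weighted random choice. Provider, route_with_failover, and the send callback are hypothetical names for this example.

```python
from dataclasses import dataclass

RETRYABLE = frozenset({408, 429, 500, 502, 503, 504})


@dataclass
class Provider:
    name: str
    weight: int
    circuit_open: bool = False


def route_with_failover(providers, send):
    """Try each available provider in turn.

    send(provider) performs the upstream call and returns an HTTP status.
    Returns (provider_name, status, attempt) for the last attempt made.
    """
    candidates = sorted(
        (p for p in providers if not p.circuit_open),  # skip open circuits
        key=lambda p: p.weight,
        reverse=True,  # assumption: highest weight first
    )
    if not candidates:
        raise RuntimeError("no provider available")

    for attempt, provider in enumerate(candidates, start=1):
        status = send(provider)
        if status not in RETRYABLE:
            return provider.name, status, attempt
    # Every candidate returned a retryable status: surface the last one.
    return provider.name, status, attempt
```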

Streaming (SSE)

Set "stream": true in the request body. NEXUS pipes the provider's SSE stream directly to the client without buffering. Circuit breaker state is updated based on the initial response status.
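On the client side, a stream can be consumed line by line. The sketch below assumes the stream follows the OpenAI chunk format (JSON after a data: prefix, content under choices[].delta.content, and a data: [DONE] sentinel), which is consistent with NEXUS proxying the OpenAI format but is not spelled out above. The function names are illustrative.

```python
import json


def iter_sse_data(lines):
    """Yield the decoded JSON payload of each 'data:' line until '[DONE]'."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_text(lines):
    """Concatenate the content deltas of a streamed completion."""
    parts = []
    for chunk in iter_sse_data(lines):
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

Any iterable of lines works as input, for example an HTTP response object iterated line by line.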

Managing Prompts

Prompts are versioned text templates with declared variables. Create a prompt with a slug and name, then add versions as you iterate.

POST /v1/prompts
{ "slug": "support-reply", "name": "Support Reply" }

Semver Versioning

Every change creates a new version identified by a semantic version number (MAJOR.MINOR.PATCH). A version starts as a draft (editable) and becomes immutable once published. Only one draft can exist per prompt at a time; a partial unique index in Postgres enforces this.

POST /v1/prompts/:id/versions
{
  "semver": "1.2.0",
  "content": "You are a support agent for {{company}}.\n\nTicket: {{ticket_content}}",
  "variables": [
    { "name": "company" },
    { "name": "ticket_content" }
  ]
}
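When generating the semver field programmatically, a small helper keeps version strings well formed. This sketch accepts only plain MAJOR.MINOR.PATCH; whether NEXUS also accepts pre-release or build suffixes is not stated here, so that restriction is an assumption, as are the helper names.

```python
import re

SEMVER_RE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def parse_semver(text: str) -> tuple:
    """Return (major, minor, patch) as integers, or raise ValueError."""
    match = SEMVER_RE.match(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(part) for part in match.groups())


def bump(text: str, part: str) -> str:
    """Return the next version after bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = parse_semver(text)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

Comparing the parsed tuples orders versions numerically, so 1.10.0 sorts after 1.9.3.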

Rendering Prompts

The renderer resolves {{varName}} and {{varName | default}} patterns. Undeclared variables pass through unchanged.

POST /v1/prompts/:id/render
{ "env": "prod", "variables": { "company": "Acme", "ticket_content": "Order missing" } }
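The substitution rules can be approximated locally with a regular expression. This is a sketch of the documented behaviour, not the actual renderer: the render function and its signature are hypothetical, and raising an error for a declared variable that has neither a value nor a default is an assumption, since that case is not described above.

```python
import re

# Matches {{name}} and {{name | default}}, with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*(?:\|\s*([^{}]*?)\s*)?\}\}")


def render(content: str, declared, values: dict) -> str:
    """Substitute declared variables; leave undeclared placeholders as-is."""
    declared = set(declared)

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name not in declared:
            return match.group(0)  # undeclared: pass through unchanged
        if name in values:
            return str(values[name])
        if default is not None:
            return default
        # Assumption: a declared variable with no value and no default is an error.
        raise ValueError(f"no value or default for declared variable {name!r}")

    return PLACEHOLDER.sub(substitute, content)
```

For example, rendering "Tone: {{tone | friendly}}" with tone declared but not supplied falls back to "friendly".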

Environments & Deployments

Each prompt has one active deployment per environment (dev, staging, prod). Deploying atomically replaces the current deployment and stores previous_version_id for rollback.

POST /v1/prompts/:id/deploy
{ "version_id": "uuid-of-published-version", "env": "prod" }

Rollback

One-step rollback restores the previous version. The previous_version_id stored at deploy time makes this O(1).

POST /v1/prompts/:id/rollback
{ "env": "prod" }
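The deploy and rollback semantics can be modelled in memory as a single row per prompt and environment. This is a minimal sketch under assumptions: the class and method names are hypothetical, and clearing previous_version_id after a rollback (so a second rollback has nothing to restore) is one possible reading of "one-step".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    version_id: str
    previous_version_id: Optional[str] = None


class DeploymentTable:
    """One active deployment per (prompt_id, env)."""

    def __init__(self):
        self._rows = {}

    def deploy(self, prompt_id: str, env: str, version_id: str) -> Deployment:
        current = self._rows.get((prompt_id, env))
        # Replace the row and remember what it pointed at before.
        row = Deployment(version_id, current.version_id if current else None)
        self._rows[(prompt_id, env)] = row
        return row

    def rollback(self, prompt_id: str, env: str) -> Deployment:
        current = self._rows.get((prompt_id, env))
        if current is None or current.previous_version_id is None:
            raise LookupError("nothing to roll back to")
        # Single lookup, single write: no deployment history is scanned.
        row = Deployment(current.previous_version_id, None)  # assumption: one step only
        self._rows[(prompt_id, env)] = row
        return row
```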

Running Evals

Evals test a prompt version against a set of cases before it reaches production. Each case defines an input variable map and one or more assertions about the expected output.

Assertion types

Type	Description
pass_fail	Boolean pass/fail check on output
json_schema	Validates output against JSON Schema
regex	Output matches a regex pattern
semantic	Semantic similarity via embedding

gate_publish

When gate_publish: true is set on an eval suite, publishing a prompt version is blocked if any eval case fails.
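The gating rule can be illustrated with a toy runner. The dictionary shapes, the check callable used for pass_fail, and the function names below are all hypothetical; json_schema and semantic assertions are left out because they need external libraries or an embedding model.

```python
import re


def check(assertion: dict, output: str) -> bool:
    """Evaluate one assertion against a model output."""
    kind = assertion["type"]
    if kind == "regex":
        return re.search(assertion["pattern"], output) is not None
    if kind == "pass_fail":
        return bool(assertion["check"](output))  # hypothetical callable field
    raise NotImplementedError(f"assertion type not covered here: {kind}")


def run_suite(suite: dict, complete) -> list:
    """Run every case; return the names of the cases that failed.

    complete(variables) stands in for rendering the prompt and calling the model.
    """
    failures = []
    for case in suite["cases"]:
        output = complete(case["variables"])
        if not all(check(a, output) for a in case["assertions"]):
            failures.append(case["name"])
    return failures


def can_publish(suite: dict, failures: list) -> bool:
    """Publishing is blocked only when gate_publish is set and a case failed."""
    return not (suite.get("gate_publish") and failures)
```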

Policy Engine

Policies intercept gateway requests and are evaluated in priority order. Supported types are pii_detection, model_restriction, spend_limit, and time_restriction. Each policy's action is either block or log-only.
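As a mental model, policy evaluation can be sketched as a sorted loop. Several details here are assumptions rather than documented behaviour: that a lower priority number runs first, that a matching block policy stops evaluation, and that log-only policies let the request continue. The Policy fields and the matches callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Policy:
    name: str
    priority: int
    action: str  # "block" or "log-only"
    matches: Callable[[dict], bool]


def evaluate(policies, request: dict):
    """Return (allowed, names_of_matching_policies)."""
    matched = []
    for policy in sorted(policies, key=lambda p: p.priority):  # assumption: low first
        if not policy.matches(request):
            continue
        matched.append(policy.name)
        if policy.action == "block":
            return False, matched  # assumption: first block ends evaluation
    return True, matched
```

A model_restriction policy, for instance, could be expressed as a matcher that fires when the requested model is outside an allowed set.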

GitOps — GitHub Integration

Connect your GitHub repository to NEXUS. A push to the configured branch triggers the pipeline.

1. In the portal, open GitOps → Settings and configure your repo and branch.
2. Copy the webhook secret generated by NEXUS.
3. In GitHub, go to Settings → Webhooks → Add webhook. Set the payload URL to https://runtime.nexusai.run/v1/webhooks/gitops/<tenant-slug> and paste the secret.
4. Select "Just the push event" and save.
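For context on what the secret is used for: GitHub signs each delivery by computing an HMAC-SHA256 of the raw request body with the webhook secret and sending it in the X-Hub-Signature-256 header as sha256=<hex digest>. The sketch below reproduces that calculation, which can help when replaying a delivery by hand. How NEXUS validates the header internally is not documented here, and the function names are illustrative.

```python
import hashlib
import hmac


def github_signature(secret: str, body: bytes) -> str:
    """Compute the value GitHub sends in X-Hub-Signature-256."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def signature_is_valid(secret: str, body: bytes, header_value: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    return hmac.compare_digest(github_signature(secret, body), header_value)
```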

Pipeline steps

1. validate        → version exists and is published
2. deploy_prompt   → upserts prompt_deployments, stores previous_version_id
3. verify          → queries prompt_deployments to confirm new version

Each step records start time, completion time, and output. The webhook returns 202 immediately — the pipeline runs asynchronously.
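The three steps and their per-step records can be mimicked with an in-memory runner. This is only an illustration: the context dictionary, the status field, and stopping at the first failed step are assumptions added for the example, and the real pipeline works against the prompt_deployments table rather than a dict.

```python
from datetime import datetime, timezone


def validate(ctx):
    version = ctx["versions"].get(ctx["version_id"])
    if version is None or version["status"] != "published":
        raise ValueError("version missing or not published")
    return "version is published"


def deploy_prompt(ctx):
    key = (ctx["prompt_id"], ctx["env"])
    previous = ctx["deployments"].get(key, {}).get("version_id")
    ctx["deployments"][key] = {
        "version_id": ctx["version_id"],
        "previous_version_id": previous,
    }
    return f"previous_version_id={previous}"


def verify(ctx):
    deployed = ctx["deployments"][(ctx["prompt_id"], ctx["env"])]["version_id"]
    if deployed != ctx["version_id"]:
        raise RuntimeError("deployment does not point at the new version")
    return "deployment confirmed"


STEPS = [("validate", validate), ("deploy_prompt", deploy_prompt), ("verify", verify)]


def run_pipeline(steps, ctx):
    """Run steps in order, recording start time, completion time, and output."""
    records = []
    for name, step in steps:
        record = {"step": name, "started_at": datetime.now(timezone.utc)}
        try:
            record["output"] = step(ctx)
            record["status"] = "succeeded"
        except Exception as exc:
            record["output"] = str(exc)
            record["status"] = "failed"
        record["completed_at"] = datetime.now(timezone.utc)
        records.append(record)
        if record["status"] == "failed":
            break  # assumption: later steps are skipped after a failure
    return records
```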

API Reference — Gateway

POST /v1/chat/completions

Proxy a chat completion to the best available provider. Requires Authorization: Bearer <key> with completions:write scope.

GET /v1/gateway/health

Returns circuit breaker state and configured providers. No authentication required.

API Reference — Prompt Registry

GET /v1/prompts

List all prompts. Supports ?page=1&limit=50 pagination.
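A client can walk the paginated list with a simple generator. In this sketch, fetch_page is a hypothetical callable that performs the GET for a given page and limit and returns that page's prompts as a list; treating a page shorter than limit as the last page is an assumption, since the response envelope is not described here.

```python
def iter_prompts(fetch_page, limit=50):
    """Yield prompts across pages, starting at page 1."""
    page = 1
    while True:
        items = fetch_page(page, limit)
        yield from items
        if len(items) < limit:  # assumption: a short page marks the end
            return
        page += 1
```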

POST /v1/prompts

Create a new prompt. Body: { slug, name, description }.

POST /v1/prompts/:id/versions

Create a new version (draft). Body: { semver, content, variables[] }.

POST /v1/prompts/:id/versions/:vId/publish

Publish a draft version, making it immutable and deployable.

POST /v1/prompts/:id/render

Render a prompt for a given environment with variable substitution.

POST /v1/prompts/:id/deploy

Deploy a published version to an environment.

POST /v1/prompts/:id/rollback

Roll back an environment to the previous deployed version.

API Reference — Keys

GET /v1/keys

List API keys. Returns prefix, scope, and creation date. Never returns the full key.

POST /v1/keys

Create a new API key. Body: { name, scopes[] }. Returns the full key once — store it immediately.

DELETE /v1/keys/:id

Revoke an API key immediately. All in-flight requests using that key fail with 401.