NEXUS Runtime is the reliability control plane and AI Runtime Operations Platform for teams shipping LLM-powered features to production. It sits between your application and your AI providers: OpenAI, Anthropic, Google Gemini, AWS Bedrock, Mistral AI, Cohere, Groq, and DeepSeek. On top of those providers it adds routing, failover, a Reliability Score, root cause analysis, observability, policy enforcement, prompt repository workflows, AI Release Manager releases, AI change management, and versioned prompt management.
Point your existing OpenAI-compatible client at https://runtime.nexusai.run/v1. No other code changes are required.
In the portal, navigate to Gateway → Providers. Add your provider credential for OpenAI, Anthropic, Google Gemini, AWS Bedrock, Mistral AI, Cohere, Groq, or DeepSeek. NEXUS stores provider credentials centrally so application teams do not spread raw provider keys across services.
Navigate to API Keys → New key. Select the completions:write scope. Copy the full key — it is shown only once at creation time. The key prefix (first 8 characters) is displayed in the portal for identification.
curl https://runtime.nexusai.run/v1/chat/completions \
-H "Authorization: Bearer nxk_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{ "role": "user", "content": "Hello" }]
}'
The response includes routing metadata in headers:
x-nexus-provider: openai
x-nexus-attempt: 1
x-nexus-latency-ms: 342
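The same request can be built from Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_chat_request`, `send`, and `routing_metadata` are made up for illustration, and the sketch assumes the three `x-nexus-*` headers shown above are present on a successful response.

```python
import json
import urllib.request

BASE_URL = "https://runtime.nexusai.run/v1"


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a POST to the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def routing_metadata(headers):
    """Extract the x-nexus-* routing headers from a header mapping."""
    lowered = {key.lower(): value for key, value in headers.items()}

    def as_int(name):
        # Missing headers become None instead of raising.
        return int(lowered[name]) if name in lowered else None

    return {
        "provider": lowered.get("x-nexus-provider"),
        "attempt": as_int("x-nexus-attempt"),
        "latency_ms": as_int("x-nexus-latency-ms"),
    }


def send(request):
    """Send the request and return (parsed JSON body, routing metadata)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read()), routing_metadata(response.headers)
```

Calling `send(build_chat_request("nxk_your_key_here", "gpt-4o", [{"role": "user", "content": "Hello"}]))` performs the network call; the two builder functions are side-effect free, so they can be unit-tested without a key.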
A tenant is an isolated workspace. All resources — providers, API keys, prompts, policies — are scoped to a tenant. You can be a member of multiple tenants.
NEXUS API keys authenticate requests to the gateway. Keys have scopes that restrict what operations they can perform. Only completions:write is required for basic gateway use.
A provider is a configured AI backend such as OpenAI, Anthropic, Gemini, Bedrock, Mistral AI, Cohere, Groq, or DeepSeek. NEXUS routes requests to providers based on the requested model. Each provider has its own circuit breaker state.
All gateway requests require a NEXUS API key as a Bearer token:
Authorization: Bearer nxk_your_key_here
Session-based authentication (Google OAuth, GitHub OAuth) is used for portal UI access only. The /v1/chat/completions endpoint accepts only API key auth.
NEXUS proxies the OpenAI chat completions format. Any client that supports a custom base URL works without modification.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | required | Model identifier (e.g. gpt-4o, claude-opus-4-7) |
| messages | array | required | OpenAI-format messages array |
| stream | boolean | optional | Enable SSE streaming. Default: false |
| max_tokens | integer | optional | Maximum tokens in the completion |
| temperature | number | optional | Sampling temperature 0–2 |
| Parameter | Default | Description |
|---|---|---|
| Failure threshold | 5 | Consecutive 5xx/timeout errors before opening |
| Open window | 30 s | How long the circuit stays open |
| Success threshold | 2 | Consecutive successes in half-open to close |
| Trip statuses | 500, 502, 503, 504 | HTTP codes that increment the failure counter |
| Non-trip statuses | 4xx | Never trip the circuit |
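The defaults in the table can be read as a small state machine. The sketch below is an illustration of that reading, not the NEXUS implementation: the class and method names are hypothetical, and two behaviors are assumptions the table does not state (a failure while half-open reopens the circuit, and a 4xx response leaves the counters untouched).

```python
import time


class CircuitBreaker:
    """Closed -> open -> half-open breaker using the documented defaults."""

    TRIP_STATUSES = {500, 502, 503, 504}

    def __init__(self, failure_threshold=5, open_window_s=30.0,
                 success_threshold=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_window_s = open_window_s
        self.success_threshold = success_threshold
        self.clock = clock  # injectable so tests do not need to sleep
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = None

    def allow_request(self):
        """Return True if a request may be sent to this provider."""
        if self.state != "open":
            return True
        if self.clock() - self.opened_at >= self.open_window_s:
            # Open window elapsed: let trial requests through.
            self.state = "half_open"
            self.successes = 0
            return True
        return False

    def record(self, status=None, timed_out=False):
        """Update breaker state from one provider response."""
        if timed_out or status in self.TRIP_STATUSES:
            self._on_failure()
        elif status is not None and 400 <= status < 500:
            return  # 4xx never trips; assumption: counters stay as they are
        else:
            self._on_success()

    def _on_failure(self):
        if self.state == "half_open":
            self._open()  # assumption: one failed trial reopens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _on_success(self):
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0  # the failure count is for consecutive errors

    def _open(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

Passing a fake `clock` makes the 30 s open window testable without waiting.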
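When a provider's circuit is open, NEXUS automatically routes to the next available provider by weight. If a provider returns a retryable status (408, 429, 500, 502, 503, 504), NEXUS retries the request on the next provider before failing it. Of these, only the 5xx codes count toward the circuit breaker; 408 and 429 are retried but, like every 4xx, never trip the circuit.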
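The failover rule can be sketched as a loop over candidate providers. This is a hedged illustration: `route_with_failover`, `send`, and `is_open` are hypothetical names, and "by weight" is interpreted here as trying providers in descending weight order, which the text does not spell out.

```python
RETRYABLE_STATUSES = {408, 429, 500, 502, 503, 504}


def route_with_failover(providers, send, is_open):
    """Try providers in descending weight order, skipping open circuits.

    providers: list of (name, weight) tuples
    send:      callable(name) -> (status, body)
    is_open:   callable(name) -> True if that provider's circuit is open
    """
    candidates = sorted(
        (p for p in providers if not is_open(p[0])),
        key=lambda p: p[1],
        reverse=True,  # assumption: highest weight is tried first
    )
    last = None
    for attempt, (name, _weight) in enumerate(candidates, start=1):
        status, body = send(name)
        last = {"provider": name, "attempt": attempt,
                "status": status, "body": body}
        if status not in RETRYABLE_STATUSES:
            return last  # success or a non-retryable error: stop here
    if last is None:
        raise RuntimeError("no provider available: every circuit is open")
    return last  # every candidate returned a retryable status
```

The returned `provider` and `attempt` fields mirror the `x-nexus-provider` and `x-nexus-attempt` response headers.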
Set "stream": true in the request body. NEXUS pipes the provider's SSE stream directly to the client without buffering. Circuit breaker state is updated based on the initial response status.
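On the client side, a streamed response arrives as `data:` lines. The sketch below assumes the stream follows the OpenAI chunk format (JSON chunks carrying `choices[].delta.content`, terminated by `data: [DONE]`); since NEXUS pipes the provider's stream through unmodified, a provider with a different format would need a different parser. The function name `iter_stream_text` is hypothetical.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

An HTTP response object that iterates line by line can be passed straight in, for example `"".join(iter_stream_text(response))`.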
Prompts are versioned text templates with declared variables. Create a prompt with a slug and name, then add versions as you iterate.
POST /v1/prompts
{ "slug": "support-reply", "name": "Support Reply" }
Every change creates a new version (MAJOR.MINOR.PATCH). Versions start as draft (editable), then get published (immutable). Only one draft can exist per prompt at a time — enforced by a partial unique index in Postgres.
POST /v1/prompts/:id/versions
{
"semver": "1.2.0",
"content": "You are a support agent for {{company}}.\n\nTicket: {{ticket_content}}",
"variables": [
{ "name": "company" },
{ "name": "ticket_content" }
]
}
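The one-draft-per-prompt rule relies on a partial unique index. The sketch below demonstrates the idea with Python's built-in `sqlite3` as a stand-in for Postgres (both accept `CREATE UNIQUE INDEX ... WHERE`). The table name, column names, and index name are assumptions; the actual NEXUS schema is not shown in this document.

```python
import sqlite3

# Hypothetical schema: one row per prompt version.
SCHEMA = """
CREATE TABLE prompt_versions (
    id        INTEGER PRIMARY KEY,
    prompt_id TEXT NOT NULL,
    semver    TEXT NOT NULL,
    status    TEXT NOT NULL CHECK (status IN ('draft', 'published'))
);
-- Uniqueness applies only to rows where status = 'draft'.
CREATE UNIQUE INDEX one_draft_per_prompt
    ON prompt_versions (prompt_id)
    WHERE status = 'draft';
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def add_draft(db, prompt_id, semver):
    """Insert a draft; raises sqlite3.IntegrityError if one already exists."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO prompt_versions (prompt_id, semver, status) "
            "VALUES (?, ?, 'draft')",
            (prompt_id, semver),
        )


def publish_draft(db, prompt_id):
    """Flip the prompt's draft to published, freeing the draft slot."""
    with db:
        db.execute(
            "UPDATE prompt_versions SET status = 'published' "
            "WHERE prompt_id = ? AND status = 'draft'",
            (prompt_id,),
        )
```

Because the constraint lives in the database, two concurrent requests cannot both create a draft for the same prompt, whatever the application code does.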
The renderer resolves {{varName}} and {{varName | default}} patterns. Undeclared variables pass through unchanged.
POST /v1/prompts/:id/render
{ "env": "prod", "variables": { "company": "Acme", "ticket_content": "Order missing" } }
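The substitution rules can be approximated with a single regular expression. This is a sketch of the described behavior, not the NEXUS renderer: `render` and its arguments are hypothetical, and leaving a declared variable untouched when it has neither a value nor a default is an assumption.

```python
import re

# Matches {{name}} and {{name | default}}, tolerating inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*(?:\|\s*(.*?)\s*)?\}\}")


def render(template, declared, values):
    """Substitute declared variables; leave everything else as written.

    template: prompt content containing {{...}} placeholders
    declared: set of variable names declared on the version
    values:   mapping of variable name -> value supplied at render time
    """
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name not in declared:
            return match.group(0)  # undeclared: pass through unchanged
        if name in values:
            return str(values[name])
        if default is not None:
            return default
        return match.group(0)  # assumption: no value and no default

    return PLACEHOLDER.sub(substitute, template)
```

For example, rendering `"Hi {{company}}, tone: {{tone | friendly}}"` with `declared={"company", "tone"}` and `values={"company": "Acme"}` fills in the company and falls back to the default tone.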
Each prompt has one active deployment per environment (dev, staging, prod). Deploying atomically replaces the current deployment and stores previous_version_id for rollback.
POST /v1/prompts/:id/deploy
{ "version_id": "uuid-of-published-version", "env": "prod" }
One-step rollback restores the previous version. The previous_version_id stored at deploy time makes this O(1).
POST /v1/prompts/:id/rollback
{ "env": "prod" }
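The deploy and rollback semantics can be modeled with one record per prompt and environment. The in-memory sketch below is illustrative only: the `Deployments` class is hypothetical, and clearing `previous_version_id` after a rollback (so that only one step back is possible) is an assumption about what "one-step" means.

```python
ENVIRONMENTS = ("dev", "staging", "prod")


class Deployments:
    """One active deployment per (prompt, environment)."""

    def __init__(self):
        # (prompt_id, env) -> {"version_id": ..., "previous_version_id": ...}
        self._rows = {}

    def deploy(self, prompt_id, env, version_id):
        """Replace the active version and remember the one it replaced."""
        if env not in ENVIRONMENTS:
            raise ValueError(f"unknown environment: {env}")
        current = self._rows.get((prompt_id, env))
        self._rows[(prompt_id, env)] = {
            "version_id": version_id,
            "previous_version_id": current["version_id"] if current else None,
        }

    def rollback(self, prompt_id, env):
        """Restore the stored previous version with a single lookup."""
        row = self._rows.get((prompt_id, env))
        if row is None or row["previous_version_id"] is None:
            raise LookupError("nothing to roll back to")
        self._rows[(prompt_id, env)] = {
            "version_id": row["previous_version_id"],
            "previous_version_id": None,  # assumption: only one step back
        }

    def active(self, prompt_id, env):
        """Return the active version id, or None if never deployed."""
        row = self._rows.get((prompt_id, env))
        return row["version_id"] if row else None
```

Rollback never scans deployment history; it reads the single stored `previous_version_id`, which is why its cost does not grow with the number of past deployments.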
Evals test a prompt version against a set of cases before it reaches production. Each case defines an input variable map and one or more assertions about the expected output.
| Type | Description |
|---|---|
| pass_fail | Boolean pass/fail check on output |
| json_schema | Validates output against JSON Schema |
| regex | Output matches a regex pattern |
| semantic | Semantic similarity via embedding |
When gate_publish: true is set on an eval suite, publishing a prompt version is blocked if any eval case fails.
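The gating rule can be sketched as follows. The suite and case shapes used here are hypothetical (the document does not show the eval schema), only the `regex` and `pass_fail` assertion types are implemented, and `pass_fail` is modeled as a caller-supplied predicate, which is an assumption.

```python
import re


def check_assertion(assertion, output):
    """Return True if the output satisfies one assertion."""
    kind = assertion["type"]
    if kind == "regex":
        return re.search(assertion["pattern"], output) is not None
    if kind == "pass_fail":
        return bool(assertion["check"](output))
    # json_schema and semantic need a validator or an embedding model.
    raise NotImplementedError(f"assertion type not covered here: {kind}")


def can_publish(suite, outputs):
    """Return (allowed, failed_case_names) for one eval suite.

    suite:   {"gate_publish": bool, "cases": [{"name", "assertions"}]}
    outputs: mapping of case name -> model output for that case
    """
    failed = [
        case["name"]
        for case in suite["cases"]
        if not all(check_assertion(a, outputs[case["name"]])
                   for a in case["assertions"])
    ]
    blocked = bool(suite.get("gate_publish")) and bool(failed)
    return (not blocked, failed)
```

With `gate_publish` unset or false, failures are still reported but do not block publishing.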
Policies intercept gateway requests in priority order. Types: pii_detection, model_restriction, spend_limit, time_restriction. Action: block or log-only.
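Priority-ordered evaluation might look like the sketch below. Everything about the policy shape is an assumption made for illustration: the field names (`priority`, `action`, `allowed_models`, `limit_usd`), the convention that a lower priority number runs first, and the request fields. Only two of the four policy types are modeled.

```python
def violates(policy, request):
    """Return True if the request breaks the given policy."""
    kind = policy["type"]
    if kind == "model_restriction":
        return request["model"] not in policy["allowed_models"]
    if kind == "spend_limit":
        projected = request["spend_usd"] + request["estimated_cost_usd"]
        return projected > policy["limit_usd"]
    # pii_detection and time_restriction are out of scope for this sketch.
    raise NotImplementedError(f"policy type not covered here: {kind}")


def evaluate_policies(policies, request):
    """Apply policies in priority order; stop at the first blocking one."""
    logged = []
    # Assumption: a lower priority number is evaluated first.
    for policy in sorted(policies, key=lambda p: p["priority"]):
        if not violates(policy, request):
            continue
        if policy["action"] == "block":
            return {"allowed": False, "blocked_by": policy["name"],
                    "logged": logged}
        logged.append(policy["name"])  # log-only: record and continue
    return {"allowed": True, "blocked_by": None, "logged": logged}
```

A log-only policy never stops a request, so it can be used to observe the effect of a rule before switching its action to block.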
Connect your GitHub repository to NEXUS. A push to the configured branch triggers the pipeline.
In your GitHub repository's webhook settings, point the webhook at https://runtime.nexusai.run/v1/webhooks/gitops/<tenant-slug> and paste the secret.
The pipeline runs three steps:
1. validate → version exists and is published
2. deploy_prompt → upserts prompt_deployments, stores previous_version_id
3. verify → queries prompt_deployments to confirm new version
Each step records start time, completion time, and output. The webhook returns 202 immediately — the pipeline runs asynchronously.
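The accept-then-run pattern can be sketched with a background executor. This is an illustration under stated assumptions: `run_pipeline` and `handle_webhook` are hypothetical names, the `status` field and the stop-on-first-failure rule are not described in the text, and the step bodies are supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

_executor = ThreadPoolExecutor(max_workers=1)


def run_pipeline(steps):
    """Run (name, callable) steps in order, recording timing and output."""
    records = []
    for name, step in steps:
        started_at = datetime.now(timezone.utc)
        try:
            output, status = step(), "succeeded"
        except Exception as exc:  # keep the error text as the step output
            output, status = str(exc), "failed"
        records.append({
            "step": name,
            "status": status,
            "started_at": started_at,
            "completed_at": datetime.now(timezone.utc),
            "output": output,
        })
        if status == "failed":
            break  # assumption: later steps are skipped after a failure
    return records


def handle_webhook(steps):
    """Queue the pipeline and return immediately with HTTP status 202."""
    future = _executor.submit(run_pipeline, steps)
    return 202, future
```

The caller receives 202 before any step has run; `future.result()` later yields the per-step records for validate, deploy_prompt, and verify.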
Proxy a chat completion to the best available provider. Requires Authorization: Bearer <key> with completions:write scope.
Returns circuit breaker state and configured providers. No authentication required.
List all prompts. Supports ?page=1&limit=50 pagination.
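A client can walk the `page` and `limit` parameters until a short page comes back. The sketch below is transport-agnostic: `fetch_page` is a hypothetical callable that performs the HTTP request and returns the list of items, and treating a page shorter than `limit` as the last page is an assumption, since the response shape is not documented here.

```python
def iter_all(fetch_page, limit=50):
    """Yield every item across pages, starting at page=1.

    fetch_page: callable(page, limit) -> list of items for that page
    """
    page = 1
    while True:
        items = fetch_page(page, limit)
        yield from items
        if len(items) < limit:
            return  # assumption: a short (or empty) page is the last one
        page += 1
```

Because it is a generator, callers can stop early without fetching the remaining pages.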
Create a new prompt. Body: { slug, name, description }.
Create a new version (draft). Body: { semver, content, variables[] }.
Publish a draft version, making it immutable and deployable.
Render a prompt for a given environment with variable substitution.
Deploy a published version to an environment.
Roll back an environment to the previous deployed version.
List API keys. Returns prefix, scope, and creation date. Never returns the full key.
Create a new API key. Body: { name, scopes[] }. Returns the full key once — store it immediately.
Revoke an API key immediately. All in-flight requests using that key fail with 401.