How does the NEXUS AI gateway circuit breaker work?

Each provider has an independent circuit breaker with three states: closed (normal routing), open (provider bypassed for 30 seconds), and half-open (probing with one request). The circuit opens after 5 consecutive 5xx errors or timeouts. Two consecutive successes in half-open close it. 4xx errors never trip the circuit — they are caller errors, not provider failures.

What is semver prompt versioning in NEXUS?

NEXUS prompt versions follow semantic versioning (MAJOR.MINOR.PATCH). Versions are immutable once published — no edits allowed after publish. Only one draft can exist per prompt at a time (enforced by a partial unique index in Postgres). Published versions can be deployed to dev, staging, and prod environments independently.

What assertion types does NEXUS evals support?

NEXUS evals support four assertion types: pass_fail (boolean check), json_schema (validates output against a JSON Schema definition), regex (pattern match against output), and semantic (embedding-based similarity check against expected text). When gate_publish is enabled, publishing a prompt version is blocked if any eval case fails.

How does the NEXUS GitOps pipeline work?

The NEXUS GitOps pipeline triggers on GitHub push webhooks (validated with HMAC-SHA256). It runs three steps asynchronously: validate (checks the prompt version exists and is published), deploy_prompt (upserts the prompt_deployments record, storing previous_version_id for rollback), and verify (queries the deployment record to confirm success). The HTTP response returns 202 immediately — the pipeline runs fire-and-forget.

NEXUS Runtime Features — AI Gateway, Prompt Registry, Reliability & GitOps

AI Gateway

Route any LLM request through one endpoint with automatic failover.

How it works NEXUS proxies the OpenAI chat completions format. Point your client's base URL to https://runtime.nexusai.run/v1 and replace your provider key with a NEXUS API key. No other code changes required — every OpenAI-compatible client works as-is. Streaming SSE is piped directly without buffering.

Circuit breaker per provider

Three states: closed → open → half-open. Opens after 5 consecutive 5xx errors or timeouts. 4xx errors never trip it.

Automatic failover

When a circuit opens, requests route to the next available provider by weight. Retries on 408, 429, 5xx before failing.

OpenAI-compatible API

Works with any client that supports a custom base URL — LangChain, Vercel AI SDK, raw fetch, and more.

POST /v1/chat/completions
Authorization: Bearer nxk_your_key_here

Response headers:
  x-nexus-provider: anthropic   ← which provider handled it
  x-nexus-attempt:  2           ← openai was tried first (circuit open)
  x-nexus-latency-ms: 487

Prompt Registry

Version prompts like code. Deploy to environments. Roll back in one step.

Semver lifecycle Prompts have semantic versions (MAJOR.MINOR.PATCH). A version starts as a draft (editable), then gets published (immutable). Only one draft can exist per prompt at a time — enforced by a partial unique index in Postgres. Published versions can be deployed independently to dev, staging, or prod.

Immutable published versions

Once published, a version cannot be edited. All changes require a new semver bump.

Per-environment deployments

dev, staging, and prod are independent. Promote from staging to prod without touching dev.

One-step rollback

previous_version_id is stored at deploy time — rollback is O(1) with no history scan.

POST /v1/prompts/:id/render
{ "env": "prod", "variables": { "company": "Acme", "ticket": "Order missing" } }

→ resolves {{company}} and {{ticket | no ticket provided}}
→ logs SHA-256 of variable map for deduplication

Evals

Test prompts against real cases before they reach production.

gate_publish When gate_publish: true is set on an eval suite, publishing a prompt version is blocked if any case fails. This enforces quality gates in the CI/CD pipeline without manual review steps.

01 pass_fail Boolean check on the output — non-empty, contains expected substring, or any custom condition.

02 json_schema Validates the output against a JSON Schema definition. Catches missing fields, wrong types, and extra keys.

03 regex Pattern match against the full output string. Useful for format checks like email addresses or JSON structure.

04 semantic Embedding-based similarity check against an expected text. Catches meaning regressions without exact-match brittleness.

05 gate_publish Block prompt version publish if any eval case in the suite fails. Enforces quality gates automatically.

06 Per-case results Full output excerpt and pass/fail per case stored in history. Queryable for regression tracking across versions.

Policy Engine

Intercept and enforce rules on every gateway request.

How policies work Policies execute in priority order (lower number = higher priority) before the request reaches any provider. Each policy has a type, an action (block or log-only), and configuration. Multiple policies compose — a PII rule and a model restriction rule can both be active simultaneously.

PII detection

Scans request content for credit cards, SSNs, emails, and phone numbers. Block or log-only.

Model restriction

Allowlist specific models per tenant or API key. Requests for blocked models return 403.

Spend limits

Block requests when daily or monthly spend exceeds a threshold. Tracked via the daily rollup table.

Observability

Real metrics, not guesses. Every request logged and queryable.

Per-provider latency

p50, p95, p99 latency per provider. See which model is slowing down the p99 before users notice.

Reliability Score

Composite 0-100 score from gateway availability, latency stability, policy friction, prompt eval health, and deployment health.

Token spend

Daily cost trend by model and provider. Budget alerts fire at configurable thresholds (80%, 100%).

Circuit breaker state

Real-time per-provider circuit state (closed / open / half-open) polled every 30 seconds in the portal.

Error rates

Error breakdown by HTTP status code. Retryable vs non-retryable distinguished in all dashboards.

Request logs

Every gateway request logged with provider, model, latency, tokens, cost. Partitioned by day, 90-day retention.

Prompt render logs

Every render logged with a SHA-256 of the variable map for deduplication. Partitioned by month.

Root cause analysis

Correlates gateway errors, provider timeouts, rate limits, policy blocks, eval failures, and deployment failures into clear failure explanations.

GitOps CI/CD

Merge to main. Prompt deploys automatically.

3-step pipeline A GitHub push webhook triggers a fire-and-forget pipeline: validate (prompt version exists and is published), deploy_prompt (upserts prompt_deployments, stores previous_version_id), verify (re-queries the record to confirm). Each step records start time, completion, and output. The webhook returns 202 before any pipeline logic runs.

POST /v1/webhooks/gitops/:tenantSlug
X-Hub-Signature-256: sha256=<hmac>    ← validated before any logic

Pipeline steps (async, fire-and-forget):
  1. validate        → version exists, status = 'published'
  2. deploy_prompt   → INSERT ... ON CONFLICT DO UPDATE
  3. verify          → SELECT confirms new version_id in place

HMAC-SHA256 webhook validation

GitHub push webhooks validated with the shared secret before any pipeline logic executes.

Interactive run detail

Click any pipeline run to see step-by-step progress, output, duration, branch, and commit SHA in a slide-in drawer.

Auto-refresh for active runs

The detail drawer polls every 3 seconds for queued or running pipelines. Stops when the run completes.

Cost Governance

Know exactly what you're spending before the bill arrives.

Daily spend rollup

Per-tenant daily spend tracked in tenant_daily_spend, updated after every completed request. Queryable for dashboards and used by budget alerts and spend-limit policies.

Budget alerts

Email alerts fire when spend crosses configurable thresholds (e.g. 80%, 100% of monthly budget). Background checker polls budget_alerts every 5 minutes.

Quota enforcement

Per-key monthly request quotas enforced at the gateway by enforceQuota() middleware. Exceeding the limit returns HTTP 429 with QUOTA_EXCEEDED before the request reaches any provider.

Security & RBAC

Scoped API keys, tenant isolation, and role-based access control.

API key security NEXUS stores only a SHA-256 hash of each API key — the plaintext is shown once at creation and never stored. The key prefix (first 8 characters) is displayed in the portal for identification. Keys have scopes (completions:write) that restrict what operations they can perform.

Admin Full access — manage providers, keys, policies, members, and billing settings.

Developer Manage prompts, evals, agents, and GitOps pipelines. Cannot manage tenant membership or billing.

Viewer Read-only access to dashboards, run logs, and prompt versions. No write operations.

API Key scope: completions:write Allows calling POST /v1/chat/completions. Keys without this scope return 403 on gateway requests.

FAQ

Common questions about NEXUS features.

How does the circuit breaker know when a provider recovers?

After the 30-second open window, the circuit moves to half-open and allows one probe request. If that request succeeds, it counts as the first of two required successes. A second consecutive success closes the circuit and restores normal routing. A failure in half-open re-opens the circuit for another 30 seconds.

Can I use NEXUS with LangChain or Vercel AI SDK?

Yes. Both support a custom base URL. Set the base URL to https://runtime.nexusai.run/v1 and use your NEXUS API key. No other changes are needed — NEXUS accepts the full OpenAI chat completions request format including streaming.

What happens if a GitOps pipeline step fails?

The failed step records its error output and marks the run as failed. Subsequent steps do not execute. The prompt deployment is not changed if the validate or deploy_prompt steps fail. You can re-trigger a run from the GitOps page after fixing the underlying issue.

Are prompt versions really immutable?

Yes. Once a version is published, its content cannot be changed. This is enforced at the API level and by the database — there is no admin override to edit published versions. This ensures that what you deployed to production today is provably identical to what you tested.

Everything you need to shipAI to production