Circuit breaker per provider
Three states: closed → open → half-open. Opens after 5 consecutive 5xx errors or timeouts. 4xx errors never trip it.
Open Console
Gateway failover, versioned prompt releases, eval quality gates, policy enforcement, full observability — all in one platform.
https://runtime.nexusai.run/v1 and replace your provider key with a NEXUS API key. No other code changes required — every OpenAI-compatible client works as-is. Streaming SSE is piped directly without buffering.
Three states: closed → open → half-open. Opens after 5 consecutive 5xx errors or timeouts. 4xx errors never trip it.
When a circuit opens, requests route to the next available provider by weight. Retries on 408, 429, 5xx before failing.
Works with any client that supports a custom base URL — LangChain, Vercel AI SDK, raw fetch, and more.
POST /v1/chat/completions Authorization: Bearer nxk_your_key_here Response headers: x-nexus-provider: anthropic ← which provider handled it x-nexus-attempt: 2 ← openai was tried first (circuit open) x-nexus-latency-ms: 487
Once published, a version cannot be edited. All changes require a new semver bump.
dev, staging, and prod are independent. Promote from staging to prod without touching dev.
previous_version_id is stored at deploy time — rollback is O(1) with no history scan.
POST /v1/prompts/:id/render
{ "env": "prod", "variables": { "company": "Acme", "ticket": "Order missing" } }
→ resolves {{company}} and {{ticket | no ticket provided}}
→ logs SHA-256 of variable map for deduplication
gate_publish: true is set on an eval suite, publishing a prompt version is blocked if any case fails. This enforces quality gates in the CI/CD pipeline without manual review steps.
Scans request content for credit cards, SSNs, emails, and phone numbers. Block or log-only.
Allowlist specific models per tenant or API key. Requests for blocked models return 403.
Block requests when daily or monthly spend exceeds a threshold. Tracked via the daily rollup table.
p50, p95, p99 latency per provider. See which model is slowing down the p99 before users notice.
Composite 0-100 score from gateway availability, latency stability, policy friction, prompt eval health, and deployment health.
Daily cost trend by model and provider. Budget alerts fire at configurable thresholds (80%, 100%).
Real-time per-provider circuit state (closed / open / half-open) polled every 30 seconds in the portal.
Error breakdown by HTTP status code. Retryable vs non-retryable distinguished in all dashboards.
Every gateway request logged with provider, model, latency, tokens, cost. Partitioned by day, 90-day retention.
Every render logged with a SHA-256 of the variable map for deduplication. Partitioned by month.
Correlates gateway errors, provider timeouts, rate limits, policy blocks, eval failures, and deployment failures into clear failure explanations.
POST /v1/webhooks/gitops/:tenantSlug X-Hub-Signature-256: sha256=<hmac> ← validated before any logic Pipeline steps (async, fire-and-forget): 1. validate → version exists, status = 'published' 2. deploy_prompt → INSERT ... ON CONFLICT DO UPDATE 3. verify → SELECT confirms new version_id in place
GitHub push webhooks validated with the shared secret before any pipeline logic executes.
Click any pipeline run to see step-by-step progress, output, duration, branch, and commit SHA in a slide-in drawer.
The detail drawer polls every 3 seconds for queued or running pipelines. Stops when the run completes.
Per-tenant daily spend tracked in tenant_daily_spend, updated after every completed request. Queryable for dashboards and used by budget alerts and spend-limit policies.
Email alerts fire when spend crosses configurable thresholds (e.g. 80%, 100% of monthly budget). Background checker polls budget_alerts every 5 minutes.
Per-key monthly request quotas enforced at the gateway by enforceQuota() middleware. Exceeding the limit returns HTTP 429 with QUOTA_EXCEEDED before the request reaches any provider.
completions:write) that restrict what operations they can perform.
POST /v1/chat/completions. Keys without this scope return 403 on gateway requests.
After the 30-second open window, the circuit moves to half-open and allows one probe request. If that request succeeds, it counts as the first of two required successes. A second consecutive success closes the circuit and restores normal routing. A failure in half-open re-opens the circuit for another 30 seconds.
Yes. Both support a custom base URL. Set the base URL to https://runtime.nexusai.run/v1 and use your NEXUS API key. No other changes are needed — NEXUS accepts the full OpenAI chat completions request format including streaming.
The failed step records its error output and marks the run as failed. Subsequent steps do not execute. The prompt deployment is not changed if the validate or deploy_prompt steps fail. You can re-trigger a run from the GitOps page after fixing the underlying issue.
Yes. Once a version is published, its content cannot be changed. This is enforced at the API level and by the database — there is no admin override to edit published versions. This ensures that what you deployed to production today is provably identical to what you tested.