Beta · Q2 2026

Lower LLM costs. Zero production risk.

Swap one URL. Prompts are never stored. If CostGhost fails, your requests go through unmodified. Fail-open by default.

api.openai.com → gw.costghost.dev

Three steps. One line of code.

No SDK changes. No model mapping. 23 declarative routing rules with a learning cache that adapts to your traffic.

01

Route through CostGhost

Point your existing OpenAI or Anthropic client at our gateway. One base URL change. Your code, prompts, and parameters stay identical.

02

Classify & route at the edge

Every request is classified in under 1 ms. Budget phase, task type, and priority determine the model. 23 rules, 5 budget phases, negligible latency overhead.
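The first-match routing described above can be sketched as an ordered rule table. This is an illustrative sketch only: the rule shapes, model names, and thresholds are assumptions, not CostGhost's actual rule set.

```typescript
// Sketch of first-match-wins routing over rules ordered by specificity.
type Phase = "GREEN" | "YELLOW" | "ORANGE" | "RED" | "HARD_STOP";
type Priority = "low" | "normal" | "critical";

interface Rule {
  phase?: Phase;       // omitted field = matches anything
  taskType?: string;
  priority?: Priority;
  model: string;
}

// Ordered by specificity: the first matching rule wins.
const rules: Rule[] = [
  { priority: "critical", model: "claude-opus-4" },          // best model, guaranteed
  { phase: "RED", model: "claude-haiku-3-5" },               // budget pressure: downgrade
  { taskType: "classification", priority: "low", model: "claude-haiku-3-5" },
  { model: "claude-sonnet-4" },                              // fallback
];

function route(phase: Phase, taskType: string, priority: Priority): string {
  for (const r of rules) {
    if (r.phase && r.phase !== phase) continue;
    if (r.taskType && r.taskType !== taskType) continue;
    if (r.priority && r.priority !== priority) continue;
    return r.model;
  }
  throw new Error("no rule matched");
}
```

Because critical-priority rules sit above budget-phase rules, a critical request keeps the best model even in the RED phase.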

03

Save without losing quality

Low-priority classification? Haiku instead of Opus. Critical reasoning? Best model, guaranteed. You save on requests that never needed the expensive model.

Edge-native. Zero servers.

Built on Cloudflare Workers with Durable Objects for per-tenant state. Deployed across 300+ locations. No cold starts.

                         Your app
   POST /v1/chat/completions · POST /v1/messages (native Anthropic)
                            │
           ┌────────────────┴───────────────────┐
           │  CostGhost Gateway (Hono + Edge)   │
           │  Classify → Route → Forward        │
           └────────────────┬───────────────────┘
                            │
           ┌────────────────┴───────────────────┐
           │  Budget State Machine (DO)         │
           │  GREEN → YELLOW → ORANGE → RED     │
           │  23 rules × 5 phases × task type   │
           └────────────────┬───────────────────┘
             ┌─────────┬────┴─────┬──────────┐
             ▼         ▼          ▼          ▼
         Anthropic   OpenAI    Mistral    Google
<5ms
Routing latency
23
Routing rules
5
Budget phases
13
Models, 4 providers

Built for production. Not for demos.

Every claim on this page is backed by implemented code and automated tests.

Zero-retention by default

Your prompts are never stored. Audit logs contain only metadata: model, cost, latency, tenant ID, timestamp. Prompt logging is explicit opt-in per tenant.

Verified by 60 automated tests. No prompt data in serialized JSON.
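The metadata-only audit record can be sketched as follows. The field names follow the page's description (model, cost, latency, tenant ID, timestamp); the exact schema and function name are assumptions.

```typescript
// Sketch of a zero-retention audit record: the prompt is available to the
// handler but is never copied into the serialized log line.
interface AuditRecord {
  model: string;
  costUsd: number;
  latencyMs: number;
  tenantId: string;
  timestamp: string;
}

function toAuditRecord(
  prompt: string, // used only for routing upstream; never logged
  meta: Omit<AuditRecord, "timestamp">,
): string {
  const record: AuditRecord = { ...meta, timestamp: new Date().toISOString() };
  return JSON.stringify(record); // prompt is not reachable from this object
}
```

The serialized JSON can only ever contain the five metadata fields, which is the property the "no prompt data in serialized JSON" tests would assert.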

Fail-open + kill-switch

If the Durable Object times out (>800ms), your request goes to the provider unmodified. Budget control is disabled during bypass — availability over cost control. BYPASS_GATEWAY=true disables CostGhost entirely. Per-tenant configurable: choose fail-open (default) or fail-closed for strict budget enforcement.

Default is fail-open. CostGhost never blocks your production traffic.
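The fail-open behavior above can be sketched with a simple deadline race, assuming the 800 ms budget-check timeout the page describes. Function names are illustrative, not CostGhost's API.

```typescript
// Minimal fail-open sketch: if the budget lookup misses its deadline,
// forward the request with the client's original model, unmodified.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms)),
  ]);
}

async function chooseModel(
  budgetCheck: Promise<string>, // resolves to the budget-routed model
  requestedModel: string,       // model the client asked for
): Promise<string> {
  try {
    return await withTimeout(budgetCheck, 800);
  } catch {
    return requestedModel; // fail-open: availability over cost control
  }
}
```

A fail-closed tenant would invert the catch branch and reject instead of falling through.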

Shadow mode

Set X-CG-Mode: observation. Your requests go through unchanged. CostGhost computes hypothetical savings in the background and logs them to R2.

Risk-free proof of concept. Zero production impact.

Declarative routing with a learning cache.

23 rules ordered by specificity. First match wins. The learning cache (EMA) converges after ~500 requests to the cheapest model per task type that meets quality thresholds.
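A learning cache of this shape can be sketched as an exponential moving average of quality per (task type, model), preferring the cheapest model whose EMA clears a threshold. The alpha value, threshold, and scoring are illustrative assumptions.

```typescript
// EMA update: small alpha means the estimate converges over hundreds of
// samples rather than reacting to any single request.
const ALPHA = 0.02;

function updateEma(prev: number, sample: number, alpha = ALPHA): number {
  return alpha * sample + (1 - alpha) * prev;
}

interface ModelStats { model: string; costPer1k: number; qualityEma: number; }

// Cheapest model whose quality EMA meets the threshold; fall back to the
// highest-quality model if none qualifies.
function pickModel(stats: ModelStats[], minQuality = 0.9): string {
  const ok = stats.filter((s) => s.qualityEma >= minQuality);
  ok.sort((a, b) => a.costPer1k - b.costPer1k);
  return (ok[0] ?? [...stats].sort((a, b) => b.qualityEma - a.qualityEma)[0]).model;
}
```

With alpha = 0.02, roughly 500 consistent samples move the EMA essentially all the way to the observed quality level.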

Budget state machine

5 phases: GREEN through HARD_STOP. Sequential transitions only. Idempotent spend recording. Automatic monthly reset. Sub-budgets per team and project.

Structured audit logs

Every routing decision logged to R2. Append-only NDJSON, partitioned by tenant/day/hour. Full traceability of what was routed where and why.
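A tenant/day/hour partitioning scheme of the kind described could build object keys like this; the exact key layout is an assumption based on the page's description.

```typescript
// Sketch: derive the append-only NDJSON object key from tenant and UTC time.
function partitionKey(tenantId: string, ts: Date): string {
  const day = ts.toISOString().slice(0, 10);           // YYYY-MM-DD
  const hour = String(ts.getUTCHours()).padStart(2, "0");
  return `audit/${tenantId}/${day}/${hour}.ndjson`;
}
```

Partitioning by tenant, day, and hour keeps per-object writes append-only and makes "what was routed where and why" queries a prefix scan.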

Native support. No format translation.

Two API formats: /v1/chat/completions (OpenAI-compatible) and /v1/messages (Anthropic-native). Routing downgrades within the same provider only. No cross-provider payload rewrite.

Anthropic
Claude 4.x + 3.x (native)
OpenAI
GPT-4o, GPT-4o-mini
Mistral
Large, Small
Google
Gemini 2.5 Pro + Flash

5% of your LLM spend. That's it.

No monthly minimum. No commitment. No hidden costs. You pay a platform fee on actual spend routed through CostGhost.

Example: $10,000/month LLM spend

Your LLM spend            $10,000
CostGhost routing saves   ~$3,000 (30%)
Platform fee (5%)         -$500
Net savings               $2,500/month

The 5% fee applies to total spend routed through CostGhost, not to savings. If CostGhost saves you nothing, the fee is still 5% of spend. We believe the routing engine pays for itself within the first week.
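The arithmetic in the example above works out as follows; the 30% savings rate is the page's example figure, and the fee is taken on total routed spend.

```typescript
// Worked version of the pricing example: fee on spend, not on savings.
function netSavings(monthlySpend: number, savingsRate: number) {
  const saved = monthlySpend * savingsRate;
  const fee = monthlySpend * 0.05; // 5% of total routed spend
  return { saved, fee, net: saved - fee };
}
```

At $10,000/month and 30% routing savings, that is $3,000 saved, a $500 fee, and $2,500 net. Break-even is a 5% savings rate; anything above that is net positive.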

30 seconds to integrate.

Change one line in your OpenAI or Anthropic client. No new dependencies, no config files, no migration.

  • OpenAI SDK: change baseURL
  • Anthropic SDK: use /v1/messages natively
  • Optional priority header for fine-grained control
  • Team and project sub-budgets via custom headers
  • Usage API for real-time cost monitoring
app.ts
// OpenAI — change one line:
const client = new OpenAI({
  // baseURL: "https://api.openai.com/v1",
  baseURL: "https://gw.costghost.dev/v1",
});

// Anthropic — same idea:
const client = new Anthropic({
  baseURL: "https://gw.costghost.dev",
});

// Optional: set priority per request
headers: { "X-CG-Priority": "low" }

We're onboarding the first 50 teams.

Start with shadow mode. See your savings before you commit. No credit card.

We'll reach out within 24 hours to set up your tenant.