Beta · Q2 2026

Lower LLM costs. Zero production risk.

Swap one URL. Prompts are never stored. If CostGhost fails, your requests go through unmodified. Fail-open by default.

api.openai.com → gw.costghost.dev

Three steps. One line of code.

No SDK changes. No model mapping. 23 declarative routing rules with a learning cache that adapts to your traffic.

01

Route through CostGhost

Point your existing OpenAI or Anthropic client at our gateway. One base URL change. Your code, prompts, and parameters stay identical.

02

Classify & route at the edge

Every request is classified in under 1 ms. Budget phase, task type, and priority determine the model. 23 rules, 5 budget phases, negligible latency overhead.
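The first-match routing described above can be sketched as an ordered rule table. This is an illustrative sketch only: the rule shapes, model names, and thresholds are assumptions, not CostGhost's actual rule set.

```typescript
// Sketch of first-match-wins routing over rules ordered by specificity.
type Phase = "GREEN" | "YELLOW" | "ORANGE" | "RED" | "HARD_STOP";
type Priority = "low" | "normal" | "critical";

interface Rule {
  phase?: Phase;       // omitted field = matches anything
  taskType?: string;
  priority?: Priority;
  model: string;
}

// Ordered by specificity: the first matching rule wins.
const rules: Rule[] = [
  { priority: "critical", model: "claude-opus-4" },          // best model, guaranteed
  { phase: "RED", model: "claude-haiku-3-5" },               // budget pressure: downgrade
  { taskType: "classification", priority: "low", model: "claude-haiku-3-5" },
  { model: "claude-sonnet-4" },                              // fallback
];

function route(phase: Phase, taskType: string, priority: Priority): string {
  for (const r of rules) {
    if (r.phase && r.phase !== phase) continue;
    if (r.taskType && r.taskType !== taskType) continue;
    if (r.priority && r.priority !== priority) continue;
    return r.model;
  }
  throw new Error("no rule matched");
}
```

Because critical-priority rules sit above budget-phase rules, a critical request keeps the best model even in the RED phase.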

03

Save without losing quality

Low-priority classification? Haiku instead of Opus. Critical reasoning? Best model, guaranteed. You save on requests that never needed the expensive model.

Edge-native. Zero servers.

Built on Cloudflare Workers with Durable Objects for per-tenant state. Deployed across 300+ locations. No cold starts.

                         Your app
   POST /v1/chat/completions · POST /v1/messages (native Anthropic)
                            │
           ┌────────────────┴───────────────────┐
           │  CostGhost Gateway (Hono + Edge)   │
           │  Classify → Route → Forward        │
           └────────────────┬───────────────────┘
                            │
           ┌────────────────┴───────────────────┐
           │  Budget State Machine (DO)         │
           │  GREEN → YELLOW → ORANGE → RED     │
           │  23 rules × 5 phases × task type   │
           └────────────────┬───────────────────┘
             ┌─────────┬────┴─────┬──────────┐
             ▼         ▼          ▼          ▼
         Anthropic   OpenAI    Mistral    Google
<5ms
Routing latency
23
Routing rules
5
Budget phases
13
Models, 4 providers

Built for production. Not for demos.

Every claim on this page is backed by implemented code and automated tests.

Zero-retention by default

Your prompts are never stored. Audit logs contain only metadata: model, cost, latency, tenant ID, timestamp. Prompt logging is explicit opt-in per tenant.

Verified by 60 automated tests. No prompt data in serialized JSON.
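The metadata-only audit record can be sketched as follows. The field names follow the page's description (model, cost, latency, tenant ID, timestamp); the exact schema and function name are assumptions.

```typescript
// Sketch of a zero-retention audit record: the prompt is available to the
// handler but is never copied into the serialized log line.
interface AuditRecord {
  model: string;
  costUsd: number;
  latencyMs: number;
  tenantId: string;
  timestamp: string;
}

function toAuditRecord(
  prompt: string, // used only for routing upstream; never logged
  meta: Omit<AuditRecord, "timestamp">,
): string {
  const record: AuditRecord = { ...meta, timestamp: new Date().toISOString() };
  return JSON.stringify(record); // prompt is not reachable from this object
}
```

The serialized JSON can only ever contain the five metadata fields, which is the property the "no prompt data in serialized JSON" tests would assert.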

Fail-open + kill-switch

If the Durable Object times out (>800ms), your request goes to the provider unmodified. Budget control is disabled during bypass — availability over cost control. BYPASS_GATEWAY=true disables CostGhost entirely. Per-tenant configurable: choose fail-open (default) or fail-closed for strict budget enforcement.

Default is fail-open. CostGhost never blocks your production traffic.
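The fail-open behavior above can be sketched with a simple deadline race, assuming the 800 ms budget-check timeout the page describes. Function names are illustrative, not CostGhost's API.

```typescript
// Minimal fail-open sketch: if the budget lookup misses its deadline,
// forward the request with the client's original model, unmodified.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms)),
  ]);
}

async function chooseModel(
  budgetCheck: Promise<string>, // resolves to the budget-routed model
  requestedModel: string,       // model the client asked for
): Promise<string> {
  try {
    return await withTimeout(budgetCheck, 800);
  } catch {
    return requestedModel; // fail-open: availability over cost control
  }
}
```

A fail-closed tenant would invert the catch branch and reject instead of falling through.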

Shadow mode

Set X-CG-Mode: observation. Your requests go through unchanged. CostGhost computes hypothetical savings in the background and logs them to R2.

Risk-free proof of concept. Zero production impact.

Declarative routing with a learning cache.

23 rules ordered by specificity. First match wins. The learning cache (EMA) converges after ~500 requests to the cheapest model per task type that meets quality thresholds.
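A learning cache of this shape can be sketched as an exponential moving average of quality per (task type, model), preferring the cheapest model whose EMA clears a threshold. The alpha value, threshold, and scoring are illustrative assumptions.

```typescript
// EMA update: small alpha means the estimate converges over hundreds of
// samples rather than reacting to any single request.
const ALPHA = 0.02;

function updateEma(prev: number, sample: number, alpha = ALPHA): number {
  return alpha * sample + (1 - alpha) * prev;
}

interface ModelStats { model: string; costPer1k: number; qualityEma: number; }

// Cheapest model whose quality EMA meets the threshold; fall back to the
// highest-quality model if none qualifies.
function pickModel(stats: ModelStats[], minQuality = 0.9): string {
  const ok = stats.filter((s) => s.qualityEma >= minQuality);
  ok.sort((a, b) => a.costPer1k - b.costPer1k);
  return (ok[0] ?? [...stats].sort((a, b) => b.qualityEma - a.qualityEma)[0]).model;
}
```

With alpha = 0.02, roughly 500 consistent samples move the EMA essentially all the way to the observed quality level.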

Budget state machine

5 phases: GREEN through HARD_STOP. Sequential transitions only. Idempotent spend recording. Automatic monthly reset. Sub-budgets per team and project.

Structured audit logs

Every routing decision logged to R2. Append-only NDJSON, partitioned by tenant/day/hour. Full traceability of what was routed where and why.
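A tenant/day/hour partitioning scheme of the kind described could build object keys like this; the exact key layout is an assumption based on the page's description.

```typescript
// Sketch: derive the append-only NDJSON object key from tenant and UTC time.
function partitionKey(tenantId: string, ts: Date): string {
  const day = ts.toISOString().slice(0, 10);           // YYYY-MM-DD
  const hour = String(ts.getUTCHours()).padStart(2, "0");
  return `audit/${tenantId}/${day}/${hour}.ndjson`;
}
```

Partitioning by tenant, day, and hour keeps per-object writes append-only and makes "what was routed where and why" queries a prefix scan.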

Native support. No format translation.

Two API formats: /v1/chat/completions (OpenAI-compatible) and /v1/messages (Anthropic-native). Routing downgrades within the same provider only. No cross-provider payload rewrite.

Anthropic
Claude 4.x + 3.x (native)
OpenAI
GPT-4o, GPT-4o-mini
Mistral
Large, Small
Google
Gemini 2.5 Pro + Flash

5% of your LLM spend. That's it.

No monthly minimum. No commitment. No hidden costs. You pay a platform fee on actual spend routed through CostGhost.

Example: $10,000/month LLM spend

Your LLM spend            $10,000
CostGhost routing saves   ~$3,000 (30%)
Platform fee (5%)         -$500
Net savings               $2,500/month

The 5% fee applies to total spend routed through CostGhost, not to savings. If CostGhost saves you nothing, the fee is still 5% of spend. We believe the routing engine pays for itself within the first week.
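The arithmetic in the example above works out as follows; the 30% savings rate is the page's example figure, and the fee is taken on total routed spend.

```typescript
// Worked version of the pricing example: fee on spend, not on savings.
function netSavings(monthlySpend: number, savingsRate: number) {
  const saved = monthlySpend * savingsRate;
  const fee = monthlySpend * 0.05; // 5% of total routed spend
  return { saved, fee, net: saved - fee };
}
```

At $10,000/month and 30% routing savings, that is $3,000 saved, a $500 fee, and $2,500 net. Break-even is a 5% savings rate; anything above that is net positive.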

30 seconds to integrate.

Change one line in your OpenAI or Anthropic client. No new dependencies, no config files, no migration.

  • OpenAI SDK: change baseURL
  • Anthropic SDK: use /v1/messages natively
  • Optional priority header for fine-grained control
  • Team and project sub-budgets via custom headers
  • Usage API for real-time cost monitoring
app.ts
// OpenAI — change one line:
const client = new OpenAI({
  // baseURL: "https://api.openai.com/v1",
  baseURL: "https://gw.costghost.dev/v1",
});

// Anthropic — same idea:
const client = new Anthropic({
  baseURL: "https://gw.costghost.dev",
});

// Optional: set priority per request
headers: { "X-CG-Priority": "low" }

We're onboarding the first 50 teams.

Start with shadow mode. See your savings before you commit. No credit card.

We'll reach out within 24 hours to set up your tenant.