The spend-control plane for AI agents

Your AI agents are spending in the dark.

One OpenAI-compatible proxy gives every turn a receipt, caps runaway loops before upstream billing, routes simple work to cheaper models, and compresses bulky tool context before it reaches the LLM.

Start free with emailEstimate savingsView routing winsOpenClaw guideHermes guideSelf-host docs

See it. Cap it. Shrink it. Keep it running. No token markup. Bring your own provider keys.

sample receiptOpenClaw session
Asked model
gpt-5.5
Landed model
gemini-2.5-flash-lite
Cheaper route
89×
≈$4.2k / 100k similar turns
Quality proof
Quality checked
A receipt tells your team what changed, why it changed, how much context was compressed, and whether the cheaper route is ready to promote.
Get started in one SDK change

Four steps, no agent rewrite.

Add the provider key you already use, mint a TokSuan project key, swap base_url, then inspect the first receipt.

01Provider key
02Project key
03base_url
04Receipt
Hosted value

Open runtime. Hosted policy operations.

The gateway path is inspectable: budgets, routing, provider resolution, receipts, and key handling are open. Hosted TokSuan adds the work nobody wants to operate every week: benchmark rosters, aggregate routing intelligence, policy promotion, rollback, provider health, and abuse review.

See trust boundarySee aggregate proof
01

Policy factory

Private eval recipes and model rosters produce candidate routing policies without touching the runtime path.

02

Aggregate intelligence

Hosted and opt-in self-host aggregates surface route pairs that repeatedly save money after privacy thresholds.

03

Ops guardrails

Provider health, DB sanity, incident snapshots, and report approvals keep policy changes reversible.

Developer friendly

Keep the SDK. Swap the gateway.

Cursor, OpenClaw, Hermes, LangChain, Vercel AI SDK, Cline, and internal bots can keep their OpenAI-compatible workflow. TokSuan adds receipts, budgets, routing, and deterministic context compression in the middle.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gateway.tokensmt.com/v1",
  apiKey: "ts_your_project_key",
});

await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Ship it." }],
});
Why teams add a control plane

Model gateways give access. TokSuan controls spend.

Agent workloads are not normal API traffic. They retry, call tools, branch into long sessions, replay bulky context, and silently swap from cheap to frontier models. TokSuan decides when a turn can move down, when context can shrink, and when it must stay untouched — then gives each decision a receipt.

01

Bills arrive after the damage

Which agent, project, or prompt created the spike?

02

Agents repeat expensive mistakes

Looping sessions can keep spending while nobody is watching.

03

Routing needs proof

Cheaper models need evidence before production traffic moves.

Control loop

See it. Cap it. Shrink it. Keep it running.

Four product surfaces work together: a ledger for visibility, budget guards for control, routing proof for safe savings, and reversible context compression for bulky tool output.

See it

See every turn.

Every request becomes a receipt: asked model, landed model, tokens, latency, routing reason, compression savings, and exact cost.

Cap it

Cap runaway spend.

Daily budgets, plan caps, and loop detection stop bad agent behavior before the provider bills you.

Shrink it

Shrink the replay tax.

Agents replay tool output. TokSuan compresses JSON rows, logs, diffs, stack traces, and shell output before they become input tokens.

Keep it running

Route and compress with proof.

Public benchmarks provide the day-one routing frontier. Audit/optimize compression modes show byte savings before prompts are rewritten, and reversible storage keeps original tool output available.

No token markup

Keep paying providers directly. TokSuan is the control layer, not a reseller.

Encrypted BYO keys

Your app uses ts_ project keys while upstream secrets stay encrypted at rest.

Self-hostable

Run the Apache-2.0 code with Postgres when your team needs full control.

Before you route traffic

The questions buyers ask first.

The short version for engineering, finance, and security before you put production agent calls behind a gateway.

Do I need to change my app?

Usually one base_url and one API key. Your request shape stays OpenAI-compatible.

Is this OpenRouter?

No. OpenRouter gives access to many models. TokSuan decides which model an agent turn should use, enforces budgets, and learns from your workload.

Can I see the exact request that spent money?

Yes. The ledger stores model, provider, tokens, latency, tags, cost, receipt headers, and, when reversible context compression is enabled, the original tool output.

Will this hurt model quality?

Routes promote only when the policy and your receipts show the cheaper model is safe; shadow trials let you prove quality before switching production traffic.

What if an agent loops?

Loop detection and budgets can stop repeated turns before they reach upstream.

Is hosted the only option?

No. Use the hosted gateway or self-host when deployment control matters more.

Start with one real request

Send one agent request. Get one receipt your team can trust.

Estimate the opportunity, connect a provider key, and inspect the first request before changing a production route.

Start freeQuickstart guideSee trust boundary