Your AI agents are spending in the dark.
One OpenAI-compatible proxy gives every turn a receipt, caps runaway loops before upstream billing, routes simple work to cheaper models, and compresses bulky tool context before it reaches the LLM.
See it. Cap it. Shrink it. Keep it running. No token markup. Bring your own provider keys.
gpt-5.5gemini-2.5-flash-liteFour steps, no agent rewrite.
Add the provider key you already use, mint a TokSuan project key, swap base_url, then inspect the first receipt.
Open runtime. Hosted policy operations.
The gateway path is inspectable: budgets, routing, provider resolution, receipts, and key handling are open. Hosted TokSuan adds the work nobody wants to operate every week: benchmark rosters, aggregate routing intelligence, policy promotion, rollback, provider health, and abuse review.
Policy factory
Private eval recipes and model rosters produce candidate routing policies without touching the runtime path.
Aggregate intelligence
Hosted and opt-in self-host aggregates surface route pairs that repeatedly save money after privacy thresholds.
Ops guardrails
Provider health, DB sanity, incident snapshots, and report approvals keep policy changes reversible.
Keep the SDK. Swap the gateway.
Cursor, OpenClaw, Hermes, LangChain, Vercel AI SDK, Cline, and internal bots can keep their OpenAI-compatible workflow. TokSuan adds receipts, budgets, routing, and deterministic context compression in the middle.
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://gateway.tokensmt.com/v1",
apiKey: "ts_your_project_key",
});
await client.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Ship it." }],
});Model gateways give access. TokSuan controls spend.
Agent workloads are not normal API traffic. They retry, call tools, branch into long sessions, replay bulky context, and silently swap from cheap to frontier models. TokSuan decides when a turn can move down, when context can shrink, and when it must stay untouched — then gives each decision a receipt.
Bills arrive after the damage
Which agent, project, or prompt created the spike?
Agents repeat expensive mistakes
Looping sessions can keep spending while nobody is watching.
Routing needs proof
Cheaper models need evidence before production traffic moves.
See it. Cap it. Shrink it. Keep it running.
Four product surfaces work together: a ledger for visibility, budget guards for control, routing proof for safe savings, and reversible context compression for bulky tool output.
See every turn.
Every request becomes a receipt: asked model, landed model, tokens, latency, routing reason, compression savings, and exact cost.
Cap runaway spend.
Daily budgets, plan caps, and loop detection stop bad agent behavior before the provider bills you.
Shrink the replay tax.
Agents replay tool output. TokSuan compresses JSON rows, logs, diffs, stack traces, and shell output before they become input tokens.
Route and compress with proof.
Public benchmarks provide the day-one routing frontier. Audit/optimize compression modes show byte savings before prompts are rewritten, and reversible storage keeps original tool output available.
No token markup
Keep paying providers directly. TokSuan is the control layer, not a reseller.
Encrypted BYO keys
Your app uses ts_ project keys while upstream secrets stay encrypted at rest.
Self-hostable
Run the Apache-2.0 code with Postgres when your team needs full control.
The questions buyers ask first.
The short version for engineering, finance, and security before you put production agent calls behind a gateway.
Do I need to change my app?
Usually one base_url and one API key. Your request shape stays OpenAI-compatible.
Is this OpenRouter?
No. OpenRouter gives access to many models. TokSuan decides which model an agent turn should use, enforces budgets, and learns from your workload.
Can I see the exact request that spent money?
Yes. The ledger stores model, provider, tokens, latency, tags, cost, receipt headers, and, when reversible context compression is enabled, the original tool output.
Will this hurt model quality?
Routes promote only when the policy and your receipts show the cheaper model is safe; shadow trials let you prove quality before switching production traffic.
What if an agent loops?
Loop detection and budgets can stop repeated turns before they reach upstream.
Is hosted the only option?
No. Use the hosted gateway or self-host when deployment control matters more.
Send one agent request. Get one receipt your team can trust.
Estimate the opportunity, connect a provider key, and inspect the first request before changing a production route.