Shift left
Review cost before merge.
A clear monthly dollar impact, posted in the PR thread, while the architecture is still cheap to change.
Dossier № 01 · Cost Engineering · v0.4 Preview
Two pipes feed the same bill: Terraform plans and model calls. CloudCost AI prices the first in your pull requests, meters the second in flight — and never sees a prompt.
Engine 01 · GitHub App
A webhook reads your Terraform plan JSON, prices each resource through Infracost, and comments the monthly delta on the PR — where reviewers already are.
Catch the $1,430-a-month NAT gateway before it ships. Not after the bill.
| Owner | Model | In | Out | Cost |
|---|---|---|---|---|
| team:research | claude-opus-4.7 | 42,180 | 3,920 | $0.31 |
| agent:rag-loop | gpt-4o-mini | 218,402 | 14,118 | $0.04 |
| user:m.adler | gpt-4o | 8,440 | 1,260 | $0.03 |
| feature:summarize | gemini-2.5-pro | 19,210 | 2,840 | $0.05 |
| team:growth | claude-haiku-4.5 | 62,118 | 8,402 | $0.10 |
Engine 02 · LiteLLM Gateway
Issue virtual keys, enforce per-team budgets, meter every token — input, output, reasoning — across every provider you call.
Telemetry keeps the counts and the cost. Prompt payloads stay where they were.
Schema · § 02 · How the engines connect
CloudCost AI taps into the moments where money is committed — the merge button and the API call — and never anywhere else.
Reckoning · § 03 · Try the math
Pick a model. Drag the sliders. Watch the monthly bill resolve in real time.
This is roughly what CloudCost AI will show you across every team, feature, and agent — with real traffic instead of estimates. You will be surprised which workloads are the expensive ones. Most teams are.
Receipts - Live artifacts
CloudCost AI should not ask teams to trust a new dashboard first. It should show up where cost decisions already happen: inside the pull request and inside the model gateway.
Plan JSON found, priced, and posted as one review comment for the team.
Spend grouped by model, key alias, user, team, and service metadata.
Run the pricing API yourself, then let CloudCost call the local CLI.
Receipts first. Dashboard later.
Installation · § 04
Two engines, no custom SDK. Point your existing OpenAI-compatible client at the gateway, then install the GitHub app on the repos you care about.
House Rules · § 05
Three places where cost surprises are cheapest to kill. Everything else CloudCost AI does — dashboards, reports, alerts — is downstream of these.
Shift left
A clear monthly dollar impact, posted in the PR thread, while the architecture is still cheap to change.
Attribute
Know which users, features, teams, and autonomous workflows are driving every dollar of token usage.
Enforce
Programmable budget ceilings so an agent in a loop fails closed — before an overnight experiment turns into an incident.
Tariff · § 06
Free to evaluate. Flat per-engineer when you're ready. Talk to us only when you have to.
For one engineer, one repo
For engineering orgs shipping with AI
For platforms with compliance teams
Questions answered · § 07
No. The LiteLLM gateway sees the request in transit, tags it, counts the tokens, and forwards it. Payload bodies — your prompt and the model's completion — are dropped before anything is written to disk. What we persist: token counts, model, tags (user, team, feature, agent), latency, and the computed cost.
If you need a redacted sample for debugging, you can opt-in per-route with a TTL. Default is off.
Yes, on the Enterprise tier. The gateway ships as a container, and the Terraform analyzer is a GitHub Action you already run in your own CI. The control plane (dashboards, alerts, budget rules) can point at a self-hosted database — Postgres or ClickHouse — that never leaves your network.
We use Infracost under the hood for unit pricing — they're great at it. CloudCost AI is the layer above: the GitHub App that knows which PRs to comment on, the policy engine that escalates large deltas to senior reviewers, the dashboard that joins Terraform spend with the matching cloud bill, and the LLM half that Infracost doesn't cover at all.
If you already love Infracost, you can keep using it directly. If you want budgets, attribution, alerts, and a single ledger across infra and LLMs, that's us.
Clouds: AWS, Google Cloud, Azure, Kubernetes (any), Cloudflare, and Fly.io. Models: anything reachable via an OpenAI-compatible endpoint — OpenAI, Anthropic, Google Gemini, Mistral, Meta Llama via Together / Groq / Fireworks, AWS Bedrock, and Azure OpenAI.
Missing yours? Tell us during early-access onboarding. Provider requests from early access conversations are how the roadmap gets ordered right now.
For the LLM gateway: roughly 10 minutes. Install the GitHub App, get a virtual key, point your OPENAI_BASE_URL at our endpoint. Existing SDKs need no other changes.
For the Terraform PR analyzer: roughly 15 minutes. Install the GitHub App on the repos you want, drop the workflow YAML in, open a PR with an infra change to see the comment.
Internal target: p50 under 12 seconds, p95 under 30 seconds, from tfplan.json upload to comment posted. We will publish the production SLA once there is production traffic to measure against. Larger plans (1000+ resources) take longer; the comment appears as a draft within 30s and finalizes when pricing resolves.
Early Access · Q3 2026
We'd rather you saw it in the pull request. Early access this quarter, for engineering teams shipping with LLMs and Terraform.