Know the cost before you merge.

Engine 01 · GitHub App

Terraform plans, priced in review.

A webhook reads your Terraform plan JSON, prices each resource through Infracost, and comments the monthly delta on the PR — where reviewers already are.

Catch the $1,430-a-month NAT gateway before it ships. Not after the bill.

Reads plan JSON straight from CI
Breaks down by resource and provider
Comments inline on the diff — no new dashboard to check
Supports AWS, GCP, Azure, and Kubernetes manifests

~/infra/eks-prod-redesign — terraform plan

$ terraform plan -out=tfplan $ terraform show -json tfplan > tfplan.json # GitHub PR opened with tfplan.json # CloudCost webhook priced it via Infracost Resource Monthly ──────────────────────────────────────────── + aws_rds_cluster.analytics +$1,180.00 + aws_nat_gateway.prod-az-a +$ 38.00 + aws_eks_node_group.gpu +$ 524.00 − aws_instance.legacy-jumpbox −$ 312.00 ──────────────────────────────────────────── Δ monthly +$1,430.00 ✓ Posted to github.com/acme/infra#847 $

Token ledger · last 60s Live

Owner	Model	In	Out	Cost
team:research	claude-opus-4.7	42,180	3,920	$0.31
agent:rag-loop	gpt-4o-mini	218,402	14,118	$0.04
user:m.adler	gpt-4o	8,440	1,260	$0.03
feature:summarize	gemini-2.5-pro	19,210	2,840	$0.05
team:growth	claude-haiku-4.5	62,118	8,402	$0.10

Engine 02 · LiteLLM Gateway

LLM spend, per key. Per user. Per agent.

Issue virtual keys, enforce per-team budgets, meter every token — input, output, reasoning — across every provider you call.

Telemetry keeps the counts and the cost. Prompt payloads stay where they were.

Drop-in OpenAI-compatible endpoint
Tag traffic by user, team, feature, or agent
Hard budget ceilings — fail closed, not open
Zero prompt or completion storage by default

Schema · § 02 · How the engines connect

Two pipes. One ledger of truth.

CloudCost AI taps into the moments where money is committed — the merge button and the API call — and never anywhere else.

Engine 01 · Terraform PR Analyzer read-only · sub-30s

i. git push Developer opens or updates a PR with infra changes.

ii. CI runs plan Existing pipeline emits a Terraform plan JSON.

iii. cost analyzer CloudCost reads the plan, prices each resource via Infracost.

iv. PR comment Monthly delta posted as a review comment on the diff.

Engine 02 · LLM Gateway no prompt storage

i. your app SDK call with a CloudCost-issued virtual key.

ii. LiteLLM proxy Routes to provider, checks budgets, tags the request.

iii. provider API OpenAI, Anthropic, Google, or any compatible endpoint.

iv. metered ledger Counts, tags, and cost — payload dropped before storage.

Reckoning · § 03 · Try the math

Cost a model. Live.

Pick a model. Drag the sliders. Watch the monthly bill resolve in real time.

This is roughly what CloudCost AI will show you across every team, feature, and agent — with real traffic instead of estimates. You will be surprised which workloads are the expensive ones. Most teams are.

Model

Daily traffic

Input tokens / day 500,000

Output tokens / day 120,000

Estimated monthly spend

Input · $5.00 / 1M$0.00

Output · $25.00 / 1M$0.00

Days in month30

Rates load from the local LiteLLM pricing map when available. CloudCost AI still records the actual cost reported by your gateway traffic.

Receipts - Live artifacts

Every expensive action leaves a receipt.

CloudCost AI should not ask teams to trust a new dashboard first. It should show up where cost decisions already happen: inside the pull request and inside the model gateway.

i No mystery math.Terraform plans are priced before merge, with the monthly delta attached to the review.

ii No prompt storage.LiteLLM sends token counts, model names, teams, and cost metadata only.

iii No forced SaaS pricing pipe.The Infracost CLI can point at your self-hosted Cloud Pricing API.

GitHub pull request Ready before merge

+$184.31/mo

Plan JSON found, priced, and posted as one review comment for the team.

Sourcetfplan.json
Policyreview
Statuscommented

LiteLLM gateway Prompt blind

$27.48 today

Spend grouped by model, key alias, user, team, and service metadata.

Promptsnot stored
Budgetactive
KeysNeon

Pricing engine Self-hostable

your network

Run the pricing API yourself, then let CloudCost call the local CLI.

CloudAWS/Azure/GCP
ModeCLI

Receipts first. Dashboard later.

Installation · § 04

Wired in, before lunch.

Two engines, no custom SDK. Point your existing OpenAI-compatible client at the gateway, then install the GitHub app on the repos you care about.

~/your-app — route model calls

# Step i — use your existing SDK $ pip install openai # Step ii — point your SDK at CloudCost OPENAI_BASE_URL="/llm/v1" OPENAI_API_KEY="cc_sk_live_..." # Step iii — tag the traffic client.chat.completions.create( model="claude-opus-4.7", extra_body={ "metadata": { "cc_team": "research", "cc_feature": "summarize", }, }, ... ) ✓ Traffic now tagged and metered. $

~/infra — connect Terraform PRs

# Step i — install the GitHub App $ open "/install/github" ✓ Choose repos. CloudCost receives PR webhooks. # Step ii — include a Terraform plan JSON $ terraform plan -out=tfplan $ terraform show -json tfplan > tfplan.json ✓ Next PR with tfplan.json gets a cost comment. $

House Rules · § 05

Three principles, strictly observed.

Three places where cost surprises are cheapest to kill. Everything else CloudCost AI does — dashboards, reports, alerts — is downstream of these.

Shift left

Review cost before merge.

A clear monthly dollar impact, posted in the PR thread, while the architecture is still cheap to change.

ii.

Attribute

Map AI spend to owners.

Know which users, features, teams, and autonomous workflows are driving every dollar of token usage.

iii.

Enforce

Stop runaway loops.

Programmable budget ceilings so an agent in a loop fails closed — before an overnight experiment turns into an incident.

Tariff · § 06

Three tiers, plainly priced.

Free to evaluate. Flat per-engineer when you're ready. Talk to us only when you have to.

Solo

For one engineer, one repo

$0 forever

PR cost comments on public repos
LLM gateway · 250k tokens / day
One contributor, one workspace
14-day spend retention
Community support

Request Solo access →

Most teams II.

Team

For engineering orgs shipping with AI

$29 / engineer / mo

Everything in Solo
Private repos, unlimited contributors
Team, feature, and agent budgets
Slack and PagerDuty alerts
90-day spend retention
Fail-closed enforcement
Email support · 1 business day

Request Team access →

III.

Enterprise

For platforms with compliance teams

Custom

Everything in Team
Self-hosted gateway option
SAML SSO and SCIM
Audit logs and SOC 2 reporting
Unlimited retention
Dedicated success engineer
Custom SLAs

Talk to us →

Questions answered · § 07

Things teams ask before signing up.

Do you store our prompts or completions?+

No. The LiteLLM gateway sees the request in transit, tags it, counts the tokens, and forwards it. Payload bodies — your prompt and the model's completion — are dropped before anything is written to disk. What we persist: token counts, model, tags (user, team, feature, agent), latency, and the computed cost.

If you need a redacted sample for debugging, you can opt-in per-route with a TTL. Default is off.

Can we self-host the gateway and analyzer?+

Yes, on the Enterprise tier. The gateway ships as a container, and the Terraform analyzer is a GitHub Action you already run in your own CI. The control plane (dashboards, alerts, budget rules) can point at a self-hosted database — Postgres or ClickHouse — that never leaves your network.

How is this different from running Infracost ourselves?+

We use Infracost under the hood for unit pricing — they're great at it. CloudCost AI is the layer above: the GitHub App that knows which PRs to comment on, the policy engine that escalates large deltas to senior reviewers, the dashboard that joins Terraform spend with the matching cloud bill, and the LLM half that Infracost doesn't cover at all.

If you already love Infracost, you can keep using it directly. If you want budgets, attribution, alerts, and a single ledger across infra and LLMs, that's us.

Which clouds and model providers are supported today?+

Clouds: AWS, Google Cloud, Azure, Kubernetes (any), Cloudflare, and Fly.io. Models: anything reachable via an OpenAI-compatible endpoint — OpenAI, Anthropic, Google Gemini, Mistral, Meta Llama via Together / Groq / Fireworks, AWS Bedrock, and Azure OpenAI.

Missing yours? Tell us during early-access onboarding. Provider requests from early access conversations are how the roadmap gets ordered right now.

How long does setup take?+

For the LLM gateway: roughly 10 minutes. Install the GitHub App, get a virtual key, point your OPENAI_BASE_URL at our endpoint. Existing SDKs need no other changes.

For the Terraform PR analyzer: roughly 15 minutes. Install the GitHub App on the repos you want, drop the workflow YAML in, open a PR with an infra change to see the comment.

What's the SLA on PR comments?+

Internal target: p50 under 12 seconds, p95 under 30 seconds, from tfplan.json upload to comment posted. We will publish the production SLA once there is production traffic to measure against. Larger plans (1000+ resources) take longer; the comment appears as a draft within 30s and finalizes when pricing resolves.

Early Access · Q3 2026

The cloud bill arrives once a month.

We'd rather you saw it in the pull request. Early access this quarter, for engineering teams shipping with LLMs and Terraform.