Dossier № 01 · Cost Engineering · v0.4 Preview

Know the cost before you merge.

Two pipes feed the same bill: Terraform plans and model calls. CloudCost AI prices the first in your pull requests, meters the second in flight — and never sees a prompt.

Clouds AWS Google Cloud Azure Kubernetes Cloudflare Fly.io
Models OpenAI Anthropic Google Gemini Mistral Meta Llama AWS Bedrock Azure OpenAI

Engine 01 · GitHub App

Terraform plans, priced in review.

A webhook reads your Terraform plan JSON, prices each resource through Infracost, and comments the monthly delta on the PR — where reviewers already are.

Catch the $1,430-a-month NAT gateway before it ships. Not after the bill.

  • Reads plan JSON straight from CI
  • Breaks down by resource and provider
  • Comments inline on the diff — no new dashboard to check
  • Supports AWS, GCP, Azure, and Kubernetes manifests
~/infra/eks-prod-redesign — terraform plan
$ terraform plan -out=tfplan $ terraform show -json tfplan > tfplan.json   # GitHub PR opened with tfplan.json # CloudCost webhook priced it via Infracost   Resource Monthly ──────────────────────────────────────────── + aws_rds_cluster.analytics +$1,180.00 + aws_nat_gateway.prod-az-a +$   38.00 + aws_eks_node_group.gpu +$  524.00 − aws_instance.legacy-jumpbox −$  312.00 ──────────────────────────────────────────── Δ monthly +$1,430.00   Posted to github.com/acme/infra#847 $
Token ledger · last 60s Live
Owner Model In Out Cost
team:research claude-opus-4.7 42,180 3,920 $0.31
agent:rag-loop gpt-4o-mini 218,402 14,118 $0.04
user:m.adler gpt-4o 8,440 1,260 $0.03
feature:summarize gemini-2.5-pro 19,210 2,840 $0.05
team:growth claude-haiku-4.5 62,118 8,402 $0.10

Engine 02 · LiteLLM Gateway

LLM spend, per key. Per user. Per agent.

Issue virtual keys, enforce per-team budgets, meter every token — input, output, reasoning — across every provider you call.

Telemetry keeps the counts and the cost. Prompt payloads stay where they were.

  • Drop-in OpenAI-compatible endpoint
  • Tag traffic by user, team, feature, or agent
  • Hard budget ceilings — fail closed, not open
  • Zero prompt or completion storage by default

Schema · § 02 · How the engines connect

Two pipes. One ledger of truth.

CloudCost AI taps into the moments where money is committed — the merge button and the API call — and never anywhere else.

Engine 01 · Terraform PR Analyzer read-only · sub-30s
i. git push Developer opens or updates a PR with infra changes.
ii. CI runs plan Existing pipeline emits a Terraform plan JSON.
iii. cost analyzer CloudCost reads the plan, prices each resource via Infracost.
iv. PR comment Monthly delta posted as a review comment on the diff.
Engine 02 · LLM Gateway no prompt storage
i. your app SDK call with a CloudCost-issued virtual key.
ii. LiteLLM proxy Routes to provider, checks budgets, tags the request.
iii. provider API OpenAI, Anthropic, Google, or any compatible endpoint.
iv. metered ledger Counts, tags, and cost — payload dropped before storage.

Reckoning · § 03 · Try the math

Cost a model. Live.

Pick a model. Drag the sliders. Watch the monthly bill resolve in real time.

This is roughly what CloudCost AI will show you across every team, feature, and agent — with real traffic instead of estimates. You will be surprised which workloads are the expensive ones. Most teams are.

Input tokens / day 500,000
Output tokens / day 120,000
Estimated monthly spend
$0
Input · $5.00 / 1M$0.00
Output · $25.00 / 1M$0.00
Days in month30

Rates load from the local LiteLLM pricing map when available. CloudCost AI still records the actual cost reported by your gateway traffic.

Receipts - Live artifacts

Every expensive action leaves a receipt.

CloudCost AI should not ask teams to trust a new dashboard first. It should show up where cost decisions already happen: inside the pull request and inside the model gateway.

i No mystery math.Terraform plans are priced before merge, with the monthly delta attached to the review.
ii No prompt storage.LiteLLM sends token counts, model names, teams, and cost metadata only.
iii No forced SaaS pricing pipe.The Infracost CLI can point at your self-hosted Cloud Pricing API.
GitHub pull request Ready before merge
+$184.31/mo

Plan JSON found, priced, and posted as one review comment for the team.

  • Sourcetfplan.json
  • Policyreview
  • Statuscommented
LiteLLM gateway Prompt blind
$27.48 today

Spend grouped by model, key alias, user, team, and service metadata.

  • Promptsnot stored
  • Budgetactive
  • KeysNeon
Pricing engine Self-hostable
your network

Run the pricing API yourself, then let CloudCost call the local CLI.

  • CloudAWS/Azure/GCP
  • ModeCLI

Receipts first. Dashboard later.

Installation · § 04

Wired in, before lunch.

Two engines, no custom SDK. Point your existing OpenAI-compatible client at the gateway, then install the GitHub app on the repos you care about.

~/your-app — route model calls
# Step i — use your existing SDK $ pip install openai   # Step ii — point your SDK at CloudCost OPENAI_BASE_URL="/llm/v1" OPENAI_API_KEY="cc_sk_live_..."   # Step iii — tag the traffic client.chat.completions.create( model="claude-opus-4.7", extra_body={ "metadata": { "cc_team": "research", "cc_feature": "summarize", }, }, ... ) Traffic now tagged and metered. $
~/infra — connect Terraform PRs
# Step i — install the GitHub App $ open "/install/github" Choose repos. CloudCost receives PR webhooks.   # Step ii — include a Terraform plan JSON $ terraform plan -out=tfplan $ terraform show -json tfplan > tfplan.json Next PR with tfplan.json gets a cost comment. $

House Rules · § 05

Three principles, strictly observed.

Three places where cost surprises are cheapest to kill. Everything else CloudCost AI does — dashboards, reports, alerts — is downstream of these.

i.

Shift left

Review cost before merge.

A clear monthly dollar impact, posted in the PR thread, while the architecture is still cheap to change.

ii.

Attribute

Map AI spend to owners.

Know which users, features, teams, and autonomous workflows are driving every dollar of token usage.

iii.

Enforce

Stop runaway loops.

Programmable budget ceilings so an agent in a loop fails closed — before an overnight experiment turns into an incident.

Tariff · § 06

Three tiers, plainly priced.

Free to evaluate. Flat per-engineer when you're ready. Talk to us only when you have to.

I.

Solo

For one engineer, one repo

$0 forever
  • PR cost comments on public repos
  • LLM gateway · 250k tokens / day
  • One contributor, one workspace
  • 14-day spend retention
  • Community support
Request Solo access →
III.

Enterprise

For platforms with compliance teams

Custom
  • Everything in Team
  • Self-hosted gateway option
  • SAML SSO and SCIM
  • Audit logs and SOC 2 reporting
  • Unlimited retention
  • Dedicated success engineer
  • Custom SLAs
Talk to us →

Questions answered · § 07

Things teams ask before signing up.

Do you store our prompts or completions?+

No. The LiteLLM gateway sees the request in transit, tags it, counts the tokens, and forwards it. Payload bodies — your prompt and the model's completion — are dropped before anything is written to disk. What we persist: token counts, model, tags (user, team, feature, agent), latency, and the computed cost.

If you need a redacted sample for debugging, you can opt-in per-route with a TTL. Default is off.

Can we self-host the gateway and analyzer?+

Yes, on the Enterprise tier. The gateway ships as a container, and the Terraform analyzer is a GitHub Action you already run in your own CI. The control plane (dashboards, alerts, budget rules) can point at a self-hosted database — Postgres or ClickHouse — that never leaves your network.

How is this different from running Infracost ourselves?+

We use Infracost under the hood for unit pricing — they're great at it. CloudCost AI is the layer above: the GitHub App that knows which PRs to comment on, the policy engine that escalates large deltas to senior reviewers, the dashboard that joins Terraform spend with the matching cloud bill, and the LLM half that Infracost doesn't cover at all.

If you already love Infracost, you can keep using it directly. If you want budgets, attribution, alerts, and a single ledger across infra and LLMs, that's us.

Which clouds and model providers are supported today?+

Clouds: AWS, Google Cloud, Azure, Kubernetes (any), Cloudflare, and Fly.io. Models: anything reachable via an OpenAI-compatible endpoint — OpenAI, Anthropic, Google Gemini, Mistral, Meta Llama via Together / Groq / Fireworks, AWS Bedrock, and Azure OpenAI.

Missing yours? Tell us during early-access onboarding. Provider requests from early access conversations are how the roadmap gets ordered right now.

How long does setup take?+

For the LLM gateway: roughly 10 minutes. Install the GitHub App, get a virtual key, point your OPENAI_BASE_URL at our endpoint. Existing SDKs need no other changes.

For the Terraform PR analyzer: roughly 15 minutes. Install the GitHub App on the repos you want, drop the workflow YAML in, open a PR with an infra change to see the comment.

What's the SLA on PR comments?+

Internal target: p50 under 12 seconds, p95 under 30 seconds, from tfplan.json upload to comment posted. We will publish the production SLA once there is production traffic to measure against. Larger plans (1000+ resources) take longer; the comment appears as a draft within 30s and finalizes when pricing resolves.

Early Access · Q3 2026

The cloud bill arrives once a month.

We'd rather you saw it in the pull request. Early access this quarter, for engineering teams shipping with LLMs and Terraform.