AI billing means charging for AI product usage when every API call costs real money. SaaS billing assumed a seat was cheap to serve regardless of usage. AI billing cannot assume that. It has to measure cost per interaction and pass it through. This article covers what AI billing is, why it diverged from SaaS billing, how it works, and what pricing models it supports.
What is AI billing, exactly?
AI billing is the umbrella for metering, pricing, balance management, authorization, and settlement of products whose cost of goods sold is per-inference rather than per-seat. It is not a synonym for usage-based billing applied to AI products. It is a category in its own right, defined by what AI workloads do to billing assumptions that worked for SaaS.
Traditional SaaS billing assumed zero marginal cost per user interaction. A seat was a fixed revenue unit. Whether a customer logged in once or a thousand times barely moved the infrastructure bill. Pricing could be flat. Invoices could close at the end of the month. The billing system was a reporting layer over a stable cost base.
AI inverts every part of that. The SaaS CFO frame for this shift is "COGS matter again." Infrastructure cost has risen from roughly 10% of revenue in traditional SaaS to 35 to 40% for scaling AI companies, per the cross-source synthesis on billing infrastructure for AI-native startups (Bessemer, November 2025). AI billing is the discipline that emerged because the old billing layer cannot describe, let alone control, a business with that cost shape.
Why does AI billing exist as its own category?
AI billing exists because three structural constraints break SaaS billing assumptions at the same time. Each one alone would be manageable. Together they force a new category.
Every inference call costs real money in real time
Every AI request consumes GPU time and tokens that the provider bills before the customer pays. ICONIQ's 2026 State of AI panel pegged AI gross margins at 45% in 2025, with a 52% projection for 2026. Classical SaaS runs at 80 to 90%. Bessemer's Supernova analysis put the early-stage AI floor lower, at a 25% Y1 gross margin, with some cohorts posting negative margins as they fight for share.
That margin gap is structural. It is not a transient phase that companies grow out of. The same ICONIQ panel showed inference alone running at about 23% of revenue at the scaling stage. A billing system that reconciles at month end is too slow to surface a customer whose cost-to-serve has flipped negative inside the cycle.
AI usage can outrun monthly billing
Concurrent AI requests can each read a shared balance before any prior debit has settled, allowing multiple requests to proceed against funds that are already committed elsewhere. The Cursor pricing incident in June 2025 is the canonical public example. Cursor replaced request caps with a monthly credit pool priced at frontier-model API rates. Users running long-horizon agent tasks burned through the pool in days and started receiving overage charges they had not expected. CEO Michael Truell apologized publicly and opened a 19-day refund window. TechCrunch confirmed the cause: long-horizon agent sessions cost more than the flat plan absorbed.
The pattern generalizes. Agentic task token consumption has jumped 10x to 100x since December 2023, per SaaStr's analysis of OpenAI compute margins (Jason Lemkin, October 2025). The Head of ChatGPT, Nick Turley, was quoted in CNBC's April 2026 coverage: an unlimited plan for AI "is like an unlimited electricity plan. It just doesn't make sense." When concurrent users can drain a shared balance faster than a billing cycle can settle, the cycle is not the source of truth anymore.
Per-unit margin shifts every time the model changes
The token-price spread across providers is about 150x at the input layer. PE Collective's April 2026 cross-provider comparison shows Meta Llama 4 Scout at $0.10 per million input tokens. Claude Opus 4 is $15. GPT-4o sits at $2.50. Claude Sonnet 4 sits at $3. The output side runs higher.
Providers also reprice. PricingSaaS Q1 2026 logged Anthropic's Opus output price dropping 67%, from $75 to $25 per million tokens for v4.5, and OpenAI's primary GPT output price dropping 20%. Intercom Fin uses 12 or more LLMs in production, each picked for a stage of a conversation. A billing architecture that prices at invoice time cannot describe a product whose per-call cost moves mid-cycle.
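A minimal sketch of pricing at event time rather than invoice time, assuming an effective-dated rate table. The $75 and $25 output rates are the figures quoted above; the model key `opus`, the effective dates, and the function names are hypothetical placeholders, not any provider's or vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class Rate:
    model: str
    effective_from: datetime
    usd_per_million_output_tokens: Decimal


# Hypothetical rate table: the prices come from the text, the dates are
# placeholders chosen only to show a mid-cycle reprice.
RATES = [
    Rate("opus", datetime(2025, 1, 1, tzinfo=timezone.utc), Decimal("75")),
    Rate("opus", datetime(2026, 1, 15, tzinfo=timezone.utc), Decimal("25")),
]


def rate_at(model: str, when: datetime) -> Rate:
    """Return the most recent rate for `model` in effect at `when`."""
    candidates = [r for r in RATES if r.model == model and r.effective_from <= when]
    if not candidates:
        raise LookupError(f"no rate for {model} at {when.isoformat()}")
    return max(candidates, key=lambda r: r.effective_from)


def price_event(model: str, output_tokens: int, when: datetime) -> Decimal:
    """Price one usage event with the rate in effect when it occurred."""
    rate = rate_at(model, when)
    return rate.usd_per_million_output_tokens * output_tokens / Decimal(1_000_000)
```

Two identical 200,000-token calls on either side of the hypothetical reprice date come out at different costs, which is the information a cycle-end price lookup would lose.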
How does AI billing actually work?
AI billing has to do four jobs: capture usage, price it, decide whether to authorize the spend against the customer's balance, and settle the charge. The architectural split in the market comes down to when authorization happens. In invoice-based architectures, usage is captured throughout the period and reconciled into an invoice at cycle end; authorization, if it exists at all, runs at billing time. In real-time architectures, authorization runs before the action, and usage capture, pricing, and balance debit collapse into one atomic operation.
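The real-time shape can be sketched as a single in-process operation that prices the request, checks the balance, debits it, and records the usage under one lock. This is an illustration under simplifying assumptions (one process, an in-memory balance); `Wallet` and `authorize_and_debit` are hypothetical names, not any platform's API.

```python
import threading
from decimal import Decimal


class Wallet:
    """In-memory wallet where pricing, authorization, debit, and usage
    capture happen as one atomic step."""

    def __init__(self, balance: str):
        self._balance = Decimal(balance)
        self._lock = threading.Lock()
        self.ledger = []  # (event_id, units, cost) for every authorized event

    def authorize_and_debit(self, event_id: str, units: int, unit_price: str) -> bool:
        cost = Decimal(units) * Decimal(unit_price)
        with self._lock:
            # The check and the debit share the lock, so no other request
            # can observe the balance between them.
            if cost > self._balance:
                return False
            self._balance -= cost
            self.ledger.append((event_id, units, cost))
            return True

    @property
    def balance(self) -> Decimal:
        with self._lock:
            return self._balance
```

The caller runs the inference only when `authorize_and_debit` returns `True`. With a balance of 10 and fifty concurrent one-unit requests, ten are authorized and the rest are refused, instead of all fifty proceeding against a balance they each read as non-empty.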
The cleanest way to read the market is by architecture, not by vendor. Three categories cover the field.
| Property | Invoice-based | Real-time | Subscription-first with metered add-ons |
|---|---|---|---|
| Source of truth | Invoice at cycle end | Balance, updated continuously | Subscription record + meter totals |
| Authorize usage before cost? | No | Yes | Partial (caps after the fact) |
| Concurrency safety | Reconciles after | Atomic per request | PSP-level only |
| Pricing change cadence | Cycle-bounded | Per-event | Plan-bounded |
| Best fit | Enterprise, cycle-end invoicing | AI inference, agentic workloads, prepaid models | SaaS with optional metered features |
| Example platforms | Orb, Metronome, Lago | Credyt, Stigg | Stripe Billing |
Real-time monetization is the architecture where authorization runs at the moment of usage, against the customer's current balance. Invoice-based architectures meter throughout the period and reconcile into an invoice at cycle end. Subscription-first systems were designed around recurring plans with metered overages bolted on. None of the three is universally better. They suit different workloads.
Stripe's Meter Events API illustrates the practical limit of the synchronous invoice-based shape at AI scale: 1,000 events per second on the live mode endpoint and a 35-day backdating window for late events. That is fine for many workloads. It is not fine for an agentic product where a single customer can fire hundreds of requests per second across long-horizon tasks. For a deeper architecture comparison, see how to choose SaaS billing software for AI products.
What pricing models does AI billing support?
Any of them. AI billing supports per-unit, prepaid credits, hybrid, outcome-based, and seat-plus-credits: the same five pricing models the broader usage-based billing market converged on. Picking one or switching between them is a pricing decision; the billing architecture should make it a config change, not a rewrite. Effective AI monetization strategies walks through which of these models fits which company shape.
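One way to read "a config change, not a rewrite" is that the pricing model lives in data and a single function interprets it. The sketch below assumes a simplified charge formula; the field names and most numbers are hypothetical (the $0.99 outcome price and $20 seat fee echo figures quoted elsewhere in this article). Prepaid credits would change where the charge is collected, from a balance rather than an invoice, not the formula itself.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PricingConfig:
    seat_fee: Decimal = Decimal("0")       # flat fee per seat per period
    included_units: int = 0                # usage units covered per seat
    unit_price: Decimal = Decimal("0")     # price per unit beyond the included amount
    outcome_price: Decimal = Decimal("0")  # price per successful outcome


def period_charge(cfg: PricingConfig, seats: int, units: int, outcomes: int) -> Decimal:
    """Compute one period's charge from a usage summary and a pricing config."""
    billable_units = max(0, units - cfg.included_units * seats)
    return (
        cfg.seat_fee * seats
        + cfg.unit_price * billable_units
        + cfg.outcome_price * outcomes
    )


# Switching models means swapping the config, not the code path.
PER_UNIT = PricingConfig(unit_price=Decimal("0.002"))
OUTCOME_BASED = PricingConfig(outcome_price=Decimal("0.99"))
SEAT_PLUS_CREDITS = PricingConfig(
    seat_fee=Decimal("20"), included_units=500, unit_price=Decimal("0.04")
)
```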
In practice, AI products are converging on credits and hybrid. ICONIQ's 2026 panel shows consumption-based pricing reaching 35% of AI companies (up from 19%), outcome-based at 18% (up from 2%), hybrid at 41% in the broader 240-company B2B sample, and seat-based-only dropping to 15%. About 58% still carry a subscription or platform component somewhere in their pricing. Separately, credit-model adoption surged 126% year-over-year in the PricingSaaS dataset, which tracked 8,394 pricing events across 498 SaaS and AI companies. Credits work because they abstract token-level cost volatility from the customer and let the vendor collect a prepay before the LLM bill lands.
What does not change across pricing models is the need to see cost per interaction. Without it, the team is pricing in the dark, and any of these models will produce a customer who looks profitable in aggregate but is bleeding money one interaction at a time. Outcome pricing is the direction of travel for some companies, but it is not the starting point for most. A16z's Olivia Moore framed it accurately: outcome-based pricing is the destination, not the entry point, for 95% of the market. Attribution mechanics are the barrier, not pricing theory.
Who is using AI billing today and how?
Cursor, Replit, Intercom Fin, and the OpenAI and Anthropic APIs cover four distinct production AI billing models: hybrid subscription plus credit pool, hybrid with agent credits, outcome-based per-resolution, and pure per-token prepaid.
Cursor runs a hybrid subscription plus monthly credit pool, with the pool priced at frontier-model API rates. Cursor Pro is $20 per month; Cursor Ultra is $200. The June 2025 incident showed the concurrent-depletion failure mode in public. Cursor kept the credit model after the incident. The fix was clarity and refunds, not architecture replacement.
Replit runs a hybrid subscription with a credit component for its AI agent. Product analyst Aakash Gupta documented a gross margin swing from 36% to negative 14% as the agent consumed more LLM inference than the pricing covered. This figure is widely cited; it is Gupta's analysis rather than a primary Replit disclosure. The shape of the swing is what matters: a product can ship a pricing model that looks balanced at launch and find the math has inverted six months later.
Intercom Fin runs outcome-based at $0.99 per resolution. Intercom's CFO Dan Griggs walked through the choice on the Get Paid podcast. Resolution rate improved from 25% at launch in early 2023 to about 70% in 2026. Single-customer monthly bills range from $50 to $30,000 depending on volume. The product orchestrates 12 or more LLMs in production to hit that resolution rate at a cost that supports the price.
OpenAI and Anthropic APIs are the infrastructure layer. Both run pure per-token, prepaid, with credit balances customers manage directly. ElevenLabs runs a freemium hybrid with subscription credit allowances. Lovable runs hybrid subscription plus recurring credits in its vibe-coder product. The Metronome Pricing Index catalogs 33 AI and SaaS pricing models with the per-company detail.
About 37% of AI companies plan another pricing change within 12 months, per the ICONIQ 2026 State of AI panel. The shape of AI billing is not settled. Companies are iterating publicly, often more than once a year.
What to consider when implementing AI billing
Does the AI billing platform meter in real time or aggregate at period end? AI inference cost arrives in real time; the metering layer has to as well. If the platform's source of truth is the invoice, the platform's source of truth is a stale view of the business.
Can the platform answer "what does customer X cost me this minute"? Per-customer cost attribution is the precondition for unit economics. Only 43% of AI companies track AI cost per customer, and only 22% track per transaction, per CloudZero's State of AI Costs in 2025 report (May 2025). If the billing system can't do this, the platform team builds the attribution layer themselves, and that's the most expensive option in the long run.
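Per-customer attribution can be sketched as a roll-up over priced usage events. The event shape (`customer`, `feature`, `cost_usd`) and the function names are assumptions for illustration; a production system would read these from its metering store rather than an in-memory list.

```python
from collections import defaultdict
from decimal import Decimal


def attribute_costs(events):
    """Sum provider cost per (customer, feature) from priced usage events."""
    totals = defaultdict(Decimal)  # Decimal() starts each bucket at 0
    for e in events:
        totals[(e["customer"], e["feature"])] += e["cost_usd"]
    return dict(totals)


def customer_margin(events, revenue_usd: Decimal, customer: str) -> Decimal:
    """Gross margin for one customer: (revenue - cost) / revenue."""
    cost = sum(
        (e["cost_usd"] for e in events if e["customer"] == customer),
        Decimal("0"),
    )
    return (revenue_usd - cost) / revenue_usd


# Hypothetical events: one customer on a $20 plan whose agent usage
# has already cost more than the plan brings in.
EVENTS = [
    {"customer": "acme", "feature": "chat", "cost_usd": Decimal("3.00")},
    {"customer": "acme", "feature": "agent", "cost_usd": Decimal("9.50")},
    {"customer": "beta", "feature": "chat", "cost_usd": Decimal("1.25")},
    {"customer": "acme", "feature": "agent", "cost_usd": Decimal("12.50")},
]
```

Here `customer_margin(EVENTS, Decimal("20"), "acme")` is negative, and the per-feature totals show that the agent feature is where the cost sits.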
Is the balance check atomic under concurrency? Concurrent requests against a shared balance need the balance check and the debit to happen as one atomic operation, not a read from an eventually-consistent dashboard followed by a separate write. Without that, multiple simultaneous requests can each see a non-depleted balance and all proceed, shipping overage. A balance that lags by 30 seconds is a balance that ships overage.
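At the storage layer, one common way to get this property is a conditional update: the debit only applies if the balance still covers it, and the affected-row count tells the caller whether it was authorized. The sketch uses Python's built-in `sqlite3` with a hypothetical `wallets` table; the table, column, and function names are assumptions, and a real deployment would use its own database and schema.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one wallet holding 500 cents."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wallets ("
        "customer_id TEXT PRIMARY KEY, balance_cents INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO wallets VALUES ('cust_1', 500)")
    conn.commit()
    return conn


def try_debit(conn: sqlite3.Connection, customer_id: str, cost_cents: int) -> bool:
    """Debit only if the balance covers the cost, in a single statement.

    The check lives in the WHERE clause, so there is no window between
    reading the balance and writing the new one.
    """
    cur = conn.execute(
        "UPDATE wallets SET balance_cents = balance_cents - ? "
        "WHERE customer_id = ? AND balance_cents >= ?",
        (cost_cents, customer_id, cost_cents),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means the debit was refused
```

A separate `SELECT` followed by an `UPDATE` would reintroduce the gap; the point of the pattern is that the database evaluates the condition and applies the write together.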
Does the platform handle idempotent ingestion and late-arriving events? Production failure mode one for first-time metering teams is double-billing on retried events. Production failure mode two is dropping events that arrive after the cycle closes. The platform should handle both natively, not by application-layer convention.
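Both failure modes can be sketched with an ingestion step that deduplicates on an event ID and routes late events to an adjustment queue instead of dropping them. The `Ingestor` class, its fields, and the return labels are hypothetical; the in-memory `seen` set stands in for whatever durable deduplication store a real platform uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Ingestor:
    cycle_close: datetime
    seen: set = field(default_factory=set)       # event IDs already ingested
    accepted: list = field(default_factory=list)
    late: list = field(default_factory=list)     # held for a later adjustment

    def ingest(self, event_id: str, occurred_at: datetime,
               received_at: datetime, units: int) -> str:
        # Idempotency: a retried event with the same ID is acknowledged
        # but never counted twice.
        if event_id in self.seen:
            return "duplicate"
        self.seen.add(event_id)
        record = (event_id, occurred_at, units)
        # Late arrival: the usage belongs to the closed cycle but showed
        # up after the close, so it is queued rather than dropped.
        if occurred_at < self.cycle_close <= received_at:
            self.late.append(record)
            return "late"
        self.accepted.append(record)
        return "accepted"
```

A client retry that resends the same `event_id` gets `"duplicate"` back, and an event from the last minute of the cycle that arrives five minutes after close lands in `late` for a follow-up charge or credit.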
Is changing the pricing model a config change or a sprint? Average pricing-change cadence in 2025 was 3.6 changes per company across the top 500 tracked SaaS and AI companies (Growth Unhinged, December 2025). A billing system that requires an engineering sprint to re-price is a billing system that will be ripped out within 18 months.
How Credyt handles AI billing
If you are building AI billing into your product, Credyt provides the real-time infrastructure to do it without writing the wallet, metering, and authorization code yourself.
With Credyt, you can:
- Check the customer's balance in real time before each inference call. Your platform makes the allow-or-block decision; Credyt provides the balance state and records the spend atomically.
- Bill per-token, per-call, per-credit, per-outcome, or any hybrid combination in one platform.
- Attribute compute cost per customer, per feature, and per agent in real time across OpenAI, Anthropic, Google, and self-hosted models.
- Ship a branded billing portal with self-service top-ups and customer-initiated auto top-up, without writing the frontend.
Pricing starts free: the first 10 active wallets and the first one million events are included every month. Beyond that, pricing is $1 per Monthly Active Wallet. Read the technical overview.
Related resources
- What is usage-based billing? The umbrella concept and the five pricing structures, including how AI workloads change the mechanics.
- What is metered billing? The metering layer in invoice-based architectures and where it fits in real-time systems.
- Why AI companies need real-time economic control. The thought-leadership companion to this explainer, on the gap between cost recorded and cost incurred.
