Usage-based billing charges customers based on their actual consumption of a product or service during each billing cycle. Instead of flat monthly fees, customers pay in proportion to what they use. This article covers how it works, the five main pricing structures, and what changes when the product you're billing is AI.
How does usage-based billing work?
Usage-based billing converts product actions into charges. A customer performs an action (generates an image, makes an API call, processes a document), and that action becomes a billable event. A pricing rule converts the event into a charge. That loop repeats for every action, every customer, every cycle.
Think of it like an electricity meter. You do not pay a flat monthly fee regardless of how much power you use. You pay for kilowatt-hours consumed. Usage-based billing applies the same principle to software: identify which product actions customers should pay for, set a price per event, and the billing system handles the rest.
The infrastructure underneath that loop has two jobs. First, it records every billable event. Second, it converts those events into charges using your pricing rules. How those two jobs are sequenced matters enormously, particularly for AI products. More on that shortly.
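That record-and-convert loop can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`Meter`, `record_event`, a flat per-event rate), not any platform's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Meter:
    """The two jobs of the billing loop: record events, convert them to charges."""
    price_per_event: float                      # the pricing rule: flat rate per event
    events: list = field(default_factory=list)  # every billable event, in order

    def record_event(self, customer_id: str, event_type: str) -> None:
        # Job 1: record the billable event the moment the product action happens.
        self.events.append({"customer": customer_id, "type": event_type})

    def charges_for(self, customer_id: str) -> float:
        # Job 2: convert recorded events into a charge using the pricing rule.
        count = sum(1 for e in self.events if e["customer"] == customer_id)
        return count * self.price_per_event

meter = Meter(price_per_event=0.04)             # e.g. $0.04 per image generated
meter.record_event("cust_1", "image_generated")
meter.record_event("cust_1", "image_generated")
print(meter.charges_for("cust_1"))              # two events at $0.04 each
```

Whether the conversion step runs at period close or at the moment each event arrives is exactly the sequencing question that matters for AI products.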
For a deeper look at how usage events relate to revenue recognition, see revenue recognition for usage-based billing.
What are the main usage-based pricing models?
Usage-based billing is not one model. It is a family of models, each suited to different products and customer relationships.
| Model | How it works | Best for | Example |
|---|---|---|---|
| Per-unit | Fixed rate per event occurrence, regardless of the work done internally | Discrete outcomes: image generated, ticket resolved, document processed | Midjourney: per image generated; Intercom: per ticket resolved |
| Tiered | Rate per unit decreases as volume increases | Cloud storage, data transfer | AWS S3 storage tiers |
| Volume | Price based on a measurable quantity within each event | AI inference where cost scales with input or output size | OpenAI: tokens consumed per chat_completed event |
| Prepaid credits | Customers buy credits in advance; events deduct from balance | AI products, developer tools | Anthropic API credits |
| Hybrid | Subscription base fee plus usage charges | SaaS with AI features added | Clay: unlimited seats with recurring credit allocation |
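To make the tiered row concrete: under a graduated tier structure, each tranche of usage is billed at its own rate. The boundaries and rates below are hypothetical, not AWS's actual S3 prices:

```python
def graduated_tiered_charge(units: int, tiers) -> float:
    """Bill each tranche of usage at its own rate.

    `tiers` is a list of (upper_bound, rate_per_unit) pairs, ordered by
    upper_bound; an upper_bound of None means "no limit" (the final tier).
    """
    total, billed = 0.0, 0
    for upper, rate in tiers:
        if units <= billed:
            break                        # all usage has been billed
        cap = units if upper is None else min(units, upper)
        tranche = cap - billed           # units falling inside this tier
        total += tranche * rate
        billed = cap
    return total

# Hypothetical tiers: first 1,000 units at $0.10, next 9,000 at $0.08, the rest at $0.05.
TIERS = [(1_000, 0.10), (10_000, 0.08), (None, 0.05)]
print(graduated_tiered_charge(500, TIERS))     # all 500 units land in the first tier
print(graduated_tiered_charge(12_000, TIERS))  # 1,000 + 9,000 + 2,000 units across tiers
```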
A sixth pattern worth naming separately is dimensional pricing. This is where the price per event varies based on attributes of the event itself, not volume alone. A video generation service might charge $1.00 per minute for fast processing and $0.40 per minute for standard. An AI coding assistant might charge different rates depending on which model handles the request. You define one price structure, assign rates to each dimension value, and the billing system reads the attribute from each incoming event and applies the right rate automatically. For products where infrastructure cost varies by task complexity or model selection, dimensional pricing is the most accurate way to reflect that variability.
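A sketch of that mechanic, using the hypothetical video-generation rates from the paragraph above; the event shape and function names are illustrative:

```python
# Rate per minute keyed by the `tier` attribute carried on each incoming event.
RATES_PER_MINUTE = {"fast": 1.00, "standard": 0.40}

def price_event(event: dict) -> float:
    """Read the dimension value off the event and apply the matching rate."""
    rate = RATES_PER_MINUTE[event["tier"]]   # KeyError here = unpriced dimension value
    return rate * event["minutes"]

print(price_event({"tier": "fast", "minutes": 5}))      # 5 minutes at $1.00
print(price_event({"tier": "standard", "minutes": 5}))  # 5 minutes at $0.40
```

Adding a new dimension value, say a new model, is one new rate entry rather than a new pricing table.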
Who uses usage-based billing?
Usage-based billing started in cloud infrastructure. AWS, GCP, and Azure built their businesses on it. Customers pay for compute, storage, and data transfer by the unit. It remains the dominant model in that category.
Credit-based consumption pricing has surged across SaaS. The PricingSaaS Trends Report (Q1 2026) tracked 498 companies across 12 software categories and found credit model adoption grew 126% year-over-year in 2025. Companies like Figma and HubSpot added credit systems alongside their existing subscription models as AI features became core functionality rather than optional add-ons. The Metronome Pricing Index (2026) shows that 15 of the 33 major AI and SaaS companies it tracks use usage-based or hybrid pricing, including OpenAI, Anthropic, Cohere, Lovable, and AWS Lambda.
A more recent group driving this shift is vibe coders: developers building AI products on platforms like Lovable and Bolt.new, often without traditional engineering backgrounds. They ship working apps fast and need to charge for them immediately.
Usage-based billing fits naturally because customers pay when they use compute, not for a seat they may never touch. The challenge is that platforms like Stripe Billing, Orb, and Metronome were not designed for this audience. What they need is something that works out of the box, without weeks of configuration or a billing engineering project to manage. See vibe coders can build apps in hours. Billing takes them months.
Key trade-offs to understand before committing to the model:
- Revenue expands automatically with usage. High-consumption customers pay more without manual upgrades or upsells.
- Customers start at low or zero cost, which reduces adoption friction and churn from customers overpaying for unused seats.
- Revenue is harder to predict. Customers who use less in a given month generate less revenue, and forecasting requires usage modeling rather than headcount.
- Bill shock is a significant churn risk. Customers who receive an unexpectedly large invoice tend not to renew. Real-time balance visibility and spend alerts are churn prevention, not a nice-to-have.
Why is billing AI products different?
Billing AI products is different because every product action has a direct, variable infrastructure cost that hits your account before the customer pays. Traditional SaaS billing has a forgiving property: a collaboration tool costs roughly the same per seat whether usage is light or heavy, so invoicing at month-end carries little exposure. AI products break that assumption in three specific ways.
Every product action has a direct infrastructure cost. Inference costs vary by an order of magnitude across models. Anthropic's Claude Opus charges $25 per million output tokens as of early 2026, down from $75 following a 67% price cut (PricingSaaS Trends Report, Q1 2026), while lighter open-source models can cost a fraction of that. That cost hits before you have collected from the customer.
Concurrent requests create credit depletion races. A customer who fires ten simultaneous requests needs all ten checked against their available balance before any proceed. You cannot process them incrementally. A single atomic check determines whether all ten proceed or all ten are blocked. Traditional metering systems were not designed for this.
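A minimal sketch of why atomicity matters, using a lock so that the check and the deduction happen as one step; in production this is typically a database transaction or an atomic counter, and every name here is illustrative:

```python
import threading

class Wallet:
    """Balance with an atomic check-and-deduct, so concurrent requests cannot
    each pass a stale balance check and overdraw the wallet together."""

    def __init__(self, balance: float):
        self.balance = balance
        self._lock = threading.Lock()

    def try_deduct(self, amount: float) -> bool:
        with self._lock:              # check + deduct as one indivisible step
            if self.balance < amount:
                return False          # blocked before any work happens
            self.balance -= amount
            return True

wallet = Wallet(balance=5.0)
results = []
threads = [threading.Thread(target=lambda: results.append(wallet.try_deduct(1.0)))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count(True), wallet.balance)  # exactly 5 requests funded, balance 0.0
```

The same lock taken once around the summed cost of a batch gives the all-or-nothing decision the ten-request scenario above describes.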
Your margins change every time you switch models. If you run on Claude, your per-request cost is Y. Switch to GPT-4 and it becomes 3Y. Switch to Llama and it drops to 0.1Y. Cost is tied to model selection, not customer behavior. Traditional billing tracks what customers do. AI billing also needs to track what infrastructure you used to serve them.
Dimensional pricing becomes essential here. When cost varies by model, quality tier, or task complexity, pricing must adapt to the attributes of each event rather than apply a static rate uniformly.
OpenAI prices by tokens consumed per completion: volume within each event. Anthropic sells prepaid API credits customers draw down as they use the API. Midjourney charges per image generated: a flat rate per event, regardless of what the model did internally to produce it. Each is a response to the same underlying reality: AI product costs are variable, and pricing infrastructure must reflect that.
For a concrete look at billing in product-native units like tokens and GPU hours, see how to bill in product-native units instead of dollars.
Real-time billing vs post-usage invoicing
Post-usage invoicing captures usage events throughout the billing period and generates charges at cycle end. Real-time billing authorizes and deducts at the moment each event occurs, checking the customer's balance before the work happens. The distinction determines how much spend exposure you carry and whether runaway usage is possible at all.
Post-usage invoicing platforms (Stripe Billing, Orb, Metronome, Lago) aggregate events against meters you define upfront, then bill at period close. This works well for predictable SaaS products where costs are stable and customers are unlikely to exhaust their budget overnight.
Real-time billing closes a loop that post-usage invoicing leaves open. Authorization happens before the cost is incurred, not after. When hard spend controls are enabled, a request that would exceed the customer's balance is blocked before the work proceeds. By default, wallets can go into a negative balance; hard enforcement at zero is opt-in.
| | Post-usage invoicing | Real-time billing |
|---|---|---|
| When does deduction happen? | End of billing period | At the moment of each event |
| Balance checked before usage? | No | Yes |
| Setup approach | Define meters, attach to product config | Define billable events on the product |
| Spend exposure | Unbounded until period closes | Capped at current balance when hard limits are enabled; overdraft permitted by default |
| Best for | Predictable SaaS usage, enterprise invoicing | AI products, prepaid credit models |
| Runaway spend protection | Requires separate limit logic | Built into the authorization layer |
For AI products, post-usage invoicing introduces a specific exposure: a customer's agent can run unconstrained overnight, consuming inference at cost, and you will not know until the billing period closes. Real-time billing closes that loop.
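The difference in sequencing can be sketched directly. The `hard_limit` flag mirrors the opt-in enforcement described above; the function names and flow are illustrative, not a vendor API:

```python
class InsufficientBalance(Exception):
    """Raised when hard spend limits block a request up front."""

def post_usage_record(ledger: list, cost: float) -> None:
    # Post-usage: just record the event; charges are computed at period close,
    # so nothing bounds spend until the cycle ends.
    ledger.append(cost)

def real_time_deduct(balance: float, cost: float, hard_limit: bool = False) -> float:
    # Real-time: authorize before the cost is incurred.
    if hard_limit and balance < cost:
        raise InsufficientBalance("blocked before the work proceeds")
    # ... the metered work (e.g. the model call) would run here ...
    return balance - cost   # deduct at the moment of the event; may go negative
                            # when hard_limit is off, mirroring default overdraft

print(real_time_deduct(balance=2.0, cost=0.5))  # remaining balance after one event
```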
How do you implement usage-based billing?
Implementing usage-based billing well comes down to five decisions made before you write any code.
1. Identify your billable events and outcomes. The starting question is not "what metric do I track?" That is metering-platform language. The starting question is: what product actions should customers pay for? For an AI coding assistant, is it per suggestion accepted? Per model call? Per session? The answer shapes everything downstream. Start with what customers understand and what maps to the value they receive.
2. Decide when billing happens. Post-usage invoicing or real-time deduction. This is a business decision, not a configuration detail. It comes down to two things: how your customers expect to be billed, and whether you can afford to front their usage costs until month-end. Most early-stage companies, particularly those not yet funded, cannot absorb a month of inference costs before collecting. Real-time deduction transfers that exposure back to the customer at the moment it occurs. If you are building a SaaS product with stable per-seat costs and customers who expect monthly invoicing, post-usage may work fine.
3. Define pricing rules before writing code. Per-unit rates, tiered structures, and dimensional pricing need to be modeled before implementation, not discovered during it. Pricing that seems simple often has edge cases: volume discounts, promotional grants, free tiers, plan entitlements. Map them out first.
4. Give customers real-time balance and usage visibility. Customers who cannot see what they have spent or what they have left will be surprised by their bill. Spend alerts and balance displays are churn prevention tools.
5. Choose infrastructure that does not require rebuilding when pricing evolves. Pricing models change. New AI models get added. Dimensional pricing gets introduced. Vibe coders need this to work immediately without a billing project. Scaling teams need it to handle new complexity without rearchitecting. Stripe Billing works well for straightforward subscription and usage combinations. It becomes complex quickly when you add variable inference costs, concurrent request authorization, and dimensional pricing across multiple models. That complexity is worth understanding upfront.
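Step 3 above can be made concrete by writing the rules down as data before any billing code exists. Everything below is hypothetical, including the free-tier allowance, but it shows how edge cases become explicit fields rather than mid-build discoveries:

```python
# Hypothetical pricing rules, modeled as plain data up front.
PRICING = {
    "image_generated": {"model": "per_unit", "rate": 0.04, "free_per_month": 25},
    "video_minute": {"model": "dimensional", "rates": {"fast": 1.00, "standard": 0.40}},
}

def per_unit_charge(event_count: int, rule: dict) -> float:
    """Apply a per-unit rule, honoring the free-tier allowance first."""
    billable = max(0, event_count - rule.get("free_per_month", 0))
    return billable * rule["rate"]

print(per_unit_charge(100, PRICING["image_generated"]))  # 75 billable events at $0.04
print(per_unit_charge(10, PRICING["image_generated"]))   # fully inside the free tier
```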
How Credyt handles usage-based billing
If you are building an AI product that needs real-time usage billing, Credyt provides the infrastructure to do it without building from scratch.
Credyt's model differs from metering-first platforms. You create a product and define the billable events you want to charge for. When events arrive, Credyt deducts from the customer's wallet in real time. No separate meter definition, no end-of-cycle aggregation step, no batch job.
With Credyt, you can:
- Bill any event in real time. API calls, tokens, GPU seconds, agent tasks, and outcomes are deducted the moment they occur.
- Authorize usage before any cost is incurred. Credyt surfaces the customer's current wallet balance so your platform can decide whether to allow a billable action before calling the underlying model or service. The deduction happens atomically when the event arrives.
- See margin per event, customer, and product. Credyt's profitability feature correlates revenue and AI costs in real time: gross revenue, total cost, net revenue, and margin at the event level. Know which customers, workflows, and product activities are actually profitable. See profitability.
- Fund wallets multiple ways. Customers top up manually via the portal, configure auto top-ups that trigger when balance falls below a threshold, or subscribe to a hybrid plan that combines a recurring fixed fee with a bundled credit entitlement. Overages bill as usage once the included allowance runs out. See entitlements.
- Hold any asset type in a single wallet. USD, tokens, GPU hours, or any product-native unit, side by side.
- Price by event attributes. Dimensional pricing adapts the rate automatically based on model, quality, or complexity. No separate pricing tables to maintain.
- Ship a customer portal with no frontend work. Embedded balance display, usage history, and self-service top-up out of the box.
