What is metered billing?

Nick Thomson

Nick has long experience building payments and marketplace products used around the world. He was previously Head of Product at Checkout.com and Chief Product Officer at Banked.

Metered billing charges customers for measured consumption (API calls, tokens, GPU seconds), not a flat fee. How metering actually works depends on architecture: invoice-based systems meter for later reconciliation; real-time systems combine usage capture and billing into one operation. This article covers both paths, five pricing models, and which platforms fit which kind of product.

What is metered billing?

Metered billing is a monetization model where customers are charged in proportion to measured product usage instead of a flat recurring fee. Each unit of consumption (an API call, a token, a GPU second, a generated image) becomes a billable event. The system records the event and converts it into a charge.

How that conversion happens depends on the system's architecture. Two patterns dominate the market. Traditional invoice-based systems (Orb, Metronome, Lago, and Stripe Billing's metered add-ons) meter events throughout a billing period and reconcile them into an invoice at cycle end. Real-time systems combine usage capture, pricing, and wallet debit into one atomic operation. The fork matters because it determines whether "metering" is a separate concept at all.

Metered billing is no longer a niche pattern. According to Metronome and Greyhound Capital's State of Usage-Based Pricing 2025, 85% of SaaS companies now use some form of usage-based billing, with 78% having adopted it within the last five years. Credit-based pricing models grew 126% year-over-year in 2025 across 498 tracked companies (PricingSaaS Q1 2026 Trends Report). Metered billing has already won. The live debate is architectural.

How does metered billing work?

Metered billing works in one of two ways, depending on the underlying architecture. Invoice-based systems run a four-stage pipeline. Real-time systems collapse the stages into one atomic operation.

The invoice-based pipeline

In invoice-based architectures, events flow through four distinct stages. First, your product emits usage events. Second, a metering layer captures and aggregates them throughout the billing period. Third, at cycle end, a reconciliation job reads the meter totals and feeds them into a billing engine. Fourth, the billing engine produces an invoice. Payment follows days or weeks later.
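
The four stages can be sketched in a few lines of Python. This is a toy in-memory illustration, not any vendor's API: the `UsageEvent` type, the `api_calls` meter name, and the per-call price are all assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    customer_id: str
    meter: str      # hypothetical meter name, e.g. "api_calls"
    quantity: int


def aggregate(events):
    """Stage 2: the metering layer sums quantities per (customer, meter)."""
    totals = defaultdict(int)
    for event in events:
        totals[(event.customer_id, event.meter)] += event.quantity
    return dict(totals)


def reconcile(totals, unit_prices):
    """Stages 3 and 4: price the meter totals at cycle end and build invoices."""
    invoices = defaultdict(lambda: {"lines": [], "total": Decimal("0")})
    for (customer_id, meter), quantity in sorted(totals.items()):
        amount = unit_prices[meter] * quantity
        invoices[customer_id]["lines"].append((meter, quantity, amount))
        invoices[customer_id]["total"] += amount
    return dict(invoices)


# Stage 1: the product emits usage events during the billing period.
events = [
    UsageEvent("cust_1", "api_calls", 1200),
    UsageEvent("cust_1", "api_calls", 800),
    UsageEvent("cust_2", "api_calls", 500),
]
prices = {"api_calls": Decimal("0.002")}  # assumed price per call
invoices = reconcile(aggregate(events), prices)
```

Note that no price is applied until `reconcile` runs. That gap between capture and pricing is what makes the metering layer a separate component in this architecture.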

The metering layer is a first-class requirement here because pricing is applied at reconciliation, not at event capture. Operationally, this means the system has to handle late-arriving events, deduplicate retries, and run the reconciliation step as a separate engineering and finance operation. Stripe's Meter Events API accepts up to 1,000 events per second on the standard endpoint (10,000 via the v2 stream), with a 35-day backdating window for late events. Stripe launched the Meters API at Sessions 2024 in April 2024, adding an invoice-based metering primitive to a billing platform that started subscription-first.
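
A minimal sketch of the two ingestion rules described above, deduplicating retries and rejecting events older than the backdating window. The `Meter` class, its return values, and the idempotency key are hypothetical; only the 35-day figure comes from the text. A production system would persist keys and totals instead of holding them in memory.

```python
from datetime import datetime, timedelta, timezone

BACKDATING_WINDOW = timedelta(days=35)  # mirrors the 35-day figure cited above


class Meter:
    """Toy in-memory meter, for illustration only."""

    def __init__(self):
        self.seen_keys = set()
        self.total = 0

    def ingest(self, idempotency_key, quantity, event_time, now):
        """Return 'accepted', 'duplicate', or 'too_old' for one event."""
        if idempotency_key in self.seen_keys:
            return "duplicate"   # a retry of an event that was already counted
        if now - event_time > BACKDATING_WINDOW:
            return "too_old"     # arrived outside the late-event window
        self.seen_keys.add(idempotency_key)
        self.total += quantity
        return "accepted"


now = datetime(2026, 4, 30, tzinfo=timezone.utc)
meter = Meter()
first = meter.ingest("evt_1", 5, now - timedelta(days=1), now)
retry = meter.ingest("evt_1", 5, now - timedelta(days=1), now)
stale = meter.ingest("evt_2", 3, now - timedelta(days=40), now)
```

Here the retry and the 40-day-old event leave the meter total unchanged, so only the first event is ever billed.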

The real-time atomic flow

In real-time architectures, the platform first checks the customer's wallet state. If the balance is sufficient, the action proceeds. The platform then sends the usage event to the billing system, which prices it and debits the wallet in one atomic transaction.

There is no reconciliation job. No separate metering layer. The wallet balance is the source of truth, updated on every event in milliseconds. The decision to allow or block lives with the platform; the billing system provides the atomic balance state that makes that decision possible. Real-time billing fits products where the lag between usage and billing is itself a risk. Adopting real-time billing typically happens in stages, not as a single migration.
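
The flow can be sketched as a wallet whose debit method prices and charges in a single guarded step. This is an illustration under assumptions, not Credyt's or any other vendor's implementation: the `Wallet` class and its method names are invented here, and the in-process lock stands in for what would be a database transaction in a real system.

```python
import threading
from decimal import Decimal


class InsufficientBalance(Exception):
    pass


class Wallet:
    def __init__(self, balance):
        self.balance = Decimal(balance)
        self._lock = threading.Lock()  # stand-in for a database transaction

    def can_afford(self, amount):
        """Read-only check the platform can make before allowing an action."""
        return self.balance >= amount

    def record_usage(self, quantity, unit_price):
        """Price the event and debit the wallet as one locked step."""
        with self._lock:
            amount = unit_price * quantity
            if amount > self.balance:
                raise InsufficientBalance(f"need {amount}, have {self.balance}")
            self.balance -= amount
            return self.balance


wallet = Wallet("1.00")
price_per_token = Decimal("0.0005")  # assumed price
if wallet.can_afford(price_per_token * 1000):
    remaining = wallet.record_usage(1000, price_per_token)
```

Because the balance check and the debit happen under the same lock, two concurrent requests cannot both spend the last of the balance.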

What is the difference between metering and billing?

The distinction between metering and billing is an artifact of invoice-based architectures. In those systems, metering and billing are two separate stages connected by a reconciliation job. In real-time architectures, the distinction collapses: the same atomic operation records and bills the usage.

Metering, in the traditional sense, is the recording of usage events for later billing. The metering layer is responsible for capturing every billable event, attributing it to the right customer, and aggregating totals against the right meter. It does not price events. It does not produce charges. Pricing and charging happen later, in the billing engine.

Billing converts metered records into a charge. In invoice-based platforms (Orb, Metronome, Lago), this is the reconciliation step. The billing engine reads the meters, applies pricing rules, generates the invoice, and triggers collection.

In real-time systems, the two operations collapse. The system records usage, prices it, and debits the wallet in the same transaction. There is no "for later" because billing is now. The traditional definition of metering (record now, bill later) stops applying.

This is not a contradiction or a marketing claim. These are two architectures designed for different problems. Invoice-based billing suits enterprise contracts with cycle-end invoicing and complex negotiated terms. Real-time billing suits products whose costs are incurred before the next invoice would arrive.

What pricing models does metered billing support?

Metered billing supports five pricing models: per-unit, tiered, volume, prepaid credits, and hybrid. Each makes different trade-offs between predictability and cost-tracking precision.

| Model | How it works | AI example |
|---|---|---|
| Per-unit | Fixed price per measured unit | Anthropic Claude Sonnet at $3 input / $15 output per 1M tokens; Claude Opus at $5 / $25 (Anthropic Pricing, April 2026). OpenAI GPT-4.1 at $2 / $8 per 1M tokens; GPT-4o at $2.50 / $10 (OpenAI Pricing, accessed April 2026) |
| Tiered | Price decreases as cumulative volume rises | OpenAI's API tier ladder unlocks at $5 cumulative spend (Tier 1) up to a $200,000/month ceiling at Tier 5 (OpenAI rate limits, accessed April 2026) |
| Volume | One price applies to all usage based on the tier reached in a period | Common in cloud GPU pricing |
| Prepaid credits | Customers buy credits in advance, debited per use | ElevenLabs charges 1 credit per character on the standard model and 0.5 credits on Flash/Turbo, with 10,000 free credits per month (ElevenLabs Pricing); Cursor Pro packages $20/month into roughly 225 Claude Sonnet requests at median token use (Cursor Blog, July 2025) |
| Hybrid | Subscription base plus metered overages | Anthropic Max at $100 to $200/month with weekly consumption envelopes; the dominant pattern at 41% of B2B software companies in 2025, up from 27% the prior year (Growth Unhinged, October 2025) |
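
To make the difference between the models concrete, here is a rough sketch of four of them as pricing functions. The tier boundaries and prices are invented for illustration. The `tiered` function assumes the graduated interpretation, where each unit is charged at the rate of the tier it falls into; vendors define tiering in different ways.

```python
from decimal import Decimal

# Hypothetical tier table: (upper bound in units, price per unit).
# None means the tier has no upper bound.
TIERS = [
    (1000, Decimal("0.010")),
    (10000, Decimal("0.008")),
    (None, Decimal("0.005")),
]


def per_unit(quantity, price):
    """Fixed price per measured unit."""
    return price * quantity


def tiered(quantity, tiers):
    """Graduated pricing: each unit is charged at its own tier's rate."""
    total, lower = Decimal("0"), 0
    for upper, price in tiers:
        if quantity <= lower:
            break
        top = quantity if upper is None else min(quantity, upper)
        total += price * (top - lower)
        lower = top
    return total


def volume(quantity, tiers):
    """Volume pricing: the tier reached sets one price for all units."""
    for upper, price in tiers:
        if upper is None or quantity <= upper:
            return price * quantity
    raise ValueError("tiers must end with an unbounded tier")


def hybrid(quantity, base_fee, included, overage_price):
    """Subscription base plus metered overage beyond the included units."""
    return base_fee + overage_price * max(0, quantity - included)
```

With these assumed tiers, 2,500 units cost 25 under per-unit pricing at 0.010, 22 under tiered pricing (1,000 units at 0.010 plus 1,500 at 0.008), and 20 under volume pricing (all 2,500 at 0.008). The same usage produces three different charges depending on the model.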

Implementation is its own discipline. For a deeper walkthrough of the build path, see how to implement consumption-based pricing.

Credits deserve special attention. They are powerful as a pricing abstraction because they let vendors absorb model price changes without announcing a rate change. The same opacity creates trust failures when burn rates are undisclosed or variable.

The Cursor incident in June 2025 is the canonical example. Users exhausted credits within days at frontier-model API rates and were charged overages they hadn't anticipated. The failure wasn't architectural; it was transparency. Burn rates weren't visible, balances weren't surfaced, and overages auto-charged.
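
Surfacing burn rate does not take much. The sketch below converts usage to credits using the ElevenLabs per-character rates cited earlier, then projects how many days of balance remain. The function names and the three-day alert threshold are assumptions chosen for the example, not any vendor's behavior.

```python
from decimal import Decimal

# Credits per character, from the ElevenLabs figures cited earlier.
CREDITS_PER_CHAR = {"standard": Decimal("1"), "flash": Decimal("0.5")}


def credits_for(characters, model):
    """Credits consumed by one request on the given model."""
    return CREDITS_PER_CHAR[model] * characters


def days_remaining(balance, daily_burn):
    """Projected days until the balance hits zero at the current burn rate."""
    if daily_burn <= 0:
        return None  # no usage, so no projected exhaustion date
    return balance / daily_burn


def should_alert(balance, daily_burn, threshold_days=3):
    """True when the customer should be warned (threshold is an assumption)."""
    remaining = days_remaining(balance, daily_burn)
    return remaining is not None and remaining <= threshold_days
```

A customer with 10,000 credits burning 5,000 a day has two days left and would be alerted; at 2,500 a day they have four days and would not.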

Which metered billing platform fits your product?

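Six platforms dominate metered billing today: Stripe Billing, Orb, Metronome, Lago, Stigg, and Credyt. Five of them fall into three architectural categories (invoice-based, real-time orchestration, and real-time wallet-native); Stripe Billing's subscription-first model is a fourth case. The right choice depends on how your costs behave, not on vendor reputation.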

| Platform | Architecture | Fits when |
|---|---|---|
| Stripe Billing | Subscription-first with metered add-ons | Pricing is mostly subscription with simple metered overages on top. Its Meter Events API aggregates usage asynchronously and applies the totals to an invoice at cycle end. Stripe acquired Metronome in early 2026 for approximately $1 billion (Upstarts Media, December 2025), a signal that Stripe Billing alone wasn't enough for purpose-built metered billing at scale. |
| Orb | Invoice-based UBB | Enterprise contracts, cycle-end invoicing, high-volume event ingestion where throughput beats latency. Negotiated terms reconcile at cycle end. |
| Metronome | Invoice-based UBB | Complex enterprise contracts and late-arriving events. Metronome powers OpenAI's billing; OpenAI's pricing lead reports shipping pricing changes in under an hour with no engineering work (Metronome: OpenAI customer story). |
| Lago | Invoice-based UBB (open source) | Self-hosted teams that want control over the metering layer. Same architecture as Orb and Metronome, different deployment model. |
| Stigg | Real-time orchestration over downstream billing | You need real-time entitlements but keep an existing billing system (Stripe, Zuora, Chargebee) for invoices. Real-time at the entitlement layer, cycle-end at the financial settlement layer. |
| Credyt | Real-time wallet-native, end-to-end | Real-time costs (AI inference, GPU workloads), prepaid credit models, products that need to block runaway usage before it becomes an invoice line item. |

Three rules of thumb to map vendor to product:

  1. Costs hit in real time? Pick a real-time architecture. AI inference and GPU compute incur cost the moment a request is made. Invoice-based platforms can describe the cost after the fact; they cannot block it.
  2. Sign annual enterprise contracts with quarterly invoices? An invoice-based platform fits. The reconciliation step is where complex contract terms get applied.
  3. Already have downstream billing and just need real-time entitlements? An orchestration layer is enough. You don't need to migrate billing systems.

Stripe Billing is a fourth case: it fits products whose pricing is mostly subscription with limited usage on top. Beyond that, the metered side becomes a liability rather than a primitive (see common issues with Stripe metered billing for AI products).

What else to evaluate beyond architecture?

Picking the architecture narrows the shortlist. Within any category, platforms still differ on operational properties that determine whether the system holds up in production. Five criteria matter regardless of whether you land on invoice-based or real-time.

  1. Idempotent event ingestion. Network retries are unavoidable at scale. Without server-side deduplication keyed on a client-supplied identifier, a single billable event can become two charges on a transient failure. Ask how the platform deduplicates and what its key retention window is.
  2. Late-event handling window. Both architectures need a defined policy for events that arrive after the fact. Stripe's window is 35 days; Metronome's is 34 days (Metronome Docs, accessed April 2026). Shorter windows mean stricter ingestion discipline; longer ones mean more reconciliation surface area for finance.
  3. Pricing decoupling. Can pricing change without a product code deploy? Teams underestimate how often this happens. PricingSaaS and Growth Unhinged logged 1,800+ pricing changes across roughly 500 top SaaS and AI companies in 2025 (Growth Unhinged × PricingSaaS, December 2025), about 3.6 pricing changes per company per year. A platform where every pricing change requires engineering becomes the bottleneck.
  4. Usage transparency for the end customer. The Cursor incident was an operational failure, not an architectural one. Customers need to see their balance, their burn rate, and an alert before they exhaust their allotment. Without that, a correct system still produces invoices customers dispute.
  5. Per-customer cost attribution. Knowing a customer's revenue is one thing; knowing their gross margin is another. AI products often discover their best-paying customers are also the most expensive to serve. A platform without cost attribution leaves the unit-economics question unanswerable.
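
The cost-attribution criterion comes down to recording a cost next to every revenue event. A rough sketch follows, assuming each billable event carries both figures; the record shape and function name are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def margin_by_customer(records):
    """records: iterable of (customer_id, revenue, cost), one per billable event."""
    totals = defaultdict(lambda: [Decimal("0"), Decimal("0")])
    for customer_id, revenue, cost in records:
        totals[customer_id][0] += revenue
        totals[customer_id][1] += cost
    report = {}
    for customer_id, (revenue, cost) in totals.items():
        # Gross margin as a fraction of revenue; undefined when revenue is zero.
        margin = (revenue - cost) / revenue if revenue else None
        report[customer_id] = {"revenue": revenue, "cost": cost, "gross_margin": margin}
    return report


records = [
    ("small_co", Decimal("100"), Decimal("20")),
    ("big_co", Decimal("500"), Decimal("450")),
    ("big_co", Decimal("100"), Decimal("90")),
]
report = margin_by_customer(records)
```

In this made-up data, `big_co` pays six times as much as `small_co` but runs at a 10% gross margin against `small_co`'s 80%, which is the pattern the criterion warns about.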

These five criteria are independent of the architectural fork. Engineering teams that ignore them end up with a correctly architected system that still produces wrong-feeling invoices, surprise overages, and unclear margins.

How Credyt handles metered billing

Credyt is real-time wallet-native billing, not a metering layer with faster clocks. The product collapses usage capture, pricing, and wallet debit into one atomic operation. Usage is recorded and billed in the same transaction, against the customer's wallet balance.

With Credyt, you can:

  • Run real-time usage billing on any billable event (API calls, tokens, GPU seconds, custom units)
  • Hold multi-asset wallets per customer (USD, tokens, GPU hours, anything you define)
  • Support usage-based, prepaid credit, subscription, hybrid, and entitlement-based pricing without rebuilding
  • Track per-customer cost attribution and gross margin in real time
  • Drop in a branded customer portal with live balance, usage history, and self-serve top-ups

Ship real-time usage-based billing in hours, not the 6 to 12 months a homegrown wallet system typically takes.