
What is real-time monetization?

Nick Thomson

Nick is seasoned in building payments and marketplace products used around the world. He was previously Head of Product at Checkout.com and Chief Product Officer at Banked.


Real-time monetization charges customers as they use a product: pricing, authorization, and the balance update happen as one atomic operation at the moment of usage, not at the end of a billing cycle. This article covers how it works, why AI products need it, and how it differs from invoice-based billing.

How does real-time monetization work?

Real-time monetization works as a single atomic operation at the moment of usage. The platform first checks the customer's available balance, the action proceeds if the balance is sufficient, and the usage event is then priced and deducted from the balance in the same transaction.

That collapses what looks like three steps into one. Most articles on usage-based billing describe four stages: capture the event, aggregate it during the period, price it at cycle end, generate the invoice. Real-time architectures do not run those stages in sequence. They run them together.

A worked example. A customer hits an AI feature on a platform priced at 12 credits per response.

  1. The platform queries the customer's balance before sending the request to the model. The balance is 480 credits.
  2. The check returns sufficient. The platform forwards the prompt to the model and serves the response back to the customer.
  3. The usage event is submitted to the billing layer. The event is priced at 12 credits, and the customer's balance is debited from 480 to 468 in the same atomic transaction.

The customer's live balance now reads 468. Not at the end of the day. Not when an aggregation job runs. Now. If the same customer fires a second request, the next balance check reads 468, not a stale 480.

That property is what stops two concurrent requests from racing to spend the same credits: every authorization reads the authoritative, post-debit balance. Serving the second request from a stale, pre-debit balance is how silent overages accumulate.
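The worked example can be sketched as a single locked operation. This is a minimal Python sketch, not Credyt's implementation. `CreditWallet`, `authorize_and_debit`, `InsufficientBalance`, and `call_model` are hypothetical names. An in-process `threading.Lock` stands in for the database transaction or row lock a production system would use. The sketch also assumes the 12-credit price is known before the model runs, so it debits up front instead of after the response.

```python
import threading
from dataclasses import dataclass, field


class InsufficientBalance(Exception):
    """Raised when a request would push the balance below zero."""


@dataclass
class CreditWallet:
    balance: int  # credits, kept as an integer to avoid rounding drift
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def authorize_and_debit(self, price: int) -> int:
        """Check, price, and debit in one critical section.

        The read and the write happen under the same lock, so a concurrent
        caller can never observe the pre-debit balance. Returns the new balance.
        """
        with self._lock:
            if self.balance < price:
                raise InsufficientBalance(f"need {price}, have {self.balance}")
            self.balance -= price
            return self.balance


PRICE_PER_RESPONSE = 12  # credits, from the worked example


def call_model(prompt: str) -> str:
    # Hypothetical stand-in for the real model call.
    return f"response to {prompt!r}"


def handle_ai_request(wallet: CreditWallet, prompt: str) -> str:
    # Authorization runs before any inference cost is incurred.
    wallet.authorize_and_debit(PRICE_PER_RESPONSE)
    return call_model(prompt)


wallet = CreditWallet(balance=480)
handle_ai_request(wallet, "summarize this doc")
print(wallet.balance)  # 468
```

The check and the debit share one critical section, so a second request can only read 468, never the stale 480. A production version would also need to credit the 12 back if the model call fails.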

How is it different from invoice-based billing?

Real-time monetization and invoice-based billing differ on one architectural decision: whether the platform can block usage before the cost is incurred. Invoice-based platforms cannot. Real-time platforms can.

The two architectures sit at opposite ends of usage-based billing.

| | Invoice-based | Real-time |
|---|---|---|
| Latency from event to billed | Cycle end (days to weeks) | Milliseconds |
| Source of truth | Invoice, computed at reconciliation | Customer's balance, updated continuously |
| Can block usage before cost lands | No. Overages discovered at invoice time. | Yes. Authorization runs before the action. |
| Separate metering stage | Yes. Reconciled into billing at cycle end. | No. Metering and billing run as one atomic operation. |
| Reconciliation job | Required | None |
| Fits AI inference workloads | Poorly. Costs land before the invoice. | Directly. |
| Fits enterprise quarterly contracts | Directly. | Reporting layer required to generate invoices. |

Invoice-based platforms include Orb, Metronome, and Lago. They run a four-stage pipeline: events captured, aggregated during the period by a metering layer, reconciled at cycle end by a separate billing engine, and rendered as an invoice. The invoice is the source of truth for what the customer owes. Real-time visibility, where it exists, is a convenience layer built on top of the meter.

Stripe Billing is a third category. It is subscription-first with metered add-ons, not a real-time authorization system. It was built to charge recurring subscriptions and has been extended with a metering primitive. The architecture was not designed to make a usage decision before the cost is incurred.

Real-time platforms include Stigg, which runs entitlement checks in real time and settles billing through a downstream system, and Credyt, which runs the full real-time loop end to end. Both share the property that matters: the platform can block usage at the moment of the request, not at the end of the cycle.

Neither architecture is universally better. Invoice-based is the right fit for enterprise quarterly contracts with negotiated terms, and for high-volume event ingestion where throughput beats latency. Real-time is the right fit for workloads where the lag between usage and billing is itself a risk. Most AI products fall in the second group.

Why do AI products need real-time monetization?

AI products need real-time monetization because every API call has a direct cost that lands before any customer payment is collected, because concurrent requests create balance races that only an atomic check can resolve, and because margins move every time the underlying model changes.

Three structural breakages, in order.

Every API call has a direct infrastructure cost. Inference runs $0.10 to $0.50 or more per request depending on the model. The platform spends that money the moment the request fires. End-of-cycle invoicing was designed for SaaS where the marginal cost of serving one more user is near zero. For AI, every call is a cost event, and the gap between the cost and the invoice is a liability window.
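The size of that liability window is simple arithmetic: daily request volume times cost per request times days until the invoice. The short sketch below uses the $0.10 and $0.50 bounds above. The workload of 10,000 requests a day on a 30-day cycle is a hypothetical assumption, and so is the `liability_window` helper.

```python
from decimal import Decimal


def liability_window(requests_per_day: int, cost_per_request: Decimal,
                     days_to_invoice: int) -> Decimal:
    """Inference spend already incurred by the time the invoice is issued."""
    return requests_per_day * cost_per_request * days_to_invoice


# Hypothetical workload: 10,000 requests a day on a 30-day billing cycle.
for cost in (Decimal("0.10"), Decimal("0.50")):
    exposure = liability_window(10_000, cost, 30)
    print(f"${cost}/request -> ${exposure:,.2f} spent before the invoice")
```

Under those assumptions, the platform has fronted between $30,000 and $150,000 of inference before the customer is even billed.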

The clearest public example is GitHub Copilot. In October 2023, the Wall Street Journal reported (as covered by The Register) that GitHub Copilot was priced at $10 per month per user, while the average cost to serve a user ran above $20 and heavy users cost up to $80. The flat-fee pricing had no mechanism to observe that inversion until it showed up in financials. By June 2026, GitHub had moved Copilot to usage-based billing with monthly AI Credits priced at $0.01 each. More than two and a half years passed between the WSJ report and the architectural fix.

Concurrent requests create balance races. When a customer fires several simultaneous requests, the balance must be checked and debited atomically. A non-atomic check, where the balance is read in one step and debited in another, allows two requests to both see a sufficient balance, both proceed, and together push the account below zero. Invoice-based systems have no balance to check against, so they cannot prevent this. Real-time architectures require a single atomic authorization that resolves the race before the cost is incurred.
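The race is easiest to see by replaying the bad interleaving step by step instead of with real threads. The sketch makes these assumptions: the function names are hypothetical, each request costs 12 credits, and the starting balance is 20. In the non-atomic version, both requests read the balance before either debit lands. The atomic version models what a lock or database transaction guarantees: each request's check and debit complete before the next request reads.

```python
PRICE = 12  # credits per request (hypothetical)


def non_atomic_race(balance: int) -> int:
    """Two requests each read the balance, then debit in a separate step."""
    read_a = balance        # request A checks: sufficient
    read_b = balance        # request B checks before A's debit lands
    if read_a >= PRICE:
        balance -= PRICE    # A proceeds
    if read_b >= PRICE:
        balance -= PRICE    # B proceeds on a stale read
    return balance


def atomic_sequence(balance: int) -> int:
    """Each request checks and debits in one step, so B sees A's debit."""
    for _request in ("A", "B"):
        if balance >= PRICE:
            balance -= PRICE
    return balance


print(non_atomic_race(20))  # -4: both requests served, account overdrawn
print(atomic_sequence(20))  # 8: second request blocked
```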

Margins move with model selection. Switching between Claude, GPT-4, and Llama can change unit economics by a factor of ten in either direction. The customer's behavior does not change. The underlying infrastructure cost does.

According to ICONIQ Growth's 2026 State of AI Bi-Annual Snapshot, inference runs at about 23% of revenue at scaling-stage AI companies. AI gross margin sat at 45% in 2025, against the 80%-plus that classical SaaS posts. Only a system with per-request cost attribution surfaces when a model switch has inverted the margin on a customer segment.
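Per-request cost attribution can be as small as recording revenue and cost against the customer on every request. The sketch below is illustrative only. It assumes a hypothetical price of $0.20 per request and two hypothetical models, `large-model` and `small-model`, whose costs differ by a factor of ten ($0.30 and $0.03 per request). With identical usage, the two customers end up on opposite sides of zero margin.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical numbers: one customer price, two models a factor of ten apart.
PRICE_PER_REQUEST = Decimal("0.20")
MODEL_COST = {"large-model": Decimal("0.30"), "small-model": Decimal("0.03")}

revenue = defaultdict(Decimal)  # customer -> revenue recorded so far
cost = defaultdict(Decimal)     # customer -> inference cost recorded so far


def record_request(customer: str, model: str) -> None:
    """Attribute the request's revenue and inference cost as it happens."""
    revenue[customer] += PRICE_PER_REQUEST
    cost[customer] += MODEL_COST[model]


def gross_margin(customer: str) -> Decimal:
    """(revenue - cost) / revenue, available at any point in the cycle."""
    return (revenue[customer] - cost[customer]) / revenue[customer]


# Two customers with identical usage, served by different models.
for _ in range(100):
    record_request("customer-a", "small-model")
    record_request("customer-b", "large-model")

print(gross_margin("customer-a"))  # 85% margin
print(gross_margin("customer-b"))  # -50% margin
```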

Replit is a worked example of what happens when this surfaces too late. As documented by Aakash Gupta (How to price AI products, February 2026), Replit's gross margin swung from 36% to -14% as its AI agent consumed more LLM compute than its pricing covered. The mechanism is the same one that caught GitHub Copilot: variable per-request cost sitting behind a pricing structure that could not observe or respond to it in real time. Credyt's companion piece on real-time billing architecture goes deeper on the gap between billing and economic control.

Cursor's June 2025 pricing reset is the canonical visibility failure. The team switched from request caps to a monthly credit pool priced at frontier-model API rates with automatic overages. Users ran through credits in days without knowing it. CEO Michael Truell opened a refund window and clarified the pricing.

The failure was not the credit model. The failure was opaque balances and charges landing after the cost was already incurred. Real-time balance visibility is not a UX detail; it is the control mechanism that makes prepaid models governable.

What pricing models work in a real-time setup?

Real-time monetization is an architectural decision, not a pricing model. Every common pricing structure fits inside it, including flat subscriptions. The difference is what the customer can see while the cycle runs.

| Pricing model | What the customer sees | What real-time requires |
|---|---|---|
| Flat subscription | A live view of what is included in the plan and how much has been used against it | Real-time usage tracking attached to the subscription |
| Per-unit | Charged a fixed rate per token, per request, per second | Atomic price-and-debit per event |
| Prepaid credits | Pre-funds a balance; usage debits it in real time | Atomic balance check before authorization |
| Hybrid (subscription + overage) | Recurring fee with an included allowance; overage billed in real time | Real-time tracking of allowance depletion |
| Credit and token | Spends a named unit (tokens, GPU hours, image credits) | Multi-asset balance support |

A flat subscription on a real-time architecture is still a flat subscription. The platform charges the same recurring fee. What changes is that the customer can see, mid-cycle, how much of the included plan they have already consumed, and the platform can attribute cost per customer against that fixed revenue. Cycle-end is no longer the first moment the platform learns where its margins are.
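For the flat-subscription and hybrid rows, the mid-cycle view is a running counter updated on every event. Overage is priced at the event that crosses the allowance. The sketch below uses a hypothetical `HybridPlan` class and illustrative numbers. For a pure flat subscription, the counter alone gives the customer the "used against plan" view. Setting the overage price to zero models that case.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class HybridPlan:
    """Hypothetical plan: the recurring fee covers `included_units`; the rest is overage."""

    included_units: int
    overage_price: Decimal  # price per unit once the allowance is used up
    used: int = 0

    def remaining(self) -> int:
        """Units left in the included allowance (the customer's mid-cycle view)."""
        return max(0, self.included_units - self.used)

    def record_usage(self, units: int) -> Decimal:
        """Track one usage event and return the overage charge it triggers."""
        covered = min(units, self.remaining())
        overage_units = units - covered
        self.used += units
        return overage_units * self.overage_price


plan = HybridPlan(included_units=1_000, overage_price=Decimal("0.02"))
print(plan.record_usage(900), plan.remaining())  # no overage yet, 100 units left
print(plan.record_usage(150), plan.remaining())  # 50 units billed as overage
```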

The credit-based variant has become the dominant pattern in AI products. Credit adoption grew 126% year over year across the PricingSaaS 500 in 2025, jumping from 35 to 79 companies, with 8,394 pricing and packaging events tracked across 498 companies over 2024 and 2025 (PricingSaaS Q1 2026 Trends Report).

Credits are not a pricing innovation for their own sake. They are the customer-facing expression of what happens when a product moves to prepaid, real-time billing. The customer pre-funds a balance, the platform authorizes against it before each request, and the balance depletes as usage happens.

The credit is what turns real-time authorization into a product experience instead of a plumbing detail. Products like Midjourney and Lovable use this model directly. Credyt has a separate guide on credit-based billing for teams considering it.

What should you consider when implementing real-time monetization?

Real-time monetization requires five architectural properties to hold under production load: an authoritative real-time balance, atomic price-and-debit transactions, multi-asset support, per-customer cost attribution from day one, and authorization that runs before the cost is incurred. Missing any one of them introduces the failure modes the architecture is meant to prevent.

  1. The customer's balance has to be the authoritative source of truth. It needs to update atomically on every event. A stale balance read under concurrency lets two simultaneous requests both see a sufficient balance, both proceed, and together push the account below zero. A read that returns five-minute-old state is not a balance check; it is a guess.
  2. Pricing and the balance debit have to be one transaction. Two API calls a few milliseconds apart introduce a window where the system can charge twice, charge nothing, or charge against a stale state. Atomicity is the property that prevents this.
  3. Multi-asset support is required if the product uses non-USD units. Tokens, GPU hours, image credits, and named units have to live in the same customer wallet as any cash credit the customer holds. Bolting on a separate asset system later is more expensive than building it once.
  4. Per-customer cost attribution has to be wired in from day one. Margin governance cannot be retrofitted onto a system that captures only aggregate cost. Only 43% of AI companies track AI cost per customer, and only 22% track it per transaction (CloudZero State of AI Costs 2025). The number that matters is the per-customer one, and the system has to be designed for it.
  5. Authorization has to run before the cost is incurred, not after. A meter read at cycle end can report on overruns. It cannot prevent them. The architectural question is whether the system can block the request, not whether it can describe it later.
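Properties 1 to 3 can be illustrated together: one wallet per customer, a separate authoritative balance per asset, and a debit that touches several assets all-or-nothing under one lock. `MultiAssetWallet`, `InsufficientAssetBalance`, and the asset names are hypothetical. The in-process lock stands in for a database transaction.

```python
import threading
from decimal import Decimal


class InsufficientAssetBalance(Exception):
    """Raised when any asset in a debit lacks funds; nothing is debited."""


class MultiAssetWallet:
    """Hypothetical customer wallet holding a separate balance per asset."""

    def __init__(self, balances: dict[str, Decimal]) -> None:
        self._balances = dict(balances)
        self._lock = threading.Lock()

    def debit(self, charges: dict[str, Decimal]) -> dict[str, Decimal]:
        """Debit every asset in `charges` or none of them; return new balances."""
        with self._lock:
            short = sorted(
                asset for asset, amount in charges.items()
                if self._balances.get(asset, Decimal(0)) < amount
            )
            if short:
                raise InsufficientAssetBalance(f"insufficient balance: {short}")
            for asset, amount in charges.items():
                self._balances[asset] = self._balances.get(asset, Decimal(0)) - amount
            return dict(self._balances)


wallet = MultiAssetWallet(
    {"usd": Decimal("5.00"), "gpu_hours": Decimal("2"), "image_credits": Decimal("40")}
)
print(wallet.debit({"gpu_hours": Decimal("0.5"), "image_credits": Decimal("4")}))
try:
    wallet.debit({"usd": Decimal("1.00"), "gpu_hours": Decimal("10")})
except InsufficientAssetBalance as err:
    print(err)  # usd is left untouched because gpu_hours is short
```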

Any team that needs all five will, at some point, be in the market for a tool that already provides them. The five points above are the spec.

How Credyt handles real-time monetization

Credyt is the real-time monetization infrastructure that runs atomic authorization, multi-asset wallets, per-customer cost attribution, and pre-funded balance enforcement as one system. The platform authorizes usage before the cost is incurred, debits the customer's balance atomically as the usage event arrives, and shows per-customer margin in real time.

With Credyt, you can:

  • Authorize usage before the work starts through a single POST endpoint. Pricing, charging, and the balance update run as one atomic transaction.
  • Hold multi-asset customer wallets: USD, tokens, GPU hours, and custom assets tracked in separate accounts within a single customer wallet.
  • Run any pricing structure on top: flat subscription, per-unit, prepaid credits, hybrid, credit and token, tiered, volume, dimensional.
  • Ship a branded customer billing portal with live balance, usage history, and self-service top-ups. No frontend engineering required.
  • See per-customer cost attribution and live profitability analytics from day one, with revenue and cost correlated per customer and per workload.
  • Connect through Credyt's MCP server inside Cursor, Windsurf, Claude Code, Codex, Lovable, Bolt, Replit, V0, and any other AI coding tool that supports MCP.

Teams ship real-time usage-based billing in hours, not the six to twelve months a build-from-scratch effort takes. Explore the docs to see what the integration looks like end to end.
