SaaS billing software for AI products splits into three architectures: subscription-first, invoice-based usage, and real-time usage. The right one depends on whether your cost per customer is fixed per seat or variable per inference. This article walks through the decision framework with case studies from GitHub Copilot, Cursor, and Replicate.
What "SaaS billing software for AI products" actually means
SaaS billing software is the system that turns product usage into revenue. It does four jobs: record what each customer used, price it, generate an invoice or charge, and collect payment. For traditional SaaS, those four jobs are predictable. The customer pays a flat per-seat fee on a cycle, the invoice prints itself, and the only variable is renewals.
For AI products, none of that holds. Every inference has a real cost that hits before the customer pays. The same user can run a five-second chat or a six-hour agentic task on the same plan. And the gross margin profile is structurally different. Bessemer's State of AI 2025 tracks two AI cohorts: fast-growth "Supernova" companies running near 25% gross margin and steadier "Shooting Stars" landing near 60%, both well below the 70–85% benchmark for classic SaaS (Bessemer, August 2025).
So when someone searches "SaaS billing software for AI products," they are not looking for the same tool a horizontal SaaS company would buy. Microsoft 365 Copilot charges $30 per user per month flat; that is one billing architecture. Replicate charges $0.04 per generated image; that is a different one. Both are "SaaS billing software." The systems running them are not interchangeable.
What are the three architectures of SaaS billing software for AI products?
SaaS billing software splits into three categories based on when the customer is charged and how usage is tracked. Each fits a different cost structure. Picking the wrong one means rebuilding billing infrastructure 6 to 12 months later, after the product has already scaled on it.
Subscription-first
A subscription-first SaaS billing platform like Stripe Billing is built around a recurring fee on a cycle. Metered add-ons exist, but they reconcile at cycle end on top of the base subscription. Stripe's Meter Events API caps event ingestion at 1,000 events per second in standard live mode (with a v2 stream tier at 10,000), and accepts backdated events within a 35-day window (Stripe API documentation, 2025). The architecture fits products where cost per customer is predictable per seat.
Invoice-based usage
Platforms like Orb, Metronome, and Lago capture usage events throughout the period, then reconcile them into an invoice at cycle end through a separate billing engine. Metering and billing are distinct stages with a defined handoff. The architecture fits enterprise contracts where customers expect cycle-end invoices, negotiated terms, and reconciliation built into the workflow. Stripe announced the acquisition of Metronome in December 2025 and completed it in January 2026 (Stripe newsroom, January 2026); the underlying architecture is unchanged.
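The two-stage handoff described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the `RATE_CARD` prices, the event tuples, and the function names `meter` and `bill` are all hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical rate card: price per unit for each metered event type.
RATE_CARD = {"tokens": Decimal("0.000002"), "images": Decimal("0.04")}


def meter(events):
    """Stage 1 (metering): aggregate raw usage per customer and event type."""
    totals = defaultdict(lambda: defaultdict(int))
    for customer_id, event_type, quantity in events:
        totals[customer_id][event_type] += quantity
    return totals


def bill(totals):
    """Stage 2 (billing): price the aggregated usage once, at cycle end."""
    invoices = {}
    for customer_id, usage in totals.items():
        invoices[customer_id] = sum(
            (RATE_CARD[event_type] * quantity for event_type, quantity in usage.items()),
            Decimal("0"),
        )
    return invoices


# Usage events arrive throughout the period; nothing is charged yet.
events = [
    ("cust_a", "tokens", 500_000),
    ("cust_a", "images", 10),
    ("cust_b", "images", 3),
    ("cust_a", "tokens", 250_000),
]

# The invoice amount only exists after the cycle-end handoff.
invoices = bill(meter(events))
```

Note that nothing in this flow can stop `cust_a` mid-cycle: the total is only known once `bill` runs.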
Real-time usage
In a real-time architecture, the customer's balance is checked before the action, and pricing and balance debit happen as a single atomic operation as the event arrives. There is no separate metering-then-reconciliation stage. The platform asks the billing system, "does this customer have balance for the next inference?" If yes, the action proceeds and the balance is debited immediately. If not, the platform itself decides whether to block, throttle, or allow the action on credit; the billing system provides the real-time balance state, but the enforcement decision stays with the platform.
Real-time billing fits products with variable per-inference cost that need the option to stop runaway usage before it becomes a loss.
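A minimal sketch of the check-then-debit flow, assuming an in-memory wallet guarded by a lock. The `Wallet` class, `Decision` enum, and `run_inference` function are hypothetical names for illustration, not a real billing API; a production system would need a durable, distributed equivalent of the lock.

```python
import threading
from decimal import Decimal
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"
    BLOCK = "block"


class Wallet:
    """In-memory stand-in for a prepaid balance (hypothetical, not a vendor API)."""

    def __init__(self, balance):
        self._balance = Decimal(balance)
        self._lock = threading.Lock()

    @property
    def balance(self):
        with self._lock:
            return self._balance

    def authorize_and_debit(self, price):
        """Check the balance and debit it as one step under a single lock."""
        price = Decimal(price)
        with self._lock:
            if self._balance < price:
                return False  # Insufficient funds: nothing is debited.
            self._balance -= price
            return True


def run_inference(wallet, price, on_insufficient=Decision.BLOCK):
    """Platform-side enforcement: the wallet reports state, the platform decides."""
    if wallet.authorize_and_debit(price):
        return Decision.ALLOW
    # The policy is the platform's choice. Allowing on credit would also
    # need its own ledger entry, which this sketch leaves out.
    return on_insufficient


wallet = Wallet("0.10")
results = [run_inference(wallet, "0.04") for _ in range(3)]
```

Here the first two calls are allowed and debited, and the third is blocked because the remaining balance no longer covers the price.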
| Architecture | Latency from usage to billed | Source of truth | Can block before cost is incurred? | Fits AI inference workloads | Examples |
|---|---|---|---|---|---|
| Subscription-first | Cycle end (days to weeks) | Subscription plan | No | Only if cost is absorbed flat | Stripe Billing |
| Invoice-based usage | Cycle end (days to weeks) | Invoice at reconciliation | No | Partially (enterprise terms only) | Orb, Metronome, Lago |
| Real-time usage | Milliseconds | Customer's balance, updated continuously | Yes | Directly | Credyt, Stigg |
None of the three architectures is universally better. They solve different problems. Subscription-first was designed for fixed seat counts and flat recurring invoices. Invoice-based usage was designed for enterprise contracts with negotiated terms. Real-time was designed for products where the lag between usage and billing is itself a risk.
How do you decide which billing architecture fits your AI product?
Five questions decide the AI billing architecture that fits your product. Feature checklists come later.
- Is your per-customer cost fixed or variable per inference? A flat per-seat product with bundled AI features can absorb inference cost in the subscription price. A pay-per-generation or per-token product cannot.
- Can you afford to discover overages at invoice time, or do you need to block them before they happen? Invoice-based and subscription-first both discover overages after the fact. Only real-time can authorize a debit before the cost is incurred.
- Do your customers expect monthly invoices, or do they expect their balance to drop in real time? Enterprise procurement runs on invoices. Prepaid-credit consumers expect the balance bar to move as they use the product.
- Is your contract structure standard or negotiated per customer? Self-serve public pricing fits subscription-first and real-time equally. Negotiated quarterly true-ups and complex commercial terms are the natural home of invoice-based platforms.
- How fast are your margins changing as you switch models? The ICONIQ State of AI panel (January 2026) found 37% of AI companies plan a pricing change within 12 months. If your margins move with every model release, the architecture has to keep up.
Map your answers to a category:
- Real-time usage: variable cost, a need to block overages, a real-time balance expectation, self-serve pricing, and fast-changing margins.
- Subscription-first: fixed cost, overages that can wait until invoice time, monthly invoices, enterprise terms, and stable margins.
- Invoice-based usage: variable cost, overages that can wait until invoice time, monthly invoices, enterprise terms, and moderate margin change.
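One way to make the mapping concrete is to score a set of answers against each category's profile. The sketch below is an assumption-laden simplification: the keys, the answer labels, and the equal weighting of all five questions are illustrative choices, not part of the framework itself.

```python
# Each profile restates the mapping above as one expected answer per question.
PROFILES = {
    "real-time usage": {
        "cost": "variable",
        "overages": "block",
        "customer_view": "live balance",
        "contracts": "self-serve",
        "margins": "fast",
    },
    "subscription-first": {
        "cost": "fixed",
        "overages": "invoice-time",
        "customer_view": "monthly invoice",
        "contracts": "enterprise",
        "margins": "stable",
    },
    "invoice-based usage": {
        "cost": "variable",
        "overages": "invoice-time",
        "customer_view": "monthly invoice",
        "contracts": "enterprise",
        "margins": "moderate",
    },
}


def recommend(answers):
    """Return the best-matching category and the per-category match counts."""
    scores = {
        name: sum(answers.get(question) == expected for question, expected in profile.items())
        for name, profile in PROFILES.items()
    }
    # Ties resolve to whichever category appears first in PROFILES.
    best = max(scores, key=scores.get)
    return best, scores


pay_per_generation = {
    "cost": "variable",
    "overages": "block",
    "customer_view": "live balance",
    "contracts": "self-serve",
    "margins": "fast",
}
enterprise_api = {
    "cost": "variable",
    "overages": "invoice-time",
    "customer_view": "monthly invoice",
    "contracts": "enterprise",
    "margins": "moderate",
}

best_a, scores_a = recommend(pay_per_generation)
best_b, scores_b = recommend(enterprise_api)
```

Mixed answers produce close scores, which is a signal to look at the questions that disagree rather than to trust the top result.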
How do real AI companies choose? Three case studies
Three companies, three architectures. Each chose for a reason rooted in cost structure.
GitHub Copilot, subscription-first that broke. On April 27, 2026, GitHub announced Copilot was moving to usage-based billing. The premium-request model was "no longer sustainable" because "a quick chat question and a multi-hour autonomous coding session can cost the user the same amount" (GitHub Blog, April 2026). The new model, effective June 1, 2026, is $10 per month for Pro with $10 in AI Credits and $19 per user per month for Business with $19 in Credits. It is the highest-profile public acknowledgement that flat per-seat pricing breaks under agentic AI. See the deeper write-up on GitHub Copilot's usage-based billing architecture.
Cursor, hybrid done opaquely. On June 16, 2025, Cursor replaced request caps with a credit pool priced at frontier-model API rates ($20 per month Pro, $200 Ultra). Users burned credits in days; CEO Michael Truell opened a 19-day refund window and apologized publicly (TechCrunch, July 2025). Cursor had been absorbing long agentic-task cost under the flat plan. The lesson is not "avoid usage pricing." Cursor kept the credit model. Opaque meters break trust regardless of how justified the change is.
Replicate, pure pay-per-inference (real-time). Replicate prices strictly by what is consumed: hardware time at $0.000025 per second on CPU to $0.0122 per second on 8x H100 GPUs, per-output for generation models (Flux 1.1 Pro at $0.04 per image), per-token for language models (Replicate pricing, accessed May 2026). No subscription floor. Balance is funded ahead and drawn down as inferences run.
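To see how far apart those rates put individual workloads, here is a small worked calculation. The rates are the ones quoted above; the workload sizes (ten minutes of hardware time, 100 images) are made-up examples, and the function names are illustrative.

```python
from decimal import Decimal

# Rates quoted in the paragraph above.
CPU_PER_SECOND = Decimal("0.000025")
H100_8X_PER_SECOND = Decimal("0.0122")
FLUX_11_PRO_PER_IMAGE = Decimal("0.04")


def hardware_cost(seconds, rate_per_second):
    """Cost of a run billed by hardware time."""
    return rate_per_second * seconds


def per_output_cost(outputs, rate_per_output):
    """Cost of a run billed per generated output."""
    return rate_per_output * outputs


# Hypothetical workloads: ten minutes on each hardware tier, and 100 images.
ten_min_cpu = hardware_cost(600, CPU_PER_SECOND)        # 600 x 0.000025 = 0.015
ten_min_h100 = hardware_cost(600, H100_8X_PER_SECOND)   # 600 x 0.0122   = 7.32
hundred_images = per_output_cost(100, FLUX_11_PRO_PER_IMAGE)  # 100 x 0.04 = 4.00

print(f"10 min CPU:     ${ten_min_cpu}")
print(f"10 min 8x H100: ${ten_min_h100}")
print(f"100 images:     ${hundred_images}")
```

The same ten minutes costs 488 times more on the top hardware tier than on CPU, which is the kind of spread a flat plan has to absorb and a pay-per-inference balance passes through directly.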
The pattern is the same shift the wider market is making. Growth Unhinged's State of B2B Monetization (June 2025, 240-company panel) shows seat-based pricing fell from 21% to 15% in a single year and hybrid models jumped from 27% to 41%. The same panel records that the top 10% of users generate 70–80% of token consumption. a16z (December 2024) frames the cause: when AI resolves work that previously required human seats, vendor NRR breaks unless pricing moves to usage or outcomes.
What to look for in a billing platform once you have decided on the architecture
Most billing comparisons rank platforms on metering throughput, ingestion volume, and dashboard polish. Those are table stakes; every serious billing platform handles them. The criteria that decide whether your billing system holds up from $3M to $50M ARR are different.
| Common evaluation criterion | What actually matters at scale | Why it matters |
|---|---|---|
| Events per second | Pre-usage authorization | Throughput is table stakes. Whether the platform can check balance and let you block before cost is incurred is the capability that separates real-time architectures from invoice-based ones, and it is the only way AI margins survive runaway usage. |
| Metering speed | Per-customer margin visibility | Knowing what a customer paid is accounting. Knowing what they cost is survival. Aggregate dashboards lie; AI-product margins move with each customer's behavior, so margin has to be a live per-customer number, not a month-end report. |
| Ingestion volume | Multi-asset and hybrid billing in one system | Self-serve credits, mid-market subscriptions, and enterprise contracts are three billing modes. Running them on separate systems burns reconciliation cycles every month. Credit architecture depth (USD, custom credits, tokens, GPU hours in one customer balance) also decides what year-two product moves are possible. |
| Dashboard polish | PSP flexibility | Stripe acquired Metronome in January 2026. If your billing only works with one PSP, your vendor choice is no longer a choice. |
| Integration count | Pricing changes without engineering | AI pricing moves faster than any other category. A code deploy to change a rate card kills iteration speed; the right platform lets you change pricing in hours, not sprints. |
The middle column compounds. Pre-usage authorization is only useful with per-customer margin visibility to set thresholds; hybrid billing needs PSP flexibility to negotiate enterprise terms alongside self-serve credits. The platforms that hold up at $10M ARR are the ones that own all five. See our broader notes on AI monetization strategies for startups.
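The claim in the table that aggregate dashboards hide per-customer margin can be shown with simple arithmetic. The sketch below uses two made-up customers on the same flat plan; the figures and the `gross_margin` helper are hypothetical.

```python
from decimal import Decimal


def gross_margin(revenue, cost):
    """Gross margin as a fraction of revenue; None when there is no revenue."""
    revenue, cost = Decimal(revenue), Decimal(cost)
    if revenue == 0:
        return None
    return (revenue - cost) / revenue


# Hypothetical customers: same $20 plan, very different inference cost.
customers = {
    "light_user": {"revenue": "20", "cost": "4"},
    "heavy_user": {"revenue": "20", "cost": "35"},
}

per_customer = {name: gross_margin(**figures) for name, figures in customers.items()}

total_revenue = sum(Decimal(c["revenue"]) for c in customers.values())
total_cost = sum(Decimal(c["cost"]) for c in customers.values())
aggregate = gross_margin(total_revenue, total_cost)

for name, margin in per_customer.items():
    print(f"{name}: {margin:.1%}")
print(f"aggregate: {aggregate:.1%}")
```

The aggregate comes out slightly positive while one of the two customers is deeply unprofitable, which is why a threshold for pre-usage authorization has to be set per customer rather than from the blended number.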
How Credyt fits the real-time usage architecture
Credyt is real-time billing infrastructure for AI products: authorization, pricing, and balance debit happen as a single atomic operation. If your product fits the real-time usage architecture, Credyt lets you ship that infrastructure without building from scratch. With Credyt, you can:
- Bill usage in real time. The event records the usage and debits the customer's wallet in a single atomic operation, with no separate reconciliation stage.
- See per-customer cost attribution as it happens, not at month end. Gross margin per customer is a live number.
- Run pre-funded credit wallets. The platform checks the balance via Credyt's Wallet API before the inference runs; Credyt provides the real-time balance state, and the platform decides whether to allow, throttle, or block. Optional wallet controls activate hard enforcement at zero.
- Ship a branded billing portal where customers self-serve top-ups, view usage history, and set their own auto top-up thresholds.
- Pay $1 per Monthly Active Wallet. The first 10 active wallets are free every month. No percentage of revenue. No markup on Stripe fees.
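As a rough illustration of the pricing bullet above, the fee can be estimated as follows. This assumes the simplest reading of the stated terms (each active wallet beyond the first 10 costs $1 that month); the function and constant names are hypothetical, and actual invoices may be computed differently.

```python
FREE_WALLETS_PER_MONTH = 10   # From the pricing bullet above.
PRICE_PER_WALLET_USD = 1      # From the pricing bullet above.


def estimated_monthly_fee(active_wallets):
    """Estimated fee in USD, assuming only wallets beyond the free tier are billed."""
    if active_wallets < 0:
        raise ValueError("active_wallets must be non-negative")
    billable = max(0, active_wallets - FREE_WALLETS_PER_MONTH)
    return billable * PRICE_PER_WALLET_USD


fees = {count: estimated_monthly_fee(count) for count in (8, 10, 250)}
```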
Credyt does not handle subscription-only billing or invoice reconciliation for negotiated enterprise contracts. If your architecture is subscription-first or invoice-based, the platforms named earlier are a better fit. For real-time usage workloads, expect to ship in days rather than the 6 to 12 months of in-house wallet engineering. Explore the platform.
Related resources
- What is usage-based billing? The umbrella concept and how it splits across architectures.
- How Stripe's usage-based billing actually works. The subscription-first architecture in detail, with code examples and rate-limit context.
- Usage-based billing software in 2026. Landscape view across the platforms named above.
