← Back to blog
AI Pricing

How to design hybrid pricing for AI products

Ben Foster
By Ben Foster·Founder

Ben has built fintech products and scaled technology teams from an early stage through to unicorn. He was previously VP Engineering at TrueLayer and SVP Engineering at Checkout.com.

On this page

Hybrid pricing combines a fixed subscription base with usage-based charges for consumption above an included allowance. AI products converge on it because flat-rate erodes gross margin and pure usage pricing creates bill anxiety. This guide covers how it works, the design process, migration steps, and the margin traps to avoid.

What is hybrid pricing for AI products?

Hybrid pricing, sometimes called subscription plus usage pricing, charges a fixed subscription fee plus metered usage above an included allowance. The customer pays a predictable base every cycle, the base includes a set amount of usage, and consumption beyond that allowance is billed by the unit. It sits between two models that each break in a different direction for AI products.

Flat-rate pricing charges the same fee no matter how much a customer consumes. It is predictable for the buyer and dangerous for the vendor, because AI cost is variable per request and a heavy user can consume far more than they pay. Pure usage-based pricing inverts the problem: it protects the vendor's margin but gives the customer no floor, which creates forecasting problems and bill anxiety for both sides. This is the same trade-off that runs through consumption-based pricing more broadly, and hybrid is the structure most teams land on once they have felt both extremes.

Most products described as usage-based are hybrid in practice. Clay, Cursor, and Figma all carry a base fee or a minimum commitment alongside their metered usage. The pure model, with no commitment at all, is rarer than the marketing language suggests. That detail matters operationally: a model with a floor behaves very differently from one without, and AI products with variable inference costs need the floor.

Why do AI products converge on hybrid pricing?

AI products converge on hybrid pricing because their cost structure punishes flat-rate and their buyers resist pure usage. The convergence is not a fashion. It is two forces meeting in the middle.

The first force is margin. AI cost of goods is variable per inference, but a flat subscription charges the same for 50 requests and 3,000. The gap between those two customers is pure gross-margin erosion, and at scale it compounds quietly. Replit watched its gross margin on an AI agent product swing from 36% to -14% when the agent consumed more model capacity than its pricing covered (Wildfire Labs analysis, March 2026).

GitHub described the same mechanism when it moved Copilot off flat-rate: under the old plan, "a quick chat question and a multi-hour autonomous coding session can cost the user the same amount" (GitHub, April 2026). When the most engaged customer is the least profitable, flat-rate is the mechanism doing the damage.

This is not a problem SaaS teams are used to. Traditional SaaS runs gross margins of 80 to 90%. AI-native products run 50 to 60% (Bessemer AI Pricing and Monetization Playbook, February 2026). The gap does not close with scale; inference is roughly 23% of revenue at scale and rising (ICONIQ, January 2026). The playbook that worked at 85% margin does not survive at 55%.

Hybrid is one of the five API monetization models teams weigh, and for AI products the margin math reorders the list.

The second force is the buyer. Pure usage-based pricing solves the vendor's margin problem and creates a new one: unpredictable bills. Enterprise buyers reject this directly. In a16z's enterprise survey, CIOs cited cost unpredictability and attribution as their main objections to usage-only and outcome-only models.

Hybrid resolves the standoff: the base fee gives finance a predictable floor, the included allowance covers the median customer's real consumption, and overage exists but is bounded.

The data shows the convergence happening. Hybrid pricing went from 27% to 41% of B2B software companies in a single year, the largest shift of any model (Growth Unhinged, June 2025), with ICONIQ projecting 48% adoption in 2026. Credit-based implementations, the most common way to package hybrid, grew 126% year-over-year across the top 500 software companies (PricingSaaS, December 2025).

How does hybrid pricing work?

Hybrid pricing works by combining four components: a base fee, an included allowance, an overage rate, and rules for what happens to unused allowance. Get these four right and the model holds; get the allowance or the overage rate wrong and it leaks margin or loses customers.

Flat-ratePure usage-basedHybrid
Customer cost predictabilityHighLowMedium (predictable base, bounded overage)
Vendor margin protection under variable costLowHighHigh
Bill-shock riskNoneHighPossible above allowance
Revenue floorYesNoYes
Fit for variable AI costPoorGoodGood

The clearest current example is GitHub Copilot. As of June 2026, Copilot Pro is $10 a month with $10 of included AI Credits, Business is $19 per user with $19 of credits, and Enterprise is $39 per user with $39 of credits. Usage above the included credits is billed at published token API rates, and routine code completions are not metered at all (GitHub, April 2026).

The structure maps exactly to the four components: a base fee, an included allowance equal to the base, a metered overage rate, and a carve-out for low-cost actions. You can read more on how GitHub Copilot's billing architecture works and why agentic workloads forced the change.

Clay follows the same pattern with different units. Its Launch plan is $167 a month with 15,000 included actions and 2,500 data credits. Consumption above that is billed at roughly a 30% markup over the plan rate, and Enterprise customers can roll over a portion of unused credits (Clay pricing, 2026). Cursor went through the same shift in June 2025, moving its Pro plan from 500 fixed fast requests to $20 of credits charged at published API rates (Cursor, June 2025). Three different products, the same skeleton.

The component teams underestimate is the overage rate, because for variable AI cost overage handling is an authorization question, not a reporting one. A single agentic session can consume what used to be a month of usage. If the billing system only counts events and totals them at cycle end, the cost is already incurred before anyone sees it.

The durable pattern, which Snowflake, AWS, and Cloudflare all arrived at independently, is a soft cap. Notify the customer at 75% of the limit, suspend gracefully at 100% so in-flight work completes, and escalate above that. Doing this well means checking the balance before each inference call rather than after, so the decision to continue or stop happens at the moment of spend.

How to design a hybrid pricing plan

Designing a hybrid plan is a five-step process: set the commitment, size the allowance, set the overage rate, decide rollover, and instrument usage. The order matters, and the most common mistake lives in steps two and three.

  1. Set the base commitment. Decide the fixed fee and what it buys beyond usage: support tier, seats, features, SLAs. The base is the predictable floor finance is buying, so it should map to a real unit of value, not an arbitrary number.

  2. Size the included allowance to the median user. Pull your usage distribution and set the included amount to cover what a typical customer actually consumes in a cycle. The median user should rarely see an overage line. This keeps the base fee honest and avoids surprising the bulk of your customers.

  3. Set the overage rate to cover the P90 user's marginal cost. This is the step teams skip. If you price only for the median, your heaviest users will run at zero or negative margin. Run the unit economics at the median, the 75th percentile, and the 90th percentile: revenue minus fixed cost minus variable cost times volume. At 2,000 to 3,000 queries a month, contribution margin can hit zero at typical compute rates (Wildfire Labs, March 2026). As Ridgeway Financial put it, pricing is a compute-risk management tool, not just a go-to-market decision. Set the overage rate above your P90 marginal cost.

  4. Decide rollover and expiry. Choose whether unused allowance carries forward and for how long. A rollover policy improves customer goodwill, but an unbounded one becomes a balance-sheet liability. ElevenLabs caps rollover at two months on paid plans, which is a reasonable default: generous enough to feel fair, bounded enough to stay manageable.

  5. Instrument usage so you can meter, attribute, and bill it. The minimum viable setup is one meter, one usage metric, one data source. Real-time streaming is not required to validate the model; a daily batch upload from your database is enough to go live. Add real-time ingestion when you need usage caps, in-product dashboards, or sub-daily billing. The point is to capture usage per customer accurately before you optimize the pipeline.

If you are early, this is easier than it sounds. Early-stage products have a pricing advantage: a small customer base means you can change the model with a handful of conversations rather than a migration program.

How to migrate flat-rate customers to hybrid pricing

Migrating flat-rate customers to hybrid pricing follows five phases: launch on new contracts first, segment existing customers by usage, match the message to each segment, grandfather strategically, and communicate through account teams. This is change management, not pricing theory, which is why pricing guides rarely cover it. The model design is the straightforward half; moving customers onto it without losing them is where the work lives.

Launch on new contracts first

Roll the hybrid model out to new customers before you touch the existing base. The first billing cycle surfaces edge cases nobody anticipated, and it is far better to find them with a handful of new accounts than with the whole installed base at once. Give yourself 30 to 45 days between the new-contract launch and the existing-base migration, and use that window to validate the model and build operational muscle around hybrid invoicing.

Segment before you communicate anything

Pull 12 months of usage per customer and sort them into low, mid, and high consumption. Each bucket reacts differently. Low-usage customers may pay more if the base fee exceeds their old flat rate, so the risk is sticker shock.

High-usage customers will pay more, and the open question is whether they read it as fair or treat it as a reason to renegotiate. Mid-usage customers are usually the easiest, because the new math often lands close to what they already pay. Know the bill impact for every account before a single email goes out.

This step depends on data most teams do not have. Only 43% of companies can attribute AI cost to a specific customer, and only 22% track it per transaction (CloudZero, May 2025). Without 12 months of per-customer usage, you cannot segment, which is one more reason to instrument usage early.

Match the message to the segment

For low-usage customers, anchor on the included allocation. "You now get a set amount of usage included before any charges apply" is an easier message than "your monthly fee went up." For high-usage customers, lead with the volume economics, which often improve their unit cost as they scale. Some will welcome it.

Others will use it to open a negotiation, and that is not a failure. The customers who push hardest on price during a migration are usually the most engaged, which makes them good candidates for a committed-usage deal that works for both sides. Be ready for that conversation rather than surprised by it.

Grandfather strategically, not by default

Blanket grandfathering is a trap; it freezes a growing share of your base on economics that no longer work. With an entitlements system you do not need it. You can override the new plans with the old terms for specific accounts and retire them on your own schedule.

Either way, give a clear 3 to 6 month transition window with a fixed cutover date. A cleaner option turns the change into a commercial opportunity: offer to honor flat-rate pricing for any customer who commits to a multi-year contract inside the migration window.

Communicate through your account teams

Every customer above a meaningful revenue threshold should hear about the change from their account manager before it shows up on an invoice, and the conversation doubles as a health check. Customers who discover a pricing change through an automated email escalate, and that escalation costs more time and relationship capital than the call would have. This is exactly the playbook Clay used when it rewired its pricing: communicate early, directly, and through the people who own the relationship.

What are the gotchas?

Four operational failures recur in hybrid pricing: revenue leakage from missed events, invoices customers cannot read, enterprise contract terms that break standard billing models, and margin surprises discovered at month-close. Most surface months after launch, not at rollout, which is why they are worth designing against from day one.

Revenue leakage from missed events. When events are dropped, delayed, or attributed to the wrong customer, you deliver value you never bill for. Teams typically discover this six months in, when a reconciliation turns up a gap. Run that reconciliation from the first billing cycle instead: compare total events your product emitted against total events the billing system recorded, per customer, per period. Any gap is a drop or a double-count, and catching it early is far cheaper than backfilling a quarter of missed revenue.

Invoices customers cannot read. A hybrid invoice is a more complex document than a flat subscription charge. If a customer cannot trace why the bill was $4,200 this month against $3,100 last month, you get disputes. Every metered line item should link to a usage report, and a customer-facing view that shows consumption by period, ideally before the invoice finalizes, resolves most of these tickets before they open.

Enterprise contract edge cases. Enterprise terms routinely break standard billing models: a base fee waived for the first 90 days, a monthly cap on usage charges, a minimum committed volume, a multi-year schedule with annual escalators. If the billing platform cannot model these natively, they end up in a spreadsheet, and the spreadsheet is eventually wrong at the worst possible moment.

Margin surprises at month-close. A high-usage customer can look excellent on an ARR dashboard while quietly destroying gross margin. Run the cost-versus-revenue reconciliation per customer cohort, not in aggregate. Your top 10% of customers by consumption should clear your cost floor. If they do not, the fix is the usage pricing, not the sales motion.

Frequently asked questions

These questions cover the most common points of confusion when designing and launching a hybrid pricing model for AI products.

What is the difference between hybrid pricing and usage-based pricing?

Pure usage-based pricing is consumption only, with no fixed commitment: you pay for exactly what you use. Hybrid adds a base fee or an included allocation underneath the usage. Most products marketed as usage-based pricing for AI are hybrid in practice, because they carry a minimum, a base, or an included amount. Clay, Cursor, and Figma all fit this description.

The distinction is operational: pure usage has no floor, which creates revenue volatility and forecasting problems on both sides.

How do I handle free tiers in a hybrid model?

Fold the free tier into the base as an included allocation. The "free tier" becomes the set of units that do not trigger charges, and metered billing begins once a customer exceeds it. This is cleaner than running a separate free product, because it ties the included usage to a paying relationship from the start rather than maintaining two pricing systems.

What is the minimum viable metering setup to go live?

One meter, one usage metric, one data source. Real-time streaming is not required on day one; a daily batch upload from your database is enough to validate the model in production. Add real-time ingestion when you need usage caps, in-product usage dashboards, or sub-daily billing cycles. Do not over-engineer the metering layer before the pricing model has been validated against real customers.

How do I prevent revenue leakage when moving to automated metered billing?

Run a reconciliation check from the first billing cycle and keep it running every period after. Compare total usage your product emitted against total usage the billing system received, per customer, per period. Any gap is events dropped or double-counted. The mistake is treating this as a launch-time check rather than a standing one.

How do I communicate a pricing change to enterprise customers mid-contract?

Directly, early, and through the account team. Segment customers by bill impact before you communicate anything, then give 60 to 90 days of notice with a fixed transition date and a bill estimate built from each customer's actual historical usage. Treat the pushback as commercial rather than defensive: the customers who resist hardest are often your best expansion candidates.

When should I move from hybrid to outcome-based pricing?

When two things are true: the product can reliably deliver a measurable outcome rather than just actions that contribute to one, and you have the telemetry to attribute that outcome cleanly. Hybrid is often transitional, a stepping stone between flat subscription and outcome pricing. GitHub Copilot is the textbook path: per-user first, then hybrid with AI credits, now positioning autonomous agents toward per-outcome. Each move followed a real capability shift. Do not jump to outcome pricing before the telemetry can support it.

Next steps

A hybrid model is only as reliable as the billing system underneath it. The pieces that break under variable AI cost, real-time authorization, accurate per-customer usage, and an invoice a customer can read, are the pieces worth building on infrastructure designed for them.

That is what Credyt provides for AI products:

  • Hybrid billing: a recurring base fee with a bundled credit allowance that refreshes on a schedule, plus metered overage once the allowance is exhausted.
  • Grants and entitlements: issue trial credits, promotional grants, and plan entitlements with optional expiry dates, without building grant logic yourself.
  • Branded billing portal with usage history: the customer-facing usage view that prevents most invoice disputes.
  • Profitability analytics: per-customer cost and revenue side by side, so a negative-margin customer surfaces before month-close instead of at renewal.
  • Product catalog and versioning: change pricing rules in the dashboard without touching code, and keep full product version history with a diff review before saving.

See how Credyt handles hybrid pricing without stitching billing infrastructure together.

Don't let monetization slow you down.

Free to start. Live in hours. No engineering team required.