Lessons for any AI team from GitHub Copilot's "usage-based billing" landing

On April 27, 2026, GitHub announced that Copilot is moving to usage-based billing on June 1. The reaction was the sharpest yet for any AI billing redesign. The most-repeated question across the developer community was the obvious one: "If we are going to pay API rates, why wouldn't we just get Codex or Claude Code? What is the logic here?"

This is not new in AI tooling. Cursor opened refund windows in June 2025 after swapping request caps for a credit pool priced at API rates. Lovable shifted to complexity-based credit consumption in July 2025. Vercel restructured pricing in September 2025. Each rollout drew the same shape of reaction.

The reason GitHub's announcement is the most instructive is not that the design is the worst. It is that the direction is so obviously right. AI products genuinely need usage-based billing. Frontier model providers cut prices multiple times through 2025. New models arrived every three to four months and reset the price curve. Agentic adoption pushed tokens-per-task up by orders of magnitude. A flat subscription with a fixed monthly allowance cannot absorb that. GitHub had to move.

The problem is what they actually moved to. You cannot ship usage-based pricing without shipping the usage-based experience that goes with it. GitHub changed the pricing model and left the customer surface untouched. There is no balance the user can see and trust, no spend controls at the moment of decision, no clear rate card in the IDE. The pricing label moved. The product around it did not. The gap between cost incurred and cost recorded is exactly where real-time economic control becomes the differentiator.

"Usage-based" is doing a lot of work in this announcement

The April 27 announcement renames the unit. Premium requests, the abstract counter introduced in 2025, become AI Credits priced in dollars: 1 credit = $0.01, with per-token rates published per model (GitHub Copilot pricing docs, April 2026). The rate card becomes legible. That is the visible improvement.

What is not changing is the architecture underneath. Pro is still $10 per month for an allowance bundle of 1,000 AI Credits. Pro+ is still $39 for 3,900. Allowances still reset on the first of every month, and nothing rolls over. Overage still bills automatically against a card on file. Users still cannot pre-pay a balance the way they can on Anthropic or OpenAI consoles. There is still no pre-request cost preview in the IDE. Annual subscribers still found out about new model multipliers from an email, not at renewal.

In billing terms, this is a fixed-fee subscription with usage-based overages, billed via end-of-cycle invoice. The rate card is more legible now, but the architecture is unchanged. It is not pay-as-you-go, it is not prepaid credits, and it is not real-time billing. The label describes the unit. The system underneath is the same one users complained about for the last twelve months.

That mismatch between the label and the system is what the reaction is about. The complaints converge on three specific implementation failures. None are about whether usage-based billing is the right direction for AI products. The argument is over how it was implemented.

The three implementation failures users named

Failure 1: Monthly credits that expire

Users object to expiring credits because there is no balance they own. The most-repeated framing across the community was direct: "Expiring monthly credits. The fact these don't roll over and accumulate is criminal." A slightly cooler version: "An API wrapper where your unused credits expire at the end of the month."

Subscribe to Pro for $10 and you get 1,000 AI Credits at API rates. Subscribe to Pro+ for $39 and you get 3,900. If you do not use them in 30 days, you lose them. The 2025 system had the same shape with 300 premium requests on Pro and 1,500 on Pro+, and the same monthly reset. Users went through twelve months asking for rollover; the rebuild kept the calendar boundary intact. The framing the community kept returning to was not "we want a balance." It was: why would anyone choose this over going direct to Anthropic or OpenAI, where $10 in pre-paid credits sits on the account until you spend it?

The architectural cause is not the absence of a balance internally. GitHub clearly tracks remaining credits inside the system, otherwise overage at the published rate would not work. The block is not technical capability. It is what the fixed-fee subscription model treats as default. Forecasting assumes a reset cycle. Revenue recognition assumes consumed-in-period. Customer support, refund logic, and accounting are all built around the calendar boundary. Rollover is buildable on top, but it is exception work in every layer. In a real-time wallet-native system, rollover is a configuration choice attached to each grant. Roll forever, expire in 30 days, convert to a different credit type are toggle settings, not engineering work.

The reset is not an architectural impossibility. It is the cheapest path inside a fixed-fee billing engine. The result is a product that users now compare directly to alternatives where balance does carry over. Prepaid credits at OpenAI or Anthropic. OpenCode Go credits with a multi-week consumption window. BYOK setups where any unused dollar stays in the user's account. The competitive frame switched from "Copilot vs no AI tool" to "Copilot vs every other place I can spend the same dollar." That is what usage-based billing does to a product when balance is not a customer-facing primitive.

Failure 2: No control point between the prompt and the bill

Users cannot see what a session will cost before it runs, and the platform cannot block runaway cost on their behalf. Two distinct problems live under the same complaint, and one of them is physically unsolvable.

Predicting the exact total cost of a request before it runs is not possible on any architecture. Output token count is not knowable before generation. Anthropic, OpenAI, and GitHub all face the same limit. No one shows users "this prompt will cost $X.XX." This is not a design choice. It is a property of autoregressive generation.

What is solvable is the obvious stuff that AI API providers already ship. Anthropic and OpenAI show users their balance, let them set spend caps, surface the rate card in the console, and bill against a prepaid pool that does not expire monthly. Those are table-stakes for an AI billing UX in 2026. GitHub Copilot wraps the same APIs but ships none of it. The benchmark of "good" is set by the very vendors Copilot resells, and Copilot lands below it.

In AI workloads, billing has to be a first-class part of the product. Balance and controls need to be visible in the IDE, in the agent, at the moment the user decides whether to spend. Fixed-fee subscriptions treat billing as a back-office function that produces an invoice at month-end. That works when the unit cost is stable. It breaks when the cost surface moves every quarter and the customer never sees the meter until the bill arrives.

Failure 3: Annual subscribers degraded mid-contract

The April 27 announcement creates two cohorts of subscribers with very different experiences. New monthly subscribers move to AI Credits with direct token rates; multipliers disappear from their pricing model. Legacy annual subscribers stay on the premium request system, but the multipliers are reassigned: GPT-5.4 from 1x to 6x, Sonnet 4.6 from 1x to 9x, Opus 4.6 and 4.7 to a flat 27x. Standard-tier models, previously included at 0x, are removed from annual plans entirely. Annual subscribers found this out from the announcement, not at renewal.

Their reaction was unambiguous. "How on Earth is reducing someone's service 90% legal for annual plan users?" "If someone just paid for the annual plan, they got royally screwed." Refund discussions filled every comment section.

GitHub clearly can version pricing. Every entry in the multiplier table is a price version. The November 2025 launch of dedicated SKUs for Spark and Cloud Agent shows the system supports separate billing units for separate features. The architecture is not the constraint. The way pricing changes are externalized is.

The cleaner way to externalize the same flexibility would be to represent the AI Credit as a custom asset with a fixed dollar conversion and a published cost in credits per model: Sonnet 4.6 burns 1 credit per 1,000 input tokens, Opus 4.7 burns 5, and so on. Pricing changes then live on the rate card the customer can see. Annual subscribers see what changed when it changes, not in an email three months after they paid. The April 27 announcement moves part of the way for monthly users, with a transparent unit and published rates, but keeps the multiplier mechanism for legacy annuals. The multipliers are exactly the lever that broke trust.

The lesson is simple. Publish the unit price clearly, and do not change the terms on customers halfway through a contract they have already paid for. GitHub did neither. The fixed-fee subscription model made the silent path the cheapest one.

Why fixed-fee subscriptions with end-of-cycle billing make these primitives expensive

There is nothing wrong with subscriptions as a commitment. What does not bend with the AI cost surface is the specific shape GitHub Copilot, Cursor, Lovable, and Vercel all share: a fixed monthly fee, a fixed monthly allowance, overages charged at end of cycle, and a billing engine that treats the cycle as the unit of forecasting. That architecture was built for a stable cost surface. The price of a unit lives inside the subscription itself. The customer has an allowance, not a balance. Forecasting assumes consumed-in-period. None of those primitives bend gracefully when the underlying cost moves between cycles.

The AI cost surface shifts faster than annual contracts can absorb. Frontier model providers cut rates multiple times per year, often without warning. New models arrive every three to four months and reset the price curve. Claude Sonnet 3.5 became 3.7, became 4, became 4.5, became 4.6 across 2025. GPT-4 became 4o, became 5 mini, became 5, became 5.5 over the same period. Agentic adoption pushed tokens-per-task up by orders of magnitude through 2025, raising effective cost per unit of work even when published rates held steady. A 12-month subscription locks the customer at a price assumption that becomes obsolete in a quarter.

The four things users asked for are not four independent feature requests. They are the inverse of four defaults baked into the fixed-fee subscription model.

What users asked for	Fixed-fee subscription default
A customer-facing balance with grant-level lifecycle	Monthly allowance, reset on the first
Per-usage gating and hard caps	Enforce at the end of the billing cycle
Configurable rollover at grant level	Lifecycle equals subscription cycle
Customer-visible rate card versioning	Price as a stored constant; change via back-end multiplier

Each pair is shaped by the same architectural assumption: the billing cycle is the unit of forecasting, not the request. Real-time wallet-native flips that. The request is the unit, and every primitive becomes event-driven instead of cycle-driven.

AI-native teams shipping their own usage-based billing transitions during 2026 will face the same trade-off at the same stages. A fixed-fee plan with a metered add-on can survive another cycle or two through multiplier changes. After that, balance, rollover, gating, and versioning all become forced asks. The choice is to rebuild the architecture or to repeat GitHub's landing. See rapid pricing iteration for why this compounds with company stage.

The architecture you choose determines what is cheap to change between requests and what is expensive.

Architecture	Cheap to change between requests	Expensive	Examples
Fixed-fee subscription with metered overages	Multiplier on the back-end. Headline price (politically expensive, technically cheap).	Customer-facing balance. Rollover. Per-usage gating. Rate-card change without an email campaign.	Stripe Billing; GitHub Copilot post-June 1
Invoice-based usage-based billing	Aggregations and end-of-cycle reconciliation. Per-usage rate cards.	Real-time gating before runtime. Per-grant rollover.	Orb, Metronome, Lago
Real-time wallet-native	Meter, price, balance, entitlement, and rollover policy, all per request, through configuration.	(none of the above)	Credyt, Stigg

What a smooth landing looks like

The cost of a single request will always be a posterior fact. Output tokens are not predictable, and no billing architecture changes that. What is solvable is what the customer sees and what the platform enforces around that uncertainty. Four primitives separate a smooth landing from a rough one. None of the four require predicting cost. All four require an event-driven balance.

The first is a customer-facing balance with a grant-level lifecycle. The customer sees their balance in real time. Rules attached to each grant (roll, expire, convert) are toggle settings, not features that need an engineering release.

The second is per-usage gating and hard caps. Pre-request balance check that blocks at zero. Pre-request hard caps configurable per user, per session, and per agent. Mid-flight stream-and-stop when consumption crosses a threshold. Not a coarse organization-level cap controlled by an admin. Granular, configurable, and exposed to the platform and the agent.

The third is configurable rollover at the grant level. Whether credits roll, expire, or convert is a configuration choice attached to each grant, not a property of the billing cycle. Switching from a non-rolling to a rolling model is a setting, not a billing-engine change.

The fourth is customer-visible rate-card versioning. When a model gets cheaper or more expensive, the rate card updates and applies to the next event. The customer sees the new effective price on the next request, not in an email three months later.

All four are technically buildable on any architecture. On a fixed-fee subscription billing engine, every one of them is exception engineering. On real-time wallet-native, every one of them is a default. The architecture you choose on day one decides whether each future pricing change is a project or a configuration. GitHub made that choice years ago. AI-native teams launching usage-based billing now are making it today.

Credyt is the AI monetization infrastructure built around these four primitives as defaults. Per-usage authorization, hard caps, configurable rollover at the grant level, and customer-visible rate-card versioning are not advanced features. They are the system. The four primitives users called out as missing from GitHub Copilot's rollout are how Credyt is built by default.

Read the technical overview at credyt.ai →