Building your own payment stack costs far more than the estimate shows. At comparable revenue scale, Dropbox ran homegrown billing with 50 to 70 engineers; OpenAI runs it on a third-party platform with roughly 1.5 (Big Think, May 2025). The two-week build is the cheapest part; the correctness tax, the operational long tail, and the margin blind spot are the costs almost nobody prices in. This piece breaks down where that cost lands, and how it shifts with the kind of system you are building.
The part you can estimate is the part that does not matter
The part of a payment stack you can estimate in advance is the part that does not decide whether it works, and what the sketch even looks like depends on the system you are building. A pure payment system starts with a transactions table and a model around the lifecycle of a payment at a payment service provider (PSP). A traditional billing system starts with invoices, a scheduler, a PSP integration, and a reconciliation process. An AI or usage-based product starts somewhere that looks simpler: "we'll track a balance in Postgres," an endpoint that records usage and debits that balance, and a webhook when a card is charged.
This piece is about that third case, because it is where the gap between estimate and reality is widest. The estimate is two or three weeks, and it is honest. It is also wrong, because the balance sketch prices what is easy to see and ignores what determines whether the system is correct.
The decision also arrives bundled with a payment processor relationship. Whichever pattern you pick for connecting a PSP to your billing logic becomes infrastructure you own and maintain, and that integration pattern shapes setup time and lock-in more than the metering layer does. That is one more layer the sketch leaves out.
This is not a pitch for any one vendor. It is an argument about where engineering time goes once a billing system meets real traffic and pricing that changes every quarter. The sketch is a demo. The production system is a different object: it has to stay correct under concurrency, absorb pricing changes without a migration, and tell you which customers make you money. None of that is in the two-week estimate.
Where the true cost of building actually lands
The cost of a homegrown payment stack is not the build. It is the five things the build leaves out, each invisible at estimate time and unavoidable in production.
| Cost | What the estimate misses |
|---|---|
| Correctness tax | Idempotency, atomic concurrency, and event-log design a naive schema gets wrong |
| Operational long tail | Dunning, proration, refunds, revenue recognition, reconciliation, and tax; each a separate project |
| Margin blind spot | No per-customer or per-product cost attribution, so you cannot see your product's profitability or which customers lose money |
| Maintenance load | Every pricing change becomes an engineering ticket and a data migration |
| Headcount cost | 50 to 70 engineers at Dropbox scale against roughly 1.5 on a platform |
The estimate trap: weeks to demo, quarters to correct
Teams budget for the happy path and ship the edge cases for the next year. The clearest illustration is a headcount comparison. Scott Woody, who built billing at Dropbox before founding the metering platform Metronome, has described Dropbox running homegrown billing with 50 to 70 engineers at comparable revenue scale (Big Think, May 2025). OpenAI, by contrast, runs billing on a platform with roughly 1 to 1.5 engineers even after ChatGPT-scale growth. The source is a vendor founder, so read the exact numbers as directional; the shape of the gap is the point.
The dollar version is just as stark, and AI coding tools do not make it cheap. We costed out a homegrown build for a single engineer, with AI assistance assumed, and the core components came to about 27 weeks before a line of product code ships:
| Component | Engineering weeks |
|---|---|
| System design and integration architecture | 3 |
| Basic credit and usage deduction | 2 |
| Wallet state management and consistency | 3 |
| Credit expiry and authorization logic | 2 |
| Auto top-up logic | 3 |
| Customer portal | 4 |
| Pricing rule engine | 3 |
| Per-customer cost tracking | 4 |
| Existing customer migration | 3 |
| Ongoing maintenance | 1+ week per month |
| Total upfront | ~27 weeks (~6 months) |
That is roughly six months of work, or about $78,000 for one engineer at a fully loaded cost of $150,000 a year, then more than a week a month to keep it running. Independent agency benchmarks land in the same range: a custom billing build runs $45,000 to $350,000 and up, with 10 to 20% of that per year in maintenance (Appinventiv, January 2026). The two-week estimate and the six-month reality describe the same project.
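The arithmetic behind those figures is short enough to check. The sketch below uses only the numbers from the table and the paragraph above; the 52-week year and the reading of "1+ week per month" as a floor of 12 weeks a year are assumptions the text does not state.

```python
# Engineering weeks per component, copied from the table above.
COMPONENT_WEEKS = {
    "System design and integration architecture": 3,
    "Basic credit and usage deduction": 2,
    "Wallet state management and consistency": 3,
    "Credit expiry and authorization logic": 2,
    "Auto top-up logic": 3,
    "Customer portal": 4,
    "Pricing rule engine": 3,
    "Per-customer cost tracking": 4,
    "Existing customer migration": 3,
}

FULLY_LOADED_ANNUAL_COST = 150_000  # dollars per year, from the text
WEEKS_PER_YEAR = 52                 # assumption: cost spread evenly over 52 weeks
MAINTENANCE_WEEKS_PER_YEAR = 12     # assumption: "1+ week per month" read as a floor


def cost_of_weeks(weeks, annual_cost=FULLY_LOADED_ANNUAL_COST):
    """Dollar cost of a number of engineer-weeks for one engineer."""
    return weeks * annual_cost / WEEKS_PER_YEAR


total_weeks = sum(COMPONENT_WEEKS.values())
upfront = cost_of_weeks(total_weeks)
maintenance_floor = cost_of_weeks(MAINTENANCE_WEEKS_PER_YEAR)

print(f"{total_weeks} weeks upfront, about ${upfront:,.0f}")
print(f"maintenance floor, about ${maintenance_floor:,.0f} per year")
```

Under those assumptions the upfront figure rounds to the $78,000 quoted above, and the maintenance floor adds roughly another $35,000 a year for as long as the system runs.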
This is the wall vibe coders hit when they build an app in hours but billing takes months. The app ships fast. The billing is the part that does not finish.
The correctness tax: AI breaks the assumptions billing was built on
The correctness tax is the engineering cost of making billing handle concurrency, idempotency, and retroactive repricing correctly; it is invisible in the estimate and unavoidable in production. Subscription billing was designed for human-paced, low-frequency events: one charge a month, a few plan changes a year. AI inference generates events at machine speed and in parallel, which breaks the assumptions a naive balance table is built on.
Arnon Shimoni's engineering breakdown lists fourteen distinct pains of building your own billing system (February 2024). The first is idempotency. When API rate limits force retries and a billing system runs across multiple instances, the same usage event can be charged twice unless every request carries a unique key and the system deduplicates them. That logic is not in the sketch, and getting it wrong means double-charging customers.
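To make the deduplication requirement concrete, here is a minimal sketch in which the database itself refuses a second event with the same key. The table layout, the function names, and the use of in-memory SQLite in place of a production database are all assumptions for illustration, not a description of any particular system.

```python
import sqlite3


def open_ledger():
    """In-memory SQLite stands in for the production database (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE usage_events (
            idempotency_key TEXT PRIMARY KEY,  -- supplied by the caller, one per logical event
            customer_id     TEXT NOT NULL,
            units           INTEGER NOT NULL
        )
        """
    )
    return conn


def record_usage(conn, idempotency_key, customer_id, units):
    """Record a usage event once; return True if new, False if it is a replay."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO usage_events VALUES (?, ?, ?)",
            (idempotency_key, customer_id, units),
        )
    return cur.rowcount == 1


def billed_units(conn, customer_id):
    """Total units recorded for a customer."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(units), 0) FROM usage_events WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return total


conn = open_ledger()
first = record_usage(conn, "req-123", "cust_a", 500)
retry = record_usage(conn, "req-123", "cust_a", 500)  # same key: a network retry
print(first, retry, billed_units(conn, "cust_a"))
```

The design choice worth noting is that the uniqueness check lives in the database constraint rather than in application code, so it holds even when several instances of the service receive the same retry.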
Concurrency is the second broken assumption. When several requests hit the same customer balance at once, each has to draw down from the true remaining balance, not a stale read, or two requests both pass a check the balance could only cover once. Underneath both sits an architectural fork a homegrown schema usually gets wrong. Most in-house systems increment a pre-aggregated counter, which makes retroactive repricing, refunds, and backfills destructive operations because there is no raw record to recompute from. The alternative is to store every usage signal as an immutable event and treat the invoice as a deterministic query over that log (Orb engineering docs). The second design survives corrections; the first does not. As one engineer put it in the Hacker News thread on the pains of building billing (February 2024), there are two kinds of engineers: those who have worked on billing, and those who have not yet learned why it is hard.
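Both ideas in this section fit in a short sketch: a debit that checks and deducts in a single statement, and an invoice computed as a query over an append-only event log. The schema, names, and prices are hypothetical, and in-memory SQLite stands in for a real database. The demo issues its two debits one after the other; under real concurrency the guarantee comes from the database applying each `UPDATE` atomically, which is an assumption about the production store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the production database (assumption)
conn.executescript(
    """
    CREATE TABLE balances (customer_id TEXT PRIMARY KEY, cents INTEGER NOT NULL);
    CREATE TABLE usage_events (
        id          INTEGER PRIMARY KEY,
        customer_id TEXT NOT NULL,
        tokens      INTEGER NOT NULL
    );
    """
)


def try_debit(conn, customer_id, cents):
    """Check and debit in one statement, so no caller acts on a stale read."""
    with conn:
        cur = conn.execute(
            "UPDATE balances SET cents = cents - ? "
            "WHERE customer_id = ? AND cents >= ?",
            (cents, customer_id, cents),
        )
    return cur.rowcount == 1  # False: the balance could not cover this request


def append_usage(conn, customer_id, tokens):
    """Append-only by convention: events are never updated or deleted."""
    with conn:
        conn.execute(
            "INSERT INTO usage_events (customer_id, tokens) VALUES (?, ?)",
            (customer_id, tokens),
        )


def invoice_cents(conn, customer_id, cents_per_1k_tokens):
    """The invoice is a query over the raw log, so it can be recomputed at any price."""
    (tokens,) = conn.execute(
        "SELECT COALESCE(SUM(tokens), 0) FROM usage_events WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    # Floor division is a placeholder; a real system needs an explicit rounding policy.
    return tokens * cents_per_1k_tokens // 1000


with conn:
    conn.execute("INSERT INTO balances VALUES ('cust_a', 100)")

# Two requests for 80 cents each against a 100-cent balance: only one can pass.
results = [try_debit(conn, "cust_a", 80), try_debit(conn, "cust_a", 80)]

for tokens in (4_000, 6_000):
    append_usage(conn, "cust_a", tokens)

original = invoice_cents(conn, "cust_a", 30)  # hypothetical price per 1k tokens
repriced = invoice_cents(conn, "cust_a", 20)  # retroactive reprice: same log, new rate
print(results, original, repriced)
```

A pre-aggregated counter would have stored only the first total. Because the raw events survive here, the repriced invoice is a second query rather than a data repair.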
The operational long tail: the features no one demos
The operational long tail is the set of billing capabilities that no team demos but every production system requires. Each one is a separate engineering project, and none appear in the original estimate:
- Dunning: declined-card retries, how many times you retry, escalation rules, and how you tell the customer.
- Proration: what a mid-cycle plan change costs, across time zones, billing anniversaries, and partial periods.
- Refunds and credits: applied in the right order so the books stay correct.
- Revenue recognition: deciding when prepayments, credits, and usage become recognized revenue, under rules that are getting stricter in the US. See revenue recognition for usage-based billing.
- Failed top-ups and reconciliation: matching what your system thinks happened against what the team's PSP account actually settled.
- Tax: jurisdiction rules that change without notice.
The fourteen-pains list exists because every one of these surprised a team that thought it was nearly done.
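Proration shows how much policy hides inside one bullet. The sketch below prices a mid-cycle plan change under assumptions chosen for illustration: day-level granularity, an exclusive period end, the change day billed at the new price, and half-up rounding to cents. Each of those is a decision a real system has to make and document; none is a standard.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def prorated_charge(old_price, new_price, period_start, period_end, change_date):
    """Charge (or credit, if negative) for switching plans on change_date.

    Assumptions: day-level granularity, period_end exclusive, the change day
    billed at the new price, half-up rounding to cents.
    """
    total_days = (period_end - period_start).days
    remaining_days = (period_end - change_date).days
    if total_days <= 0 or not 0 <= remaining_days <= total_days:
        raise ValueError("change_date must fall inside a non-empty billing period")
    fraction = Decimal(remaining_days) / Decimal(total_days)
    delta = (Decimal(new_price) - Decimal(old_price)) * fraction
    return delta.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical plans: upgrade from $20 to $50 halfway through a 30-day period.
upgrade = prorated_charge(
    "20.00", "50.00",
    period_start=date(2026, 4, 1),
    period_end=date(2026, 5, 1),
    change_date=date(2026, 4, 16),
)
# The reverse change on the same day yields a credit of the same size.
downgrade = prorated_charge(
    "50.00", "20.00",
    period_start=date(2026, 4, 1),
    period_end=date(2026, 5, 1),
    change_date=date(2026, 4, 16),
)
print(upgrade, downgrade)
```

The sketch ignores time zones, billing anniversaries that fall on the 29th to 31st, and partial days, which is where most of the real work in this item sits.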
The margin blind spot: you build revenue capture, not cost visibility
The cost of building that almost no estimate includes is this: while you build a system to capture revenue, you rarely build the one that tells you whether that revenue is profitable. A payments table records what a customer paid. It does not record what that customer cost you in inference, and at AI margins the difference is the whole business. Bessemer's 2026 analysis puts AI-native gross margins at 50 to 60%, against the 80 to 90% that classical SaaS enjoys (Bessemer Venture Partners, February 2026). At those margins, a single heavy customer can be unprofitable, and a stack that only tracks payments cannot tell you which one.
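The missing piece is a join between what each customer paid and what each customer cost. A minimal sketch, with hypothetical customers and amounts in integer cents; in practice the cost side would come from metered inference spend per request, which is the part most payment schemas never capture.

```python
from collections import defaultdict


def margin_report(payments, inference_costs):
    """Join revenue and cost per customer; all amounts are integer cents."""
    revenue = defaultdict(int)
    cost = defaultdict(int)
    for customer_id, cents in payments:
        revenue[customer_id] += cents
    for customer_id, cents in inference_costs:
        cost[customer_id] += cents

    report = {}
    for customer_id in sorted(set(revenue) | set(cost)):
        r, c = revenue[customer_id], cost[customer_id]
        report[customer_id] = {
            "revenue_cents": r,
            "cost_cents": c,
            "margin_cents": r - c,
            # Gross margin as a fraction of revenue; undefined with no revenue.
            "margin_pct": (r - c) / r if r else None,
        }
    return report


# Hypothetical data: both customers pay the same flat price.
payments = [("cust_a", 20_000), ("cust_b", 20_000)]
inference_costs = [("cust_a", 3_500), ("cust_b", 18_000), ("cust_b", 7_500)]

report = margin_report(payments, inference_costs)
unprofitable = [c for c, row in report.items() if row["margin_cents"] < 0]
print(unprofitable)
```

In this made-up data the payments table alone shows two identical customers. Only the cost side shows that one of them loses money.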
In our conversations with AI teams through the first half of 2026, the pattern we see most often is a billing stack that records what a customer paid and nothing about what that customer cost. The margin problem surfaces at month-end, not in real time, and by then the heaviest customers have already eaten the quarter. The schema choice that created the blind spot was made in week two, when the simplest table felt like the right one.
This is not hypothetical at the pricing layer either. Anthropic repriced its enterprise tier in April 2026, moving off a flat $200 per user per month toward a $20 seat plus pay-per-token model (The Information, April 2026), after inference costs ran ahead of a flat price. A team that cannot attribute cost per customer in real time cannot make that call before the margin is gone. This is why real-time per-customer economic control is a different requirement from billing.
The maintenance that never ends: pricing changes are code changes
Billing is not a project you finish; it changes as often as your pricing does, and AI pricing changes constantly. Kyle Poyar's Growth Unhinged tracked more than 1,800 pricing changes across the top 500 transparently-priced SaaS and AI companies in 2025, about 3.6 per company (Growth Unhinged, January 2026). ICONIQ's January 2026 survey of around 300 AI executives found 37% planning a pricing-model change in the next twelve months, with outcome-based pricing jumping from 2% to 18% adoption in six months (ICONIQ Growth, January 2026). Matthieu Hafemeister, co-founder of Concourse, changed pricing more than seven times in the first forty days after launch (a16z, July 2025).
For a team with billing in its own codebase, each of those changes is an engineering ticket, a data migration, and a reconciliation job. When the billing layer cannot absorb a change cleanly, customers feel it. Cursor's June 2025 pricing change burned through users' allocations within hours and forced a public apology from the company (TechCrunch). The cost of a homegrown stack is not the day you ship it. It is every pricing decision afterward that now needs an engineer.
This is not only about whether you change pricing; it is about whether you can. A new or repriced model can flip an AI-native company's margins overnight, so the ability to roll out a new pricing model, add dimensional pricing tied to workload attributes, test the response, and iterate is a survival capability, not a back-office convenience. A team that ships an engineering project for every change moves slower and prices less competitively than one that does not.
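One way to picture a billing layer that absorbs pricing changes is to treat prices as dated rows keyed on a workload attribute, rather than as constants in code. The sketch below is an assumption about how such a price book could look. The `PriceRule` type, the model names, and the rates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class PriceRule:
    model: str                   # workload attribute the price is keyed on
    effective_from: date
    usd_per_1k_tokens: Decimal


# Hypothetical price book. A pricing change is a new row, not a code change.
PRICE_BOOK = [
    PriceRule("small", date(2026, 1, 1), Decimal("0.10")),
    PriceRule("large", date(2026, 1, 1), Decimal("1.00")),
    PriceRule("large", date(2026, 3, 1), Decimal("0.80")),  # reprice from March
]


def rule_for(model, on, price_book=PRICE_BOOK):
    """Latest rule for this model that is in effect on the given date."""
    candidates = [
        r for r in price_book if r.model == model and r.effective_from <= on
    ]
    if not candidates:
        raise LookupError(f"no price for model {model!r} on {on}")
    return max(candidates, key=lambda r: r.effective_from)


def price_event(model, tokens, on, price_book=PRICE_BOOK):
    """Price one usage event at the rate in effect when it happened."""
    rule = rule_for(model, on, price_book)
    return rule.usd_per_1k_tokens * tokens / 1000


before = price_event("large", 5_000, date(2026, 2, 15))
after = price_event("large", 5_000, date(2026, 3, 15))
print(before, after)
```

Because each event is priced at the rate in effect on its own date, the March reprice needs no migration of February's usage. Testing a new price means adding a row and re-running the same query.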
Build vs buy is really a rent-versus-own decision
When the correctness tax, operational long tail, margin blind spot, maintenance load, and headcount cost are totalled, the build-versus-buy question changes shape. The estimate treats billing as a feature with a finish line. Production treats it as infrastructure with no finish line: a correct, concurrent, evolving system that has to absorb every pricing change and answer every margin question for as long as the company exists. The build cost is one-time and estimable. The carrying cost is permanent and almost never estimated.
That reframes the decision. The real question is not "can we build this cheaper than we can buy it?" It is "is billing our competitive advantage?" For a handful of companies the answer is genuinely yes. For most AI products, billing is the one system where being almost correct is a refund, an angry customer, and a financial-control finding all at once. Getting it perfectly correct wins you nothing your customers will ever praise.
As one commenter summarized the Hacker News thread, a working billing system is expected; it is all downside and no upside for the team that owns it. The companies most associated with AI billing reflect this. OpenAI runs on Metronome; Replit is among Orb's customers (TechCrunch, September 2024). The 70-engineer billing org is the cautionary tale, not the model to copy.
When build vs buy favors building
Building your own payment stack is the right call in three scenarios: a genuinely novel billing model, extreme scale, or a hard compliance requirement. Buying is not always correct, and a piece that pretended otherwise would be selling, not arguing.
The first case is a billing model no platform supports. If your pricing depends on an outcome-attribution scheme that does not exist in any vendor's data model, you may have to build the part that is actually differentiated, even if you buy the rest.
The second case is extreme scale, where vendor per-unit economics invert. This was Dropbox's original rationale: past a certain volume, a per-event or per-customer platform fee can exceed the fully-loaded cost of an internal team. The math only works at scale most companies never reach, but at that scale it is real.
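The inversion point is a one-line division, and writing it out shows how far away it usually is. Every input below is hypothetical: the team size, the cost per engineer, and the platform rate are chosen only to make the arithmetic visible, not drawn from any vendor's pricing.

```python
def break_even_volume(team_annual_cost, fee_per_million_events):
    """Annual volume, in millions of events, at which platform fees equal team cost."""
    if fee_per_million_events <= 0:
        raise ValueError("fee must be positive")
    return team_annual_cost / fee_per_million_events


# Every input here is hypothetical.
ENGINEERS = 10
COST_PER_ENGINEER = 150_000    # dollars per year, fully loaded
FEE_PER_MILLION_EVENTS = 100   # dollars, a made-up platform rate

team_cost = ENGINEERS * COST_PER_ENGINEER
millions = break_even_volume(team_cost, FEE_PER_MILLION_EVENTS)
print(f"break-even at {millions:,.0f} million events per year")
```

Swap in a real quote and a realistic team size before drawing any conclusion. The point of the sketch is only that the comparison has to include the whole team, not the initial build.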
The third case is a hard compliance or data-residency requirement. A team selling into regulated EU financial entities under rules like DORA, or one with strict self-hosting mandates, faces a real constraint. It may be easier to satisfy audit and residency requirements with a self-hosted open-source billing engine such as Lago than to negotiate equivalent terms with a managed vendor.
Outside those cases, the honest picture is a set of buy options, not a build-or-Credyt binary. Stripe Billing fits subscription-first products with simple overages. Orb and Metronome fit high-volume, enterprise, invoice-based contracts where usage is metered through a billing period and reconciled into an invoice at cycle end. Each is a real answer for a real reader.
Build what you are paid for; rent what just has to be correct
Build the thing your customers pay you for. Rent the thing that just has to be correct. A payment stack sits in the second category for most AI products: it is mission-coupled, unforgiving of error, and invisible when it works. The engineering quarter you would spend making it correct is a quarter not spent on the product that is actually your advantage.
For teams that reach this conclusion, the next decision is narrower than it looks. Real-time billing can be added alongside an existing subscription setup without a rip-and-replace, and adopting real-time billing without replacing your stack is usually the lowest-friction path.
Credyt exists to be the rented layer. It lets platforms authorize usage against a customer's balance in real time, then prices and debits each usage event as it happens rather than reconciling at cycle end. It attributes cost per customer, so the margin question is answered live instead of discovered at quarter end. The idempotency, the concurrency safety, the per-customer cost attribution, and the branded billing portal are the parts a team would otherwise spend that engineering quarter building. That is the gap the build estimate never shows, and the reason the true cost of building a payment system is higher than the sketch suggests.
