Hackathon Diaries: Scaling Flux.2 with vLLM & Credyt in 24 Hours
Engineering

Daniel Münch

There is a special kind of energy when the whole team is finally in the same room. We were just in Berlin for our latest get-together: Coffee was flowing, snow was falling, ideas were flying, the view over Berlin Tiergarten was amazing and naturally, we couldn't resist a spontaneous mini-hackathon.

The challenge? Build something fun, functional, and fully self-hosted before heading out for a Schnitzel (including vegan options).

My goal was to bridge the gap between "Digital Sovereignty" (which I love) and our actual GTM strategy. I wanted to host the new FLUX.2 model for high-fidelity image generation, but I didn't want to just burn GPU hours. I wanted a sustainable economic model.

I wanted a system where users could generate art, but also earn the right to generate more by sharing their creations. A true viral loop, powered by a wallet-native architecture (aka Credyt).

The Architecture: MLX vs. vLLM (Choosing the Right Hammer)

First, the stack. If you’ve read my previous posts, you know I’m a huge fan of MLX for running models locally on Apple Silicon. It’s the gold standard for personal, offline inference: efficient, private, and runs right on my MacBook.

But here’s the reality check: MLX is optimised for single-user latency, not high-concurrency serving.

To open this up to the team (and potentially the internet), I couldn't have my laptop melting a hole in the table. I needed raw throughput and efficient batching. That meant switching gears to vLLM running on NVIDIA A10Gs via Modal.

If you haven't played with Modal yet, it’s a game changer for this kind of project. It’s a serverless platform where you define your entire infrastructure: OS packages, CUDA drivers, volumes, and hardware requirements, all directly in your Python code. No wrestling with Dockerfiles or Kubernetes YAML hell. You just add a @app.cls(gpu="A10G") decorator, and it spins up a remote A10G in seconds, handles the job, and scales back to zero when you're done.

While MLX is the king of local sovereignty, vLLM on Modal is the beast you need when you actually want to ship. It handles the request queue beautifully, keeping the GPU saturated without the overhead.

The Viral Loop: Earning Credits by Sharing

But a high-performance GPU backend is just a money pit without a business model. We didn't want a boring "Stripe Checkout" form. We wanted to incentivise distribution.

The model we hacked together is simple:

  1. Sign up
    You get 10 free image credits (funded by a "gift" transaction in Credyt)
  2. Generate
    Each generation deducts 1 credit in real-time
  3. Share
    When you generate an image, you get a public "capability URL"
  4. Earn
    If someone else views your image, you earn 0.1 credits back

Get 10 people to look at your art? You’ve earned a free generation. It’s a self-sustaining viral loop.
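The break-even math behind that loop is easy to sanity-check in a few lines. This is a toy calculation, not the real Credyt ledger: the constants and the `credits_after` function are my own names for illustration.

```python
# Toy model of the credit economy described above.
# (Illustrative only -- the real balances live in Credyt, not in Python.)
SIGNUP_BONUS = 10        # free credits on sign-up
COST_PER_GENERATION = 1  # credits deducted per image
REWARD_PER_VIEW = 0.1    # credits earned per unique view of your image

def credits_after(generations: int, unique_views: int) -> float:
    """Wallet balance after some generations and views, starting from the bonus."""
    return SIGNUP_BONUS - generations * COST_PER_GENERATION + unique_views * REWARD_PER_VIEW

# Ten unique views pay for exactly one extra generation:
assert credits_after(generations=1, unique_views=10) == credits_after(generations=0, unique_views=0)
```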

The Engineering Challenge: Idempotency & Deduplication

Now, the backend engineer in me immediately spotted the exploit: What if a user just sits there hitting refresh on their own image link to farm infinite credits?

In a traditional setup, I’d be writing a complex Redis state machine to track IP addresses, timestamps, and rate limits. But we had a deadline, and I didn't want to spend the hackathon writing boilerplate.

This is where Credyt's native idempotency handling saved us. I just needed to be smart about the transaction_id.

In the FastAPI backend, when a file is requested, I generate a deterministic UUID based on the visitor's IP and the filename:

import uuid

# Create a stable transaction_id from the client IP and the filename,
# so repeat views by the same visitor map to the same transaction
client_ip = request.client.host
transaction_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{client_ip}:{filename}"))

# Call Credyt to gift credits
await credyt.adjust_wallet(
    customer_id=customer_id,
    transaction_id=transaction_id,  # <--- the magic happens here
    asset="IMG",
    amount=0.1,
    reason="gift"
)

Because Credyt respects idempotency, if the same IP views the same file 100 times, Credyt sees the same transaction_id and only processes the wallet adjustment once.

I didn't have to write a single line of database code to prevent fraud. The infrastructure handled it.
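Determinism is the whole trick: uuid5 hashes its input, so the same (IP, filename) pair always produces the same ID. Here is a stdlib-only sketch of how an idempotent ledger can exploit that. The in-memory set and the `credit_view` function are my stand-ins for the deduplication Credyt performs server-side, not Credyt's actual API.

```python
import uuid

processed_ids: set[str] = set()  # stands in for Credyt's server-side dedup
balance = 0.0

def credit_view(client_ip: str, filename: str) -> bool:
    """Credit 0.1 for a view; return False if this (IP, file) pair was already paid."""
    global balance
    txn_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{client_ip}:{filename}"))
    if txn_id in processed_ids:
        return False              # same visitor, same file: no double payout
    processed_ids.add(txn_id)
    balance += 0.1
    return True

assert credit_view("203.0.113.7", "cat.png") is True   # first view pays out
assert credit_view("203.0.113.7", "cat.png") is False  # refresh-farming is a no-op
assert credit_view("198.51.100.2", "cat.png") is True  # a different viewer still pays
```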

Why Credyt? (The Hackathon Realization)

I’ve built auth and billing systems before. It usually takes weeks. With Credyt, we had the economy running before the first slice of pizza was gone.

  • Wallet-Native
    We weren't just charging cards; we were managing a ledger of "IMG" assets
  • Real-Time Auth
    Before vLLM even spins up, Credyt checks if the user has a balance
  • Zero Frontend Lift
    Credyt provided the billing portal, so I could keep my frontend as a lightweight Alpine.js SPA.
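The real-time auth point above is conceptually a tiny gate in front of the GPU call. A sketch with a stubbed wallet: `wallets`, `get_balance`, and `InsufficientCredits` are hypothetical stand-ins for whatever lookup the backend actually does against Credyt.

```python
# Hypothetical sketch of the "check balance before spinning up vLLM" gate.
class InsufficientCredits(Exception):
    pass

wallets = {"alice": 2.0, "bob": 0.0}  # in-memory stand-in for Credyt wallets

def get_balance(customer_id: str) -> float:
    return wallets.get(customer_id, 0.0)

def generate_image(customer_id: str, prompt: str) -> str:
    if get_balance(customer_id) < 1:  # gate: never wake the GPU for an empty wallet
        raise InsufficientCredits(customer_id)
    wallets[customer_id] -= 1         # deduct 1 credit per generation
    # ...here the real app would enqueue the prompt to vLLM on Modal...
    return f"queued: {prompt!r} for {customer_id}"

assert generate_image("alice", "a snowy Tiergarten").startswith("queued")
```

The point of the gate is ordering: the balance check and deduction happen before any expensive GPU work is scheduled, so a broke wallet costs you nothing.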

Want to dive deeper? The API is super clean; check out the Credyt Documentation to see how easily it slots into your stack.

Open Source or It Didn't Happen

We believe in showing our work. If you want to see exactly how we glued Modal, vLLM, and Credyt together, or if you just want to fork it and build your own viral AI app, the code is up on GitHub.

Check it out here: github.com/credyt/photo-booth

2026 Prediction

We are going to see a shift from "flat subscription" SaaS to "wallet-native" apps. The ability to mix prepaid credits, viral "gifts," and real-time usage is just too powerful for AI products to ignore.

This hackathon was a blast. I got to play with state-of-the-art diffusion models on vLLM and build a robust economic engine around it, all without losing the "get-together" vibe to debugging billing code.

If you’re building AI tools and still sending invoices at the end of the month... come hang out with us in Berlin next time. Or just spin up a Credyt wallet.