Hey, it's me, josh - as a memoji

josh.miami

TokenGolf started with a simple question: what if AI cost had stakes?

Josh Echeverri
Josh Echeverri
4 min read

Claude Code already tracked cost.

The weird part was that the number didn't really matter.

It was there. It moved. You could notice it if you cared. But it didn't shape the session. It was telemetry, not pressure. That felt like a wasted design surface.

So the question that kicked off TokenGolf wasn't "how do I visualize cost better?" It was simpler and sharper than that:

What happens when cost has consequences?

Not a dashboard. Not a report. Stakes.

The number was already there

Once I looked at cost that way, the mechanic showed up fast.

Token and dollar efficiency should become score. Not just something you inspect after the run. Something the run lives inside.

But one score system by itself felt too flat. Passive tracking is mildly interesting. It tells you what happened. It doesn't force a choice.

So the idea split almost immediately into two modes.

Flow mode was the low-friction version. Just work normally. Let the tool track what happened in the background. No declaration, no ritual, no pretending every Claude Code session needs to become a challenge run.

That part mattered because I know how these tools die. One extra step at the wrong moment and the tool becomes a thing you admire more than use.

But passive tracking alone wasn't the interesting part.

The interesting part was roguelike mode.

Pre-commit to a quest. Pick a model. Commit to a budget. If you bust the budget, the run dies. Permadeath. No soft reinterpretation after the fact. No "well technically I still got it done." The point was to make spend feel real while you were still in the run, not just obvious once it was over.

That was where the whole project clicked.

Flow mode could watch you. Roguelike mode could train you.

The framing showed up almost immediately

Once the stakes were real, the rest of the framing came surprisingly fast.

A score on its own is abstract. It needs a world around it or it just reads like accounting with better branding. But the second the session had consequence, it also wanted progression. It wanted floors. It wanted a boss. It wanted a sense that the work was moving through stages instead of just accumulating cost.

So the dungeon framing showed up early.

Write code. Run tests. Fix what broke. Review the output. Merge the PR. Boss fight.

That could've been corny if it had felt pasted on. It didn't. It felt like it was naming something coding sessions already have: momentum, attrition, recovery, overconfidence, dumb deaths, clean wins.

Same with the budget tiers. Same with the class system.

Claude Code's models already felt different enough that mapping them into game roles didn't feel like branding. It felt more like translation.

Haiku, Sonnet, and Opus already carry different assumptions. Different ceilings, different failure modes, different economics, different instincts about when you'd reach for them. So turning them into classes felt almost too easy.

The model identities were already there. TokenGolf was just making them legible in a different language.

The idea came together in a messy real workflow

The planning session itself was not elegant.

It wasn't some clean product workshop. It was a Claude browser chat spread across laptop and phone, typing in some places, dictating in others through whspr flow, moving fast enough that the medium should've made the idea mushy.

It didn't.

That was the first thing that made me pay attention.

Usually when an idea shows up in fragments like that, it takes a while before the shape gets clean. This one didn't. By the end of that session, the core pieces were already there: flow mode, roguelike mode, classes, tiers, floors, boss state, the central idea that cost wasn't just metadata anymore.

That kind of early coherence is useful.

It's also a little suspicious.

Because there's a specific kind of project that feels "done" in idea-space and then falls apart the second it touches runtime. The concept sounds complete. The actual system turns out to be mostly gesture.

I know that pattern well enough not to trust the early feeling too much.

Then it had to survive contact with code

So by the end of the planning session, the next move was obvious:

take the design artifacts, drop them into Claude Code, and find out what parts were fake.

That was the point where TokenGolf stopped being a neat idea and became a repo with a chance to embarrass itself.

The funny part is I thought the hard part was going to be game design.

It wasn't.

The framing arrived fast. The reality check arrived faster.


Building TokenGolf is an ongoing series about turning Claude Code sessions into a roguelike. You can see the product and follow along at https://josheche.github.io/tokengolf/

Related Posts

What I shipped today in TokenGolf v0.4.0

Expansion and hardening at the same time. More achievements, more hooks, more structure, and the release infrastructure to keep the rules from drifting.

Sunday was when TokenGolf stopped feeling bolted on

By the end of Saturday, the score was believable. Sunday became the day of friction removal and deeper integration.

The repo looked done until I tried to run it

Saturday was where TokenGolf stopped being a neat concept and started becoming a real tool. The first milestone had nothing to do with the game.