
The repo looked done until I tried to run it

Josh Echeverri

Saturday was when TokenGolf stopped being a neat concept and started becoming a real tool.

That shift was immediate.

By the end of the planning session, the idea felt unusually complete. That was useful. It was also mostly meaningless until the code existed in an environment that could fight back.

So I took the browser-chat artifacts, brought them into Claude Code, and basically said: grok this repo.

The scaffold had the shape of a tool

At first glance, it looked promising.

The scaffold had all the right shapes. src/, hooks, scoring logic, state, TUI components, even a dead SQLite file sitting around like it was contributing something. It had the posture of a real tool.

That's not the same thing as being one.

The first time I actually tried to run tokengolf install, it failed immediately on JSX syntax.

That was a clean reality check.

The repo looked like an app, but Node was trying to execute JSX directly and there was no build step. So the first milestone had nothing to do with the game. It was much more basic: make the code compile.

The first real feature was "it builds"

TokenGolf is a Node CLI. I built it on Node 24, but kept it compatible back to Node 18. The command surface used Commander. The terminal UI used Ink, React, and @inkjs/ui. That stack made sense for the product, but it also meant the source couldn't just be run raw by Node.

So the first real fix was adding esbuild.

Keep src/ as source. Bundle to dist/cli.js. Wire npm run build. Make prepare run the build so npm link doesn't leave you with a nice-looking package that still can't execute.

Not glamorous, but load-bearing.

The project's first real feature was "it builds."

// package.json
"scripts": {
  "build": "esbuild src/cli.js --bundle --platform=node --format=esm --loader:.js=jsx --packages=external --outfile=dist/cli.js && chmod +x dist/cli.js",
  "prepare": "npm run build && husky"
}

Runtime reality started showing up everywhere

Once that was fixed, the next lie surfaced.

The CLI now ran, but the hooks were getting installed to the wrong place. The reason was subtle and exactly the kind of thing that turns scaffolds into real tools: process.argv[1] gave me the symlink path, not the real binary path. So after npm link, the install logic was writing hook paths relative to the symlink rather than the actual package.

Easy enough to fix once it was obvious. Follow the real file path instead of trusting the symlink.

Then came the next version of the same lesson: install can't just work once.

If rerunning tokengolf install duplicates entries or trashes config, you don't have a usable devtool. You have a setup script that creates cleanup work. So hook entries got a marker so install could dedupe cleanly on rerun.

That's quiet progress, but it matters. Real tools have to survive repetition.

Then one of the hooks just hung.

Again, not mysterious in hindsight. SessionStart was written as if it would receive piped stdin. It doesn't. So the hook was waiting for input that was never coming. The fix was to stop pretending the lifecycle was something it wasn't. Rewrite it as synchronous. No stdin read. No invented contract.

That was probably the first moment the project really changed shape.

Up to that point, TokenGolf still felt like "take the fun design and wire it up." But once the hooks started exposing what Claude Code actually does and does not provide, the work got more serious. The bottleneck wasn't game design. It was host-environment truth.

// src/lib/install.js — follow symlinks so npm link resolves to real hook paths
const realEntry = fs.realpathSync(process.argv[1])
const HOOKS_DIR = path.resolve(path.dirname(realEntry), "../hooks")

// Hooks are tagged _tg: true so install can dedupe cleanly on rerun
function upsertHook(event, entry) {
  const existing = settings.hooks[event] || []
  const filtered = existing.filter(
    (h) => !h._tg && !h.hooks?.some((e) => e.command?.includes("tokengolf"))
  )
  settings.hooks[event] = [...filtered, { _tg: true, ...entry }]
}

// hooks/session-start.js — synchronous, no stdin; SessionStart doesn't pipe data
const cwd = process.env.PWD || process.cwd()

let run = null
try {
  run = JSON.parse(fs.readFileSync(STATE_FILE, "utf8"))
} catch {
  /* no run */
}

Then the score broke

Once the CLI built, the install path was sane, and the hooks could at least fire without hanging, the obvious next test was a real session.

If TokenGolf was going to be a game about cost, the cost had to be believable.

The first real run came back at $9.41.

The actual Claude Code session cost was $0.49.

That's not a rounding problem. That's not "we're a little off." That's "the game is invalid if this is what it thinks happened."

So everything stopped there.

At first it looked like cost parsing was broken. That wasn't the real problem.

The real problem was context.

TokenGolf was trusting run.cwd, and in this case that cwd pointed at the TokenGolf repo itself because the hook had fired while I was working on TokenGolf. So when the tool went looking for the transcript, it found the expensive TokenGolf dev session instead of the target project session. The number was real. It was just real for the wrong thing.

That distinction matters a lot. Wrong by a little is annoying. Wrong about what world you're in means the system doesn't know what it's measuring.

The fix there was to stop trusting the stored working directory in that moment and use process.cwd() when the user actually runs the command. That fixed the wrong-project problem.

It also proved there was another problem.

// src/lib/cost.js — use process.cwd() (where user runs the command),
// not run.cwd (captured when the run started, can point to the wrong project)
export function autoDetectCost(run) {
  const projectDir = getProjectDir(process.cwd())
  // ...
}

export function getProjectDir(cwd) {
  return path.join(os.homedir(), ".claude", "projects", cwd.replace(/\//g, "-"))
}

The session wasn't really in one file

The next number was much closer, but still off by around 15 percent.

So now the issue had moved. The easy bug was gone. What was left was more interesting.

The remaining gap turned out to be mostly Haiku usage from subagent sidechains.

That was the point where the project had to stop treating the main transcript as "the session." Claude Code was writing a meaningful chunk of the run into separate .jsonl files. If TokenGolf only parsed the obvious transcript, it was always going to undercount. Not because the math was bad, but because the model of the session was incomplete.

That led to the most reasonable question in the whole debugging process:

why am I doing my own cost reconstruction at all? Why not just use the final stats Claude Code already knows?

That looked like the clean answer. Add the Stop hook. Read the exact total cost from the event payload. Stop doing homemade accounting.

It would've been nice.

The hook fired. It just didn't include total_cost_usd.

So the elegant solution died there.
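For context, the abandoned shortcut looked roughly like this. It's a sketch, not TokenGolf's actual code: the helper name and payload shapes are illustrative.

```javascript
// Hypothetical sketch of the Stop-hook shortcut: take Claude Code's own
// total from the event payload instead of reconstructing it from transcripts.
function extractTotalCost(payload) {
  return typeof payload?.total_cost_usd === "number" ? payload.total_cost_usd : null
}

// What we hoped the payload would carry:
extractTotalCost({ total_cost_usd: 0.49 }) // → 0.49
// What actually showed up had no total_cost_usd, so this always fell through:
extractTotalCost({ session_id: "abc123" }) // → null
```

That null on every real event is what pushed the work back into the transcript parser.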

Once that happened, transcript parsing had to get better instead of disappearing. The parser started scanning all relevant .jsonl files modified since the run began, which meant pulling in subagent sidechains and the missing Haiku spend along with them.

That made the score believable enough to matter.

// src/lib/cost.js — scan ALL .jsonl files modified since run started,
// not just the main session file — subagent sidechains live in separate files
function findTranscriptsSince(projectDir, sinceMs) {
  try {
    return fs
      .readdirSync(projectDir)
      .filter((f) => f.endsWith(".jsonl"))
      .map((f) => ({
        p: path.join(projectDir, f),
        mtime: fs.statSync(path.join(projectDir, f)).mtimeMs,
      }))
      .filter(({ mtime }) => mtime >= sinceMs)
      .map(({ p }) => p)
  } catch {
    return []
  }
}
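To see the mtime filter behave, here's a self-contained sanity check. The function is reproduced from above so the snippet runs standalone; the directory and filenames are throwaway.

```javascript
import fs from "node:fs"
import os from "node:os"
import path from "node:path"

// Same scan as src/lib/cost.js: every .jsonl in the project dir
// modified since the run started, sidechains included.
function findTranscriptsSince(projectDir, sinceMs) {
  try {
    return fs
      .readdirSync(projectDir)
      .filter((f) => f.endsWith(".jsonl"))
      .map((f) => ({
        p: path.join(projectDir, f),
        mtime: fs.statSync(path.join(projectDir, f)).mtimeMs,
      }))
      .filter(({ mtime }) => mtime >= sinceMs)
      .map(({ p }) => p)
  } catch {
    return []
  }
}

// Throwaway project dir with a main transcript, a sidechain, and noise.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "tg-"))
fs.writeFileSync(path.join(dir, "main.jsonl"), "{}\n")
fs.writeFileSync(path.join(dir, "sidechain.jsonl"), "{}\n")
fs.writeFileSync(path.join(dir, "notes.txt"), "ignored")
// A stale transcript, backdated well before the run-start cutoff:
const stale = path.join(dir, "stale.jsonl")
fs.writeFileSync(stale, "{}\n")
fs.utimesSync(stale, new Date(0), new Date(0))

const found = findTranscriptsSince(dir, Date.now() - 60_000)
// → main.jsonl and sidechain.jsonl; stale.jsonl and notes.txt are skipped
```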

Debugging turned into design

It also surfaced something more interesting than just a corrected total.

Claude Code sessions were often genuinely multi-model.

Sonnet might be doing the main work. Haiku might be handling side work, subagents, or cheaper background tasks around the edges. That wasn't just billing detail anymore. That was play style.

So the debugging work turned into design material.

Once Haiku's cost share was visible, it stopped looking like hidden accounting noise and started looking like something the game should acknowledge. That's where getHaikuPct() came from. That's where Frugal and Rogue Run came from. The system could now name a real behavior: letting cheaper models do more of the work.

That was the biggest change in the project on Saturday.

It started the day as a concept entering a scaffold. It ended the day as something much more specific: a game layer that only worked because it had learned how to interpret Claude Code's actual traces.

The build issues mattered. The install fixes mattered. The hook lifecycle fixes mattered.

But the real foundation of TokenGolf ended up being simpler than that:

the score had to be believable.

// src/lib/score.js — haiku share of total spend
export function getHaikuPct(modelBreakdown, totalSpent) {
  if (!modelBreakdown || !totalSpent) return null
  const haikuCost = Object.entries(modelBreakdown)
    .filter(([m]) => m.toLowerCase().includes("haiku"))
    .reduce((sum, [, c]) => sum + c, 0)
  if (haikuCost === 0) return null
  return Math.round((haikuCost / totalSpent) * 100)
}

// Achievement conditions
if (haikuPct >= 50)
  achievements.push({
    key: "frugal",
    label: `Frugal — Haiku handled ${haikuPct}% of session cost`,
    emoji: "🏹",
  })
if (haikuPct >= 75)
  achievements.push({
    key: "rogue_run",
    label: `Rogue Run — Haiku handled ${haikuPct}% of session cost`,
    emoji: "🎲",
  })

Saturday's real outcome

By the end of Saturday, the repo finally ran and the number was finally close enough to trust.

That still didn't make the product effortless.

It was real enough to matter now. But it still required too much deliberate effort to use, which made Sunday's next job pretty obvious: make it live inside the session instead of next to it.


Building TokenGolf is an ongoing series about turning Claude Code sessions into a roguelike. You can see the product and follow along at https://josheche.github.io/tokengolf/
