Hey, it's me, josh - as a memoji

josh.miami

Sunday was when TokenGolf stopped feeling bolted on

Josh Echeverri
Josh Echeverri
9 min read

By the end of Saturday, TokenGolf had crossed the first serious threshold.

The score was believable.

That mattered a lot, because a game built around spend collapses if the spend isn't real. But once that part was in decent shape, the next problem became obvious too: the product still asked too much effort from the user.

A sidecar you have to remember to use is still mostly a sidecar.

So Sunday became the day of friction removal and deeper integration.

Flow mode had to disappear into the background

The first move was the most important one. Stop making people explicitly start tracking just to be tracked. If the whole point of flow mode is low-friction awareness, it can't depend on one more command at the beginning of the session.

So SessionStart started auto-creating a flow-mode run whenever there wasn't already an active run.

That changed the product more than it sounds like it would.

Before that, TokenGolf still felt like a thing you consciously decided to use. After that, it started behaving more like infrastructure. Open Claude Code, work normally, and the run exists. That's a much better product shape. People use what doesn't require one more burst of intention at the wrong moment.

// hooks/session-start.js — auto-create a flow run if none is active
if (!run || run.status !== "active") {
  if (!fs.existsSync(STATE_DIR)) fs.mkdirSync(STATE_DIR, { recursive: true })
  run = {
    id: `run_${Date.now()}`,
    quest: null,
    model: "claude-sonnet-4-6",
    budget: null,
    effort: detectEffort(),
    fastMode: detectFastMode(),
    spent: 0,
    status: "active",
    mode: "flow",
    promptCount: 0,
    totalToolCalls: 0,
    startedAt: new Date().toISOString(),
    // ...
  }
  fs.writeFileSync(STATE_FILE, JSON.stringify(run, null, 2))
}

Then it had to move inside the session

But auto-tracking only solved the startup problem.

The game still lived adjacent to the actual session. It could track what happened. It still didn't feel like it was participating in the run while the run was unfolding.

That's where the two high-leverage integration points showed up: StatusLine and SessionEnd.

StatusLine mattered because it already had the thing TokenGolf cares about most: live cost. Once that existed as a real-time surface, the score stopped being purely postmortem. It could become part of the texture of the session itself.

Now the status bar could carry live spend, the tier emoji, budget pressure, efficiency state, context-window load, model class, and effort label. Not after the run. During it.

That's a very different feeling.

Before that, cost was something you discovered at the end. Once it lived in the status bar, it became something you felt while making decisions. That moved the product closer to a game layer and further from a report.

It also made context pressure visible in a way that fit the framing cleanly. Once the context bar gets heavy, Claude starts reasoning in a different environment. That's a real resource problem, not just an implementation detail. So the game started naming that too.

// src/lib/install.js — statusLine must be an object, not a string
settings.statusLine = {
  type: "command",
  command: STATUSLINE_PATH,
  padding: 1,
}

The run needed a real ending

Then there was SessionEnd.

StatusLine made TokenGolf present while the run was live. SessionEnd made it feel native when the run actually resolved.

Before that, tokengolf win still existed as a manual cleanup step. Fine for development. Not the shape you want if the product is trying to disappear into the workflow. SessionEnd let the run resolve automatically on /exit: scan the transcripts, determine the state, save the run, print the scorecard.

That was a bigger step than it sounds like on paper. The second the product can auto-payoff at the natural end of the session, it stops feeling bolted on.

TokenGolf scorecard: SESSION COMPLETE with achievements, extended thinking invocations, model usage breakdown, and tool call summary

There was still some hook plumbing weirdness in there, of course. There always is. But by that point the important part wasn't the shell scripting. It was the shift in feel. TokenGolf was now living in the same lifecycle as the work itself.

// hooks/session-end.js — determine run outcome from exit reason + cost
const cleanExits = [
  "clear",
  "logout",
  "prompt_input_exit",
  "bypass_permissions_disabled",
]
const fainted =
  !cleanExits.includes(reason) && reason !== "other"
    ? false
    : reason === "other"

let status
if (run.budget && result.spent > run.budget) status = "died"
else if (fainted)
  status = "resting" // hit limit, run continues next session
else status = "won"

Fainted proved the framing was useful

That also opened up a more interesting design problem.

Not every session ends in a clean win or a clean bust.

Sometimes you hit Claude Code's usage limits. The session stops. The work doesn't feel finished, but it also doesn't feel dead. Treating that as failure would've been wrong. Treating it as success would've been nonsense. So the game needed a better state.

That's where Fainted came from.

At first glance, that sounds like flavor text. It wasn't. It was a more accurate model of what actually happened. The run wasn't dead. It was interrupted. It was resting. The state persisted. Session count increased. You could come back and finish it later.

That one mechanic did more than just add personality. It proved the dungeon framing was actually useful.

Fainted, Made Camp, Came Back, No Rest for the Wicked, Ghost Run. Those labels work because they compress real session patterns into something people understand immediately. They're not just jokes pasted on top of metrics. They make the behavior easier to reason about.

// hooks/session-end.js — read exit reason from stdin
let stdin = ""
try {
  stdin = fs.readFileSync("/dev/stdin", "utf8")
} catch {}

let event = {}
try {
  event = JSON.parse(stdin)
} catch {}
const reason = event.reason || "other"

// reason === 'other' means unexpected exit (usage limit hit = Fainted)
// clean exits: 'clear', 'logout', 'prompt_input_exit', 'bypass_permissions_disabled'
const cleanExits = [
  "clear",
  "logout",
  "prompt_input_exit",
  "bypass_permissions_disabled",
]
const fainted =
  !cleanExits.includes(reason) && reason !== "other"
    ? false
    : reason === "other"

The budget system needed to mean the same thing across models

That same pattern showed up again when the budget system got revisited.

The original tiers were simple and readable. They were also too model-agnostic to hold up. A number that feels tight but fair on Haiku can be instant death on Opus. The problem wasn't that the early tiers were badly intentioned. It's that equal numbers across different model classes don't mean equal commitments.

So the budget presets got recalibrated around actual model economics. Not to make them mathematically pretty, but to make them mean roughly the same thing across classes. The point wasn't identical absolute thresholds. The point was comparable pressure.

Sunday also added a real effort model.

Haiku stayed simple. Sonnet got Low, Medium, and High. Opus added Max. Fast mode got treated as a separate signal instead of collapsing it into the same choice. That mattered because the game was getting better at naming not just which model you picked, but how aggressively you were using it.

// src/lib/score.js — budget presets calibrated per model class
export const MODEL_BUDGET_TIERS = {
  haiku: { diamond: 0.15, gold: 0.4, silver: 1.0, bronze: 2.5 },
  sonnet: { diamond: 0.5, gold: 1.5, silver: 4.0, bronze: 10.0 },
  opusplan: { diamond: 1.5, gold: 4.5, silver: 12.0, bronze: 30.0 },
  opus: { diamond: 2.5, gold: 7.5, silver: 20.0, bronze: 50.0 },
}

export function getModelBudgets(model) {
  const m = (model || "").toLowerCase()
  if (m.includes("haiku")) return MODEL_BUDGET_TIERS.haiku
  if (m.includes("opusplan")) return MODEL_BUDGET_TIERS.opusplan
  if (m.includes("opus")) return MODEL_BUDGET_TIERS.opus
  return MODEL_BUDGET_TIERS.sonnet
}

Ultrathink created the first real mid-run decision

Then Sunday found the mechanic that was most interesting from a gameplay perspective: ultrathink.

A lot of the early systems were observational. They described what had already happened. Useful, but still reactive. Ultrathink was different because it introduced a real mid-run decision.

Do I spend here or not?

That mattered because ultrathink isn't some separate command panel. It's just natural language. Write "ultrathink" in the prompt and suddenly you're triggering a more expensive reasoning path with real budget consequences. That makes it a game mechanic whether you call it one or not.

Once that became visible, TokenGolf had to acknowledge it.

The transcript already held the traces. Thinking blocks were there. So parsing those blocks into invocation counts and approximate token usage made the choice legible. Now the scorecard could show when the run leaned on extended thinking. Now the game could tell the difference between using it well and leaning on it badly.

That's where the achievement set around it came from.

Spell Cast for using it and winning. Calculated Risk for using it and still staying very efficient. Deep Thinker for leaning into that mode multiple times. Silent Run for winning cleanly without needing it. Hubris for the run where you reached for expensive thought and still died.

Hubris is probably my favorite one in the bunch because it doesn't soften failure. It names it.

// src/lib/cost.js — detect ultrathink from thinking blocks in transcripts
export function parseThinkingFromTranscripts(paths) {
  let invocations = 0
  let tokens = 0
  for (const p of paths) {
    const lines = fs.readFileSync(p, "utf8").trim().split("\n")
    for (const line of lines) {
      const entry = JSON.parse(line)
      if (entry.type === "assistant" && Array.isArray(entry.message?.content)) {
        const thinkBlocks = entry.message.content.filter(
          (b) => b.type === "thinking"
        )
        if (thinkBlocks.length > 0) {
          invocations++
          for (const block of thinkBlocks) {
            tokens += Math.round((block.thinking?.length || 0) / 4) // ~4 chars/token
          }
        }
      }
    }
  }
  return invocations > 0
    ? { thinkingInvocations: invocations, thinkingTokens: tokens }
    : null
}

Sunday changed how the product felt

By the end of Sunday, TokenGolf had changed again.

Saturday was about truth. Sunday was about feel.

The product wasn't just tracking a run anymore. It was participating in how the run felt while it was happening. It had real-time pressure. It had a better model of interruption and recovery. It had a cleaner budget language across models. It had its first mechanic that actually changed behavior mid-session instead of just describing it later.

That was the point where it finally stopped feeling like a clever wrapper.

It started feeling like a real game layer on top of Claude Code.

That still didn't make it release-ready.

By Sunday night, the product felt native and strategically interesting. But there was still a lot of behavior in the data that the game wasn't naming yet, and the rule system still wasn't hardened enough to ship without flinching.

That made Monday's job pretty clear: deepen the behavior model, harden the system, and release the version that actually feels real.


Building TokenGolf is an ongoing series about turning Claude Code sessions into a roguelike. You can see the product and follow along at https://josheche.github.io/tokengolf/

Related Posts

What I shipped today in TokenGolf v0.4.0

Expansion and hardening at the same time. More achievements, more hooks, more structure, and the release infrastructure to keep the rules from drifting.

The repo looked done until I tried to run it

Saturday was where TokenGolf stopped being a neat concept and started becoming a real tool. The first milestone had nothing to do with the game.

TokenGolf started with a simple question: what if AI cost had stakes?

Claude Code already tracked cost. The weird part was that the number didn't really matter. TokenGolf started there.