How to manage AI token costs

By Michael Domanic, Section Head of AI

We expected to spend about $20,000 on inference this month at Section. We’re going to land somewhere around $30,000. That’s roughly 50% over plan, and my reaction was: good.

I realize that’s not the normal reaction to blowing a budget by 50%. But I’ve started to think that low token costs are a bigger red flag than high ones. If your inference spending is flat, it probably means your people are having occasional chats with AI rather than building automations, running agents, and actually changing how they work.

That said, “costs going up is good” is not a strategy. So here’s how I’m actually thinking about this at Section - and where I think most companies are getting it wrong.

The conversation that changed my thinking

A few weeks ago, I pulled up our Claude usage dashboard and looked at which individuals were generating the highest inference costs. One name surprised me - an employee who was consistently one of our top users, right up there with engineers.

I reached out and had a pretty frank conversation. Not accusatory - just curious. “You’re racking up high token costs. I’m not saying that’s wrong. Maybe it should be higher. But tell me what you’re doing.”

What I learned was that she’s managing a scope of work that’s honestly too large for one person to handle without AI. She’s using Claude to stay on top of work that would otherwise fall through the cracks. If her token costs went down, our outcomes would go down with them.

The traditional answer to “this person has too much on their plate” is to hire another human and split the work. Instead, she’s using inference to handle the load - and the cost of that inference, even extrapolated across the year, is a fraction of what another hire would cost.

That’s the calculation that matters - not “is this person spending too much on AI?” but “what would it cost us if they stopped?”

The rough math

Our CTO and I ran the numbers last week. Across our 50-person business, our total annual inference cost is projected to be roughly equivalent to one more senior FTE.

Now forget AI exists for a second and think about that differently. Imagine you could hire one additional person who would produce orders of magnitude greater outcomes across your entire organization - not in one department, but everywhere. You’d make that hire instantly. That’s what inference spending is right now.

The rough ratio we’re seeing: for every 50 to 100 employees, your AI inference costs will be equivalent to one or two FTEs. But the output of that “FTE” is distributed across your whole organization - every person doing more, handling more, building more than they could alone.
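To make that ratio concrete, here is a minimal Python sketch of the arithmetic, using the figures from this post ($20,000 planned, roughly $30,000 actual, 50 people). The function name and the fully loaded FTE cost are assumptions for illustration only, so plug in your own. The annual figure is a naive 12x extrapolation of a single month, not our actual projection.

```python
def inference_summary(planned_monthly, actual_monthly, headcount, fte_annual_cost):
    """Summarize monthly inference spend against plan, headcount, and one FTE."""
    # How far over (or under) plan the month landed, as a percentage.
    overrun_pct = (actual_monthly - planned_monthly) / planned_monthly * 100
    # Average spend per employee for the month.
    per_employee_monthly = actual_monthly / headcount
    # Naive extrapolation: assumes every month looks like this one.
    annualized = actual_monthly * 12
    # Annual inference spend expressed in "FTEs" at the assumed cost.
    fte_equivalents = annualized / fte_annual_cost
    return {
        "overrun_pct": overrun_pct,
        "per_employee_monthly": per_employee_monthly,
        "annualized": annualized,
        "fte_equivalents": fte_equivalents,
    }


# fte_annual_cost is a hypothetical fully loaded figure, not Section's real number.
summary = inference_summary(
    planned_monthly=20_000,
    actual_monthly=30_000,
    headcount=50,
    fte_annual_cost=300_000,
)
print(summary)
```

With those inputs, the month lands 50% over plan at $600 per employee. The FTE-equivalent number is only as good as the salary assumption you feed it.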

Why you can’t do ROI on a single agent

One thing I’ve realized as we build more automations is that the unit of measurement isn’t the individual agent - it’s the cluster.

We’re building a set of agents right now for our sales team around discovery call quality. Our CRO’s hypothesis is that better discovery early in the cycle leads to higher close rates, larger contract values, and faster disqualification of prospects that were never going to close - which makes sense.

But “better discovery” isn’t one agent - it’s a cluster. There’s an agent that reviews each discovery call against a scoring rubric, another that generates a report for the AE with what they captured and what they missed, and another that checks whether the gaps were filled in subsequent conversations. Some of these agents will get reused for post-sale work - reviewing whether we’re continuing to capture the right information from customers over time.

I can’t go in and say “this one agent is accountable for X dollars in token usage and Y dollars in revenue impact.” The agents share infrastructure, they get repurposed across workflows, and the outcomes they’re driving - ACV, close rate, time to close - are lagging indicators that won’t be clear for a quarter or two.

What I’d actually advise

It’s too early to have really strong opinions about inference cost management. We’re all in an experimental phase, and the worst thing you can do right now is rein in spending too much before you understand what it’s producing.

That said, here’s the rough framework I’m operating with:

Set reasonable caps so you don’t get a surprise. This is table stakes. Put limits on individual users and set an organizational ceiling. You’re not trying to restrict usage - you’re trying to prevent runaway costs from a misconfigured automation that runs every 15 minutes and burns tokens on nothing useful (I’ve done this myself).

When someone’s costs are high, have a conversation before making a judgment. The instinct is to see a high number and assume waste. Sometimes it is. Sometimes it’s a person handling too much work in a more efficient way than asking for headcount. You won’t know until you ask.

Don’t publish token usage leaderboards. This is a mistake I’ve seen companies make to drive adoption. What they actually drive is tokenmaxing and employees burning inference on personal projects at company expense. Leaderboards measure volume, not value, and people optimize for whatever you measure. You end up paying for activity, not outcomes.

Track costs at the project level, not the agent level. Individual agent costs are meaningless in isolation. What matters is the total inference cost of a project or workflow relative to the business outcome it’s producing. And be patient with the measurement - the meaningful indicators are going to lag by at least a quarter.

Resist the urge to optimize too early. If your team continues to hit their limits, go up incrementally. Have conversations about what value is being produced. But don’t let the CFO shut down experimentation because the line item is growing. The line item should be growing right now.
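For the two mechanical pieces of that framework, caps and project-level tracking, here is a rough Python sketch. Everything in it is hypothetical: the record format, the names, the dollar amounts, and both limits. Real usage data will come from whatever export your provider's dashboard offers. Treat it as an illustration of the idea, not a description of our tooling.

```python
from collections import defaultdict

# Hypothetical limits. Pick numbers that fit your own budget.
USER_MONTHLY_CAP = 2_000.0
ORG_MONTHLY_CEILING = 40_000.0


def users_over_cap(records, cap=USER_MONTHLY_CAP):
    """Return users whose total cost exceeds the cap (a prompt for a conversation)."""
    totals = defaultdict(float)
    for r in records:
        totals[r["user"]] += r["cost"]
    return sorted(user for user, total in totals.items() if total > cap)


def org_over_ceiling(records, ceiling=ORG_MONTHLY_CEILING):
    """True if total spend across everyone exceeds the organizational ceiling."""
    return sum(r["cost"] for r in records) > ceiling


def cost_by_project(records):
    """Roll cost up to the project, ignoring which agent incurred it."""
    totals = defaultdict(float)
    for r in records:
        totals[r["project"]] += r["cost"]
    return dict(totals)


# Made-up usage records for one month. Note the same agent appears in two projects.
records = [
    {"user": "ana", "agent": "call_scorer", "project": "discovery_quality", "cost": 1200.0},
    {"user": "ana", "agent": "ae_report", "project": "discovery_quality", "cost": 900.0},
    {"user": "ben", "agent": "gap_checker", "project": "discovery_quality", "cost": 400.0},
    {"user": "ben", "agent": "call_scorer", "project": "post_sale_review", "cost": 300.0},
]

flagged = users_over_cap(records)
over_ceiling = org_over_ceiling(records)
project_totals = cost_by_project(records)
print(flagged, over_ceiling, project_totals)
```

The per-user check returns names to talk to, not accounts to shut off, and the roll-up deliberately drops the agent field because a reused agent's cost only means something inside the project it served.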

The honest truth is there’s no roadmap for this yet. We’re figuring it out in real time, and I suspect every company is. The one thing I’m fairly confident about: a year from now, the companies that held inference spending flat will wish they hadn’t. And the companies that let it grow - while paying attention to what it was producing - will be in a very different position.

See you next week,

Michael

Your fellow Head of AI

P.S. Section just introduced a new slate of agentic enterprise workshops. If you want to develop the culture of AI experimentation I’m talking about, we’d love to host these for your team.