cursor chat token usage

Have you noticed that cursor chat requests can use a lot more tokens than you'd expect? 

Matt Pocock said it well: "tokens are the currency of LLMs". You're charged by the token. Each one costs very little, but it adds up, so it pays to watch token usage.
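To make "it adds up" concrete, here's a quick back-of-the-envelope calculation. The price below is a hypothetical placeholder; real per-token prices vary by model and provider:

```typescript
// Hypothetical pricing: $3 per million input tokens (a placeholder;
// check your provider's current price list).
const PRICE_PER_MILLION_INPUT_TOKENS_USD = 3;

function inputCostUsd(tokens: number): number {
  return (tokens / 1_000_000) * PRICE_PER_MILLION_INPUT_TOKENS_USD;
}

// One 70k-token cursor chat request vs. a 30-token standalone question:
console.log(inputCostUsd(70_000)); // ~$0.21 per request
console.log(inputCostUsd(30));     // ~$0.00009 per request

// 50 large-context requests a day, 20 working days a month:
console.log(inputCostUsd(70_000) * 50 * 20); // ~$210/month
```

At tens of thousands of tokens per request, the context, not your question, dominates the bill.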

I ran some tests in cursor chat to measure token usage and was surprised by how many tokens my chat requests used. The tests were run in a large repo with a number of cursor rules files defined.

Tests:

1. ask a general tech question in chat, not related to any specific code in the repo; context used: 19.7k tokens

  • prompt: "how should I choose between useSWR and react-router v6 for data fetching?"
  • to contrast, the same question in Claude 4.0 outside of cursor used 28 input tokens and 768 output tokens in response (see the measurement sketch after this list)
  • to contrast further, the same question in Gemini 2.5 Pro outside of cursor used 19 input tokens and 1060 output tokens in response
  • roughly one twentieth the token usage of cursor chat!!! wow

2. ask to write unit tests for a component in the repo (auto-includes cursor rules files): context used: 73.3k tokens

3. ask to write unit tests for a component in the repo but don't include any cursor rules files: context used: 69.9k tokens

  • learning: the repo's cursor rules add approx 3.4k tokens to context requests.

4. ask to write unit tests for a component in the repo and include rules files but don't add whole-repo search to context: context used: 31.8k tokens

  • prompt: "generate unit tests for the file in the editor. BUT don't search the whole codebase or add the whole codebase to the context. Instead just follow the relevant rules and use only directly referenced files such as typescript definition"
  • learning: simple instructions to the AI to limit context size have a big impact (cutting the number of tokens used by more than half in this case)
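If you want to reproduce the out-of-cursor numbers from test 1, most provider APIs report exact token usage on every response. Here's a minimal sketch using the Anthropic SDK; the model id is illustrative, so substitute whichever model you're testing:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment.
const client = new Anthropic();

async function measure(prompt: string): Promise<void> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514", // illustrative model id
    max_tokens: 2048,
    messages: [{ role: "user", content: prompt }],
  });
  // The usage block reports exactly what this request was billed for.
  console.log("input tokens:", response.usage.input_tokens);
  console.log("output tokens:", response.usage.output_tokens);
}

measure("how should I choose between useSWR and react-router v6 for data fetching?");
```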


These results show that cursor chat context can be massively larger than one would expect (especially for a large repo). Like zoinks!

If token costs were a factor, it would definitely change your behavior and when/how you'd use cursor chat. Cursor chat can also be slow (I find gpt5 in cursor slooow) because it's building up all that context, which is why I sometimes use Gemini for general questions that don't need repo context.

But why is cursor behaving like this? It's because cursor uses Retrieval-Augmented Generation (RAG) to search (retrieve) across the codebase for relevant content, which it then adds (augments) to our prompts as context. It then sends all this to the LLM for an answer (generation). For Test 1, even though the question is general and doesn't need codebase context, cursor does not know that, and it may also assume I'd ask follow-up questions that require context.
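Heavily simplified, that retrieve-augment-generate loop looks something like the sketch below. Every function here is a hypothetical stand-in; cursor's actual pipeline is proprietary and far more sophisticated:

```typescript
// Hypothetical stand-ins for cursor's internals, stubbed so the sketch
// runs. Real versions would hit an embeddings index, the rules files
// on disk, and a model API.
async function searchCodebaseIndex(query: string): Promise<string[]> {
  return [`// code chunks matching "${query}" would go here`];
}
function loadCursorRules(): string {
  return "// contents of matching cursor rules files would go here";
}
async function callLlm(prompt: string): Promise<string> {
  return `answer generated from a ${prompt.length}-character prompt`;
}

async function answerWithRag(userPrompt: string): Promise<string> {
  // 1. Retrieve: search the indexed codebase for chunks related to the prompt.
  const relevantChunks = await searchCodebaseIndex(userPrompt);

  // 2. Augment: bundle rules, retrieved code, and the question together.
  const augmentedPrompt = [
    loadCursorRules(),
    relevantChunks.join("\n"),
    userPrompt,
  ].join("\n\n");

  // 3. Generate: send it all to the LLM. Every part of the augmented
  // prompt counts toward input tokens, not just the question you typed.
  return callLlm(augmentedPrompt);
}
```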

Cursor advises: "working with AI models is all about managing the context you provide it."

So what goes into a cursor chat context?

It includes (sketched as a data shape after this list):

  • your (the user's) prompt, which can be text or attached images or files
  • the system prompt (written by the tool creator)
  • user-specified context, such as @ file and folder mentions
  • conversation history: in an ongoing chat, each new request includes the earlier turns
  • the currently open file(s) and recently viewed files
  • project-wide information cursor grepped and searched for (at cursor's discretion)
  • cursor rules
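Put together, you can picture the assembled request as something like the following. This is a purely illustrative shape; cursor's actual wire format is not public:

```typescript
// A purely illustrative shape for a cursor chat request's context;
// not cursor's actual internals.
interface ChatRequestContext {
  systemPrompt: string;        // written by the tool creator
  cursorRules: string[];       // contents of matching rules files
  userMentions: string[];      // @ file / folder mentions you added
  openFiles: string[];         // currently open and recently viewed files
  retrievedSnippets: string[]; // whatever cursor grepped / searched for
  history: { role: "user" | "assistant"; content: string }[];
  userPrompt: string;          // your actual question
}
```

Every field gets serialized into the prompt, so each one adds to the input token count; that's why large repos inflate context so quickly.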

Now we better understand why cursor context gets large (especially for larger repos).

Studies have shown that AI model performance degrades as the context gets larger. It's worth keeping that in mind, hence the common advice to start new chats for new topics and close unnecessary tabs. We also saw in Test 4 that "negative prompting" can significantly reduce token usage.

Cursor summarizes longer chats automatically to keep them efficient. You can also trigger this manually with the /summarize command in chat.
