OpenAI Assistants API and Context Tokens

I was impressed with how fast GPT-4o was, and its API pricing was quite a bit cheaper. So I decided to play with the Assistants API and see if I could build my own agent locally to browse the internet. It was surprisingly easy to do, and I used it for some small tasks here and there. I was never concerned about pricing; whenever I checked my usage, it was low enough not to ring any alarm bells.

Building a React Frontend for the Assistant

I then decided it'd be cool to build a React frontend for this assistant, because interacting via the terminal was difficult at times. So I did my normal thing, which was to ask the ChatGPT interface to give me some frontend code. But I was feeling particularly lazy... so instead I added some functions to my agent that allowed it to run shell commands, so it could write the code for me. Brilliant idea! (Don't worry, I had some code in place that showed me what function it wanted to run, and I could prevent it from running anything if it went off the rails.)
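The approval gate I described can be sketched roughly like this. This is a minimal illustration, not the project's actual code; the function name and the `approve` callback are my own inventions, and in practice the command string would come from the assistant's function-call arguments:

```python
import subprocess

def run_with_approval(command: str, approve) -> str:
    """Execute a shell command the agent requested, but only if approved.

    `approve` is any callable taking the command string and returning
    True/False -- e.g. a terminal prompt asking the user to confirm.
    """
    if not approve(command):
        # Tell the assistant the command was blocked instead of running it
        return "SKIPPED: user rejected the command"
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    # Return both streams so the assistant can see what happened
    return result.stdout + result.stderr

# Example: auto-approve harmless echo commands, reject everything else
out = run_with_approval("echo hello", approve=lambda c: c.startswith("echo"))
```

Routing every requested command through a single choke point like this is what made it safe(ish) to let the agent loose on my filesystem.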

Achievements and Challenges

It actually worked pretty well! I was impressed: it created a React frontend that worked, and the task even required rewriting some of the Python assistant code to fit the new client/server model. Then I decided I wanted to start expanding the functionality of the app to basically rebuild the ChatGPT interface, so I could switch Threads, load previous conversations, etc. I downloaded the OpenAI Python implementation and told my assistant where it was. It started browsing files and trying to make updates (at this point it started screwing up massively and basically broke the code, but that's a topic for another blog post).

Uncovering the Token Usage Mystery

I noticed it was reading a lot of files and got curious about what my usage was up to. I was surprised to see it was at ~$15 for only 5-10 minutes' worth of work! What happened?! Were these files it was reading huge or something? Then I looked in my Activity tab, and this is what I saw:

[Activity chart showing millions of context tokens]

Understanding Context Tokens

HOW IN THE WORLD DID I REACH 3 MILLION TOKENS? For reference, 1 token ≈ 3/4 of a word, so 1 million tokens is ~750k words and 3 million tokens is ~2.25 million words. If we assume ~500 words per page and ~300 pages per book, that's ~15 books' worth of text, all over the span of ~10-15 minutes like I said. This is insane. The generated-token count looks sane, so what are context tokens?!
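The back-of-the-envelope math above checks out:

```python
TOKENS = 3_000_000
WORDS_PER_TOKEN = 0.75      # 1 token is roughly 3/4 of a word
WORDS_PER_PAGE = 500
PAGES_PER_BOOK = 300

words = TOKENS * WORDS_PER_TOKEN                    # ~2.25 million words
books = words / (WORDS_PER_PAGE * PAGES_PER_BOOK)   # ~15 books
print(f"{words:,.0f} words, about {books:.0f} books")
```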

Turns out it's pretty ambiguous... and I found a thread of other people discussing this. It's good to know I'm not alone! The best guess so far is that the entire conversation history gets fed back into the model every time you send a message as, well, context! If that's the case, this seems completely unreasonable to use at scale as-is. You'd need to run another AI model that compresses the conversation and summarizes the history, or perhaps store it in a way the Assistant can query for just the relevant context instead of combing through the entire message history.
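If that guess is right and the full history is resent on every turn, context tokens grow quadratically with the number of messages. A rough simulation (assuming, for illustration, a fixed average token count per message) shows how quickly that adds up to millions:

```python
def total_context_tokens(num_messages: int, tokens_per_message: int) -> int:
    """Total input tokens billed if every request resends the whole history."""
    # Turn k sends the k-1 prior messages plus the new one: k messages total,
    # so the grand total is (1 + 2 + ... + n) * tokens_per_message.
    return sum(k * tokens_per_message for k in range(1, num_messages + 1))

# 200 turns of ~300-token messages (file contents make messages large):
# 200 * 201 / 2 * 300 = 6,030,000 tokens -- millions, in minutes
print(total_context_tokens(200, 300))
```

That quadratic blow-up is consistent with what I saw: a busy file-reading session can hit millions of context tokens without any single message being huge.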

Pausing and Reassessing My Coding Assistant

I put my coding assistant on pause for now, because I think I'll need two assistants working together to prevent it from making silly mistakes (e.g. one that writes code and another that acts as QA, tests it, and manages the local versions using something like git). And this led to my current project: getting two separate AI assistant instances that can communicate with me and each other in a single chat.

Switching to Gemini for Cost-Effective Development

Given how quickly my expenses would grow, I've started building this on Gemini instead. I was hoping they had billing figured out so I could actually use it beyond their lower free-tier quotas, but they seem to keep pushing the billing start date back.

For now, though, the free-tier Flash model (a direct competitor to GPT-4o) seems good enough for building and testing. And the best part is that it's free!

Conclusion: Managing Context Tokens and Future Developments

Just wanted to share this little exploration and discovery about "context tokens" in case someone else runs into this issue. You'll need to find a way to manage context better to avoid massive API usage charges. Maybe OpenAI is working on adding Memory to their Assistants API; it's still in beta, so I'm sure more changes are coming. But for now, Gemini is my go-to for this kind of thing, because the free tier for their Flash model is more than good enough for my building and testing.
