The foundation of how Large Language Models (LLMs) process information.
The Core Idea: AI models do not "read" words the way humans do. They process tokens. A token isn't necessarily a word; it's a chunk of characters, which may be a whole word, part of a word, a space plus a word, or a single punctuation mark. Understanding tokens is the first step to mastering context window limits, cost estimation, and the token-by-token probabilities behind a model's output.
Type below to see how GPT-4 sees your text. (Powered by js-tiktoken running locally in your browser).
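The demo above uses Byte Pair Encoding (BPE), the tokenization algorithm behind GPT-4. It doesn't use a dictionary of words. Instead, it starts from small units (bytes) and repeatedly merges the pairs that occur together most often in its training text. Frequent character sequences end up as single tokens, while rare ones are split into several pieces.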
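The merge idea can be sketched in a few lines of Python. This is a toy illustration of BPE training, not GPT-4's actual tokenizer: it works on characters instead of bytes, and the function names (most_frequent_pair, merge_pair, learn_merges) and the tiny word-count corpus are made up for this example.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most common adjacent symbol pair, or None if there is none."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_merges(corpus_counts, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} mapping."""
    # Start with every word split into single characters.
    words = {tuple(word): count for word, count in corpus_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


if __name__ == "__main__":
    merges, words = learn_merges({"low": 5, "lower": 2, "lowest": 3}, 2)
    print(merges)
    print(words)
```

After two merges on this toy corpus, the frequent sequence "low" has become a single symbol, while the rarer endings "er" and "est" are still split into individual characters. That is the same pattern the live tokenizer shows for common versus unusual words.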
" the" (with a space) are assigned a single, efficient ID (e.g., 262)."Sounny" might become "Soun" + "ny" (2 tokens)."AI" and " AI" (with a leading space) are different tokens. This is why trailing spaces in your prompts can technically waste money!The Context Window is finite (e.g., 128k tokens). If your bibliography is 130k tokens, the model physically cannot "see" the beginning. Understanding token density helps you fit more data into the "Brain."
APIs charge per 1M tokens.
Input: ~$2.50 / 1M tokens.
Output: ~$10.00 / 1M tokens.
Efficient prompting saves grant money.
Estimate the cost of a research project based on your typical prompt length.
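The same estimate can be done by hand or in a few lines of Python. This is a rough sketch: the rates are the approximate figures quoted above and will differ by provider and model, and the function name and the per-call token counts in the usage example are made up for illustration.

```python
# Approximate rates quoted above, in US dollars per 1M tokens (assumed, not current prices).
INPUT_USD_PER_MILLION = 2.50
OUTPUT_USD_PER_MILLION = 10.00


def estimate_cost(input_tokens, output_tokens, calls=1,
                  input_rate=INPUT_USD_PER_MILLION,
                  output_rate=OUTPUT_USD_PER_MILLION):
    """Estimated cost in USD for `calls` requests of the given size."""
    total_input = input_tokens * calls
    total_output = output_tokens * calls
    return (total_input * input_rate + total_output * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical project: 1,000 calls, each with a 2,000-token prompt
    # and a 500-token reply.
    print(f"${estimate_cost(2_000, 500, calls=1_000):.2f}")
```

In this hypothetical project the 500-token replies cost as much as the 2,000-token prompts, because output tokens are billed at four times the input rate. Trimming long answers can matter as much as trimming prompts.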
Use the Live Tokenizer above to solve these mysteries:
Can you predict the "Intelligence Budget" for this phrase?
Action: Type the number 1000. Then type 1,000. Then 1 000.
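Observation: LLMs often struggle with arithmetic. One reason: look at how the tokenizer breaks the numbers apart. The model typically doesn't see a single "one thousand"; it sees separate chunks of digits and punctuation, and the three spellings of the same number produce three different token sequences.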
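One way to picture the splitting is sketched below. It relies on an assumption: some GPT-style tokenizers cut digit runs into chunks of up to three digits, left to right, before applying their merges. The regex is a simplification of that step and the function name is hypothetical, so treat the live tokenizer as the ground truth for how many tokens each chunk becomes.

```python
import re

# Assumed simplification: digit runs are cut into chunks of at most three
# digits, left to right; any non-digit run is kept as one piece.
_CHUNK = re.compile(r"\d{1,3}|\D+")


def split_number_text(text):
    """Split text into digit chunks (max 3 digits) and non-digit runs."""
    return _CHUNK.findall(text)


if __name__ == "__main__":
    for sample in ["1000", "1,000", "1 000"]:
        print(repr(sample), "->", split_number_text(sample))
```

Under this assumption, none of the three spellings arrives as a single unit, and the pieces do not line up with place value. The model has to reassemble the quantity from fragments.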
Action: Paste a block of Python code with heavy indentation.
Observation: Look at the whitespace. Are spaces essentially free? No. A run of indentation (for example, four spaces) is often its own token, and a tab character is a different token from four spaces. Deeply nested code eats your Context Window faster than flat code.
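As a rough proxy, you can measure how much of a code snippet is indentation before pasting it. The sketch below counts characters, not tokens, so it only hints at the token overhead. The function name and the sample snippets are made up for illustration.

```python
def indentation_share(code):
    """Fraction of the characters in `code` that are leading spaces or tabs."""
    total = len(code)
    if total == 0:
        return 0.0
    leading = sum(len(line) - len(line.lstrip(" \t")) for line in code.splitlines())
    return leading / total


if __name__ == "__main__":
    nested = "if a:\n    if b:\n        c()\n"
    flat = "a()\nb()\nc()\n"
    print(f"nested: {indentation_share(nested):.0%}")
    print(f"flat:   {indentation_share(flat):.0%}")
```

A high share suggests that a meaningful part of the snippet's token count is whitespace. The live tokenizer will show the exact number.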
Action: Type Apple vs apple.
Observation: They get completely different token IDs. Each ID has its own entry in the embedding space, so the model has to "learn" the concept of the fruit once for each capitalization and pick up from its training data that the two are related.