Token Usage Policy
What is a Token on the ModelXpert AI Platform?
A token is a unit of text that AI models use to process information for both input (when you send a prompt) and output (when the model generates a response). A token can be a word, part of a word, or a few characters.
Each AI model (ChatGPT, Gemini, Claude, Perplexity, DeepSeek, Grok) has its own tokenization method, so the same text can produce slightly different token counts across models.
Token Estimation
Exact counts depend on the specific model, but as a rule of thumb:
- Each token is approximately ¾ of a word
- 750 words equal approximately 1,000 tokens
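This rule of thumb can be expressed as a small sketch in Python (the function name is illustrative, not part of the platform; real tokenizers will differ slightly):

```python
def estimate_tokens(word_count: int) -> int:
    """Rough token estimate using the rule of thumb:
    1 token is about 3/4 of a word, i.e. ~1,000 tokens per 750 words."""
    return round(word_count * 1000 / 750)

print(estimate_tokens(750))  # 1000
print(estimate_tokens(75))   # 100
print(estimate_tokens(300))  # 400
```

Use this only for ballpark planning; each model's tokenizer produces the authoritative count.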
Token Usage for Different Features
Thinking Models:
When a thinking model is used, tokens are consumed during the AI's reasoning process. The number of tokens depends on:
- The complexity of the task
- The reasoning level selected (low, medium, or high)
Image Generation:
Tokens are also counted when generating images.
How Are Tokens Counted?
Input Tokens
When you send a prompt, tokens are counted based on the length of words or characters:
- Approximately 1 token = ¾ of a word
- For example, 75 words = approximately 100 tokens
Output Tokens
For Non-Reasoning Models:
- Token counting is similar to input
- Approximately 1 token = ¾ of a word
- For example, a 300-word response = approximately 400 tokens
For Thinking/Reasoning Models:
- Output tokens include thinking tokens + response tokens
- Example: 1,000 thinking tokens + 400 response tokens = 1,400 total output tokens
Practical Example
Scenario 1: Using GPT-5 Nano (Non-Reasoning Model)
Your Prompt: "How does ChatGPT generate responses?"
Input tokens counted: 7 tokens
ChatGPT Response: "ChatGPT generates responses using a type of artificial intelligence... refined by humans to give meaningful and safe answers — not just random text." (338 words)
Output tokens counted: 441 tokens
Scenario 2: Using a GPT-5 Reasoning Model
Same Prompt: "How does ChatGPT generate responses?"
- Input tokens: 7 tokens (same as before)
- Response tokens: 441 tokens (same as before)
- Thinking tokens: approximately 1,000 tokens (varies with task complexity)
- Total output tokens: 1,441 tokens
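The arithmetic in both scenarios can be sketched as follows (the token counts are the illustrative figures from the examples above, not real tokenizer output):

```python
def total_output_tokens(response_tokens: int, thinking_tokens: int = 0) -> int:
    """Output tokens = response tokens, plus thinking tokens for reasoning models.
    Non-reasoning models simply have zero thinking tokens."""
    return response_tokens + thinking_tokens

# Scenario 1: GPT-5 Nano (non-reasoning) -> response tokens only
print(total_output_tokens(441))        # 441

# Scenario 2: GPT-5 reasoning model, same prompt -> thinking + response
print(total_output_tokens(441, 1000))  # 1441
```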
Chat History and Context
When you continue chatting in the same conversation, the AI remembers previous context. Here's how tokens are counted:
Second Prompt in Same Chat: "What are the limitations of ChatGPT's knowledge and reasoning?"
The system counts:
- Previous chat history as input: 7 (previous input) + 441 (previous output) = 448 tokens. Note: previous thinking tokens are NOT included in history.
- Current prompt: 12 tokens
- Total input tokens: 448 + 12 = 460 tokens
- Output tokens: counted based on the new response generated
Important Notes:
- As you continue chatting in the same conversation, input tokens increase with each prompt because the entire chat history is sent to maintain context
- This is why we provide 6 million input tokens — which is sufficient for extensive conversations
- When you start a fresh chat, no history is sent, so input token counting starts from zero
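The history accounting described above can be sketched in Python. The tuple layout and function name are illustrative assumptions about the bookkeeping, not part of the platform's API:

```python
def input_tokens_for_turn(history, current_prompt_tokens):
    """Input tokens for a new turn = all previous prompts and responses
    carried forward as context, plus the current prompt.
    Each history entry is (input_tokens, output_tokens, thinking_tokens);
    thinking tokens are NOT carried forward into later turns."""
    carried = sum(inp + out for inp, out, _thinking in history)
    return carried + current_prompt_tokens

# Turn 1 used 7 input, 441 output, and 1,000 thinking tokens.
history = [(7, 441, 1000)]

# Turn 2 prompt is 12 tokens: 448 carried + 12 new = 460 input tokens.
print(input_tokens_for_turn(history, 12))  # 460

# A fresh chat has no history, so input counting starts from the prompt alone.
print(input_tokens_for_turn([], 12))       # 12
```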
How Are Tokens Counted on ModelXpert for Standard & Premium Models?
Standard Models
Token usage for Standard models is counted as 1x — the same as the actual tokens consumed by the model for both input and output.
Exception: Perplexity models are counted at 2.5x the actual token usage.
Premium Models
Token usage for Premium models is counted as 4x the actual tokens consumed for both input and output.
Exception: Perplexity Sonar Pro is counted at 6x the actual token usage.
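A minimal sketch of how these multipliers translate actual model usage into the tokens deducted from your allocation (the tier keys here are illustrative labels, not platform identifiers):

```python
# Rate multipliers from the policy above; keys are illustrative names.
MULTIPLIERS = {
    "standard": 1.0,             # most Standard models
    "standard_perplexity": 2.5,  # Perplexity Standard models
    "premium": 4.0,              # most Premium models
    "premium_sonar_pro": 6.0,    # Perplexity Sonar Pro
}

def billed_tokens(actual_tokens: int, tier: str) -> int:
    """Tokens deducted from your allocation = actual tokens x tier multiplier."""
    return int(actual_tokens * MULTIPLIERS[tier])

print(billed_tokens(1000, "standard"))            # 1000
print(billed_tokens(1000, "standard_perplexity")) # 2500
print(billed_tokens(1000, "premium"))             # 4000
print(billed_tokens(1000, "premium_sonar_pro"))   # 6000
```

The same multiplier applies to both input and output tokens.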
Detailed Token Counting by Model
Note: All models share the same base allocation of 8 Million total tokens (6 Million Input + 2 Million Output). The rate multiplier determines how much of this allocation you can actually use with each model.
| Category | Models | Rate | Effective Usage |
|---|---|---|---|
| Standard | Gemini-2.5-Flash, Gemini-2.5-Flash-Lite, GPT-5 Nano, GPT-5 Mini, Claude Haiku 3, Grok-4.1 Fast Reasoning, Grok-4.1 Fast Non-Reasoning, Grok-4 Fast Reasoning, Grok-4 Fast Non-Reasoning, Grok-3 Mini, Deepseek Chat, Deepseek Reasoner | 1x | 8 Million total (6 Million input + 2 Million output) |
| Standard | Perplexity Sonar, Perplexity Sonar Reasoning | 2.5x | 3.2 Million total (2.4 Million input + 800,000 output) |
| Premium | Gemini-2.5-Pro, Gemini-3 Low Reasoning, Gemini-3 High Reasoning, GPT-5.1 High Reasoning, GPT-5.1 Medium Reasoning, GPT-5.1 Low Reasoning, Claude Sonnet 4.5, Claude Sonnet 4, Sonar Reasoning Pro, Grok-4 | 4x | 2 Million total (1.5 Million input + 500,000 output) |
| Premium | Perplexity Sonar Pro | 6x | 1,333,000 total (1 Million input + 333,000 output) |
| Image Generation | Gemini 2.5 Flash Image (Nano Banana) | 6x | 333,000 total (counted as output tokens only) |
Note: Each image generation consumes approximately 1,300 actual tokens.
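The "Effective Usage" column follows directly from dividing the shared base allocation by each rate multiplier, as this sketch shows (figures in the table are rounded to the nearest thousand):

```python
# Shared base allocation for every model, per the policy above.
BASE_INPUT, BASE_OUTPUT = 6_000_000, 2_000_000

def effective_allocation(rate: float) -> tuple[int, int]:
    """Effective usable (input, output) tokens for a model with this rate."""
    return int(BASE_INPUT // rate), int(BASE_OUTPUT // rate)

print(effective_allocation(1))    # (6000000, 2000000)
print(effective_allocation(2.5))  # (2400000, 800000)
print(effective_allocation(4))    # (1500000, 500000)
print(effective_allocation(6))    # (1000000, 333333) - table shows 333,000

# One generated image at ~1,300 actual tokens and a 6x rate deducts:
print(1_300 * 6)  # 7800 tokens from your output allocation
```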
Reasoning and Non-Reasoning Models
It is useful to know which models are reasoning models and which are not, because reasoning models consume more tokens: the AI spends additional tokens on thinking while processing complex tasks.
| Provider | Reasoning Models | Non-Reasoning Models |
|---|---|---|
| ChatGPT | GPT-5.1 High Reasoning, GPT-5.1 Medium Reasoning, GPT-5.1 Low Reasoning | GPT-5 Nano, GPT-5 Mini |
| Gemini | Gemini-2.5-Pro, Gemini-3 Low Reasoning, Gemini-3 High Reasoning | Gemini-2.5-Flash, Gemini-2.5-Flash-Lite |
| Claude | Claude Sonnet 4.5, Claude Sonnet 4 | Claude Haiku 3 |
| Grok | Grok-4, Grok-4.1 Fast Reasoning, Grok-4 Fast Reasoning, Grok-3 Mini | Grok-4.1 Fast Non-Reasoning, Grok-4 Fast Non-Reasoning |
| Perplexity | Perplexity Sonar Reasoning, Perplexity Sonar Reasoning Pro | Perplexity Sonar, Perplexity Sonar Pro |
| Deepseek | Deepseek Reasoner | Deepseek Chat |
