Token Usage Policy
What is a Token on the ModelXpert AI Platform?
A token is a unit of text that AI models use to process information for both input (when you send a prompt) and output (when the model generates a response). A token can be a word, part of a word, or a few characters.
Each AI model (ChatGPT, Gemini, Claude, Perplexity, DeepSeek, Grok) has its own tokenization method, so the same text can produce slightly different token counts across models.
Token Estimation
Exact counts depend on the specific model, but as a rule of thumb:
- Each token is approximately ¾ of a word
- 750 words equal approximately 1,000 tokens
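This rule of thumb can be expressed as a small sketch in Python (the function name is illustrative, not part of the platform; real tokenizers will differ slightly):

```python
def estimate_tokens(word_count: int) -> int:
    """Rough token estimate using the rule of thumb:
    1 token is about 3/4 of a word, i.e. ~1,000 tokens per 750 words."""
    return round(word_count * 1000 / 750)

print(estimate_tokens(750))  # 1000
print(estimate_tokens(75))   # 100
print(estimate_tokens(300))  # 400
```

Use this only for ballpark planning; each model's tokenizer produces the authoritative count.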
Token Usage for Different Features
Thinking Models:
When a thinking model is used, tokens are consumed during the AI's reasoning process. The number of tokens depends on:
- The complexity of the task
- The reasoning level selected (low, medium, or high)
Image Generation:
Tokens are also counted when generating images.
How Are Tokens Counted?
Input Tokens
When you send a prompt, tokens are counted based on the length of words or characters:
- Approximately 1 token = ¾ of a word
- For example, 75 words = approximately 100 tokens
Output Tokens
For Non-Reasoning Models:
- Token counting is similar to input
- Approximately 1 token = ¾ of a word
- For example, a 300-word response = approximately 400 tokens
For Thinking/Reasoning Models:
- Output tokens include thinking tokens + response tokens
- Example: 1,000 thinking tokens + 400 response tokens = 1,400 total output tokens
Practical Example
Scenario 1: Using GPT-5 Nano (Non-Reasoning Model)
Your Prompt: "How does ChatGPT generate responses?"
Input tokens counted: 7 tokens
ChatGPT Response: "ChatGPT generates responses using a type of artificial intelligence... refined by humans to give meaningful and safe answers — not just random text." (338 words)
Output tokens counted: 441 tokens
Scenario 2: Using a GPT-5 Reasoning Model
Same Prompt: "How does ChatGPT generate responses?"
- Input tokens: 7 tokens (same as before)
- Response tokens: 441 tokens (same as before)
- Thinking tokens: approximately 1,000 tokens (varies with task complexity)
- Total output tokens: 1,441 tokens
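The arithmetic in both scenarios can be sketched as follows (the token counts are the illustrative figures from the examples above, not real tokenizer output):

```python
def total_output_tokens(response_tokens: int, thinking_tokens: int = 0) -> int:
    """Output tokens = response tokens, plus thinking tokens for reasoning models.
    Non-reasoning models simply have zero thinking tokens."""
    return response_tokens + thinking_tokens

# Scenario 1: GPT-5 Nano (non-reasoning) -> response tokens only
print(total_output_tokens(441))        # 441

# Scenario 2: GPT-5 reasoning model, same prompt -> thinking + response
print(total_output_tokens(441, 1000))  # 1441
```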
Chat History and Context
When you continue chatting in the same conversation, the AI remembers previous context. Here's how tokens are counted:
Second Prompt in Same Chat: "What are the limitations of ChatGPT's knowledge and reasoning?"
The system counts:
- Previous chat history as input: 7 (previous input) + 441 (previous output) = 448 tokens. Note: previous thinking tokens are NOT included in history.
- Current prompt: 12 tokens
- Total input tokens: 448 + 12 = 460 tokens
- Output tokens: counted based on the new response generated
Important Notes:
- As you continue chatting in the same conversation, input tokens increase with each prompt because the entire chat history is sent to maintain context
- This is why we provide 6 million input tokens — which is sufficient for extensive conversations
- When you start a fresh chat, no history is sent, so input token counting starts from zero
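The history accounting described above can be sketched in Python. The tuple layout and function name are illustrative assumptions about the bookkeeping, not part of the platform's API:

```python
def input_tokens_for_turn(history, current_prompt_tokens):
    """Input tokens for a new turn = all previous prompts and responses
    carried forward as context, plus the current prompt.
    Each history entry is (input_tokens, output_tokens, thinking_tokens);
    thinking tokens are NOT carried forward into later turns."""
    carried = sum(inp + out for inp, out, _thinking in history)
    return carried + current_prompt_tokens

# Turn 1 used 7 input, 441 output, and 1,000 thinking tokens.
history = [(7, 441, 1000)]

# Turn 2 prompt is 12 tokens: 448 carried + 12 new = 460 input tokens.
print(input_tokens_for_turn(history, 12))  # 460

# A fresh chat has no history, so input counting starts from the prompt alone.
print(input_tokens_for_turn([], 12))       # 12
```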
How Are Tokens Counted on ModelXpert for Standard & Premium Models?
Standard Models
Token usage for Standard models is counted as 1x — the same as the actual tokens consumed by the model for both input and output.
Exception: Perplexity models are counted at 2.5x the actual token usage.
Premium Models
Token usage for Premium models is counted as 4x the actual tokens consumed for both input and output.
Exception: Perplexity Sonar Pro is counted at 6x the actual token usage.
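A minimal sketch of how these multipliers translate actual model usage into the tokens deducted from your allocation (the tier keys here are illustrative labels, not platform identifiers):

```python
# Rate multipliers from the policy above; keys are illustrative names.
MULTIPLIERS = {
    "standard": 1.0,             # most Standard models
    "standard_perplexity": 2.5,  # Perplexity Standard models
    "premium": 4.0,              # most Premium models
    "premium_sonar_pro": 6.0,    # Perplexity Sonar Pro
}

def billed_tokens(actual_tokens: int, tier: str) -> int:
    """Tokens deducted from your allocation = actual tokens x tier multiplier."""
    return int(actual_tokens * MULTIPLIERS[tier])

print(billed_tokens(1000, "standard"))            # 1000
print(billed_tokens(1000, "standard_perplexity")) # 2500
print(billed_tokens(1000, "premium"))             # 4000
print(billed_tokens(1000, "premium_sonar_pro"))   # 6000
```

The same multiplier applies to both input and output tokens.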
Detailed Token Counting by Model
Note: All models share the same base allocation of 8 Million total tokens (6 Million Input + 2 Million Output). The rate multiplier determines how much of this allocation you can actually use with each model.
| Category | Models | Rate | Effective Usage |
|---|---|---|---|
| Standard | Gemini-2.5-Flash, Gemini-2.5-Flash-Lite, GPT-5 Nano, GPT-5 Mini, Claude Haiku 3, Grok-4.1 Fast Reasoning, Grok-4.1 Fast Non-Reasoning, Grok-4 Fast Reasoning, Grok-4 Fast Non-Reasoning, Grok-3 Mini, Deepseek Chat, Deepseek Reasoner | 1x | 8 Million total (6 Million input + 2 Million output) |
| Standard | Perplexity Sonar, Perplexity Sonar Reasoning | 2.5x | 3.2 Million total (2.4 Million input + 800,000 output) |
| Premium | Gemini-2.5-Pro, Gemini-3 Low Reasoning, Gemini-3 High Reasoning, GPT-5.1 High Reasoning, GPT-5.1 Medium Reasoning, GPT-5.1 Low Reasoning, Claude Sonnet 4.5, Claude Sonnet 4, Sonar Reasoning Pro, Grok-4 | 4x | 2 Million total (1.5 Million input + 500,000 output) |
| Premium | Perplexity Sonar Pro | 6x | 1,333,000 total (1 Million input + 333,000 output) |
| Image Generation | Gemini 2.5 Flash Image (Nano Banana) | 6x | 333,000 total (counted as output tokens only) |
Note: Each image generation consumes approximately 1,300 actual tokens.
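The "Effective Usage" column follows directly from dividing the shared base allocation by each rate multiplier, as this sketch shows (figures in the table are rounded to the nearest thousand):

```python
# Shared base allocation for every model, per the policy above.
BASE_INPUT, BASE_OUTPUT = 6_000_000, 2_000_000

def effective_allocation(rate: float) -> tuple[int, int]:
    """Effective usable (input, output) tokens for a model with this rate."""
    return int(BASE_INPUT // rate), int(BASE_OUTPUT // rate)

print(effective_allocation(1))    # (6000000, 2000000)
print(effective_allocation(2.5))  # (2400000, 800000)
print(effective_allocation(4))    # (1500000, 500000)
print(effective_allocation(6))    # (1000000, 333333) - table shows 333,000

# One generated image at ~1,300 actual tokens and a 6x rate deducts:
print(1_300 * 6)  # 7800 tokens from your output allocation
```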
Reasoning and Non-Reasoning Models
It is useful to know which models are reasoning models and which are not, because reasoning models consume more tokens: the AI spends additional tokens on thinking while processing complex tasks.
| Provider | Reasoning Models | Non-Reasoning Models |
|---|---|---|
| ChatGPT | GPT-5.1 High Reasoning, GPT-5.1 Medium Reasoning, GPT-5.1 Low Reasoning | GPT-5 Nano, GPT-5 Mini |
| Gemini | Gemini-2.5-Pro, Gemini-3 Low Reasoning, Gemini-3 High Reasoning | Gemini-2.5-Flash, Gemini-2.5-Flash-Lite |
| Claude | Claude Sonnet 4.5, Claude Sonnet 4 | Claude Haiku 3 |
| Grok | Grok-4, Grok-4.1 Fast Reasoning, Grok-4 Fast Reasoning, Grok-3 Mini | Grok-4.1 Fast Non-Reasoning, Grok-4 Fast Non-Reasoning |
| Perplexity | Perplexity Sonar Reasoning, Perplexity Sonar Reasoning Pro | Perplexity Sonar, Perplexity Sonar Pro |
| Deepseek | Deepseek Reasoner | Deepseek Chat |
