Cost Management - Monitor & Optimize
In this chapter, you'll learn how to understand, monitor and optimize costs for OpenAI usage.
💰 How Does Billing Work?
Pay-as-you-go Model
OpenAI bills on a Pay-as-you-go principle:
- You only pay for what you actually use
- No base fee, no subscription costs
- Billing is per token
What are Tokens?
Tokens are the smallest units that OpenAI processes:
- 1 Token ≈ 4 characters
- 1 Token ≈ 0.75 words (in English)
Examples:
"Hello" = 2 tokens
"How are you?" = 4 tokens
"I'm looking for a red jacket in size M" = 10 tokens
Online counting tool: OpenAI Tokenizer
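If you need a rough estimate in your own code, the "1 token ≈ 4 characters" rule of thumb above translates into a one-line helper. This is only a heuristic sketch, not the real tokenizer; for exact counts use the OpenAI Tokenizer linked above:

```typescript
// Rough token estimate based on the "1 token ≈ 4 characters" rule of thumb.
// Real tokenizer counts can differ, especially for non-English text or code.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens("I'm looking for a red jacket in size M")); // 10
```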
📊 Pricing Overview (as of January 2025)
gpt-4o-mini (Recommended for most cases)
| Token Type | Price per 1 Million Tokens | Price per 1,000 Tokens |
|---|---|---|
| Input Tokens | $0.15 | $0.00015 |
| Output Tokens | $0.60 | $0.00060 |
| Cached Input Tokens | $0.075 | $0.000075 |
gpt-4o (For complex tasks)
| Token Type | Price per 1 Million Tokens | Price per 1,000 Tokens |
|---|---|---|
| Input Tokens | $2.50 | $0.0025 |
| Output Tokens | $10.00 | $0.0100 |
| Cached Input Tokens | $1.25 | $0.00125 |
gpt-5 (Newest model, very expensive)
| Token Type | Price per 1 Million Tokens | Price per 1,000 Tokens |
|---|---|---|
| Input Tokens | $10.00 | $0.010 |
| Output Tokens | $30.00 | $0.030 |
| Cached Input Tokens | $5.00 | $0.005 |
o4-mini (Reasoning model, cost varies with reasoning effort)
| Token Type | Price per 1 Million Tokens | Price per 1,000 Tokens |
|---|---|---|
| Input Tokens | $3.00 | $0.003 |
| Reasoning Tokens | $12.00 | $0.012 |
| Output Tokens | $12.00 | $0.012 |
| Cached Input Tokens | $1.50 | $0.0015 |
gpt-4o-mini offers the best price-performance ratio for 90% of e-commerce applications!
🧮 Understanding Cost Calculations
What is Billed?
Each API request consists of:
1. Input Tokens
Everything sent to OpenAI:
- ✅ User message (e.g., "Show me red jackets")
- ✅ System Instructions (your agent instructions)
- ✅ Tool descriptions (all activated tools)
- ✅ Conversation history (previous messages in thread)
- ✅ Init Instructions (greeting)
- ✅ Fallback Instructions (error handling)
Example calculation:
System Instructions: 800 tokens
Tool Descriptions: 1,200 tokens (20 tools)
Conversation History: 500 tokens
User Message: 15 tokens
─────────────────────────────────────
Total Input: 2,515 tokens
Cost (gpt-4o-mini):
2,515 tokens × $0.00000015 = $0.000377
2. Output Tokens
Everything OpenAI returns:
- ✅ Agent responses
- ✅ Tool calls (JSON structure)
Example:
Agent Response: 120 tokens
Tool Call (JSON): 45 tokens
─────────────────────────────────────
Total Output: 165 tokens
Cost (gpt-4o-mini):
165 tokens × $0.00000060 = $0.000099
3. Cached Input Tokens
OpenAI caches frequently used input parts:
- ✅ System Instructions (usually stay the same)
- ✅ Tool descriptions (rarely change)
Caching savings:
Without cache:
Input: 2,515 tokens × $0.00000015 = $0.000377
With cache (1,500 tokens cached):
New Input: 1,015 tokens × $0.00000015 = $0.000152
Cached Input: 1,500 tokens × $0.00000008 = $0.000120
────────────────────────────────────────────────────
Total: $0.000272
Savings: 28% lower costs!
OpenAI automatically caches prompts longer than 1,024 tokens that repeat. You don't need to do anything – it works automatically!
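If you want to reproduce this arithmetic in a monitoring script, the calculation fits into a small helper. A minimal sketch using the gpt-4o-mini rates from the pricing table above (treat the constants as placeholders and update them when prices change):

```typescript
// gpt-4o-mini prices from the table above, in USD per 1 million tokens.
const PRICE_PER_MILLION = {
  input: 0.15,
  cachedInput: 0.075,
  output: 0.6,
};

// Estimated cost of a single request in USD.
function estimateRequestCost(
  newInputTokens: number,
  cachedInputTokens: number,
  outputTokens: number
): number {
  return (
    (newInputTokens * PRICE_PER_MILLION.input +
      cachedInputTokens * PRICE_PER_MILLION.cachedInput +
      outputTokens * PRICE_PER_MILLION.output) /
    1_000_000
  );
}

// The cached example above: 1,015 new + 1,500 cached input tokens, 165 output tokens.
console.log(estimateRequestCost(1015, 1500, 165).toFixed(6)); // "0.000364"
```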
📈 Example Cost Calculation
Scenario: Product Advisor Agent
Setup:
- Model: gpt-4o-mini
- System Instructions: 800 tokens
- 20 tools activated: 1,200 tokens
- Average conversation: 3 messages
Conversation:
Customer: "I'm looking for a winter jacket"
→ Input: 2,015 tokens (Instructions + Tools + Message)
→ Output: 180 tokens (Tool call + Answer)
Customer: "Do you have it in blue?"
→ Input: 2,520 tokens (+ History)
→ Output: 120 tokens
Customer: "Perfect, I'll take size M"
→ Input: 2,650 tokens (+ History)
→ Output: 95 tokens
Total Token Usage:
| Type | Tokens | Price |
|---|---|---|
| Input (new) | 3,200 | $0.00048 |
| Input (cached) | 3,985 | $0.00030 |
| Output | 395 | $0.00024 |
| Total | 7,580 | $0.00102 |
Per conversation: approx. 0.1 cents ($0.001)
Projection:
- Per day (100 conversations): $0.10
- Per month (3,000 conversations): $3.06
- Per year (36,000 conversations): $36.72
With gpt-4o-mini, you can run thousands of conversations for just a few dollars per month!
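To project these numbers for your own traffic, multiply the per-conversation cost by your daily volume. A small sketch with the figures from the table above:

```typescript
// Project daily, monthly, and yearly costs from an average per-conversation cost.
function projectCosts(costPerConversation: number, conversationsPerDay: number) {
  const perDay = costPerConversation * conversationsPerDay;
  return {
    perDay,
    perMonth: perDay * 30,  // ~30 days per month, as in the projection above
    perYear: perDay * 360,  // matches the 36,000-conversations/year figure above
  };
}

// Product advisor example: ~$0.00102 per conversation, 100 conversations per day.
console.log(projectCosts(0.00102, 100)); // ≈ { perDay: 0.10, perMonth: 3.06, perYear: 36.72 }
```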
📉 Optimize Costs - Top 10 Tips
1. Use gpt-4o-mini instead of gpt-4o
Savings: 90-95%
gpt-4o: $2.50 / $10.00 per 1M tokens (input / output)
gpt-4o-mini: $0.15 / $0.60 per 1M tokens (input / output)
────────────────────────────────────────
Savings: roughly 17x cheaper (≈ 94%)
When to still use gpt-4o?
- Very complex reasoning tasks
- Multilingual, sophisticated conversations
- Specialized knowledge required
Test:
Try gpt-4o-mini first. In 90% of cases, it's completely sufficient!
2. Deactivate Unneeded Tools
Problem: Each tool increases input tokens
Example:
5 tools: ~300 tokens
10 tools: ~600 tokens
20 tools: ~1,200 tokens
30 tools: ~1,800 tokens
Solution:
Activate only tools your agent really needs.
Product advisor needs:
- ✅ product_search
- ✅ get_product_details
- ✅ search_logs
- ❌ NOT: get_order_status, create_order, send_email
Savings: 30-50% fewer input tokens
3. Shorten Instructions
Problem: Long instructions = high input costs
❌ Bad (1,200 tokens):
You are a friendly, helpful product advisor for our online store.
Your task is to support customers in product selection, answer questions,
give recommendations and ensure that every customer finds the perfect
product. You should always be polite, patient and understanding.
Use the tools available to you to...
[another 800 tokens]
✅ Good (300 tokens):
You are a product advisor for fashion. Tasks:
- Search products with product_search
- Get details with get_product_details
- Use search_logs first for common questions
- Short, precise answers
- When unclear: ask
Tone: Friendly, helpful, professional
Savings: 75% fewer tokens
Ensure instructions remain clear and precise!
4. Use the Log System
Why does this save costs?
Without logs:
Customer: "How long does shipping take?"
→ Agent calls get_shipping_info
→ Input: 2,500 tokens, Output: 200 tokens
→ Cost: $0.00052
With logs:
Customer: "How long does shipping take?"
→ Agent finds answer in search_logs
→ Input: 1,800 tokens, Output: 120 tokens
→ Cost: $0.00034
Savings: 35% per request
With frequent questions: 50-70% savings!
Setup:
- Activate the search_logs tool
- Create FAQ entries (see Knowledge Management)
- In instructions: "ALWAYS use search_logs first"
5. Limit Max Output Tokens
Problem: Long answers = high output costs
Solution:
In agent configuration:
- Max Output Tokens: 500 (instead of 4,000)
Example:
Without limit:
Agent writes 2,000 token long answer
→ Cost: $0.0012
With limit (500):
Agent writes maximum 500 tokens
→ Cost: $0.0003
Savings: 75%
For product advice, 300-500 tokens are usually completely sufficient!
6. Use Lower Temperature
What is Temperature?
Creativity setting:
- 0.1-0.5: Consistent, predictable, more efficient
- 0.6-1.0: Balanced
- 1.1-2.0: Creative, but more tokens
Cost effect:
Low temperature (0.3):
Answer: "Yes, we have the jacket in size M in stock."
→ 11 tokens
High temperature (1.5):
Answer: "Gladly! I'm pleased to inform you that we indeed have this wonderful jacket in your desired size M in stock!"
→ 22 tokens
Savings: 40-60% fewer output tokens
Recommendation: Temperature 0.3-0.7 for e-commerce
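In the plugin, both values are simply fields in the agent configuration. For context, here is a minimal sketch of how an output cap and a lower temperature map to a direct API call with the official openai Node.js client (an illustration of what the settings control, not the plugin's actual implementation):

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function askAdvisor(question: string): Promise<string | null> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a product advisor for fashion. Keep answers short." },
      { role: "user", content: question },
    ],
    max_tokens: 500,  // caps output tokens (tip 5) and therefore output cost per answer
    temperature: 0.3, // lower temperature (tip 6): shorter, more consistent answers
  });

  console.log(response.usage); // prompt_tokens, completion_tokens, total_tokens
  return response.choices[0].message.content;
}
```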
7. Avoid High Reasoning Effort
Problem: high Reasoning Effort is expensive
Example (o4-mini):
Low Reasoning: 500 reasoning tokens
Medium Reasoning: 1,500 reasoning tokens
High Reasoning: 5,000 reasoning tokens
Cost (o4-mini, $0.012 per 1k):
Low: $0.006
Medium: $0.018
High: $0.060
Savings: 90% by switching from high → low
Recommendation:
- Standard: low or medium
- Only for very complex tasks: high
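For reasoning models like o4-mini, the effort level is likewise just a request parameter. A hedged sketch, assuming the reasoning_effort parameter of the Chat Completions API (again as an illustration of what the setting controls, not the plugin's internal code):

```typescript
import OpenAI from "openai";

const client = new OpenAI();

async function compareProducts(question: string) {
  const response = await client.chat.completions.create({
    model: "o4-mini",
    reasoning_effort: "low", // "low" | "medium" | "high" – low keeps hidden reasoning tokens cheap
    messages: [{ role: "user", content: question }],
  });

  // Reasoning tokens are billed like output tokens and reported in the usage details.
  console.log(response.usage?.completion_tokens_details?.reasoning_tokens);
  return response.choices[0].message.content;
}
```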
8. End Threads Regularly
Problem: Long threads = large conversation history
Example:
Message 1: 2,500 input tokens
Message 2: 2,700 input tokens (+ History)
Message 3: 2,950 input tokens (+ History)
...
Message 20: 8,000 input tokens (+ History)
Solution:
End threads after:
- Completion of a purchase
- Resolution of a request
- 10-15 messages
Frontend integration:
// End thread after successful order
if (orderCompleted) {
  createNewThread();
}
Savings: 40-60% for long conversations
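Besides resetting after a completed order, you can also reset once a thread grows too long. A sketch under the assumption that your storefront integration exposes helpers like the hypothetical createNewThread() and getThreadMessageCount() used here:

```typescript
// Hypothetical helpers – the real names depend on how the chat widget is embedded.
declare function createNewThread(): void;
declare function getThreadMessageCount(): number;

const MAX_MESSAGES_PER_THREAD = 15;

// Call after every exchange: long threads drag the full history into every
// request and steadily increase input-token costs.
function maybeResetThread(): void {
  if (getThreadMessageCount() >= MAX_MESSAGES_PER_THREAD) {
    createNewThread();
  }
}
```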
9. Avoid Unnecessary Tool Calls
Problem: Tool calls increase output tokens
Example:
Bad:
Agent calls:
1. product_search (all red jackets)
2. product_search (all blue jackets)
3. product_search (all green jackets)
→ 3 tool calls = 180 output tokens
Better:
Agent calls:
1. product_search (all jackets)
→ 1 tool call = 60 output tokens
Solution:
In instructions:
Use tools efficiently. Call product_search only once and
then use the results. Avoid multiple calls for
similar requests.
Savings: 50-70% fewer tool calls
10. Set Budget Limits
Why?
Protect yourself from:
- Unexpected costs
- Misuse (if API key compromised)
- Bugs (e.g., infinite loops)
Setup:
- Go to OpenAI Billing Settings
- Set Hard Limit (e.g., $10/month)
- Set Soft Limit (e.g., $5/month → email notification)
Example:
Soft Limit: $5 → You receive email warning
Hard Limit: $10 → API is deactivated
Recommendation:
- Small shop: $5-10/month
- Medium shop: $20-50/month
- Large shop: $100-200/month
📊 Monitor Costs
OpenAI Usage Dashboard
Access: platform.openai.com/usage
What you see:
Daily view
- Costs per day
- Requests per day
- Tokens per day
Model breakdown
- Which model is used how much?
- Which model costs the most?
Token details
- Input tokens
- Output tokens
- Cached tokens
Cost history
- Chart of last 30 days
- Identify trends
Shopware Plugin Logs
Access: 5E OAI Agent Manager → Assistant Logs
What you see:
- Cost per conversation
- Token usage per message
- Average costs
Analysis:
- Filter by time period (e.g., last 7 days)
- Sort by "Cost" (most expensive first)
- Identify outliers
Questions:
- Which conversations were particularly expensive?
- Why? (too many tools? long conversation? wrong model?)
- How can you optimize?
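If you export the assistant logs (for example as JSON), answering these questions becomes a simple sort-and-filter exercise. A sketch assuming a hypothetical export format with cost and token fields (adjust the field names to the actual export):

```typescript
// Hypothetical shape of an exported log entry – adjust to the real export format.
interface ConversationLog {
  id: string;
  date: string;         // ISO date, e.g. "2025-01-15"
  inputTokens: number;
  outputTokens: number;
  cost: number;         // USD
}

// Return the N most expensive conversations since a given date.
function mostExpensive(logs: ConversationLog[], since: string, topN = 10): ConversationLog[] {
  return logs
    .filter((log) => log.date >= since)
    .sort((a, b) => b.cost - a.cost)
    .slice(0, topN);
}
```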
💡 Cost Scenarios
Scenario 1: Small Shop (50 conversations/day)
Setup:
- Model: gpt-4o-mini
- Tools: 10 tools
- Average: 2,000 input, 300 output tokens
- 50% cached
Calculation:
Per conversation:
Input (new): 1,000 × $0.00000015 = $0.00015
Input (cached): 1,000 × $0.00000008 = $0.00008
Output: 300 × $0.00000060 = $0.00018
────────────────────────────────────────────────
Total: $0.00041
Projection:
Per day: 50 × $0.00041 = $0.02
Per month: 1,500 conversations = $0.62
Per year: 18,000 conversations = $7.44
Budget recommendation: $5/month (enough buffer)
Scenario 2: Medium Shop (300 conversations/day)
Setup:
- Model: gpt-4o-mini
- Tools: 15 tools
- Average: 2,500 input, 400 output tokens
- 60% cached (thanks to knowledge management)
Calculation:
Per conversation:
Input (new): 1,000 × $0.00000015 = $0.00015
Input (cached): 1,500 × $0.00000008 = $0.00012
Output: 400 × $0.00000060 = $0.00024
────────────────────────────────────────────────
Total: $0.00051
Projection:
Per day: 300 × $0.00051 = $0.15
Per month: 9,000 conversations = $4.59
Per year: 108,000 conversations = $55.08
Budget recommendation: $10-20/month
Scenario 3: Large Shop (1,000 conversations/day)
Setup:
- Model: gpt-4o-mini (90%), gpt-4o (10% for complex cases)
- Tools: 20 tools
- Average: 3,000 input, 500 output tokens
- 70% cached
Calculation gpt-4o-mini (90%):
Per conversation:
Input (new): 900 × $0.00000015 = $0.00014
Input (cached): 2,100 × $0.00000008 = $0.00017
Output: 500 × $0.00000060 = $0.00030
────────────────────────────────────────────────
Total: $0.00061
Calculation gpt-4o (10%):
Per conversation:
Input (new): 900 × $0.0000025 = $0.00225
Input (cached): 2,100 × $0.0000013 = $0.00273
Output: 500 × $0.0000100 = $0.00500
────────────────────────────────────────────────
Total: $0.00998
Total:
90% gpt-4o-mini: 900 × $0.00061 = $0.55
10% gpt-4o: 100 × $0.00998 = $1.00
────────────────────────────────────────────
Per day: $1.55
Projection:
Per month: 30,000 conversations = $46.50
Per year: 360,000 conversations = $558
Budget recommendation: $100-150/month (enough buffer for peaks)
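The blended calculation for a model mix like this is easy to reproduce; a small sketch using the per-conversation costs worked out above:

```typescript
// Blended daily cost for scenario 3: 90% of conversations on gpt-4o-mini,
// 10% on gpt-4o, using the per-conversation costs calculated above.
const conversationsPerDay = 1000;
const costPerConversationMini = 0.00061; // gpt-4o-mini
const costPerConversation4o = 0.00998;   // gpt-4o

const perDay =
  conversationsPerDay * 0.9 * costPerConversationMini +
  conversationsPerDay * 0.1 * costPerConversation4o;

console.log(perDay.toFixed(2));        // "1.55"
console.log((perDay * 30).toFixed(2)); // "46.41" (≈ the $46.50 above, which rounds per-day to $1.55)
```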
🎯 ROI (Return on Investment)
Cost vs. Benefit
Cost example (medium shop):
- OpenAI: $5/month
- Plugin: one-time (no recurring costs)
- Total: $5/month
Benefits:
Customer service savings:
- Assumption: Agent answers 50% of all inquiries
- 150 inquiries/day × 50% = 75 automated inquiries
- Time per inquiry: 5 minutes
- Savings: 375 minutes/day = 6.25 hours
- Employee cost: $20/hour
- Savings: $125/day = $3,750/month
Higher conversion:
- Customers receive immediate help (24/7)
- Assumption: 2% relative uplift in conversion through better advice
- At 10,000 visitors/month, 2% baseline conversion, $80 average order value (AOV):
- Additional conversions: 10,000 × 0.02 (baseline conversion) × 0.02 (uplift) = 4
- Additional revenue: 4 × $80 = $320/month
Total benefit:
- Customer service savings: $3,750
- Additional revenue: $320
- Total: $4,070/month
ROI:
(Benefit - Cost) / Cost × 100
= (4,070 - 5) / 5 × 100
= 81,300%
OpenAI costs are negligible compared to the benefits!
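The same formula with the example figures plugged in:

```typescript
// ROI in percent: (benefit - cost) / cost × 100, using the example above.
const monthlyBenefit = 3750 + 320; // customer service savings + additional revenue (USD)
const monthlyCost = 5;             // OpenAI usage (USD)

const roiPercent = ((monthlyBenefit - monthlyCost) / monthlyCost) * 100;
console.log(roiPercent); // 81300
```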
📋 Cost Optimization Checklist
- Model: Use gpt-4o-mini instead of gpt-4o
- Tools: Only needed tools activated (max. 10-15)
- Instructions: Short and precise (< 500 tokens)
- Logs: search_logs activated, FAQ entries created
- Output: Max output tokens limited to 500
- Temperature: Set to 0.3-0.7
- Reasoning: low or medium (not high)
- Threads: End regularly (after 10-15 messages)
- Tool calls: Describe efficient use in instructions
- Budget limit: Hard limit set in OpenAI
- Monitoring: Check OpenAI usage dashboard weekly
- Analysis: Identify most expensive conversations monthly
🆘 Troubleshooting Costs
Costs are Unexpectedly High
Diagnosis:
Check OpenAI Usage Dashboard:
- Which model is causing the costs?
- Are there spikes on certain days?
Check Shopware logs:
- Sort by "Cost"
- Which conversations were expensive?
Common causes:
Cause 1: Wrong Model
Problem: Accidentally gpt-4o instead of gpt-4o-mini
Solution: Change model in agent configuration
Savings: 90%
Cause 2: Too Many Tools
Problem: 30 tools activated
Solution: Reduce to 10-12
Savings: 40%
Cause 3: Very Long Instructions
Problem: 2,000 token instructions
Solution: Shorten to 300-500 tokens
Savings: 60%
Cause 4: No Caching Usage
Problem: 0% cached tokens
Cause: Instructions constantly changing
Solution: Stabilize instructions
Cause 5: Infinite Loops
Problem: Agent gets stuck in a loop of repeated tool calls
Solution: Formulate instructions more clearly to prevent loops
Next Steps
You now know how to monitor and optimize costs!
➡️ Best Practices - More optimization tips
➡️ Troubleshooting - Solve common problems
➡️ Back to Main Documentation