Cost Management - Monitor & Optimize

In this chapter, you'll learn how to understand, monitor and optimize costs for OpenAI usage.


💰 How Does Billing Work?

Pay-as-you-go Model

OpenAI bills on a Pay-as-you-go principle:

  • You only pay for what you actually use
  • No base fee, no subscription costs
  • Billing is per token

What are Tokens?

Tokens are the smallest units that OpenAI processes:

  • 1 Token ≈ 4 characters
  • 1 Token ≈ 0.75 words (in English)

Examples:

"Hello" = 2 tokens
"How are you?" = 4 tokens
"I'm looking for a red jacket in size M" = 10 tokens

Online counting tool: OpenAI Tokenizer
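
The rules of thumb above can be turned into a rough estimator. This is a sketch only; the OpenAI Tokenizer (or a tokenizer library) gives exact counts per model:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    # Approximation only - actual tokenization depends on the model's vocabulary.
    return max(1, round(len(text) / 4))

print(estimate_tokens("I'm looking for a red jacket in size M"))  # → 10
```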


📊 Pricing Overview (as of January 2025)

gpt-4o-mini (best price-performance ratio)

Token Type           Price per 1M Tokens   Price per 1,000 Tokens
Input Tokens         $0.15                 $0.00015
Output Tokens        $0.60                 $0.00060
Cached Input Tokens  $0.075                $0.000075

gpt-4o (For complex tasks)

Token Type           Price per 1M Tokens   Price per 1,000 Tokens
Input Tokens         $2.50                 $0.0025
Output Tokens        $10.00                $0.0100
Cached Input Tokens  $1.25                 $0.00125

gpt-5 (Newest model, very expensive)

Token Type           Price per 1M Tokens   Price per 1,000 Tokens
Input Tokens         $10.00                $0.010
Output Tokens        $30.00                $0.030
Cached Input Tokens  $5.00                 $0.005

o4-mini (With reasoning, variable)

Token Type           Price per 1M Tokens   Price per 1,000 Tokens
Input Tokens         $3.00                 $0.003
Reasoning Tokens     $12.00                $0.012
Output Tokens        $12.00                $0.012
Cached Input Tokens  $1.50                 $0.0015
Tip

gpt-4o-mini offers the best price-performance ratio for 90% of e-commerce applications!


🧮 Understanding Cost Calculations

What is Billed?

Each API request consists of:

1. Input Tokens

Everything sent to OpenAI:

  • ✅ User message (e.g., "Show me red jackets")
  • ✅ System Instructions (your agent instructions)
  • ✅ Tool descriptions (all activated tools)
  • ✅ Conversation history (previous messages in thread)
  • ✅ Init Instructions (greeting)
  • ✅ Fallback Instructions (error handling)

Example calculation:

System Instructions:     800 tokens
Tool Descriptions:     1,200 tokens (20 tools)
Conversation History:    500 tokens
User Message:             15 tokens
─────────────────────────────────────
Total Input:           2,515 tokens

Cost (gpt-4o-mini):

2,515 tokens × $0.00000015 = $0.000377

2. Output Tokens

Everything OpenAI returns:

  • ✅ Agent responses
  • ✅ Tool calls (JSON structure)

Example:

Agent Response:     120 tokens
Tool Call (JSON):    45 tokens
─────────────────────────────
Total Output:       165 tokens

Cost (gpt-4o-mini):

165 tokens × $0.00000060 = $0.000099

3. Cached Input Tokens

OpenAI caches frequently used input parts:

  • ✅ System Instructions (usually stay the same)
  • ✅ Tool descriptions (rarely change)

Caching savings:

Without cache:

Input: 2,515 tokens × $0.00000015 = $0.000377

With cache (1,500 tokens cached):

New Input:    1,015 tokens × $0.00000015  = $0.000152
Cached Input: 1,500 tokens × $0.000000075 = $0.000113
────────────────────────────────────────────────────
Total: $0.000265

Savings: approx. 30% lower costs!

Prompt Caching

OpenAI automatically caches prompts longer than 1,024 tokens that repeat. You don't need to do anything – it works automatically!
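
To make the arithmetic above reproducible, the three billing components can be wrapped in a small helper. The constants are the gpt-4o-mini prices from the table in this chapter; adjust them if OpenAI changes its rates:

```python
# gpt-4o-mini prices per single token (January 2025 table in this chapter)
PRICE_INPUT = 0.15 / 1_000_000    # $0.15 per 1M input tokens
PRICE_CACHED = 0.075 / 1_000_000  # $0.075 per 1M cached input tokens
PRICE_OUTPUT = 0.60 / 1_000_000   # $0.60 per 1M output tokens

def request_cost(new_input: int, cached_input: int, output: int) -> float:
    """Cost of one API request in USD."""
    return (new_input * PRICE_INPUT
            + cached_input * PRICE_CACHED
            + output * PRICE_OUTPUT)

# Caching example from above: 1,015 new + 1,500 cached input tokens
print(f"${request_cost(1015, 1500, 0):.6f}")
# The same 2,515 input tokens without any caching:
print(f"${request_cost(2515, 0, 0):.6f}")
```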


📈 Example Cost Calculation

Scenario: Product Advisor Agent

Setup:

  • Model: gpt-4o-mini
  • System Instructions: 800 tokens
  • 20 tools activated: 1,200 tokens
  • Average conversation: 3 messages

Conversation:

Customer: "I'm looking for a winter jacket"
→ Input: 2,015 tokens (Instructions + Tools + Message)
→ Output: 180 tokens (Tool call + Answer)

Customer: "Do you have it in blue?"
→ Input: 2,520 tokens (+ History)
→ Output: 120 tokens

Customer: "Perfect, I'll take size M"
→ Input: 2,650 tokens (+ History)
→ Output: 95 tokens

Total Token Usage:

Type            Tokens   Price
Input (new)      3,200   $0.00048
Input (cached)   3,985   $0.00030
Output             395   $0.00024
Total            7,580   $0.00102

Per conversation: approx. 0.1 cents ($0.001)

Projection:

  • Per day (100 conversations): $0.10
  • Per month (3,000 conversations): $3.06
  • Per year (36,000 conversations): $36.72
Conclusion

With gpt-4o-mini, you can run thousands of conversations for just a few dollars per month!
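
The projection above is easy to script. A minimal sketch, assuming 30-day months and the per-conversation figure from this example:

```python
def project(cost_per_conversation: float, conversations_per_day: int) -> dict:
    """Project a per-conversation cost to daily, monthly and yearly totals (30-day months)."""
    daily = cost_per_conversation * conversations_per_day
    return {"day": round(daily, 2),
            "month": round(daily * 30, 2),
            "year": round(daily * 360, 2)}

print(project(0.00102, 100))  # → {'day': 0.1, 'month': 3.06, 'year': 36.72}
```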


📉 Optimize Costs - Top 10 Tips

1. Use gpt-4o-mini instead of gpt-4o

Savings: 90-95%

gpt-4o:       $0.0025 input / $0.0100 output per 1,000 tokens
gpt-4o-mini:  $0.00015 input / $0.0006 output per 1,000 tokens
─────────────────────────────────────────────────────────────
Savings: approx. 17x cheaper (≈ 94%)!

When to still use gpt-4o?

  • Very complex reasoning tasks
  • Multilingual, sophisticated conversations
  • Specialized knowledge required

Test: Try gpt-4o-mini first. In 90% of cases, it's completely sufficient!


2. Deactivate Unneeded Tools

Problem: Each tool increases input tokens

Example:

5 tools:    ~300 tokens
10 tools:   ~600 tokens
20 tools: ~1,200 tokens
30 tools: ~1,800 tokens

Solution:

Activate only tools your agent really needs.

Product advisor needs:

  • product_search
  • get_product_details
  • search_logs
  • ❌ NOT: get_order_status, create_order, send_email

Savings: 30-50% fewer input tokens
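
Based on the rough figures above (~60 tokens per tool description, an assumed average derived from 1,200 tokens for 20 tools; real descriptions vary), you can estimate what trimming the tool list saves on every single request:

```python
TOKENS_PER_TOOL = 60  # assumed average: ~1,200 tokens / 20 tools from the example above

def tool_overhead(active_tools: int) -> int:
    """Approximate input tokens that tool descriptions add to every request."""
    return active_tools * TOKENS_PER_TOOL

# Trimming 20 tools down to the 3 a product advisor actually needs:
print(tool_overhead(20) - tool_overhead(3))  # → 1020 tokens saved per request
```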


3. Shorten Instructions

Problem: Long instructions = high input costs

Bad (1,200 tokens):

You are a friendly, helpful product advisor for our online store.
Your task is to support customers in product selection, answer questions,
give recommendations and ensure that every customer finds the perfect
product. You should always be polite, patient and understanding.
Use the tools available to you to...
[another 800 tokens]

Good (300 tokens):

You are a product advisor for fashion. Tasks:
- Search products with product_search
- Get details with get_product_details
- Use search_logs first for common questions
- Short, precise answers
- When unclear: ask

Tone: Friendly, helpful, professional

Savings: 75% fewer tokens

Caution

Ensure instructions remain clear and precise!


4. Use the Log System

Why does this save costs?

Without logs:

Customer: "How long does shipping take?"
→ Agent calls get_shipping_info
→ Input: 2,500 tokens, Output: 200 tokens
→ Cost: $0.00050

With logs:

Customer: "How long does shipping take?"
→ Agent finds answer in search_logs
→ Input: 1,800 tokens, Output: 120 tokens
→ Cost: $0.00034

Savings: approx. 30% per request

With frequent questions: 50-70% savings!

Setup:

  1. Activate search_logs tool
  2. Create FAQ entries (see Knowledge Management)
  3. In instructions: "ALWAYS use search_logs first"

5. Limit Max Output Tokens

Problem: Long answers = high output costs

Solution:

In agent configuration:

  • Max Output Tokens: 500 (instead of 4,000)

Example:

Without limit:

Agent writes a 2,000-token answer
→ Cost: $0.0012

With limit (500):

Agent writes at most 500 tokens
→ Cost: $0.0003

Savings: 75%

Tip

For product advice, 300-500 tokens are usually completely sufficient!


6. Use Lower Temperature

What is Temperature?

Creativity setting:

  • 0.1-0.5: Consistent, predictable, more efficient
  • 0.6-1.0: Balanced
  • 1.1-2.0: Creative, but more tokens

Cost effect:

Low temperature (0.3):

Answer: "Yes, we have the jacket in size M in stock."
→ 11 tokens

High temperature (1.5):

Answer: "Gladly! I'm pleased to inform you that we indeed have this wonderful jacket in your desired size M in stock!"
→ 22 tokens

Savings: 40-60% fewer output tokens

Recommendation: Temperature 0.3-0.7 for e-commerce


7. Avoid High Reasoning Effort

Problem: high Reasoning Effort is expensive

Example (o4-mini):

Low Reasoning:      500 reasoning tokens
Medium Reasoning: 1,500 reasoning tokens
High Reasoning:   5,000 reasoning tokens

Cost (o4-mini, $0.012 per 1,000 reasoning tokens):

Low:    $0.006
Medium: $0.018
High:   $0.060

Savings: 90% by switching from high → low

Recommendation:

  • Standard: low or medium
  • Only for very complex tasks: high
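
Using the o4-mini reasoning-token price from the table above ($12 per 1M), the three effort levels compare as follows:

```python
REASONING_PRICE = 12.00 / 1_000_000  # o4-mini, $12 per 1M reasoning tokens

def reasoning_cost(reasoning_tokens: int) -> float:
    """Cost of the hidden reasoning tokens of one request, in USD."""
    return reasoning_tokens * REASONING_PRICE

for effort, tokens in [("low", 500), ("medium", 1500), ("high", 5000)]:
    print(f"{effort}: ${reasoning_cost(tokens):.3f}")
```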

8. End Threads Regularly

Problem: Long threads = large conversation history

Example:

Message 1:  2,500 input tokens
Message 2:  2,700 input tokens (+ history)
Message 3:  2,950 input tokens (+ history)
...
Message 20: 8,000 input tokens (+ history)

Solution:

End threads after:

  • Completion of a purchase
  • Resolution of a request
  • 10-15 messages

Frontend integration:

// End thread after successful order
if (orderCompleted) {
  createNewThread();
}

Savings: 40-60% for long conversations
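
Why ending threads helps can be sketched numerically. The growth figure below is a hypothetical average in line with the example above, not a measured value:

```python
BASE_INPUT = 2500      # instructions + tools + first message (example figure above)
GROWTH_PER_TURN = 250  # hypothetical average history growth per message

def thread_input_tokens(messages: int) -> int:
    """Total input tokens billed over a thread that keeps its full history."""
    return sum(BASE_INPUT + GROWTH_PER_TURN * i for i in range(messages))

# One 20-message thread vs. ending after 10 and starting a fresh thread:
print(thread_input_tokens(20))      # → 97500
print(2 * thread_input_tokens(10))  # → 72500 (about 26% less)
```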


9. Avoid Unnecessary Tool Calls

Problem: Tool calls increase output tokens

Example:

Bad:

Agent calls:
1. product_search (all red jackets)
2. product_search (all blue jackets)
3. product_search (all green jackets)
→ 3 tool calls = 180 output tokens

Better:

Agent calls:
1. product_search (all jackets)
→ 1 tool call = 60 output tokens

Solution:

In instructions:

Use tools efficiently. Call product_search only once and
then use the results. Avoid multiple calls for
similar requests.

Savings: 50-70% fewer tool calls


10. Set Budget Limits

Why?

Protect yourself from:

  • Unexpected costs
  • Misuse (if API key compromised)
  • Bugs (e.g., infinite loops)

Setup:

  1. Go to OpenAI Billing Settings
  2. Set Hard Limit (e.g., $10/month)
  3. Set Soft Limit (e.g., $5/month → email notification)

Example:

Soft Limit:  $5 → You receive an email warning
Hard Limit: $10 → API access is deactivated

Recommendation:

  • Small shop: $5-10/month
  • Medium shop: $20-50/month
  • Large shop: $100-200/month

📊 Monitor Costs

OpenAI Usage Dashboard

Access: platform.openai.com/usage

What you see:

  1. Daily view

    • Costs per day
    • Requests per day
    • Tokens per day
  2. Model breakdown

    • Which model is used how much?
    • Which model costs the most?
  3. Token details

    • Input tokens
    • Output tokens
    • Cached tokens
  4. Cost history

    • Chart of last 30 days
    • Identify trends

Shopware Plugin Logs

Access: 5E OAI Agent Manager → Assistant Logs

What you see:

  • Cost per conversation
  • Token usage per message
  • Average costs

Analysis:

  1. Filter by time period (e.g., last 7 days)
  2. Sort by "Cost" (most expensive first)
  3. Identify outliers

Questions:

  • Which conversations were particularly expensive?
  • Why? (too many tools? long conversation? wrong model?)
  • How can you optimize?

💡 Cost Scenarios

Scenario 1: Small Shop (50 conversations/day)

Setup:

  • Model: gpt-4o-mini
  • Tools: 10 tools
  • Average: 2,000 input, 300 output tokens
  • 50% cached

Calculation:

Per conversation:
Input (new): 1,000 × $0.00000015 = $0.00015
Input (cached): 1,000 × $0.00000008 = $0.00008
Output: 300 × $0.00000060 = $0.00018
────────────────────────────────────────────────
Total: $0.00041

Projection:

Per day:   50 × $0.00041 = $0.02
Per month: 1,500 conversations = $0.62
Per year: 18,000 conversations = $7.44

Budget recommendation: $5/month (enough buffer)


Scenario 2: Medium Shop (300 conversations/day)

Setup:

  • Model: gpt-4o-mini
  • Tools: 15 tools
  • Average: 2,500 input, 400 output tokens
  • 60% cached (thanks to knowledge management)

Calculation:

Per conversation:
Input (new): 1,000 × $0.00000015 = $0.00015
Input (cached): 1,500 × $0.00000008 = $0.00012
Output: 400 × $0.00000060 = $0.00024
────────────────────────────────────────────────
Total: $0.00051

Projection:

Per day:   300 × $0.00051 = $0.15
Per month: 9,000 conversations = $4.59
Per year: 108,000 conversations = $55.08

Budget recommendation: $10-20/month


Scenario 3: Large Shop (1,000 conversations/day)

Setup:

  • Model: gpt-4o-mini (90%), gpt-4o (10% for complex cases)
  • Tools: 20 tools
  • Average: 3,000 input, 500 output tokens
  • 70% cached

Calculation gpt-4o-mini (90%):

Per conversation:
Input (new): 900 × $0.00000015 = $0.00014
Input (cached): 2,100 × $0.00000008 = $0.00017
Output: 500 × $0.00000060 = $0.00030
────────────────────────────────────────────────
Total: $0.00061

Calculation gpt-4o (10%):

Per conversation:
Input (new):      900 × $0.0000025  = $0.00225
Input (cached): 2,100 × $0.00000125 = $0.00263
Output:           500 × $0.0000100  = $0.00500
────────────────────────────────────────────────
Total: $0.00988

Blended daily cost:

90% gpt-4o-mini: 900 × $0.00061 = $0.55
10% gpt-4o:      100 × $0.00988 = $0.99
────────────────────────────────────────────
Per day: $1.54

Projection:

Per month: 30,000 conversations = $46.20
Per year: 360,000 conversations = $554.40

Budget recommendation: $100-150/month (enough buffer for peaks)
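
The scenario arithmetic above can be generalized into a single function. It uses the gpt-4o-mini prices from this chapter; note that the worked examples round the cached per-token price slightly, so results may differ by a few percent:

```python
# gpt-4o-mini prices per token (January 2025)
P_IN, P_CACHED, P_OUT = 0.15e-6, 0.075e-6, 0.60e-6

def monthly_cost(conversations_per_day: int, input_tokens: int,
                 output_tokens: int, cached_share: float) -> float:
    """Monthly cost in USD (30 days) for a gpt-4o-mini agent with this traffic profile."""
    cached = input_tokens * cached_share
    fresh = input_tokens - cached
    per_conversation = fresh * P_IN + cached * P_CACHED + output_tokens * P_OUT
    return per_conversation * conversations_per_day * 30

# Scenario 2: 300 conversations/day, 2,500 input / 400 output tokens, 60% cached
print(round(monthly_cost(300, 2500, 400, 0.60), 2))  # ≈ 4.52 (chapter rounds to $4.59)
```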


🎯 ROI (Return on Investment)

Cost vs. Benefit

Cost example (medium shop):

  • OpenAI: $5/month
  • Plugin: one-time (no recurring costs)
  • Total: $5/month

Benefits:

  1. Customer service savings:

    • Assumption: Agent answers 50% of all inquiries
    • 150 inquiries/day × 50% = 75 automated inquiries
    • Time per inquiry: 5 minutes
    • Savings: 375 minutes/day = 6.25 hours
    • Employee cost: $20/hour
    • Savings: $125/day = $3,750/month
  2. Higher conversion:

    • Customers receive immediate help (24/7)
    • Assumption: better advice lifts the conversion rate by 2% relative (2.00% → 2.04%)
    • At 10,000 visitors/month, 2% conversion, $80 AOV:
    • Additional conversions: 10,000 × 0.02 × 0.02 = 4
    • Additional revenue: 4 × $80 = $320/month
  3. Total benefit:

    • Customer service savings: $3,750
    • Additional revenue: $320
    • Total: $4,070/month

ROI:

(Benefit - Cost) / Cost × 100
= (4,070 - 5) / 5 × 100
= 81,300%
Conclusion

OpenAI costs are negligible compared to the benefits!
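
The ROI formula above as a one-liner, using the example figures from this section:

```python
def roi_percent(monthly_benefit: float, monthly_cost: float) -> float:
    """Return on investment in percent: (benefit - cost) / cost × 100."""
    return (monthly_benefit - monthly_cost) / monthly_cost * 100

print(roi_percent(4070, 5))  # → 81300.0
```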


📋 Cost Optimization Checklist

  • Model: Use gpt-4o-mini instead of gpt-4o
  • Tools: Only needed tools activated (max. 10-15)
  • Instructions: Short and precise (< 500 tokens)
  • Logs: search_logs activated, FAQ entries created
  • Output: Max output tokens limited to 500
  • Temperature: Set to 0.3-0.7
  • Reasoning: low or medium (not high)
  • Threads: End regularly (after 10-15 messages)
  • Tool calls: Describe efficient use in instructions
  • Budget limit: Hard limit set in OpenAI
  • Monitoring: Check OpenAI usage dashboard weekly
  • Analysis: Identify most expensive conversations monthly

🆘 Troubleshooting Costs

Costs are Unexpectedly High

Diagnosis:

  1. Check OpenAI Usage Dashboard:

    • Which model is causing the costs?
    • Are there spikes on certain days?
  2. Check Shopware logs:

    • Sort by "Cost"
    • Which conversations were expensive?

Common causes:

Cause 1: Wrong Model

Problem: Accidentally gpt-4o instead of gpt-4o-mini
Solution: Change model in agent configuration
Savings: 90%

Cause 2: Too Many Tools

Problem: 30 tools activated
Solution: Reduce to 10-12
Savings: 40%

Cause 3: Very Long Instructions

Problem: 2,000 token instructions
Solution: Shorten to 300-500 tokens
Savings: 60%

Cause 4: No Caching Usage

Problem: 0% cached tokens
Cause: Instructions constantly changing
Solution: Stabilize instructions

Cause 5: Infinite Loops

Problem: Agent gets stuck in a repeated tool-call loop
Solution: Formulate instructions more clearly and prevent loops

Next Steps

You now know how to monitor and optimize costs!

➡️ Best Practices - More optimization tips

➡️ Troubleshooting - Solve common problems

➡️ Back to Main Documentation