Cost Management - Monitor & Optimize

In this chapter, you'll learn how to understand, monitor and optimize costs for OpenAI usage.


💰 How Does Billing Work?

Pay-as-you-go Model

OpenAI bills on a Pay-as-you-go principle:

  • You only pay for what you actually use
  • No base fee, no subscription costs
  • Billing is per token

What are Tokens?

Tokens are the smallest units that OpenAI processes:

  • 1 Token ≈ 4 characters
  • 1 Token ≈ 0.75 words (in English)

Examples:

"Hello" = 2 tokens
"How are you?" = 4 tokens
"I'm looking for a red jacket in size M" = 10 tokens

Online counting tool: OpenAI Tokenizer
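
The rules of thumb above can be turned into a rough estimator. This is a sketch only; the OpenAI Tokenizer (or a tokenizer library) gives exact counts per model:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    # Approximation only - actual tokenization depends on the model's vocabulary.
    return max(1, round(len(text) / 4))

print(estimate_tokens("I'm looking for a red jacket in size M"))  # → 10
```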


📊 Pricing Overview (as of January 2025)

gpt-4o-mini (best price-performance ratio)

Token Type           Price per 1M Tokens   Price per 1,000 Tokens
Input Tokens         $0.15                 $0.00015
Output Tokens        $0.60                 $0.00060
Cached Input Tokens  $0.075                $0.000075

gpt-4o (For complex tasks)

Token Type           Price per 1M Tokens   Price per 1,000 Tokens
Input Tokens         $2.50                 $0.0025
Output Tokens        $10.00                $0.0100
Cached Input Tokens  $1.25                 $0.00125

gpt-5 (Newest model, very expensive)

Token Type           Price per 1M Tokens   Price per 1,000 Tokens
Input Tokens         $10.00                $0.010
Output Tokens        $30.00                $0.030
Cached Input Tokens  $5.00                 $0.005

o4-mini (With reasoning, variable)

Token Type           Price per 1M Tokens   Price per 1,000 Tokens
Input Tokens         $3.00                 $0.003
Reasoning Tokens     $12.00                $0.012
Output Tokens        $12.00                $0.012
Cached Input Tokens  $1.50                 $0.0015
Tip

gpt-4o-mini offers the best price-performance ratio for 90% of e-commerce applications!


🧮 Understanding Cost Calculations

What is Billed?

Each API request consists of:

1. Input Tokens

Everything sent to OpenAI:

  • ✅ User message (e.g., "Show me red jackets")
  • ✅ System Instructions (your agent instructions)
  • ✅ Tool descriptions (all activated tools)
  • ✅ Conversation history (previous messages in thread)
  • ✅ Init Instructions (greeting)
  • ✅ Fallback Instructions (error handling)

Example calculation:

System Instructions:     800 tokens
Tool Descriptions:     1,200 tokens (20 tools)
Conversation History:    500 tokens
User Message:             15 tokens
─────────────────────────────────────
Total Input:           2,515 tokens

Cost (gpt-4o-mini):

2,515 tokens × $0.00000015 = $0.000377

2. Output Tokens

Everything OpenAI returns:

  • ✅ Agent responses
  • ✅ Tool calls (JSON structure)

Example:

Agent Response:     120 tokens
Tool Call (JSON):    45 tokens
─────────────────────────────
Total Output:       165 tokens

Cost (gpt-4o-mini):

165 tokens × $0.00000060 = $0.000099

3. Cached Input Tokens

OpenAI caches frequently used input parts:

  • ✅ System Instructions (usually stay the same)
  • ✅ Tool descriptions (rarely change)

Caching savings:

Without cache:

Input: 2,515 tokens × $0.00000015 = $0.000377

With cache (1,500 tokens cached):

New Input:    1,015 tokens × $0.00000015  = $0.000152
Cached Input: 1,500 tokens × $0.000000075 = $0.000113
────────────────────────────────────────────────────
Total: $0.000265

Savings: approx. 30% lower costs!

Prompt Caching

OpenAI automatically caches prompts longer than 1,024 tokens that repeat. You don't need to do anything – it works automatically!
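
To make the arithmetic above reproducible, the three billing components can be wrapped in a small helper. The constants are the gpt-4o-mini prices from the table in this chapter; adjust them if OpenAI changes its rates:

```python
# gpt-4o-mini prices per single token (January 2025 table in this chapter)
PRICE_INPUT = 0.15 / 1_000_000    # $0.15 per 1M input tokens
PRICE_CACHED = 0.075 / 1_000_000  # $0.075 per 1M cached input tokens
PRICE_OUTPUT = 0.60 / 1_000_000   # $0.60 per 1M output tokens

def request_cost(new_input: int, cached_input: int, output: int) -> float:
    """Cost of one API request in USD."""
    return (new_input * PRICE_INPUT
            + cached_input * PRICE_CACHED
            + output * PRICE_OUTPUT)

# Caching example from above: 1,015 new + 1,500 cached input tokens
print(f"${request_cost(1015, 1500, 0):.6f}")
# The same 2,515 input tokens without any caching:
print(f"${request_cost(2515, 0, 0):.6f}")
```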


📈 Example Cost Calculation

Scenario: Product Advisor Agent

Setup:

  • Model: gpt-4o-mini
  • System Instructions: 800 tokens
  • 20 tools activated: 1,200 tokens
  • Average conversation: 3 messages

Conversation:

Customer: "I'm looking for a winter jacket"
→ Input: 2,015 tokens (Instructions + Tools + Message)
→ Output: 180 tokens (Tool call + Answer)

Customer: "Do you have it in blue?"
→ Input: 2,520 tokens (+ History)
→ Output: 120 tokens

Customer: "Perfect, I'll take size M"
→ Input: 2,650 tokens (+ History)
→ Output: 95 tokens

Total Token Usage:

Type            Tokens   Price
Input (new)      3,200   $0.00048
Input (cached)   3,985   $0.00030
Output             395   $0.00024
Total            7,580   $0.00102

Per conversation: approx. 0.1 cents ($0.001)

Projection:

  • Per day (100 conversations): $0.10
  • Per month (3,000 conversations): $3.06
  • Per year (36,000 conversations): $36.72
Conclusion

With gpt-4o-mini, you can run thousands of conversations for just a few dollars per month!
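
The projection above is easy to script. A minimal sketch, assuming 30-day months and the per-conversation figure from this example:

```python
def project(cost_per_conversation: float, conversations_per_day: int) -> dict:
    """Project a per-conversation cost to daily, monthly and yearly totals (30-day months)."""
    daily = cost_per_conversation * conversations_per_day
    return {"day": round(daily, 2),
            "month": round(daily * 30, 2),
            "year": round(daily * 360, 2)}

print(project(0.00102, 100))  # → {'day': 0.1, 'month': 3.06, 'year': 36.72}
```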


📉 Optimize Costs - Top 10 Tips

1. Use gpt-4o-mini instead of gpt-4o

Savings: 90-95%

gpt-4o:       $0.0025 input / $0.0100 output per 1,000 tokens
gpt-4o-mini:  $0.00015 input / $0.0006 output per 1,000 tokens
─────────────────────────────────────────────────────────────
Savings: approx. 17x cheaper (≈ 94%)!

When to still use gpt-4o?

  • Very complex reasoning tasks
  • Multilingual, sophisticated conversations
  • Specialized knowledge required

Test: Try gpt-4o-mini first. In 90% of cases, it's completely sufficient!


2. Deactivate Unneeded Tools

Problem: Each tool increases input tokens

Example:

5 tools:    ~300 tokens
10 tools:   ~600 tokens
20 tools: ~1,200 tokens
30 tools: ~1,800 tokens

Solution:

Activate only tools your agent really needs.

Product advisor needs:

  • product_search
  • get_product_details
  • search_logs
  • ❌ NOT: get_order_status, create_order, send_email

Savings: 30-50% fewer input tokens
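
Based on the rough figures above (~60 tokens per tool description, an assumed average derived from 1,200 tokens for 20 tools; real descriptions vary), you can estimate what trimming the tool list saves on every single request:

```python
TOKENS_PER_TOOL = 60  # assumed average: ~1,200 tokens / 20 tools from the example above

def tool_overhead(active_tools: int) -> int:
    """Approximate input tokens that tool descriptions add to every request."""
    return active_tools * TOKENS_PER_TOOL

# Trimming 20 tools down to the 3 a product advisor actually needs:
print(tool_overhead(20) - tool_overhead(3))  # → 1020 tokens saved per request
```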


3. Shorten Instructions

Problem: Long instructions = high input costs

Bad (1,200 tokens):

You are a friendly, helpful product advisor for our online store.
Your task is to support customers in product selection, answer questions,
give recommendations and ensure that every customer finds the perfect
product. You should always be polite, patient and understanding.
Use the tools available to you to...
[another 800 tokens]

Good (300 tokens):

You are a product advisor for fashion. Tasks:
- Search products with product_search
- Get details with get_product_details
- Use search_logs first for common questions
- Short, precise answers
- When unclear: ask

Tone: Friendly, helpful, professional

Savings: 75% fewer tokens

Caution

Ensure instructions remain clear and precise!


4. Use the Log System

Why does this save costs?

Without logs:

Customer: "How long does shipping take?"
→ Agent calls get_shipping_info
→ Input: 2,500 tokens, Output: 200 tokens
→ Cost: $0.00050

With logs:

Customer: "How long does shipping take?"
→ Agent finds answer in search_logs
→ Input: 1,800 tokens, Output: 120 tokens
→ Cost: $0.00034

Savings: approx. 30% per request

With frequent questions: 50-70% savings!

Setup:

  1. Activate search_logs tool
  2. Create FAQ entries (see Knowledge Management)
  3. In instructions: "ALWAYS use search_logs first"

5. Limit Max Output Tokens

Problem: Long answers = high output costs

Solution:

In agent configuration:

  • Max Output Tokens: 500 (instead of 4,000)

Example:

Without limit:

Agent writes a 2,000-token answer
→ Cost: $0.0012

With limit (500):

Agent writes at most 500 tokens
→ Cost: $0.0003

Savings: 75%

Tip

For product advice, 300-500 tokens are usually completely sufficient!


6. Use Lower Temperature

What is Temperature?

Creativity setting:

  • 0.1-0.5: Consistent, predictable, more efficient
  • 0.6-1.0: Balanced
  • 1.1-2.0: Creative, but more tokens

Cost effect:

Low temperature (0.3):

Answer: "Yes, we have the jacket in size M in stock."
→ 11 tokens

High temperature (1.5):

Answer: "Gladly! I'm pleased to inform you that we indeed have this wonderful jacket in your desired size M in stock!"
→ 22 tokens

Savings: 40-60% fewer output tokens

Recommendation: Temperature 0.3-0.7 for e-commerce


7. Avoid High Reasoning Effort

Problem: high Reasoning Effort is expensive

Example (o4-mini):

Low Reasoning:      500 reasoning tokens
Medium Reasoning: 1,500 reasoning tokens
High Reasoning:   5,000 reasoning tokens

Cost (o4-mini, $0.012 per 1,000 reasoning tokens):

Low:    $0.006
Medium: $0.018
High:   $0.060

Savings: 90% by switching from high → low

Recommendation:

  • Standard: low or medium
  • Only for very complex tasks: high
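
Using the o4-mini reasoning-token price from the table above ($12 per 1M), the three effort levels compare as follows:

```python
REASONING_PRICE = 12.00 / 1_000_000  # o4-mini, $12 per 1M reasoning tokens

def reasoning_cost(reasoning_tokens: int) -> float:
    """Cost of the hidden reasoning tokens of one request, in USD."""
    return reasoning_tokens * REASONING_PRICE

for effort, tokens in [("low", 500), ("medium", 1500), ("high", 5000)]:
    print(f"{effort}: ${reasoning_cost(tokens):.3f}")
```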

8. End Threads Regularly

Problem: Long threads = large conversation history

Example:

Message 1:  2,500 input tokens
Message 2:  2,700 input tokens (+ history)
Message 3:  2,950 input tokens (+ history)
...
Message 20: 8,000 input tokens (+ history)

Solution:

End threads after:

  • Completion of a purchase
  • Resolution of a request
  • 10-15 messages

Frontend integration:

// End thread after successful order
if (orderCompleted) {
  createNewThread();
}

Savings: 40-60% for long conversations
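
Why ending threads helps can be sketched numerically. The growth figure below is a hypothetical average in line with the example above, not a measured value:

```python
BASE_INPUT = 2500      # instructions + tools + first message (example figure above)
GROWTH_PER_TURN = 250  # hypothetical average history growth per message

def thread_input_tokens(messages: int) -> int:
    """Total input tokens billed over a thread that keeps its full history."""
    return sum(BASE_INPUT + GROWTH_PER_TURN * i for i in range(messages))

# One 20-message thread vs. ending after 10 and starting a fresh thread:
print(thread_input_tokens(20))      # → 97500
print(2 * thread_input_tokens(10))  # → 72500 (about 26% less)
```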


9. Avoid Unnecessary Tool Calls

Problem: Tool calls increase output tokens

Example:

Bad:

Agent calls:
1. product_search (all red jackets)
2. product_search (all blue jackets)
3. product_search (all green jackets)
→ 3 tool calls = 180 output tokens

Better:

Agent calls:
1. product_search (all jackets)
→ 1 tool call = 60 output tokens

Solution:

In instructions:

Use tools efficiently. Call product_search only once and
then use the results. Avoid multiple calls for
similar requests.

Savings: 50-70% fewer tool calls


10. Set Budget Limits

Why?

Protect yourself from:

  • Unexpected costs
  • Misuse (if API key compromised)
  • Bugs (e.g., infinite loops)

Setup:

  1. Go to OpenAI Billing Settings
  2. Set Hard Limit (e.g., $10/month)
  3. Set Soft Limit (e.g., $5/month → email notification)

Example:

Soft Limit:  $5 → You receive an email warning
Hard Limit: $10 → API access is deactivated

Recommendation:

  • Small shop: $5-10/month
  • Medium shop: $20-50/month
  • Large shop: $100-200/month

📊 Monitor Costs

OpenAI Usage Dashboard

Access: platform.openai.com/usage

What you see:

  1. Daily view

    • Costs per day
    • Requests per day
    • Tokens per day
  2. Model breakdown

    • Which model is used how much?
    • Which model costs the most?
  3. Token details

    • Input tokens
    • Output tokens
    • Cached tokens
  4. Cost history

    • Chart of last 30 days
    • Identify trends

Shopware Plugin Logs

Access: 5E OAI Agent Manager → Assistant Logs

What you see:

  • Cost per conversation
  • Token usage per message
  • Average costs

Analysis:

  1. Filter by time period (e.g., last 7 days)
  2. Sort by "Cost" (most expensive first)
  3. Identify outliers

Questions:

  • Which conversations were particularly expensive?
  • Why? (too many tools? long conversation? wrong model?)
  • How can you optimize?

💡 Cost Scenarios

Scenario 1: Small Shop (50 conversations/day)

Setup:

  • Model: gpt-4o-mini
  • Tools: 10 tools
  • Average: 2,000 input, 300 output tokens
  • 50% cached

Calculation:

Per conversation:
Input (new): 1,000 × $0.00000015 = $0.00015
Input (cached): 1,000 × $0.00000008 = $0.00008
Output: 300 × $0.00000060 = $0.00018
────────────────────────────────────────────────
Total: $0.00041

Projection:

Per day:   50 × $0.00041 = $0.02
Per month: 1,500 conversations = $0.62
Per year: 18,000 conversations = $7.44

Budget recommendation: $5/month (enough buffer)


Scenario 2: Medium Shop (300 conversations/day)

Setup:

  • Model: gpt-4o-mini
  • Tools: 15 tools
  • Average: 2,500 input, 400 output tokens
  • 60% cached (thanks to knowledge management)

Calculation:

Per conversation:
Input (new): 1,000 × $0.00000015 = $0.00015
Input (cached): 1,500 × $0.00000008 = $0.00012
Output: 400 × $0.00000060 = $0.00024
────────────────────────────────────────────────
Total: $0.00051

Projection:

Per day:   300 × $0.00051 = $0.15
Per month: 9,000 conversations = $4.59
Per year: 108,000 conversations = $55.08

Budget recommendation: $10-20/month


Scenario 3: Large Shop (1,000 conversations/day)

Setup:

  • Model: gpt-4o-mini (90%), gpt-4o (10% for complex cases)
  • Tools: 20 tools
  • Average: 3,000 input, 500 output tokens
  • 70% cached

Calculation gpt-4o-mini (90%):

Per conversation:
Input (new): 900 × $0.00000015 = $0.00014
Input (cached): 2,100 × $0.00000008 = $0.00017
Output: 500 × $0.00000060 = $0.00030
────────────────────────────────────────────────
Total: $0.00061

Calculation gpt-4o (10%):

Per conversation:
Input (new):      900 × $0.0000025  = $0.00225
Input (cached): 2,100 × $0.00000125 = $0.00263
Output:           500 × $0.0000100  = $0.00500
────────────────────────────────────────────────
Total: $0.00988

Blended daily cost:

90% gpt-4o-mini: 900 × $0.00061 = $0.55
10% gpt-4o:      100 × $0.00988 = $0.99
────────────────────────────────────────────
Per day: $1.54

Projection:

Per month: 30,000 conversations = $46.20
Per year: 360,000 conversations = $554.40

Budget recommendation: $100-150/month (enough buffer for peaks)
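
The scenario arithmetic above can be generalized into a single function. It uses the gpt-4o-mini prices from this chapter; note that the worked examples round the cached per-token price slightly, so results may differ by a few percent:

```python
# gpt-4o-mini prices per token (January 2025)
P_IN, P_CACHED, P_OUT = 0.15e-6, 0.075e-6, 0.60e-6

def monthly_cost(conversations_per_day: int, input_tokens: int,
                 output_tokens: int, cached_share: float) -> float:
    """Monthly cost in USD (30 days) for a gpt-4o-mini agent with this traffic profile."""
    cached = input_tokens * cached_share
    fresh = input_tokens - cached
    per_conversation = fresh * P_IN + cached * P_CACHED + output_tokens * P_OUT
    return per_conversation * conversations_per_day * 30

# Scenario 2: 300 conversations/day, 2,500 input / 400 output tokens, 60% cached
print(round(monthly_cost(300, 2500, 400, 0.60), 2))  # ≈ 4.52 (chapter rounds to $4.59)
```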


🎯 ROI (Return on Investment)

Cost vs. Benefit

Cost example (medium shop):

  • OpenAI: $5/month
  • Plugin: one-time (no recurring costs)
  • Total: $5/month

Benefits:

  1. Customer service savings:

    • Assumption: Agent answers 50% of all inquiries
    • 150 inquiries/day × 50% = 75 automated inquiries
    • Time per inquiry: 5 minutes
    • Savings: 375 minutes/day = 6.25 hours
    • Employee cost: $20/hour
    • Savings: $125/day = $3,750/month
  2. Higher conversion:

    • Customers receive immediate help (24/7)
    • Assumption: better advice lifts the conversion rate by 2% relative (2.00% → 2.04%)
    • At 10,000 visitors/month, 2% conversion, $80 AOV:
    • Additional conversions: 10,000 × 0.02 × 0.02 = 4
    • Additional revenue: 4 × $80 = $320/month
  3. Total benefit:

    • Customer service savings: $3,750
    • Additional revenue: $320
    • Total: $4,070/month

ROI:

(Benefit - Cost) / Cost × 100
= (4,070 - 5) / 5 × 100
= 81,300%
Conclusion

OpenAI costs are negligible compared to the benefits!
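
The ROI formula above as a one-liner, using the example figures from this section:

```python
def roi_percent(monthly_benefit: float, monthly_cost: float) -> float:
    """Return on investment in percent: (benefit - cost) / cost × 100."""
    return (monthly_benefit - monthly_cost) / monthly_cost * 100

print(roi_percent(4070, 5))  # → 81300.0
```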


📋 Cost Optimization Checklist

  • Model: Use gpt-4o-mini instead of gpt-4o
  • Tools: Only needed tools activated (max. 10-15)
  • Instructions: Short and precise (< 500 tokens)
  • Logs: search_logs activated, FAQ entries created
  • Output: Max output tokens limited to 500
  • Temperature: Set to 0.3-0.7
  • Reasoning: low or medium (not high)
  • Threads: End regularly (after 10-15 messages)
  • Tool calls: Describe efficient use in instructions
  • Budget limit: Hard limit set in OpenAI
  • Monitoring: Check OpenAI usage dashboard weekly
  • Analysis: Identify most expensive conversations monthly

🆘 Troubleshooting Costs

Costs are Unexpectedly High

Diagnosis:

  1. Check OpenAI Usage Dashboard:

    • Which model is causing the costs?
    • Are there spikes on certain days?
  2. Check Shopware logs:

    • Sort by "Cost"
    • Which conversations were expensive?

Common causes:

Cause 1: Wrong Model

Problem: Accidentally gpt-4o instead of gpt-4o-mini
Solution: Change model in agent configuration
Savings: 90%

Cause 2: Too Many Tools

Problem: 30 tools activated
Solution: Reduce to 10-12
Savings: 40%

Cause 3: Very Long Instructions

Problem: 2,000 token instructions
Solution: Shorten to 300-500 tokens
Savings: 60%

Cause 4: No Caching Usage

Problem: 0% cached tokens
Cause: Instructions constantly changing
Solution: Stabilize instructions

Cause 5: Infinite Loops

Problem: Agent gets stuck in a repeated tool-call loop
Solution: Formulate instructions more clearly and prevent loops

Next Steps

You now know how to monitor and optimize costs!

➡️ Best Practices - More optimization tips

➡️ Troubleshooting - Solve common problems

➡️ Back to Main Documentation