Recommendation System Optimization: From 78K Tokens to Zero
How we eliminated 100% of token usage and reduced execution time by 99% in our content recommendation system through metadata-based algorithmic optimization.
Introduction: The Cost Snowball
It started with a simple addition to our blog: a related posts recommendation feature. Initially, it seemed straightforward. “Just show the LLM post contents and ask it to find similar articles.” However, generating recommendations for just 13 posts consumed 78,000 tokens and took 2.7 minutes.
What if we scaled to 30 posts? 180,000 tokens, about 6.5 minutes. 100 posts? Nearly 600,000 tokens and over 20 minutes. This system had no scalability.
This article documents our optimization journey that started with a goal of reducing token usage by 63% but ultimately achieved 100% token elimination, 99% execution time reduction, and complete cost zeroing.
Phase 1: Problem Discovery
Initial System: Content Preview Approach
The first implementation was intuitive:
// Step 1: Extract first 1000 characters from all posts
const posts = await getCollection('blog');
const previews = posts.map(post => ({
...post.data,
preview: post.body.substring(0, 1000)
}));
// Step 2: Request similarity analysis from LLM for each post
for (const sourcePost of posts) {
const prompt = `
Source Post:
Title: ${sourcePost.data.title}
Content: ${sourcePost.preview}
Candidate Posts (12):
${candidates.map(c => `
Title: ${c.title}
Content: ${c.preview}
`).join('\n')}
Recommend the 5 most similar posts.
`;
const recommendations = await llm.generate(prompt);
}
Performance Measurement Results
Token usage per post:
Input:
- Source post metadata: 100 tokens
- Source post 1000-char preview: 250 tokens
- 12 candidates × 350 tokens: 4,200 tokens
- Prompt template: 800 tokens
────────────────────────
Total input: 5,350 tokens
Output:
- JSON response (5 recommendations): 600 tokens
────────────────────────
Total: 5,950 tokens ≈ 6,000 tokens
For 13 posts total:
- Total tokens: 78,000
- Execution time: ~2.7 minutes
- Cost: $0.078 (Claude Sonnet 3.5)
Problem Identification
- Duplicate processing: Same post content sent multiple times (each time it appears as a candidate)
- Inefficient information utilization: Analyzing 1000 characters each time when only core topics are needed
- Scalability issues: Token usage grows O(n²) with n posts
Phase 2: First Optimization - Metadata-Based LLM Analysis
Core Idea: “Analyze Once, Recommend Forever”
What if we separated analyzing posts from generating recommendations? Extract core information from each post in advance, then use only this metadata when generating recommendations.
Metadata Structure Design
interface PostMetadata {
slug: string;
language: string;
pubDate: string;
title: string;
// Core: Compressed to 200-character summary
summary: string;
// Compressed to 5 main topics
mainTopics: string[];
// Compressed to 5 tech stack items
techStack: string[];
// Difficulty (1-5)
difficulty: number;
// 5 category scores (0.0-1.0)
categoryScores: {
automation: number;
'web-development': number;
'ai-ml': number;
devops: number;
architecture: number;
};
generatedAt: string;
contentHash: string;
}
Improved Recommendation Generation
Generate recommendations using only metadata:
// Step 1: Load metadata (already generated)
const metadata = JSON.parse(
await fs.readFile('post-metadata.json', 'utf-8')
).metadata;
// Step 2: Send only metadata (removed 1000-char preview!)
for (const slug in metadata) {
const source = metadata[slug];
const prompt = `
Source:
- Title: ${source.title}
- Summary: ${source.summary}
- Topics: ${source.mainTopics.join(', ')}
- Tech: ${source.techStack.join(', ')}
- Difficulty: ${source.difficulty}/5
- Categories: ${JSON.stringify(source.categoryScores)}
Candidates (12):
${candidates.map(c => `
- ${c.title}
Summary: ${c.summary}
Topics: ${c.mainTopics.join(', ')}
`).join('\n')}
`;
const recommendations = await llm.generate(prompt);
}
Performance Improvement Results
Token usage per post:
Input:
- Source post metadata: 82 tokens
- 12 candidates × 82 tokens: 984 tokens
- Prompt template: 500 tokens
────────────────────────
Total input: 1,566 tokens
Output:
- JSON response: 600 tokens
────────────────────────
Total: 2,166 tokens ≈ 2,200 tokens
For 13 posts total:
- Total tokens: 28,600 (vs 78,000)
- 63% token reduction achieved!
- Execution time: ~1.1 minutes (vs 2.7)
- 59% time reduction
Phase 3: Second Optimization - Korean-Only Analysis
Breakthrough: The Multilingual Secret
Our blog supports 3 languages: Korean (ko), English (en), and Japanese (ja). Each post exists in 3 languages:
src/content/blog/
├── ko/post-title.md
├── en/post-title.md
└── ja/post-title.md
Here’s the insight: The content is identical. Only the language differs!
When generating metadata, we don’t need to analyze all 39 files (13 posts × 3 languages). We only need to analyze 13 Korean versions.
Additional Reduction Effect
Metadata generation cost:
- Before: 39 files × 7,000 = 273,000 tokens
- After: 13 files × 7,000 = 91,000 tokens
- Saved: 182,000 tokens (67% additional reduction!)
Phase 4: Third Optimization - Complete Algorithmization
Fundamental Question: “Do We Really Need LLM?”
We paused to think. Metadata-based recommendation generation involves these tasks:
- Compare
mainTopicsof two posts → Calculate set similarity - Compare
techStack→ Calculate set intersection - Compare
categoryScores→ Calculate vector similarity - Compare
difficulty→ Calculate difference - Identify relationships (prerequisite/next-level) → Analyze difficulty gap
All of these are deterministic calculations. No LLM needed!
Multi-Dimensional Similarity Algorithm
function calculateSimilarity(
source: PostMetadata,
candidate: PostMetadata
): number {
// 1. Topic Similarity (35% weight) - Jaccard Index
const topicSim = jaccardSimilarity(
source.mainTopics,
candidate.mainTopics
);
// 2. Tech Stack Similarity (25%) - Jaccard Index
const techSim = jaccardSimilarity(
source.techStack,
candidate.techStack
);
// 3. Category Alignment (20%) - Cosine Similarity
const categorySim = cosineSimilarity(
getCategoryVector(source.categoryScores),
getCategoryVector(candidate.categoryScores)
);
// 4. Difficulty Match (10%) - Distance Penalty
const difficultyDiff = Math.abs(source.difficulty - candidate.difficulty);
const difficultySim = Math.max(0, 1 - difficultyDiff * 0.25);
// 5. Complementary Relationship (10%)
let complementarySim = 0.5; // Default
if (candidate.difficulty === source.difficulty + 1) {
complementarySim = 0.8; // Next level
} else if (candidate.difficulty === source.difficulty - 1) {
complementarySim = 0.7; // Prerequisite
}
// Calculate weighted final score
return (
topicSim * 0.35 +
techSim * 0.25 +
categorySim * 0.20 +
difficultySim * 0.10 +
complementarySim * 0.10
);
}
Similarity Function Implementation
// Jaccard Similarity: Set similarity
function jaccardSimilarity(setA: string[], setB: string[]): number {
const intersection = setA.filter(item => setB.includes(item));
const union = [...new Set([...setA, ...setB])];
return union.length === 0 ? 0 : intersection.length / union.length;
}
// Cosine Similarity: Vector similarity
function cosineSimilarity(vecA: number[], vecB: number[]): number {
const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
const magA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
const magB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
return magA * magB === 0 ? 0 : dotProduct / (magA * magB);
}
Execution Result: 0 Tokens, <1 Second
$ time node generate-recommendations.js
✓ Loaded metadata for 13 posts
✓ Generated recommendations for 13 posts
✓ Saved to recommendations.json
real 0m0.234s # Less than a second!
user 0m0.198s
sys 0m0.036s
Token usage: 0 API calls: 0 Cost: $0.00
Final Results: Beyond Expectations
Journey Summary
graph LR
A[Content Preview<br/>78,000 tokens<br/>2.7 min<br/>$0.078] -->|63% reduction| B[Metadata + LLM<br/>28,600 tokens<br/>1.1 min<br/>$0.029]
B -->|67% reduction| C[Ko-only Analysis<br/>91,000 tokens<br/>once only]
C -->|100% reduction| D[Algorithm Only<br/>0 tokens<br/>0.2 sec<br/>$0.00]
style D fill:#00ff00,stroke:#00aa00,stroke-width:3px
Performance Metrics
| Metric | Before | After | Improvement |
|---|---|---|---|
| Recommendation tokens | 78,000 | 0 | 100% |
| Execution time | 2.7 min | <1 sec | 99% |
| Cost per run | $0.078 | $0.00 | 100% |
| API calls | 13 | 0 | 100% |
| Metadata generation | 273,000 | 0 (manual) | 100% |
Quality Validation
Does the algorithmic approach match LLM quality? Sample comparison:
LLM recommendations (before):
- claude-code-web-automation (0.42)
- google-analytics-mcp-automation (0.38)
- ai-agent-notion-mcp-automation (0.38)
Algorithm recommendations (after):
- llm-blog-automation (0.42) ← Same score
- google-analytics-mcp-automation (0.38) ← Match
- claude-code-web-automation (0.38) ← Match
- ai-agent-notion-mcp-automation (0.38) ← Match
Match rate: 87% (4 out of 5 match, only order slightly differs)
The algorithm is actually more deterministic (always same results) and transparent (adjustable weights)!
Technical Insights
1. LLM vs Algorithm: When to Use What
Use LLM when:
- Natural language understanding is required (metadata generation)
- Context and nuance matter
- Creative output is needed
Use Algorithm when:
- Comparing structured data (metadata → recommendations)
- Deterministic results are required
- Fast response time is critical
- Cost minimization is essential
Our recommendation system:
- Analysis (LLM): Unstructured text → Structured metadata
- Recommendation (Algorithm): Structured metadata → Similarity scores
2. Metadata Quality is Key
Algorithm quality depends on metadata quality. We manually generated metadata:
Manual generation advantages:
- Accuracy guarantee (eliminates LLM mistakes)
- Consistency maintained (unified style)
- Zero initial cost (no token cost)
3. Power of Hybrid Architecture
flowchart TD
A[Raw Content] -->|Once, manual| B[Structured Metadata]
B -->|Every time, auto| C[Algorithm-based Recommendations]
C --> D[recommendations.json]
style B fill:#ffeb3b,stroke:#fbc02d
style C fill:#4caf50,stroke:#388e3c
Principle:
- High-cost work (LLM): Run once, cache results
- Low-cost work (Algorithm): Run every time, instant response
4. Alignment with 2024-2025 Industry Trends
Our approach aligns with latest industry trends:
Pre-computation strategy:
- Netflix: Model training (once) + Inference (every time)
- Google: Index building (periodic) + Query processing (realtime)
- Amazon: Product metadata generation (once) + Recommendations (realtime)
Decision Framework:
| Condition | LLM | Algorithm |
|---|---|---|
| Data structure | Unstructured | Structured |
| Response time | Seconds OK | Milliseconds needed |
| Result consistency | Variation OK | Consistency required |
| Cost sensitivity | Low | High |
| Scalability | Limited | Unlimited |
Practical Application Guide
Step 1: Design Metadata Schema
Design metadata structure for your domain:
// E-commerce example
interface ProductMetadata {
id: string;
category: string[]; // ["Electronics", "Smartphones"]
price: number; // 599.99
features: string[]; // ["5G", "OLED", "128GB"]
targetAudience: string[]; // ["Tech Enthusiast", "Professional"]
priceRange: 'budget' | 'mid' | 'premium';
}
// News example
interface ArticleMetadata {
id: string;
topics: string[]; // ["Technology", "AI", "Ethics"]
sentiment: number; // -1.0 ~ 1.0
readingLevel: number; // 1-5
keywords: string[]; // ["ChatGPT", "regulation", "EU"]
publicationDate: Date;
}
Step 2: Choose Similarity Algorithm
Select similarity function for your domain:
| Data Type | Recommended Algorithm | Use Case |
|---|---|---|
| Set | Jaccard Index | Topics, tags, tech stack |
| Vector | Cosine Similarity | Category scores, embeddings |
| Numeric | Distance Penalty | Price, difficulty, age |
| Date | Time Decay | Recency weighting |
Step 3: Tune Weights
Adjust weights for domain characteristics:
// E-commerce: Price is important
const ecommerceWeights = {
category: 0.25,
price: 0.35, // High weight
features: 0.20,
audience: 0.20
};
// News: Topics are important
const newsWeights = {
topics: 0.40, // High weight
sentiment: 0.15,
readingLevel: 0.10,
recency: 0.35 // Recency matters
};
// Technical Blog: Tech stack is important
const blogWeights = {
topics: 0.35,
techStack: 0.25, // High weight
category: 0.20,
difficulty: 0.10,
complementary: 0.10
};
Lessons and Future Outlook
1. Beyond “LLM for Everything”
LLMs are powerful but not omnipotent. Use the right tool for the right problem:
LLM overuse cases:
- Simple text search (regex is enough)
- Structured data comparison (SQL/algorithms are enough)
- Deterministic calculation (functions are enough)
LLM needed cases:
- Natural language understanding and generation
- Unstructured data analysis
- Creative tasks
2. Power of Structured Data
Structuring is key:
Unstructured Text (1000 chars)
↓
[LLM Analysis]
↓
Structured Metadata (200 chars)
↓
[Algorithm Processing]
↓
Recommendation Results
With structuring:
- Compression (1000 chars → 200 chars)
- Standardization (consistent format)
- Computability (algorithm applicable)
3. Pre-computation Power
“Pre-compute, query fast” is the golden rule of system design:
Examples:
- Search engines: Indexing (slow) + Query (fast)
- Recommendation systems: Model training (slow) + Inference (fast)
- Data analytics: ETL (slow) + Dashboard (fast)
Our case:
- Metadata generation (once, manual, 10 minutes)
- Recommendation generation (every time, auto, <1 second)
4. Value of Incremental Optimization
Don’t pursue perfection at once. Improve step by step:
Our journey:
- MVP: Content Preview (works but slow and expensive)
- V1: Metadata + LLM (63% improvement)
- V2: Ko-only (67% additional improvement)
- V3: Full Algorithm (100% token elimination)
At each step, we measured, validated, and found the next improvement.
Conclusion: Measurable Success
This project started with measurement and ended with measurement:
Start:
- 78,000 tokens, 2.7 minutes, $0.078
Goal:
- 63% token reduction, 59% time reduction
Result:
- 100% token elimination, 99% time reduction, complete cost zeroing
The results exceeded expectations. The key was applying the right technology to the right problem.
LLMs are amazing. But not every problem needs an LLM. Sometimes old, proven algorithms are the better answer.
Related Posts:
- Building an Intelligent Content Recommendation System with Claude LLM - Initial LLM-based recommendation system
- Blog Automation with LLM and Claude Code - Complete blog automation system
- Claude Code Best Practices - AI development productivity optimization
References:
Was this helpful?
Your support helps me create better content. Buy me a coffee! ☕