DeNA LLM Study Part 5: Agent Design and Multi-Agent Orchestration
DeNA LLM Study Series finale. Practical guide to n8n workflows, agent design principles, multi-agent orchestration patterns, and memory management strategies.
Series: DeNA LLM Study (5/5 - Final)
- Part 1: LLM Fundamentals and 2025 AI Landscape
- Part 2: Structured Output and Multi-LLM Pipelines
- Part 3: Model Training Methodologies
- Part 4: RAG Architecture and Latest Trends
- Part 5: Agent Design and Multi-Agent Orchestration ← Current Article
Overview
This is the final installment of the DeNA LLM Study Series. Part 5 covers agent design and multi-agent orchestration using LLMs. Beyond simple prompt engineering, we’ll explore how to build autonomous agent systems and address cost, performance, and reliability challenges in production environments.
Part 5 Key Topics
- LLM Workflows with n8n - Building agents with no-code/low-code automation platforms
- Agent Design Principles - Core components and Self-Healing patterns
- Multi-Agent Orchestration - 6 patterns and framework comparison (LangGraph, AutoGen, CrewAI)
- Memory and State Management - MemGPT, A-MEM (Zettelkasten-based)
- Production Case Study - DeNA NOC Alert Agent
- Cost and Performance Optimization - Semantic caching, batching, SLM utilization
This article synthesizes DeNA’s official study materials with the latest research findings and production case studies.
1. LLM Workflows with n8n
What is n8n?
n8n is a no-code/low-code workflow automation platform. As of 2025, it supports 422+ integrations and provides specialized features for building LLM agents.
Key Features:
- Visual workflow builder
- LangChain, Ollama, and major LLM framework integrations
- Native ReAct Agent pattern support
- Self-hostable (data privacy guaranteed)
ReAct Agent Implementation
Example of implementing the ReAct (Reasoning and Acting) pattern in n8n:
```json
{
  "nodes": [
    {
      "type": "n8n-nodes-langchain.agent",
      "name": "ReAct Agent",
      "parameters": {
        "agentType": "react",
        "systemMessage": "You are a data analysis expert. Analyze user questions and select appropriate tools to answer.",
        "tools": ["webSearch", "calculator", "database"]
      }
    }
  ]
}
```
2025 Trend: Orchestration > Full Autonomy
According to DeNA study materials and recent research, the key trend for agent systems in 2025 is a shift from “full autonomy” to “orchestration”.
Reasons:
- Cost Explosion: Unlimited API calls from autonomous agents
- Unpredictability: Difficulty controlling agent behavior
- Reliability Issues: Instability in production environments
Workflow tools like n8n are gaining attention precisely because they provide explicit orchestration.
2. Agent Design Principles
Core Components
LLM agents consist of four core components:
```mermaid
graph LR
    Agent[AI Agent] --> Core[1. Core]
    Agent --> Memory[2. Memory]
    Agent --> Planning[3. Planning]
    Agent --> Tools[4. Tool Use]
    Core --> Prompt[Prompt Template]
    Core --> LLM[LLM Engine]
    Core --> Parser[Output Parser]
    Memory --> ShortTerm[Short-term<br/>Conversation History]
    Memory --> LongTerm[Long-term<br/>Knowledge Base]
    Planning --> ReAct[ReAct]
    Planning --> PlanExecute[Plan-and-Execute]
    Tools --> FunctionCall[Function Calling]
    Tools --> ToolRegistry[Tool Registry]
    style Core fill:#7B2CBF,color:#fff
    style Memory fill:#0066CC,color:#fff
    style Planning fill:#00A896,color:#fff
    style Tools fill:#F77F00,color:#fff
```
1. Core
The central engine of the agent.
Components:
- Prompt Template: System message, persona definition
- LLM Engine: Claude, GPT-4, Gemini, etc.
- Output Parser: Converts LLM output to structured data
2. Memory
The agent’s memory system.
Short-term Memory:
- Current conversation session history
- Typically the last N messages (N = 5–10)
- Directly included in Context Window
Long-term Memory:
- Persistent knowledge base
- Vector Database (Pinecone, Weaviate, etc.)
- Retrieved via RAG pattern when needed
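The two tiers above can be sketched as a single memory module. This is a minimal illustration, not a real library API: a ring buffer holds the short-term window, evicted messages are archived, and a trivial keyword match stands in for vector-similarity retrieval.

```typescript
// Two-tier agent memory sketch. The keyword-based retrieve() is a
// stand-in for a real vector DB similarity search.
type Message = { role: "user" | "assistant"; content: string };

class AgentMemory {
  private shortTerm: Message[] = [];
  private longTerm: string[] = []; // stand-in for a vector DB
  constructor(private maxShortTerm = 10) {} // keep last N messages

  addMessage(msg: Message): void {
    this.shortTerm.push(msg);
    if (this.shortTerm.length > this.maxShortTerm) {
      // Evicted messages are archived to long-term storage
      const evicted = this.shortTerm.shift()!;
      this.longTerm.push(evicted.content);
    }
  }

  // RAG-style pull: fetch archived facts relevant to the query
  retrieve(query: string): string[] {
    const terms = query.toLowerCase().split(/\s+/);
    return this.longTerm.filter((fact) =>
      terms.some((t) => fact.toLowerCase().includes(t))
    );
  }

  contextWindow(): Message[] {
    return [...this.shortTerm];
  }
}
```

In a production system, `retrieve` would embed the query and search a vector store; the eviction-to-archive flow stays the same.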
3. Planning
The agent’s strategy for executing complex tasks.
ReAct Pattern:
```text
Thought: User requested company revenue data.
Action: query_db
Action Input: SELECT revenue FROM sales WHERE year=2024
Observation: [Result: $1.5M]
Thought: Need to compare with last year.
Action: query_db
Action Input: SELECT revenue FROM sales WHERE year=2023
Observation: [Result: $1.2M]
Thought: Need to calculate growth rate.
Action: calculate
Action Input: ((1.5 - 1.2) / 1.2) * 100
Observation: 25%
Final Answer: 2024 revenue is $1.5M, up 25% from previous year.
```
4. Tool Use
Mechanism for LLM to interact with external tools.
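A minimal tool-registry sketch illustrates the mechanism. Everything here is illustrative (the `ToolDef` shape and validation rules are not from any specific framework): tools are registered by name, and each call is checked before execution.

```typescript
// Illustrative tool registry with input validation — not a real API.
type ToolDef = {
  name: string;
  required: string[]; // required parameter names
  run: (input: Record<string, unknown>) => unknown;
};

class ToolRegistry {
  private tools = new Map<string, ToolDef>();

  register(tool: ToolDef): void {
    this.tools.set(tool.name, tool);
  }

  // Reject hallucinated tool names and missing required parameters
  // before anything executes.
  call(name: string, input: Record<string, unknown>): unknown {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    for (const field of tool.required) {
      if (!(field in input)) throw new Error(`Missing parameter: ${field}`);
    }
    return tool.run(input);
  }
}
```

The validation layer is what makes the reliability issues discussed next recoverable: a rejected call produces a clear error the agent can react to, rather than a silent failure.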
Function Calling Reliability Issues
Critical issue highlighted in DeNA study materials: LLM function calling is not 100% reliable.
Problem Cases:
- Incorrect Parameters: Missing required fields, type mismatches
- Hallucination: Calling non-existent tools
- Infinite Loops: Repeatedly calling the same tool
Self-Healing Pattern
Mechanism for agents to automatically recover from errors.
```typescript
class SelfHealingAgent {
  async execute(task: Task): Promise<Result> {
    const maxRetries = 3;
    let attempt = 0;
    while (attempt < maxRetries) {
      try {
        const result = await this.runTask(task);
        return result;
      } catch (error) {
        attempt++;
        // Analyze error
        const analysis = await this.analyzeError(error);
        // Select recovery strategy
        if (analysis.recoverable) {
          task = await this.adjustTask(task, analysis);
          console.log(`Retry ${attempt}: ${analysis.suggestion}`);
        } else {
          throw new UnrecoverableError(error);
        }
      }
    }
    throw new Error("Max retries exceeded");
  }
}
```
3. Multi-Agent Orchestration
6 Orchestration Patterns
Patterns for distributed processing of complex tasks across multiple agents.
1. Sequential
Linear structure where one agent’s output becomes the next agent’s input.
Use Cases:
- Blog post creation: Research → Draft → Edit → Publish
- Data pipeline: Collect → Clean → Analyze → Visualize
Pros:
- Simple implementation
- Easy debugging
- Predictable costs
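The Sequential pattern reduces to a simple fold over stages. In this sketch, plain string transforms stand in for LLM-backed agents:

```typescript
// Sequential pattern: each stage's output is the next stage's input.
// Stages here are stubs standing in for real agent calls.
type Stage = (input: string) => Promise<string>;

async function runSequential(stages: Stage[], input: string): Promise<string> {
  let current = input;
  for (const stage of stages) {
    current = await stage(current); // output feeds the next agent
  }
  return current;
}
```

Because exactly one call runs per stage, both cost and latency are the sum over stages — which is what makes this pattern so predictable.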
2. Parallel
Structure where multiple agents work independently and simultaneously.
Use Cases:
- Content review: Quality check + Legal review + Fact checking in parallel
- Multimodal analysis: Text + Image + Audio parallel processing
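Since the reviewers are independent, the Parallel pattern is essentially `Promise.all` over agent calls. The reviewer functions below are stubs:

```typescript
// Parallel pattern: independent reviewer agents run concurrently.
type Reviewer = (doc: string) => Promise<string>;

async function runParallel(reviewers: Reviewer[], doc: string): Promise<string[]> {
  // All reviewers start immediately; total latency ≈ the slowest one
  return Promise.all(reviewers.map((r) => r(doc)));
}
```

Total cost equals the Sequential case (same number of calls), but wall-clock time drops to the slowest branch — matching the 90 s → 30 s figures later in this article.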
3. Supervisor
Structure where a central supervisor distributes tasks and integrates results.
Use Cases:
- Complex research: Supervisor distributes sub-topics to multiple workers
- Code generation: Supervisor assigns module implementations to workers
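The Supervisor pattern adds a decompose step before, and an integrate step after, a parallel worker phase. Both the planner and the worker below are stand-ins for LLM-backed agents:

```typescript
// Supervisor pattern: plan → distribute to workers → integrate.
async function supervise(
  topic: string,
  plan: (t: string) => string[],            // supervisor: decompose task
  worker: (sub: string) => Promise<string>, // worker: solve one sub-task
): Promise<string> {
  const subtasks = plan(topic);
  const results = await Promise.all(subtasks.map(worker));
  return results.join("\n");                // supervisor: integrate
}
```

Note the extra supervisor calls (plan + integrate) on top of the worker calls — the source of the higher call count in the cost table below.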
4. Hierarchical
Multiple levels of supervisor-worker relationships forming a tree structure.
Use Cases:
- Large-scale project management: PM → Team Leaders → Developers
- Complex system design: Architect → Module Designers → Implementers
5. Network
Structure where agents communicate freely in a P2P manner.
Use Cases:
- Creative collaboration: Idea brainstorming
- Democratic decision-making: Vote-based consensus
6. Custom
Unique patterns optimized for specific problems.
Framework Comparison: LangGraph vs AutoGen vs CrewAI
Comparison of the three major multi-agent frameworks.
| Feature | LangGraph | AutoGen | CrewAI |
|---|---|---|---|
| Core Concept | Graph-based workflow | Conversation-based agents | Role-based teams |
| State Management | Explicit state graph | Conversation history | Built-in memory |
| Learning Curve | Medium | High | Low |
| Production Readiness | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
LangGraph
Philosophy: Express everything as graphs
Pros:
- Clear control flow: All paths explicitly defined in graph
- Easy debugging: State tracking possible
- Production stability: Predictable behavior
AutoGen
Philosophy: Problem-solving through agent conversation
Pros:
- Natural collaboration: Mimics human team conversations
- Flexibility: Dynamic conversation flow
- Emergent behavior: Unexpected problem-solving
Cons:
- Cost explosion risk (unlimited conversations)
- Unpredictable
- Difficult debugging
CrewAI
Philosophy: Role-based team composition
Pros:
- Intuitive: Role concept is easy to understand
- Rapid prototyping: Implementation with minimal code
- Built-in memory: Automatic context management
Cons:
- Insufficient logging (debugging difficulties)
- Limitations with complex workflows
- Difficult fine-grained control
Cost Impact of Pattern Selection
Analysis of how pattern selection affects costs in real projects.
Scenario: Blog post generation (Research + Writing + Editing)
| Pattern | API Calls | Expected Cost | Processing Time |
|---|---|---|---|
| Sequential | 3 calls | $0.15 | 90 sec |
| Parallel | 3 calls (simultaneous) | $0.15 | 30 sec |
| Supervisor | 7 calls (supervisor 2 + workers 3 + integration 2) | $0.35 | 60 sec |
| Network (AutoGen) | 15–50 calls (conversation) | $0.75–$2.50 | 120–300 sec |
Production Recommendations:
- Clear workflow exists → Sequential or Parallel
- Dynamic task distribution needed → Supervisor
- Creative collaboration needed → Network (but set cost limits!)
4. Memory and State Management
MemGPT Pattern
MemGPT is an innovative approach applying OS virtual memory concepts to LLMs.
Core Idea:
- Main Context (Main Memory): LLM’s Context Window
- External Storage: Vector DB, Relational DB
- Memory Manager: Swap in/out based on importance
Push vs Pull Hybrid
MemGPT combines two memory strategies.
Push (Active):
- LLM automatically saves information it deems important
- Example: “This user prefers TypeScript” → Save
Pull (Reactive):
- Search external storage when needed
- Example: User says “consider my preferences” → Search
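A toy sketch of the hybrid, under the assumption that a simple importance flag triggers the push and a substring match stands in for semantic search on the pull side:

```typescript
// MemGPT-style push/pull sketch. Real systems would score importance
// with the LLM and pull via vector similarity; both are simplified here.
class HybridMemory {
  private store: string[] = [];

  // Push: the agent proactively saves facts it marked as important
  push(fact: string, important: boolean): void {
    if (important) this.store.push(fact);
  }

  // Pull: search external storage only when the task needs it
  pull(query: string): string[] {
    return this.store.filter((f) =>
      f.toLowerCase().includes(query.toLowerCase())
    );
  }
}
```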
Hierarchical Memory Structure
MemGPT proposes a 3-tier memory hierarchy.
```text
L1: Working Memory (Context Window)
├─ Current conversation (5–10 messages)
├─ Active task state
└─ System prompt
L2: Recent Memory (Short-term storage)
├─ Recent sessions (1 week)
├─ Frequently referenced information
└─ Temporary task data
L3: Long-term Memory (Long-term storage)
├─ User profile
├─ Domain knowledge
└─ Accumulated learning data
```
A-MEM (Zettelkasten-based)
A-MEM is a memory system proposed by researchers at Rutgers University in 2025. It applies the Zettelkasten (German for "slip box") methodology to LLM agents.
What is Zettelkasten?
- Note organization method developed by Niklas Luhmann (sociologist)
- Assign unique IDs to each note
- Build knowledge network through note connections (links)
- Generate emergent insights
A-MEM Architecture
The core of A-MEM is that agents organize memory themselves.
Implementation Example:
```typescript
class AMem {
  notes: Map<string, Note>;
  graph: Graph;

  async createNote(content: string, metadata: Metadata): Promise<string> {
    const noteId = generateId();
    const note = new Note(noteId, content, metadata);
    // Auto-tagging
    const tags = await this.extractTags(content);
    note.tags = tags;
    // Calculate similarity with existing notes
    const similar = await this.findSimilarNotes(note);
    // Auto-generate connections (similarity > 0.7)
    for (const [relatedNote, similarity] of similar) {
      if (similarity > 0.7) {
        this.linkNotes(noteId, relatedNote.id, {
          type: "related",
          strength: similarity,
        });
      }
    }
    this.notes.set(noteId, note);
    return noteId;
  }
}
```
A-MEM Benefits:
- Dynamic organization: No manual structuring needed
- Relevance-based search: Direct matching + indirect connections
- Emergent insights: Discover new connections between notes
- Scalability: Efficiency maintained as knowledge grows
5. Production Case Study: DeNA NOC Alert Agent
Real production deployment of NOC (Network Operations Center) Alert Agent at DeNA.
Problem Definition
Background:
- Operations team receives 100〜200 alerts daily
- 70% of alerts are false positives
- Engineers manually classify and respond to alerts
Goals:
- Automatic alert classification and prioritization
- False positive filtering
- Automatic response guide generation
Workflow Design
```mermaid
graph TD
    Alert[Alert Generated] --> Classifier[Classification Agent]
    Classifier --> |Urgent| Escalate[Escalation]
    Classifier --> |Normal| Analyzer[Analysis Agent]
    Classifier --> |False Positive| Archive[Archive]
    Analyzer --> Context[Context Collection<br/>- Logs<br/>- Metrics<br/>- History]
    Context --> RCA[Root Cause Analysis]
    RCA --> Guide[Response Guide Generation]
    Guide --> Ticket[Ticket Creation]
    Escalate --> OnCall[On-call Engineer Alert]
    style Classifier fill:#F77F00,color:#fff
    style Analyzer fill:#0066CC,color:#fff
    style RCA fill:#7B2CBF,color:#fff
```
Production Deployment Considerations
Problems discovered during actual deployment and their solutions.
1. Hallucination Problem
Issue: LLM mentions non-existent logs or metrics
Solution:
```typescript
// Tool call result validation
class ToolExecutor {
  async execute(toolName: string, input: any): Promise<any> {
    const tool = this.registry.get(toolName);
    // Input validation
    if (!this.validateInput(tool, input)) {
      return {
        error: "Invalid input",
        suggestion: "Please check the tool documentation",
      };
    }
    // Execute
    const result = await tool.execute(input);
    // Output validation
    if (this.isEmpty(result)) {
      return {
        error: "No data found",
        suggestion: "Try different search parameters",
      };
    }
    return result;
  }
}
```
2. Latency Problem
Issue: Alert → Response takes 45 seconds on average (Target: 10 seconds)
Solutions:
- Parallel processing: Collect logs/metrics/history simultaneously
- Caching: Cache frequently used query results
- Streaming: Display partial results immediately
3. Cost Problem
Issue: 200 alerts/day × $0.20 = $40/day ($1,200/month)
Solutions:
- False positive pre-filter: Filter obvious false positives with rules first
- Batching: Process similar alerts together
- SLM utilization: Use small models for simple classification
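The rule-based pre-filter is the cheapest of these levers: a few regexes catch obvious false positives before any LLM call happens. The rules below are purely illustrative, not DeNA's actual rule set:

```typescript
// Illustrative rule-based pre-filter: catch obvious false positives
// with cheap regex checks so no agent call is spent on them.
const FALSE_POSITIVE_RULES: RegExp[] = [
  /scheduled maintenance/i,
  /test alert/i,
];

// Returns true if the alert can be archived without invoking an agent
function isObviousFalsePositive(alertText: string): boolean {
  return FALSE_POSITIVE_RULES.some((rule) => rule.test(alertText));
}
```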
Results
After 6 months of operation:
- False positive filtering accuracy: 92%
- Response time reduction: 15 minutes → 3 minutes average
- Engineer burden reduction: 20 hours saved per week
- Monthly operating cost: $1,200 → $350 (post-optimization)
6. Cost and Performance Optimization
The biggest challenges for LLM agent systems are cost and latency. Four core optimization techniques:
1. Semantic Caching (Up to 90% Cost Reduction)
Concept: Reuse cached responses for semantically similar queries
```typescript
// Semantic caching implementation
class SemanticCache {
  private cache: Map<string, CacheEntry> = new Map();
  private embeddings: EmbeddingModel;

  async get(query: string): Promise<string | null> {
    // Query embedding
    const queryEmbedding = await this.embeddings.encode(query);
    // Similarity search
    for (const [cachedQuery, entry] of this.cache) {
      const similarity = cosineSimilarity(queryEmbedding, entry.embedding);
      // Cache hit if similarity > 0.95
      if (similarity > 0.95) {
        console.log(`Cache hit! (similarity: ${similarity})`);
        return entry.response;
      }
    }
    return null;
  }
}
```
Impact:
- 60% cache hit rate → 60% cost reduction
- 95% latency reduction (network delay eliminated)
2. Batching (50% Reduction)
Concept: Bundle multiple requests for processing at once
Impact:
- Batch size of 10 → Approximately 50% cost reduction
- However, latency slightly increases (wait time)
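The accumulate-then-flush mechanics can be sketched in a few lines. The batch handler here is a stub for a real batched LLM API call:

```typescript
// Batching sketch: requests accumulate until the batch is full, then
// are processed in a single call. handler() stands in for a batched
// LLM API request.
class Batcher {
  private queue: string[] = [];
  constructor(
    private batchSize: number,
    private handler: (batch: string[]) => string[],
  ) {}

  // Returns results when the batch flushes, or null while accumulating
  add(request: string): string[] | null {
    this.queue.push(request);
    if (this.queue.length < this.batchSize) return null;
    const batch = this.queue;
    this.queue = [];
    return this.handler(batch); // one call for the whole batch
  }
}
```

A production batcher would also flush on a timer so requests never wait indefinitely — that timeout is the latency trade-off mentioned above.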
3. SLM (Small Language Model, Up to 14x Cheaper)
Concept: Use smaller models for simple tasks
```typescript
// Model routing
class ModelRouter {
  private smallModel: SLM; // e.g., Llama 3.1 8B
  private largeModel: LLM; // e.g., Claude Sonnet 4

  async route(task: Task): Promise<Response> {
    const complexity = this.assessComplexity(task);
    if (complexity < 0.3) {
      // Simple task: SLM ($0.001)
      return await this.smallModel.execute(task);
    } else {
      // Complex task: Large Model ($0.014)
      return await this.largeModel.execute(task);
    }
  }
}
```
Impact:
- If 70% of tasks can be handled by SLM
- Cost: 70% × $0.001 + 30% × $0.014 = $0.0049 (average)
- Large Model only: $0.014
- Reduction rate: 65% (approximately 14x cheaper with SLM alone)
4. Quantization
Concept: Reduce model weight precision to decrease size and cost
| Quantization Level | Model Size | Accuracy Loss | Inference Speed | Use Case |
|---|---|---|---|---|
| FP16 (Original) | 16GB | 0% | 1x | Benchmark baseline |
| 8bit | 8GB | ~1% | 1.5x | Production (accuracy important) |
| 4bit | 4GB | ~3% | 2x | Local execution, experiments |
| 2bit | 2GB | ~10% | 3x | Prototypes, demos |
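The model sizes in the table follow directly from parameters × bits per weight. A quick sanity check, assuming an 8B-parameter model (which matches the 16 GB FP16 row):

```typescript
// Back-of-envelope model size: parameters × bits / 8 bytes, in
// decimal GB. Ignores activation memory and quantization overhead.
function modelSizeGB(params: number, bits: number): number {
  return (params * bits) / 8 / 1e9; // bits → bytes → GB
}
```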
Comprehensive Cost Optimization Strategy
Real-world case combining all four techniques.
Before (Pre-optimization):
```typescript
// Process all requests with Claude Sonnet 4
const response = await claude.generate(query);
// Cost: $0.014 per request
// Latency: 2 seconds
```
After (Post-optimization):
```typescript
async function optimizedQuery(query: string): Promise<string> {
  // 1. Semantic caching (60% hit rate)
  const cached = await cache.get(query);
  if (cached) return cached; // Cost: $0, Latency: 50ms

  // 2. Complexity assessment and model routing
  const complexity = assessComplexity(query);
  let response: string;
  if (complexity < 0.3) {
    // 3. SLM usage (70% of requests)
    response = await smallModel.generate(query);
    // Cost: $0.001, Latency: 500ms
  } else {
    // 4. Large Model (30% of requests)
    response = await largeModel.generate(query);
    // Cost: $0.014, Latency: 2 seconds
  }

  // Cache save
  await cache.set(query, response);
  return response;
}
```
Cost Calculation:
```text
Cache hit (60%): $0 × 0.6 = $0
Cache miss:
- SLM (28%): $0.001 × 0.28 = $0.00028
- Large (12%): $0.014 × 0.12 = $0.00168
Average cost: $0.00196 per request
Reduction rate: 86% (Before $0.014 → After $0.00196)
```
Key Insights and Reflections
Core insights gained from completing DeNA LLM Study Part 5.
1. “Orchestration” Over “Full Autonomy”
This is the core of the 2025 trend. Rather than letting agents decide everything autonomously, enabling autonomy within clear workflows is more effective in production.
2. Memory Creates True Agent Intelligence
Advanced memory systems like MemGPT and A-MEM transform agents from simple “prompt executors” to “learning and evolving systems”.
3. Choose Multi-Agent Patterns Based on Problem
There is no “one-size-fits-all” among the six patterns. Each has clear pros and cons.
4. Cost Optimization is Essential, Not Optional
The biggest barrier to LLM agent systems in production is cost.
Core Strategies:
- Semantic caching - Apply to all systems (60% hit rate alone yields significant benefits)
- SLM routing - Simple tasks (70%) with small models
- Batching - Bundle tasks that don’t need real-time processing
- Pre-filtering - Filter obvious cases with rules
5. Production Agents Need “Self-Healing”
LLMs are probabilistic systems. 100% accurate responses are impossible.
Series Recap
We’ve completed DeNA LLM Study Parts 1–5. Let’s look back at the entire learning journey.
Insights from the Entire Series
- LLMs are tools: Not omnipotent. Proper usage for the problem matters.
- Prompts > Fine-tuning: 90% of problems can be solved with prompts.
- RAG is essential: For latest information and domain knowledge, RAG is the answer.
- Agents need memory: Adding memory to stateless LLMs creates true agents.
- Cost optimization from design phase: Optimizing later costs 10x more.
References
DeNA Official Materials
- DeNA Tech Blog
- DeNA LLM Study Presentation (Internal materials, 2024)
DeNA LLM Study Series Complete! We hope this series helps your LLM agent development journey.