DeNA LLM Study Part 5: Agent Design and Multi-Agent Orchestration

DeNA LLM Study Series finale. Practical guide to n8n workflows, agent design principles, multi-agent orchestration patterns, and memory management strategies.

Series: DeNA LLM Study (5/5 - Final)

  1. Part 1: LLM Fundamentals and 2025 AI Landscape
  2. Part 2: Structured Output and Multi-LLM Pipelines
  3. Part 3: Model Training Methodologies
  4. Part 4: RAG Architecture and Latest Trends
  5. Part 5: Agent Design and Multi-Agent Orchestration ← Current Article

Overview

This is the final installment of the DeNA LLM Study Series. Part 5 covers agent design and multi-agent orchestration using LLMs. Beyond simple prompt engineering, we’ll explore how to build autonomous agent systems and address cost, performance, and reliability challenges in production environments.

Part 5 Key Topics

  1. LLM Workflows with n8n - Building agents with no-code/low-code automation platforms
  2. Agent Design Principles - Core components and Self-Healing patterns
  3. Multi-Agent Orchestration - 6 patterns and framework comparison (LangGraph, AutoGen, CrewAI)
  4. Memory and State Management - MemGPT, A-MEM (Zettelkasten-based)
  5. Production Case Study - DeNA NOC Alert Agent
  6. Cost and Performance Optimization - Semantic caching, batching, SLM utilization

This article synthesizes DeNA’s official study materials with the latest research findings and production case studies.

1. LLM Workflows with n8n

What is n8n?

n8n is a no-code/low-code workflow automation platform. As of 2025, it supports 422+ integrations and provides specialized features for building LLM agents.

Key Features:

  • Visual workflow builder
  • LangChain, Ollama, and major LLM framework integrations
  • Native ReAct Agent pattern support
  • Self-hostable (full control over your data)

ReAct Agent Implementation

Example of implementing the ReAct (Reasoning and Acting) pattern in n8n:

// n8n ReAct Agent workflow example
{
  "nodes": [
    {
      "type": "n8n-nodes-langchain.agent",
      "name": "ReAct Agent",
      "parameters": {
        "agentType": "react",
        "systemMessage": "You are a data analysis expert. Analyze user questions and select appropriate tools to answer.",
        "tools": ["webSearch", "calculator", "database"]
      }
    }
  ]
}

2025 Trend: Orchestration > Full Autonomy

According to DeNA study materials and recent research, the key trend for agent systems in 2025 is a shift from “full autonomy” to “orchestration”.

Reasons:

  1. Cost Explosion: Unlimited API calls from autonomous agents
  2. Unpredictability: Difficulty controlling agent behavior
  3. Reliability Issues: Instability in production environments

Workflow tools like n8n are gaining attention precisely because they provide explicit orchestration.
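
The cost-explosion risk above is one reason explicit orchestration wins in production: a workflow can enforce a hard budget per run instead of letting an autonomous agent call APIs without limit. A minimal sketch in TypeScript (`BudgetGuard` and the per-step costs are illustrative assumptions, not an n8n feature):

```typescript
// Sketch: a per-run budget guard for orchestrated agent steps.
// Names and numbers here are illustrative, not from any framework.
class BudgetGuard {
  private spent = 0;

  constructor(private limitUsd: number) {}

  // Record a step's cost; abort the run once it exceeds its budget.
  charge(stepCost: number): void {
    this.spent += stepCost;
    if (this.spent > this.limitUsd) {
      throw new Error(
        `Budget exceeded: $${this.spent.toFixed(4)} > $${this.limitUsd}`
      );
    }
  }

  get remaining(): number {
    return this.limitUsd - this.spent;
  }
}
```

In a workflow, each LLM node would call `charge()` before running, so a misbehaving loop fails fast instead of burning the monthly budget.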

2. Agent Design Principles

Core Components

LLM agents consist of four core components:

graph LR
    Agent[AI Agent] --> Core[1. Core]
    Agent --> Memory[2. Memory]
    Agent --> Planning[3. Planning]
    Agent --> Tools[4. Tool Use]

    Core --> Prompt[Prompt Template]
    Core --> LLM[LLM Engine]
    Core --> Parser[Output Parser]

    Memory --> ShortTerm[Short-term<br/>Conversation History]
    Memory --> LongTerm[Long-term<br/>Knowledge Base]

    Planning --> ReAct[ReAct]
    Planning --> PlanExecute[Plan-and-Execute]

    Tools --> FunctionCall[Function Calling]
    Tools --> ToolRegistry[Tool Registry]

    style Core fill:#7B2CBF,color:#fff
    style Memory fill:#0066CC,color:#fff
    style Planning fill:#00A896,color:#fff
    style Tools fill:#F77F00,color:#fff

1. Core

The central engine of the agent.

Components:

  • Prompt Template: System message, persona definition
  • LLM Engine: Claude, GPT-4, Gemini, etc.
  • Output Parser: Converts LLM output to structured data

2. Memory

The agent’s memory system.

Short-term Memory:

  • Current conversation session history
  • Typically the last N messages (N = 5–10)
  • Directly included in Context Window

Long-term Memory:

  • Persistent knowledge base
  • Vector Database (Pinecone, Weaviate, etc.)
  • Retrieved via RAG pattern when needed
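
The short-term side can be sketched as a simple sliding window over the conversation; `ShortTermMemory` below is an illustrative data structure, not a specific framework API (long-term retrieval follows the RAG pattern covered in Part 4):

```typescript
// Sketch: fixed-size short-term memory window for the context window.
// All names here are illustrative assumptions.
interface Message {
  role: "user" | "assistant";
  content: string;
}

class ShortTermMemory {
  private history: Message[] = [];

  constructor(private maxMessages: number = 10) {}

  add(msg: Message): void {
    this.history.push(msg);
    // Keep only the last N messages; older ones would move to long-term storage.
    if (this.history.length > this.maxMessages) {
      this.history = this.history.slice(-this.maxMessages);
    }
  }

  get context(): Message[] {
    return [...this.history];
  }
}
```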

3. Planning

The agent’s strategy for executing complex tasks.

ReAct Pattern:

Thought: User requested company revenue data.
Action: query_db
Action Input: SELECT revenue FROM sales WHERE year=2024
Observation: [Result: $1.5M]
Thought: Need to compare with last year.
Action: query_db
Action Input: SELECT revenue FROM sales WHERE year=2023
Observation: [Result: $1.2M]
Thought: Need to calculate growth rate.
Action: calculate
Action Input: ((1.5 - 1.2) / 1.2) * 100
Observation: 25%
Final Answer: 2024 revenue is $1.5M, up 25% from previous year.
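
The trace above can be driven by a small control loop: ask the LLM for a thought, execute the chosen tool, feed the observation back, and repeat until a final answer. The sketch below assumes an `llm` callable that returns parsed steps; all names are illustrative, not a specific framework's API:

```typescript
// Sketch of the ReAct control loop (illustrative types and names).
type Tool = (input: string) => string;

interface LlmStep {
  thought: string;
  action?: { tool: string; input: string };
  finalAnswer?: string;
}

function runReAct(
  llm: (history: string) => LlmStep,
  tools: Record<string, Tool>,
  question: string,
  maxSteps = 5
): string {
  let history = `Question: ${question}`;
  for (let i = 0; i < maxSteps; i++) {
    const step = llm(history);
    history += `\nThought: ${step.thought}`;
    if (step.finalAnswer !== undefined) return step.finalAnswer;
    if (step.action) {
      const tool = tools[step.action.tool];
      // Guard against hallucinated tool names instead of crashing.
      const observation = tool ? tool(step.action.input) : "Error: unknown tool";
      history += `\nAction: ${step.action.tool}\nObservation: ${observation}`;
    }
  }
  throw new Error("Max steps exceeded");
}
```

The `maxSteps` cap is what prevents the infinite-loop failure mode discussed later in this section.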

4. Tool Use

Mechanism for LLM to interact with external tools.

Function Calling Reliability Issues

Critical issue highlighted in DeNA study materials: LLM function calling is not 100% reliable.

Problem Cases:

  1. Incorrect Parameters: Missing required fields, type mismatches
  2. Hallucination: Calling non-existent tools
  3. Infinite Loops: Repeatedly calling the same tool

Self-Healing Pattern

Mechanism for agents to automatically recover from errors.

class SelfHealingAgent {
  async execute(task: Task): Promise<Result> {
    const maxRetries = 3;
    let attempt = 0;

    while (attempt < maxRetries) {
      try {
        const result = await this.runTask(task);
        return result;
      } catch (error) {
        attempt++;

        // Analyze error
        const analysis = await this.analyzeError(error);

        // Select recovery strategy
        if (analysis.recoverable) {
          task = await this.adjustTask(task, analysis);
          console.log(`Retry ${attempt}: ${analysis.suggestion}`);
        } else {
          throw new UnrecoverableError(error);
        }
      }
    }

    throw new Error("Max retries exceeded");
  }
}

3. Multi-Agent Orchestration

6 Orchestration Patterns

Patterns for distributed processing of complex tasks across multiple agents.

1. Sequential

Linear structure where one agent’s output becomes the next agent’s input.

Use Cases:

  • Blog post creation: Research → Draft → Edit → Publish
  • Data pipeline: Collect → Clean → Analyze → Visualize

Pros:

  • Simple implementation
  • Easy debugging
  • Predictable costs

2. Parallel

Structure where multiple agents work independently and simultaneously.

Use Cases:

  • Content review: Quality check + Legal review + Fact checking in parallel
  • Multimodal analysis: Text + Image + Audio parallel processing
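
In TypeScript, the parallel pattern is naturally expressed with `Promise.all`: every reviewer runs concurrently, so total latency is roughly that of the slowest agent. The reviewer functions below are stand-ins for real LLM calls:

```typescript
// Sketch: independent review agents running in parallel.
// Review shape and reviewer functions are illustrative assumptions.
type Review = { check: string; passed: boolean };

async function parallelReview(
  content: string,
  reviewers: Array<(c: string) => Promise<Review>>
): Promise<Review[]> {
  // All reviewers start immediately; results arrive together.
  return Promise.all(reviewers.map((r) => r(content)));
}
```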

3. Supervisor

Structure where a central supervisor distributes tasks and integrates results.

Use Cases:

  • Complex research: Supervisor distributes sub-topics to multiple workers
  • Code generation: Supervisor assigns module implementations to workers

4. Hierarchical

Multiple levels of supervisor-worker relationships forming a tree structure.

Use Cases:

  • Large-scale project management: PM → Team Leaders → Developers
  • Complex system design: Architect → Module Designers → Implementers

5. Network

Structure where agents communicate freely in a P2P manner.

Use Cases:

  • Creative collaboration: Idea brainstorming
  • Democratic decision-making: Vote-based consensus

6. Custom

Unique patterns optimized for specific problems.

Framework Comparison: LangGraph vs AutoGen vs CrewAI

Comparison of the three major multi-agent frameworks.

| Feature | LangGraph | AutoGen | CrewAI |
|---|---|---|---|
| Core Concept | Graph-based workflow | Conversation-based agents | Role-based teams |
| State Management | Explicit state graph | Conversation history | Built-in memory |
| Learning Curve | Medium | High | Low |
| Production Readiness | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |

LangGraph

Philosophy: Express everything as graphs

Pros:

  • Clear control flow: All paths explicitly defined in graph
  • Easy debugging: State tracking possible
  • Production stability: Predictable behavior

AutoGen

Philosophy: Problem-solving through agent conversation

Pros:

  • Natural collaboration: Mimics human team conversations
  • Flexibility: Dynamic conversation flow
  • Emergent behavior: Unexpected problem-solving

Cons:

  • Cost explosion risk (unlimited conversations)
  • Unpredictable
  • Difficult debugging

CrewAI

Philosophy: Role-based team composition

Pros:

  • Intuitive: Role concept is easy to understand
  • Rapid prototyping: Implementation with minimal code
  • Built-in memory: Automatic context management

Cons:

  • Insufficient logging (debugging difficulties)
  • Limitations with complex workflows
  • Difficult fine-grained control

Cost Impact of Pattern Selection

Analysis of how pattern selection affects costs in real projects.

Scenario: Blog post generation (Research + Writing + Editing)

| Pattern | API Calls | Expected Cost | Processing Time |
|---|---|---|---|
| Sequential | 3 calls | $0.15 | 90 sec |
| Parallel | 3 calls (simultaneous) | $0.15 | 30 sec |
| Supervisor | 7 calls (supervisor 2 + workers 3 + integration 2) | $0.35 | 60 sec |
| Network (AutoGen) | 15–50 calls (conversation) | $0.75–$2.50 | 120–300 sec |

Production Recommendations:

  1. Clear workflow exists → Sequential or Parallel
  2. Dynamic task distribution needed → Supervisor
  3. Creative collaboration needed → Network (but set cost limits!)

4. Memory and State Management

MemGPT Pattern

MemGPT is an innovative approach applying OS virtual memory concepts to LLMs.

Core Idea:

  • Main Context (Main Memory): LLM’s Context Window
  • External Storage: Vector DB, Relational DB
  • Memory Manager: Swap in/out based on importance

Push vs Pull Hybrid

MemGPT combines two memory strategies.

Push (Active):

  • LLM automatically saves information it deems important
  • Example: “This user prefers TypeScript” → Save

Pull (Reactive):

  • Search external storage when needed
  • Example: User says “consider my preferences” → Search
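
The hybrid can be sketched as two operations on one store: `push` saves facts the agent judges important, and `pull` retrieves relevant facts on demand. Storage and matching below are deliberately simplified (in-memory list, substring match instead of vector search); all names are illustrative, not MemGPT's actual API:

```typescript
// Sketch of the push/pull hybrid memory (simplified, illustrative names).
class HybridMemory {
  private store: string[] = [];

  // Push: proactively save a fact the agent deems important.
  push(fact: string): void {
    this.store.push(fact);
  }

  // Pull: retrieve facts relevant to a query when they are needed.
  // A real implementation would use embedding similarity, not substrings.
  pull(query: string): string[] {
    const q = query.toLowerCase();
    return this.store.filter((f) => f.toLowerCase().includes(q));
  }
}
```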

Hierarchical Memory Structure

MemGPT proposes a 3-tier memory hierarchy.

L1: Working Memory (Context Window)
    ├─ Current conversation (5–10 messages)
    ├─ Active task state
    └─ System prompt

L2: Recent Memory (Short-term storage)
    ├─ Recent sessions (1 week)
    ├─ Frequently referenced information
    └─ Temporary task data

L3: Long-term Memory (Long-term storage)
    ├─ User profile
    ├─ Domain knowledge
    └─ Accumulated learning data

A-MEM (Zettelkasten-based)

A-MEM is an innovative memory system proposed by researchers at Rutgers University in 2025. It applies the Zettelkasten (German for “slip box”) methodology to LLM agents.

What is Zettelkasten?

  • Note organization method developed by Niklas Luhmann (sociologist)
  • Assign unique IDs to each note
  • Build knowledge network through note connections (links)
  • Generate emergent insights

A-MEM Architecture

The core of A-MEM is that agents organize memory themselves.

Implementation Example:

class AMem {
  notes: Map<string, Note>;
  graph: Graph;

  async createNote(content: string, metadata: Metadata): Promise<string> {
    const noteId = generateId();
    const note = new Note(noteId, content, metadata);

    // Auto-tagging
    const tags = await this.extractTags(content);
    note.tags = tags;

    // Calculate similarity with existing notes
    const similar = await this.findSimilarNotes(note);

    // Auto-generate connections (similarity > 0.7)
    for (const [relatedNote, similarity] of similar) {
      if (similarity > 0.7) {
        this.linkNotes(noteId, relatedNote.id, {
          type: "related",
          strength: similarity,
        });
      }
    }

    this.notes.set(noteId, note);
    return noteId;
  }
}

A-MEM Benefits:

  1. Dynamic organization: No manual structuring needed
  2. Relevance-based search: Direct matching + indirect connections
  3. Emergent insights: Discover new connections between notes
  4. Scalability: Efficiency maintained as knowledge grows

5. Production Case Study: DeNA NOC Alert Agent

Real production deployment of NOC (Network Operations Center) Alert Agent at DeNA.

Problem Definition

Background:

  • Operations team receives 100–200 alerts daily
  • 70% of alerts are false positives
  • Engineers manually classify and respond to alerts

Goals:

  • Automatic alert classification and prioritization
  • False positive filtering
  • Automatic response guide generation

Workflow Design

graph TD
    Alert[Alert Generated] --> Classifier[Classification Agent]

    Classifier --> |Urgent| Escalate[Escalation]
    Classifier --> |Normal| Analyzer[Analysis Agent]
    Classifier --> |False Positive| Archive[Archive]

    Analyzer --> Context[Context Collection<br/>- Logs<br/>- Metrics<br/>- History]

    Context --> RCA[Root Cause Analysis]

    RCA --> Guide[Response Guide Generation]

    Guide --> Ticket[Ticket Creation]
    Escalate --> OnCall[On-call Engineer Alert]

    style Classifier fill:#F77F00,color:#fff
    style Analyzer fill:#0066CC,color:#fff
    style RCA fill:#7B2CBF,color:#fff

Production Deployment Considerations

Problems discovered during actual deployment and their solutions.

1. Hallucination Problem

Issue: LLM mentions non-existent logs or metrics

Solution:

// Tool call result validation
class ToolExecutor {
  async execute(toolName: string, input: any): Promise<any> {
    const tool = this.registry.get(toolName);

    // Input validation
    if (!this.validateInput(tool, input)) {
      return {
        error: "Invalid input",
        suggestion: "Please check the tool documentation",
      };
    }

    // Execute
    const result = await tool.execute(input);

    // Output validation
    if (this.isEmpty(result)) {
      return {
        error: "No data found",
        suggestion: "Try different search parameters",
      };
    }

    return result;
  }
}

2. Latency Problem

Issue: Alert → Response takes 45 seconds on average (Target: 10 seconds)

Solutions:

  • Parallel processing: Collect logs/metrics/history simultaneously
  • Caching: Cache frequently used query results
  • Streaming: Display partial results immediately

3. Cost Problem

Issue: 200 alerts/day × $0.20 = $40/day ($1,200/month)

Solutions:

  • False positive pre-filter: Filter obvious false positives with rules first
  • Batching: Process similar alerts together
  • SLM utilization: Use small models for simple classification
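
The pre-filter idea is simple: cheap deterministic rules run first, and only alerts that pass every rule reach the (expensive) LLM. A minimal sketch, where the alert fields and rules are illustrative assumptions rather than DeNA's actual ruleset:

```typescript
// Sketch: rule-based pre-filter ahead of any LLM call.
// Alert shape and rules are illustrative assumptions.
interface Alert {
  source: string;
  message: string;
  severity: string;
}

const falsePositiveRules: Array<(a: Alert) => boolean> = [
  // Known-noisy health checks at low severity.
  (a) => a.message.includes("health check timeout") && a.severity === "low",
  // Staging environments never page.
  (a) => a.source.startsWith("staging-"),
];

function needsLlm(alert: Alert): boolean {
  // Only alerts matching no false-positive rule go to the LLM.
  return !falsePositiveRules.some((rule) => rule(alert));
}
```

Because the rules are deterministic, they cost nothing per alert and are easy to audit, which is exactly what the obvious cases need.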

Results

After 6 months of operation:

  • False positive filtering accuracy: 92%
  • Response time reduction: 15 minutes → 3 minutes average
  • Engineer burden reduction: 20 hours saved per week
  • Monthly operating cost: $1,200 → $350 (post-optimization)

6. Cost and Performance Optimization

The biggest challenges for LLM agent systems are cost and latency. Four core optimization techniques:

1. Semantic Caching (Up to 90% Cost Reduction)

Concept: Reuse cached responses for semantically similar queries

// Semantic caching implementation
class SemanticCache {
  private cache: Map<string, CacheEntry> = new Map();
  private embeddings: EmbeddingModel;

  async get(query: string): Promise<string | null> {
    // Query embedding
    const queryEmbedding = await this.embeddings.encode(query);

    // Similarity search
    for (const [cachedQuery, entry] of this.cache) {
      const similarity = cosineSimilarity(queryEmbedding, entry.embedding);

      // Cache hit if similarity > 0.95
      if (similarity > 0.95) {
        console.log(`Cache hit! (similarity: ${similarity})`);
        return entry.response;
      }
    }

    return null;
  }

  async set(query: string, response: string): Promise<void> {
    // Store the embedding alongside the response for future similarity checks
    const embedding = await this.embeddings.encode(query);
    this.cache.set(query, { embedding, response });
  }
}

Impact:

  • 60% cache hit rate → 60% cost reduction
  • 95% latency reduction (network delay eliminated)

2. Batching (50% Reduction)

Concept: Bundle multiple requests for processing at once

Impact:

  • Batch size of 10 → Approximately 50% cost reduction
  • However, latency slightly increases (wait time)
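
The batching step itself is just grouping pending requests into fixed-size chunks so one API call handles several items; the helper below is an illustrative sketch (the batched call itself, e.g. a provider's batch API, is assumed):

```typescript
// Sketch: group pending requests into fixed-size batches.
// Each batch would then be sent as a single (discounted) API call.
function makeBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

The trade-off noted above shows up here: requests wait until a batch fills (or a timeout fires), so throughput improves while per-request latency grows.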

3. SLM (Small Language Model, 14x Reduction)

Concept: Use smaller models for simple tasks

// Model routing
class ModelRouter {
  private smallModel: SLM; // e.g., Llama 3.1 (8B)
  private largeModel: LLM; // Claude Sonnet 4

  async route(task: Task): Promise<Response> {
    const complexity = this.assessComplexity(task);

    if (complexity < 0.3) {
      // Simple task: SLM ($0.001)
      return await this.smallModel.execute(task);
    } else {
      // Complex task: Large Model ($0.014)
      return await this.largeModel.execute(task);
    }
  }
}

Impact:

  • If 70% of tasks can be handled by SLM
  • Cost: 70% × $0.001 + 30% × $0.014 = $0.0049 (average)
  • Large Model only: $0.014
  • Reduction rate: 65% (approximately 14x cheaper with SLM alone)

4. Quantization

Concept: Reduce model weight precision to decrease size and cost

| Quantization Level | Model Size | Accuracy Loss | Inference Speed | Use Case |
|---|---|---|---|---|
| FP16 (original) | 16GB | 0% | 1x | Benchmark baseline |
| 8-bit | 8GB | ~1% | 1.5x | Production (accuracy important) |
| 4-bit | 4GB | ~3% | 2x | Local execution, experiments |
| 2-bit | 2GB | ~10% | 3x | Prototypes, demos |

Comprehensive Cost Optimization Strategy

Real-world case combining all four techniques.

Before (Pre-optimization):

// Process all requests with Claude Sonnet 4
const response = await claude.generate(query);
// Cost: $0.014 per request
// Latency: 2 seconds

After (Post-optimization):

async function optimizedQuery(query: string): Promise<string> {
  // 1. Semantic caching (60% hit rate)
  const cached = await cache.get(query);
  if (cached) return cached; // Cost: $0, Latency: 50ms

  // 2. Complexity assessment and model routing
  const complexity = assessComplexity(query);

  let response: string;

  if (complexity < 0.3) {
    // 3. SLM usage (70% requests)
    response = await smallModel.generate(query);
    // Cost: $0.001, Latency: 500ms
  } else {
    // 4. Large Model (30% requests)
    response = await largeModel.generate(query);
    // Cost: $0.014, Latency: 2 seconds
  }

  // Cache save
  await cache.set(query, response);

  return response;
}

Cost Calculation:

Cache hit (60%): $0 × 0.6 = $0
Cache miss:
  - SLM (28%): $0.001 × 0.28 = $0.00028
  - Large (12%): $0.014 × 0.12 = $0.00168

Average cost: $0.00196 per request
Reduction rate: 86% (Before $0.014 → After $0.00196)

Key Insights and Reflections

Core insights gained from completing DeNA LLM Study Part 5.

1. “Orchestration” Over “Full Autonomy”

This is the core of the 2025 trend. Rather than letting agents decide everything autonomously, enabling autonomy within clear workflows is more effective in production.

2. Memory Creates True Agent Intelligence

Advanced memory systems like MemGPT and A-MEM transform agents from simple “prompt executors” to “learning and evolving systems”.

3. Choose Multi-Agent Patterns Based on Problem

There is no “one-size-fits-all” among the six patterns. Each has clear pros and cons.

4. Cost Optimization is Essential, Not Optional

The biggest barrier to LLM agent systems in production is cost.

Core Strategies:

  1. Semantic caching - Apply to all systems (60% hit rate alone yields significant benefits)
  2. SLM routing - Simple tasks (70%) with small models
  3. Batching - Bundle tasks that don’t need real-time processing
  4. Pre-filtering - Filter obvious cases with rules

5. Production Agents Need “Self-Healing”

LLMs are probabilistic systems. 100% accurate responses are impossible.

Series Recap

We’ve completed DeNA LLM Study Parts 1–5. Let’s look back at the entire learning journey.

Insights from the Entire Series

  1. LLMs are tools: Not omnipotent. Proper usage for the problem matters.
  2. Prompts > Fine-tuning: 90% of problems can be solved with prompts.
  3. RAG is essential: For latest information and domain knowledge, RAG is the answer.
  4. Agents need memory: Adding memory to stateless LLMs creates true agents.
  5. Cost optimization from design phase: Optimizing later costs 10x more.

References

DeNA Official Materials

  • DeNA Tech Blog
  • DeNA LLM Study Presentation (Internal materials, 2024)


DeNA LLM Study Series Complete! We hope this series helps your LLM agent development journey.

About the Author

JK

Kim Jangwook

Full-Stack Developer specializing in AI/LLM

Building AI agent systems, LLM applications, and automation solutions with 10+ years of web development experience. Sharing practical insights on Claude Code, MCP, and RAG systems.