DeNA LLM Study Part 5: Agent Design and Multi-Agent Orchestration

DeNA LLM Study Series finale. Practical guide to n8n workflows, agent design principles, multi-agent orchestration patterns, and memory management strategies.

Series: DeNA LLM Study (5/5 - Final)

  1. Part 1: LLM Fundamentals and 2025 AI Landscape
  2. Part 2: Structured Output and Multi-LLM Pipelines
  3. Part 3: Model Training Methodologies
  4. Part 4: RAG Architecture and Latest Trends
  5. Part 5: Agent Design and Multi-Agent Orchestration ← Current Article

From Autonomous Agents to Orchestration

This is the final installment of the series. The days of steering an LLM with a single prompt are behind us. Here I look at how to design agents that decide for themselves and call their own tools, and how to wire several of them together. Chase autonomy too hard, though, and costs blow up while behavior slips out of your control. So this part also walks through the cost, performance, and reliability problems you actually hit in production.

Part 5 Key Topics

  1. LLM Workflows with n8n - Building agents with no-code/low-code automation platforms
  2. Agent Design Principles - Core components and Self-Healing patterns
  3. Multi-Agent Orchestration - 6 patterns and framework comparison (LangGraph, AutoGen, CrewAI)
  4. Memory and State Management - MemGPT, A-MEM (Zettelkasten-based)
  5. Production Case Study - DeNA NOC Alert Agent
  6. Cost and Performance Optimization - Semantic caching, batching, SLM utilization

The backbone here is DeNA’s official study material. I’ve layered the latest research and a few production case studies on top of it.

When This Is Useful, and When to Skip It

Agent design and multi-agent setups look impressive, but they aren’t the right tool for every problem. Before you read on, ask whether what you’re building actually needs this structure.

When this content helps:

  • The task can’t be done in one LLM call and needs several stages (research then analysis, analysis then writing).
  • The LLM has to pick and call external tools or APIs on its own (DB lookups, web search, ticket creation).
  • Different steps need different expertise, so splitting into roles feels natural (a researcher, a verifier).
  • The system runs for a long time and has to remember past context (a support bot, a personal assistant).
  • It’s a production pipeline handling hundreds of requests a day where cost and latency need trimming.

When you can skip it:

  • Simple classification, summarization, or translation that finishes in one call. A single function beats an agent here.
  • Deterministic processing with fixed input and output. A regex or rule engine is cheaper, faster, and steadier.
  • The prototype stage, when you just want to confirm behavior. Multi-agent debugging is expensive, so validate with a single call first and split it out only when you have to.
  • A Network (free-form conversation) pattern with no cost ceiling. When conversation length runs away, so does the bill.

In one line: reach for an agent only when “multiple stages plus tool calls plus memory” all show up at once. If even one is missing, suspect a simpler tool first. The cost trap of multi-agent setups is covered in The Cost Reality of AI Agents, and a deeper take on orchestration continues in Improving Multi-Agent Orchestration.

1. LLM Workflows with n8n

What is n8n?

n8n is a no-code/low-code workflow automation platform. As of 2025, it supports 422+ integrations and provides specialized features for building LLM agents.

Key Features:

  • Visual workflow builder
  • LangChain, Ollama, and major LLM framework integrations
  • Native ReAct Agent pattern support
  • Self-hostable (data can stay on your own infrastructure)

ReAct Agent Implementation

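A simplified illustration of the ReAct (Reasoning and Acting) pattern as an n8n agent node. The field names are schematic rather than a verbatim workflow export; in the n8n editor, tools are typically attached to the agent as connected nodes:
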
{
  "nodes": [
    {
      "type": "n8n-nodes-langchain.agent",
      "name": "ReAct Agent",
      "parameters": {
        "agentType": "react",
        "systemMessage": "You are a data analysis expert. Analyze user questions and select appropriate tools to answer.",
        "tools": ["webSearch", "calculator", "database"]
      }
    }
  ]
}

2025 Trend: Orchestration > Full Autonomy

According to DeNA study materials and recent research, the key trend for agent systems in 2025 is a shift from “full autonomy” to “orchestration”.

Reasons:

  1. Cost Explosion: Unlimited API calls from autonomous agents
  2. Unpredictability: Difficulty controlling agent behavior
  3. Reliability Issues: Instability in production environments

Workflow tools like n8n are gaining attention precisely because they provide explicit orchestration.

2. Agent Design Principles

Core Components

LLM agents consist of four core components:

graph LR
    Agent[AI Agent] --> Core[1. Core]
    Agent --> Memory[2. Memory]
    Agent --> Planning[3. Planning]
    Agent --> Tools[4. Tool Use]

    Core --> Prompt[Prompt Template]
    Core --> LLM[LLM Engine]
    Core --> Parser[Output Parser]

    Memory --> ShortTerm[Short-term<br/>Conversation History]
    Memory --> LongTerm[Long-term<br/>Knowledge Base]

    Planning --> ReAct[ReAct]
    Planning --> PlanExecute[Plan-and-Execute]

    Tools --> FunctionCall[Function Calling]
    Tools --> ToolRegistry[Tool Registry]

    style Core fill:#7B2CBF,color:#fff
    style Memory fill:#0066CC,color:#fff
    style Planning fill:#00A896,color:#fff
    style Tools fill:#F77F00,color:#fff

1. Core

The central engine of the agent.

Components:

  • Prompt Template: System message, persona definition
  • LLM Engine: Claude, GPT-4, Gemini, etc.
  • Output Parser: Converts LLM output to structured data

2. Memory

The agent’s memory system.

Short-term Memory:

  • Current conversation session history
  • Typically the last N messages (N = 5 to 10)
  • Directly included in Context Window

Long-term Memory:

  • Persistent knowledge base
  • Vector Database (Pinecone, Weaviate, etc.)
  • Retrieved via RAG pattern when needed
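
The short-term side is simple enough to sketch. This is a minimal version, assuming a hypothetical `ChatMessage` shape and a window size called `maxMessages`; neither name comes from the DeNA material, and long-term retrieval is left out.

```typescript
// Hypothetical message shape; not taken from any specific SDK.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Keeps the system prompt plus the most recent `maxMessages` turns.
class SlidingWindowMemory {
  private turns: ChatMessage[] = [];

  constructor(
    private readonly systemPrompt: string,
    private readonly maxMessages: number = 10,
  ) {}

  add(message: ChatMessage): void {
    this.turns.push(message);
    // Drop the oldest turns once the window is full.
    if (this.turns.length > this.maxMessages) {
      this.turns = this.turns.slice(-this.maxMessages);
    }
  }

  // Messages to place in the context window for the next LLM call.
  toContext(): ChatMessage[] {
    return [{ role: "system", content: this.systemPrompt }, ...this.turns];
  }
}
```

Counting messages is the crudest possible policy. A token budget would track the real constraint more closely, at the price of needing a tokenizer.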

3. Planning

The agent’s strategy for executing complex tasks.

ReAct Pattern:

Thought: User requested company revenue data.
Action: query_db
Action Input: SELECT revenue FROM sales WHERE year=2024
Observation: [Result: $1.5M]
Thought: Need to compare with last year.
Action: query_db
Action Input: SELECT revenue FROM sales WHERE year=2023
Observation: [Result: $1.2M]
Thought: Need to calculate growth rate.
Action: calculate
Action Input: ((1.5 - 1.2) / 1.2) * 100
Observation: 25%
Final Answer: 2024 revenue is $1.5M, up 25% from previous year.
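
A trace like the one above comes out of a small control loop. Here is a minimal sketch of that loop, assuming the LLM is represented by a plain function (`nextStep`) that returns the next Thought/Action text. `runReAct`, `ToolFn`, and `StepFn` are illustrative names, not part of n8n or any framework.

```typescript
// A tool takes a string input and returns a string observation.
type ToolFn = (input: string) => string;

// Stand-in for an LLM call: given the transcript so far, return the next step.
type StepFn = (transcript: string) => string;

function runReAct(
  question: string,
  nextStep: StepFn,
  tools: Record<string, ToolFn>,
  maxSteps: number = 5,
): string {
  let transcript = `Question: ${question}\n`;

  for (let step = 0; step < maxSteps; step++) {
    const output = nextStep(transcript);
    transcript += output + "\n";

    // Stop as soon as the model commits to an answer.
    const final = output.match(/Final Answer:\s*(.*)/);
    if (final) return final[1].trim();

    // Otherwise expect an "Action" / "Action Input" pair.
    const action = output.match(/Action:\s*(.*)/);
    const input = output.match(/Action Input:\s*(.*)/);
    const toolName = action ? action[1].trim() : "";
    const known = Object.prototype.hasOwnProperty.call(tools, toolName);

    // Unknown tools become an observation so the model can correct itself.
    const observation =
      known && input
        ? tools[toolName](input[1].trim())
        : `Error: unknown tool "${toolName}" or missing Action Input`;
    transcript += `Observation: ${observation}\n`;
  }

  // The step limit is what keeps a confused agent from looping forever.
  throw new Error("Step limit reached without a final answer");
}
```

The `maxSteps` ceiling matters more than it looks: it is the cheapest guard against the runaway-cost failure mode discussed throughout this part.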

4. Tool Use

Mechanism for LLM to interact with external tools.

Function Calling Reliability Issues

Critical issue highlighted in DeNA study materials: LLM function calling is not 100% reliable.

Problem Cases:

  1. Incorrect Parameters: Missing required fields, type mismatches
  2. Hallucination: Calling non-existent tools
  3. Infinite Loops: Repeatedly calling the same tool
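
Each of the three problem cases can be caught before the tool runs. The following is a rough sketch of such a guard, not the layer DeNA uses: `ToolSpec`, `ToolCall`, and `ToolCallGuard` are hypothetical names, and the type check only covers primitive fields.

```typescript
// Hypothetical tool description: a name plus required primitive parameters.
interface ToolSpec {
  name: string;
  required: Record<string, "string" | "number" | "boolean">;
}

interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

class ToolCallGuard {
  private callCounts: Record<string, number> = {};

  constructor(
    private readonly specs: ToolSpec[],
    private readonly maxRepeats: number = 3,
  ) {}

  // Returns a list of problems; an empty list means the call may run.
  check(call: ToolCall): string[] {
    // 1. Hallucinated tool: the name is not in the registry.
    const spec = this.specs.filter((s) => s.name === call.tool)[0];
    if (!spec) return [`Unknown tool: ${call.tool}`];

    const problems: string[] = [];

    // 2. Incorrect parameters: missing fields or wrong primitive types.
    for (const key of Object.keys(spec.required)) {
      const actual = typeof call.args[key];
      if (actual === "undefined") {
        problems.push(`Missing required field: ${key}`);
      } else if (actual !== spec.required[key]) {
        problems.push(`Field ${key} should be ${spec.required[key]}, got ${actual}`);
      }
    }

    // 3. Infinite loops: the same call with the same arguments, repeated.
    const signature = `${call.tool}:${JSON.stringify(call.args)}`;
    this.callCounts[signature] = (this.callCounts[signature] || 0) + 1;
    if (this.callCounts[signature] > this.maxRepeats) {
      problems.push(`Call repeated more than ${this.maxRepeats} times`);
    }

    return problems;
  }
}
```

Returning the problems as text, rather than throwing, lets you feed them back to the model as an observation so it can fix its own call.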

Self-Healing Pattern

Mechanism for agents to automatically recover from errors.

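// Placeholder shapes; adapt them to your own task and result types.
interface Task { prompt: string }
interface Result { output: string }
interface ErrorAnalysis { recoverable: boolean; suggestion: string }

class UnrecoverableError extends Error {
  constructor(public readonly original: unknown) {
    super("Unrecoverable error");
  }
}

abstract class SelfHealingAgent {
  // Supplied by the concrete agent: run the task, diagnose a failure, revise the task.
  protected abstract runTask(task: Task): Promise<Result>;
  protected abstract analyzeError(error: unknown): Promise<ErrorAnalysis>;
  protected abstract adjustTask(task: Task, analysis: ErrorAnalysis): Promise<Task>;

  async execute(task: Task, maxRetries = 3): Promise<Result> {
    let current = task;

    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return await this.runTask(current);
      } catch (error) {
        // Analyze error
        const analysis = await this.analyzeError(error);

        // Give up immediately when a retry cannot help
        if (!analysis.recoverable) {
          throw new UnrecoverableError(error);
        }

        // Skip the adjustment after the last attempt; nothing would use it
        if (attempt === maxRetries) break;

        // Select recovery strategy
        current = await this.adjustTask(current, analysis);
        console.log(`Retry ${attempt}: ${analysis.suggestion}`);
      }
    }

    throw new Error("Max retries exceeded");
  }
}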

3. Multi-Agent Orchestration

6 Orchestration Patterns

Patterns for distributed processing of complex tasks across multiple agents.

1. Sequential

Linear structure where one agent’s output becomes the next agent’s input.

Use Cases:

  • Blog post creation: Research → Draft → Edit → Publish
  • Data pipeline: Collect → Clean → Analyze → Visualize

Pros:

  • Simple implementation
  • Easy debugging
  • Predictable costs
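
In code, the Sequential pattern is little more than a loop. A minimal sketch, assuming each agent is reduced to an async text-to-text function (`Stage` and `runSequential` are names made up for this example):

```typescript
// Each stage is an agent reduced to an async text-to-text function.
type Stage = (input: string) => Promise<string>;

async function runSequential(stages: Stage[], input: string): Promise<string> {
  let current = input;
  for (const stage of stages) {
    // The output of one stage becomes the input of the next.
    current = await stage(current);
  }
  return current;
}
```

The number of LLM calls equals the number of stages, which is why the cost of this pattern is easy to predict.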

2. Parallel

Structure where multiple agents work independently and simultaneously.

Use Cases:

  • Content review: Quality check + Legal review + Fact checking in parallel
  • Multimodal analysis: Text + Image + Audio parallel processing
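
A sketch of the Parallel pattern using `Promise.all`, again with agents reduced to async functions. `Reviewer` and `runParallel` are illustrative names.

```typescript
// A reviewer is an agent reduced to an async function over the same input.
type Reviewer = (draft: string) => Promise<string>;

async function runParallel(
  reviewers: Record<string, Reviewer>,
  draft: string,
): Promise<Record<string, string>> {
  const names = Object.keys(reviewers);

  // Start every reviewer before awaiting any of them.
  const outputs = await Promise.all(names.map((n) => reviewers[n](draft)));

  // Re-attach each output to the reviewer that produced it.
  const report: Record<string, string> = {};
  names.forEach((n, i) => {
    report[n] = outputs[i];
  });
  return report;
}
```

`Promise.all` rejects as soon as one reviewer fails. If partial results are still useful, `Promise.allSettled` is the variant to reach for.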

3. Supervisor

Structure where a central supervisor distributes tasks and integrates results.

Use Cases:

  • Complex research: Supervisor distributes sub-topics to multiple workers
  • Code generation: Supervisor assigns module implementations to workers
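
The Supervisor pattern adds two extra LLM roles around the workers: planning and integration. A minimal sketch, assuming a hypothetical `SupervisorAgent` interface with `plan` and `integrate` methods and round-robin assignment (a real supervisor would usually pick workers by skill):

```typescript
type WorkerAgent = (subtask: string) => Promise<string>;

interface SupervisorAgent {
  // Split the task into sub-tasks.
  plan(task: string): Promise<string[]>;
  // Merge the worker outputs into one answer.
  integrate(task: string, results: string[]): Promise<string>;
}

async function runSupervised(
  task: string,
  supervisor: SupervisorAgent,
  workers: WorkerAgent[],
): Promise<string> {
  if (workers.length === 0) throw new Error("At least one worker is required");

  const subtasks = await supervisor.plan(task);

  // Round-robin assignment; all workers run concurrently.
  const results = await Promise.all(
    subtasks.map((subtask, i) => workers[i % workers.length](subtask)),
  );

  return supervisor.integrate(task, results);
}
```

Every call to `plan` and `integrate` is an LLM call on top of the workers' own calls, which is where the extra cost of this pattern comes from.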

4. Hierarchical

Multiple levels of supervisor-worker relationships forming a tree structure.

Use Cases:

  • Large-scale project management: PM → Team Leaders → Developers
  • Complex system design: Architect → Module Designers → Implementers

5. Network

Structure where agents communicate freely in a P2P manner.

Use Cases:

  • Creative collaboration: Idea brainstorming
  • Democratic decision-making: Vote-based consensus

6. Custom

Unique patterns optimized for specific problems.

Framework Comparison: LangGraph vs AutoGen vs CrewAI

Comparison of the three major multi-agent frameworks.

| Feature | LangGraph | AutoGen | CrewAI |
| --- | --- | --- | --- |
| Core Concept | Graph-based workflow | Conversation-based agents | Role-based teams |
| State Management | Explicit state graph | Conversation history | Built-in memory |
| Learning Curve | Medium | High | Low |

Production Readiness⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐

LangGraph

Philosophy: Express everything as graphs

Pros:

  • Clear control flow: All paths explicitly defined in graph
  • Easy debugging: State tracking possible
  • Production stability: Predictable behavior

AutoGen

Philosophy: Problem-solving through agent conversation

Pros:

  • Natural collaboration: Mimics human team conversations
  • Flexibility: Dynamic conversation flow
  • Emergent behavior: Unexpected problem-solving

Cons:

  • Cost explosion risk (unlimited conversations)
  • Unpredictable
  • Difficult debugging

CrewAI

Philosophy: Role-based team composition

Pros:

  • Intuitive: Role concept is easy to understand
  • Rapid prototyping: Implementation with minimal code
  • Built-in memory: Automatic context management

Cons:

  • Insufficient logging (debugging difficulties)
  • Limitations with complex workflows
  • Difficult fine-grained control

Cost Impact of Pattern Selection

Analysis of how pattern selection affects costs in real projects.

Scenario: Blog post generation (Research + Writing + Editing)

| Pattern | API Calls | Expected Cost | Processing Time |
| --- | --- | --- | --- |
| Sequential | 3 calls | $0.15 | 90 sec |
| Parallel | 3 calls (simultaneous) | $0.15 | 30 sec |
| Supervisor | 7 calls (supervisor 2 + workers 3 + integration 2) | $0.35 | 60 sec |
| Network (AutoGen) | 15-50 calls (conversation) | $0.75-$2.50 | 120-300 sec |

Production Recommendations:

  1. Clear workflow exists → Sequential or Parallel
  2. Dynamic task distribution needed → Supervisor
  3. Creative collaboration needed → Network (but set cost limits!)
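
One way to enforce a cost limit is a shared budget object that every agent must charge before it calls the model. This is a sketch under my own assumptions: `CallBudget` is a made-up name, and the per-call cost is whatever estimate you pass in (the table's figures work out to about $0.05 per call).

```typescript
// Shared budget that every agent charges before calling the model.
class CallBudget {
  private spentUsd = 0;
  private calls = 0;

  constructor(
    private readonly maxUsd: number,
    private readonly maxCalls: number,
  ) {}

  // Records one call; throws before either limit would be exceeded.
  charge(costUsd: number): void {
    const overCalls = this.calls + 1 > this.maxCalls;
    const overSpend = this.spentUsd + costUsd > this.maxUsd;
    if (overCalls || overSpend) {
      throw new Error(
        `Budget exceeded after ${this.calls} calls ($${this.spentUsd.toFixed(2)} spent)`,
      );
    }
    this.calls += 1;
    this.spentUsd += costUsd;
  }

  remainingUsd(): number {
    return this.maxUsd - this.spentUsd;
  }
}
```

Because the check happens before the call, a free-form conversation stops at the ceiling instead of discovering the overrun on the invoice.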

4. Memory and State Management

MemGPT Pattern

MemGPT is an innovative approach applying OS virtual memory concepts to LLMs.

Core Idea:

  • Main Context (Main Memory): LLM’s Context Window
  • External Storage: Vector DB, Relational DB
  • Memory Manager: Swap in/out based on importance

Push vs Pull Hybrid

MemGPT combines two memory strategies.

Push (Active):

  • LLM automatically saves information it deems important
  • Example: “This user prefers TypeScript” → Save

Pull (Reactive):

  • Search external storage when needed
  • Example: User says “consider my preferences” → Search
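
The two strategies can be sketched with a toy store. To keep the example self-contained, keyword matching stands in for the vector search a real system would use, and the caller decides what to push (in MemGPT that decision belongs to the LLM). `PushPullMemory` and `MemoryRecord` are hypothetical names.

```typescript
interface MemoryRecord {
  text: string;
  keywords: string[];
}

class PushPullMemory {
  private store: MemoryRecord[] = [];

  // Push: save a fact the agent judged worth keeping.
  push(text: string, keywords: string[]): void {
    this.store.push({ text, keywords: keywords.map((k) => k.toLowerCase()) });
  }

  // Pull: fetch only the records relevant to the current query.
  pull(query: string): string[] {
    const q = query.toLowerCase();
    return this.store
      .filter((r) => r.keywords.some((k) => q.indexOf(k) !== -1))
      .map((r) => r.text);
  }
}
```

For instance, after `push("This user prefers TypeScript", ["preferences"])`, a later `pull("consider my preferences")` returns that fact, while an unrelated query returns nothing and adds no tokens to the context.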

Hierarchical Memory Structure

MemGPT proposes a 3-tier memory hierarchy.

L1: Working Memory (Context Window)
    ├─ Current conversation (5-10 messages)
    ├─ Active task state
    └─ System prompt

L2: Recent Memory (Short-term storage)
    ├─ Recent sessions (1 week)
    ├─ Frequently referenced information
    └─ Temporary task data

L3: Long-term Memory (Long-term storage)
    ├─ User profile
    ├─ Domain knowledge
    └─ Accumulated learning data

A-MEM (Zettelkasten-based)

A-MEM is an innovative memory system proposed by Rutgers University in 2025. It applies Zettelkasten (German for “note box”) methodology to LLM agents.

What is Zettelkasten?

  • Note organization method developed by Niklas Luhmann (sociologist)
  • Assign unique IDs to each note
  • Build knowledge network through note connections (links)
  • Generate emergent insights

A-MEM Architecture

The core of A-MEM is that agents organize memory themselves.

Implementation Example:

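// Placeholder types; the study material does not define them.
type Metadata = Record<string, unknown>;
interface Note { id: string; content: string; metadata: Metadata; tags: string[] }
interface Link { from: string; to: string; type: string; strength: number }

abstract class AMem {
  notes = new Map<string, Note>();
  links: Link[] = [];
  private nextId = 1;

  // Supplied by the concrete system, typically backed by an LLM and a vector index.
  protected abstract extractTags(content: string): Promise<string[]>;
  protected abstract findSimilarNotes(note: Note): Promise<Array<[Note, number]>>;

  async createNote(content: string, metadata: Metadata): Promise<string> {
    const noteId = `note-${this.nextId++}`;

    // Auto-tagging
    const tags = await this.extractTags(content);
    const note: Note = { id: noteId, content, metadata, tags };

    // Calculate similarity with existing notes
    const similar = await this.findSimilarNotes(note);

    // Auto-generate connections (similarity > 0.7)
    for (const [relatedNote, similarity] of similar) {
      if (similarity > 0.7) {
        this.links.push({
          from: noteId,
          to: relatedNote.id,
          type: "related",
          strength: similarity,
        });
      }
    }

    this.notes.set(noteId, note);
    return noteId;
  }
}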

A-MEM Benefits:

  1. Dynamic organization: No manual structuring needed
  2. Relevance-based search: Direct matching + indirect connections
  3. Emergent insights: Discover new connections between notes
  4. Scalability: Efficiency maintained as knowledge grows
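
Benefit 2 is the one that benefits most from an example. The sketch below shows "direct matching + indirect connections" in its simplest form: find notes that match a keyword, then add the notes they link to. It is my own illustration, not A-MEM's retrieval algorithm; `LinkedNote` and `retrieveWithLinks` are hypothetical, and a substring match stands in for embedding search.

```typescript
interface LinkedNote {
  id: string;
  content: string;
  links: string[]; // ids of related notes
}

// Returns ids of direct matches first, then notes one link away from them.
function retrieveWithLinks(notes: LinkedNote[], keyword: string): string[] {
  const byId: Record<string, LinkedNote> = {};
  for (const n of notes) byId[n.id] = n;

  const needle = keyword.toLowerCase();
  const found: string[] = [];
  const addOnce = (id: string): void => {
    if (byId[id] && found.indexOf(id) === -1) found.push(id);
  };

  // 1. Direct matching on note content.
  for (const n of notes) {
    if (n.content.toLowerCase().indexOf(needle) !== -1) addOnce(n.id);
  }

  // 2. Indirect connections: follow links from the direct matches (one hop).
  for (const id of found.slice()) {
    for (const linked of byId[id].links) addOnce(linked);
  }

  return found;
}
```

The second step is what surfaces a note that never mentions the keyword but is connected to one that does.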

5. Production Case Study: DeNA NOC Alert Agent

Real production deployment of NOC (Network Operations Center) Alert Agent at DeNA.

Problem Definition

Background:

  • Operations team receives 100 to 200 alerts daily
  • 70% of alerts are false positives
  • Engineers manually classify and respond to alerts

Goals:

  • Automatic alert classification and prioritization
  • False positive filtering
  • Automatic response guide generation

Workflow Design

graph TD
    Alert[Alert Generated] --> Classifier[Classification Agent]

    Classifier --> |Urgent| Escalate[Escalation]
    Classifier --> |Normal| Analyzer[Analysis Agent]
    Classifier --> |False Positive| Archive[Archive]

    Analyzer --> Context[Context Collection<br/>- Logs<br/>- Metrics<br/>- History]

    Context --> RCA[Root Cause Analysis]

    RCA --> Guide[Response Guide Generation]

    Guide --> Ticket[Ticket Creation]
    Escalate --> OnCall[On-call Engineer Alert]

    style Classifier fill:#F77F00,color:#fff
    style Analyzer fill:#0066CC,color:#fff
    style RCA fill:#7B2CBF,color:#fff
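
The branching in this diagram can be written as a plain lookup, which keeps the routing deterministic even though the classification itself comes from an LLM. A minimal sketch: the step names mirror the diagram, while `parseAlertClass`, `planAlertSteps`, and the choice to escalate unrecognized labels are assumptions of mine, not details from the DeNA deployment.

```typescript
type AlertClass = "urgent" | "normal" | "false_positive";

const knownClasses: AlertClass[] = ["urgent", "normal", "false_positive"];

// Normalizes raw classifier output such as " False Positive ".
function parseAlertClass(raw: string): AlertClass {
  const label = raw.trim().toLowerCase().replace(/[\s-]+/g, "_");
  for (const known of knownClasses) {
    if (known === label) return known;
  }
  // Assumption: an unrecognized label is escalated rather than dropped.
  return "urgent";
}

// Steps per branch, in the order shown in the workflow diagram.
const stepsByClass: Record<AlertClass, string[]> = {
  urgent: ["Escalation", "On-call Engineer Alert"],
  normal: [
    "Analysis Agent",
    "Context Collection",
    "Root Cause Analysis",
    "Response Guide Generation",
    "Ticket Creation",
  ],
  false_positive: ["Archive"],
};

function planAlertSteps(cls: AlertClass): string[] {
  // Return a copy so callers cannot mutate the routing table.
  return stepsByClass[cls].slice();
}
```

Only the classifier and the steps on the Normal branch need an LLM; the routing between them is ordinary code, which is the "orchestration over autonomy" idea in miniature.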

Production Deployment Considerations

Problems discovered during actual deployment and their solutions.

1. Hallucination Problem

Issue: LLM mentions non-existent logs or metrics

Solution:

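// Tool call result validation
// Tool is a simplified placeholder for your own tool interface.
interface Tool {
  validate(input: unknown): boolean;
  execute(input: unknown): Promise<unknown>;
}

class ToolExecutor {
  constructor(private registry: Map<string, Tool>) {}

  async execute(toolName: string, input: unknown): Promise<unknown> {
    // Reject calls to tools the LLM made up
    const tool = this.registry.get(toolName);
    if (!tool) {
      return {
        error: `Unknown tool: ${toolName}`,
        suggestion: "Choose a tool from the registry",
      };
    }

    // Input validation
    if (!tool.validate(input)) {
      return {
        error: "Invalid input",
        suggestion: "Please check the tool documentation",
      };
    }

    // Execute
    const result = await tool.execute(input);

    // Output validation: report empty results explicitly so the LLM does not invent data
    if (this.isEmpty(result)) {
      return {
        error: "No data found",
        suggestion: "Try different search parameters",
      };
    }

    return result;
  }

  private isEmpty(value: unknown): boolean {
    return value == null || value === "" || (Array.isArray(value) && value.length === 0);
  }
}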

2. Latency Problem

Issue: Alert → Response takes 45 seconds on average (Target: 10 seconds)

Solutions:

  • Parallel processing: Collect logs/metrics/history simultaneously
  • Caching: Cache frequently used query results
  • Streaming: Display partial results immediately

3. Cost Problem

Issue: 200 alerts/day × $0.20 = $40/day ($1,200/month)

Solutions:

  • False positive pre-filter: Filter obvious false positives with rules first
  • Batching: Process similar alerts together
  • SLM utilization: Use small models for simple classification
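
The pre-filter is the cheapest of the three to build. A minimal sketch, with an invented alert shape and two invented rules; real rules would come from your own alert history.

```typescript
interface IncomingAlert {
  source: string;
  message: string;
  severity: "info" | "warning" | "critical";
}

type FilterRule = (a: IncomingAlert) => boolean;

// Hypothetical rules for "obvious" false positives.
const obviousFalsePositiveRules: FilterRule[] = [
  (a) => a.severity === "info",
  (a) => /scheduled maintenance/i.test(a.message),
];

// Returns true when the alert should go on to the LLM pipeline.
function needsLlm(
  a: IncomingAlert,
  rules: FilterRule[] = obviousFalsePositiveRules,
): boolean {
  // Assumption: critical alerts are never dropped by rules alone.
  if (a.severity === "critical") return true;
  return !rules.some((rule) => rule(a));
}
```

Every alert the rules catch costs nothing, so even a conservative rule set lowers the daily bill.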

Results

After 6 months of operation:

  • False positive filtering accuracy: 92%
  • Response time reduction: 15 minutes → 3 minutes average
  • Engineer burden reduction: 20 hours saved per week
  • Monthly operating cost: $1,200 → $350 (post-optimization)

6. Cost and Performance Optimization

The biggest challenges for LLM agent systems are cost and latency. Four core optimization techniques:

1. Semantic Caching (up to 90% Cost Reduction)

Concept: Reuse cached responses for semantically similar queries

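// Semantic caching implementation (EmbeddingModel is a placeholder for your embedding client)
interface EmbeddingModel {
  encode(text: string): Promise<number[]>;
}
interface CacheEntry { embedding: number[]; response: string }

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private cache: Map<string, CacheEntry> = new Map();

  constructor(private embeddings: EmbeddingModel, private threshold = 0.95) {}

  async get(query: string): Promise<string | null> {
    // Query embedding
    const queryEmbedding = await this.embeddings.encode(query);

    // Similarity search: keep the closest entry above the threshold
    let best: CacheEntry | null = null;
    let bestSimilarity = this.threshold;
    for (const entry of this.cache.values()) {
      const similarity = cosineSimilarity(queryEmbedding, entry.embedding);
      if (similarity > bestSimilarity) {
        best = entry;
        bestSimilarity = similarity;
      }
    }
    return best ? best.response : null;
  }

  // Store a response under the embedding of its query
  async set(query: string, response: string): Promise<void> {
    this.cache.set(query, { embedding: await this.embeddings.encode(query), response });
  }
}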

Impact:

  • 60% cache hit rate → about 60% fewer LLM calls, so roughly 60% lower LLM cost (embedding lookups still cost a little)
  • About 95% lower latency on cache hits (the LLM round trip is skipped)

2. Batching (50% Reduction)

Concept: Bundle multiple requests for processing at once

Impact:

  • Batch size of 10 → Approximately 50% cost reduction
  • However, latency slightly increases (wait time)
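
Mechanically, batching means grouping inputs and making one call per group. A minimal sketch, assuming a hypothetical `classifyBatch` function that sends one prompt containing several items and returns one label per item:

```typescript
// Splits a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// `classifyBatch` is hypothetical: one LLM call that labels every item in the batch.
async function classifyInBatches(
  items: string[],
  batchSize: number,
  classifyBatch: (batch: string[]) => Promise<string[]>,
): Promise<string[]> {
  const labels: string[] = [];
  for (const batch of chunk(items, batchSize)) {
    const out = await classifyBatch(batch);
    // Guard against the model returning too few or too many labels.
    if (out.length !== batch.length) {
      throw new Error("Batch result length mismatch");
    }
    labels.push(...out);
  }
  return labels;
}
```

With a batch size of 10, 25 items need 3 calls instead of 25. The length check is worth keeping, because a model that skips one item silently shifts every label after it.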

3. SLM (Small Language Model, 14x Cheaper per Call)

Concept: Use smaller models for simple tasks

// Model routing
class ModelRouter {
  private smallModel: SLM; // Llama 3.1 (8B)
  private largeModel: LLM; // Claude Sonnet 4

  async route(task: Task): Promise<Response> {
    const complexity = this.assessComplexity(task);

    if (complexity < 0.3) {
      // Simple task: SLM ($0.001)
      return await this.smallModel.execute(task);
    } else {
      // Complex task: Large Model ($0.014)
      return await this.largeModel.execute(task);
    }
  }
}

Impact:

  • If 70% of tasks can be handled by SLM
  • Cost: 70% × $0.001 + 30% × $0.014 = $0.0049 (average)
  • Large Model only: $0.014
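  • Reduction rate: 65% (1 - $0.0049 / $0.014); a single SLM call is 14x cheaper than a large-model call, but the blended saving is smaller because 30% of tasks still use the large model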

4. Quantization

Concept: Reduce model weight precision to decrease size and cost

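| Quantization Level | Model Size | Accuracy Loss | Inference Speed | Use Case |
| --- | --- | --- | --- | --- |
| FP16 (Original) | 16GB | 0% | 1x | Benchmark baseline |
| 8bit | 8GB | ~1% | 1.5x | Production (accuracy important) |
| 4bit | 4GB | ~3% | 2x | Local execution, experiments |
| 2bit | 2GB | ~10% | 3x | Prototypes, demos |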
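
The Model Size column follows from simple arithmetic: parameters times bits per weight, divided by 8 bits per byte. The sizes above match a model with about 8 billion parameters; that parameter count is inferred from the 16GB FP16 figure rather than stated in the material. `weightSizeGB` is a helper written for this example.

```typescript
// Approximate weight size in GB: parameters (in billions) × bits per weight / 8 bits per byte.
// Ignores overhead such as quantization scales, so real files are somewhat larger.
function weightSizeGB(paramsBillions: number, bitsPerWeight: number): number {
  return (paramsBillions * bitsPerWeight) / 8;
}

// Assumed 8B-parameter model at each precision level from the table.
const sizesGB = [16, 8, 4, 2].map((bits) => weightSizeGB(8, bits));
```

Here `sizesGB` evaluates to `[16, 8, 4, 2]`, matching the table row by row.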

Comprehensive Cost Optimization Strategy

Worked example combining semantic caching with complexity-based model routing. Batching and quantization can be layered on top of the same flow.

Before (Pre-optimization):

// Process all requests with Claude Sonnet 4
const response = await claude.generate(query);
// Cost: $0.014 per request
// Latency: 2 seconds

After (Post-optimization):

async function optimizedQuery(query: string): Promise<string> {
  // 1. Semantic caching (60% hit rate)
  const cached = await cache.get(query);
  if (cached) return cached; // Cost: $0, Latency: 50ms

  // 2. Complexity assessment and model routing
  const complexity = assessComplexity(query);

  let response: string;

  if (complexity < 0.3) {
    // 3. SLM usage (70% requests)
    response = await smallModel.generate(query);
    // Cost: $0.001, Latency: 500ms
  } else {
    // 4. Large Model (30% requests)
    response = await largeModel.generate(query);
    // Cost: $0.014, Latency: 2 seconds
  }

  // Cache save
  await cache.set(query, response);

  return response;
}

Cost Calculation:

Cache hit (60%): $0 × 0.6 = $0
Cache miss:
  - SLM (28%): $0.001 × 0.28 = $0.00028
  - Large (12%): $0.014 × 0.12 = $0.00168

Average cost: $0.00196 per request
Reduction rate: 86% (Before $0.014 → After $0.00196)
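
The same weighted-average arithmetic also gives the expected latency. A small sketch that reproduces the numbers, using the shares, costs, and latencies from the code comments above (`RequestPath` and `blended` are names invented for this example):

```typescript
interface RequestPath {
  share: number; // fraction of all requests taking this path
  costUsd: number;
  latencyMs: number;
}

// Weighted average of cost and latency across all request paths.
function blended(paths: RequestPath[]): { costUsd: number; latencyMs: number } {
  const totalShare = paths.reduce((sum, p) => sum + p.share, 0);
  if (Math.abs(totalShare - 1) > 1e-9) throw new Error("Shares must sum to 1");
  return {
    costUsd: paths.reduce((sum, p) => sum + p.share * p.costUsd, 0),
    latencyMs: paths.reduce((sum, p) => sum + p.share * p.latencyMs, 0),
  };
}

const optimized = blended([
  { share: 0.6, costUsd: 0, latencyMs: 50 }, // cache hit
  { share: 0.28, costUsd: 0.001, latencyMs: 500 }, // cache miss, SLM
  { share: 0.12, costUsd: 0.014, latencyMs: 2000 }, // cache miss, large model
]);
```

This yields about $0.00196 per request and about 410 ms average latency, against $0.014 and 2,000 ms before optimization.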

Key Insights and Reflections

A few things stuck with me after finishing Part 5.

1. “Orchestration” Over “Full Autonomy”

This is the core of the 2025 trend. Rather than letting agents decide everything autonomously, enabling autonomy within clear workflows is more effective in production.

2. Memory Creates True Agent Intelligence

Advanced memory systems like MemGPT and A-MEM transform agents from simple “prompt executors” to “learning and evolving systems”.

3. Choose Multi-Agent Patterns Based on Problem

There is no “one-size-fits-all” among the six patterns. Each has clear pros and cons.

4. Cost Optimization is Essential, Not Optional

The biggest barrier to LLM agent systems in production is cost.

Core Strategies:

  1. Semantic caching - Apply to all systems (60% hit rate alone yields significant benefits)
  2. SLM routing - Simple tasks (70%) with small models
  3. Batching - Bundle tasks that don’t need real-time processing
  4. Pre-filtering - Filter obvious cases with rules

5. Production Agents Need “Self-Healing”

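LLMs are probabilistic systems, so 100% accurate responses cannot be guaranteed. Production agents have to expect failures and recover from them with validation and bounded retries.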

Series Recap

That’s all five parts done. Worth a quick look back at the whole journey.

Insights from the Entire Series

  1. LLMs are tools: Not omnipotent. Proper usage for the problem matters.
  2. Prompts > Fine-tuning: 90% of problems can be solved with prompts.
  3. RAG is essential: For latest information and domain knowledge, RAG is the answer.
  4. Agents need memory: Adding memory to stateless LLMs creates true agents.
  5. Cost optimization from design phase: Optimizing later costs 10x more.

References

DeNA Official Materials

  • DeNA Tech Blog
  • DeNA LLM Study Presentation (Internal materials, 2024)

DeNA LLM Study Series Complete! We hope this series helps your LLM agent development journey.

Frequently Asked Questions

Why is orchestration recommended over fully autonomous agents in 2025?
Fully autonomous agents can make an unbounded number of API calls, so costs can explode, and their behavior is hard to control, which makes them unstable in production. Letting agents act autonomously inside a clear workflow keeps costs predictable and makes the system more reliable and easier to debug. This is why explicit orchestration tools like n8n and LangGraph are gaining attention.

How should I choose among the multi-agent patterns?
Use Sequential or Parallel when there are clear stages, Supervisor when dynamic task distribution is needed, and Network when creative collaboration matters. Note that Network patterns such as AutoGen have unpredictable costs that vary with conversation length, so you must set a cost limit when using them.

Why do LLM function calls need a validation layer and Self-Healing?
LLM function calls cannot be trusted 100 percent: wrong parameters, calls to nonexistent tools, and infinite loops all occur. That is why you need a layer that checks tool existence, validates parameter schemas, and limits call frequency, along with a Self-Healing pattern that analyzes errors and retries automatically. Both are essential in production.

How much can combining cost-optimization techniques actually save?
In a case combining semantic caching (60 percent hit rate) with complexity assessment and SLM routing, the average cost per request dropped from $0.014 to $0.00196, about an 86 percent reduction. Average latency also fell from 2,000 ms to 410 ms, roughly 79 percent. In the DeNA NOC Alert Agent case, monthly operating cost fell from $1,200 to $350.

About the Author

Kim Jangwook

Full-Stack Developer specializing in AI/LLM

Building AI agent systems, LLM applications, and automation solutions with 10+ years of web development experience. Sharing practical insights on Claude Code, MCP, and RAG systems.
