Hindsight — Open-Source MCP Memory That Gives AI Agents Learning

An analysis of the architecture, core capabilities, and production deployment strategies of Hindsight, an MCP memory system that addresses the memory problem for AI agents.

The Memory Problem with AI Agents

Any Engineering Manager who has deployed AI agents to production has likely experienced this at least once. You ask the agent, “Do you remember what we discussed yesterday?” and it just stares back blankly. Once a conversation ends, all context vanishes, and the next session starts from scratch.

There have been many attempts to solve this problem with RAG (Retrieval-Augmented Generation) or simple vector databases, but most stopped at “retrieval” without advancing to “learning.” Simply searching past conversations is fundamentally different from extracting patterns from experience and forming mental models.

Hindsight is an open-source project that tackles this problem head-on. Compatible with MCP (Model Context Protocol), it integrates immediately with major AI tools like Claude, Cursor, and VS Code. It achieved 91.4% on the LongMemEval benchmark, making it the first agent memory system to break the 90% barrier.

Hindsight’s Architecture

Hindsight organizes memory using biomimetic data structures inspired by human cognitive architecture.

graph TD
    subgraph Memory Types
        W["World<br/>Facts about the environment"] ~~~ E["Experiences<br/>Agent interactions"]
    end
    subgraph Processing Layers
        F["Fact Extraction<br/>Extract facts"]
        ER["Entity Resolution<br/>Resolve entities"]
        KG["Knowledge Graph<br/>Build knowledge graph"]
    end
    subgraph Higher Cognition
        MM["Mental Models<br/>Learned understanding"]
    end
    W --> F
    E --> F
    F --> ER
    ER --> KG
    KG --> MM

Memory is divided into three main layers:

  • World: Facts about the environment (“The stove is hot”)
  • Experiences: Records of the agent’s own interactions (“I touched the stove and it was hot”)
  • Mental Models: Learned understanding formed by reflecting on raw memories

The key difference from existing RAG systems is precisely these Mental Models. Rather than simply storing and retrieving data, the system analyzes memories and forms patterns, creating a structure where agents “learn from experience.”

Three Core Operations

Retain — Storing Memories

This is not simple text storage. Retain uses an LLM to automatically extract and normalize facts, temporal information, entities, and relationships from the input content.

from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Stored as structured memory, not plain text
client.retain(
    bank_id="project-alpha",
    content="Kim (team lead) completed the auth module refactoring in Sprint 23. "
            "Migrated from session-based to JWT, improving response time by 40%.",
    context="sprint-retrospective",
    timestamp="2026-03-15T10:00:00Z"
)

With this single call, Hindsight internally performs the following:

  1. Entity extraction: “Kim (team lead)”, “Sprint 23”, “auth module”
  2. Relationship mapping: “Kim (team lead) -> completed -> auth module refactoring”
  3. Fact normalization: “session -> JWT migration”, “40% response time improvement”
  4. Temporal indexing: recorded as an event that occurred on 2026-03-15
  5. Vector embedding generation and knowledge graph update
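To make the steps above concrete, here is a minimal sketch of the kind of structured record a single retain call might yield. The class names (`Entity`, `Relation`, `ExtractedMemory`) and the entity labels are hypothetical and chosen for illustration only. They are not Hindsight's internal schema, and the values are filled in by hand rather than extracted by an LLM.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # illustrative labels such as "person" or "sprint" (assumed)


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str


@dataclass
class ExtractedMemory:
    """Hypothetical shape of one memory after extraction and normalization."""
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)
    facts: List[str] = field(default_factory=list)
    occurred_at: Optional[datetime] = None


# Hand-written stand-in for what the extraction step might produce
memory = ExtractedMemory(
    entities=[
        Entity("Kim (team lead)", "person"),
        Entity("Sprint 23", "sprint"),
        Entity("auth module", "component"),
    ],
    relations=[
        Relation("Kim (team lead)", "completed", "auth module refactoring"),
    ],
    facts=[
        "session -> JWT migration",
        "40% response time improvement",
    ],
    occurred_at=datetime(2026, 3, 15, 10, 0, tzinfo=timezone.utc),
)

print(memory.relations[0])
```

Keeping entities, relations, facts, and the event time as separate fields is what lets each retrieval strategy (graph, keyword, temporal) work against its own index instead of one opaque text blob.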

Recall — Retrieving Memories

Recall runs four retrieval strategies in parallel:

graph TD
    Q["Query"] --> S["Semantic<br/>Vector similarity"]
    Q --> K["Keyword<br/>BM25 matching"]
    Q --> G["Graph<br/>Entity/causal links"]
    Q --> T["Temporal<br/>Time range filter"]
    S --> RRF["Reciprocal Rank<br/>Fusion"]
    K --> RRF
    G --> RRF
    T --> RRF
    RRF --> CE["Cross-Encoder<br/>Reranking"]
    CE --> R["Final Results"]

result = client.recall(
    bank_id="project-alpha",
    query="What are the recent changes related to authentication?",
    max_tokens=4096
)

The results from all four strategies are merged using Reciprocal Rank Fusion, and the final ranking is determined through Cross-Encoder Reranking.
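Reciprocal Rank Fusion itself is a small algorithm, and a standalone sketch shows how four ranked lists collapse into one. The function below is a generic RRF implementation, not Hindsight's code. The constant `k=60` is a commonly used default that I am assuming here, and the memory IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list using RRF.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists, one per retrieval strategy
semantic = ["m3", "m1", "m7"]
keyword = ["m1", "m3", "m9"]
graph = ["m1", "m4"]
temporal = ["m7", "m1"]

fused = reciprocal_rank_fusion([semantic, keyword, graph, temporal])
print(fused)
```

Because RRF only looks at rank positions, it can combine strategies whose raw scores are not comparable (cosine similarity, BM25, graph distance). A memory that ranks reasonably well in several lists, like `m1` here, beats one that tops a single list.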

Reflect — Reflection and Learning

Reflect is the core capability that elevates Hindsight from a simple memory system to a “learning system.”

insight = client.reflect(
    bank_id="project-alpha",
    query="Are there recurring patterns in our team's sprint retrospectives?",
)

Reflect comprehensively analyzes stored memories to:

  • Discover recurring patterns
  • Infer causal relationships between multiple memories
  • Automatically update Mental Models

MCP Integration: Getting Started in 5 Minutes

Installation and Execution

export OPENAI_API_KEY=sk-xxx
docker run --rm -it --pull always \
  -p 8888:8888 -p 9999:9999 \
  -e HINDSIGHT_API_LLM_API_KEY=$OPENAI_API_KEY \
  -v $HOME/.hindsight-docker:/home/hindsight/.pg0 \
  ghcr.io/vectorize-io/hindsight:latest

  • Port 8888: API + MCP endpoint
  • Port 9999: Admin UI

MCP Client Configuration

{
  "mcpServers": {
    "hindsight": {
      "type": "http",
      "url": "http://localhost:8888/mcp/my-project/"
    }
  }
}
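If you manage several projects, you may prefer to generate this config rather than edit it by hand. The helper below is a small sketch. `mcp_config` is a hypothetical function name, and it assumes the path segment after `/mcp/` (here `my-project`) selects the memory bank, which the example URL suggests but the config itself does not state.

```python
import json


def mcp_config(bank_id, host="localhost", port=8888, server_name="hindsight"):
    """Build an MCP client config dict shaped like the JSON above.

    Assumption: the path segment after /mcp/ selects the memory bank.
    """
    return {
        "mcpServers": {
            server_name: {
                "type": "http",
                "url": f"http://{host}:{port}/mcp/{bank_id}/",
            }
        }
    }


config = mcp_config("my-project")
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the MCP settings of whichever client you use.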

Supported LLM Providers

| Provider | Config Value | Notes |
| --- | --- | --- |
| OpenAI | openai | Default |
| Anthropic | anthropic | Claude |
| Google | gemini | Gemini |
| Groq | groq | Fast inference |
| Ollama | ollama | Local |
| LM Studio | lmstudio | Local |

Deployment Strategy from an Engineering Manager’s Perspective

Phase 1: Start with a Personal Agent

Use it for 2 to 3 weeks and track three things: whether repetitive questions decrease, how quickly the agent regains context after a switch, and the quality of the mental models it forms.

Phase 2: Build Team Shared Memory

graph TD
    subgraph Per-Member Agents
        A["Developer A<br/>Agent"] ~~~ B["Developer B<br/>Agent"] ~~~ C["Developer C<br/>Agent"]
    end
    subgraph Hindsight Server
        P["team-project<br/>bank"]
        S["sprint-retro<br/>bank"]
        D["decisions<br/>bank"]
    end
    A --> P
    B --> P
    C --> P
    A --> S
    B --> S
    C --> S

Separate banks by purpose:

  • team-project: Codebase, architecture decisions, tech stack information
  • sprint-retro: Sprint retrospectives, velocity metrics, recurring issues
  • decisions: ADRs, rationale behind technology choices
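One way to keep this separation consistent across a team is a single routing table that every agent integration shares. The sketch below is illustrative: the category labels and the `bank_for` helper are hypothetical, and only the three bank names come from the list above.

```python
# Hypothetical category labels mapped to the three banks listed above
BANK_BY_CATEGORY = {
    "codebase": "team-project",
    "architecture": "team-project",
    "tech-stack": "team-project",
    "retrospective": "sprint-retro",
    "velocity": "sprint-retro",
    "recurring-issue": "sprint-retro",
    "adr": "decisions",
    "tech-choice": "decisions",
}


def bank_for(category):
    """Return the bank ID for a category, failing loudly on unknown ones."""
    try:
        return BANK_BY_CATEGORY[category]
    except KeyError:
        raise ValueError(f"no bank configured for category {category!r}") from None


print(bank_for("adr"))
```

The returned value is what you would pass as `bank_id` to `client.retain`. Raising on unknown categories, instead of falling back to a default bank, keeps stray content from silently polluting a shared bank.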

Phase 3: Operational Monitoring

Practical Use Case Scenarios

Scenario 1: Accelerating Onboarding

Scenario 2: Automated Sprint Retrospective Analysis

Scenario 3: Technical Decision Tracking

Comparison with Existing Approaches

| Feature | Simple Vector DB | RAG | Knowledge Graph | Hindsight |
| --- | --- | --- | --- | --- |
| Storage | Embeddings only | Document chunking + embeddings | Entities + relationships | Facts + entities + time series + vectors |
| Retrieval | Vector similarity only | Vector + keyword | Graph traversal | Quad-parallel retrieval + reranking |
| Learning | None | None | Limited | Automatic Mental Model formation |
| Time Awareness | None | Limited | Limited | Native temporal indexing |
| Benchmark | - | - | - | LongMemEval 91.4% |

Points to Consider

  1. Processing Latency: If you recall immediately after retain, processing may not yet be complete.
  2. LLM Costs: Internal processing requires separate LLM calls.
  3. Data Security: Memories may contain sensitive information.
  4. Mental Model Quality: Automatically generated mental models are not always accurate.
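For the processing latency point, a simple mitigation is to poll with backoff instead of assuming a memory is searchable the instant retain returns. The helper below is a generic sketch, not part of the Hindsight client. `wait_for` and its parameters are hypothetical, and the demo uses a fake fetch function so that it runs without a server.

```python
import time


def wait_for(fetch, is_ready, attempts=5, delay=0.5, backoff=2.0, sleep=time.sleep):
    """Call fetch() until is_ready(result) is true or attempts run out.

    Returns the last result either way, so the caller decides what to do
    if the data never became ready.
    """
    result = None
    for attempt in range(attempts):
        result = fetch()
        if is_ready(result):
            return result
        if attempt < attempts - 1:
            sleep(delay)      # wait before the next try
            delay *= backoff  # grow the wait each round
    return result


# Fake backend: empty twice, then the memory shows up
responses = iter([[], [], ["auth module migrated to JWT"]])
waits = []  # record the delays instead of actually sleeping

result = wait_for(lambda: next(responses), bool, sleep=waits.append)
print(result, waits)
```

With a real client, `fetch` could wrap `client.recall(...)`. What counts as "ready" depends on the shape of the recall response, which this article does not show, so `is_ready` is left for you to define.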

Conclusion

Hindsight represents meaningful progress in AI agent memory. It is open source under the MIT license, and you can get started in 5 minutes with a single Docker command.

About the Author

Kim Jangwook

Full-Stack Developer specializing in AI/LLM

Building AI agent systems, LLM applications, and automation solutions with 10+ years of web development experience. Sharing practical insights on Claude Code, MCP, and RAG systems.