Claude Found 22 CVEs in Firefox — AI Security Audits Arrive

Anthropic's Claude Opus 4.6 discovered 22 CVEs in Firefox in just two weeks. We break down how AI-driven security audits work and what engineering leaders should do next.

Two Weeks, 6,000 C++ Files, 22 CVEs

On March 6, 2026, Anthropic and Mozilla jointly announced the results of an AI-driven browser security audit. Claude Opus 4.6 analyzed roughly 6,000 files in Firefox’s C++ codebase, submitting 112 unique bug reports — of which 22 were registered as official CVEs.

The severity breakdown:

Severity   Count   Percentage
High       14      63.6%
Moderate   7       31.8%
Low        1       4.5%

The 14 high-severity vulnerabilities represent roughly one-fifth of all high-severity Firefox patches shipped throughout 2025. All 22 vulnerabilities have been patched in Firefox 148.

AI Found What Fuzzing Missed

Firefox has been battle-tested for decades through fuzzing, static analysis, and regular security reviews. So how did Claude still uncover new vulnerabilities?

graph TD
    subgraph Traditional Approach
        F[Fuzzing] --> R1[Random Input Generation]
        R1 --> C1[Crash Detection]
        C1 --> B1[Input Boundary Bugs]
    end
    subgraph AI Approach
        AI[Claude] --> R2[Semantic Code Analysis]
        R2 --> C2[Logic Error Detection]
        C2 --> B2[Design-Level Bugs]
    end
    B1 -.->|Partial Overlap| B2

The key difference lies in the nature of what each approach detects:

  • Fuzzing: Triggers crashes through random inputs. Excels at catching missing input validation and buffer overflows.
  • AI code analysis: Understands code semantics and context to detect logical errors. Catches logic errors and complex memory vulnerabilities like Use After Free that fuzzing tends to miss.

According to Mozilla’s official announcement, Claude “found many previously unknown bugs despite decades of fuzzing and static analysis.” Notably, one report documented a Use After Free vulnerability discovered just 20 minutes after Claude began exploring the JavaScript engine.

Report Quality — Why Mozilla Trusted the Results

More important than the raw count of “N vulnerabilities found” is the quality of the reports. Three factors enabled Mozilla’s security team to act on Anthropic’s findings quickly:

  1. Minimal test cases: Each bug came with a minimal, reproducible code sample
  2. Detailed proofs of concept (PoC): Concrete exploitation scenarios showing how each vulnerability could be abused
  3. Candidate patches: Self-contained reports that included proposed fixes

This structure allowed Mozilla’s security team to complete verification within hours of receiving a report and begin working on fixes immediately. The biggest bottleneck in security auditing — report triage — was dramatically shortened.

Exploit vs. Detection — Where AI Stands Today

One notable detail: Anthropic separately tested Claude’s exploit development capabilities:

Metric                Result
Test attempts         Hundreds
API cost              $4,000
Successful exploits   2

There is a significant gap between vulnerability detection and exploit development. AI excels at reading code and identifying potential issues but still struggles to write working attack code. This asymmetry favors defenders — it means security teams have a window of opportunity to deploy AI for defense before attackers can weaponize it effectively.

Practical Takeaways for EMs and CTOs

Here is what this case means for engineering leaders.

1. AI Security Audit Adoption Roadmap

graph TD
    P1["Phase 1<br/>Pilot"] --> P2["Phase 2<br/>Automation"]
    P2 --> P3["Phase 3<br/>Integration"]

    P1 --- D1["Select high-risk modules<br/>Run AI analysis<br/>Validate results"]
    P2 --- D2["CI/CD pipeline integration<br/>Per-PR auto scanning<br/>Alerting system setup"]
    P3 --- D3["Security dashboard integration<br/>Trend analysis<br/>Scheduled audit automation"]

Phase 1 (Pilot, 1–2 weeks):

  • Identify security-sensitive modules in legacy code (authentication, payments, data processing)
  • Run a one-off audit with an LLM-based code analysis tool
  • Have your existing security team validate the results to calibrate trust

Phase 2 (Automation, 1–2 months):

  • Add an AI security scan stage to your CI/CD pipeline
  • Automatically analyze changed code on every PR
  • Set up Slack/email alerting workflows

Phase 3 (Integration, quarterly):

  • Integrate AI audit results into your security dashboard
  • Analyze vulnerability trends and assign risk scores
  • Run automated audits across the full codebase every quarter

2. Cost-Effectiveness

Based on Anthropic’s published data, here is a rough comparison:

Factor          Traditional Approach             AI Audit
Timeline        Weeks to months                  2 weeks
Staffing        2–3 senior security engineers    AI + 1 validation engineer
Scope           Sample-based                     Full codebase (6,000 files)
Report quality  Expert-level                     Includes test cases + PoC + patches

AI auditing does not fully replace human experts, of course. The optimal approach is a hybrid model where AI handles first-pass screening and human experts handle validation and prioritization.

3. Organizational Considerations

  • Code confidentiality: Review your security policies around sending code to external AI APIs. Consider on-premise models or zero-retention API contracts.
  • False positive management: Of the 112 reports, 22 became actual CVEs (roughly 20%). The rest were lower-severity bugs or false positives. A triage process is essential.
  • Integration with existing tools: Plan how AI audits will complement your existing AppSec pipeline — SAST (static analysis), DAST (dynamic analysis), and SCA (software composition analysis).
  • Regulatory compliance: Evaluate how to use AI audit results as evidence within compliance frameworks like SOC 2 and ISO 27001.

The Bigger Picture — The Future of AI AppSec

This is not an isolated event. It is part of a broader shift toward AI-driven security auditing becoming an industry standard:

  • Google Project Zero is already researching LLM-based vulnerability detection
  • GitHub Copilot continues to strengthen its security review features
  • NIST’s AI agent security standards include guidelines for using AI as a security tool

For EMs and CTOs, the real question is no longer “Should we adopt AI security auditing?” but rather “When and in what order should we roll it out?” If AI can find new vulnerabilities in a codebase as thoroughly vetted as Firefox, what might it find in yours?

Key Takeaways

Item                 Details
Who                  Anthropic (Claude Opus 4.6) x Mozilla
Duration             2 weeks (February 2026)
Scope                6,000 files in Firefox C++ codebase
Results              112 reports → 22 CVEs (14 high-severity)
Key differentiator   Detected logic errors that fuzzing missed
Report quality       Minimal repro code + PoC + candidate patches
Patch status         All patched in Firefox 148

About the Author

Kim Jangwook

Full-Stack Developer specializing in AI/LLM

Building AI agent systems, LLM applications, and automation solutions with 10+ years of web development experience. Sharing practical insights on Claude Code, MCP, and RAG systems.