Claude Found 22 CVEs in Firefox — AI Security Audits Arrive
Anthropic's Claude Opus 4.6 discovered 22 CVEs in Firefox in just two weeks. We break down how AI-driven security audits work and what engineering leaders should do next.
Two Weeks, 6,000 C++ Files, 22 CVEs
On March 6, 2026, Anthropic and Mozilla jointly announced the results of an AI-driven browser security audit. Claude Opus 4.6 analyzed roughly 6,000 files in Firefox’s C++ codebase, submitting 112 unique bug reports — of which 22 were registered as official CVEs.
The severity breakdown:
| Severity | Count | Percentage |
|---|---|---|
| High | 14 | 63.6% |
| Moderate | 7 | 31.8% |
| Low | 1 | 4.5% |
The 14 high-severity vulnerabilities represent roughly one-fifth of all high-severity Firefox patches shipped throughout 2025. Every single one has been patched in Firefox 148.
AI Found What Fuzzing Missed
Firefox has been battle-tested for decades through fuzzing, static analysis, and regular security reviews. So how did Claude still uncover new vulnerabilities?
```mermaid
graph TD
    subgraph Traditional Approach
        F[Fuzzing] --> R1[Random Input Generation]
        R1 --> C1[Crash Detection]
        C1 --> B1[Input Boundary Bugs]
    end
    subgraph AI Approach
        AI[Claude] --> R2[Semantic Code Analysis]
        R2 --> C2[Logic Error Detection]
        C2 --> B2[Design-Level Bugs]
    end
    B1 -.->|Partial Overlap| B2
```
The key difference lies in the nature of what each approach detects:
- Fuzzing: Triggers crashes through random inputs. Excels at catching missing input validation and buffer overflows.
- AI code analysis: Understands code semantics and context to detect logical errors. Catches logic errors and complex memory vulnerabilities like Use After Free that fuzzing tends to miss.
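To make the contrast concrete, here is a deliberately toy illustration (in Python, and nothing like Claude's actual analysis): a fuzzer only surfaces a use-after-close bug if a random input happens to reach the buggy path and crash, while even a crude AST pass can flag the pattern directly from the source text.

```python
import ast

# Toy "semantic" checker: flag method calls on a name after .close() was
# called on it within the same function body. Purely illustrative.
SOURCE = """
def handler(conn):
    data = conn.recv(64)
    conn.close()
    return conn.recv(1024)   # use after close
"""

def find_use_after_close(src: str):
    """Return (name, method, line) tuples for calls on already-closed names."""
    findings = []
    for fn in ast.walk(ast.parse(src)):
        if not isinstance(fn, ast.FunctionDef):
            continue
        closed = set()
        for stmt in fn.body:  # statements in source order
            for node in ast.walk(stmt):
                if (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Attribute)
                        and isinstance(node.func.value, ast.Name)):
                    name = node.func.value.id
                    if node.func.attr == "close":
                        closed.add(name)
                    elif name in closed:
                        findings.append((name, node.func.attr, node.lineno))
    return findings

print(find_use_after_close(SOURCE))  # [('conn', 'recv', 5)]
```

A real LLM-based audit reasons far beyond such syntactic patterns, but the shape of the advantage is the same: it works from the meaning of the code, not from whatever inputs happen to trigger a crash.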
According to Mozilla’s official announcement, Claude “found many previously unknown bugs despite decades of fuzzing and static analysis.” Notably, one report documented a Use After Free vulnerability discovered just 20 minutes after Claude began exploring the JavaScript engine.
Report Quality — Why Mozilla Trusted the Results
More important than the raw count of “N vulnerabilities found” is the quality of the reports. Three factors enabled Mozilla’s security team to act on Anthropic’s findings quickly:
- Minimal test cases: Each bug came with a minimal, reproducible code sample
- Detailed proofs of concept (PoC): Concrete exploitation scenarios showing how each vulnerability could be abused
- Candidate patches: Self-contained reports that included proposed fixes
This structure allowed Mozilla’s security team to complete verification within hours of receiving a report and begin working on fixes immediately. The biggest bottleneck in security auditing — report triage — was dramatically shortened.
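The three-part structure above can be sketched as a simple schema. The field names here are our own invention for illustration, not Anthropic's actual report format:

```python
from dataclasses import dataclass

@dataclass
class BugReport:
    title: str
    minimal_test_case: str  # minimal, reproducible code sample
    poc: str                # concrete exploitation scenario
    candidate_patch: str    # proposed fix

    def is_triage_ready(self) -> bool:
        """All three components must be present before handing off."""
        return all([self.minimal_test_case, self.poc, self.candidate_patch])

report = BugReport("UAF in JS engine", "repro.js", "poc-steps.md", "fix.diff")
print(report.is_triage_ready())  # True
```

Enforcing a completeness check like this at submission time is one way to keep the triage queue from filling with reports that cannot be verified quickly.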
Exploit vs. Detection — Where AI Stands Today
One notable detail: Anthropic separately tested Claude’s exploit development capabilities:
| Metric | Result |
|---|---|
| Test attempts | Hundreds |
| API cost | $4,000 |
| Successful exploits | 2 |
There is a significant gap between vulnerability detection and exploit development. AI excels at reading code and identifying potential issues but still struggles to write working attack code. This asymmetry favors defenders — it means security teams have a window of opportunity to deploy AI for defense before attackers can weaponize it effectively.
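The asymmetry is easy to quantify from the table above:

```python
# Back-of-envelope math from Anthropic's published exploit-test figures.
api_cost_usd = 4_000
successful_exploits = 2
cost_per_working_exploit = api_cost_usd / successful_exploits

print(cost_per_working_exploit)  # 2000.0 USD per working exploit
```

Two thousand dollars per working exploit, across hundreds of attempts, versus 112 actionable reports from a two-week detection run: for now, the economics clearly favor using these models for defense.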
Practical Takeaways for EMs and CTOs
Here is what this case means for engineering leaders.
1. AI Security Audit Adoption Roadmap
```mermaid
graph TD
    P1["Phase 1<br/>Pilot"] --> P2["Phase 2<br/>Automation"]
    P2 --> P3["Phase 3<br/>Integration"]
    P1 --- D1["Select high-risk modules<br/>Run AI analysis<br/>Validate results"]
    P2 --- D2["CI/CD pipeline integration<br/>Per-PR auto scanning<br/>Alerting system setup"]
    P3 --- D3["Security dashboard integration<br/>Trend analysis<br/>Scheduled audit automation"]
```
Phase 1 (Pilot, 1–2 weeks):
- Identify security-sensitive modules in legacy code (authentication, payments, data processing)
- Run a one-off audit with an LLM-based code analysis tool
- Have your existing security team validate the results to calibrate trust
Phase 2 (Automation, 1–2 months):
- Add an AI security scan stage to your CI/CD pipeline
- Automatically analyze changed code on every PR
- Set up Slack/email alerting workflows
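The per-PR gate in Phase 2 can be as simple as a script that fails the CI job when the scanner reports a blocking finding. The scanner output format here (a list of dicts with `id` and `severity`) is an assumption for illustration:

```python
import sys

# Minimal sketch of a per-PR security gate. Severity labels and the
# findings format are assumptions, not a specific tool's output.
BLOCKING_SEVERITIES = {"high"}

def gate(findings) -> int:
    """Return a CI exit code: nonzero if any blocking finding remains."""
    blockers = [f for f in findings if f.get("severity") in BLOCKING_SEVERITIES]
    for f in blockers:
        print(f"BLOCKED by {f['id']} (severity: {f['severity']})")
    return 1 if blockers else 0

if __name__ == "__main__":
    sample = [
        {"id": "AI-001", "severity": "high"},
        {"id": "AI-002", "severity": "low"},
    ]
    sys.exit(gate(sample))
```

Start with alert-only mode and tighten `BLOCKING_SEVERITIES` once the false positive rate from Phase 1 is understood; a gate that blocks on noisy findings will quickly be bypassed.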
Phase 3 (Integration, quarterly):
- Integrate AI audit results into your security dashboard
- Analyze vulnerability trends and assign risk scores
- Run automated audits across the full codebase every quarter
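For the trend analysis and risk scoring in Phase 3, a severity-weighted score per module is a reasonable starting point. The weights and module names below are illustrative assumptions, not figures from the audit:

```python
# Toy severity-weighted risk score for quarterly trend reports.
WEIGHTS = {"high": 5, "moderate": 2, "low": 1}

def risk_score(severities) -> int:
    """Sum severity weights over one module's open findings."""
    return sum(WEIGHTS.get(s, 0) for s in severities)

modules = {
    "auth":     ["high", "high", "moderate"],
    "payments": ["moderate", "low"],
    "parser":   ["high"],
}
ranked = sorted(modules, key=lambda m: risk_score(modules[m]), reverse=True)
print(ranked)  # ['auth', 'parser', 'payments']
```

Tracking how these scores move quarter over quarter tells you whether the audit program is actually reducing risk or just generating reports.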
2. Cost-Effectiveness
Based on Anthropic’s published data, here is a rough comparison:
| Factor | Traditional Approach | AI Audit |
|---|---|---|
| Timeline | Weeks to months | 2 weeks |
| Staffing | 2–3 senior security engineers | AI + 1 validation engineer |
| Scope | Sample-based | Full codebase (6,000 files) |
| Report quality | Expert-level | Includes test cases + PoC + patches |
AI auditing does not fully replace human experts, of course. The optimal approach is a hybrid model where AI handles first-pass screening and human experts handle validation and prioritization.
3. Organizational Considerations
- Code confidentiality: Review your security policies around sending code to external AI APIs. Consider on-premise models or zero-retention API contracts.
- False positive management: Of the 112 reports, 22 became actual CVEs (roughly 20%). The rest were lower-severity bugs or false positives. A triage process is essential.
- Integration with existing tools: Plan how AI audits will complement your existing AppSec pipeline — SAST (static analysis), DAST (dynamic analysis), and SCA (software composition analysis).
- Regulatory compliance: Evaluate how to use AI audit results as evidence within compliance frameworks like SOC 2 and ISO 27001.
The Bigger Picture — The Future of AI AppSec
This is not an isolated event. It is part of a broader shift toward AI-driven security auditing becoming an industry standard:
- Google Project Zero is already researching LLM-based vulnerability detection
- GitHub Copilot continues to strengthen its security review features
- NIST’s AI agent security standards include guidelines for using AI as a security tool
For EMs and CTOs, the real question is no longer “Should we adopt AI security auditing?” but rather “When and in what order should we roll it out?” If AI can find new vulnerabilities in a codebase as thoroughly vetted as Firefox, what might it find in yours?
Key Takeaways
| Item | Details |
|---|---|
| Who | Anthropic (Claude Opus 4.6) x Mozilla |
| Duration | 2 weeks (February 2026) |
| Scope | 6,000 files in Firefox C++ codebase |
| Results | 112 reports → 22 CVEs (14 high-severity) |
| Key differentiator | Detected logic errors that fuzzing missed |
| Report quality | Minimal repro code + PoC + candidate patches |
| Patch status | All patched in Firefox 148 |