Claude Found 22 CVEs in Firefox — AI Security Audits Arrive
Anthropic's Claude Opus 4.6 discovered 22 CVEs in Firefox in just two weeks. We break down how AI-driven security audits work and what engineering leaders should do next.
Two Weeks, 6,000 C++ Files, 22 CVEs
On March 6, 2026, Anthropic and Mozilla jointly announced the results of an AI-driven browser security audit. Claude Opus 4.6 analyzed roughly 6,000 files in Firefox’s C++ codebase, submitting 112 unique bug reports — of which 22 were registered as official CVEs.
The severity breakdown:
| Severity | Count | Percentage |
|---|---|---|
| High | 14 | 63.6% |
| Moderate | 7 | 31.8% |
| Low | 1 | 4.5% |
The 14 high-severity vulnerabilities represent roughly one-fifth of all high-severity Firefox patches shipped throughout 2025. Every single one has been patched in Firefox 148.
AI Found What Fuzzing Missed
Firefox has been battle-tested for decades through fuzzing, static analysis, and regular security reviews. So how did Claude still uncover new vulnerabilities?
```mermaid
graph TD
    subgraph Traditional Approach
        F[Fuzzing] --> R1[Random Input Generation]
        R1 --> C1[Crash Detection]
        C1 --> B1[Input Boundary Bugs]
    end
    subgraph AI Approach
        AI[Claude] --> R2[Semantic Code Analysis]
        R2 --> C2[Logic Error Detection]
        C2 --> B2[Design-Level Bugs]
    end
    B1 -.->|Partial Overlap| B2
```
The key difference lies in the nature of what each approach detects:
- Fuzzing: Triggers crashes through random inputs. Excels at catching missing input validation and buffer overflows.
- AI code analysis: Understands code semantics and context to detect logical errors. Catches logic errors and complex memory vulnerabilities like Use After Free that fuzzing tends to miss.
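To make the contrast concrete, here is a deliberately toy illustration (in Python, and nothing like Claude's actual analysis): a fuzzer only surfaces a use-after-close bug if a random input happens to reach the buggy path and crash, while even a crude AST pass can flag the pattern directly from the source text.

```python
import ast

# Toy "semantic" checker: flag method calls on a name after .close() was
# called on it within the same function body. Purely illustrative.
SOURCE = """
def handler(conn):
    data = conn.recv(64)
    conn.close()
    return conn.recv(1024)   # use after close
"""

def find_use_after_close(src: str):
    """Return (name, method, line) tuples for calls on already-closed names."""
    findings = []
    for fn in ast.walk(ast.parse(src)):
        if not isinstance(fn, ast.FunctionDef):
            continue
        closed = set()
        for stmt in fn.body:  # statements in source order
            for node in ast.walk(stmt):
                if (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Attribute)
                        and isinstance(node.func.value, ast.Name)):
                    name = node.func.value.id
                    if node.func.attr == "close":
                        closed.add(name)
                    elif name in closed:
                        findings.append((name, node.func.attr, node.lineno))
    return findings

print(find_use_after_close(SOURCE))  # [('conn', 'recv', 5)]
```

A real LLM-based audit reasons far beyond such syntactic patterns, but the shape of the advantage is the same: it works from the meaning of the code, not from whatever inputs happen to trigger a crash.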
According to Mozilla’s official announcement, Claude “found many previously unknown bugs despite decades of fuzzing and static analysis.” Notably, one report documented a Use After Free vulnerability discovered just 20 minutes after Claude began exploring the JavaScript engine.
Report Quality — Why Mozilla Trusted the Results
More important than the raw count of “N vulnerabilities found” is the quality of the reports. Three factors enabled Mozilla’s security team to act on Anthropic’s findings quickly:
- Minimal test cases: Each bug came with a minimal, reproducible code sample
- Detailed proofs of concept (PoC): Concrete exploitation scenarios showing how each vulnerability could be abused
- Candidate patches: Self-contained reports that included proposed fixes
This structure allowed Mozilla’s security team to complete verification within hours of receiving a report and begin working on fixes immediately. The biggest bottleneck in security auditing — report triage — was dramatically shortened.
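The three-part structure above can be sketched as a simple schema. The field names here are our own invention for illustration, not Anthropic's actual report format:

```python
from dataclasses import dataclass

@dataclass
class BugReport:
    title: str
    minimal_test_case: str  # minimal, reproducible code sample
    poc: str                # concrete exploitation scenario
    candidate_patch: str    # proposed fix

    def is_triage_ready(self) -> bool:
        """All three components must be present before handing off."""
        return all([self.minimal_test_case, self.poc, self.candidate_patch])

report = BugReport("UAF in JS engine", "repro.js", "poc-steps.md", "fix.diff")
print(report.is_triage_ready())  # True
```

Enforcing a completeness check like this at submission time is one way to keep the triage queue from filling with reports that cannot be verified quickly.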
Exploit vs. Detection — Where AI Stands Today
One notable detail: Anthropic separately tested Claude’s exploit development capabilities:
| Metric | Result |
|---|---|
| Test attempts | Hundreds |
| API cost | $4,000 |
| Successful exploits | 2 |
There is a significant gap between vulnerability detection and exploit development. AI excels at reading code and identifying potential issues but still struggles to write working attack code. This asymmetry favors defenders — it means security teams have a window of opportunity to deploy AI for defense before attackers can weaponize it effectively.
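The asymmetry is easy to quantify from the table above:

```python
# Back-of-envelope math from Anthropic's published exploit-test figures.
api_cost_usd = 4_000
successful_exploits = 2
cost_per_working_exploit = api_cost_usd / successful_exploits

print(cost_per_working_exploit)  # 2000.0 USD per working exploit
```

Two thousand dollars per working exploit, across hundreds of attempts, versus 112 actionable reports from a two-week detection run: for now, the economics clearly favor using these models for defense.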
Practical Takeaways for EMs and CTOs
Here is what this case means for engineering leaders.
1. AI Security Audit Adoption Roadmap
```mermaid
graph TD
    P1["Phase 1<br/>Pilot"] --> P2["Phase 2<br/>Automation"]
    P2 --> P3["Phase 3<br/>Integration"]
    P1 --- D1["Select high-risk modules<br/>Run AI analysis<br/>Validate results"]
    P2 --- D2["CI/CD pipeline integration<br/>Per-PR auto scanning<br/>Alerting system setup"]
    P3 --- D3["Security dashboard integration<br/>Trend analysis<br/>Scheduled audit automation"]
```
Phase 1 (Pilot, 1–2 weeks):
- Identify security-sensitive modules in legacy code (authentication, payments, data processing)
- Run a one-off audit with an LLM-based code analysis tool
- Have your existing security team validate the results to calibrate trust
Phase 2 (Automation, 1–2 months):
- Add an AI security scan stage to your CI/CD pipeline
- Automatically analyze changed code on every PR
- Set up Slack/email alerting workflows
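The per-PR gate in Phase 2 can be as simple as a script that fails the CI job when the scanner reports a blocking finding. The scanner output format here (a list of dicts with `id` and `severity`) is an assumption for illustration:

```python
import sys

# Minimal sketch of a per-PR security gate. Severity labels and the
# findings format are assumptions, not a specific tool's output.
BLOCKING_SEVERITIES = {"high"}

def gate(findings) -> int:
    """Return a CI exit code: nonzero if any blocking finding remains."""
    blockers = [f for f in findings if f.get("severity") in BLOCKING_SEVERITIES]
    for f in blockers:
        print(f"BLOCKED by {f['id']} (severity: {f['severity']})")
    return 1 if blockers else 0

if __name__ == "__main__":
    sample = [
        {"id": "AI-001", "severity": "high"},
        {"id": "AI-002", "severity": "low"},
    ]
    sys.exit(gate(sample))
```

Start with alert-only mode and tighten `BLOCKING_SEVERITIES` once the false positive rate from Phase 1 is understood; a gate that blocks on noisy findings will quickly be bypassed.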
Phase 3 (Integration, quarterly):
- Integrate AI audit results into your security dashboard
- Analyze vulnerability trends and assign risk scores
- Run automated audits across the full codebase every quarter
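For the trend analysis and risk scoring in Phase 3, a severity-weighted score per module is a reasonable starting point. The weights and module names below are illustrative assumptions, not figures from the audit:

```python
# Toy severity-weighted risk score for quarterly trend reports.
WEIGHTS = {"high": 5, "moderate": 2, "low": 1}

def risk_score(severities) -> int:
    """Sum severity weights over one module's open findings."""
    return sum(WEIGHTS.get(s, 0) for s in severities)

modules = {
    "auth":     ["high", "high", "moderate"],
    "payments": ["moderate", "low"],
    "parser":   ["high"],
}
ranked = sorted(modules, key=lambda m: risk_score(modules[m]), reverse=True)
print(ranked)  # ['auth', 'parser', 'payments']
```

Tracking how these scores move quarter over quarter tells you whether the audit program is actually reducing risk or just generating reports.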
2. Cost-Effectiveness
Based on Anthropic’s published data, here is a rough comparison:
| Factor | Traditional Approach | AI Audit |
|---|---|---|
| Timeline | Weeks to months | 2 weeks |
| Staffing | 2–3 senior security engineers | AI + 1 validation engineer |
| Scope | Sample-based | Full codebase (6,000 files) |
| Report quality | Expert-level | Includes test cases + PoC + patches |
AI auditing does not fully replace human experts, of course. The optimal approach is a hybrid model where AI handles first-pass screening and human experts handle validation and prioritization.
3. Organizational Considerations
- Code confidentiality: Review your security policies around sending code to external AI APIs. Consider on-premise models or zero-retention API contracts.
- False positive management: Of the 112 reports, 22 became actual CVEs (roughly 20%). The rest were lower-severity bugs or false positives. A triage process is essential.
- Integration with existing tools: Plan how AI audits will complement your existing AppSec pipeline — SAST (static analysis), DAST (dynamic analysis), and SCA (software composition analysis).
- Regulatory compliance: Evaluate how to use AI audit results as evidence within compliance frameworks like SOC 2 and ISO 27001.
The Bigger Picture — The Future of AI AppSec
This is not an isolated event. It is part of a broader shift toward AI-driven security auditing becoming an industry standard:
- Google Project Zero is already researching LLM-based vulnerability detection
- GitHub Copilot continues to strengthen its security review features
- NIST’s AI agent security standards include guidelines for using AI as a security tool
For EMs and CTOs, the real question is no longer “Should we adopt AI security auditing?” but rather “When and in what order should we roll it out?” If AI can find new vulnerabilities in a codebase as thoroughly vetted as Firefox, what might it find in yours?
Key Takeaways
| Item | Details |
|---|---|
| Who | Anthropic (Claude Opus 4.6) x Mozilla |
| Duration | 2 weeks (February 2026) |
| Scope | 6,000 files in Firefox C++ codebase |
| Results | 112 reports → 22 CVEs (14 high-severity) |
| Key differentiator | Detected logic errors that fuzzing missed |
| Report quality | Minimal repro code + PoC + candidate patches |
| Patch status | All patched in Firefox 148 |