What AI Security Research Looks Like When It Works

Author: Stanislav Fort, AISLE

This week, Anthropic announced that Claude Opus 4.6 helped surface more than 500 vulnerabilities in open source software. Their blog post showcases three examples in Ghostscript, OpenSC, and a GIF library, and notes that reporting and patching have begun, though it shares little detail on severity breakdown, target selection, or maintainer response. The news generated excitement from the AI community, skepticism from security engineers, and genuine concern from open source maintainers who are already stretched thin.

All three reactions are warranted. Having spent the past year building and operating an AI system that discovers, validates, and patches zero-day vulnerabilities in some of the most critical and well-audited codebases on the planet, I want to offer a practitioner's perspective on what this work actually entails and what the real challenges are. The findings described in this post all predate this week's announcement, and are therefore an independent sample of what AI security can look like.

The results

At AISLE, we've been testing our AI system against the most secure software projects out there as live targets since late 2025. We did not focus on retrospective benchmarks, toy tasks, or CTF challenges, but on production code that the world critically depends on. We chose this path because no synthetic benchmark faithfully captures the difficulty of earning a real CVE from a well-secured project like OpenSSL, where maintainers are conservative, have limited time, and have every reason to reject any finding that is not absolutely clear-cut.

Here's where things stand.

In the latest OpenSSL security release on January 27, 2026, twelve new zero-day vulnerabilities (unknown to the maintainers at the time of disclosure) were announced. Our AI system is responsible for the original discovery of all twelve, each found and responsibly disclosed to the OpenSSL team during the fall and winter of 2025. Ten were assigned CVE-2025 identifiers and two received CVE-2026 identifiers. Adding those ten to the three we had already found in the fall 2025 release, AISLE is credited with surfacing 13 of the 14 OpenSSL CVEs assigned in 2025, and 15 in total across both releases. This is a historically unusual concentration for any single research team, let alone an AI-driven one.

These weren't trivial findings either. They included CVE-2025-15467, a stack buffer overflow in CMS message parsing that is potentially remotely exploitable without valid key material, and for which exploits were quickly developed and published online. OpenSSL rated it HIGH severity; NIST's CVSS v3 score is 9.8 out of 10 (CRITICAL, an extremely rare severity rating for such projects). Three of the bugs had been present since 1998-2000, evading intense machine and human scrutiny alike for over a quarter of a century. One predated OpenSSL itself, inherited from Eric Young's original SSLeay implementation in the 1990s. All of this in a codebase that has been fuzzed for millions of CPU-hours and audited extensively for over two decades by teams including Google's.
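
To make that bug class concrete for readers outside security, here is a deliberately simplified sketch of how a stack buffer overflow can arise in a message parser when an attacker-controlled length field is copied into a fixed-size stack buffer without a bounds check. It is illustrative only: the names are hypothetical, and it does not reproduce the actual OpenSSL CMS code behind CVE-2025-15467.

```c
/* Hypothetical illustration of the bug class (stack buffer overflow in
 * a parser), not the actual OpenSSL CMS code. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FIELD_MAX 32

static int parse_field(const uint8_t *msg, size_t msg_len)
{
    uint8_t field[FIELD_MAX];
    size_t field_len;

    if (msg_len < 1)
        return -1;

    field_len = msg[0];                /* attacker-controlled length byte */

    if (msg_len < 1 + field_len)       /* the input itself is long enough */
        return -1;

    /* BUG: field_len is never checked against FIELD_MAX, so a crafted
     * message with a length byte above 32 overflows the stack buffer.
     * The fix is one extra bounds check before the copy:
     *     if (field_len > FIELD_MAX) return -1;                        */
    memcpy(field, msg + 1, field_len);

    printf("parsed a %zu-byte field\n", field_len);
    return 0;
}

int main(void)
{
    const uint8_t benign[] = { 4, 'a', 'b', 'c', 'd' };
    return parse_field(benign, sizeof(benign));
}
```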

In five of the twelve cases, our AI system directly proposed the patches that were accepted into the official release.

The OpenSSL maintainers' response matters to us more than the plain numbers. The OpenSSL CTO publicly said: "This release is fixing 12 security issues, all disclosed to us by AISLE. We appreciate the high quality of the reports and their constructive collaboration with us throughout the remediation." Matt Caswell, Executive Director of the OpenSSL Foundation, added: "We appreciate AISLE's responsible disclosures and the quality of their engagement across these issues."

Daniel Stenberg, curl's creator and lead maintainer, reacted to the release on LinkedIn: "I'm a little amazed by the amount of CVEs released by OpenSSL today. 12(!) of them were reported by people at Aisle... I mean if you are curious what AI can do for Open Source security when used for good."

The curl story is telling on its own. Stenberg himself recently shut down curl's long-running bug bounty program. The stated reason was a flood of low-quality AI-generated submissions that his small but highly capable security team could not sustainably process. In his words, AI-generated "slop" killed the program.

Over the same stretch of 2025, our system (operating under the pseudonym "Giant Anteater" on HackerOne and later in direct correspondence with Daniel) discovered 5 vulnerabilities in curl that were assigned CVEs, including 3 of the 6 disclosed in the curl 8.18.0 release. Daniel has credited AISLE and similar high-quality AI-driven tools with "several hundred" bug fixes in his year-end review.

Beyond OpenSSL and curl, over the course of the second half of 2025 and the very first days of 2026 we discovered and were assigned over 100 externally validated CVEs across more than 30 projects, including the Linux kernel, glibc, Chromium, Firefox, WebKit, Apache HTTPd, GnuTLS, OpenVPN, Samba, NASA's CryptoLib, and others. This is in addition to several hundred similar zero-day discoveries in projects that are not assigning CVEs. Some of our findings affect billions of devices across the browser and mobile ecosystem. Every single one was validated and accepted by the respective project's security team and maintainers.

What actually matters

There's a temptation in this space to lead with big numbers. Five hundred vulnerabilities sounds impressive. But the number that actually matters is how many of those findings made the software more secure.

That distinction requires asking harder questions. Were the targets critical infrastructure, or low-hanging fruit? Were the findings externally validated by independent maintainers, or self-reported? Did patches land in official releases, or just get filed as reports? Is there an ongoing relationship with the projects, or a one-shot disclosure dump?

Does the system handle the full loop (discovery, triage, validation, patch generation, patch verification), or are humans still doing the hard parts?

These questions matter because the failure mode of AI-driven security research isn't "AI can't find bugs". Finding them well is still extremely difficult, but the capability now exists at the frontier. The failure mode is drowning maintainers in noise, generating findings that look plausible but waste human time, or declaring victory based on volume while the actual security posture of the software doesn't improve.

Daniel Stenberg put it well in his FOSDEM 2026 main-track talk to hundreds of key open-source maintainers when he distinguished between the "slop" that killed his bug bounty and the high-quality AI-driven work that his project has benefited from. He described AI-powered analyzers finding things "in ways no other tools previously could find," in what "sometimes feels like magic." The difference wasn't just the use of AI but the security expertise and intent behind it.

AI is simultaneously collapsing the median ("slop") and raising the ceiling (real zero-days in critical infrastructure). Mass adoption of these tools is flooding maintainers with low-quality noise. At the same time, carefully built systems can find real vulnerabilities that decades of human review and automated fuzzing missed. Both things are true, and pretending otherwise is unhelpful. We are witnessing this great bifurcation in real time.

From finding bugs to preventing them

Retroactive vulnerability discovery, scanning existing codebases for known patterns of flaws, is valuable but ultimately backward-looking. The harder and more impactful goal is catching vulnerabilities before they ever ship.

This has been AISLE's vision from the beginning. Our security Analyzer now runs on OpenSSL and curl pull requests, reviewing new code as it's proposed. It is directly integrated into the development workflow, flagging potential security issues in real time as maintainers review contributions.

In a recent OpenSSL PR adding a new AEAD cipher implementation, the OpenSSL CTO himself invoked our analyzer for review. It identified six potential issues, including an out-of-bounds write from a missing buffer size validation and an inconsistent build configuration guard. One OpenSSL team member responded: "I'm impressed by this catch!" Another noted that it revealed a pattern worth extending to all AEAD schemes in OpenSSL. On curl, Daniel Stenberg now routinely invokes our research bot on his own pull requests, asking "@aisle-analyzer thoughts?"
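
As a rough illustration of the first of those findings, the sketch below shows how an out-of-bounds write can follow from a missing output-buffer size check in an AEAD-style sealing routine. The names and structure are invented for this example and do not come from the pull request in question; real AEAD code is considerably more involved.

```c
/* Hypothetical illustration of a missing output-buffer size check in an
 * AEAD-style API; not the code from the OpenSSL pull request. */
#include <stdint.h>
#include <string.h>

#define TAG_LEN 16

static int aead_seal(uint8_t *out, size_t out_cap,
                     const uint8_t *pt, size_t pt_len)
{
    /* BUG: only the plaintext length is checked against the output
     * capacity, but TAG_LEN more bytes are written afterwards. The
     * correct check is out_cap >= pt_len + TAG_LEN, with the addition
     * guarded against integer overflow.                                */
    if (out_cap < pt_len)
        return -1;

    memcpy(out, pt, pt_len);              /* stands in for the cipher core */
    memset(out + pt_len, 0xA5, TAG_LEN);  /* tag write can run past the end */
    return 0;
}

int main(void)
{
    uint8_t out[64];
    uint8_t msg[20] = { 0 };

    /* This call is safe (20 + 16 fits in 64), but an output buffer of,
     * say, 24 bytes would pass the flawed check and the tag write would
     * overrun it by 12 bytes. */
    return aead_seal(out, sizeof(out), msg, sizeof(msg));
}
```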

OpenSSL now lists AISLE as an in-kind supporter, alongside IBM. That recognition matters more to us than any CVE count.

Throughout 2025, we caught several vulnerabilities in OpenSSL's development branches before they ever reached a release, including a double-free in the OCSP implementation and a use-after-free in RSA OAEP label handling. That's the outcome we're ultimately working toward: preventing vulnerabilities before they ship rather than patching them after deployment.
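
For readers unfamiliar with the double-free class mentioned above, the pattern typically looks like an error path that frees an object while leaving the caller believing it still owns it. The sketch below is hypothetical and simplified; it is not the actual OCSP code.

```c
/* Hypothetical illustration of a double free caused by unclear ownership
 * on an error path; not the actual OpenSSL OCSP code. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct response {
    char *data;
};

/* Copies `src` into freshly allocated storage and validates it.
 * BUG: on the validation-failure path it frees resp->data but leaves the
 * dangling pointer in place, so a caller that also frees resp->data in
 * its own cleanup performs a double free. The usual fix is to set
 * resp->data = NULL after the internal free, or to make ownership on
 * error unambiguous in the API contract.                               */
static int fill_response(struct response *resp, const char *src)
{
    resp->data = malloc(strlen(src) + 1);
    if (resp->data == NULL)
        return -1;
    strcpy(resp->data, src);

    if (strlen(src) > 64) {      /* simulated validation failure */
        free(resp->data);        /* ownership silently taken back */
        return -1;
    }
    return 0;
}

int main(void)
{
    struct response r = { NULL };

    if (fill_response(&r, "short, valid input") != 0) {
        /* With an over-long input, this cleanup free would be the second
         * free of the same allocation. */
        free(r.data);
        return 1;
    }

    printf("response: %s\n", r.data);
    free(r.data);
    return 0;
}
```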

The real challenges ahead

I don't want to paint an entirely rosy picture. The acceleration of AI-driven vulnerability discovery creates genuine problems that the ecosystem isn't yet equipped to handle.

The most immediate is the maintainer burden. Even high-quality findings create extra work. Someone has to review the report, verify the issue, develop or review the patch, coordinate disclosure, and ship the release. If discovery scales dramatically while the number of people who can do that downstream work stays flat, the result isn't necessarily better security; the onslaught can simply lead to burnout. It could manifest as a cybersecurity version of Baumol's cost disease.

Industry-standard 90-day disclosure windows were designed for a world where vulnerability discovery happened at human speed. If AI systems materially increase the rate of valid findings (and the evidence suggests they already are), those norms will need to evolve. The ecosystem needs better mechanisms for de-duplication, coordinated disclosure at volume, and AI-assisted patch development. That last part is a major component of what we're building at AISLE.

There's also an offense-defense question that's genuinely hard to answer. The capabilities that find vulnerabilities for defenders are, in principle, the same capabilities that could find them for attackers. I believe this ultimately advantages defense. The hard part was always finding what to fix, and remediation scales more easily once you know what's broken. But I hold that belief with appropriate uncertainty, and the question deserves continued scrutiny.

What I'm confident about is that the trajectory is clear. AI can now find real security vulnerabilities in the most hardened, well-audited codebases on the planet. The capabilities exist, they work, and they're improving rapidly. The question is no longer whether this will happen, but whether the ecosystem can adapt quickly enough to absorb the results.

What we're building toward

Our goal at AISLE is to make critical infrastructure genuinely more secure, particularly foundational libraries like OpenSSL that the rest of the software ecosystem inherits from.

That means deepening our work on the hardest targets, expanding proactive review to more projects, and continuing to contribute actual patches alongside our discovery reports. It also means doing all of this in a way that earns and keeps the trust of the maintainers who are ultimately responsible for the software the world runs on.

The era of AI-driven cybersecurity is here. We don't yet know its full shape, but the trajectory is clear, and we intend to build it right. If you maintain or depend on critical software infrastructure and want help, reach out.


Stanislav Fort is Founder and Chief Scientist at AISLE. For a detailed technical account of the OpenSSL findings, see AISLE Discovered 12 out of 12 OpenSSL Vulnerabilities. For discussion of the broader implications, see the original post on LessWrong. For coverage of the wider landscape, see Socket.dev's The Next Open Source Security Race.