AI can be used as both an adversarial and a defensive tool in the world of cyber. The worst-case outcome is one where only the adversaries have access.
Meanwhile, most existing AI cyber tools are just wrappers. The problem is that they keep all of the foundation model's guardrails, so they inherit its refusals.
For this project we've post-trained a specific model on a decade of capture-the-flag contests. This won't be made available to anyone and everyone, but we do believe that responsible SMEs and midmarket companies, not just enterprises, need access to these tools to identify key vulnerabilities in their systems.
We have developed two modes that run over a CLI:
• Security scan: a read-only audit of your local codebase for vulnerabilities. It only reports what it can tie to a specific file and line, so you're not wading through vibes-based findings.
• Pen test: an active adversarial mode that will try to break a live system in a sandboxed environment. It proves each vulnerability by running the exploit and showing the request it sent and the response your code gave back, not a confidence score. Currently gated.
To show what the scan does, we pointed it at Bank of Anthos and it found an integer overflow in the transfer path: amount is an int, and amount + fee can overflow negative, so the balance check passes and you move funds you don't have. Plus the usual auth and secrets issues. (Bank of Anthos is Google's open-source bank. It's a known app and some of it is intentionally weak, which is the point: you can clone it and re-run the scan yourself instead of trusting a screenshot)
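A minimal sketch of that class of bug, not the Bank of Anthos code itself. The post only says amount is an int, so the 32-bit signed width is an assumption, and `wrap_int32`, `can_transfer_unsafe` and `can_transfer_checked` are hypothetical names. Python ints don't overflow, so the wrap is simulated explicitly:

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Wrap an exact int to signed 32-bit two's complement (assumed width)."""
    return (value + 2**31) % 2**32 - 2**31


def can_transfer_unsafe(balance: int, amount: int, fee: int) -> bool:
    # amount + fee silently wraps, so a huge amount can become a negative total
    total = wrap_int32(amount + fee)
    return total <= balance


def can_transfer_checked(balance: int, amount: int, fee: int) -> bool:
    # Reject negative inputs and any sum that would not fit in the assumed width
    if amount < 0 or fee < 0:
        return False
    total = amount + fee  # exact arithmetic, no wrap
    if total > INT32_MAX:
        return False
    return total <= balance
```

With a balance of 100, an amount of `INT32_MAX` and a fee of 1, the unsafe check passes because the wrapped total is negative, while the checked version rejects the transfer.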
The base model is Kimi K2.6 (open weights). We didn't pretrain from scratch. We post-trained it ourselves: SFT on CTF writeups, then RL with verifiable rewards against actual exploit checks.
How the harness works:
Along with the model, we built a harness to support it. The harness runs as a multi-agent swarm: an orchestrator splits the job across subagents running in parallel, each owning a slice, then synthesises one report.
The CLI is a local binary (brew/curl). It reads your code locally, then sends context to our inference API over TLS. tcpdump it and you'll see exactly what leaves and where. Install is free, and you can run a scan for free up to 2m tokens; beyond that you pay for tokens.
Full disclosure: this product is part of Cosine (YC W23).
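The fan-out/fan-in shape described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual harness: `split_job`, `run_subagent` and `orchestrate` are hypothetical names, the subagent is a stub where a real one would call the model, and round-robin slicing is just one possible way to split the work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_job(files, n_slices):
    """Round-robin the file list into at most n_slices non-empty slices."""
    slices = [files[i::n_slices] for i in range(n_slices)]
    return [s for s in slices if s]


def run_subagent(slice_files):
    """Stand-in for one subagent; a real one would send its slice to a model."""
    return [f"{path}: reviewed" for path in slice_files]


def orchestrate(files, n_slices=4):
    """Split the job, run subagents in parallel, merge into one report."""
    slices = split_job(files, n_slices)
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        partials = list(pool.map(run_subagent, slices))
    # Synthesis step: flatten the partial results into one sorted report
    return sorted(line for partial in partials for line in partial)
```

Calling `orchestrate(["a.py", "b.py", "c.py"], n_slices=2)` gives each stub subagent its own slice and returns a single merged list of lines.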
Up for debate: tool safety. Domain verification, for example, proves control but not necessarily permission. How would you gate a pen-test tool given that?
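To make the control-versus-permission gap concrete, here is a rough sketch of a DNS TXT style challenge with a separate authorisation flag. Everything in it is an assumption for illustration: the function names, the `pentest-verify=` prefix and the `signed_authorisation` flag are hypothetical, and the TXT records are passed in as a list rather than fetched from DNS.

```python
import hashlib
import hmac


def challenge_token(domain: str, account_id: str, secret: bytes) -> str:
    """Derive a per-account, per-domain token the user must publish in DNS."""
    msg = f"{account_id}:{domain.lower()}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def controls_domain(txt_records, expected_token, prefix="pentest-verify="):
    """True if any TXT record carries the expected token (proves control only)."""
    wanted = (prefix + expected_token).encode()
    return any(
        hmac.compare_digest(record.strip().encode(), wanted)
        for record in txt_records
    )


def may_pen_test(txt_records, expected_token, signed_authorisation: bool) -> bool:
    # Control and permission are separate gates; both must hold.
    return controls_domain(txt_records, expected_token) and signed_authorisation
```

Someone who can edit DNS passes `controls_domain`, but `may_pen_test` still fails without the second, out-of-band gate, which is exactly the part that is hard to automate.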
So this is the same policy that Anthropic and OpenAI have; it is just based on your criteria rather than theirs.
To me it looks like copycat marketing more than a strongly held stance
Artificial scarcity, membership club criteria to make members feel special
Perhaps there is an organization that rewards this “responsibility” behavior; the EU comes to mind, but that's not lucrative enough
As far as engagement farming goes, it got us to engage and boost its reach, for something we might otherwise ignore with more benign language
Once I get the answers I will execute
(don't mention distilling unless you understand why it's a different case than what's being described above)
Since few, if any, could create a model as capable as Anthropic's or OpenAI's, this choice to limit access does mean that most bad actors will be working with less capable models of varying quality.
While some will be able to fork DeepSeek and get comparable performance, it still reduces the number of bad actors with access to tools that would effectively accelerate their efforts.
So I suspect if you could measure the alternate universe timelines where everyone gets access to non-aligned foundation models vs. heavily restricted access, you’d find that in the near/medium term the universe with restricted access probably sees less negative impact overall.
Long term it’ll be a wash either way (eventually Opus-level models will run on 20 watts) and hopefully Anthropic is correct in their predictions that LLMs will grant a strong defender’s advantage in the long run.
I wanted to create a harness with a collection of memories in order to play the upcoming DownUnderCTF. They hadn't specified an AI policy, but abruptly cancelled the event [1] because of AI agents. I didn't expect to win, nor would I have been prize eligible, but I see CTFs as a place to try out new tools or languages; in this instance it was going to be an automated agentic harness.
An AI harness recently won BSidesSF [2]
The only two it hasn't been able to do are OverTheWire's manpage5, which according to the status page has a solution, and drifter3, which I don't know currently has a valid solution. (Vortex13 and formulaone3 currently don't have valid solutions.)
[0] https://en.wikipedia.org/wiki/Capture_the_flag_(cybersecurit...
[1] https://xcancel.com/DownUnderCTF/status/2062802249173356753#...
Stop thinking you know morals better than your users, or get out of the way so a competitor who respects your users more can serve them!
as a side note - I think it's very unprofessional and very shitty to not mention kimi2.6 at all in your marketing copy. and i feel that you posted that in this hn post begrudgingly since the hn crowd would have flagged that. confirmed with a google search too: https://www.google.com/search?q=kimi+site%3Aargusred.com
All around your marketing website you keep mentioning - 'A model lab built it'. A finetune does not maketh you a model lab - some humility please :)
finally - doesn't Kimi's licensing prohibit you from not mentioning them? Didn't cursor run into the same issue?
On Shannon airgapped in your VPC: if it works for you, you might not need us. A normal model will refuse or hedge on offensive tasks; we post-trained ours to just run the authorised stuff. For this one narrow job, a specialist that'll actually attack beats a generalist that won't.
This in its own right proves that the defenses of Fable and others are temporary blocks, and AI-based hacking is going to be effectively available to all parties regardless of stopgaps, as long as open models exist.
It’s just more “We’re so smart we invented the boogeyman, trust us” slop marketing that’s been happening since GPT-2
If I wanted to show off a “model that pen tests” I’d at least include a gif of it running against Juice Shop or something before the spooky language and “schedule a sales call”
>no benchmarks on standard Cyber benchs
ok
This is an open problem that I came across (in a different domain), as the search space can be really wide. It's hard to measure results for non-trivial tasks.
Would be really interested if you can share your eval approach :)
I can't think of any way to safely release an offensive tool publicly.
I am able to get Opus and Sonnet to function as a red team agent. We don’t have some crazy special sauce, just a lot of trial and error. Basically, add enough context proving we own the code and the running services, and it will run attempts to compromise our services.
It found tons of stuff that was not found with just scanning the code. It found serious security issues that had been in production for years that humans never found. They weren’t things that were accessible externally, but they were serious enough that we are thrilled to have these tools.
I can say that Fable did refuse to function with our harness. I am worried that soon you will have to be in the special club to do this stuff with the SOTA models. A small company like ours doesn’t get accepted to their programs that remove guardrails, even though our CEO has found and disclosed vulnerabilities to multiple companies and holds a patent around federated authentication.
I get that both need to exist as tools. I just don't see any safe way of doing a truly public release of the offensive end of it; you'd need to coordinate with established entities somehow.