I think this has been pushed too hard. Combined with general exhaustion at people insisting that AI is eating everything and the moon, these claims are getting kind of farcical.
Are LLMs useful for finding bugs? Maybe. Reading the system card, I guess if you run the source code through the model 10,000 times, some useful stuff falls out. Is this worth it? I have no idea anymore.
Ah... that's why they put a free VPN into v150 - more human behavior for training :))
But you might also get a lot of non-useful stuff which you'll need to sort out.
So much that I don't really visit anymore after 15 years of use.
It's a bizarre situation: billions in marketing, PR, and astroturfing, plus torrents of fake news with streams of comments beneath them showing zero skepticism and an almost horrifying worship of these billion-dollar companies.
Something completely flipped here at some point. I don't know if it's because YC is also heavily pro these companies and embedded with them, requiring YC applicants to slop-code their way in and then cheering about it.
Either way it's incredibly sad and reminds me of the worst of the casino economy: NFTs, crypto, web3. There's actually an interesting core here, regex on steroids with planning aspects, but it's constantly oversold.
I say that as a daily user of Claude Max for over a year.
This is a low bar?
> communities with as high of a signal-to-noise ratio and breadth of experiences as HN, especially not public ones that one can stumble their way into without knowing a guy / joining a clique
If this is such a low bar, then how come there's only HN? Can you name another? 10? 100? Because I can't.
Sam Altman of OpenAI was the president of YC for years.
Like, what did you expect, mon ami?
I'm also mournful for those just starting out, who may lean so much on these tools that they never develop true proficiency and the ability to spot issues with fitness and quality. I see people running half a dozen or more agents and know there is no way they're doing any kind of meaningful QA/QC on that output.
If we just stick with C/C++ systems, pretty much every big-enough project has a backlog of thousands of these things: either simple ones like compiler warnings for uninitialized values, or fancier tool-verified off-by-one write errors that aren't exploitable in practice. There are many real bad things in there, but they're hidden in the backlog waiting for someone to triage them all.
Most orgs look at that backlog and just accept it. It takes a pretty big $$$ investment to solve.
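To make those two bug classes concrete, here's a contrived C sketch of my own (not from any real backlog) of the kind of findings that pile up:

    /* contrived examples of the two bug classes mentioned above */
    #include <stdio.h>
    #include <string.h>

    int uninitialized_read(int flag) {
        int x;              /* never assigned on the flag == 0 path */
        if (flag)
            x = 42;
        return x;           /* compilers warn: 'x' may be used uninitialized */
    }

    void off_by_one(const char *src) {
        char buf[8];
        /* strlen(src) == 8 passes the check, but strcpy also writes the
           NUL terminator: 9 bytes into an 8-byte buffer */
        if (strlen(src) <= 8)
            strcpy(buf, src);   /* static analyzers flag this as an OOB write */
        printf("%s\n", buf);
    }

    int main(void) {
        printf("%d\n", uninitialized_read(0));  /* UB: reads uninitialized x */
        off_by_one("12345678");                 /* one-byte overflow of buf */
        return 0;
    }

Neither is necessarily exploitable, but every one of them still costs someone triage time.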
I would like to see someone do a big deep dive in the coming weeks.
I'm sure there's already plenty of work on this, but are bigger codebases completely shutting out AI right now, due to the extreme volume of unsolicited PRs they get from AIs? I'd imagine that if these efforts were coordinated and structured properly, they'd be more likely to be seen as acceptable. I'm just spitballing; I've never worked on a real open source project, especially one with thousands if not millions of users and several issues a day, so my view of AI usage there mostly comes from the instances where projects ban all AI PRs outright because they are often really bad.
I'd appreciate some sort of disclaimer at the start of each article stating whether it's AI written/assisted or not. But I guess authors understand that it would diminish the perceived value of their work.
It's kind of crazy... but I can kind of see the appeal of getting exactly what you're looking for on a given topic.
That said, actually churning that out for other people to consume feels very wrong... I absolutely hate the slop-generated content on YouTube. I watch a lot of historical content, so I don't always mind it when it's genuine/factual... but when there are obvious content errors, it becomes more annoying than anything.
It all starts to feel like letting those "before the YouTube video" ads run too long without skipping.
A hook you know people would like to know the answer to… followed by utter horseshit.
I think the fact this is even a conversation is pretty impressive.
> Due to our concerns about malicious applications of the technology, we are not releasing the trained model.
That was for GPT-2 https://openai.com/index/better-language-models/
The absolute best case is that we end up with a situation similar to modern cryptography, which is clearly in favor of defenders. One can imagine a world where a defender can run a codebase review for $X of compute and patch all the low-hanging fruit, to the point where anything that remains would cost an attacker $X*100000 (or some other large multiplier) to discover.
> Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT‑2 along with sampling code.
7 years later, these concerns seem pretty legit.
Like the Black formatter for Python code in VS Code, which runs whenever you hit Ctrl+S.
"The Firefox 150 data suggests a tool that is genuinely useful for defensive security work, especially at scale, but the public record does not justify the strongest claims people want to make from it. The headline number is impressive, yet it bundles together bugs of very different significance and does not publicly resolve into a clean accounting."
I mean: obviously. It doesn't matter how good or bad a product is; the current meta is to over-hype it in order to achieve maximum "news penetration". Anthropic seems to have something real. However, since there is no way for outsiders to calculate real metrics like false-positive rate or cost per issue found (tokens, dev hours for setup and review, ...), there is no real way to put any scale on the hype graph.
If this isn't a sign of a bubble, where marketing is more important than the actual product, I don't know what is. This industry has completely lost the plot.