I think this has been pushed too hard. Combined with general exhaustion at people insisting that AI is eating everything and the moon, these claims are getting kind of farcical.
Are LLMs useful for finding bugs? Maybe. Reading the system card, I guess if you run the source code through the model 10,000 times, some useful stuff falls out. Is this worth it? I have no idea anymore.
Ah... that's why they put a free VPN into v150 - more human behavior for training :))
But you might also get a lot of non-useful stuff which you'll need to sort out.
So much that I don't really visit anymore after 15 years of use.
It's a bizarre situation: billions in marketing, PR, and astroturfing, plus torrents of fake news with streams of comments beneath them showing zero skepticism and an almost horrifying worship of these billion-dollar companies.
Something completely flipped here at some point. I don't know if it's because YC is also heavily pro these companies and embedded with them, requiring YC applicants to slop-code their way in and then cheering about it.
Either way it's incredibly sad and reminds me of the worst of the casino economy: NFTs, crypto, web3. There's actually an interesting core here, regex on steroids with planning aspects, but it's constantly oversold.
I say that as a daily user of Claude Max for over a year.
This is a low bar?
> communities with as high of a signal-to-noise ratio and breadth of experiences as HN, especially not public ones that one can stumble their way into without knowing a guy / joining a clique
If this is such a low bar, then how come there's only HN? Can you name another? 10? 100? Because I can't.
Sam Altman of OpenAI was the president of YC for years.
Like, what did you expect, mon ami?
I'm also mournful for those just starting out, who may lean so much on these tools that they never develop true proficiency and the ability to spot issues with fitness and quality. I see people running half a dozen or more agents and know there is no way they're doing any kind of meaningful QA/QC on that output.
If we just stick with C/C++ systems, pretty much every big-enough project has a backlog of thousands of these things: either simple ones like compiler warnings for uninitialized values, or fancier tool-verified off-by-one write errors that aren't exploitable in practice. There are many real bad things in there, but they're hidden in the backlog waiting for someone to triage them all.
Most orgs look at that backlog and just accept it. It takes a pretty big $$$ investment to solve.
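To make those two bug classes concrete, here's a contrived C sketch of my own (not from any real backlog) of the kind of findings that pile up:

    /* contrived examples of the two bug classes mentioned above */
    #include <stdio.h>
    #include <string.h>

    int uninitialized_read(int flag) {
        int x;              /* never assigned on the flag == 0 path */
        if (flag)
            x = 42;
        return x;           /* compilers warn: 'x' may be used uninitialized */
    }

    void off_by_one(const char *src) {
        char buf[8];
        /* strlen(src) == 8 passes the check, but strcpy also writes the
           NUL terminator: 9 bytes into an 8-byte buffer */
        if (strlen(src) <= 8)
            strcpy(buf, src);   /* static analyzers flag this as an OOB write */
        printf("%s\n", buf);
    }

    int main(void) {
        printf("%d\n", uninitialized_read(0));  /* UB: reads uninitialized x */
        off_by_one("12345678");                 /* one-byte overflow of buf */
        return 0;
    }

Neither is necessarily exploitable, but every one of them still costs someone triage time.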
I would like to see someone do a big deep dive in the coming weeks.
I'm sure there's already plenty of work on this, but are bigger codebases completely shutting out AI right now, due to the extreme volume of unsolicited PRs they get from AIs? I'd imagine that if these efforts were coordinated and structured properly, they'd be more likely to be seen as acceptable. I'm just spitballing; I've never worked on a real open source project, especially one with thousands if not millions of users and several issues a day, so my view of AI usage there mostly comes from the instances where projects ban all AI PRs outright because they are often really bad.
I'd appreciate some sort of disclaimer at the start of each article stating whether it's AI written/assisted or not. But I guess authors understand that it would diminish the perceived value of their work.
It's kind of crazy... but I can kind of see the appeal of getting exactly what you're looking for on a given topic.
That said, actually churning that out for other people to consume feels very wrong... I absolutely hate the slop-generated content on YouTube. I watch a lot of historical content, so I don't always mind it when it's genuine/factual... but when there are obvious content errors, it becomes more annoying than anything.
It all starts to feel like letting those "before the YouTube video" ads run too long without skipping.
A hook you know people would like to know the answer to… followed by utter horseshit.
I think the fact this is even a conversation is pretty impressive.
> Due to our concerns about malicious applications of the technology, we are not releasing the trained model.
That was for GPT-2 https://openai.com/index/better-language-models/
The absolute best case is that we end up with a situation similar to modern cryptography, which is clearly in favor of defenders. One can imagine a world where a defender can run a codebase review for $X of compute and patch all the low-hanging fruit, to the point where anything that remains would cost an attacker $X*100000 (or some other large multiplier) to discover.
> Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT‑2 along with sampling code.
7 years later, these concerns seem pretty legit.
Like the Black formatter for Python code in VS Code, which runs whenever you hit Ctrl+S.
"The Firefox 150 data suggests a tool that is genuinely useful for defensive security work, especially at scale, but the public record does not justify the strongest claims people want to make from it. The headline number is impressive, yet it bundles together bugs of very different significance and does not publicly resolve into a clean accounting."
I mean: obviously. It doesn't matter how good or bad a product is; the current meta is to over-hype it in order to achieve maximum "news penetration". Anthropic seems to have something real. However, since there is no way for outsiders to calculate real metrics like false-positive rate or cost per issue found (tokens, dev hours for setup and review, ...), there is no real way to put any scale on the hype graph.
If this isn't a sign of a bubble, where marketing is more important than the actual product, I don't know what is. This industry has completely lost the plot.