Both OpenClaw and MSDOS gaining a lot a traction by taking short cuts, ignoring decades of lessons learned and delivering now what might have been ready next year. MSDOS (or the QDOS predecessor) was meant to run on "cheap" microcomputer hardware and appeal to tinkerers. OpenClaw is supposed to appeal to YOLO / FOMO sentiments.
And of course, neither will be able to evolve to their eventual real-world context. But for some time (much longer than intended), that's where it will be.
Similar YOLO attitude to OpenAI's launch of modern LLMs while Google was still worrying about all the legal and safety implications. The free market does not often reward conservative responsible thinking. That's where government regulation comes in.
The author sold his previous software business and I'm pretty sure would never need to work anymore. I doubt "a gig at OpenAI" was high on his wish list when he started on Clawdbot.
Tons of people called for common sense regulation/guardrails years ago and were shouted down as "luddites obstructing progress." It's funny to see this discussion coming back around.
This is related to my observation that for thousands of years, written text has indicated a human author - this is no longer true, and I think this is going to be very difficult for us to wrap our human brains around fully.
> the training process doesn't introduce anything novel
This is not always the case. A compiler, linter, proof checker, tests, etc. can all lower entropy.
True, but it doesn't scale. No amount of YOLO will let anyone else repeat that feat.
If you're not paying for the product, then you're the product.
That's how you end up like Germany still using cash and fax machines for 60+ years.
Many European countries have learnt hard lesson about state protection police agencies.
A lesson that younger generations seem keen to forget and live through by themselves, because our stories aren't real enough.
Of interest, today in Australian media:
Why cash has made an unexpected comeback in Australia: new study - https://theconversation.com/why-cash-has-made-an-unexpected-...
which includes figures that show while only 8% of Australian transactions are cash (by some metric, see article) 33% (a third) of the population fully supports keeping cash on.
In Germany in many places you can only pay with cash.
But the point is, OpenClaw is just the first that lucked and got viral. If not for it, something equivalent would. Much like LangChain in the early LLM days.
You think so? OpenClaw certainly owned the hype cycle for a while. There was a thread on HN last week where someone asked who was actually using it, and the comments were overwhelmingly "tried it, it was janky and I didn't have a good use case for it, so I turned it off." With a handful of people who seemed to have committed to it and had compelling use cases. Obviously anecdotal, but that has been the trend I've seen on conversations around it lately.
Also, the fact that the most starred repo on GitHub in a matter of a few months raises a few questions for me about what is actually driving that hype cycle. Seems hard to believe that is strictly organic.
I was shocked when I saw the guy behind libgdx was also behind pi.dev. Random tech worlds colliding.
But because OpenClaw can just use a web browser like a normal user, you don't need all these APIs and there's no theoretical limitations on the services that can be integrated and automated.
Right now there's a lot of issues/bugs. People have more trust in a deterministic solution like Zapier. But maybe the LLMs and OpenClaw will get there eventually, and if it does, I can see how that's a better solution than a deterministic system.
Which has of course always been the true allure of AI. Do nothing and pretend you did something, when pretending is something you can be bothered to do.
This is why we can’t have nice things
Memory isolation is enforced by the MMU. This is not software.
Maybe you were confused with Linux, which came later, and landed in a soft x32 bed with CPU rings and Page Tables/VirtualMemory. ("Protected Mode", named for that reason...)
That being said, OpenClaw is criminally bad, but as such, fits well in our current AI/LLM ecosystem.
Those arrived with the 386 (286? Don't remember but 386 for sure) and DOS was well alive late into the 386 and even late in the 486 days.
> For UNIX on the same machines, they also had no such protections.
I was already running Linux on my 486 before Windows 95 arrived. Linux and DOS. One had those protections, the other didn't.
With the 386, you could run multiple virtualized Real Mode CPUs, which enabled OS/2 2.x to preemptively multitask DOS sessions. Windows 3.x on a 386 or higher could also multitask DOS sessions, just much less reliably.
Simply put there was no putting said protections in to DOS for a few reasons. Backwards compatibility being a huge one. That and memory in most computers was tiny, getting Linux running on most of them would have been difficult.
win 3.1 had protected mode. but windows 95 which was dos based did not. 98 i think got protected mode? not 100% sure on that. i know in windows 2000 it was still horrible broken...
Sad? That was the best part of DOS. Some of my fondest computing memories was on DOS - warts and all. Being able to live hexedit your drive and memory to cheat in games, the sheer freedom you got from hacking around the OS is incompatible to modern operating systems - even Linux.
You were in full control of your computer and you decided how it could be used, not some mega corporation or compliance agency.
DOS wasn't sad, it was fun.
There's no megacorp stopping you from reading and writing kernel memory. Unless, of course, the computer refuses to run software not signed by the megacorp or some software refuses to run without a digitally signed chain all the way down to the firmware like some game anticheats do.
But that's not really because of things like permission boundaries for processes. You can have those and still be in full control of those boundaries. It may be more convoluted than in a barebones system like DOS, of course.
Problem is, I was just learning and the mac was running System 7. Which, like MS-DOS, lacked memory protection.
So, one backwards test at the end of your loop and you could -- quite easily -- just overwrite system memory with whatever bytes you like.
I must have hard-locked that computer half a dozen times. Power cycle. Wait for it to slowly reboot off the external 20MB SCSI HDD.
Eventually I took to just printing out the code and tracing through it instead of bothering to run it. Once I could get through the code without any obvious mistakes I'd hazard a "real" execution.
To this day, automatic memory management still feels a little luxurious.
But my main takeaway is that from the security standpoint this is a ticking bomb. Even under Docker, for these things to be useful there is no going around giving it credentials and permissions that are stored in your computer where they can be accessed by the agent. So, for the time being, I see Telegram, my computer, the LLM router (OpenRouter) and the LLM server as potential attack/exfiltration surfaces. Add to that uncontrolled skills/agents from unknown origins. And to top it off, don't forget that the agent itself can malfunction and, say, remove all your email inboxes by mistake.
Fascinating technology but lacking maturity. One can clearly see why OpenAI hired Clawdbot's creator. The company that manages to build an enterprise-ready platform around this wins the game.
Hype, mainly buying Hype before their IPO. The project is open source and the thinking behind it is not difficult, if they truly wanted they could have done it a long time ago or even without the guy. It was a pure hype 'acquisition' of a project that become popular for amateur programmers that got into it through vibe-coding and are unaware of the consequences and security exposure they subject themselves at.
This is so clearly the next step from Siri to Alexa to {Openclaw like technology}, that is an interface to technology that loads of people find value in everyday, and loads of people complain doesn’t have enough capabilities.
It's like your actual asssitant. Now, most of this can be done inside ChatGPT/Claude/Codex now. Their only remaining problem for certain agentic things is being able to run those remotely. You can set up Telegram with Claude Code but it's somehow even more complicated than OpenClaw.
Using my Mac or Windows PC, it's very rare that I actually want an app to access files on its own that it didn't create. Like if I write a doc in Word, I'm only going to edit it in Word. I might want to email a copy to someone, but that doesn't mean Mail needs RW access to the original. I might copy a video clip into editing software, but again it doesn't need to touch the original. Programs often need their own dirs for caches, settings, etc, and those don't even need to be read by other programs. It's also annoying how they can write anywhere in ~/ and end up scattering stuff in random places. The iPhone sandboxing system works great for all that, where apps have to explicitly share to others. The Mac file access rules tried to address this but still seem like Swiss cheese while also getting in the way of normal usage, and there's seemingly nothing in Windows or Linux (unless you're going out of your way with jails).
Other APIs besides file are a bigger challenge. That was gated away from the start on iPhones but not on desktop OSes. If they don't find a solution, web and mobile apps are going to keep taking over.
Separately, idk if anyone is only installing from the Mac App Store. You can't even get Chrome that way. It's also just bad, like it bothers you about login, then apps get stuck in half-downloaded state.
But yeah if anyone can pull this off, it'd be Apple, seeing how frequently they'll break apps by requiring new APIs.
I had to look to see what's even in my phone's Files app... looks like WhatsApp saved its received photos in there and I accidentally downloaded a couple of files in Safari. Except I didn't give WhatsApp permission, so idk what the rules are with that.
That alone is a reason to start over with a from-scratch mes desktop OS implementation
Microsoft and Apple both have more than enough money to do it. It would be a ~10 year project. It would not make money which is why it’ll never be done.
Is there something specific that you want that cannot be implemented reasonably on existing systems?
Unless you have a specific compelling benefit that only a rewrite can grant, focused narrow rewrites are the way.
I think this depends on your definition of an app. When I write a text file in vim, I definitely want grep and sed and awk to have access to it. When I edit a source code document with VSCode, I want Python to be able to read and interpret it. To me, the core of computing is documents that are passed along from app to app, gaining value at each step.
It is the worst of both worlds.
I especially love how flatpak has its own version of graphics drivers, which results in my browser flatpak updating to a version that has a mismatch between my system drivers, flatpak, and the browser.
Too many bloody layers of abstraction.
Linux has flatpak
Convenience is a technical advantage. That's why streaming later beat Blu-ray despite a regression in picture quality.
I remember watching something on Betamax, possibly Star Wars but have no recollection of changing tape. My dad was a teacher and had access to a VT player on my birthday. On other occasions he would bring home a BBC Microcomputer. Quite a treat when we couldn't afford to buy our own TV even.
Edit, seems Empire Strikes Back was a single tape - https://ebay.us/m/Ypz8SW
Originally, Betamax increased physical tape length, but cutting tape speed (like how vinyls can play at different RPMs) was a more economical way of cramming more hours onto the same tape by cutting quality.
Both VHS and Betamax went through multiple phases of this. Eventually VHS won by being cheaper.
https://mrbetamax.com/BetaSpeeds.htm
But this wasn't Sony's strength and VHS was consistently cheaper for longer movies.
That conceded a big chunk of early adopters to VHS.
The counterexamples that convinced me were Grok and Steam VR aren't taking significant marketshare from ChatGPT and Oculus despite having better support for adult content.
Originally, I thought this would kill Oculus given that's the most popular use of VR, but nope.
Videotape solved the "can't watch film porn at home" problem. Online payments and delivery solved the "need to risk going to the store to buy porn" problem.
What does VR porn solve over just watching standard porn on your monitor/TV?
And because of the resolution limitations, aren't VR headsets actually worse for watching porn than modern 4K monitors?
Ironically, AI is solving this better than VR did. Many people are in relationships with AI girlfriends. Not many are in relationships with VR waifus.
If your old Lenovo has vPro/Intel AMT - then you already are running Minix.
Remember that DOS is single process, it is not as if you could run a service just like that. You could mess with interrupt handlers, but you had to do the scheduling yourself, and make sure your code is small, because shockingly, 640k may not be enough for everyone.
MS-DOS simply wasn't a good choice for networking, and not just because of the lack of security. In fact, it could turn out to be more secure than modern systems because of the limited attack surface. No crazy framework stacks here, just your code and the network card.
I am not interested in the "claw" workflow, but if I can use it for a safer "code" environment it is a win for me.
When people vibe-code, usually the goal is to do something.
When I hear people using OpenClaw, usually the goal seems to be… using OpenClaw. At a cost of a Mac Mini, safety (deleting emails or so), and security (litelmm attack).
It wasn’t (only) that, though; they also learned, so that, when people could afford to buy computers that were really useful, there were people who could write useful programs, administer them, etc.
Same thing with 3D printers a decade or so ago. What did people use them for? Mostly tinkering with hard- and software for days to finally get them to print some teapot or rabbit they didn’t need or another 3D printer.
This _may_ be similar, with OpenClaw-like setups eventually getting really useful and safe enough for mere mortals.
But yes, the risks are way larger than in those cases.
Also, I think there are safer ways to gain the necessary expertise.
My Dad used PC-Scheme a lot, working on his side projects.
Of course, also games.
- organize follow-up reminders for business calls. Automate a modem-based upload.
- crunch investment options in commodities. Not in an econometric way, but a table listing which analyst said what and which analyst was silent. Automate a modem-based upload.
So, with regard to the article, we can presume the author did claw-like things with DOS. As he aged, now he probably needs to organize many trips to doctors and specialists. Who is doing all that administration for your older folks?
From what I understand, the main appeal isn't the end result, but building that AI personal assistant as a hobby is the appeal.
1. Something like OpenClaw will change the world.
2. OpenClaw is not yet ready.
The heart of OpenClaw (and the promise) is the autonomy. We can already do a lot with the paid harnesses offered by OpenAI and Anthropic, so the secret sauce here is agents doing stuff for us without us having to babysit them or even ask them.
The problem is that OpenClaw does this is an extreme rudimentary way: with "heartbeats." These are basically cron jobs which execute every five minutes. The cron job executes a list of tasks, which in turn execute other tasks. The architecture is extremely inefficient, heavy in LLM compute, and prone to failure. I could enumerate the thousand ways it can and will fail but it's not important. So the autonomy part of the autonomous assistant works very badly. Many people end up with a series of prescriptive cron jobs and mistakenly call that OpenClaw.
Compounding this is memory. It is extremely primitive. Unfortunately even the most advanced RAG solutions out there are poor. LLMs are powerful due to the calculated weights between parametric knowledge. Referring to non-parametric knowledge is incredibly inefficient. The difference between a wheelchair and a rocket ship. This compounds over time. Each time OpenClaw needs to "think" about anything, it preloads a huge amount of "memories" into the query. Everything from your personal details to architecture to the specific task. Something as simple as "what time is it" can chew through tens of thousands of tokens. Now consider what happens over time as the agent learns more and more about you. Does that all get included in every single query? It eventually fails under its own weight.
There is no elegant solution to this. You can "compress" previous knowledge but this is very lossy and the LLMs do a terrible job of intelligently retaining the right stuff. RAG solutions are testing intelligent routing. One method is an agentic memory feedback loop to seek out knowledge which might exist. The problem is this is circular and mathematically impossible. Does the LLM always attempt to search every memory file in the hope that one of the .md files contains something useful? This is hopelessly slow. Does it try to infer based on weekly/monthly summaries? This has proven extremely error-prone.
At this point I think this will be first solved by OpenAI and/or Anthropic. They'll create a clean vectorised memory solution (likely a light LLM which can train itself in the background on a schedule) and a sustainable heartbeat cadence packaged into their existing apps. Anthropic is clearly taking cues from OpenClaw right now. In a couple of years we might have a competent open source agent solution. By then we might also have decent local LLMs to give us some privacy, because sending all my most intimate info to OpenAI doesn't feel great.
Heartbeat cron and naive memory are the right thread to pull. Agree.
The problem is the data/trust boundary. One agent process, one credential store, all channels sharing both. Whenever we scale the memory up, which we all want to do, we scale the disaster radius of every prompt injection with it.
Wirken accounted for this in the first design step. Per-channel process isolation. Handshakes between adapters and the core. Compile-time type constraints so a Discord adapter cannot construct a Telegram session handle. Encrypted credential vault. Hash-chained audit log of every action. All, remaining model-agnostic, so local models and confidential-compute providers are drop-in.
Your memory point is still unsolved at this layer. When memory does get solved, you want the solver running where it cannot leak the wrong credentials to the wrong channel. Otherwise the smarter it gets, the worse the breach.
Agentic coding has all of the same issues and it gets solved much the same way: give LLMs tool calls to file persistent memories by topic, list what topics are available (possibly with multiple levels of subtopics in turn) and retrieve them into the context when relevant. Not too different from what humans do with zettelkasten and the like.
Then about $120 in API credits from Anthropic, xAI, openrouter, openAI, and Gemini ( $25 here and there adds up).
Best bang for my buck has been with xAI grok-4.20-beta
The thread's linked article is about comparing MS-DOS' security, but the comparison works on another level as well: I remember MS-DOS. When the very idea of the home/office computer was new. When regular people learned how to use these computers.
All this pretension that computers are "hard to use", that LLMs are making the impossible possible, it's all ahistoric nonsense. "It would've taken me months!" no, you would've just had to spend a day or two learning the basics of python.
It's a simple idea, but as a self-taught dev I still remember how foreign these tools first felt.
That’s why there isn’t a coherent use story because like glue the answer is whatever the user needs to glue/get done
I have it hooked up to my smart home stuff, like my speaker and smart lights and TV, and I've given it various skills to talk to those things.
I can message it "Play my X playlist" or "Give me the gorillaz song I was listening to yesterday"
I can also message it "Download Titanic to my jellyfin server and queue it up", and it'll go straight to the pirate bay.
It having a browser and the ability to run cli tools, and also understand English well enough to know that "Give me some Beatles" means to use its audio skill, means it's a vastly better alexa
It only costs me like $180 a month in API credits (now that they banned using the max plan), so seems okay still.
I have a hard time imagining how much better Alexa would have to be for me to spend $180/month on it...
OpenClaw is not a CC-only product. You can configure it to use any API endpoint.
Paying $180/month to Anthropic is a personal choice, not a requirement to use OpenClaw.
In other words, assuming no price increase, 7 years of that pricing is $15k. Is there hardware I could buy for $7k or less that would be able to replace those API calls or alternativr subs entirely?
I've personally been trying to determine if I should buy a new GC on my aging desktop(s), since their graphic cards can't really handle LLMs)
But if you don't need frontier coding abilities, there are several nice models that you can run on a video card with 24GB to 32GB of VRAM. (So a 5090 or a used 3090.) Try Gemma4 and Qwen3.5 with 4-bit quantization from Unsloth, and look at models in the 20B to 35B range. You can try before you buy if you drop $20 on OpenRouter. I have a setup like this that I built for $2500 last year, before things got expensive, and it's a nice little "home lab."
If you want to go bigger than this, you're looking at an RTX 6000 card, or a Mac Studio with 128GB to 512GB of RAM. These are outside your budget. Or you could look at a Mac Minis, DGX Spark or Strix Halo. These let you bigger models much slower, mostly.
5090 is pretty expensive (~$4000) to justify it over a $10-50 sub. I guess the nice thing is the api side becomes "included", if I ever want to go that route. But if I have a GHCP $40 sub vs a $4000 GC to match it, just on hardware, pay off is at 8 years. If I add in electricity, pay off is probably never.
Sure, the sub can go up in price, but the value proposition for self-running doesn't seem to make sense - especially if I can't at least match Sonnet on GHCP or something like that.
I hope to self-run some not useless LLMs/Agents at some point, but I think this market needs to stabalize first. I just don't like waiting.
As for models, I'm really genuinely impressed with Gemma4 26B A4B and Qwen3.6 35B A3B right now. Between them, I've seen solid image analysis, good medium-image OCR on very tough images, very good understanding of short stories, good structured data extraction from documents, extremely good language translation, etc. If you wanted to build a custom tool which summarized your inbox/RSS feeds/local news every day, or extracted information from emails and entered it into a database, or automatically captioned images, those tasks are all viable locally. The quality of the results is up dramatically in the last 12 months. At this point, my old personal non-agentic LLM benchmarks are "saturated": All the current leading models score extremely well on literally anything I was asking last year.
It's the true agentic coding workflows where the big models really stand out. And those models are all large enough that the hardware needs to amortized over enough users to run 24 hours/day.
M3 ultra with 80GOu cores and 256GB of ram is $7500 - that’s right at the edge of the budget, but it fits.. if you can get an edu discount through a kid or friend you’re even better off!
Over 5 years, that works out to ~$45k vs ~$10k, and during that duration, it's possible better open models will come available making the GPU better, but it's far more likely that the VC-fueled companies advance quicker (since that's been the trend so far).
In other words, the local economics do not work out well at a personal scale at all unless you're _really_ maxing out the GPU at close to 50% literally 24/7, and you're okay accepting worse results.
As long as proprietary models advance as quickly as they are, I think it makes no sense to try and run em locally. You could buy an H100, and suddenly a new model that's too large to run on it could be the state of the art, and suddenly the resale value plummets and it's useless compared to using this new model via APIs or via buying a new $90k GPU with twice the memory or whatever.
Given the trends of the capitalist US government, which constantly cedes more and more power to the private sector, especially google and apple, I assume we'll end up with a state-run model infrastructure as soon as we replace the government with Google, at which point Gemini simply becomes state infrastructure.
That's not correct. If USPS makes more revenue than their expenses for a year, they can't pay it out as profits to anyone.
It's true that USPS is intended to be self-funded, covering it's costs through postage and services sold, and not tax revunue. That doesn't mean there's profit anywhere.
Pricing in the US postal system is not based on maximizing profit. Ths US postal system is not a for-profit system, at all. It is a delivery system (more or less) that happened to start turning a profit (2006) until PAEA. After that, the next time it made a profit was 2025.
That depends on the country in question :-)
$3,699.00
M4 Max 16c/40c, 128GB of RAM, 1TB SSD.
LM Studio is free and can act as a LLM server or as a chat interface, and it provides GUI management of your models and such. It's a nice easy and cheap setup.
Like, no one bats an eye at all the people paying $100/mo for Hulu + Live TV, or paying $350/mo for virtual pixels in candy crush / pokemon go / whatever, and I'm having at least that much fun in playing with openclaw.
If any of my friends admitted to spending $350/mo on candy crush i'd think that they'd badly need help for a gambling problem.
The things I want to use it for (like gathering weekly reports across a half dozen brokerage and bank accounts) are not things I'd trust it to do.
That means picking up and cleaning the house after 3 kids and a dog. Grocery shopping. Dishes. Laundry. Chores.
Tech crap? Nope.
The only "selection" complaint I regularly have had is the bananas are nearly always very unripe - like several days from being edible. But then I went to the store myself for several weeks and realized they just never have ripe bananas.
In other words, they're doing as well as I could do if I were shopping it myself.
Then you have a shopping list. You can do the shopping digitally now a days, but once it's delivered, now you have to organize it into the pantry existing stock, probably with a way to ensure older items are used first. This might involve separating out certain ingredients into smaller packaging and freezing some for later use.
That is all very manual, and I don't see how digitizing one part greatly simplifies it, especially if the digitization is error prone.
In a high enough income state, the answer is you hire a personal household chef or something like that. That isn't digitizing the problem- that is outsourcing it.
In The Netherlands you can get a live-in au-pair from the Philippines for less than that. She will happily play your Beatles song, download the Titanic movie for you, find your Gorillaz song and even cook and take care of your children.
It's horrible that we have such human exploitation in 2026, but it does put into perspective how much those credits are if you can get a real-life person doing those tasks for less.
Working abroad is a totally reasonable proposition compared to working in the Philippines.
A normal full time employee costs at least 2000€ a month (salary, tax, pension plan, health insurance, etc). If you are paying less than that you are definietly exploiting them.
A lot of people in the Silicon Valley area spend that much ($6/day) on coffee. What they don’t realize is how out of touch they are in thinking makes sense for the rest of the fucking world. $180/mo is about 5% of the median US per capita income. It’s not going to pick your kids up from school, do your taxes, fix your car, or do the dishes. It’s going to download movies and call restaurants and play music. It’s a hobby, high-touch leisure assistant that costs a lot of money.
The economics of these businesses are based way more on hope and hype than rational analysis and planning.
For comparison, a full time "virtual assistant" with fluent English from the Philippines costs upwards of $700/month nowadays.
What a horrible situation.
The number one goal of AI should be to eliminate human exploitation. We want robots mining the minerals we use for our phones, not children. We should strive to free all of humanity from dangerous labour and the need for such jobs to exist.
If Elon Musk wants Optimus robots to help colonize Mars shouldn’t he be trying to create robots that can mine cobalt or similar minerals from dangerous mines and such?
I have some bad news.
And you see nothing wrong with that?
Be judgemental all you want, but I feel like I'm paying for less friction, and also more security since my experiments also showed claude to be the least vulnerable to prompt injection attempts.
Hard to believe unless your are doing something much more complex than the things you listed
(The latter would however not be the case for Titanic, I imagine.)
Not to be a narc or anything, but is OpenClaw liable to just perform illegal acts on your behalf just because it seemed like that's what you meant for it to do?
There's at least a couple of dozen instances right now, somewhere, getting very close to designing boutique chemical weapons.
I think they do it mostly to feel young and edgy.
People do this? Or is it some sort of joke way above my head?
In what bizarre world is it easier to ask a massive LLM to play a playlist rather than ... literally hitting the play key on it?
You could build up a legitimate collection for much less than $180/mo.
This is cheap replacement for ordinary people.
It's going to be big. But probably it's best to wait for Google and Apple to step up their assistants.
OTOH, this isn't an issue for "ordinary people". They go to work, school, children's sports events, etc. If they had an assistant for free, most of them would probably find it difficult to generate enough volume to establish the muscle memory of using them. In my own professional life, this occurred with junior lawyers and legal assistants--the juniors just never found them useful because they didn't need them even though they were available. Even the partners ended up consolidating around sharing a few of them for the same reason.
Down in this thread someone mentions it being an advanced Alexa, which seems apt. Yes, a party novelty but not useful enough to be top of mind in the every day work flow.
The tech has existed for a while but nobody sane wants to be the one who takes responsibility for shipping a version of this thing that's supposed to be actually solid.
Issues I saw with OpenClaw:
- reliability (mostly due to context mgmt), esp. memory, consistency. Probably solvable eventually
- costs, partly solvable with context mgmt, but the way people were using it was "run in the background and do work for me constantly" so it's basically maxing out your Claude sub (or paying hundreds a day), the economics don't work
- you basically had to use Claude to get decent results, hence the costs (this is better now and will improve with time)
- the "my AI agent runs in a sandboxed docker container but I gave it my Gmail password" situation... (The solution is don't do that, lol)
See also simonw's "lethal trifecta":
>private data, untrusted content, and external communication
https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
The trifecta (prompt injection) is sorta-kinda solved by the latest models from what I understood. (But maybe Pliny the liberator has a different opinion!)
The "gave it my Gmail password" problem has a better answer than "don't do that." Security kicks itself out of the room when it only says no. Reserve the no for the worst days. The rest of the time, ship a better way.
That's why I built the platform to make credential leaks hard. It takes more than a single prompt. The credential vault is encrypted. Typed secret wrappers prevent accidental logging and serialization. Per-channel process isolation means a compromise in one adapter does not hand an attacker live sessions in the others.
"Don't do that" fails even for users trying their hardest. Good engineering makes mistakes hard and the right answer easy. Architecture carries the weight so the user does not have to.
On the trifecta being "sorta-kinda solved" by newer models, no. Model mitigations are a layer, not a substitute. Prompt injection has the shape of a confused-deputy problem and the answer to confused deputies has always been capabilities and isolation, not asking the already confused deputy to try harder.
You want the injection to fail EVEN when the model does not catch it.
The one I see the most is brokers. Agent talks to a thing, thing has credential and does the task for the agent. Or proxies that magically inject tokens.
I think this only works for credentials though?
It doesn't solve the personal information part (e.g. your actual emails), right?
As for security, my solution was: keep it simple and limit blast radius.
Expect it to blow things up, and set things up so it doesn't matter when it happens.
I don't like docker so I just made a Linux user called agent. Agent can blow up all the files in its own homedir, and cannot read mine.
I felt really clever until I realized there's an even better solution: just give it a laptop (or Mac mini, or server, or whatever we're doing this week).
Same result but less pain in my ass. Switching users is annoying (and sharing files, and permission issues...). Also, worrying about which user I'm running stuff as... The thing just shouldn't be on my machine in the first place. It should have its own!
Functionally, its own Linux user or root on a $3 VPS are the same thing. It blows up the VPS, I just reset it.
For keys, I don't do anything fancy. It can leak all my keys. But if anyone steals them, they can exhaust my entire $5 prepaid balance ;) Blast radius limited.
But yeah, needs, tastes and preferences may differ.
The layer that addresses content-level flow is information-flow enforcement above identity. TriOnyx (https://github.com/tri-onyx/tri-onyx) looks at that exact problem: taint and sensitivity tracking, gateway kills on threshold breach.
It complements Wirken. You need identity before you can meaningfully ask what agent A has been exposed to.
On the agent-gets-its-own-machine approach, that is fine as a blast-radius strategy and I have no quarrel with it. It trades isolation between channels for isolation between the agent and the host. If you only have one channel and disposable keys, it works. It stops working as soon as the agent holds something you cannot cheaply rotate, which for most people ends up being their messaging identities.
So I guess that leaves the in-between people who don't care about spending $180 every month but don't have any personal staff yet or even access to concierge services.
OTOH a lower-middle-class Joe like me really does have a lot of mundane social/professional errands, which existing software has handled just fine for decades. I suppose on the margins AI might free up 5 minutes here or there around calendar invites / etc, but at the cost of rolling snake eyes and wasting 30 minutes cleaning up mistakes. Even if it never made mistakes, I just don't see the "personal assistant" use case really taking off. And it's not how people use LLMs recreationally.
Really not trying to say that LLM personal assistants are "useless" for most people. But I don't think they'll be "big," for the same reason that Siri and Alexa were overhyped. It's not from lack of capability; the vision is more ho-hum than tech folks seem to realize.
Existing software is what dumped most of those errands on you in the first place.
If you ignore the risks I don't see why it's hard to see value.
The AI can read all your email, that's useful. It can delete them to free up space after deciding they are useless. It can push to GitHub. The more of your private info and passwords you give it the more useful it becomes.
That's all great, until it isn't.
Putting firewalls in place is probably possible and obviously desirable but is a bit of a hassle and will probably reduce the usefulness to some degree, so people won't. We'll all collectively touch the stove and find out that it is hot.
I built a fastmail CLI tool for my *claw and it can only read mails, that's it. I might give it the ability to archive and label later on, with a separate log of actions so I can undo any operation it did easily.
It's pretty decent at going "hey, there's a sale on $thing at $store", for mails, but that's about it.
You can ask it questions like “what classes does my gym offer between 6-8pm today” and just get a good answer instead of wasting time finding their schedule. You can tell it to check your favorite band’s website everyday to see if they announce any shows in your city. You can tell it to read your emails and automatically add important information to your calendar.
This isn’t the space where I get the most value from AI, but it’s nice to have a hyper connected agent that can quickly take care of more smaller and more personal tasks.
It of course depends heavily on your work, but my work is 50% communication / overseeing, and I simply lose track of everything.
I don’t give it any credentials of any sort, but I run data pipelines on an hourly basis that ingest into the agent’s workspace.
Edit: Yes it’s still there https://support.mozilla.org/en-US/kb/thunderbird-and-junk-sp...
I believe that the shift from "my one computer" to multiple clients (computer + phone + webmail) probably has something to do with it. Even with IMAP sharing state, you still don't have a great way to see and control the filtering, except by moving things in/out of spam folders.
Mostly (but of course, not exclusively), porn for the techies. Receiving a phone notification every time a PR is opened on a project of yours? Exciting or sad, depends on one's outlook on life.
---
IMHO, the biggest problem with OpenClaw and other AI agents is that the use-cases are still being discovered. We have deployed several hundred of these to customers and I think this challenge comes from the fact that AI agents are largely perceived as workflow automation tools so when it comes to business process they are seen as a replacement for more established frameworks.
They can automate but they are not reliable. I think of them as work and process augmentation tools but this is not how most customers think in my experience.
However, here are a several legit use-case that we use internally which I can freely discuss.
There is an experimental single-server dev infrastructure we are working on that is slightly flaky. We deployed a lightweight agent in go (single 6MB binary) that connects to our customer-facing API (we have our own agentic platform) where the real agent is sitting and can be reconfigured. The agent monitors the server for various health issues. These could be anything from stalled VMs, unexpected errors etc. It is firecracker VMs that we use in very particular way and we don't know yet the scope of the system. When such situations are detected the agent automatically corrects the problems. It keeps of log what it did in a reusable space (resource type that we have) under a folder called learnings. We use these files to correct the core issues when we have the type to work on the code.
We have an AI agent called Studio Bot. It exists in Slack. It wakes up multiple times during the day. It analyses our current marketing efforts and if it finds something useful, it creates the graphics and posts to be sent out to several of our social media channels. A member of staff reviews these suggestions. Most of the time they need to follow up with subsequent request to change things and finally push the changes to buffer. I also use the agent to generate branded cover images for linkedin, x and reddit articles in various aspect ratios. It is a very useful tool that produces graphics with our brand colours and aesthetics but it is not perfect.
We have a customer support agent that monitors how well we handle support request in zendesk. It does not automatically engage with customers. What it does is to supervise the backlog of support tickets and chase the team when we fall behind, which happens.
We have quite a few more scattered in various places. Some of them are even public.
In my mind, the trick is to think of AI agents as augmentation tools. In other words, instead of asking how can I take myself out of the equation, the better question is how can I improve the situation. Sometimes just providing more contextually relevant information is more than enough. Sometimes, you need a simple helper that own a certain part of the business.
I hope this helps.
This is why I won't use them for anything externally facing or with high or even moderate damage potential.
Which basically means they don't get used at all.
Idk, it's strange for me to think of it that way. It's tech. If it does something useful, that's cool.
Data protection is always a consideration. I just don't consider a LLM to be a special case or a person, the same way that I don't have strong feelings about "AI" being applied in google search since forever. I don't have special feelings or get embarrassed by the thought of a LLM touching my mails.
Right now for me, agentic coding is great. I have a hard time seeing a future where the benefits that we experience there will not be more broadly shared. Explorations in that direction is how we get there.
I don't know why they don't make an official integration for it. Probably cause they're already out of GPUs lol
https://code.claude.com/docs/en/channels
I haven't used OpenClaw since then ...
If you look around in the business world, there is an absurdly large number of people still doing all sorts of things manually that they probably shouldn't. And its costing them money. Even before AI that was true. But now it's increasingly becoming obvious to these people that there are solutions out there that might work. There's a fair amount of FOMO on that front with more clued in people that have heard of other people allegedly being a bit smarter than them.
From a practical experience point of view, most people probably don't have the hands-on experience to make a good judgment just yet. "I tried Chat GPT once and it hallucinated" doesn't really count as valid experience at this point and many non-technical people are still at that level. There generally are a lot of headless chickens making absurd claims (either way) about what these systems can and cannot do making sweeping statements about how possible or impossible things are.
If you take the time and sit down to automate a few things you'll find that: 1) the tools aren't great right now 2) there are lots of basic plumbing issues that get in the way 3) fixing those plumbing issues is not rocket science and something anyone with basic CLI or scripting skills can solve easily 4) you can actually outsource most of that stuff to coding agents. 5) if you figure some of the basics out, you can actually make OpenClaw or similar systems do things that are valuable. 6) Most people that aren't programmers won't get very far given the current state of tools. 7) this might change rapidly as better tools become available. 8) people generally lack the imagination to see how even basic solutions could work for them with these systems.
I have an OpenClaw up and running for our company. It is doing some basic things that are useful for us. After solving some basic plumbing issues, it's now a lot easier to make it do new things. It's not quite doing everything just yet (lots more plumbing issues to solve) and we have our healthy hesitations about letting it loose on our inboxes. But it's not useless or without value. Every plumbing issue we solve unlocks a few more use cases. There's a bit of a gold rush right now of course. And "picks and shovels" people like myself are probably going to do a brisk business.
You can wait it out or tap into the action now. That's your choice. But try making it an informed choice. And no better experience than the first-hand type.
Similarly, I have been using Hermes Agent also inside a container, and on a VPS with only access to a local directory in the VPS with a dozen active projects on GitHub. I don't give it access to my GitHub credentials, but allow it to work in whatever branch is checked out.
This setup is fabulously productive. I use it about every other day to perform some meaningful task for me. It is inexpensive also. A task might take 20 minutes and cost $0.25 in GLP-5.1 API costs.
So TLDR: out of the box, I use Hermes at least one hour a week and find it to be a wonderful tool.
But I am someone that, for example, dislikes home automation. Know that thing that you ask Alexa to open your curtains? I think that is cringe af.
Maybe there's an overlap with the crowd that likes that.
I remember circa 2015 all my nerdy colleagues were going wild with home automation stuff, and I felt like I wanted to play with it too at first. But then I started to observe that these guys weren't spending less time than me turning on their lights. They were spending way more time than me, in fact, tinkering with their thermostats and curtains. I'm perfectly happy hitting a light switch when I walk in the door.
I can't envision one of these Telegram bots reliably completing tasks for me. Maybe the closest one would be what I've seen in this thread. Downloading torrents and putting them in Jellyfin for me, but really, I don't hate curating my own media collection.
Yep. The IoT home automation stuff is still less performant than much older, wired solutions where whole systems were designed at once in a set-and-forget mode and didn't have weird sync issues or delays. I remember seeing the 'home of the future' exhibit at Epcot like 20+ years ago and these IoT setups are often still a total joke in comparison because of all the protocol issues and fiddling with various interfaces needed.
Just like how the analog wired POTS phone systems were more performant in many ways than pretty much any IP based voice setup.
I simply got tired of messing with stuff that kept breaking in unexpected ways. It wasn't saving time, it was adding a lot of totally unnecessary stress and actually taking time away from me-- for little more than an occasional spark of novelty. Being able to use voice accurately & repeatably for simple task requests is probably the only standout advancement.
My 'nerdy colleagues' and myself can get a lot of enjoyment out of tinkering with this new agentic hotness. However, very few of us I think are really getting something that's actually saving us time in the long run (at least in our personal lives), and it's going to take a while to figure out what's actually realistically reproducible toward that end at a reasonable cost.
IoT really comes into its own space though when you pair it up with something that is a real pain to get to. Think somewhere you have to have a crainlift and a 4 hour drive just to touch the 20 year old computer something is hooked up to. Or basically anywhere that takes hours to get to. The space my company typically targeted was high rise air con companies. Or companies where the customer would service out any sort of PLC work to a 3rd party. At that point the savings of having to roll a guy out there vs looking on a computer has the thing pay for itself in 1-2 trips. Also the ability to show up on site with the correct parts. That alone was a huge savings.
IoT's big issues is you have to beat many things that are already dead simple to do.
EDIT: Giving the keys to an agent for such a trivial work is ... I got your sarcasm I think ^^.
I’m not generally interested in having it read my email or calendar. I have a digital calendar in the kitchen, and I rarely get important email. I do really enjoy being able to control my house by voice in natural language. I had it set all my lights to Easter colors a while back in a single instruction.
I'm writing a Blender plugin. I don't know what I did to offend MacOS but somewhere I changed something or touched a file and now MacOS refuses to open Blender until I go through an arcane ritual of telling the OS it's actually ok to use.
I don't like the lack of control in name of security or that you're having to be come ever-more an expert to actually use things you own. Security needs to be done carefully and intentionally not just blasted everywhere. What is being called security is very much more often control by the creators.
Well, that may be the first time it wasn't a mainframe. Plus this is for the lay-away project of the mid-'80's when IBM-compatible PC's already existed.
What most people don't realize is that before the IBM PC, Wal-Mart was way ahead of all other retailers with its POS, inventory, and logistics systems from all over the US tied into headquarters in Bentonville, Arkansas.
Using telephone modems of course like anybody else, but this is before they had any "superstores", and were still quite small, nor were they in any big cities. Yet.
But they were ready. Actually that was all Sam had ever been planning for since the beginning, but they had become quite successful enough already as the fastest growing chain of mainly rural "country stores".
A typical Wal-Mart was dwarfed by a K-Mart store, and a full-size Sears seemed like a super store by comparison. They had mainframes and POS too but Wal-Mart just took it to the next level. Ran circles around them digitally.
Really did scorch some earth with those kind of advantages when they came to the big cities, but after the PC came out I would say they regressed more toward the mean. The momentum still overwhelmed though.
*Claw is more like windows 98. Everyone knows it is broken, nobody really cares. And you are almost certainly going to be cryptolocked (or worse) because of it. It isn't a matter of if, but when.
And if you remove either access to data or access to internet then you kill a good chunk of usefulness
Assume locally i know a read only agent (running on account A) is reading a specific file from user B. Assume it has access to a secret that user B cannot observe. By prompt injection, you can have the read only agent encode the secret as "read" pattern that user B can decode by looking at file access times.
(You can think of fetch requests and the likes for more involved cases)
So read only, while helpful, does not innately prevent communication with an attacker
"Interrupts", for example, are an old concept that is rarely talked about anymore until you get into low-level programming. At a high level, you don't even think about them, let alone talk about them.
If they're doing the same thing as the interrupts that the article is talking about, they are low-level.
If they aren't, then the comparison is by name only.
I remember Apple introducing sandboxing for Mac apps, extending deadlines because no one was implementing it. AFAIK, many apps still don’t release apps there simply because of how limiting it is.
Ironically, the author suggests to install his software by curl’ing it and piping it straight into sh.
I have similar concerns and somehow think openclaw will not be remembered in 30 years time so the comparison does not land for me
So yeah, perhaps it isn't fooling the author, but it doesn't matter for the other billions of people.
I'm reading that Swedish IT consultant's rant in the voice of Swedish Guy.
I too remember DOS. Data and code finely blended and perfectly mixed in the same universally accessible block of memory. Oh, wait… single context. nwm
By the 90s, when I was working on DOS/Windows software, I was inundated with resumes from engineers who had been laid off from those very same companies. And it wasn't until Windows XP in 2000 when most people moved away from DOS.
I feel like every twenty years or so, kids look at the current computing landscape and say, "That's too complicated! I'm not going to learn all that--I'm just going to invent my own thing." DOS was that in the 80s, the Web in 2000. Maybe OpenClaw is that for the 2020s. AI is certainly going to reinvent everything.
The DEC people were right about DOS: it was barely more than a toy. But we didn't abandon it and go back to the safety of VMS. Instead, we improved it until it had all the security and capabilities we needed.
I don't know if OpenClaw is going to win or not, but if you remember MS-DOS, you wouldn't count it out.
Packages shipping as part of Linux distros are signed. Official Emacs packages (but not installed by the default Emacs install) are all signed too.
I thankfully see some projects released, outside of distros, that are signed by the author's private key. Some of these keys I have saved (and archived) since years.
I've got my own OCI containers automatically verifying signed hashes from known author's past public keys (i.e. I don't necessarily blindly trust a brand new signature key as I trust one I know the author has been using since 10 years).
Adding SHA hashes pinning to "curl into bash" is a first step but it's not sufficient.
Software shipped properly aren't just pinning hashes into shell scripts that are then served from pwned Vercel sites. Because the attacker can "pin" anything he wants on a pwned JavaScript site.
Proper software releases are signed. And they're not "signed" by the 'S' in HTTPS as in "That Vercel-compromised HTTPS site is safe because there's an 'S' in HTTPS".
Is it hard to understand that signing a hash (that you can then PIN) with a private key that's on an airgapped computer is harder to hack than an online server?
We see major hacks nearly daily know. The cluestick is hammering your head, constantly.
When shall the clue eventually hit the curl-basher?
Oh wait, I know, I know: "It's not convenient" and "Buuuuut HTTPS is just as safe as a 10 years old private key that has never left an airgapped computer".
Here, a fucking cluestick for the leftpad'ers:
https://wiki.debian.org/Keysigning
(btw Debian signs the hash of testing release with GPG keys that haven't changed in years and, yes, I do religiously verify them)