Check out our demo video: https://www.youtube.com/watch?v=291IkUbPrlk.
We started TesterArmy because testing is still far too painful. AI coding tools have made it dramatically faster to write and ship code, but testing is still a bottleneck. Traditional E2E tests are slow to set up and expensive to maintain. Managing auth and test users is painful. Setting up staging environments is painful. Running tests reliably is painful.
We think most teams do not actually want to spend their time writing selectors or maintaining test infrastructure. They just want confidence that their core flows work. With TesterArmy, an engineer can sign up, give an agent our CLI, and let it handle creating tests and running them on schedule or on GitHub.
When something breaks, TesterArmy alerts your team through Slack or Discord.
Over the past few months, we scaled from 0 to 30+ teams using our product every day. We caught bugs in critical flows, including onboarding, checkout, and AI chat. Many of our customers are migrating to us from established competitors because of the quality and reliability of our agents.
Here are a few of the recent bugs that our agent found (there were quite a lot of them!):
1) A timezone bug that affected the booking flow in one of our clients' apps. The dashboard was very complex, and the bug was hard for a human to catch.
2) A regression in agent orchestration that left a sandboxed environment stuck on loading. Thanks to TesterArmy, the team was able to resolve it before it hit production.
3) An incorrectly calculated order amount in a complex dashboard flow with checkout. Thanks to TesterArmy, the team was able to resolve it before it affected revenue.
4) A regression in an AI chat flow that would have left a user unable to retrieve their data because of broken tool calling.
And many more, mostly related to some incorrect API calls, 404s, unhandled errors, etc.
If this sounds useful, we would love your feedback at https://tester.army. We have a bunch of free test runs for you to try. And don’t worry, we won’t make you do sales calls, and we don’t have long onboarding or annoying setup. Our goal is an it-just-works experience.
If you're looking for an end-to-end testing solution, we'd love to hear your feedback!
To keep results stable, we do a lot of harness engineering: we inject trajectories from previous test runs, and we split tests into smaller steps, which helps prevent context overload and decision fatigue.
Regarding test case management, our customers have used our CLI to migrate their existing test cases from whatever system they were using before.
So I would say that, at the moment, in-house testing is easier than external testing for us.
It's cool, but I'm not super excited about using some 3rd party SaaS as a critical part of my testing.
I will say though, I don't really want to "write tests in natural language", I want something to crawl my app and figure out what's there and what's currently broken and then write its own regression tests.
We use Cypress heavily for our core flows. It has a similar AI prompt feature, but it's not quite ad hoc enough for smaller fixes, which is where the bottleneck still comes in for us.
fast: gemini-3-flash, falls back to gpt-5.4, 15-min run timeout, max 2 visual calls/step. deep: gpt-5.4, 15-min timeout, max 3 visual calls/step.
Why such a hard timeout, and why not latest models?
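For readers comparing the two modes quoted above, here is a minimal sketch that writes them down as data. Only the values (model names, 15-minute timeout, visual-call limits) come from the quoted line; the `ModeProfile` class, its field names, and `models_to_try` are hypothetical and are not TesterArmy's actual configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModeProfile:
    """Hypothetical container for one run mode; field names are illustrative."""
    primary_model: str
    fallback_model: Optional[str]
    run_timeout_min: int
    max_visual_calls_per_step: int


# Values taken from the quoted settings; the structure itself is an assumption.
MODES = {
    "fast": ModeProfile("gemini-3-flash", "gpt-5.4", 15, 2),
    "deep": ModeProfile("gpt-5.4", None, 15, 3),
}


def models_to_try(mode: str) -> list[str]:
    """Return the models in the order they would be attempted for a mode."""
    profile = MODES[mode]
    return [m for m in (profile.primary_model, profile.fallback_model) if m]
```

Laid out this way, the only differences between the modes are the primary model, the presence of a fallback, and one extra visual call per step; the run timeout is the same in both.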
On a slight tangent, since we are all here...
Does anyone still believe there is a long-term future in traditional UI/UX?
It feels like a lot of attention is still going into landing pages, dashboards, and CRUD apps, while overlooking a bigger shift where fewer people will actually need to interact with those interfaces directly when the same tools can perform the underlying tasks automatically, without much UI at all.
So the bigger question is: does UI/UX evolve into something else, or does a large part of it simply disappear?
I might be a bit too early. Recently I started a project and decided to skip all of that and focus on making it friendlier to AI agents. Frankly, so far it has been great, both as a user experience and in what it delivers.
Basically that.
If the app requires a mouse then it should have UI, if not, unless critical, it can be driven by an agent.
That's my point.
Is there a future where we still have traditional UX? Absolutely.
I don't want to write a whole dissertation on this topic, so I'm just going to mention that we tried to build AI voice assistants for a decade, and while LLMs have basically solved understanding, they have not solved the UX portion.
Frankly, the best experience I've had in a long time. I just ask the agent what I need and it does it. Populating the CRM is even easier.
Personally (again, just my take), I cannot see going back to CRUD applications.
I still had Claude make me charts because I wanted to understand what was going on though.
I'm not routinely involved in sales, but if I was I feel like there are some breakdowns I would want to see regularly.
I also want to note that my startup sells an infra product. It doesn't need a UI, but people fucking love charts, so we built them the charts and they are happier.
Pricing question: the usage on the plans seems low. In the demo you said that you have 25 tests per PR, which would mean you get only 10 PRs per month on the hobby plan?
Regarding pricing, the self serve options are currently only for lower usage. We will add more plans further down the line. Currently the most popular one is the startup plan. If you need more usage I’m happy to discuss it on a call!
I've been experimenting with Revyl and it's really nice. I think this agent-driven testing is the future.
Physical devices for AI agents are something we at TestingBot do provide: https://testingbot.com/support/ai/mcp
Would love to hear your feedback after you try it out!
> Traditional E2E tests are slow to set up and expensive to maintain.
Isn't this just using agents to create e2e tests or is there some better new approach I'm missing?
This still leads me to my original question of how, though. If you're not using locators, are you just passing page contents to the LLM? Or using a multimodal model and, say, screenshots? My experience with that has been pretty poor and worse than proper e2e scripts, and it is fairly expensive to boot.
Sorry for the insistence haha, just interested because it could be pretty groundbreaking if done well.
Do you handle heterogeneous environments and network connectivity simulation as well? I am working on a mobile app, and occasionally having users just lose a request or two can put the state machine into unusual modes.
Regarding the other question: not yet. For now, we have Chromium, iOS, and Android (latest versions of each), but we are working on adding more. Regarding network connectivity, it's coming soon (I have an open PR).
Rainforest QA, for example, has been in this space for a while and also happens to be a YC company.
Will this solution work with services protected by Cloudflare Turnstile or captchas? Does this involve a human in the loop?
Also, your current pricing is $300 for 1K tests, which means $0.30 for each test. We tried out Playwright MCP and it easily consumes 1M+ tokens for a test with ~20 steps (including image input). So with this pricing, are you guys default alive?
Also, is there a benchmark you ran to prove the efficacy of your testing agent? At the current stage it is kind of a "trust me bro" thing.
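The pricing arithmetic in the question above can be spelled out as a quick calculation. This is a sketch built only on the commenter's figures ($300 per 1K tests, roughly 1M tokens per test); the blended per-token price passed to `margin_per_test` is a free parameter, not a number from the thread, and real costs would also include infrastructure.

```python
PLAN_PRICE_USD = 300.0       # plan price quoted in the comment
TESTS_PER_PLAN = 1_000       # test runs included at that price
TOKENS_PER_TEST = 1_000_000  # commenter's Playwright MCP estimate, ~20 steps

# Revenue per test run.
price_per_test = PLAN_PRICE_USD / TESTS_PER_PLAN

# Blended token price (USD per million tokens) at which token spend alone
# would equal the revenue from one test.
break_even_per_million = price_per_test / (TOKENS_PER_TEST / 1_000_000)


def margin_per_test(blended_usd_per_million_tokens: float) -> float:
    """Revenue minus token cost for one test, ignoring all other costs."""
    token_cost = TOKENS_PER_TEST / 1_000_000 * blended_usd_per_million_tokens
    return price_per_test - token_cost
```

Under these assumptions a test brings in $0.30, so token spend alone breaks even at a blended price of $0.30 per million tokens; anything above that is a loss per test unless the agent uses far fewer tokens than the commenter's estimate.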
We currently do not have any benchmarks; much of the experience depends on the test plan. We've been mostly focusing on the customer experience, not benchmarking.
First of all, static tests are very brittle: you rely on selectors, need wait times, and can’t really test a lot of dynamic content (think AI chats/interactions). Then there’s all the infrastructure around it: solving captchas, handling auth, handling email OTP (each of our agents has access to its own inbox), spinning up simulators, and handling video recording and screenshots.
To keep results stable, we do a lot of harness engineering: we inject trajectories from previous test runs, and we split tests into smaller steps, which helps prevent context overload and decision fatigue.
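As a rough illustration of the two ideas in that reply (splitting a test into small steps and injecting what worked on an earlier run), here is a minimal sketch. Everything in it is hypothetical: `StepRecord`, `split_plan`, `build_step_prompt`, and the prompt wording are invented for illustration and say nothing about how TesterArmy's harness is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class StepRecord:
    """Hypothetical record of one step from a previous passing run."""
    instruction: str
    actions: list[str]


def split_plan(plan: str) -> list[str]:
    """Split a natural-language plan into one instruction per non-empty line."""
    return [line.strip() for line in plan.splitlines() if line.strip()]


def build_step_prompt(instruction: str, history: dict[str, StepRecord]) -> str:
    """Build the context for a single step, adding prior actions if known."""
    lines = [f"Current step: {instruction}"]
    prior = history.get(instruction)
    if prior:
        lines.append("Actions that worked on a previous run:")
        lines.extend(f"- {action}" for action in prior.actions)
    return "\n".join(lines)


plan = "Open login page\nSign in as test user\n\nAdd item to cart"
history = {
    "Sign in as test user": StepRecord(
        "Sign in as test user",
        ["fill email", "fill password", "click Sign in"],
    )
}
# One small prompt per step, instead of one large prompt for the whole test.
prompts = [build_step_prompt(step, history) for step in split_plan(plan)]
```

Each prompt here carries only its own step plus any matching prior trajectory, which is one plausible way to keep the per-step context small.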
Regarding security, the product can operate without any access to the codebase: you can just give us a URL or a mobile app build and we will do the testing.
Always happy to see cool products from Poland! :)