E.g., ask the agent to write a skill, then get it to prompt a subagent to use the skill, and iterate until it verifies the task was completed correctly.
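A minimal sketch of that write, use, verify loop, assuming the three steps are available as plain callables. The names `harden_skill`, `write_skill`, `run_subagent` and `verify` are hypothetical placeholders for whatever the harness actually provides, not part of any real agent API.

```python
from typing import Callable


def harden_skill(
    write_skill: Callable[[str, str], str],
    run_subagent: Callable[[str, str], str],
    verify: Callable[[str, str], tuple[bool, str]],
    task: str,
    max_rounds: int = 5,
) -> tuple[str, bool]:
    """Rewrite a skill until a subagent using it passes verification.

    All three callables are assumptions standing in for real agent calls:
      write_skill(task, feedback) -> skill text
      run_subagent(skill, task)   -> result produced by a fresh subagent
      verify(task, result)        -> (passed, feedback for the next round)
    """
    feedback = ""
    skill = ""
    for _ in range(max_rounds):
        skill = write_skill(task, feedback)      # author (or revise) the skill
        result = run_subagent(skill, task)       # a subagent tries to use it
        passed, feedback = verify(task, result)  # check the task outcome
        if passed:
            return skill, True
    # Give up after max_rounds and return the last attempt, marked as failed.
    return skill, False
```

The `max_rounds` cap is there so a skill that never passes does not loop forever; the failure feedback is the only thing carried from one round to the next.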
https://github.com/bjcoombs/ai-native-toolkit/blob/main/skil...
It hardens a skill through judge-panel refinement rounds. It’s a quality gate that runs after authoring, not an authoring tool.
"You must use tool ABC before calling tool XYZ"
This can either be in some static prompt scheme somewhere, or it can be the live result of a tool call.
If you make everything tool calling and environmental, you effectively have a lazily evaluated & dynamic prompt scheme.
I like to think of this as context for the context. The better you map the environment and descriptions of it to the agent, the less top-down prompting is required.
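A rough sketch of the "live result of a tool call" variant, assuming a toy environment with two tools. `ToolEnvironment`, `tool_abc` and `tool_xyz` are made-up names for illustration. The ordering rule never appears in a static prompt; the agent only sees it if it calls XYZ too early.

```python
class ToolEnvironment:
    """Toy environment: the ordering rule lives in the tool, not the prompt."""

    def __init__(self) -> None:
        self.abc_called = False

    def tool_abc(self) -> str:
        # Record that the prerequisite step has happened.
        self.abc_called = True
        return "ABC done. You may now call tool XYZ."

    def tool_xyz(self) -> str:
        # The instruction is produced lazily, only when the agent needs it.
        if not self.abc_called:
            return "Error: you must use tool ABC before calling tool XYZ."
        return "XYZ done."
```

An agent that calls `tool_xyz()` first gets the instruction back as an ordinary tool result and can correct course; one that already does things in the right order never spends tokens on the rule.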
If you set up the harness correctly, you can run circles around a lot of what passes as AI innovation with powershell in a while loop. Adding static markdown document soup on top of this would only reduce performance in the general case.
Nope. Still the same.
Thank you! That made it clear to me why it's a useful caching technique.
So in essence, the ideal skill imo is pretty much a list of shell commands with a sentence next to each saying when to use it.
With these, I personally have skills for:
- dealing with our metrics and tracing platform
- dealing with jira
- dealing with confluence (mostly finding info I need via different search strategies without using too many tokens)
- dealing with the database
- doing reviews (this one is more prompting about what info I need to review well myself, rather than commands, though it does instruct the agent to download the branch into a new worktree and clean it up after it's done with specific commands)
I'm generally suspicious of people with hundreds of skills, especially those I open and find AI-generated writing inside. Skills should be a list of commands, maybe with some pitfalls for the agent to avoid, added only from human experience (agents are terrible at prompting).
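For illustration only, here is a sketch of what "a list of commands with a sentence next to each" could look like for the review case, rendered from a small Python structure. The comment above does not show its actual commands, so the skill title, the pitfall text and the choice of `git fetch` / `git worktree` lines are all assumptions; `<branch>` is a placeholder the agent would fill in.

```python
from dataclasses import dataclass


@dataclass
class Command:
    shell: str  # the command the agent should run
    when: str   # one sentence on when to use it


def render_skill(title: str, commands: list[Command], pitfalls: list[str]) -> str:
    """Render a skill as a title, a command list, and optional pitfalls."""
    lines = [f"# {title}", ""]
    for c in commands:
        lines.append(f"- `{c.shell}`: {c.when}")
    if pitfalls:
        lines += ["", "Pitfalls:"]
        lines += [f"- {p}" for p in pitfalls]
    return "\n".join(lines)


# Hypothetical review skill; the commands and pitfall are illustrative guesses.
REVIEW_SKILL = render_skill(
    "Reviewing a branch",
    [
        Command("git fetch origin <branch>",
                "first, so the review sees the latest commits"),
        Command("git worktree add ../review-<branch> <branch>",
                "to check the branch out without touching the main working tree"),
        Command("git worktree remove ../review-<branch>",
                "once the review is finished"),
    ],
    ["Do not switch branches in the main working tree."],
)
```

Printing `REVIEW_SKILL` gives a short text file of the shape described: a handful of commands, one sentence each, and a pitfall added by hand.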
Tldr: you're doing it wrong but I will not show you how to do it right. I also did not run the bench using my approach but it definitely “vibes better” to me, and I reject your actual research paper.
Come on, show us some actual skills.
That one you use all the time looks a hell of a lot like “I want a deterministic shell script for something, plus a skill saying ‘run the shell script’”.
Is that what you do? How much time do you spend on them? How do you stop the agent from making a bunch of very similar skills? How do you deal with the explosion of the total number of skills impacting your token use? Do you use skills from GitHub, or is that bad practice? Why?
So many unanswered questions; so little content. :/
Letting an instruction-following LLM do deep research and iterate has given fantastic results before.
Being able to construct non-trivial Zig 0.16 programs without slowing down for version-hallucinating compilation errors is nice as a random example.
Not sure if this take is correct though. I suspect self-generated skills help the agent avoid having to "decompress" its latent knowledge, which might save tokens? idk, I am not an expert
Yet I’ve seen people succeed with “write me a prompt” prompts. The model makes something up, and often it makes sense.
They are like plans in that way: It’s not exactly novel knowledge, but it at least encodes it somewhere to make the process verifiable beforehand and a bit more repeatable.
I wouldn’t be surprised if it improves performance a little, just like thinking blocks do (every model reasons now).