6 points by george_ciobanu 14 hours ago | 2 comments
If you had to build a context window manager in 24h, would you stick to the existing model or come up with something better?

Here's what I did:

1. Built a proxy that intercepts Codex's calls to OpenAI and rewrites them on the fly.

2. Replayed 3,807 rounds of SWE-bench Verified traces through it: avg prompt 44k → 6k tokens (-87%).

3. Posted it to HN to get the next reduction applied to my confidence interval, starting with the inevitable "How about accuracy?"

npx -y pando-proxy · github.com/human-software-us/pando-proxy
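The core trick in step 1 is a man-in-the-middle rewrite: the proxy sits between Codex and the OpenAI endpoint and compacts the messages array before forwarding it. A minimal sketch of what such a rewrite step could look like, assuming a chat-completions-style request body; the function name and the keep-the-last-N compaction policy are illustrative, not pando-proxy's actual logic:

```python
# Hypothetical rewrite step a context-window proxy might apply to an
# intercepted /v1/chat/completions request body before forwarding it.
# The compaction policy here (keep system prompt + last N turns, fold
# the rest into one placeholder message) is an assumption for the sketch.

def compact_request(body: dict, keep_last: int = 4) -> dict:
    """Keep the system prompt and the most recent turns; replace older
    turns with a single working-memory summary message."""
    messages = body.get("messages", [])
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_last:
        return body  # nothing worth compacting yet
    dropped, kept = rest[:-keep_last], rest[-keep_last:]
    summary = {
        "role": "user",
        "content": f"[working memory: {len(dropped)} earlier turns summarized here]",
    }
    return {**body, "messages": system + [summary] + kept}

# Example: a 1-system + 20-turn request shrinks to 6 messages.
req = {
    "model": "main-model",
    "messages": [{"role": "system", "content": "You are a coding agent."}]
    + [{"role": "user", "content": f"turn {i}"} for i in range(20)],
}
compacted = compact_request(req)
print(len(req["messages"]), "->", len(compacted["messages"]))  # 21 -> 6
```

In the real tool the summary placeholder would presumably be produced by the small helper model rather than a static string, which is where the extra chunker/working_memory_update calls discussed below come from.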

camelliaPTM 14 hours ago
The prompt went from 44k to 6k tokens, but you're making two extra model calls per round to get there (chunker + working_memory_update). What does the all-in cost comparison actually look like?
george_ciobanu 14 hours ago
The proxy uses a cheap, small model (gpt-5.4-mini by default) behind the scenes to save tokens on the expensive main model.

Because the proxy adds some overhead on every turn (the chunker and working_memory_update calls), the break-even point depends entirely on session length.

Short sessions (e.g., 2 rounds): The proxy's overhead might actually cost you more than you save.

Long sessions (e.g., 69 to 190 rounds): The token savings on the main model are massive and completely dwarf the small model's overhead.

It's not a universal win for quick, one-off queries, but the math becomes highly favorable on long, complex debugging sessions.
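That break-even intuition can be sketched with back-of-the-envelope numbers. The per-token prices, context-growth rate, and helper overhead below are all assumed for illustration (only the 44k/6k prompt sizes come from the thread); the point is the shape of the curve, not the dollar amounts:

```python
# Illustrative break-even math: raw sessions let the prompt grow every
# round; proxied sessions cap the prompt but pay helper-model overhead
# each round. All prices and growth numbers are assumptions.

MAIN_IN = 10.0 / 1e6   # $ per input token, expensive main model (assumed)
MINI_IN = 0.5 / 1e6    # $ per token, small helper model (assumed)

def raw_cost(rounds: int, base: int = 2_000, growth: int = 3_000) -> float:
    """Context grows by `growth` tokens per round, uncompacted."""
    return sum((base + i * growth) * MAIN_IN for i in range(rounds))

def proxied_cost(rounds: int, cap: int = 6_000, overhead: int = 30_000,
                 base: int = 2_000, growth: int = 3_000) -> float:
    """Prompt capped at `cap` tokens, plus helper-model overhead per round."""
    return sum(
        min(base + i * growth, cap) * MAIN_IN + overhead * MINI_IN
        for i in range(rounds)
    )

for rounds in (2, 20, 100):
    print(f"{rounds:>3} rounds: raw=${raw_cost(rounds):.2f} "
          f"proxied=${proxied_cost(rounds):.2f}")
```

With these assumed numbers the proxy loses on a 2-round session (the context never grows enough to be worth compacting, but the overhead is paid anyway) and wins by a widening margin as rounds accumulate, matching the short-vs-long-session argument above.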