Daily Vibe Casting
Episode #451: 04 July 2026

Developers debate agent workflows, privacy takedowns, and the widening gap between AI insiders and everyone else

Overview

Today’s thread of conversation had two main currents: developers getting more serious about agent workflows (from better planning tools to giving agents their own machines), and a growing desire to put guardrails around both privacy and online noise. In the background, there’s a familiar tension between what the frontier crowd is doing with new models and what most people think “AI” still means.


The big picture

We’re watching the day-to-day practice of using agents mature. People are treating model choice, planning, testing, and evaluation as first-class parts of the job, not afterthoughts. At the same time, the social layer is getting messier: spammy bot replies push creators to close doors, while privacy tooling tries to give individuals a fighting chance against data brokers.

Vercel Sandbox grows up with FUSE mounts

@vercel_dev shipped FUSE-based filesystem support in Sandbox, which is a big deal if you’ve ever wanted to run normal CLI tools against data that lives somewhere else. Mounting S3 buckets or network filesystems as regular paths means less copying, faster iteration, and a clearer story for sharing state across Sandboxes.

The practical win here is that “remote data” can start to feel local, but it also raises the usual questions around credentials and who can access what when you mount shared storage inside an isolated environment.

Agents that choose cheaper subagents, and save you tokens

@simonw shared a simple habit for working with Fable: tell the model to use its own judgement to pick a lower-powered model for routine coding tasks, and to bring in the heavy option only when needed. It’s the sort of instruction that sounds obvious once you try it, especially if you’ve watched costs climb because every tiny change was treated like a doctoral thesis.

This also hints at where agent UX is heading: you stop thinking in single-model sessions, and start thinking in a small team with a budget.
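The "small team with a budget" idea is easy to picture as a budgeted dispatcher. The sketch below is a minimal Python illustration under stated assumptions: the tier names, per-task costs, and the keyword list that flags a task as routine are all hypothetical, and a real agent would lean on the orchestrating model's own judgement rather than keyword matching. It is not how Fable or any particular harness implements subagent selection.

```python
from dataclasses import dataclass


# Hypothetical model tiers: names and costs are illustrative placeholders,
# not real model identifiers or real pricing.
@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_task: float  # illustrative cost units per task


CHEAP = ModelTier("small-model", 1.0)
HEAVY = ModelTier("large-model", 10.0)

# Hypothetical signals that a task is routine. A real agent would rely on the
# orchestrating model's judgement instead of substring matching.
ROUTINE_HINTS = ("rename", "typo", "format", "docstring", "lint")


def pick_model(task: str, budget_left: float) -> ModelTier:
    """Send routine tasks to the cheap tier; use the heavy tier only when the
    task looks non-routine and the remaining budget can still cover it."""
    text = task.lower()
    if any(hint in text for hint in ROUTINE_HINTS):
        return CHEAP
    if budget_left >= HEAVY.cost_per_task:
        return HEAVY
    return CHEAP  # fall back to the cheap tier when the budget runs low


def run_session(tasks: list[str], budget: float) -> list[tuple[str, str]]:
    """Assign each task a model, deducting its cost from one shared budget."""
    assignments = []
    for task in tasks:
        model = pick_model(task, budget)
        budget -= model.cost_per_task
        assignments.append((task, model.name))
    return assignments


if __name__ == "__main__":
    session = run_session(
        [
            "Fix typo in README",
            "Design the caching layer for the API",
            "Rename variable in utils",
            "Refactor auth flow across services",
        ],
        budget=15.0,
    )
    for task, model in session:
        print(f"{model:12} {task}")
```

With a budget of 15 units, the design task gets the heavy tier, while the later refactor falls back to the cheap tier because the remaining budget can no longer cover a heavy call. Keyword matching also misfires easily ("format" matches "information"), which is part of why handing the choice to the model's own judgement is appealing.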

Prompting by finding your own blind spots

@trq212’s take on working with Fable is refreshingly honest: the hard part is noticing what you do not know yet. His approach relies on structured steps that tease out missing requirements and fuzzy assumptions before you commit to implementation, and then keeps a running record of deviations and decisions as the work progresses.

It’s also a neat reminder that “good prompting” is often just good project hygiene, written down.

Plannotator: a UI for reviewing agent plans and diffs

@dillon_mulroy posted a tool he uses every session: Plannotator, a browser interface for annotating agent plans, diffs, and long-running work. The appeal is straightforward: agents generate walls of text and proposed changes, and you need a place to triage them, leave comments, and keep your bearings.

If agent work is going to look like “code review plus orchestration”, tools like this feel inevitable.

Give your agent a whole computer, not just an API

@steipete argued for a dedicated VM so an agent can do real end-to-end testing with full GUI control, including the annoying parts like pop-ups and security warnings. It’s a pragmatic move: plenty of failures only show up when you click the buttons like a person would.

The direction is clear: if you want agents to own a workflow, you have to give them the same surface area that users see.

Skill evals are the hard bit, not the skill itself

@mattpocockuk summed up a pain point many teams quietly share: evaluating “skills” is brutal, especially when you want them to work across different harnesses and agent setups. Building a capability is one thing; proving that it works reliably, keeps working after model updates, and holds up in edge cases is another.

If you’re wondering why agent progress can feel jumpy, this is part of it. The measurement is still catching up.
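One way to see why skill evals are hard is to write the smallest possible version. This hypothetical Python sketch runs the same test cases through several harnesses, repeats each case a few times so non-deterministic failures show up in the score, and reports a pass rate per harness. The harness callables and checks are placeholders, not any real agent framework's API; rerunning the same suite after a model update is what would catch a skill that quietly stopped working.

```python
from typing import Callable

# A "harness" here is any callable that runs the skill on an input and returns
# the agent's output. These are hypothetical stand-ins, not real agent APIs.
Harness = Callable[[str], str]
Check = Callable[[str], bool]


def eval_skill(
    harnesses: dict[str, Harness],
    cases: list[tuple[str, Check]],
    trials: int = 3,
) -> dict[str, float]:
    """Return the pass rate per harness, running every case `trials` times
    so that flaky, non-deterministic behaviour is reflected in the score."""
    results: dict[str, float] = {}
    for name, run in harnesses.items():
        passed = total = 0
        for prompt, check in cases:
            for _ in range(trials):
                total += 1
                if check(run(prompt)):
                    passed += 1
        results[name] = passed / total if total else 0.0
    return results


if __name__ == "__main__":
    # Two toy harnesses: the second silently fails on longer inputs,
    # standing in for an edge case one setup handles and another does not.
    harnesses = {
        "harness_a": lambda p: p.upper(),
        "harness_b": lambda p: p.upper() if len(p) < 10 else "",
    }
    cases = [
        ("hello", lambda out: out == "HELLO"),
        ("edge case input", lambda out: out == "EDGE CASE INPUT"),
    ]
    print(eval_skill(harnesses, cases))
```

The toy run scores `harness_a` at 1.0 and `harness_b` at 0.5: the same "skill" looks solid in one setup and half-broken in another, which is the cross-harness problem in miniature.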

Post-training automation, and the prospect of many new ‘minds’

@tszzl pointed to post-training automation benchmarks and the idea that once models can reliably post-train other models, we could see a rapid expansion in specialised systems. The interesting part is not just performance; it’s the implication that creating a specific kind of “mind” could become a normal craft.

That is a future where the default assistant matters less, and the custom-built ones matter more.

Why writing still feels harder than code for LLMs

@leerob asked the uncomfortable question: are current models simply bad at great creative writing? The thread circles a key difference: code has clean feedback loops (tests, builds, types), while writing often lacks a crisp, agreed definition of “good”.

A useful framing from the replies is to treat models more like editors and critics, and to spend more time “onboarding” them into context and voice, rather than expecting a perfect first draft.

Unbroker: an agent skill for data broker opt-outs

@Teknium shared an optional Hermes Agent skill called Unbroker that tries to find your personal details on data broker sites and submit takedown requests. The conversation quickly goes to the only question that matters: do the removals stick, and what happens when the data reappears?

Tools like this are turning privacy admin into an ongoing process, not a one-off email and wishful thinking.
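Whether removals stick is, in the end, a bookkeeping problem. Here is a minimal Python sketch, assuming a hypothetical record per broker (the broker names and the 30-day recheck interval are illustrative, not anything Unbroker is known to use): each request stores the results of later checks, flags a listing that comes back after being removed, and reports which brokers are due for another look.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class OptOut:
    broker: str  # hypothetical broker site name
    submitted: date
    # Each entry is (checked_on, listing_found).
    history: list[tuple[date, bool]] = field(default_factory=list)

    def record_check(self, checked_on: date, listing_found: bool) -> None:
        self.history.append((checked_on, listing_found))

    def reappeared(self) -> bool:
        """True if a listing was found again after a check showed it gone."""
        seen_gone = False
        for _, found in self.history:
            if not found:
                seen_gone = True
            elif seen_gone:
                return True
        return False


def due_for_recheck(
    requests: list[OptOut], today: date, interval_days: int = 30
) -> list[str]:
    """Brokers whose last check (or submission, if never checked) is at
    least `interval_days` old."""
    due = []
    for req in requests:
        last = req.history[-1][0] if req.history else req.submitted
        if today - last >= timedelta(days=interval_days):
            due.append(req.broker)
    return due


if __name__ == "__main__":
    a = OptOut("broker-a", date(2026, 5, 1))
    a.record_check(date(2026, 5, 20), listing_found=False)  # removal worked
    a.record_check(date(2026, 6, 20), listing_found=True)   # ...then came back
    b = OptOut("broker-b", date(2026, 6, 25))
    print(a.reappeared(), due_for_recheck([a, b], date(2026, 7, 22)))
```

Treating each opt-out as a record with a check history, rather than a single sent email, is what turns "did it stick?" into a question you can answer on a schedule.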

People are closing replies because bot spam is getting unbearable

@signulll tightened reply permissions after getting worn down by AI reply bots. It’s hard to blame anyone for doing it, but it’s also a real loss: open replies are how you find new people and unexpected insight.

This is what “cheap content generation” looks like in the messy middle of social platforms, where the cost of noise is paid by the person reading and moderating.

Codex vs ChatGPT: why people keep both

@jxnlco asked Codex users what they still use ChatGPT for, and the replies read like a map of how people split work between tools. They want their coding context kept clean, and turn to ChatGPT for research, longer conversations, mobile access, images, and general questions, or when they hit limits elsewhere.

It’s a reminder that “AI app choice” is becoming more like choosing a set of instruments than picking a single favourite.
