Episode #451: 04 July 2026

Daily Vibe Casting

Developers debate agent workflows, privacy takedowns, and the widening gap between AI insiders and everyone else

Jul 04, 2026

Overview

Today’s thread of conversation had two main currents: developers getting more serious about agent workflows (from better planning tools to giving agents their own machines), and a growing desire to put guardrails around both privacy and online noise. In the background, there’s a familiar tension between what the frontier crowd is doing with new models and what most people think “AI” still means.

The big picture

We’re watching the day-to-day practice of using agents mature. People are treating model choice, planning, testing, and evaluation as first-class parts of the job, not afterthoughts. At the same time, the social layer is getting messier: spammy bot replies push creators to close doors, while privacy tooling tries to give individuals a fighting chance against data brokers.

Vercel Sandbox grows up with FUSE mounts

@vercel_dev shipped FUSE-based filesystem support in Sandbox, which is a big deal if you’ve ever wanted to run normal CLI tools against data that lives somewhere else. Mounting S3 buckets or network filesystems as regular paths means less copying, faster iteration, and a clearer story for sharing state across Sandboxes.

The practical win here is that “remote data” can start to feel local, but it also raises the usual questions around credentials and who can access what when you mount shared storage inside an isolated environment.
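To make the "remote data feels local" point concrete, here is a minimal sketch in Python. It assumes a bucket or network filesystem has already been mounted at some path; the mount step itself is not shown, and a temporary directory stands in for the mount point so the example runs anywhere. The function name `summarize_tree` is made up for illustration. The point is only that ordinary file APIs need no storage-specific client once a path exists.

```python
import os
import tempfile
from pathlib import Path


def summarize_tree(root: str) -> dict[str, int]:
    """Count files and total bytes under root using ordinary filesystem calls."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return {"files": file_count, "bytes": total_bytes}


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for a real FUSE mount point.
    with tempfile.TemporaryDirectory() as mount_point:
        Path(mount_point, "logs").mkdir()
        Path(mount_point, "logs", "a.txt").write_text("hello")
        Path(mount_point, "b.txt").write_text("hi")
        print(summarize_tree(mount_point))
```

On a real mount, each of these calls may turn into network requests behind the scenes, so a walk over a large bucket can be slow and is subject to whatever credentials the mount was given.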

Vercel Developers@vercel_dev

Sandbox now supports FUSE-based filesystems. ▪︎ Mount S3 buckets and network filesystems ▪︎ Run CLI tools against remote sources ▪︎ Share state across Sandboxes

vercel.com

Vercel Sandbox now supports FUSE-based filesystems - Vercel

5:57 PM · Jul 3, 2026 · 120K Views

12 Replies · 11 Reposts · 310 Likes

Agents that choose cheaper subagents, and save you tokens

@simonw shared a simple Fable habit: tell the model to use its own judgement to pick a lower-power model for routine coding tasks, and only bring the heavy option in when needed. It’s the sort of instruction that sounds obvious after you try it, especially if you’ve watched costs climb because every tiny change was treated like a doctoral thesis.

This also hints at where agent UX is heading: you stop thinking in single-model sessions, and start thinking in a small team with a budget.
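The "small team with a budget" idea can be illustrated with some toy arithmetic. This is not how Fable or any real agent makes the choice; the tip above leaves that decision to the model's own judgement. The tier names, the difficulty scale, and the relative costs below are all invented for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str           # hypothetical tier label, not a real model name
    cost_per_task: int  # made-up relative cost units


# Hypothetical tiers, ordered cheapest first.
TIERS = [Tier("small", 1), Tier("medium", 4), Tier("large", 20)]


def pick_tier(difficulty: int) -> Tier:
    """Map a 1-10 difficulty estimate to the cheapest tier assumed to cope with it."""
    if difficulty <= 3:
        return TIERS[0]
    if difficulty <= 7:
        return TIERS[1]
    return TIERS[2]


def plan_cost(difficulties: list[int]) -> tuple[int, int]:
    """Return (routed cost, cost if every task used the largest tier)."""
    routed = sum(pick_tier(d).cost_per_task for d in difficulties)
    always_large = TIERS[-1].cost_per_task * len(difficulties)
    return routed, always_large


if __name__ == "__main__":
    tasks = [1, 2, 2, 5, 9]  # illustrative difficulty estimates
    routed, always_large = plan_cost(tasks)
    print(f"routed={routed} always_large={always_large}")
```

With these invented numbers, routing most tasks to the cheaper tiers costs roughly a quarter of sending everything to the largest one. Real savings depend entirely on actual pricing and on how often the cheaper tier gets the job wrong.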

Simon Willison@simonw

The most interesting Fable tip I've heard so far is to let the model use its own judgement as much as possible I told it "For all coding tasks use your judgement to decide an appropriate lower power model and run that in a subagent" and it seems to be saving a lot of tokens

6:52 PM · Jul 3, 2026 · 252K Views

148 Replies · 147 Reposts · 3.31K Likes

Prompting by finding your own blind spots

@trq212’s take on working with Fable is refreshingly honest: the hard part is noticing what you do not know yet. His approach centres on structured steps that tease out missing requirements and fuzzy assumptions before you commit to implementation, then keeps a record of deviations and decisions as you go.

It’s also a neat reminder that “good prompting” is often just good project hygiene, written down.

Thariq@trq212

I’ve found the most important part of working with Fable is discovering my own unknowns so I can prompt it better, heres how I do that.

5:46 PM · Jul 3, 2026 · 777K Views

117 Replies · 340 Reposts · 4.34K Likes

Plannotator: a UI for reviewing agent plans and diffs

@dillon_mulroy posted a tool he uses every session: Plannotator, a browser interface for annotating agent plans, diffs, and long-running work. The appeal is straightforward: agents generate walls of text and proposed changes, and you need a place to triage, comment, and keep your bearings.

If agent work is going to look like “code review plus orchestration”, tools like this feel inevitable.

Dillon Mulroy@dillon_mulroy

i use this every single session i work with agents

5:57 PM · Jul 3, 2026 · 59.7K Views

26 Replies · 14 Reposts · 693 Likes

Give your agent a whole computer, not just an API

@steipete argued for a dedicated VM so an agent can do real end-to-end testing with full GUI control, including the annoying parts like pop-ups and security warnings. It’s a pragmatic move: plenty of failures only show up when you click the buttons like a person would.

The direction is clear: if you want agents to own a workflow, you have to give them the same surface area that users see.

Peter Steinberger 🦞@steipete

Give your agent its own computer to REALLY end to end test stuff.

1:16 AM · Jul 4, 2026 · 236K Views

120 Replies · 67 Reposts · 2.14K Likes

Skill evals are the hard bit, not the skill itself

@mattpocockuk summed up a pain point many teams quietly share: evaluating “skills” is brutal, especially when you want them to work across different harnesses and agent setups. Making a capability is one thing. Proving that it works reliably, keeps working after model updates, and does not break in edge cases is another.

If you’re wondering why agent progress can feel jumpy, this is part of it. The measurement is still catching up.
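As a rough picture of what "working across harnesses" means in practice, the sketch below runs the same cases through several harnesses and reports a pass rate for each. Everything here is a stand-in: the harnesses are plain functions rather than real agent runs, the names `harness_a` and `harness_b` are hypothetical, and exact-match grading is a deliberate simplification of what a real skill eval would need.

```python
from typing import Callable, Iterable


def pass_rates(
    harnesses: dict[str, Callable[[str], str]],
    cases: Iterable[tuple[str, str]],
) -> dict[str, float]:
    """Run every (prompt, expected) case through each harness; return pass rates."""
    case_list = list(cases)
    if not case_list:
        raise ValueError("at least one case is required")
    rates: dict[str, float] = {}
    for name, run in harnesses.items():
        # Exact match after stripping whitespace: a simplistic grader.
        passed = sum(1 for prompt, expected in case_list if run(prompt).strip() == expected)
        rates[name] = passed / len(case_list)
    return rates


if __name__ == "__main__":
    # Stand-in harnesses: plain functions instead of real agent runs.
    harnesses = {
        "harness_a": lambda prompt: prompt.upper(),
        "harness_b": lambda prompt: prompt,
    }
    cases = [("ok", "OK"), ("Fine", "FINE"), ("YES", "YES")]
    print(pass_rates(harnesses, cases))
```

The same skill scoring very differently across harnesses is the situation the post describes. A sketch like this also leaves out the harder parts: non-deterministic outputs, repeated trials, and re-running after every model update.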

Matt Pocock@mattpocockuk

"Evals on skills are hard" is the understatement of the year

Peter Steinberger 🦞 @steipete

@MichaelArnaldi @EffectTS_ @ZachWarunek Start with a skill that distills the most important things latest gen agents don't get right with Effect and you're 80% there. Evals on skills are hard.

8:22 PM · Jul 3, 2026 · 66.5K Views

19 Replies · 17 Reposts · 428 Likes

Post-training automation, and the prospect of many new ‘minds’

@tszzl pointed to post-training automation benchmarks and the idea that once models can reliably post-train other models, we could see a rapid expansion in specialised systems. The interesting part is not just performance; it’s the implication that creating a specific kind of “mind” could become a normal craft.

That is a future where the default assistant matters less, and the custom-built ones matter more.

roon@tszzl

i think these posttraining-automation benchmarks are even more important than they seem when models cross the threshold of being able posttrain other models, hopefully there will be a cambrian explosion of the types of minds authoring minds will become an accessible artform

Thoughtful @thoughtfullab

GLM 5.2 is 5x cheaper than Opus 4.8 and 11x than Fable 5, yet it tops PostTrainBench. That’s exciting because lower costs make personalized intelligence economically viable. Every company and country should be able to own models trained on its own data and have sovereignty over

10:28 PM · Jul 3, 2026 · 102K Views

49 Replies · 76 Reposts · 1.15K Likes

Why writing still feels harder than code for LLMs

@leerob asked the uncomfortable question: are current models simply bad at great creative writing? The thread circles a key difference: code has clean feedback loops (tests, builds, types), while writing often lacks a crisp, agreed definition of “good”.

A useful framing from the replies is to treat models more like editors and critics, and to spend more time “onboarding” them into context and voice, rather than expecting a perfect first draft.

Lee Robinson@leerob

Are current LLMs incompatible with great creative writing? I can't tell if it's cope or not, but it seems like even with the best models, I still can't get them to write like humans would. For coding, there is a verifiable reward like it compiling or tests passing. But for

2:29 AM · Jul 4, 2026 · 113K Views

249 Replies · 23 Reposts · 936 Likes

Unbroker: an agent skill for data broker opt-outs

@Teknium shared an optional Hermes Agent skill called Unbroker that tries to find your personal details on data broker sites and submit takedown requests. The conversation quickly goes to the only question that matters: do the removals stick, and what happens when the data reappears?

Tools like this are turning privacy admin into an ongoing process, not a one-off email and wishful thinking.

Teknium 🪽@Teknium

New optional skill available in Hermes Agent. Unbroker teaches Hermes Agent how to find your personal info on data brokers platforms and get it taken down. Learn more:

𒐪 @SHL0MS

i'm open sourcing UNBROKER: a tool that finds where your personal info is exposed by data brokers and files the removals for you it runs as a skill in Hermes Agent _________ your data is everywhere; hundreds of brokers publish your name, current and old addresses, phone, email,

8:27 PM · Jul 3, 2026 · 281K Views

91 Replies · 182 Reposts · 2.47K Likes

People are closing replies because bot spam is getting unbearable

@signulll tightened reply permissions after getting worn down by AI reply bots. It’s hard to blame anyone for doing it, but it’s also a real loss: open replies are how you find new people and unexpected insight.

This is what “cheap content generation” looks like in the messy middle of social platforms, where the cost of noise is paid by the person reading and moderating.

signüll@signulll

okay i am sick & tired of ai reply bots, switching so that only followers of ppl i am following can reply now. sorry everyone. i tried to keep it open but its now untenable.

9:19 PM · Jul 3, 2026 · 74.3K Views

225 Replies · 3 Reposts · 1.03K Likes

Codex vs ChatGPT: why people keep both

@jxnlco asked Codex users what they still use ChatGPT for, and the replies read like a map of tool separation. People want their coding context kept clean, while using ChatGPT for research, longer conversations, mobile access, images, and general questions when they hit limits elsewhere.

It’s a reminder that “AI app choice” is becoming more like choosing a set of instruments than picking a single favourite.