Is This What You Mean?

Field & practice · June 10, 2026

My son is almost eight, and he is very into games. We don't do a lot of video games at our house, but he gets to play occasionally, and for a while now he's been asking to build his own. He loves creating worlds. He has a whole vision for how his should work.

The old version of me would have found some kid-friendly game creation tool, probably Scratch, and we would have spent a week of evenings dragging blocks around. I've watched that process in schools and edtech programs: an hour a day, a classroom period at a time, maybe a week before there's something that's actually fun to play.

Instead, I had him draw his levels on paper. He mapped out the mechanics he wanted, talked me through how his world was supposed to behave, and then we sat down with Claude Code and built it. Thirty minutes later he had a playable prototype: a little 2D Minecraft-ish thing where you collect resources and put up buildings during the day, and at night creatures come out to attack them and you fight them off with a sword. Not an innovative game by any stretch. But it was his world, and it was playable in half an hour. We've worked on it a couple of evenings since, maybe an hour of actual AI time total.

The game is not the part I keep thinking about. The part I keep thinking about is watching him playtest it.

He'd play for a minute, then say I don't like where the buttons are, this doesn't make sense, can the monsters come a little later. I'd ask Claude to make the change, and the new version would show up while he was still holding the thought. The gap between feedback and revision had collapsed to almost nothing. He never had to remember what he meant three weeks ago. He just pointed at the thing while it was in front of him.

The loop I've never had in the field

As a designer, I've spent years trying to tighten that loop and never could.

Here's how user testing has actually worked for most of my career in Cambodia. You ride out to the province with a build. You watch people use it, you take notes, you ask your questions, you thank everyone, and you ride home. Then comes synthesis, redesign, development, and if the budget allows, a second trip weeks or months later to find out whether you understood what people were telling you the first time. Every round trip costs money, and every gap between rounds is a chance for meaning to leak out of the feedback.

When I did my systematic review of mobile learning research in the Global South, the pattern that bothered me most wasn't weak outcome measurement. It was that user involvement in design was the stage where the literature fell off a cliff. Teams investigated the context. They documented infrastructure, language, connectivity. Then they built the thing without the users and tested it after the decisions were already made. We know the context. We just don't ask the users.

I've heard every justification for that gap, and I've used some of them myself. Iteration with real users is slow. It's expensive. The developers are in another city or another country. By the time the feedback gets back to whoever can act on it, the budget has moved on.

What I watched with my son was that entire excuse structure quietly falling apart over a simple video game.

What the research says

This isn't just me. Researchers are starting to explore this area as well. Santiago and colleagues published a framework this past year for what they call live-prototyping: the designer modifies AI-generated parts of a prototype in real time, through a separate control interface, while the user study is happening.¹ Their practitioners saw the same thing I saw: feedback getting integrated in the moment instead of weeks downstream. Li and colleagues ran a case study folding vibe coding into a user-centered design process and found the team could test multiple alternative designs inside live sessions with their users, in their case highway traffic engineers.² Bilgram's group made the business version of the argument earlier: delegating prototype construction to an AI agent means faster iterations at lower cost in exactly the phase where iteration matters most.³

The caveats in those papers are real too. Santiago's participants flagged reliability problems and the amount of planning it takes to do this well. Duan and colleagues found that LLM-generated design feedback got less useful with each iteration, which suggests the model is a better builder than critic.⁴ None of this is push-button yet.

And the evidence from contexts closer to my research is more sobering in a useful way. A team in rural South Africa ran generative AI co-design workshops with adolescents on a mental health app. Almost none of the kids had ever touched these tools. Most found the process engaging and started personalizing outputs to reflect their own identities, which is exactly what you want from co-design. But most of them needed help constructing prompts, and over half noticed cultural bias in what the AI produced.⁵ Till's work on co-design readiness in South Africa makes the deeper point: meaningful co-design rests on trust, cultural respect and familiarity with the technology, and all three take deliberate time to build.⁶ The AI shortens the iteration loop. It does nothing to shorten the trust loop. Those are different loops, and only one of them was ever the bottleneck we admitted to.

What I'd want to build

So here's where my head is going.

At the university, I could have students sit with an interface and give me feedback we act on inside the same session. Move the control, change the flow, hand it back. Is this better? Capture that, bring it home, and use it to decide what the real feature becomes.

In the field, it gets more interesting. Sitting with a participant in the province, prototype running, and instead of walking them through an acceptance script or a TAM questionnaire, we change the thing in front of them and ask the only question that ever mattered: is this what you mean? Is this appropriate? How does this feel? The follow-up gets deep precisely because the artifact can keep up with the conversation.

To do that for real, somebody needs to build the tooling around it. Each participant's instance would need to start from the same original, multi-tenant style, so one person's changes don't contaminate the next session. The AI agent would need to document every variation as it's made, so each fork is captured and attributable to the person whose feedback produced it. Then all of it comes back to the team for review.
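
A minimal sketch of that bookkeeping, in Python, might look something like the following. It is only an illustration under stated assumptions: the prototype is reduced to a plain dictionary of settings, and the names PrototypeStudy, Fork, Change and the settings keys are hypothetical, not taken from any existing tool. The point it shows is the shape of the idea: every session forks from the same untouched original, and every change is logged against the participant who asked for it.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Change:
    """One participant-requested variation, logged as it is made."""
    participant_id: str
    request: str        # what the participant asked for, in their own words
    key: str            # which setting of the prototype was changed
    old_value: object
    new_value: object
    timestamp: str


@dataclass
class Fork:
    """A participant's private copy of the prototype plus its change log."""
    participant_id: str
    state: dict
    changes: list = field(default_factory=list)


class PrototypeStudy:
    """Hypothetical session manager: one baseline, one fork per participant."""

    def __init__(self, baseline: dict):
        # Keep a private copy so the original can never be edited by a session.
        self._baseline = copy.deepcopy(baseline)
        self._forks = {}

    def start_session(self, participant_id: str) -> Fork:
        # Deep copy so nested settings are not shared between participants.
        fork = Fork(participant_id, copy.deepcopy(self._baseline))
        self._forks[participant_id] = fork
        return fork

    def apply_change(self, participant_id: str, request: str, key: str, new_value) -> None:
        fork = self._forks[participant_id]
        # Record the change before applying it, so the log keeps the old value.
        fork.changes.append(Change(
            participant_id=participant_id,
            request=request,
            key=key,
            old_value=fork.state.get(key),
            new_value=new_value,
            timestamp=datetime.now(timezone.utc).isoformat(),
        ))
        fork.state[key] = new_value

    def review_log(self) -> list:
        # Everything the team reviews afterwards, attributable per participant.
        return [c for fork in self._forks.values() for c in fork.changes]


# Example session: the second participant starts from the untouched original.
study = PrototypeStudy({"button_position": "top", "monster_spawn_delay_s": 30})
study.start_session("P01")
study.apply_change("P01", "can the monsters come a little later",
                   "monster_spawn_delay_s", 60)
second = study.start_session("P02")
```

In a real setup the agent would be changing code rather than a settings dictionary, and the forks would more likely be branches or deployed instances, but the review step stays the same: the log is raw material for the team, not a decision.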

And that review is the part I want to be careful about. Not every change a participant asks for makes it into the final product, and it shouldn't. Reconciling forty users' worth of divergent variations into one coherent design is designerly knowledge, the judgment part of the job. The AI multiplies the raw material. It doesn't tell you what the material means. I'm fine with that division of labor. I'd be worried about any version of this that wasn't.

The drawing came first

Which brings me back to my son, because there's a worry sitting under all this excitement and I don't want to skip past it.

He is going to grow up needing to know how to use these tools. I'm fairly convinced of that. I'm equally convinced the risk is real that he uses them in a way that hollows out his own thinking. The research here is uncomfortable: a study of 666 participants found frequent AI use correlated with lower critical thinking, mediated by cognitive offloading, and the effect was strongest in the youngest users.⁷ A systematic review of over-reliance on AI dialogue systems points the same direction.⁸ But there's a counter-thread worth taking seriously: when the offloading is deliberately structured, when students hand the mechanical parts to the AI inside a scaffold that forces them to do the analysis and reflection themselves, critical thinking can actually improve.⁹ The variable isn't the tool. The variable is the scaffolding around it.

I keep coming back to the fact that the drawing came first. The levels on paper, the mechanics talked through out loud, the world already built in his head before we ever opened a laptop. The AI built what he had already thought. I don't know yet how you hold onto that order of operations, for an eight-year-old or for a design team, once the tool makes skipping it so easy.

He wants to add a second level now. He's drawing it on paper before we open the laptop. I didn't tell him to do that.

I'm choosing to take it as a good sign.

AI Disclaimer: I used Claude and Consensus to help pull together the research threads in this post. The writing, the story, the opinions, and the worry are mine.

Footnotes

1. Santiago, J.M., et al. (2025). The AI of Oz: A conceptual framework for democratizing generative AI in live-prototyping user studies. Applied Sciences, 15(10), 5506. https://doi.org/10.3390/app15105506 (The closest thing yet to a formal method for changing a prototype in real time during a user study.)
2. Li, T., Maheshwari, T., & Voelker, A. (2025). User-centered design with AI in the loop: A case study of rapid user interface prototyping with "vibe coding." arXiv. https://arxiv.org/abs/2507.21012
3. Bilgram, V., & Laarmann, F. (2023). Accelerating innovation with generative AI: AI-augmented digital prototyping and innovation methods. IEEE Engineering Management Review, 51(2), 18–25. https://doi.org/10.1109/EMR.2023.3272799
4. Duan, P., Warner, J., Li, Y., & Hartmann, B. (2024). Generating automatic feedback on UI mockups with large language models. CHI 2024. https://doi.org/10.1145/3613904.3642782 (Useful for catching subtle errors, but the feedback got less valuable over successive iterations.)
5. Dallison, S., et al. (2025). Using generative AI to co-design digital mental health interventions with adolescents in rural South Africa: Qualitative thematic analysis of participatory workshops. Journal of Medical Internet Research, 27, e73535. https://doi.org/10.2196/73535
6. Till, S., Verdezoto Dias, N., & Densmore, M. (2025). Fostering co-design readiness in South Africa. Interacting with Computers. https://doi.org/10.1093/iwc/iwaf005
7. Gerlich, M. (2025). AI tools in society: Impacts on cognitive offloading and the future of critical thinking. Societies, 15(1), 6. https://doi.org/10.3390/soc15010006
8. Zhai, C., Wibowo, S., & Li, L.D. (2024). The effects of over-reliance on AI dialogue systems on students' cognitive abilities: A systematic review. Smart Learning Environments, 11, 28. https://doi.org/10.1186/s40561-024-00316-7
9. Hong, H., Vate-U-Lan, P., & Viriyavejakul, C. (2025). Cognitive offload instruction with generative AI: A quasi-experimental study on critical thinking gains in English writing. Forum for Linguistic Studies, 7(7), 325–334. https://doi.org/10.30564/fls.v7i7.10072 (The hopeful counter-evidence: structured offloading improved critical thinking rather than eroding it.)
