Can AI replace your user tests?
Can ChatGPT replace real user tests? A lot of product people want to believe the answer is yes. Spoiler: it’s not. But it can still save us serious time if we use it the right way.
We ran an experiment inside Heatmap, our analytics tool for e-commerce revenue, to see if ChatGPT could take on the role of our users and critique a prototype of our new Funnels feature. What we found is that ChatGPT isn’t a replacement for real user tests, but it is an incredibly useful critique tool. Think of it like a sharp intern who always has an opinion, not a senior researcher you’d base launch decisions on.
I’ve been designing software products for over 20 years, and right now I lead design at Heatmap. We move fast, constantly shipping new features, so we need validation methods that are quick, reliable, and practical.
In this video we’ll walk through how we tested ChatGPT against a real user testing tool, what worked, what didn’t, and how you can use AI to make your own product process faster today.
1. The limits of remote testing
At Heatmap we were working on Funnels: a complex feature with horizontal scrolling, multiple metrics, and a lot of user interactions. Traditional user testing is valuable, but it’s slow. Recruiting testers, scheduling, watching recordings — that can drag out for weeks.
So the myth we wanted to test was simple: could ChatGPT replace a remote user testing tool like Maze? Maze, and others like it, let you send out a survey tied to a prototype. You get recordings of people clicking through and their answers to scripted questions.
That’s powerful, but it has drawbacks. There are no follow-ups, no conversation, and if your prototype is complex like ours, the tools break down fast. That’s exactly what happened to us with Maze. The connection with Figma was fragile, our prototypes were too heavy, and after weeks of back-and-forth with support, it just wasn’t working.
The irony of spending weeks testing a testing tool is not lost on me.
2. How we put GPT to work
To compare, we brought ChatGPT into the same process. This wasn’t just a one-off experiment. We actually built a repeatable protocol for testing designs with AI, and it’s now part of how we ship features at Heatmap.
Here’s how it works:
- Step one: we write a quick brief. Nothing long, just what we’re building, why it matters, must-haves, and who it’s for.
- Step two: we design two versions in Figma, export them as PDFs with descriptions, and run them through ChatGPT as a kind of A/B test.
That’s where the personas come in. We tell GPT to role-play as both:
- Novice user: new to analytics, looking for simple answers, likely to get lost if the UI isn’t clear.
- Expert user: deeply familiar, moving fast, wants advanced options without slowing down.
We give GPT the full flow in PDF form and ask: if you were this user, what’s easy, what’s confusing, and what would you change?
This is key. GPT isn’t clicking around like a real tester. But because we hand it the whole experience, it critiques usability like a reviewer. It even lets us ask follow-ups, switch personas, and request design suggestions.
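To make the protocol concrete, here is a minimal sketch of how the persona-critique prompt might be assembled. The persona descriptions and the critique question come straight from the steps above; the function name and the plain-text flow description are illustrative (in practice we attach the PDF export, and actually sending the messages to a model is left out here).

```python
# Sketch of a persona-critique prompt, following the protocol above.
# Persona wording mirrors our novice/expert definitions; helper names
# are illustrative, not a real Heatmap or OpenAI API.

PERSONAS = {
    "novice": (
        "You are new to analytics. You want simple answers and "
        "you get lost quickly when a UI is unclear."
    ),
    "expert": (
        "You are deeply familiar with analytics. You move fast and "
        "want advanced options without being slowed down."
    ),
}

def build_critique_prompt(persona: str, brief: str, flow_description: str) -> list:
    """Return a chat-style message list for one persona's critique pass."""
    return [
        # System message: tell the model which user to role-play.
        {"role": "system",
         "content": f"Role-play as this user: {PERSONAS[persona]}"},
        # User message: the brief, the full flow, and the critique question.
        {"role": "user",
         "content": (
             f"Feature brief: {brief}\n\n"
             f"Full flow (exported from Figma): {flow_description}\n\n"
             "If you were this user: what's easy, what's confusing, "
             "and what would you change?"
         )},
    ]

# Example: one critique pass for the novice persona.
messages = build_critique_prompt(
    "novice",
    brief="Funnels: step-by-step conversion analysis for e-commerce stores.",
    flow_description="Screens of the Funnels prototype, with captions.",
)
```

Because the prompt is just a message list, switching personas midstream or asking a follow-up is a matter of appending to it, which is exactly the back-and-forth a remote testing tool can't give you.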
3. What GPT got right (and wrong)
What came out of this was better than expected. GPT not only critiqued both versions, it also gave us recommendations that balanced the needs of novices and experts. For example, it suggested ways to simplify flows for beginners without slowing down power users.
That’s something you don’t get from a remote tool like Maze. Maze gives recordings, but it’s one-way. With GPT, we can actually have a dialogue: follow up, push deeper, and even ask it to generate improvement lists.
Were the recommendations perfect? No. Some ideas were off-base or missed context. But it gave us enough signal that we could create a hybrid version with confidence, then take that forward to real users.
It’s like getting feedback from the smartest intern in the room. They’ll say something brilliant, then something totally impractical, but either way you’re glad they spoke up.
4. GPT vs remote testing tools
Compared to Maze, the trade-offs were clear.
- Remote tools like Maze gave us recordings of real people clicking through. ChatGPT can’t do that.
- But ChatGPT let us have a conversation: follow-ups, clarifications, even switching personas midstream. Maze can’t do that.
- Maze struggled with our complex Figma prototype. GPT worked fine with our PDF export.
So which is better? Neither. They do different things. ChatGPT doesn’t replace Maze or real users. Instead, it fits earlier in the workflow. It helps us spot issues fast, refine designs, and only bring in real users at the very end.
5. Where AI really fits
Where does this leave us?
- ChatGPT isn’t replacing user testing. Real users still reveal the real friction.
- But ChatGPT has earned a place in our workflow. We use it for quick critiques during the design phase, to compare perspectives, and to get early insights before looping in stakeholders or users.
- By the time we test with actual users, we’ve already eliminated a lot of surface-level issues.
That saves time, keeps us moving fast, and makes our final user testing round far more efficient. AI isn’t replacing people, but it’s making the process sharper.
Faster design with real users at the end
So can ChatGPT replace user testing? No. But can it make the process faster, leaner, and more effective? Absolutely.
At Heatmap, this is how we build: fast cycles, smart use of AI, and real users at the right time. It’s how features like Funnels go from prototype to launch without getting bogged down.
If you’re building software, think of ChatGPT as your critique partner, not your customer base. Thanks for watching, and we’ll see you in the next one.