When tests start drawing the map

Charlie · October 22, 2025

You open the wiki. The checkout flow doc was last updated eight months ago. Half the screenshots show UI that no longer exists. The OAuth section describes a consent screen that was replaced last quarter.

You open the test suite. It passes. But you know it doesn't actually cover the new payment provider, because nobody updated the tests when finance switched vendors. And it definitely doesn't know about the banner that marketing added last week.

So you do what everyone does. You ask Lisa. She's been here two years and remembers how things actually work.

This is the state of mobile testing today. The wiki is a guidebook for an app that no longer exists. The test suite follows an outdated map. The only real source of truth is the person who happens to remember.

We think that's the wrong shape for the problem.

Software changed. Testing didn't.

Three things broke the old model.

Software became stochastic. Modern apps can behave differently on every run. Same user, same cart, different day, different flow. AI features, fraud detection, A/B tests, feature flags, geography, permissions: every release introduces another source of variation. Scripted tests assume deterministic paths. The paths stopped being deterministic a while ago.

Interfaces became personal. Your app doesn't look like mine. I get the paid experience, you get the upgrade prompt, someone else gets a tutorial in Korean. There are 40 different checkout flows inside the app you call "the checkout flow."

User journeys started crossing boundaries. Tap "Link Account" and your user leaves your app to authenticate with their bank. Tap "Login with Google" and they're inside Google's UI for 30 seconds. You don't control what happens there. You're still on the hook when the journey fails.

Traditional tests stop at the boundary of your app. User journeys don't.

Tourists and cartographers

A traditional test is a tourist. It follows a guidebook. Click here, expect this, move on. When the guidebook is wrong, the tourist gets lost.

What you actually want is a cartographer. Something that walks into the app, observes what's there, draws a map of what it finds, and updates that map every time it walks the territory.

A tourist tells you whether one specific path worked today. A cartographer tells you what paths exist, how often each one appears, what variations live inside them, and what changed since last week. A tourist gives you a pass or a fail. A cartographer gives you a map.
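To make the difference concrete, here is a minimal sketch of what a map could look like as a data structure. It is an illustration only, not a description of how Semaloop stores anything: the AppMap class, its methods, and the screen names are all hypothetical. It records the path each run took, reports how often a path appears, and lists paths that were not in an earlier map.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AppMap:
    """Hypothetical map of observed journeys (illustrative, not a real API)."""
    runs: int = 0
    path_counts: Counter = field(default_factory=Counter)

    def record_run(self, screens: list[str]) -> None:
        """Record the ordered screens seen during one walk of the app."""
        self.runs += 1
        self.path_counts[tuple(screens)] += 1

    def frequency(self, screens: list[str]) -> float:
        """Share of recorded runs that followed exactly this path."""
        if self.runs == 0:
            return 0.0
        return self.path_counts[tuple(screens)] / self.runs

    def new_paths_since(self, earlier: "AppMap") -> set[tuple[str, ...]]:
        """Paths in this map that the earlier map had never seen."""
        return set(self.path_counts) - set(earlier.path_counts)


# Made-up screen names, purely for illustration.
last_week = AppMap()
last_week.record_run(["cart", "payment", "confirmation"])

this_week = AppMap()
this_week.record_run(["cart", "payment", "confirmation"])
this_week.record_run(["cart", "promo_overlay", "payment", "confirmation"])

# The path that appeared since last week's map was drawn.
changed = this_week.new_paths_since(last_week)
```

A pass or fail is one bit. Even a structure this small answers the other questions: which paths exist, how often each one shows up, and what is new.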

Tests as appreciating assets

Most test suites lose value over time. You write them, ship the feature, and start the slow march of maintenance. New banner, broken selector. Redesigned button, broken selector. Rename a route, broken selector. By month twelve, half the team has stopped trusting the suite.

Tests that draw the map work the other way. Every run adds detail. Run one finds a promotional overlay nobody told it about. Run ten has seen three variants and knows their triggers. Run fifty has mapped seven distinct overlay patterns across user cohorts.

That is what we mean when we say tests should appreciate. The suite that documented 12 app states at launch documents 47 by month six. Each run teaches you something about your app you didn't already know.

The inversion: from depreciating asset to appreciating one.

What we're building

This is the first post on the Semaloop blog, so a quick word on what we're doing here.

Mobile is where this matters most. Real devices, real OS behaviour, real consent flows, real third-party SDKs, real audio and camera and notification permissions. Most mobile teams either skip end-to-end testing or spend half their time maintaining it. Neither feels like the right answer.

We're building agents that test apps the way a real user would. They run on real devices, observe what's actually on screen, work out what state the app is in, adapt when something new appears, and remember what they learn for next time. They reason about outcomes rather than steps. The path between login and confirmation can shift each week, but the question "did the user successfully complete checkout?" still has an answer.
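A small sketch of the difference between checking steps and checking outcomes, under heavy simplification. The function names and screen names are hypothetical, and a real agent would be working from what it observes on a device rather than from a list of strings. The scripted check demands the exact path; the outcome check only asks whether the goal state was reached at some point after the start state.

```python
def scripted_check(observed: list[str], expected: list[str]) -> bool:
    """Step-based: passes only if the exact scripted path was followed."""
    return observed == expected


def outcome_check(observed: list[str], start: str, goal: str) -> bool:
    """Outcome-based: passes if the goal state appears after the start state."""
    if start not in observed:
        return False
    return goal in observed[observed.index(start) + 1:]


# The script written at launch (made-up screen names).
script = ["login", "cart", "payment", "confirmation"]

# This week's run: an overlay appeared that the script never mentioned.
this_week = ["login", "cart", "promo_overlay", "payment", "confirmation"]

# The user never reached confirmation.
abandoned = ["login", "cart", "promo_overlay"]

scripted_result = scripted_check(this_week, script)
outcome_result = outcome_check(this_week, "login", "confirmation")
abandoned_result = outcome_check(abandoned, "login", "confirmation")
```

Here the scripted check fails on this week's run even though the user checked out successfully, while the outcome check passes it and still fails the abandoned run.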

The output isn't a test suite in the old sense. It's a living map of how your app actually behaves, validated on every release.

There's more to write about. How confidence gets built when the app underneath is non-deterministic. What this means for documentation, onboarding, and incident response. How testing changes when the test suite knows more about your app than any human on the team. We'll get to all of it.
