Good Metrics Maximize Surprise
"Let it surprise you."
A/B tests don’t have to be Experiments. You can use them for other purposes. Corporate political propaganda, for example. Like when politicians cite random studies they haven’t read on a “debate” stage, the point isn’t the study itself. It’s the conclusion.
No judgment. Politics has its place.
The difference between an Experiment and Propaganda is whether you want to learn something or convince someone.
If you want to convince someone, you will choose metrics that are all but guaranteed to go in a certain direction.
For example, you might remove a screen from a checkout flow and make your metric “time spent in checkout.” Of course, the time spent will go down. You already knew that before running the experiment. You’re building up a case for your preferred product direction, the one that hits your OKR. Not running an Experiment.
If you were running an Experiment, you’d want to know: what do we lose by dropping that screen? What do we gain by dropping it? What metric can I develop that lets both the positives and negatives of dropping the screen shine through, their relative light depending only on how customers behave? Maybe the extra information from the extra screen is worth nothing. Maybe it is everything. The choice of metric should not assume either. It should allow for both cases and let customer behavior decide.
In other words: Experiment metrics maximize surprise. Propaganda metrics minimize it.
Suppose you want to learn something. How do you go about finding a metric?
In the screen-drop example… Suppose the extra screen asks the customer for a bit more information about why they purchased the product. Maybe that information feeds later personalization or targeted messages to get the customer to buy again or subscribe or otherwise take the next step in their journey. It’ll always be something like this. It’s not like the extra screen was just thrown in there willy-nilly. The folks who added it did so for a reason. We should know why before we knock it down.
We don’t want to use the probability of making it to the end of checkout because, of course, taking away extra screens helps with that.
We also don’t want the metric to be the rate at which customers who check out go on to take the next step. Of course that’s going to fall when we drop the screen. The extra screen makes the targeting better.
What we want is a metric that could go either way.
For example, the joint probability of both completing the checkout and taking the next step—not the conditional probability.
If the conditional probability of taking the next step falls, but the number of customers who take the next step increases, then dropping the screen makes sense. You’re improving the probability of checking out without lowering the probability that a top-of-funnel visitor reaches the final stage of the journey, even though targeting gets worse.
If these two steps have different economic models associated with them (maybe the checkout is the Basic subscription and the extra screen provides information used to market Premium to the user), then make the metric take that into account. Give each one a dollar value and try to maximize expected dollars.
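Here’s a rough sketch of what an expected-dollars metric could look like in Python. Everything specific in it is an assumption made for illustration: the function name, the dollar values, and the counts are all made up, and it treats the Premium value as incremental revenue on top of Basic. It also ignores sampling noise, so it shows the shape of the metric, not a full analysis.

```python
def expected_dollars_per_visitor(visitors, checkouts, premiums,
                                 basic_value, premium_value):
    """Expected revenue per top-of-funnel visitor.

    Assumes (hypothetically) that every checkout earns `basic_value` and
    every Premium upgrade earns an additional `premium_value` on top.
    """
    pr_checkout = checkouts / visitors
    pr_premium_given_checkout = premiums / checkouts
    return pr_checkout * (basic_value + pr_premium_given_checkout * premium_value)


# Made-up counts and dollar values, purely for illustration.
BASIC_VALUE = 10.0    # hypothetical dollars per Basic checkout
PREMIUM_VALUE = 50.0  # hypothetical extra dollars per Premium upgrade

with_screen = expected_dollars_per_visitor(
    visitors=10_000, checkouts=2_000, premiums=400,
    basic_value=BASIC_VALUE, premium_value=PREMIUM_VALUE)

without_screen = expected_dollars_per_visitor(
    visitors=10_000, checkouts=2_600, premiums=442,
    basic_value=BASIC_VALUE, premium_value=PREMIUM_VALUE)

print(f"with screen:    ${with_screen:.2f} per visitor")
print(f"without screen: ${without_screen:.2f} per visitor")
```

With these invented numbers, dropping the screen comes out ahead. Raise the Premium value enough and the answer flips, which is the point: the metric doesn’t decide in advance.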
The point is to get the tradeoff in there. We’re making a move that increases pr(checkout) and decreases pr(premium | checkout). So we want to know whether pr(checkout, premium) = pr(checkout) × pr(premium | checkout) increases. We’re surprised either way because, while we know pr(checkout) will increase and pr(premium | checkout) will decrease, we don’t know the net effect.
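To make the decomposition concrete, here’s a minimal sketch that computes all three probabilities from raw counts for each arm of the test. The `ArmCounts` class, the `funnel_rates` helper, and every number below are hypothetical, invented only to show how the two effects can pull in opposite directions. It reports point estimates only and says nothing about statistical significance.

```python
from dataclasses import dataclass


@dataclass
class ArmCounts:
    visitors: int   # top-of-funnel visitors assigned to this arm
    checkouts: int  # visitors who completed checkout
    premiums: int   # of those who checked out, how many went Premium


def funnel_rates(arm):
    """Return (pr(checkout), pr(premium | checkout), pr(checkout, premium))."""
    pr_checkout = arm.checkouts / arm.visitors
    pr_premium_given_checkout = arm.premiums / arm.checkouts
    # The joint probability is the product of the two pieces, which is
    # the same thing as premiums / visitors.
    pr_joint = pr_checkout * pr_premium_given_checkout
    return pr_checkout, pr_premium_given_checkout, pr_joint


# Made-up counts, purely for illustration.
arms = {
    "with screen": ArmCounts(visitors=10_000, checkouts=2_000, premiums=400),
    "without screen": ArmCounts(visitors=10_000, checkouts=2_600, premiums=442),
}

for name, arm in arms.items():
    pr_c, pr_p_given_c, pr_joint = funnel_rates(arm)
    print(f"{name}: pr(checkout)={pr_c:.3f}, "
          f"pr(premium | checkout)={pr_p_given_c:.3f}, "
          f"pr(checkout, premium)={pr_joint:.4f}")
```

In this invented example, pr(checkout) rises and pr(premium | checkout) falls, just as expected, and the joint probability happens to rise. Different counts could just as easily make it fall.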
So, we’ve set up our experiment metrics to learn something, and, along the way, we’ve figured out exactly what we’re uncertain about on the product-side. That’s the right way to select metrics. Like most problems, it can be solved by the algorithm:
Write down what we know.
Write down what we don’t know.
Experiment to bridge the gap.
When I was but a wee lad learning to shoot, my dad told me the trick was to “Let it surprise you.” If we anticipate what’ll happen when we pull the trigger, we won’t hit the target.
Thanks for reading!
Zach
Connect at: https://linkedin.com/in/zlflynn

