Have you ever wanted a crystal ball that would predict the best A/B test to boost your product’s growth, or identify which part of your UI drives a target metric?
With a statistical model and a SHAP summary plot, you can identify impactful A/B test ideas in bulk. The Indeed Interview team used this methodology to generate optimal A/B tests, leading to a 5-10% increase in key business metrics.
Case study: Increasing interview invites
Indeed Interview aims to make interviewing as seamless as possible for job seekers and employers. The Indeed Interview team has one goal: to increase the number of interviews happening on the platform. For this case study, we wanted UI test ideas that would help us boost the number of invitations sent by employers. To do this, we needed to analyze employer behavior on the employer dashboard and use that behavior to predict interview invitations.
Convert UI elements into features
The first step of understanding employer behavior was to create a dataset. We needed to predict the probability of sending interview invitations based on an employer’s clicks in the dashboard.
We organized the dataset so that each row represented an employer, each column a UI element, and each cell the number of times that employer clicked that element. We then used these features to predict our targeted action: clicking the Set up interview button vs. not clicking it.
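As a minimal sketch of this shape, assuming a raw click log with hypothetical employer_id and ui_element columns (the element names here are placeholders, not Indeed's actual schema):

```python
import pandas as pd

# Hypothetical raw click log: one row per click. Column and element
# names are placeholders, not Indeed's actual schema.
clicks = pd.DataFrame({
    "employer_id": [1, 1, 2, 2, 2, 3],
    "ui_element": ["feature_a", "feature_b", "feature_b",
                   "feature_g", "feature_g", "feature_a"],
})

# Pivot into one row per employer and one column per UI element,
# so each cell holds that employer's click count for that element.
X = (clicks.groupby(["employer_id", "ui_element"])
           .size()
           .unstack(fill_value=0))

# Binary target: 1 if the employer clicked Set up interview, else 0
# (illustrative values, aligned to X's employer index).
y = pd.Series([1, 0, 1], index=X.index, name="clicked_set_up_interview")
```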
Train the model on the target variable
The next step was to train a model to make predictions from the dataset. We selected a tree-based model, CatBoost, for its strong overall performance and its ability to detect interactions among features. And, like any model, it can be explained with our interpretation tool, SHAP; for tree models in particular, SHAP values can be computed exactly and efficiently.
We could have used correlation or logistic regression coefficients, but we chose a SHAP plot combined with a tree-based model because the pairing offers unique advantages for model interpretation. Two features with similar correlation coefficients can have dramatically different interpretations in a SHAP plot, which factors in feature importance. In addition, a tree-based model usually outperforms logistic regression, yielding a more accurate model. Combining a SHAP plot with a tree-based model therefore provides both performance and interpretability.
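A sketch of the training step, assuming a feature matrix X of click counts and a binary target y shaped as above; the hyperparameters are placeholders, not the team's actual settings:

```python
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

# Hold out a test set so SHAP values can be inspected on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Placeholder hyperparameters; tune these for a real dataset.
model = CatBoostClassifier(iterations=500, depth=6, verbose=0)
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))
```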
Sort SHAP results into positive and negative predictors
Now that we have a dataset and a trained model, we can interpret the SHAP plot generated from them. SHAP works by showing how much each feature shifts the model's prediction. In the SHAP plot below, each row is a feature, and the features are ranked in descending order of importance: the ones at the top are the most important and have the strongest influence (positive or negative) on our targeted action of clicking Set up interview.
The data for each feature is displayed with colors representing the feature's value. A red dot means the employer clicked a given UI element many times, and a blue dot means the employer clicked it only a few times. Each dot also has a SHAP value on the X axis, which indicates the direction of the feature's influence on the target, positive or negative, as well as its strength. The farther a dot is from the center, the stronger the influence.
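A plot like this can be produced with the shap package; a sketch, assuming the model and X_test from the previous step:

```python
import shap

# TreeExplainer computes exact SHAP values efficiently for tree models
# such as CatBoost. For a binary CatBoost classifier, shap_values is a
# (n_samples, n_features) array of log-odds contributions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Beeswarm summary plot: one row per feature, ranked by importance,
# with dots colored by feature value (red = many clicks, blue = few).
shap.summary_plot(shap_values, X_test)
```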
Based on the color and location of the dots, we categorized the features as positive or negative predictors.
- Positive predictor – a feature whose red dots sit to the right of the center.
  - These have positive SHAP values: heavy usage of this feature predicts the employer will send an interview invitation.
  - In the SHAP plot above, Feature B is a good example.
- Negative predictor – a feature whose red dots sit to the left of the center.
  - These have negative SHAP values: heavy usage of this feature predicts the employer will not send an interview invitation.
  - Feature G is a good example.
Features with red dots on both sides of the center are more complex and need further investigation with tools such as dependence plots (also in the SHAP package).
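One rough way to automate the sorting, and to inspect an ambiguous feature, assuming the shap_values and X_test from above (the correlation heuristic is our illustration, not part of the SHAP package):

```python
import numpy as np
import shap

# Heuristic: correlate each feature's values with its SHAP values.
# Positive correlation -> heavy usage pushes the prediction up
# (positive predictor); negative correlation -> the opposite.
for i, name in enumerate(X_test.columns):
    corr = np.corrcoef(X_test[name], shap_values[:, i])[0, 1]
    label = "positive" if corr > 0 else "negative"
    print(f"{name}: {label} predictor (corr={corr:+.2f})")

# For features with red dots on both sides, a dependence plot shows the
# SHAP value as a function of the feature's value, colored by whichever
# interacting feature the package picks automatically.
shap.dependence_plot("feature_g", shap_values, X_test)  # placeholder name
```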
Note that this relationship between feature and target is not causal yet. A model only supports causal claims if all confounding variables have been included, which is a strong assumption. While the relationships could be causal, we don't know for certain until they are verified in A/B tests.
Generate test ideas
Our SHAP plot contains 9 positive predictors and 4 negative predictors, and each one suggests a potential A/B test hypothesis about the relationship between a UI element and the target. We hypothesize that positive predictors boost target usage and that negative predictors hinder it.
To verify these hypotheses, we can test ways to make positive predictors more prominent and direct the employer's attention to them. After the employer clicks the feature, we can direct attention to the target to boost its usage. Another option is to test ways to divert the employer's attention away from negative predictors: we can add good friction, making them harder to access, and see whether usage of the target increases.
Boost positive predictors
We tested changes to the positive predictors from our SHAP plot to make them more prominent in our UI. We made Feature B more prominent on the dashboard, and directed the employer’s attention to it. After the employer clicked Feature B, we showed a redesigned UI with improved visuals to make the Set up interview button more attractive.
The result was a 6% increase in clicks to set up an interview.
Divert away from negative predictors
We also tested changes to the negative predictors from our SHAP plot in the hopes of increasing usage of the target. We ran a test to divert employer attention away from Feature G by placing it right next to the Set up interview button on the dashboard, making it easier for the employer to choose setting up an interview instead.
This change boosted clicks to send interview invitations by 5%.
Gaze into your own crystal ball
A SHAP plot may not be an actual crystal ball. When used with a statistical model, however, it can generate UI A/B test ideas in bulk and boost target metrics for many products. You might find it especially suitable for products with a complex and nonlinear UI, such as user dashboards. The methodology also provides a glimpse of which UI elements drive the target metrics the most, allowing you to focus on testing features that have the most impact. So, what are you waiting for? Start using this method and good fortune will follow.
Cross-posted on Medium