Metrics-Driven Process Improvement: A Case Study

In the previous post, I described how we use a measure-question-learn-improve cycle to refine development processes. To reiterate:

  1. Measure everything we possibly can.
  2. Learn by asking questions and exploring the data we’ve collected.
  3. Use our learnings to try to improve.
  4. Measure continuously to confirm improvement.

At Indeed, we get as much data as we can into Imhotep — everything that happens in our products, but also everything that happens in the development process. Process-oriented Imhotep datasets at Indeed include Git commits, Jira issue updates, production deploys, wiki edits, and more.
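A first question for almost any of these datasets is simple volume over time. As a minimal sketch of what that looks like — assuming a hypothetical dataset named deploys (only jiraactions appears later in this post; the name here is illustrative) — a daily count query would be:

from deploys 2017-01-01 2017-04-01
group by time(1d)
select count() /* deploys per day */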

Let’s take a look at how we applied the measure-question-learn-improve cycle to a problem at Indeed: translation verification.

Image: the "measure-question-learn-improve" cycle text translated into French, illustrating translation verification.

How long are translations in “Pending Verification”?

Our international team believed it was taking way too long to verify string translations in our products. We track translation work in Jira, and we track Jira issue updates in an Imhotep dataset. So we started asking questions of our measurements that might help us understand this problem.

This Imhotep query gives us a day-by-day grouping of time spent in the Pending Verification state. It includes only Translation issues (in a sample project called LOREM) that moved out of that state:

from jiraactions 2017-01-08 2017-04-02
where issuetype = 'Translation' AND
      prevstatus = 'Pending Verification' AND
      status != 'Pending Verification' AND
      project = 'LOREM'
group by time(1d)
select timeinstate/86400 /* days pending */ 

We ask Imhotep to graph that metric cumulatively over time. We see that for the given 3-month period, Translation issues spent a total of ~233 days in Pending Verification.

Graph of translation issues in Pending Verification over 3 months in 2017, showing the issues were in Pending for about 233 days total.
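The same ~233-day total can also be computed directly, without the day-by-day grouping — a sketch, assuming the group by clause is optional so that the select aggregates over the whole date range:

from jiraactions 2017-01-08 2017-04-02
where issuetype = 'Translation' AND
      prevstatus = 'Pending Verification' AND
      status != 'Pending Verification' AND
      project = 'LOREM'
select timeinstate/86400 /* total days pending over the period */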

That sounds like a lot of time, but it’s important to ask more questions! Be skeptical of the answers you’re getting, whether they support your hypothesis or not.

  • Can we dig into the data to better understand it?
  • What other information do we need in order to interpret it?
  • What are the sources of noise?
  • Do we need to iterate on the measurement itself before it is generally useful?

In this example, what if only a few issues dominated this total? Let’s tweak our query to see how many issues are contributing to this time:

from jiraactions 2017-01-08 2017-04-02
where issuetype = 'Translation' AND
      prevstatus = 'Pending Verification' AND
      status != 'Pending Verification' AND
      project = 'LOREM'
group by time(1d)
select distinct(issuekey) /* number of issues */

Graph of number of translation issues over 3 months in 2017, showing 5 brief, distinct spikes

Our translators shared with our engineers their perspective on the root cause. Translation changes had to wait for a full build before they would show up in a testing environment. Translators would move on to other work and return to verify later. We paid the cost of these context switches in a slower rate of translation.

The spikes we see in the graph above show that delay. Each time a set of changes reached a testing environment, a number of verification events closely followed. The visualized data confirms the inefficiency described by the people actually doing the work.

When we switch that graph to cumulative, we see that translators verified 278 issues in the time period. That is probably a large enough sample to validate the hypothesis.

Graph of cumulative number of translation issues over 3 months in 2017, showing a stepped climb to 278 issues
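A quick cross-check is to count distinct issues across the whole window in a single query — a sketch that counts each issue once, so it can read slightly lower than the cumulative graph if any issue passed through Pending Verification more than once:

from jiraactions 2017-01-08 2017-04-02
where issuetype = 'Translation' AND
      prevstatus = 'Pending Verification' AND
      status != 'Pending Verification' AND
      project = 'LOREM'
select distinct(issuekey) /* issues verified in the period */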

These are just a few examples of questions we can quickly and iteratively ask using Imhotep. When we have good measurements and we ask good questions, we learn. And based on our learnings, we can try to improve.

Translation verification: There is a better way

If a translation change could go straight to a testing environment as soon as it was submitted, we would eliminate the inefficiency described above. In fact, a couple of engineers at Indeed figured out a way to deploy translations separately from code, and they began trying it incrementally, project by project. This capability let translators verify issues within minutes of completing their changes.

After a period of time, we were able to compare two similar projects. The IPSUM project used the new translation deployment mechanism, while the LOREM project used the old method.

To illustrate the benefits of the new mechanism, it’s worth comparing worst-case scenarios. This query shows the 90th percentile of time spent in Pending Verification for just those two projects:

from jiraactions 2016-09-15 2017-02-28
where issuetype = 'Translation' AND
      prevstatus = 'Pending Verification' AND
      status != 'Pending Verification'
group by project in ('LOREM','IPSUM')
select percentile(timeinstate, 90)

Screenshot of query results: 1,043,964 seconds (~12 days) for the LOREM project and 154,533 seconds (~1.8 days) for IPSUM

The new process does look faster, with a 90th percentile of 1.8 days, compared to 12 days for the project using the old mechanism.
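Because timeinstate is stored in seconds, the screenshot values convert as 1,043,964 / 86,400 ≈ 12.1 days and 154,533 / 86,400 ≈ 1.8 days. The conversion could also live in the query itself — a sketch, assuming IQL accepts the same /86400 arithmetic on an aggregate that it accepts on per-document metrics; if it doesn’t, divide the returned seconds by 86,400 by hand:

from jiraactions 2016-09-15 2017-02-28
where issuetype = 'Translation' AND
      prevstatus = 'Pending Verification' AND
      status != 'Pending Verification'
group by project in ('LOREM','IPSUM')
select percentile(timeinstate, 90)/86400 /* 90th percentile, in days */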

After digging into the data, asking more questions, and further verifying, we decided to move more projects onto the new system and keep measuring the impact.

Using Imhotep to understand your process

To track Jira activity with Imhotep, we wrote a tool that runs daily to extract data from Jira into an Imhotep dataset. We’ve open-sourced this tool; you can find it in the Imhotep Builder Directory. In the next post, I describe using that builder to analyze Apache Software Foundation projects.


Cross-posted on Medium.