Metrics-Driven Process Improvement: A Case Study

In the previous post, I described how we use a measure-question-learn-improve cycle to refine development processes. To reiterate:

  1. Measure everything we possibly can.
  2. Learn by asking questions and exploring the data we’ve collected.
  3. Use our learnings to try to improve.
  4. Measure continuously to confirm improvement.

At Indeed, we get as much data as we can into Imhotep — everything that happens in our products, but also everything that happens in the development process. Process-oriented Imhotep datasets at Indeed include Git commits, Jira issue updates, production deploys, wiki edits, and more.

Let’s take a look at how we applied the measure-question-learn-improve cycle to a problem at Indeed: translation verification.

Translation Process - metrics-driven improvement

How long are translations in “Pending Verification”?

Our international team believed it was taking way too long to verify string translations in our products. We track translation work in Jira, and we track Jira issue updates in an Imhotep dataset. So we started asking questions of our measurements that might help us understand this problem.

This Imhotep query gives us a day-by-day grouping of time spent in the Pending Verification state. It includes only Translation issues (in a sample project called LOREM) that moved out of that state:

from jiraactions 2017-01-08 2017-04-02
where issuetype = 'Translation' AND
      prevstatus = 'Pending Verification' AND
      status != 'Pending Verification' AND
      project = 'LOREM'
group by time(1d)
select timeinstate/86400 /* days pending */ 

We ask Imhotep to graph that metric cumulatively over time. We see that for the given 3-month period, Translation issues spent a total of ~233 days in Pending Verification.

Translation Days Pending - metrics driven process improvement

That sounds like a lot of time, but it’s important to ask more questions! Be skeptical of the answers you’re getting, whether they support your hypothesis or not.

  • Can we dig into the data to better understand it?
  • What other information do we need to interpret?
  • What are the sources of noise?
  • Do we need to iterate on the measurement itself before it is generally useful?

In this example, what if only a few issues dominated this total? Let’s tweak our query to look at how many issues are contributing to this time.

from jiraactions 2017-01-08 2017-04-02
where issuetype = 'Translation' AND
      prevstatus = 'Pending Verification' AND
      status != 'Pending Verification' AND
      project = 'LOREM'
group by time(1d)
select distinct(issuekey) /* number of issues */

number of issues - metrics driven process improvement

Our translators shared with our engineers their perspective on the root cause. Translation changes had to wait for a full build before they would show up in a testing environment. Translators would move on to other work and return to verify later. We paid the cost of these context switches in a slower rate of translation.

The spikes we see in the graph above show that delay. Each time a set of changes reached a testing environment, a number of verification events closely followed. The visualized data confirms the inefficiency described by the people actually doing the work.

When we switch that graph to cumulative, we see that translators verified 278 issues in the time period. That is probably a large enough dataset to validate the hypothesis.

cumulative number of issues - metrics driven process improvement

These are just a few examples of questions we can quickly and iteratively ask using Imhotep. When we have good measurements and we ask good questions, we learn. And based on our learnings, we can try to improve.

Translation verification: There is a better way

If a translation change could go straight to a testing environment as soon as it was submitted, we would eliminate the inefficiency described above. In fact, a couple of engineers at Indeed figured out a way to deploy translations separate from code. They started to try that incrementally on a project-by-project basis. This capability enabled translators to verify issues minutes after completing their changes.

After a period of time, we were able to compare two similar projects. The IPSUM project used the new translation deployment mechanism, while the LOREM project used the old method.

To illustrate the benefits of the new mechanism, it’s worth comparing the worse case scenarios. This query lets us see the 90th percentile time in Pending Verification for just those two projects.

from jiraactions 2016-09-15 2017-02-28
where issuetype = 'Translation' AND
      prevstatus = 'Pending Verification' AND
      status != 'Pending Verification'
group by project in ('LOREM','IPSUM')
select percentile(timeinstate, 90)

compare translation times - metrics driven process improvement

The new process does look faster, with a 90th percentile of 1.8 days, compared to 12 days for the project using the old mechanism.

After digging into the data, asking more questions, and further verifying, we decided to move more projects onto the new system and keep measuring the impact.

Using Imhotep to understand your process

In order to track Jira activity with Imhotep, we wrote a tool that we run daily to extract the data from Jira into an Imhotep dataset. We’ve open sourced this tool, and you can find it in the Imhotep Builder Directory. In the next post, I describe using that builder to analyze Apache Software Foundation projects.

Cross-posted on Medium.

Tweet about this on TwitterShare on FacebookShare on LinkedInShare on Google+Share on RedditEmail this to someone

Using Metrics to Improve the Development Process (and Coach People)

In the previous post, I described Imhotep, our scalable, efficient, and fast open source data analytics platform. Imhotep helps us use metrics at Indeed for fast, iterative experimentation, which drives improvement to our products.

Improving process and coaching people

We use the same tools and techniques to improve development processes, following a measure-question-learn-improve cycle:

  1. Measure everything we possibly can.
  2. Learn by asking questions and exploring the data we’ve collected.
  3. Use our learnings to try to improve.
  4. Measure continuously to confirm improvement.

Beyond process improvements, this approach can also work for people. Data can help us understand our own work and coach others.

  • How much am I getting done?
  • How am I engaging with other teams?
  • How has my work changed over time?
  • What are my blind spots?

Is measuring processes and people a good idea?

You might be skeptical of using this approach for improving process and measuring people. It’s good to be skeptical. To truly benefit from this approach, you must proceed with caution.

Gaming the stats (Goodhart’s Law)

The first caution is Goodhart’s Law, which states that “When a measure becomes a target, it ceases to be a good measure.” For example, your manager might say: “Our measurements show that our team productivity is declining. Let’s set a target to increase features completed by 20% next quarter. If we hit it, big bonuses all around!”

Okay, but your team might then start “gaming the stats” — making changes that improve the metric without improving productivity. Now your measure is meaningless for gauging productivity, and you’ve rewarded your team for counterproductive measures that don’t advance your goals.

The Number Six Principle

The second caution is something I’ve named the Number Six Principle (inspired by a classic TV character and his famous line): Don’t reduce people to a set of numbers.

I am not a number - use caution measuring people - improve the development process

No one enjoys being judged entirely by numbers. If you tell people you’re measuring them, you run the risk of seriously damaging morale. Many people will assume you’re not considering qualitative performance elements.

It’s how you use them

You can avoid these pitfalls if you’re careful. Consider the example above in which your team is concerned about slipping productivity metrics. If you take a close look at the numbers, understand them in context, and diagnose the situation, you can have a productive dialog about how to improve.

Perhaps your team tackled more complex features, therefore completing fewer. That might be okay, or you might agree as a team that you could have done a better job of simplifying the feature work.

Or maybe you look at a different metric and see that your overall support load went up 50% due to growth in your customer base. You can then decide to live with that balance or try to augment your team’s capacity to handle support while developing new features.

Starting with the measurements, a considered discussion can lead to tangible process improvement. In the next post, I describe a process improvement we validated and measured with Imhotep.

Cross-posted on Medium.

Tweet about this on TwitterShare on FacebookShare on LinkedInShare on Google+Share on RedditEmail this to someone

Imhotep: Scalable, Efficient, and Fast

This post is the first in a five-part series on improving the development process (and coaching developers) with metrics-driven insights.

Move fast and try things — that’s how we develop products at Indeed. We don’t believe in betting on a small number of great ideas. Instead, we bet on exploring lots of ideas as quickly as possible.

To be successful in this approach, we need innovative team members with diverse perspectives. We hire people who are excited to quickly explore ideas in service of our mission — to help people get jobs. Once they’re on board, we give them ownership and autonomy to do exactly that. And we give them the tools to track and analyze their experiments.

The right tools for the job

We’ve developed and open sourced some of these tools, including Imhotep, our data analytics platform. Imhotep enables rapid exploration and analysis of large time-series datasets. It includes a query language (IQL), a web-based UI, and a distributed backend. It is scalable, efficient, and fast.

Imhotep measure question learn improve Indeed Open Source

Imhotep is scalable

Imhotep scales horizontally by adding daemon instances that can run on commodity hardware or in the cloud. Indeed’s internal Imhotep cluster handles up to 5 million queries each week across thousands of datasets. Roughly 90% of those queries come from automated systems.

Our most popular dataset includes about 39 billion events just for the last year. That dataset alone receives around 25,000 distinct queries each month.

Imhotep is efficient

Because the data structure underlying Imhotep is an inverted index, the disk utilization is remarkably low for most time-series datasets. The dataset mentioned above, with 39 billion events and 384 possible fields per event, takes up 5.7 terabytes on disk. That works out to 146 bytes per event.

That kind of storage efficiency allows us to keep all the data for analysis and avoid sampling. Sampling is fine when you want to just look at aggregate trends. But if you want to actually dig down into your data and examine the outliers, you can’t reliably find them or see their effects if you sample.

Imhotep is fast

Imhotep’s speed lets us rapidly iterate and collaborate. Over a recent 90-day period at Indeed, our internal cluster saw around 2 million interactive Imhotep queries (queries done from the webapp). The median response time for those queries was 276 milliseconds.

A powerful cache implementation contributes to this blazing speed, with nearly 60% of interactive queries coming from the cache. But even uncached queries are quite fast, with a median response time of around 4 seconds. An uncached query over a long time span takes longer, but not that much longer. For uncached queries with a 365-day time span, the median response time is about 9 seconds.

How do we know all these stats about Imhotep performance? Because we have an Imhotep dataset for Imhotep usage. In just a few minutes, I was able to iteratively query that dataset to understand recent cluster performance.

Imhotep drives insight and improvement

Imhotep empowers us to experiment and quickly improve our products. We’ve also applied this data-driven approach to improving development processes. In the next post in this series, I explain more about how we use metrics to improve process.

Cross-posted on Medium.

Tweet about this on TwitterShare on FacebookShare on LinkedInShare on Google+Share on RedditEmail this to someone