@IndeedEng 2014 Year In Review

We help people get jobs. That’s our mission at Indeed. And being the #1 job site worldwide challenges each of our engineers to deliver the best job seeker experience at scale. When we launched this blog in 2012, we set out to contribute to the broader software community by sharing what we’ve learned from these challenges.

Response to our @IndeedEng tech talks continues to be strong, with over 700 people attending the series in 2014. And our engineering blog received 59,000 views from 128 countries.

Here’s a brief recap of content we shared in 2014:

Testing and measuring everything is central to our development process at Indeed. We use Proctor, our A/B testing framework, to accomplish this, and we open sourced the framework in 2013. Building on this in 2014, we described Proctor’s features and tools and then wrote about how we integrate Proctor into our development process at Indeed. We also released proctor-pipet, a Java web application that allows you to deploy Proctor as a remote service.

We open sourced util-urlparsing, a Java library we created to parse URL query strings without unnecessary intermediate object creation.

In the first half of 2014, we held several tech talks devoted to Imhotep, our interactive data analytics platform. Imhotep powers data-driven decision making at Indeed and we were excited to talk about the technology: scaling decision trees, building large-scale analytics tools, and using Imhotep to focus on key metrics. Then in November, we open sourced part of the platform and held a tech talk and workshop for attendees to explore their own data in Imhotep.

Being a global company is a challenge we take seriously. We shared our experience of iteratively expanding to new markets and how international success requires solving a diverse set of technical challenges.

Changes coming to the blog and talks pages include translating content into Japanese. Look for more translated posts in the months to come.

We’d like to thank everyone who helped make these accomplishments possible. If you follow the blog and watch our @IndeedEng talks, thank you for your support! We look forward to continuing the conversation in 2015.

Why I Unit Test

If you’ve done any software development in the last fifteen years, you’ve heard people harping on the importance of unit testing. Your manager might have come to you and said “Unit tests are great! They document the code, reduce the risk of adding bugs, and reduce the cost and risk of making changes so we don’t slow down over time! With good unit tests, we can increase overall delivery velocity!” Those are all great reasons to unit test, but they are all fundamentally management reasons. I agree with them, but they don’t go to the core of why I, as a developer, unit test.

The reason I unit test is simple: Unit testing is both an opportunity and a strong incentive to improve new and existing designs, and to improve my skills as a designer of software. The trick is to write as few unit tests as possible and ensure that each test is very simple.

How does that work? It works because writing simple unit tests is intrinsically boring, and the worse your code is, the more difficult and boring it will be to test. The only way to get any traction with unit testing is to drastically improve your implementation to the point where it can be covered with hardly any unit tests at all, and then write those.

Avoiding unit tests by improving your implementation

Here are some approaches for writing fewer unit tests:

  • Refactor out repeated code. Each block of code that you are able to abstract out is one less unit test to write.
  • Delete dead code. You don’t have to write unit tests for code that you can delete instead. If you think this is obvious, then you haven’t seen many large legacy code bases.
  • Externalize framework boilerplate as configuration or annotation. That way, you only have to write unit tests for product logic rather than scaffolding.
  • Every branch of code needs at least one unit test, so every if statement or loop you can remove is one less test to write. Depending on your implementation language, if statements and loops can be removed by subtype polymorphism, code motion, pluggable strategies, aspects, decorators, higher order combinators or a dozen other techniques. Each branch point in your code is both a weakness and a requirement for additional testing. Remove them if at all possible.
  • Identify deeper data-flow patterns and abstract them. Often pieces of code that don’t look similar can be made similar by pulling out some incidental computations. Once you’ve done that, then underlying structures can be merged. That way, more and more of your code becomes trivially testable branch-free computations. In the limit, you end up with a bunch of simple semantic routines (often predicates or simple data transformations) strung together with a double handful of reusable control patterns.
  • Separate out your business logic, persistence, and inter-process communications as much as possible, and you can avoid a bunch of tedious mucking with mock objects. Mock objects are code smells, and overuse of them may indicate that your code has become overly coupled.
  • Figure out how to generalize your logic so that your edge cases are covered by your main flow, and single tests can cover diverse and complex inputs. Too often we write single-purpose code for special cases, when we could instead search for more general solutions that cover those cases without special handling. Note however, that discovering the simpler, more general solutions is often much more difficult than creating a bunch of special cases. You may not have enough time to write small amounts of simple code, and instead have to write large amounts of complex code.
  • Recognize and replace logic that is already implemented as methods in existing libraries, and you can push the trouble of unit testing off onto the library’s author.
  • If you can simplify your data objects so much that they are immutable and their operations follow simple algebraic laws, you can utilize property-based testing, where your unit tests literally write themselves.
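The last point can be sketched without any special framework: pick an algebraic law that your immutable type must satisfy and check it against many generated inputs. The sketch below hand-rolls the idea for a hypothetical immutable Money value (the class and its laws are illustrative examples, not code from any particular library):

```java
import java.util.Random;

// A minimal, hand-rolled sketch of property-based testing.
// Money is a hypothetical immutable value type used for illustration.
final class Money {
    final long cents;
    Money(long cents) { this.cents = cents; }
    Money plus(Money other) { return new Money(this.cents + other.cents); }
    @Override public boolean equals(Object o) {
        return o instanceof Money && ((Money) o).cents == cents;
    }
    @Override public int hashCode() { return Long.hashCode(cents); }
}

public class PropertyTestSketch {
    public static void main(String[] args) {
        Random random = new Random(42);
        Money zero = new Money(0);
        for (int i = 0; i < 1000; i++) {
            Money a = new Money(random.nextInt(1_000_000));
            Money b = new Money(random.nextInt(1_000_000));
            // Commutativity: a + b == b + a
            if (!a.plus(b).equals(b.plus(a))) {
                throw new AssertionError("commutativity failed");
            }
            // Identity: a + 0 == a
            if (!a.plus(zero).equals(a)) {
                throw new AssertionError("identity failed");
            }
        }
        System.out.println("all properties held");
    }
}
```

Dedicated libraries (QuickCheck-style frameworks) add shrinking and smarter generators, but even this loop demonstrates the point: the laws are the tests, and the framework writes the cases.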

But yammering is cheap, let’s see some code!

Finding deep patterns and abstracting out repeated code

A common pattern in data-science code is finding the element of some collection that maximizes some function. The simplest Java code for this might resemble the following:

  double bestValue = Double.MIN_VALUE;
  Job bestJob = null;
  for (Job job : jobs) {
    if (score(job) > bestValue) {
      bestJob = job;
    }
  }
  return bestJob;

This is quick enough to code that you might write it without even thinking about it. Just a loop and an if! What can go wrong? That’s fine the first few times you write it, but you’re building up technical debt every time. Writing unit tests is where the repetition and risk start to really show up. Every block of code like this will need tests not just for correctness in the common case, but also for a bunch of edge cases: what happens if we pass in an empty collection? a single-element collection? null? Even the simple code above has some bugs that unit tests can find, but you have to write a lot of them every time you wish to do an optimization, and I don’t know about you, but frankly I’ve got more useful things to do with my time.

A better solution is to realize that even this small amount of code repetition can and should be abstracted out, coded and tested only once. It also gives us a chance to genericize the code and fix some edge cases.

    public static <J> J argMax(Iterable<J> collection,
                               Function<J, Double> score) {
      double bestValue = Double.NEGATIVE_INFINITY;
      J bestElement = null;
      if (collection != null) {
        for (J element : collection) {
          double value = score.apply(element);
          if (value > bestValue) {
            bestValue = value;
            bestElement = element;
          }
        }
      }
      return bestElement;
    }

This code needs to be unit tested only once. For an even better solution, we can replace all of this logic with a library call (in this case from Google’s Guava library):

  public static <J> J argMax(Iterable<J> collection,
                             Function<J, Double> score) {
    return Ordering.natural().onResultOf(score).max(collection);
  }

After that, you only need unit tests for each different scoring function you use. Everything else has already been handled.
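If Guava is not on your classpath, the Java standard library offers essentially the same one-liner via streams; a sketch (note that max returns an Optional, which makes the empty-collection case explicit instead of returning null):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.ToDoubleFunction;

public class ArgMax {
    // Returns the element with the highest score, or empty for an empty collection.
    public static <J> Optional<J> argMax(List<J> collection,
                                         ToDoubleFunction<J> score) {
        return collection.stream().max(Comparator.comparingDouble(score));
    }

    public static void main(String[] args) {
        List<String> jobs = List.of("dev", "sre", "data");
        // Hypothetical scoring function: longer titles score higher.
        Optional<String> best = argMax(jobs, String::length);
        System.out.println(best.orElse("none")); // prints "data"
    }
}
```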

Avoiding unit tests: a path to understanding great software design

The thing about all of these unit-test avoidance techniques is that they are essential to the process of creating robust and supple designs even if you weren’t going to do any unit testing at all! Too often, in our rush to simply get something working, we don’t follow these techniques, but continual unit testing gives us a time and a reason to do it right. In this way, you can leverage aggressive laziness in implementing unit tests to drive continuous improvement of your project design and implementation.

At least, it can if you let it. If you spend your unit testing time writing unit tests for your code without improving its underlying design, you’ll most likely never learn anything, and you’ll have little reason to create code with quality better than “it mostly works.” If you spend your unit testing time looking to minimize the total amount of testing code that you write (by improving your product code), you’ll quickly learn just what it means for software to be well-designed. I don’t know about you, but that’s why I love programming in the first place.

Dave Griffith has been building software systems for over 20 years.

How Indeed Uses Proctor for A/B Testing

(Editor’s Note: This post is the second in a series about Proctor, Indeed’s open source A/B testing framework.)

Proctor at Indeed

In a previous blog post, we described the features and tools provided by Proctor, our open-source A/B testing framework. In this follow-up, we share details about how we integrate Proctor into Indeed’s development process.

Customized Proctor Webapp

Our internal deployment of the Proctor Webapp integrates with Atlassian JIRA, Subversion, Git, and Jenkins. We use JIRA for issue linking, various sanity checks, and automating issue workflow. For tracking changes over time, we use Subversion (for historical reasons — Git is also an option). We use Jenkins to launch test matrix builds, and the webapp integrates with our internal operational data store to display which versions of a test are in use in which applications.

Figure 1: Screenshot of a test definition’s change history in the Proctor Webapp

Issue tracking with JIRA

At Indeed, we track everything with JIRA issues, including changes to test definitions. Requests for new tests or changes to existing tests are represented by a custom issue type in JIRA that we called “ProTest” (short for “Proctor Test”). We track ProTest issues in the JIRA project for the application to which the test belongs. The ProTest issues also use a custom workflow that is tied into our deployment of the Proctor Webapp.

After accepting an assigned ProTest issue, the issue owner modifies the test definition using Proctor Webapp. When saving the changes, she must provide a ProTest issue key. Before committing to our Proctor test definition repository, the webapp first verifies that the ProTest issue exists and is in a valid state (for example, is not closed). The webapp then commits the change (on behalf of the logged-in user), referencing the issue key in the commit message.

After the issue owner has made all changes for a ProTest issue, the JIRA workflow is usually as follows:

  1. The issue owner resolves the issue, which moves to state QA Ready.
  2. A release manager uses Proctor Webapp to promote the new definition to QA. The webapp moves the issue state to In QA.
  3. A QA analyst verifies the expected test behavior in our QA environment and verifies the issue, which moves to state Production Ready.
  4. A release manager uses Proctor Webapp to promote the new definition to production, triggering worldwide distribution and activation of the test change within one or two minutes. The webapp moves the issue state to In Production.
  5. A QA analyst verifies the expected test behavior in production and moves the issue state to Pending Closure.
  6. The issue owner closes the issue to reflect that all work is complete and in production.

In cases where we are simply adjusting the size of an active test group, Proctor Webapp skips this process and automatically pushes the change to production.

Our QA team verifies test modifications because those modifications can result in unintended behavior or interact poorly with other tests. Rules in test definitions are a form of deployable code and need to be exercised to ensure correctness. The verification step gives our QA analysts one last chance to catch any unintended consequences before the modifications go live. Consider the case of this rule, intended to make a test available only to English-language users in the US and Canada:

    (lang=='en' && country=='US') || country=='CA'

The parentheses are in the wrong place, allowing French-language Canadians to see behavior that may not be ready for them. A developer forcing himself into the desired group might have missed this bug. When we catch bugs right away during QA, we avoid wasting the time it would take to notice that the desired behavior never made it to production.
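For reference, the grouping the rule’s author intended, a minimal correction of the example above:

```
lang=='en' && (country=='US' || country=='CA')
```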

Test definition files

We store test definitions in a single shared project repository called proctor-data. The project contains one file per test definition: test-definitions/<testName>/definition.json

Modifications to tests most often are done via the Proctor Webapp, which makes changes to the JSON in the definition file and commits those changes (on behalf of the logged-in user) to the version control repository.

The definition files are duplicated to two branches in proctor-data: qa and production.  When a test definition revision is promoted to QA, the entire test definition file is copied to the qa branch and committed (as opposed to applying or “cherry-picking” the diff associated with a single revision). Similarly, when a test definition revision is promoted to production, the entire file is copied to the production branch and committed. Since we have one file per test definition, this simple approach maintains the integrity of the JSON definition while avoiding merge conflicts and not requiring us to determine which trunk revision deltas to cherry pick.

Building and deploying the test matrix

Proctor includes a builder that can combine a set of test definition files into a single test matrix file, while also ensuring that the definitions are internally consistent, do not refer to undefined bucket values, and have allocations that sum to 1.0. This builder can be invoked directly from Java or via an Ant task or a Maven plugin. We build a single matrix file using a Jenkins job that invokes Ant in the proctor-data project. An example of building with Maven is available on GitHub.
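As an illustration of the kind of consistency check the builder performs (a hand-rolled sketch, not Proctor’s actual implementation), verifying that allocation range lengths sum to 1.0 takes only a few lines of Java:

```java
import java.util.List;

public class AllocationCheck {
    // Sketch of one builder-style sanity check: range lengths must sum to 1.0.
    // A small epsilon absorbs floating-point rounding in the JSON values.
    public static boolean sumsToOne(List<Double> rangeLengths) {
        double total = rangeLengths.stream().mapToDouble(Double::doubleValue).sum();
        return Math.abs(total - 1.0) < 1e-9;
    }

    public static void main(String[] args) {
        System.out.println(sumsToOne(List.of(0.1, 0.1, 0.4, 0.4))); // prints "true"
        System.out.println(sumsToOne(List.of(0.5, 0.4)));           // prints "false"
    }
}
```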

A continuous integration (CI) Jenkins job builds the test matrix every time a test change is committed to trunk. That matrix file is made available to applications and services in our CI environment.

When a release manager promotes a test change to QA, a QA-specific Jenkins job builds the test matrix using the qa branch. That generated matrix file is then published to all QA servers. The services and applications that consume the matrix periodically reload it. An equivalent production-specific Jenkins job handles new changes on the production branch.

Proctor in the application

Each project’s Proctor specification JSON file is stored with each project’s source code in a standard path (for example, src/main/resources/proctor). At build time, we invoke the code generator (via a Maven plugin or Ant task) to generate code that is then built with the project’s source code.

When launching a new test, we typically deploy the test matrix before the application code that depends on it. However, if the application code goes out first, Proctor will “fall back” and treat the test as inactive, provided you follow our convention of mapping your inactive bucket to value -1.

You can change the fallback behavior by setting fallbackValue to the desired bucket value in the test specification. We follow the convention of falling back on the unlogged inactive group to help ensure that test and control groups do not change size unexpectedly. Suppose that you have groups 0 (control) and 1 (test) for a test that runs Monday-Thursday with fallback to group 0. If your test matrix is broken as a result of a change from Tuesday 2pm to Tuesday 5pm, summing your metrics across the whole period from Monday to Thursday will skew the results for the control group. If your fallback was -1 (inactive), there would be no skew for your control and test groups.

When adding a new bucket to a test, we typically take this sequence of actions:

  1. Deploy the test matrix with no allocation for the new bucket.
  2. Deploy the application code that is aware of the new bucket.
  3. Redeploy the matrix with an allocation for that bucket.

If the matrix is deployed with an allocation for a new bucket of which the application is unaware, Proctor errs on the side of safety by using the fallback value for all cases. We made Proctor work that way to avoid telling the application to apply an unknown bucket in some cases for some period of time, which could skew analysis.

We take similar precautions when deleting an entire test from the matrix.

Testing group membership, not non-membership

Proctor’s code generation provides easy-to-use methods for testing group membership. We have found it best to always use these methods to test for membership rather than non-membership. If you’ve made your code conditional on non-membership, you run the risk of getting that conditional behavior in unintended circumstances.

As an example, suppose you have a [50% control, 50% test] split, and in your code you use the conditional expression !groups.isControl(), which is equivalent to groups.isTest(). Then, to reduce the footprint of your test while keeping an equal-sized control group for comparison, you change your test split to [25% control, 50% inactive, 25% test]. Now your conditional expression is equivalent to groups.isTest() || groups.isInactive(). That logic is probably not what you intended, which is to keep the same behavior for control and inactive. In this example, using groups.isTest() in the first place would have prevented you from introducing unintended behavior.
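A minimal sketch of why the non-membership test goes wrong (the Group enum and method names here are illustrative stand-ins, not Proctor’s generated API):

```java
public class MembershipSketch {
    enum Group { INACTIVE, CONTROL, TEST }

    static boolean isControl(Group g) { return g == Group.CONTROL; }
    static boolean isTest(Group g) { return g == Group.TEST; }

    public static void main(String[] args) {
        // With only control and test, the two conditions agree:
        System.out.println(!isControl(Group.TEST) == isTest(Group.TEST)); // prints "true"
        // After an inactive bucket is added, they diverge:
        System.out.println(!isControl(Group.INACTIVE)); // prints "true": inactive users get test behavior
        System.out.println(isTest(Group.INACTIVE));     // prints "false": membership test stays correct
    }
}
```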

Evolving bucket allocations

We recognize that assigning users to test buckets may affect how the site behaves for them. Proctor on its own cannot ensure consistency of experience across successive page views or visits as a test evolves. When growing or shrinking allocations, we consider carefully how users will be affected. 

Usually, once a user is assigned to a bucket, we’d like for that user to continue to see the behavior associated with that bucket as long as that behavior is being tested. If your allocations started as [10% control, 10% test, 80% inactive], you would not want to grow to [50% control, 50% test], because users initially in the test bucket would be moved to the control bucket.

There are two strategies for stable growth of buckets. In the “split bucket” strategy (Figure 2), you add new ranges for the existing buckets, moving from 10/10 to 50/50 by taking two additional 40% chunks from the inactive range. The resulting JSON is shown in Figure 3.

Figure 2: Growing control and test by splitting buckets into multiple ranges


  "allocations": [
  {
      "ranges": [
      {
          "length": 0.1,
          "bucketValue": 0
      },
      {
          "length": 0.1,
          "bucketValue": 1
      },
      {
          "length": 0.4,
          "bucketValue": 0
      },
      {
          "length": 0.4,
          "bucketValue": 1
      }
      ]
  }
  ]

Figure 3: JSON for “split bucket” strategy; 0 is control and 1 is test

In the “room-to-grow” strategy, you leave enough inactive space between buckets so that you can adjust the size of the existing ranges, as in Figure 4.

Figure 4: Growing control and test by updating range lengths to grow into the inactive middle

We use the “room-to-grow” strategy whenever possible, as it results in more readable test definitions, both in JSON and the Proctor Webapp.
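For comparison with Figure 3, the starting JSON for the “room-to-grow” layout might look like this (a sketch, following the convention above of mapping the inactive bucket to value -1):

```json
  "allocations": [
  {
      "ranges": [
      {
          "length": 0.1,
          "bucketValue": 0
      },
      {
          "length": 0.8,
          "bucketValue": -1
      },
      {
          "length": 0.1,
          "bucketValue": 1
      }
      ]
  }
  ]
```

Growing to [50% control, 50% test] then amounts to editing the three lengths in place to 0.5, 0.0, and 0.5, with no new ranges and no reordering of buckets.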

Useful helpers

Proctor includes some utilities that make it easier to work with Proctor in web application deployments:

  • a Spring controller that provides three views: the groups for the current request, a condensed version of the current test matrix, and the JSON test matrix containing only those tests in the application’s specification;
  • a Java servlet that provides a view of the application’s specification; and
  • support for a URL parameter that allows you to force yourself into a test bucket (persistent via a browser cookie)

We grant access to these utilities in our production environment only to privileged IP addresses, and we recommend you do the same.

It works for Indeed, it can work for you

Proctor has become a crucial part of Indeed’s data-driven approach to product development, with over 100 tests and 300 test variations currently in production. To get started with Proctor, dive into our Quick Start guide. To peruse the source code or contribute your own enhancements, visit our GitHub page.