Being Just Reliable Enough

One Saturday morning, as I settled in on the couch for a nice do-nothing day of watching college football, my wife reminded me that I had agreed to rake the leaves after putting it off for the last two weekends. Being a good neighbor and not wanting another homeowners’ association (HOA) violation (and it being a bye week for the Longhorns), I grabbed my rake and went outside to work.


There were a lot of leaves. I would say my yard was 100% covered in leaves. I began to rake the leaves and with a modest effort I was able to collect about 90% of the leaves into five piles, which I then transferred into those bags you buy at Home Depot or Costco.

The yard looked infinitely better, but there were still plenty of leaves in the yard. I had the rake, I had the bags, I was already outside, and I was already dirty, so I went to work raking the entire yard again to get the remaining 10% I had missed in the first pass. This took about the same amount of time, but wasn’t nearly as fulfilling. My piles weren’t as impressive, and I was only able to get 90% of the remaining leaves into piles and then into bags, but I had cleared 99% of the leaves.

Still having plenty of daylight and knowing I could do better, I went to work on that last 1%. Now, I don’t know if you know this about leaves, but single leaves can be slippery and evasive. When you don’t have a lot of leaves to clump together and get stuck in the rake, it may take two, three, sometimes four passes over the same area to get any good leaf accumulation into your pile. This third pass over the yard was considerably more time consuming, but I was able to get 90% of that remaining 1%. I had now cleared 99.9% of the leaves in my yard.

As I sat back and admired my now mostly leaf-free yard, I could see some individual leaves that had escaped my rake and even some new leaves that had just fallen from the trees. There weren’t too many, but they were there. Wanting to do a good job, I started canvassing the yard on my hands and knees, picking up individual leaves one by one. As you can imagine, this was very tedious and it took much longer to do the whole yard, but I was able to pick up 90% of the remaining 0.1%. I had now cleared 99.99% of the leaves in my yard.

The sun was starting to set and all that was left were mostly little leaf fragments that could only really be picked up by tweezers.

I went inside and asked my wife, “Where are the tweezers?” “Why do you need tweezers to paint the fence?” she asked. “Paint the fence?” I thought. Oh, yeah. I had also agreed to paint the fence today. I told her I hadn’t started on the fence yet and wouldn’t be able to do that this weekend because it was getting late and the Cowboys were playing the next day. She was not happy.

Yes, this story is ridiculous and contrived, but it illustrates principles that guide how we manage system reliability and new feature velocity at Indeed.

Where did I go wrong? 

It was way before I thought about getting the tweezers. When I started raking, my definition of a successfully raked yard was too vague. I did not have a service level objective (SLO) specifying the percentage of my yard that could be covered in leaves and still be considered well-raked by my clients.

Should I have defined the SLO?

I could have defined the SLO, but I might have based it on what I was capable of achieving. I was capable of picking up bits and pieces of leaves with tweezers until I had a 99.999% leaf-free yard. I could have also gone in the other direction (if it wasn’t a bye week) and determined that raking 90% of the leaves would be sufficient. 

SLOs should be driven by the clients who care about them 

The clients in my story are my HOA and my wife. My HOA cites me when my yard is only 50% raked for an extended period of time. My wife says she is happy when I rake 99% of the leaves once a year. For the SLO, we would take the higher of the two. I could have quit raking leaves after the second pass when I reached 99% and had time to paint the fence (depending on the SLO for the number of coats of paint).

But, I still did a good job, right?

I did, but I far exceeded my undefined SLO of 99% by two 9s, and yet I was not rewarded. Sadly, I was punished, because my wife didn’t care about the work I did on that remaining 1% and was upset that I didn’t have the time to meet my other obligation of painting the fence.

This brings us to the moral of the story:

We need to have the right SLOs and work to exceed them, but not by much. 

At Indeed, when our SLOs describe what our users care about, we avoid the effort of adding unnecessary 9s. We then use that saved effort to deploy more features faster, achieving a balance between reliability and velocity.


About the author

Andrew Ford is a site reliability engineer (SRE) at Indeed, who enjoys solving database reliability and scalability problems. He can be found on the couch from the start of College Gameday to the end of the East Coast game most Saturdays from September to December.

Do you enjoy defining SLOs that your clients care about? Check out SRE openings at Indeed!


Cross-posted on Medium.


Indeed Open Source: All Things Open 2019 Speakers

We’re excited to have three Indeed representatives presenting at All Things Open this year. Join us in Raleigh, NC October 13-15 for engaging discussions.

Indeed talks at All Things Open 2019

Sustaining FOSS Projects by Democratizing the Sponsorship Process

Tuesday, October 15 | 10:45am | Room 201
Speaker: Duane O’Brien, Indeed head of open source

Within a given company, there are typically only a few people involved in deciding which FOSS projects and initiatives to support financially. This year we decided to change all that and democratize the decision making process. We set up an internal FOSS Sustainability Fund and invited everyone to participate in the process.

This talk examines how we got executive buy-in for the fund, set it up, and encouraged participation. It also explores the fund’s impact on our engineering culture.


Using Open Source Tools for Machine Learning

Tuesday, October 15 | 10:45am | Room 301A
Speaker: Samuel Taylor, Indeed data scientist

Machine learning can feel like a magic black box, especially given the wealth of proprietary solutions and vendors. This beginner-friendly talk opens the box. It reveals the math that underlies these services and the open source tools you can use in your own work. It introduces machine learning through the lens of three use cases:

  • Teaching a computer sign language (supervised learning)
  • Predicting energy usage in Texas (time series data)
  • Using machine learning to find your next job (content-based filtering)

You’ll walk away prepared to practice machine learning in the real world.


Your Company Cares about Open Source Sustainability. But Are You Measuring and Encouraging Upstream Contributions?

Tuesday, October 15 | 2:15pm | Room 201
Speaker: Dani Gellis, Indeed software developer

You encourage the behavior that you measure. If you want your company to help sustain the open source projects you depend on, start by measuring how your employees participate in those projects.

How many of your engineers contribute to projects your company consumes? Do they only open issues, or do they contribute code? Are they part of the conversation? Are your non-engineers also involved in the open source community?

This talk demonstrates how we use open source tools to measure the velocity of our employees’ open source contributions, as well as how Indeed chose these tools. It covers the evolution of our tooling as our open source program has grown. And it reveals our exciting new initiatives to promote sustainable contributions.

You’ll leave with new ideas for measuring and improving your organization’s contributions to open source projects.


Cross-posted on Medium.


Normalizing Resume Text in the Age of Ninjas, Rockstars, and Wizards

Left to right: Ninja by Mwangi Gatheca, Rockstar by Austin Neill, and Magic by Pierrick Van Troost

At Indeed we help people get jobs, which means understanding resumes and making them discoverable by the right employers. Understanding massive amounts of text is a tricky problem by itself. With source text as varied as resumes, the problem is even more challenging.

Everyone writes their resume differently, and there are some wild job titles out there. If we want to correctly label resumes for software engineers, we have to consider that developer wizard, java engineer, software engineer, and software eng. may all be the same job title. In fact, there may be thousands of ways to describe a job title in our more than 150 million resumes. Human labeling of all of those resumes—as well as new ones created every day—is an impossible task. 

So what is your job, really?

To better understand what a job actually is, we apply a process called normalization to the job title. Normalization is the process of finding synonyms (or equivalence classes) for terms. It allows us to classify resumes in a meaningful way so that employers can find job seekers with relevant experience for their job listings. 

For example, if we determine that software engineer and software developer are equivalent titles, then we can show employers searching for software engineers additional resumes with the title software developer. This is particularly useful in regions with fewer resumes for a job title the employer wants to fill.

Normalizing job titles, certifications, company names, etc. also helps us use resume information in machine learning models as features and labels. We want to know if biology on a resume has the same meaning as bio or even a common misspelling like boilogy. If we want to predict whether a job seeker has a nursing license, we have to correctly label resumes with RN and registered nurse.

How do we normalize text?

There are many ways to normalize text. For a quick initial model, we can measure how similar strings are to one another. We apply two common string distance measures: Levenshtein distance over characters (to capture misspellings) and Jaccard distance over words (so we can group cell biology major and cell biology together).

Step 1: Preprocessing

As with most text-related models, we must first clean the text data. This preprocessing step removes punctuation from terms, replaces known acronyms and abbreviations with full names, replaces synonyms with more common variants, and stems the words, e.g., removing suffixes such as ing from verbs.
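
Here is a minimal sketch of what such a preprocessing pass might look like in Python. The abbreviation map and the suffix-stripping rules are hypothetical stand-ins; a production pipeline would use curated, per-language tables and a real stemmer (e.g., Porter).

```python
import re

# Hypothetical lookup tables; real pipelines use curated, per-language
# synonym and acronym maps plus a proper stemmer.
ABBREVIATIONS = {"eng": "engineer", "dev": "developer", "sr": "senior"}
SUFFIXES = ("ing", "er", "ed")   # naive stand-in for a real stemmer

def preprocess(title: str) -> str:
    title = re.sub(r"[^\w\s]", " ", title.lower())   # lowercase, strip punctuation
    words = []
    for word in title.split():
        word = ABBREVIATIONS.get(word, word)          # expand known abbreviations
        for suffix in SUFFIXES:                       # crude suffix stripping
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[:-len(suffix)]
                break
        words.append(word)
    return " ".join(words)

print(preprocess("Software Eng."))   # -> "software engine" (expanded, then stemmed)
```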

Step 2: Term frequency

After that, we define a term frequency threshold. If a string’s frequency falls below this threshold, we do not consider it a potential normalized value.
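
As a sketch, with frequencies tallied in a collections.Counter, the cutoff might look like this. The toy data is made up, and the threshold of 1,000 simply matches the one used in the worked example later.

```python
from collections import Counter

# Toy data standing in for millions of preprocessed titles.
preprocessed_titles = ["java develop ii"] * 1_500 + ["rockstar java develop"] * 100

THRESHOLD = 1_000   # minimum occurrences to be a candidate normalized value

title_counts = Counter(preprocessed_titles)
candidates = {t for t, n in title_counts.items() if n >= THRESHOLD}
print(candidates)   # {'java develop ii'}; the rockstar variant falls below the bar
```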

Step 3: Minhash

Once we remove low-count strings, we have to classify the remaining terms into groups. The most common technique for this kind of grouping involves determining the distance between terms. How different is boilogy from biology?

To prepare, we need to address a computational problem. We often have millions of unique strings coming from resumes for each field, e.g., for company names. Finding the distances between all pairs of strings is slow and inefficient, since the number of comparisons needed is

n(n − 1) / 2

where n is the number of values. For one million different strings, we would need about 500 billion comparisons. We have to reduce the number of pairwise comparisons to make string distance computation feasible.

To address this challenge, we use locality sensitive hashing. This family of algorithms hashes similar items into the same buckets and can approximate string distance. In particular, the minhash algorithm approximates the Jaccard distance: the Jaccard similarity of two sets is the size of their intersection divided by the size of their union, and the Jaccard distance is one minus that similarity.

Approximating Jaccard distance with minhash is an easy way to measure distances between strings based on the words they contain. Minhash vastly reduces the number of comparisons we need, because we only compare strings that land in the same minhash bucket.
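
Below is a compact, self-contained sketch of the idea, not Indeed’s production code; real systems often reach for a library such as datasketch. The word-level shingles, the 64-hash signature, the blake2b salting trick, and the 16-band split are all illustrative choices.

```python
import hashlib
from itertools import combinations

NUM_HASHES = 64   # signature length; more hashes -> better Jaccard estimates

def _hashed(token: str, seed: int) -> int:
    # A family of hash functions, derived by salting blake2b with the seed.
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def minhash_signature(title: str) -> tuple:
    words = set(title.split())   # word-level sets, matching our word-based Jaccard
    return tuple(min(_hashed(w, seed) for w in words) for seed in range(NUM_HASHES))

def estimated_jaccard_distance(sig_a: tuple, sig_b: tuple) -> float:
    # The fraction of matching signature positions estimates Jaccard
    # similarity; distance is one minus that.
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return 1.0 - matches / NUM_HASHES

def candidate_pairs(titles, bands: int = 16):
    # Band the signatures: strings sharing any band land in the same bucket,
    # so only those pairs need a full distance computation.
    rows = NUM_HASHES // bands
    buckets = {}
    for t in titles:
        sig = minhash_signature(t)
        for b in range(bands):
            buckets.setdefault((b, sig[b * rows:(b + 1) * rows]), []).append(t)
    return {tuple(sorted(p))
            for bucket in buckets.values()
            for p in combinations(set(bucket), 2)}

sig_a = minhash_signature("java develop ii")
sig_b = minhash_signature("rockstar java develop")
print(estimated_jaccard_distance(sig_a, sig_b))
# ~0.5 (the true Jaccard distance; the estimate is noisy with 64 hashes)
```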

Once minhash has pruned most of the candidate comparisons, we calculate a normalized version of the Levenshtein distance on the remaining pairs to get a character-based distance metric.


Step 4: Levenshtein distance

We then remove pairs with very high Levenshtein distances. Ultimately we are left with groups of pairs that are quite similar, like cell biology and cell biology major.
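
A standard dynamic-programming implementation of the distance, plus one common normalization (dividing by the longer string’s length), might look like this. The post doesn’t specify the exact normalization Indeed uses, so treat the ratio here as illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic program over edit operations, kept to two rows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

def levenshtein_ratio(a: str, b: str) -> float:
    # One common normalization: divide by the longer string's length.
    return levenshtein(a, b) / max(len(a), len(b), 1)

print(levenshtein_ratio("biology", "boilogy"))   # 2 edits / 7 chars ≈ 0.286
```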

Step 5: L2 norm

Once similar strings are grouped together, it makes sense to choose the normalized value from within each group. But which one? Which value in a group should we designate as the standard (normalized) value?

To determine this without outside information such as labels, we look at the frequency of strings in our corpus of resumes. Frequently occurring strings are likely to be the more standard variants.

However, we do not want to rely solely on frequency to choose our normalized value. The most frequent value could be a good standard for most strings in that group, but not all of them. A group could have pairs that contain French, French language, and French language and economics. In this case, we might want to normalize the first two strings together, but not the third. 

To address this problem, we create a vector of features for each pair. This vector contains the two distance measures and the weighted inverse of the frequency of the more common term (w/f, where w is the weight and f is the frequency of the term in the corpus). We use an inverse so that the feature is smaller when the string is more frequent; this is consistent with string distances being smaller when similarity is higher.

We then normalize strings to the term with the lowest vector magnitude (L2 norm) based on those three features. This results in better normalization accuracy as determined by human labelers.
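
Here is one plausible reading of that selection step in code; the post doesn’t spell out the exact bookkeeping, so the counts and distances in the demo are made up.

```python
import math

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def choose_normalized_value(pair, counts, jaccard_dist, lev_ratio, weight=50.0):
    # Both strings in a matched pair share the two distance features;
    # only the weight / count term differs, favoring the more frequent string.
    vectors = {s: (jaccard_dist, lev_ratio, weight / counts[s]) for s in pair}
    return min(pair, key=lambda s: l2_norm(vectors[s]))

# Hypothetical counts and distances for illustration.
counts = {"cell biology": 8_000, "cell biology major": 400}
print(choose_normalized_value(("cell biology", "cell biology major"),
                              counts, jaccard_dist=0.33, lev_ratio=0.28))
# -> "cell biology"
```

Because the two distance features are identical within a pair, the frequency term is what breaks the tie in favor of the more common string.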

A worked example

Here is how this normalization works in practice. Below is a table of job titles we will consider, as well as their distances from the first job title, Java developer II.

We apply the following steps:

In step 1, during preprocessing, we remove extraneous words such as rockstar and stem the remaining words, removing endings like er.

In step 2 we determine which job titles occur often enough to be potential normalized job titles, based on a threshold of 1,000. Rockstar java developer does not make the cut.

In step 3 we use the minhash algorithm to group the titles by Jaccard distance, and discard any job titles from the group with a distance of > 0.7. Barista and Night shift janitor are discarded from the group. 

In step 4 we calculate the Levenshtein ratio, and discard job titles from the group with a ratio of > 0.3. Developer is discarded. 

Lastly, in step 5 we select the standard value by finding the vector with the smallest L2 norm, where each vector is built from the Jaccard distance, the Levenshtein ratio, and w/count. Since this is a group of two strings, the distances are the same and only the count feature differs. Here we use a weight of 50. The vectors are:

  • Java developer II: [0.33, 0.15, 0.005]
  • Rockstar java developer: [0.33, 0.15, 0.5]

The normalized value becomes Java developer II, since the L2 norm of the first vector (0.36) is less than that of the second (0.62).
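
As a quick sanity check, a couple of lines of Python reproduce those norms:

```python
import math

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

print(round(l2_norm((0.33, 0.15, 0.005)), 2))   # 0.36 -> Java developer II wins
print(round(l2_norm((0.33, 0.15, 0.5)), 2))     # 0.62
```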

Is this the best way to approach normalization?

Many other techniques can normalize text and take into account distant synonyms by considering context around the terms of interest. In fact, we are currently working on including phrase embeddings in this framework. In the meantime, our current approach works for us by greatly reducing the amount of time needed to come up with a new normalization for any field in structured text. With a little tuning, this model can work well for many of the 28 languages found in Indeed resumes.

This method also works for other types of data sets. It can apply to job descriptions and even Indeed Questions, the questions that employers use to screen applicants. Normalization does not circumvent the need for expert human judgment, but it helps scale that expertise across a large international product.

Normalization is the bread and butter of understanding text. It might not be as exciting as text generation or deep learning classifiers, but it is just as important. Normalization helps search engines by finding synonyms. It aids in creating features and labels for machine learning models, and makes analysis of data many times easier. Models like the one described here can speed up the normalization process so we can expand to new countries without years of work. These models can also adapt to new data easily so we can update our normalization to a changing lexicon. 

With mathematical models for normalizing text, Indeed can better understand job seekers and employers and adapt to changes, ultimately helping us help people get the jobs they want.

Cross-posted on Medium.
