Qualitative + Quantitative: How Qualitative Methods Support Better Data Science

Have you ever been embarrassed by the first iteration of one of your machine learning projects, where you didn’t include obvious and important features? In the practical hustle and bustle of trying to build models, we can often forget about the observation step in the scientific method and jump straight to hypothesis testing.

[Image: Scientific method steps in order of progression: Observation, Research Question, Hypothesis, Experiment, Analysis, Conclusion]

Data scientists and their models can benefit greatly from qualitative methods. Without doing qualitative research, data scientists risk making assumptions about how users behave. These assumptions could lead to:

  • neglecting critical parameters,
  • missing a vital opportunity to empathize with those using our products, or
  • misinterpreting data.

In this post, we’ll explore how qualitative methods can help all data scientists build better models, using a case study of Indeed’s new lead routing machine learning model, which ultimately generated several million dollars in revenue.

What are qualitative methods and how are they different from quantitative methods?

Few data scientists are formally trained in qualitative methods. They’re more deeply familiar with quantitative methods like A/B testing, surveys, and regressions. Quantitative methods are great for answering questions like “How much does the average small business spend on a job posting?”, “What are the skills that make someone a data scientist?”, or even “How many licks does it take to get to the center of a Tootsie roll pop?” (The answer is 3. Three licks.)

But there are some questions that quantitative methods can’t answer, such as “Why do account executives reach out to this lead instead of that lead?” or “How do small businesses make the decision to sponsor or not sponsor a job?” Or the truly deep question: “Why do you want to get to the center of the Tootsie roll pop?”

To answer these questions, qualitative researchers rely on methods like in-depth interviews, participant observation, content analysis and usability studies. These methods involve more direct contact with who and what you’re studying. They allow you to better understand how and why people do what they do, and what kinds of meaning they ascribe to different behaviors.

Put another way, quantitative methods can tell you the what, the how much, or how often; qualitative methods can tell you the why or the how.

Cartoon created by Indeed UX Research Manager Dave Yeats using cmx.io

Why should you use qualitative methods? A case study in Lead Generation

Our Lead Generation team recently benefited greatly from the use of qualitative methods. When an employer posts a job, it represents a revenue opportunity for Indeed. We route that employer to an account executive, who then reaches out and helps the employer set an advertising budget to sponsor their job. This increases the job’s visibility and therefore the speed at which the employer makes a successful hire. Employers who have not yet spent money with Indeed are referred to as “leads.”

Some leads are better than others, however. We wanted to be able to give leads a score on a scale from one to five stars that would indicate our best estimate for whether or not they would spend. Our Product Science team decided to build a machine learning model that would score leads and route them more effectively. But where to start? Prior to this project, we had little experience with lead scoring and little intuition about what a good lead would look like. How could we even know what features should be in our model?

To answer that question, we turned to people with the most hands-on experience with leads: account executives themselves. Not only are they experts on what makes a good lead, they would also be the beneficiaries of our efforts. We took a three-pronged qualitative approach:

  • Observation. To learn about the day-to-day sales experience, each member of our team shadowed different reps and listened to them on sales calls. We observed how they would select which lead to call, how they would decide what to talk about on the call, and how they actually made deals.
  • Interviews. We sat down with several sales managers and representatives across the company and asked them questions about leads they had previously decided to call or drop, like “How do you pick which leads to call first?” or “Why did you decide to drop this lead?”
  • Content Analysis. We combed through thousands of open-ended responses to a company-wide survey of account executives to better understand their pain points with regard to leads.

We learned a lot! Just by doing three simple qualitative studies for a few hours, we collected a long list of potential features. Had we not sat down next to members of the sales team and observed as they worked, we would have never obtained these insights. Our next step was to start digging into the data and validating how generalizable the findings from reps were.

With the intuition we gained from our qualitative studies on account executives’ behaviors and thought processes, we ultimately built a machine learning model that generated millions of dollars in annual incremental revenue. And we didn’t stop there: we kept interviewing and shadowing reps to get their feedback on the model. We built a new version that generated additional annual incremental revenue. And we made sure to market our new model so people knew about it.

In short, these qualitative studies kept us grounded and built empathy with our end users. Without qualitative studies, the models we built would have been out of touch with reality and made it harder for us to address our users’ needs. With qualitative methods, we infused our models with intuition and working hypotheses that we could later verify with quantitative data.

Where to start learning the basics for qualitative methods

In the case study above, our end users were our coworkers here at Indeed. It’s worth noting that it’s not always as simple to conduct qualitative studies with external users. Here at Indeed, we have a fantastic UX Qualitative Research team to turn to for these kinds of studies. We encourage you to reach out to such teams at your own companies, and if they don’t exist yet, create them. Work with them. Shadow them. Buy them a beer. They are wonderful!

But don’t just stop there. Below are some of our favorite readings and resources on qualitative methods, recommended by former academics here at Indeed.

If you are passionate about methods and data science, check out product science and data science jobs at Indeed!


How Qualitative Methods Support Better Data Science—cross-posted on Medium.

Does Your Job Title Matter?

The Importance of Picking the Right Job Title for Your Job

Job titles are often the first interaction between job seekers and employers. As a job seeker searches, they click relevant titles before getting to know the role more deeply through its job description. Calling a job “software engineer” versus “programmer” will likely lead to a different number of applicants and proportion of those meeting the minimum qualifications, but just how different? Surprisingly, after a single word change in nearly identical job titles, we observed more qualified candidates and more total candidates. This post describes our initial research and how we can improve on this in the future.

Data and Product Science at Indeed

There are two main roles in Indeed’s Data Science organization — data scientists and product scientists. Indeed currently has data/product scientists in five offices: Austin, San Francisco, Seattle, Singapore, and Tokyo, working on a wide variety of product and engineering teams.

Both roles employ advanced statistical and machine learning techniques to help people get jobs. Data science has a higher emphasis on machine learning and software engineering, while product science focuses on experiments, analysis, and simpler models that can improve the product. In short, data scientists are closer to software engineering than product management, and vice versa for product scientists.

You can view the differences in the job descriptions here: (Product Scientist/Data Scientist). Despite their differences, the ultimate requirements for data and product scientists are essentially the same: a deep understanding and experience in mathematics and computer science, and domain expertise.

[Venn diagram: requirements for a successful data and product scientist]
Palmer, Shelly. Data Science for the C-Suite. New York: Digital Living Press, 2015. Print. Conway, Drew. The Data Science Venn Diagram. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Sequential test: Changing the job title

To find out how job titles affect the hiring process, we conducted an experiment and changed the Product Scientist title to “Data Scientist: Product” in Seattle and “Product Scientist: Data Science” in San Francisco on March 15, while keeping the job title unchanged for Austin. Job descriptions remained the same for all three cities.

A proper A/B test would have required engineering work, so we chose to analyze the change sequentially. We conducted a statistical power analysis ahead of time to determine the necessary sample size. We first compared the click-through rate (defined as clicks/impressions) and the number of applies for the three cities before and after March 15. From the following two charts, we see that both the number of applies and the click-through rate jumped after March 15 for Seattle and San Francisco (SF). T-tests show that applies and click-through rates are significantly higher for Seattle and San Francisco than for Austin starting from March 15.
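As a sketch of this kind of before/after comparison (the daily apply counts below are hypothetical, not Indeed’s data), a two-sample Welch’s t-test can be computed with just the Python standard library:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic: difference in means scaled by
    the standard error, without assuming equal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical daily apply counts before vs. after the title change
before = [12, 9, 11, 10, 13, 8, 12]
after = [15, 18, 14, 17, 16, 19, 15]
t = welch_t(after, before)
print(round(t, 2))
```

In practice you would likely reach for `scipy.stats.ttest_ind(after, before, equal_var=False)` to also obtain a p-value.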

[Figure: Apply growth rates in Austin, San Francisco, and Seattle]

[Figure: Click-through growth rates in Austin, San Francisco, and Seattle]

However, changing the job titles might affect job search ranking, and we know the top- and bottom-ranked jobs on a page usually have a higher probability of being clicked. To account for this position bias, we fit a logistic regression predicting clicks from position on the SERP, city (Austin, Seattle, or San Francisco), and whether we changed the job title. We also included the interaction between city and whether we changed the title, to test the hypothesis that the log-odds ratios for the cities differ after changing titles versus before.
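To make the model structure concrete, here is a hypothetical sketch of how one row of such a design matrix could be encoded, with dummy variables for city (Austin as the baseline), an indicator for the title change, and their interactions. The encoding and names are ours, not from the original analysis:

```python
def encode_row(position, city, changed):
    """Feature vector: intercept, SERP position, city dummies
    (Austin is the baseline), title-change flag, and the
    city x title-change interaction terms."""
    seattle = 1 if city == "Seattle" else 0
    sf = 1 if city == "San Francisco" else 0
    return [1, position, seattle, sf, changed,
            seattle * changed, sf * changed]

# A Seattle impression at SERP position 3, after the title change:
print(encode_row(3, "Seattle", 1))  # [1, 3, 1, 0, 1, 1, 0]
```

The interaction columns are literal products of the dummies, which is what lets the model estimate a different title-change effect per city.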

The regression equation was estimated¹ as follows, where Seattle and SF are city dummies (Austin is the baseline) and changed indicates a changed title:

log(p / (1 − p)) = β₀ + β₁·position − 0.18·Seattle − 0.09·SF + β₂·changed + 0.60·(Seattle × changed) + 0.71·(SF × changed)
The non-parallel lines in the interaction plot below suggest significant interaction effects, and the significant p-values for the interaction terms confirm this.

Before changing titles (changed = 0), the equation is simply:

log(p / (1 − p)) = β₀ + β₁·position − 0.18·Seattle − 0.09·SF

Switching from Austin to Seattle yields a change in log odds of -0.18, and switching to San Francisco yields a change of -0.09.

After changing titles (changed = 1), the equation becomes:

log(p / (1 − p)) = (β₀ + β₂) + β₁·position + (−0.18 + 0.60)·Seattle + (−0.09 + 0.71)·SF

Switching from Austin to Seattle now yields a change in log odds of -0.18 + 0.6 = 0.42, and switching to San Francisco yields a change of -0.09 + 0.71 = 0.62.
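The coefficient arithmetic above is easy to check, and exponentiating the log-odds changes converts them into odds ratios relative to Austin (a quick sketch):

```python
import math

# City effects before the change, and interaction terms (from the model)
seattle_before, sf_before = -0.18, -0.09
seattle_interaction, sf_interaction = 0.60, 0.71

# Post-change city effects are the sum of main effect and interaction
seattle_after = seattle_before + seattle_interaction   # 0.42
sf_after = sf_before + sf_interaction                  # 0.62

# Odds ratios vs. Austin after the title change
print(round(math.exp(seattle_after), 2))  # 1.52
print(round(math.exp(sf_after), 2))       # 1.86
```

In other words, after the change a Seattle impression had roughly 1.5x the click odds of an Austin impression, holding position constant.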

The graph below also confirms that the log odds for Seattle and San Francisco are much higher after changing titles than before. To sum up, we see significantly more applicants for the cities with changed titles.

[Figure: Log odds by city, before and after changing titles]

Qualified application model

We see more applicants after changing titles, but is this pool of applicants more suitable for the role? A team at Indeed has developed a model that scores the likelihood of a resume containing skills and experiences that meet the requirements in a job description.

We applied this model to all candidates who applied for “Product Scientist” (before changing titles) from February 1 to March 14 and got the scores² for each candidate. The mean scores for Austin, Seattle, and San Francisco were 0.489, 0.498, and 0.471 respectively. The plot below shows the score Kernel Density Estimation (KDE) for Austin, Seattle, and San Francisco, and the chart shows the p-values (insignificant) for t-tests and Kolmogorov-Smirnov (KS) tests. The KS test determines whether two samples are drawn from the same distribution; it is nonparametric and makes no assumptions about the underlying distribution. Both tests indicate that our applicant qualification rate was at the same level for all three locations before changing titles.
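For intuition, the two-sample KS statistic is just the largest vertical gap between the two empirical CDFs. A minimal pure-Python version (an illustration, not the implementation used in the study) looks like:

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic:
    max |F_a(x) - F_b(x)| over the pooled sample points."""
    points = sorted(set(a) | set(b))
    d = 0.0
    for x in points:
        f_a = sum(v <= x for v in a) / len(a)  # empirical CDF of a at x
        f_b = sum(v <= x for v in b) / len(b)  # empirical CDF of b at x
        d = max(d, abs(f_a - f_b))
    return d

# Half-overlapping samples: the CDFs differ by at most 0.5
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

In practice `scipy.stats.ks_2samp` computes both the statistic and its p-value.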

[Figure: Score KDEs for Austin, Seattle, and San Francisco before changing titles]

When the model was applied to all applicants after changing titles, the mean scores for Austin, Seattle, and San Francisco were 0.466, 0.516, and 0.528 respectively. We observed a small decrease in the mean score for Austin, accompanied by increases in Seattle and San Francisco. The plot below shows the score distributions for Austin, Seattle, and San Francisco. After controlling the false discovery rate to adjust the p-values for multiple comparisons, both tests indicate that applicant qualification rates with changed titles (Seattle and San Francisco) are significantly higher than with the original title (Austin), while there is no significant difference between the two changed titles (Data Scientist: Product and Product Scientist: Data Science).
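The false-discovery-rate control mentioned here is commonly done with the Benjamini-Hochberg procedure, which can be sketched in a few lines (a generic implementation, not Indeed’s code): sort the p-values, scale each by n/rank, then enforce monotonicity from the largest down.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping a running minimum
    # so adjusted values are monotone in the original ranking.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

print([round(q, 2) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.02])])
```

A test is then called significant if its adjusted value falls below the chosen FDR level.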

[Figure: Score KDEs for Austin, Seattle, and San Francisco after changing titles]

Are you surprised by these findings? Our pilot research shows that simply making small changes to job titles led to more, and better qualified, candidates for Indeed. Job titles matter more than you might think: they catch job seekers’ attention and deserve as much focus as job descriptions. So care about your job titles, and pick ones that job seekers will notice and recognize.

For further reading, footnotes 3 and 4 cover more rigorous approaches to establishing causal effects.

If you are interested in using the scientific method to improve or develop products and help people get jobs, check out our open Product Scientist and Data Scientist positions at Indeed!

This is the second article in our ongoing series about Data Science from Indeed. The first article is There’s No Such Thing as a Data Scientist from our colleague, Clint Chegin.


Footnotes:

1. The p-value for each coefficient tests the null hypothesis that the coefficient is zero, using the Z value as the test statistic. It is the probability of obtaining a test statistic at least as unusual as the one observed if the null hypothesis were true; a low probability suggests it would be rare to see such a result if the coefficient were really zero. A significance code accompanies each estimate and flags its level of significance: the more asterisks, the more significant the p-value (for example, three asterisks represent a highly significant p-value, less than 0.001).

2. These model scores are non-standardized and not probabilities. An application score of 0.8 represents a higher likelihood relative to an application with a score of 0.4 (but doesn’t mean twice as likely).

3. Bollen, K.A.; Pearl, J. (2013). “Eight Myths about Causality and Structural Equation Models”. In Morgan, S.L. Handbook of Causal Analysis for Social Research. Dordrecht: Springer. pp. 301–328.

4. Sekhon, Jasjeet (2007). “The Neyman–Rubin Model of Causal Inference and Estimation via Matching Methods” (PDF). The Oxford Handbook of Political Methodology.


Cross-posted on Medium.

There’s No Such Thing as a Data Scientist

The Inconsistent Definitions of Data Science and More Descriptive Titles



What do you really do?

There’s a memorable scene in the movie Office Space where consultants determining employee productivity start by asking, “What would you say… you do here?”

That scene and the “What I Do” images are funny because we empathize with the struggle to describe our jobs. It’s not funny, however, when the same misunderstanding occurs during the job search. It’s important to understand what a job posting means. It’s important for prospective employers to understand our skills and abilities. We’ve all viewed job postings with the same title, but with totally different descriptions.

How can the same title mean such vastly different things from one company to another?

This phenomenon is becoming increasingly common in the field of data science. The discipline has dramatically risen in popularity over the past few years. And while the number of data science jobs has increased, clarity around the role has declined. This post takes advantage of Indeed’s tremendous amounts of behavioral data to describe trends in the field and more specific definitions for data science roles.

The growing popularity of data science

Jobs matching “data scientist” have risen from 0.03% of jobs to about 0.15% (+400%) in a 4-year span.

Even earlier in 2012, a much ballyhooed article called Data Scientist the “Sexiest Job of the 21st Century.” If the title alone isn’t enough, maybe folks are interested for monetary reasons. According to Indeed’s salary data, a data scientist makes an average of $130k per year.

 

OK. Got it. Data science has taken off like discounted Nutella in a European supermarket. With this rise, we’ve also seen the refinement of more specific roles within the discipline. Our colleague Trey Causey wrote about the convergence between product managers and data scientists in the “Rise of the Data Product Manager.”

Many of us at Indeed also felt that the title “data scientist” has recently become more of a catch-all for many different sets of responsibilities. We wanted to dig deeper and test our intuition. Could we find natural delineations of roles within the job market? Could we use data to understand the differences within these titles and better classify them for clarity and consistency?

Spoiler Alert: We can.

Overlapping careers in data science

For this analysis of job titles, we looked at all site visitors who entered the search query “data scientist” on Indeed during January 2018. Next, we looked at the other searches these same users performed. We created a user-by-search matrix and its transpose, a search-by-user matrix. Multiplying these matrices yields a search-by-search matrix showing how often any pair of search terms was used by the same users.
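As a sketch of the idea on toy data (not Indeed’s): for a binary user-by-query matrix X, the product XᵀX counts, for every pair of queries, how many users searched for both. Counting co-occurring pairs directly is equivalent:

```python
from itertools import combinations
from collections import Counter

# Toy data: each user's set of search queries
users = [
    {"statistician", "machine learning engineer"},
    {"statistician", "biostatistician"},
    {"machine learning engineer", "software engineer"},
    {"statistician", "biostatistician"},
]

# Equivalent to the off-diagonal entries of X^T X for a binary X
cooccurrence = Counter()
for queries in users:
    for pair in combinations(sorted(queries), 2):
        cooccurrence[pair] += 1

print(cooccurrence[("biostatistician", "statistician")])  # 2
```

The resulting weighted pair counts are exactly the kind of graph edges a community-detection algorithm can cluster.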

Next, we removed “data scientist” from the data, as this search was present for all users. We used an R package called “igraph” to do the clustering and visualization. According to the igraph documentation, “this function implements the fast greedy modularity optimization algorithm for finding community structure.” While researching this algorithm, we learned that it was designed to quickly create communities from large data sets that have sparse regions. Hmm, that sounds exactly like the data we are using!

Here’s the obligatory equation for how this works. The algorithm greedily merges communities to maximize the modularity

Q = Σᵢ (eᵢᵢ − aᵢ²)

where eᵢᵢ is the fraction of edges that fall within community i and aᵢ is the fraction of edge endpoints attached to vertices in community i. You’ll have to read that paper for the full derivation.

Next, we wrote a function with a pruning parameter to choose the minimum number of vertices in each cluster. This parameter is best set by “guess and check,” since higher numbers don’t necessarily mean more total groups and vice versa. We tried various values from 3 to 20 and checked whether the groups made sense. We didn’t care about very small clusters, and we wanted the queries to fit together. More on this later.
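As a sanity check on what modularity actually measures, here is a small self-contained computation (our own toy example, not from the analysis): two triangles joined by a single bridge edge score well above zero when each triangle is its own community.

```python
from collections import defaultdict

def modularity(edges, communities):
    """Newman modularity Q = sum_i (e_ii - a_i^2), where e_ii is the
    fraction of edges inside community i and a_i is the fraction of
    edge endpoints attached to community i."""
    m = len(edges)
    community_of = {v: i for i, group in enumerate(communities) for v in group}
    internal = defaultdict(int)   # edges fully inside each community
    endpoints = defaultdict(int)  # edge endpoints per community
    for u, v in edges:
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
        endpoints[community_of[u]] += 1
        endpoints[community_of[v]] += 1
    return sum(internal[i] / m - (endpoints[i] / (2 * m)) ** 2
               for i in endpoints)

# Two triangles joined by the bridge edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
q = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
print(round(q, 4))  # 0.3571
```

Putting everything in one community gives Q = 0, which is why greedy merging stops once no merge improves Q.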

By choosing five as the pruning threshold, four clusters formed. We subsequently labeled these clusters “business intelligence”, “statistician”, “machine learning engineer”, and “natural scientist”.

Here are the queries that make up each group:

See the Pen Job Title Network Graph by Erik Oberg (@obergew) on CodePen.

Thanks to Erik Oberg for the CodePen viz

And here’s how the clustering turned out:

[Figure: Clustering results]

Thanks to Zhuying Xu for the Plotly viz

From the preceding chart, we see a few interesting things.

First, there is a clear demarcation between statistician and machine learning engineer. Since we don’t see many searches that cross over between these roles, this suggests two distinct career paths.

Second, business intelligence doesn’t seem to have a clean grouping. It is dispersed broadly across the other roles. This contrasts with natural scientist searches, which seem to overlap more with statistician searches. This tells us that job seekers who search for business intelligence might be looking at a wide variety of other jobs within the data science realm. It could also mean that business intelligence positions are being called data science more often now. Further, it seems job seekers who search for machine learning engineer or statistician don’t search for jobs in both categories.

Finally, we see that some natural scientists are perhaps getting into data science through the statistician end of the data science spectrum.

More descriptive roles in data science

From these findings, we would posit that there is no single type of data scientist. Rather, there are many types! The title alone doesn’t give us enough information: in practice, “data scientist” can translate to a variety of different roles.

Taken together, it’s important to gather more information to understand what it means to be a data scientist at a given company. We believe it would be helpful for employers to think in terms of the roles identified in our clustering. This will help them find the candidates they need and enable job seekers to apply for the jobs they want.

At Indeed, we have a few “data” roles: data engineer, BI developer, BI analyst, product scientist, and data scientist. It looks something like this:

[Figure: Data science job strengths at Indeed]

Thanks to Ron Chipman for helping put this together

It’s easy to see how confusing this can become. From the search patterns we’ve observed, if someone were to say, “I want to be a data scientist at Indeed,” it could be unclear which team or title would be the best fit. Each team has a different interview process and contributes in different ways, so it’s really important to apply to the right one.

This is the first blog post in a series diving more deeply into data science insights from Indeed. In upcoming posts, we’ll explore the skills associated with data science jobs. We’ll showcase trends and the overlap from each of these more specific job titles. We’ll also describe what skills you should gain if you are interested in a particular career path. We’ll provide employers with tips to interview better for the specific needs of their organization. Finally, we’ll describe “Job Title Supernovae” — jobs that grow quickly and fade away.

Will the title “data scientist” die away like “webmaster” did in the 90s? Subscribe or tune in to future posts for that prediction and more!

At Indeed, We Help People Get Jobs and we hope to help you too. If any of these roles have excited you, please check out www.indeed.jobs and apply today!


Footnotes

A. Clauset, M.E.J. Newman, C. Moore: Finding community structure in very large networks, http://www.arxiv.org/abs/cond-mat/0408187


Cross-posted on Medium.