Does Your Job Title Matter?

The Importance of Picking the Right Job Title for Your Job

Job titles are often the first interaction between job seekers and employers. As a job seeker searches, they click relevant titles before getting to know the role more deeply through its job description. Calling a job “software engineer” versus “programmer” will likely lead to a different number of applicants and proportion of those meeting the minimum qualifications, but just how different? Surprisingly, after a single word change in nearly identical job titles, we observed more qualified candidates and more total candidates. This post describes our initial research and how we can improve on this in the future.

Data and Product Science at Indeed

There are two main roles in Indeed’s Data Science organization — data scientists and product scientists. Indeed currently has data/product scientists in five offices: Austin, San Francisco, Seattle, Singapore, and Tokyo, working on a wide variety of product and engineering teams.

Both roles employ advanced statistical and machine learning techniques to help people get jobs. Data science has a higher emphasis on machine learning and software engineering, while product science focuses on experiments, analysis, and simpler models that can improve the product. In short, data scientists are closer to software engineering than product management, and vice versa for product scientists.

You can view the differences in the job descriptions for the Product Scientist and Data Scientist roles. Despite their differences, the core requirements for data and product scientists are essentially the same: deep understanding of, and experience in, mathematics and computer science, plus domain expertise.

Palmer, Shelly. Data Science for the C-Suite. New York: Digital Living Press, 2015. Print.

Conway, Drew. “The Data Science Venn Diagram.” http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Sequential test: Changing the job title

To find out how job titles affect the hiring process, we conducted an experiment and changed the Product Scientist title to “Data Scientist: Product” in Seattle and “Product Scientist: Data Science” in San Francisco on March 15, while keeping the job title unchanged for Austin. Job descriptions remained the same for all three cities.

A true A/B test would have required engineering work, so we chose a sequential (before-and-after) comparison instead. We conducted a statistical power analysis ahead of time to determine the sample size. We first compared the click-through rate (clicks/impressions) and the number of applies for the three cities before and after March 15. The following two charts show that both the number of applies and the click-through rate jumped after March 15 in Seattle and San Francisco (SF). T-tests show that applies and click-through rates were significantly higher in Seattle and San Francisco than in Austin starting March 15.
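The post doesn’t say which power analysis was used. As one illustration, for a click-through rate (a proportion) a standard per-group sample size for a two-sided, two-sample z-test is n = (z₁₋α/₂·√(2p̄(1−p̄)) + z₁₋β·√(p₁(1−p₁) + p₂(1−p₂)))² / (p₁ − p₂)², where p̄ is the average of the two rates. The minimal sketch below assumes that formula; the 5% and 6% CTRs are hypothetical, not Indeed’s numbers.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided, two-sample z-test of proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value under the null
    z_beta = NormalDist().inv_cdf(power)           # quantile giving the desired power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical: impressions per group needed to detect a CTR lift from 5% to 6%
print(sample_size_two_proportions(0.05, 0.06))
```

Smaller expected lifts drive the required sample size up quadratically, which is why the analysis has to happen before the test starts rather than after.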

[Chart: apply growth rates in Austin, San Francisco, and Seattle]

[Chart: click-through growth rates in Austin, San Francisco, and Seattle]

However, changing the job titles might affect search ranking, and we know that jobs ranked at the top and bottom of a page usually have a higher probability of being clicked. To account for this position bias, we fit a logistic regression that predicts clicks from page, position on the search engine results page (SERP), city (Austin, Seattle, or San Francisco), and whether we had changed the job title. We also included an interaction term between city and title change to test the hypothesis that the log-odds ratios for the cities differ after the title change compared with before.
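A minimal sketch of this kind of model, not Indeed’s actual code. It assumes position enters as a single linear term (page is omitted for brevity), Austin is the baseline city, and a hypothetical `after` indicator marks the period after March 15; the city-by-`after` interactions then measure how each city’s gap to Austin changed. It fits the model by Newton-Raphson on synthetic data whose “true” coefficients are invented for illustration (the city and interaction values just reuse the estimates quoted in this post). In practice you would use R’s `glm` or Python’s statsmodels.

```python
import numpy as np

def design_matrix(position, city, after):
    """Columns: intercept, position, Seattle, SF, after, Seattle:after, SF:after (Austin = baseline)."""
    seattle = (city == "Seattle").astype(float)
    sf = (city == "San Francisco").astype(float)
    after = after.astype(float)
    return np.column_stack([
        np.ones(len(city)), position.astype(float), seattle, sf, after,
        seattle * after, sf * after,  # interactions: does each city's gap change after March 15?
    ])

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (equivalent to IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))         # predicted click probabilities
        grad = X.T @ (y - p)                          # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information matrix
        beta += np.linalg.solve(hess, grad)
    return beta

# Synthetic data only; these coefficients are made up for the demo.
rng = np.random.default_rng(0)
n = 200_000
city = rng.choice(["Austin", "Seattle", "San Francisco"], size=n)
after = rng.integers(0, 2, size=n)
position = rng.integers(1, 11, size=n)
X = design_matrix(position, city, after)
true_beta = np.array([-1.0, -0.1, -0.18, -0.09, 0.0, 0.6, 0.71])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logistic(X, y)
names = ["intercept", "position", "Seattle", "SF", "after", "Seattle:after", "SF:after"]
for name, b in zip(names, beta_hat):
    print(f"{name:>14s} {b:+.3f}")
```

With this coding, a city’s shift in log odds relative to Austin is its city coefficient before the change and the city coefficient plus its interaction coefficient after the change.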

The regression coefficients were estimated¹ as follows:

[Table: logistic regression coefficient estimates]

The non-parallel lines in the interaction plot below suggest significant interaction effects, which the significant p-values of the interaction terms confirm.

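Before changing titles, the interaction terms are zero, so the equation is simply:

$$\operatorname{logit}(p) = \eta - 0.18\,\text{Seattle} - 0.09\,\text{SF}$$

where Seattle and SF are 0/1 city indicators (Austin is the baseline) and $\eta$ collects the intercept, page, and position terms.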

Switching from Austin to Seattle yields a change in log odds of -0.18 and to San Francisco yields a change in log odds of -0.09.

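After changing titles, the interaction terms switch on, so the equation is:

$$\operatorname{logit}(p) = \eta' + (-0.18 + 0.6)\,\text{Seattle} + (-0.09 + 0.71)\,\text{SF}$$

where Seattle and SF are 0/1 city indicators (Austin is the baseline) and $\eta'$ collects the intercept, page, position, and any period terms.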

Switching from Austin to Seattle now yields a change in log odds of -0.18 + 0.6 = 0.42, and switching to San Francisco yields -0.09 + 0.71 = 0.62.
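Log-odds changes are easier to read as odds ratios. This short sketch reproduces the arithmetic above from the quoted coefficients and exponentiates the results; the dictionary names are ours, and the ratios compare each city with Austin at the same page and position.

```python
import math

# Coefficients quoted in the text: log-odds shifts relative to Austin
city_effect = {"Seattle": -0.18, "San Francisco": -0.09}
interaction = {"Seattle": 0.60, "San Francisco": 0.71}

def log_odds_vs_austin(city, title_changed):
    """Change in log odds of a click when switching from Austin to `city`."""
    return city_effect[city] + (interaction[city] if title_changed else 0.0)

for city in city_effect:
    before = log_odds_vs_austin(city, False)
    after = log_odds_vs_austin(city, True)
    print(f"{city}: before {before:+.2f} (odds ratio {math.exp(before):.2f}), "
          f"after {after:+.2f} (odds ratio {math.exp(after):.2f})")
```

Seattle goes from about 0.84 times Austin’s odds of a click to about 1.52 times, and San Francisco from about 0.91 times to about 1.86 times.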

The graph below also confirms that the log-odds ratios for Seattle and San Francisco are much higher after changing titles than before. In short, the cities with changed titles saw significantly more applicants.

[Interaction plot: log odds of a click by city, before and after changing titles]

Qualified application model

We see more applicants after changing titles, but is this pool of applicants more suitable for the role? A team at Indeed has developed a model that scores the likelihood of a resume containing skills and experiences that meet the requirements in a job description.

We applied this model to all candidates who applied for “Product Scientist” (before changing titles) from February 1 to March 14 and obtained a score² for each candidate. The mean scores for Austin, Seattle, and San Francisco were 0.489, 0.498, and 0.471, respectively. The plot below shows the kernel density estimate (KDE) of the scores for each city, and the chart shows the p-values for t-tests and Kolmogorov-Smirnov (KS) tests, none of which are significant. The KS test checks whether two samples are drawn from the same distribution; it is nonparametric and makes no assumption about the shape of that distribution. Neither test found a significant difference between locations, indicating that applicant qualification was at a similar level in all three before the title change.
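The two-sample KS test compares empirical cumulative distribution functions (ECDFs): its statistic D is the largest vertical gap between them. The minimal pure-Python sketch below shows the idea; for the p-value it assumes the asymptotic Kolmogorov distribution with the small-sample correction from Numerical Recipes, which may differ slightly from the exact method your statistics package uses (for example, SciPy’s `scipy.stats.ks_2samp`).

```python
import bisect
import math

def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two ECDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # both ECDFs only jump at observed values, so checking these suffices
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def ks_pvalue(d, n, m):
    """Approximate two-sided p-value from the Kolmogorov distribution."""
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # the series converges poorly here, and the p-value is essentially 1
        return 1.0
    total = sum(2 * (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return min(1.0, max(0.0, total))

# Hypothetical score samples
d = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
print(d)  # 0.5
```

Because it looks at the whole distribution rather than only the mean, the KS test complements the t-test: two cities could share a mean score but differ in spread or shape.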

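[Plot: score KDEs for Austin, Seattle, and San Francisco before changing titles, with a chart of t-test and KS p-values]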

When we applied the model to all applicants after changing titles, the mean scores for Austin, Seattle, and San Francisco were 0.466, 0.516, and 0.528, respectively. The mean score fell slightly for Austin and rose for Seattle and San Francisco. The plot below shows the score distributions for the three cities. After adjusting the p-values to control the false discovery rate (FDR), both tests indicate that applicant qualification scores with changed titles (Seattle and San Francisco) are significantly higher than with the original title (Austin), while there is no significant difference between the two changed titles (Data Scientist: Product and Product Scientist: Data Science).
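The post doesn’t name its FDR procedure; the Benjamini–Hochberg step-up procedure is a common choice. A minimal sketch assuming that procedure: sort the m p-values, scale the one with rank i by m/i, and take running minimums from the largest down so the adjusted values stay monotone. The p-values in the usage line are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg adjusted p-values, plus which hypotheses are rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0                                   # adjusted values are capped at 1
    for rank in range(m, 0, -1):                        # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted, [a <= q for a in adjusted]

adjusted, rejected = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(rejected)  # [True, False, False, False]
```

Here 0.03 and 0.04 would each pass an unadjusted 0.05 cutoff, but both adjust to about 0.053, so only the smallest p-value survives.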

[Plot: score KDEs for Austin, Seattle, and San Francisco after changing titles]

Are you surprised by these findings? Our pilot research suggests that small changes to job titles led to more, and better qualified, candidates for Indeed. Job titles matter more than you might think: they catch job seekers’ attention and deserve as much care as the job description. Choose titles that job seekers will notice and that stand out.

For further reading, more rigorous approaches to establishing causal effect include:

  • Structural equation models³
  • The Neyman–Rubin causal model and estimation via matching methods⁴

If you are interested in using the scientific method to improve or develop products and help people get jobs, check out our open Product Scientist and Data Scientist positions at Indeed!

This is the second article in our ongoing series about Data Science from Indeed. The first article is There’s No Such Thing as a Data Scientist from our colleague, Clint Chegin.


Footnotes:

1. The p-value is for the hypothesis test that uses the z value as its test statistic. It gives the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis were true (that is, if the coefficient were zero). A low probability suggests that a result this extreme would be rare if the coefficient were really zero. The significance code attached to each estimate only flags its level of significance: the more asterisks, the smaller the p-value. For example, three asterisks indicate a p-value below 0.001.

2. These model scores are non-standardized and not probabilities. An application score of 0.8 represents a higher likelihood relative to an application with a score of 0.4 (but doesn’t mean twice as likely).

3. Bollen, K.A.; Pearl, J. (2013). “Eight Myths about Causality and Structural Equation Models”. In Morgan, S.L. Handbook of Causal Analysis for Social Research. Dordrecht: Springer. pp. 301–328.

4. Sekhon, Jasjeet (2007). “The Neyman–Rubin Model of Causal Inference and Estimation via Matching Methods”. The Oxford Handbook of Political Methodology.


Cross-posted on Medium.


There’s No Such Thing as a Data Scientist

The Inconsistent Definitions of Data Science and More Descriptive Titles

Image credits, from left to bottom: 1) smoothgroover22, cropped; 2) NazWeb; 3) BalticServers.com; 4) Wallpoper; 5) The Opte Project, cropped.


What do you really do?

There’s a memorable scene in the movie Office Space where consultants determining employee productivity start by asking, “What would you say… you do here?”

That scene and the “What I Do” images are funny because we empathize with the struggle to describe our jobs. It’s not funny, however, when the same misunderstanding occurs during a job search. Job seekers need to understand what a job posting means, and prospective employers need to understand our skills and abilities. Yet we’ve all seen job postings with the same title but totally different descriptions.

How can the same title mean such vastly different things from one company to another?

This phenomenon is becoming increasingly common in data science, a discipline whose popularity has risen dramatically over the past few years. While the number of data science jobs has increased, clarity around the role has declined. This post draws on Indeed’s large volume of behavioral data to describe trends in the field and to propose more specific definitions for data science roles.

The growing popularity of data science

Jobs matching “data scientist” have risen from 0.03% of jobs to about 0.15% (+400%) in a 4-year span.

Even earlier, in 2012, a much-ballyhooed article called data scientist the “Sexiest Job of the 21st Century.” If the title alone isn’t enough, maybe folks are interested for monetary reasons: according to Indeed’s salary data, a data scientist makes an average of $130k per year.

 

OK. Got it. Data science has taken off like discounted Nutella in a European supermarket. With this rise, we’ve also seen the refinement of more specific roles within the discipline. Our colleague Trey Causey wrote about the convergence between product managers and data scientists in the “Rise of the Data Product Manager.”

Many of us at Indeed also felt that the title “data scientist” has recently become more of a catch-all for many different sets of responsibilities. We wanted to dig deeper and test our intuition. Could we find natural delineations of roles within the job market? Could we use data to understand the differences within these titles and better classify them for clarity and consistency?

Spoiler Alert: We can.

Overlapping careers in data science

For this analysis of job titles, we looked at all site visitors who entered the search query “data scientist” on Indeed during January 2018, along with the other searches those same users performed. We built a user-by-search matrix and its transpose, a search-by-user matrix, and multiplied them. The resulting search-by-search matrix counts how many users performed each pair of searches.
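A minimal sketch of this co-occurrence step, using a tiny hypothetical query log rather than Indeed data. If A has one row per user and one column per search term (1 if the user ran that search), then Aᵀ·A is a search-by-search matrix whose off-diagonal entries count users who ran both searches and whose diagonal counts users per search.

```python
import numpy as np

# Hypothetical query logs: each user's set of distinct searches
user_searches = {
    "u1": {"data scientist", "statistician", "data analyst"},
    "u2": {"data scientist", "machine learning engineer"},
    "u3": {"data scientist", "statistician"},
}
terms = sorted(set().union(*user_searches.values()))
col = {t: j for j, t in enumerate(terms)}

# A[u, t] = 1 if user u searched term t (users x searches)
A = np.zeros((len(user_searches), len(terms)), dtype=int)
for i, searches in enumerate(user_searches.values()):
    for t in searches:
        A[i, col[t]] = 1

# (searches x users) @ (users x searches) -> searches x searches co-occurrence counts
C = A.T @ A

# Drop "data scientist": every sampled user searched it, so it co-occurs with everything
keep = [j for j, t in enumerate(terms) if t != "data scientist"]
C_pruned = C[np.ix_(keep, keep)]
kept_terms = [terms[j] for j in keep]
print(kept_terms)
print(C_pruned)
```

The pruned matrix can be treated as a weighted adjacency matrix, with search terms as vertices and co-occurrence counts as edge weights, which is the form a graph-clustering library expects.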

Next, we removed “data scientist” from the data, since every user in the sample had performed that search. We used the R package igraph for the clustering and visualization. According to the igraph documentation, its fast greedy clustering function “implements the fast greedy modularity optimization algorithm for finding community structure.” While researching this algorithm, we learned that it was designed to quickly find communities in large data sets that have sparse regions. Hmm, that sounds exactly like the data we are using!

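Here’s the obligatory equation for how this works. The algorithm greedily merges communities to maximize the modularity Q of the network:

$$Q = \frac{1}{2m}\sum_{v,w}\left[A_{vw} - \frac{k_v k_w}{2m}\right]\delta(c_v, c_w)$$

where $A_{vw}$ is the weight of the edge between vertices v and w, $k_v$ is the degree of v, m is the total edge weight, and $\delta(c_v, c_w)$ is 1 if v and w are in the same community and 0 otherwise. You’ll have to read the Clauset, Newman, and Moore paper (see Footnotes) to fully understand what it means.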
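Modularity can also be written per community as Q = Σ_c (e_c − a_c²), where e_c is the fraction of total edge weight that falls inside community c and a_c is the fraction of edge endpoints attached to c; this is the form Clauset, Newman, and Moore work with. The minimal sketch below computes Q for a given partition. It is not igraph’s implementation (igraph exposes its own `modularity()` function), and treating co-occurrence counts as edge weights is our assumption.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of a partition of an undirected, weighted graph.

    edges: dict mapping (u, v) pairs to weights, each undirected edge listed once.
    community: dict mapping each vertex to a community label.
    """
    m = sum(edges.values())        # total edge weight
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed vertex degree (strength) per community
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single bridge edge
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 1}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
together = {v: 0 for v in split}
print(modularity(edges, split), modularity(edges, together))
```

Splitting the two triangles scores 5/14 (about 0.357), while lumping everything into one community scores 0. The fast greedy algorithm starts from singleton communities and repeatedly performs whichever merge raises Q the most.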

 

Next, we wrote a function with a pruning parameter that sets the minimum number of vertices in each cluster. This parameter is best set by “guess and check,” as higher values don’t necessarily mean more total groups, and vice versa. We tried values from 3 to 20 and checked whether the groups made sense. We didn’t care about really small clusters, and we wanted the queries within each cluster to fit together. More on this later.

By choosing five as the pruning threshold, four clusters formed. We subsequently labeled these clusters “business intelligence”, “statistician”, “machine learning engineer”, and “natural scientist”.
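The post doesn’t show the pruning function, so here is one plausible, hypothetical version: a post-hoc filter that drops clusters with fewer than `min_size` members, plus a sweep over thresholds to see how many clusters survive. Because this version filters a fixed clustering, raising the threshold can only keep or reduce the cluster count; the authors’ function may have worked differently. The membership data is invented.

```python
from collections import Counter

def prune_clusters(membership, min_size):
    """Keep only vertices whose cluster has at least `min_size` members (hypothetical helper)."""
    sizes = Counter(membership.values())
    return {v: c for v, c in membership.items() if sizes[c] >= min_size}

def cluster_count(membership, min_size):
    """Number of clusters that survive pruning."""
    return len(set(prune_clusters(membership, min_size).values()))

# Hypothetical community labels for a handful of search terms
membership = {"sql": 1, "tableau": 1, "bi analyst": 1,
              "r": 2, "biostatistician": 2, "chemist": 3}
for k in range(1, 4):
    print(k, cluster_count(membership, k))
```

Sweeping the threshold and eyeballing the surviving clusters is the “guess and check” loop in code form.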

Here are the queries that make up each group:

[Interactive network graph: “Job Title Network Graph” by Erik Oberg (@obergew) on CodePen]

Thanks to Erik Oberg for the CodePen viz

And here’s how the clustering turned out:

[Chart: clustering results]

Thanks to Zhuying Xu for the Plotly viz

From the preceding chart, we see a few interesting things.

First, there is clear demarcation between statistician and machine learning engineer. Since we don’t see many searches that cross over between these roles, this suggests two distinct career paths.

Second, business intelligence doesn’t seem to have a clean grouping; it is dispersed broadly across the other roles. This contrasts with natural scientist searches, which seem to overlap more with statistician searches. This tells us that job seekers who search for business intelligence might be looking at a wide variety of other jobs within the data science realm. It could also mean that business intelligence positions are more often being called data science now.

Finally, we see that some natural scientists are perhaps getting into data science through the statistician end of the data science spectrum.

More descriptive roles in data science

From these findings, we would posit that there is no single type of data scientist. Rather, there are many types! Because there is no single description of a data scientist, the title alone doesn’t give us enough information: in practice, it could translate to a variety of different roles.

Taken together, it’s important to gather more information to understand what it means to be a data scientist at a given company. We believe it would be helpful for employers to think in terms of the roles identified in our clustering. This will help them find the candidates they need and enable job seekers to apply for the jobs they want.

At Indeed, we have a few “data” roles: data engineer, BI developer, BI analyst, product scientist, and data scientist. It looks something like this:

[Chart: Data Science Job Strengths]

Thanks to Ron Chipman for helping put this together

It’s easy to see how confusing this can become. Given the search patterns we’ve observed, if someone were to say, “I want to be a data scientist at Indeed,” it could be unclear which team or title would be the best fit. Each team has a different interview process and contributes in different ways, so it’s really important to apply to the right one.

This is the first blog post in a series diving more deeply into data science insights from Indeed. In upcoming posts, we’ll explore the skills associated with data science jobs. We’ll showcase trends and the overlap from each of these more specific job titles. We’ll also describe what skills you should gain if you are interested in a particular career path. We’ll provide employers with tips to interview better for the specific needs of their organization. Finally, we’ll describe “Job Title Supernovae” — jobs that grow quickly and fade away.

Will the title “data scientist” die away like “webmaster” did in the 90s? Subscribe or tune in to future posts for that prediction and more!

At Indeed, We Help People Get Jobs and we hope to help you too. If any of these roles have excited you, please check out www.indeed.jobs and apply today!


Footnotes

A. Clauset, M.E.J. Newman, C. Moore: Finding community structure in very large networks, http://www.arxiv.org/abs/cond-mat/0408187


Cross-posted on Medium.


Market Your Data Science Like a Product

A 7-Step ‘Go-to-Market’ Plan for Your Next Data Product

Why do internal tools need marketing?

Have you ever developed a great solution that never got used? Accuracy, statistical significance, model type: none of these matter if your data product is never put into action. To have a positive impact on your organization as a data scientist, you need to both develop high-quality data products and launch them successfully.

As a product scientist at Indeed (product science is a team within data science), I think about launching both business products and internal data products. This has shown me that the marketing techniques used to launch goods and services can also be applied to launching data products internally. With this perspective, I’ve helped the tools I developed rank among the top 10% most used at Indeed.

I have broken down what I do into seven steps:

  1. Naming/branding
  2. Documentation
  3. Champion identification
  4. Timing
  5. Outreach
  6. Demoing
  7. Tracking

1. Get an MBA name

Your product needs a name that’s MBA: Memorable, Brandable, and Available.

Indeed runs over 500 IPython notebook web applications for internal reporting each day, and we’ve developed and deployed over 12,000 of them. In this rich reporting environment, data products need a way to distinguish themselves from one another. It’s hard to summarize the months you have spent exploring data, developing a model, and validating output in just a few words, but settling for “The model” or “The revenue/job seeker behavior/sales thing I have been making!” shortchanges your work.

Identify your high-quality data products in ways that signal your past and future investment in the work.

Memorable

Apple and Starbucks are two of the most valuable brands in the world. Still, in a study by Signs.com, only 20% of people could draw the Apple logo perfectly, and only 6% could do the same for Starbucks. This points to the power of the name. People don’t need to remember exactly how a logo looks or how your data product works, but they do need to be able to recall it by name.

Memorable names are often:

Pronounceable. They start with a sharp sound and roll off the tongue. Research on English speakers suggests that names with initial plosive consonants (p, t, k) are more memorable; see also research on word symbolism.

Plain. They frequently repurpose common words (e.g., Apple or Indeed), which lets you attach rich, existing mental images to your product. Be aware that common words can limit discoverability through search. Slightly modifying the word can help overcome this (Lyft), as long as the result stays memorable.

Produced. They can even be entirely new. Making up a word (Google, Intel, Sony, or Garmin) requires substantially more initial seeding to establish the name, which may not fit the audience and timeframe of an internal data product launch.

Brandable

You want your name to consistently represent the identity of the data product and reflect an overall positive attitude towards it. This way it can be incorporated seamlessly into the tool and documentation.

Available

Make sure no one else has called their data product the same thing!

Once you have picked the name, you can dress it up with a logo. The logo can simply be your MBA name that’s been stylized following the same MBA principles. A shortcut like Font Meme Text Generator can quickly create a sufficient design.

For example: [Image: example stylized logo]

2. Document the product

You know what your code does. But what if you’re not around to answer questions or give a demo when the CEO or a curious new intern wonders, “What does this thing do?”

Documentation is not only good practice as a data scientist/developer, it is also an opportunity for your work to be found. When one business wants to know if another business has the products and services it needs, 71% start with a simple Google search. Similarly, in addition to being valuable for your user group, wiki documentation and code comments create searchable content that helps your work get discovered.

When writing your documentation, identify:

  • the main problem your data product is solving
  • key features and how they solve the problem
  • key definitions
  • key technical aspects that need to be explained

Documenting your product’s journey can also help build trust in the product. Use consistent messaging by including your MBA name and logo within the documentation to further establish your brand.

3. Identify champions

Who else “gets” the problem you are trying to solve and how the data product delivers a solution?

Seek out people who are affected by that problem, and share your work with them. Also, look to your own team members who have participated in the build or know your work. These champions can recommend your work to others who would also appreciate the solution.

Identifying champions is analogous to customer advocacy in consumer business. According to a study by Nielsen, word-of-mouth is a leading influence on purchase decisions for about 83% of consumers, across continents and generations. Your data product champions will be your top sales reps, lending credibility to the tool and answering questions when you are not around.

4. Timing is everything

Before each launch, consider the current business environment, and time your launch accordingly. The moment you have finished working on your data product is not necessarily the best time to launch it. For example, a product team may be in the middle of fixing a major bug and not ready for a new idea. Conversely, an upcoming related communication activity (e.g., blog post) could be an opportune time for a release with cross promotion.

Look at other recent data products: When were they released and how were they received? Stakeholders can feel inundated with too many new dashboards and models and this may even contribute to “analysis paralysis.”

5. Know your audience

If your champions are not happy, your product can lose its luster in a Snap. Developing positive working relationships with your champions and users is important for the early and long-term success of your data product.

Identify and reach your audience — those who will be using what you’ve made and can benefit from it. With this target audience in mind, comment on tickets, post on Slack, chat, send emails to relevant groups, or go directly to talk to your audience.

Use your audience’s preferred channels to communicate development progress, releases, and feedback. Establishing this communication will build early confidence in your data product. As iteration requests come in, you will have the opportunity to build this confidence with thoughtful acknowledgement of requests.

In 2017, Indeed’s Data Science Platform team — software engineers who built a machine learning deployment framework — went on a roadshow to Indeed’s multiple tech offices to share the data science platform framework. This was a great example of engaging with an audience across offices.

6. Go live!

Only you can see the picture in your mind of how your product works. Demoing is a powerful way to communicate what your new data product does. A great way to do this is to get a minimum viable data product, a prototype, out to your champions early.

Examples include creating a working application with minimal data, sketching a mockup of a dashboard, or taking screenshots. See more examples of consumer products on Forbes. As a demo to explain a sales lead qualification machine learning model to the Sales organization, the product science team built a simple interactive web app that returned the model results when a user changed the value of the model features with sliders.

7. Own the results

“It’s not that I’m so smart, it’s just that I stay with problems longer.” – Albert Einstein

You may love the theoretical foundation and implementation of your data product, but ultimately its success comes down to the user. Long-term marketing and user retention depend on how well you can ensure reliability. Reliability is key to building your data product’s brand, your reputation, and your technical credibility, and it affects the marketing of your other current and future data products as well. This doesn’t mean perfection; it often just means dealing with problems quickly, fully, and transparently.

Monitor key metrics of your data product to see how it’s working and what its impact is. Actively seek and be responsive to feedback. Evaluate if your data product is achieving its intended objectives and determine if features can be improved to better suit your audience.
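Usage is often the first metric worth tracking for an internal tool. The minimal sketch below assumes a hypothetical usage log of (user, date) events, such as page views of a notebook app, and computes weekly active users and week-over-week retention; the log format and names are illustrative, not an Indeed system.

```python
from collections import defaultdict
from datetime import date

def weekly_active_users(events):
    """Map (ISO year, ISO week) -> set of distinct users active that week."""
    weeks = defaultdict(set)
    for user, day in events:
        iso_year, iso_week, _ = day.isocalendar()
        weeks[(iso_year, iso_week)].add(user)
    return weeks

def wau_counts(weeks):
    """Number of weekly active users, in week order."""
    return {wk: len(users) for wk, users in sorted(weeks.items())}

def retention(weeks, this_week, next_week):
    """Share of this week's users who came back the following week."""
    active = weeks.get(this_week, set())
    if not active:
        return 0.0
    return len(active & weeks.get(next_week, set())) / len(active)

# Hypothetical usage log
events = [("ana", date(2018, 3, 5)), ("ben", date(2018, 3, 6)), ("ana", date(2018, 3, 7)),
          ("ana", date(2018, 3, 12)), ("cy", date(2018, 3, 13))]
weeks = weekly_active_users(events)
print(wau_counts(weeks))                        # {(2018, 10): 2, (2018, 11): 2}
print(retention(weeks, (2018, 10), (2018, 11)))  # 0.5
```

Flat weekly active users can hide churn: here the count stays at 2, but only half of the first week’s users came back.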

If you are not achieving impact or the tool is not being used, revisit your initial assumptions about the problem you thought you were solving. Then, talk to your users (and non-users) about what might not be working. Be willing to destroy and start again, and create something even better with a new perspective. The initiative to iterate and improve your data product tools requires persistence but will raise the quality of your data products and enhance the rest of your marketing efforts.

Final thoughts

Teams outside the analytics community depend on your marketing efforts to learn about your data products that can make them and the company more effective. You don’t have to wait until the product is finished to start letting other teams know about the product. The marketing can start with documentation, champion identification, and outreach as soon as initial requirements are being gathered.

That being said, for data science, building a quality data product takes priority over marketing it, so be selective about what you market. Your credibility as a data scientist is essential for people to trust your data-driven recommendations and act on them. Make sure you’re investing it wisely.

If you are passionate about both developing great data products and making sure your data products have impact, check out product science and data science at Indeed!
