Open Source at Indeed: Sponsoring the Open Source Initiative

At Indeed, we’re committed to taking a more active role in the open source community. In 2018, we joined the Cloud Native Computing Foundation and began sponsoring the Python Software Foundation, the Apache Software Foundation, and Outreachy. We also began sponsoring the Open Source Initiative (OSI), and we are pleased to renew our sponsorship for 2019.

The Open Source Initiative was formed in 1998 to hold the definition of what open source means. The OSI provides the final word when it comes to open source licensing, ensuring a common understanding of which software licenses do and do not adhere to the core open source principles. For more than 20 years, the OSI has done critical work in education and advocacy. We’re excited to be on their list of sponsors as they embark on the next 20 years of their mission.

“Indeed’s active engagement with open source communities highlights that open source software is now fundamental, not only for businesses, but developers as well,” says Patrick Masson, General Manager at the OSI. “Like most companies today, Indeed is a user of and contributor to open source software, and interestingly, Indeed’s research of resumes shows developers are too, as job seekers highlight open source skills and experience to win today’s most sought after jobs across technology.”

As we continue to take a more active role in the open source community, Indeed will seek out additional partnerships, sponsorships, and memberships.


For updates on Indeed’s open source projects, visit our open source site. If you’re interested in open source roles at Indeed, visit our hiring page.

Open Source at Indeed is Sponsoring the Open Source Initiative—cross-posted on Medium.

Build and Learn: Accelerating From New Grad to Experienced Data Scientist

A data scientist’s perspective on Indeed’s onboarding program

What if — on your first day at a new job — you were given three months to build a product that helps people find jobs? (Something Indeed has spent 14 years developing!)

This is what I experienced when I attended Indeed University — Indeed’s onboarding program. At first I felt terrified. How could I contribute as a data scientist? Building a software product, especially during the early stages of development, might not yield enough data for most data science work.

Indeed expects full stack data scientists to help with the entire data science process, from collecting data and performing analysis to deploying models in production. But writing React code and participating in discussions about how to build the logging infrastructure — expectations at Indeed U — is a whole different story.

Fortunately I was able to survive the program. In the process, I gained valuable experience that otherwise I might never have had the chance to acquire.

What is Indeed University?

Indeed University (IU) is a three-month program for new Indeed employees who are fresh out of college. At the same time that IU gets new hires up to speed, it incubates new and innovative products for the company. Employees come from diverse disciplines, including Engineering, Product, Software Reliability Engineering, Data/Product Science, and Online Marketing.

Group of coworkers posing for a photo

Indeed University Participants and Leads, Seattle 2018

At the start of IU, anyone can pitch their new ideas. All ideas are welcome, as long as they aim to solve real problems for job seekers or employers. People then form teams based on the problems they most want to solve. Diverse teams consist of 3–5 new employees and senior employees as team leads. With a shared vision and real marketing dollars (to access tens of millions of real users), the group builds and tests a new product.

Products that prove their value can continue and those who’ve launched them have the chance to create a formal product team.

What did I help build?

Our team included two software engineers and me, a data scientist. Together, we built a product for job seekers facing career transitions.

field selection screen

Our objective was to help these job seekers identify their next potential field. Our product asked users to name their current field, and then recommended new fields that were most relevant.

Users were also provided skill requirements, salaries, the percentage of job seekers who have made similar transitions, and related information.

What role did I play on the team?

While being embedded in Indeed’s product teams means data scientists have opportunities to impact the team’s product decisions, new data scientists don’t usually start leading discussions on designing a product’s framework — or deciding the product’s next big initiatives. We can more typically expect responsibilities like exploratory data analysis, building models and deploying them.

In IU, the roles are a lot more flexible. As I anticipated, I designed A/B tests, did test analysis, and helped the team make data-driven decisions. I also acted as product manager, marketing analyst, UX researcher, and part-time front-end engineer. As a product manager, I was responsible for defining and tracking product metrics, and prioritizing work within the team. As a marketing analyst, I owned the marketing campaigns of our product on Google and Facebook as well as Indeed’s internal ads system. I designed the ads, budgeted our spending and made sure that we used our budget on the most effective channels. As a UX researcher, I created and launched surveys to get user feedback on our product. At times I even went out of the office and interviewed people.

Why participate in IU as data scientists?

Having data scientists at IU brings value to everyone involved. Data scientists have unique experience to offer, and IU gives us valuable first-hand knowledge that can be hard to gain elsewhere.

1. Observing data-driven decision making in action

At Indeed, we want all our product decisions to be backed up by data. Through IU, I got a sense of what it means to “A/B test everything.” As my team’s IU product rapidly iterated, we were constantly faced with the question: “What feature should we add to our product next?” The easy answer is “whatever we liked most,” but the correct answer is to “prioritize based on effort and do A/B tests”! We should identify which features give the largest potential impact for the least amount of effort. Only those features that show impact in tests should be kept in production.

Rigorous A/B tests require a lot of data science effort, such as defining success metrics, defining A/B test requirements, and doing A/B test analysis. As our product evolved and our user base grew from the data-driven decisions we made, I saw how building a solid product takes engineering effort AND scientific effort.
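As a rough illustration of the "A/B test analysis" step, here is a minimal sketch of a two-proportion z-test on conversion counts. It is not the analysis our team actually ran. The function name, the choice of a pooled z-test, and the visitor and conversion numbers are all assumptions made for the example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and visitors in the control group.
    conv_b / n_b: conversions and visitors in the test group.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("conversion rate is 0% or 100% in both groups")
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical numbers: 4% conversion in control, 5% with the new feature.
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be noise. Deciding the success metric and the required sample size before launching the test is the part that takes the most care.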

2. Learning how Indeed.com works

Even though our product attracted more than 20K users, we ended up not continuing with it because it was not performing as well as Indeed’s job search. We wanted to keep adding new features and providing as much information as possible to the users, thinking that more is always better. What we found out was that with more features comes a more complicated product. This inevitably means more users lose interest, because they are navigating through an increasingly intricate system.

We really learned to appreciate Indeed’s simple yet effective what+where job search interface. It turns out that Indeed really knows how to do its job well! As a more general rule, we found it is often more effective to focus on one feature and make it shine as opposed to building a variety of different features.

3. Learning about the big picture

Data scientists’ work often starts with gathering data. Sometimes we might not get to look closely at what is behind all the data. How is it stored? Where does it come from? What architecture is in place for the data to be readily available? Building a product from scratch gives data scientists a chance to view the design and development process from a more holistic level. We are thus able to think about data science questions that derive from this process from a more critical point of view.

4. Building empathy

As data scientists, a lot of our work involves effective communication and collaboration with software developers, product managers, and other members of the team. Having been in their shoes, I have a much better understanding of what their work is like, and how a data scientist can make everyone else’s work easier.

5. Having fun!

Lastly, we got to have lots of fun! You might spend some late nights in the office — but this hard work is often accompanied by a variety of fun activities. No matter which city hosts IU, you have the opportunity to familiarize yourself with the area. IU schedules all kinds of activities for the teams throughout the entire program. Cruises, room escapes, fancy dinners, Go-Karts, VR challenges… you name it!

Just as in formal academic universities, in IU you meet and build close relationships with a group of people from all over the world. You see your ideas transform into real products that benefit real users. You go out of your comfort zone and practice skills that are outside of your expertise. If any of these sound interesting to you, check out our open positions for Data Scientist and Product Scientist at Indeed!


From New Grad to Experienced Data Scientist—cross-posted on Medium.

Where Do Data Scientists Come From?

Our previous article in this series on Data Science titles made the case that there’s no such thing as a data scientist — instead, the phrase “data scientist” has come to represent a number of distinct roles. So in addition to their different skills and job duties, we’d like to know who data scientists are and what backgrounds they come from.

In this article, we dig into the resume data of practicing data scientists, and discover that data scientists come from a wide variety of fields of study, levels of education, and prior jobs. We also explore what this data can tell us about the similarities and differences in the roles of data scientists, analysts, engineers, and software and machine learning engineers.

Who are data scientists?

If you ask every data scientist around you what they did before data science, they’re each likely to give you a different answer. Many have master’s and PhD degrees in fields ranging from astrophysics to zoology. Others come from the many new data science graduate programs that universities now offer. And still others come from technology roles, such as software engineering or data analysis.

At Indeed, we help people get jobs. One way we do this is by letting job seekers submit resumes so employers can find a perfect match. Our datasets contain tens of thousands of resumes from current and former data scientists. We can use this resume data to gain some insight into where data scientists come from.

Does educational background matter?

Highest degree achieved

First, we took a look at the highest degree achieved by those who hold the title of “data scientist” or a related title¹.

Stacked bar chart detailing highest level of education achieved by job title. Further description below.

We’ve chosen the job titles of data engineer, data analyst, software engineer, machine learning engineer, and data scientist², as these reflect some of the distinct roles we found in our previous articles.
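The bucketing of related titles described in footnote 2 could be sketched with simple keyword rules. This is only an illustration: the rule list, the `bucket_title` name, and the "Other" fallback are hypothetical, and only the two mappings mentioned in the footnote ("Senior Data Scientist" and "C++ Programmer") come from this article.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
BUCKET_RULES = [
    ("Machine Learning Engineer", r"machine learning|\bml\b"),
    ("Data Scientist", r"data scientist"),
    ("Data Engineer", r"data engineer"),
    ("Data Analyst", r"data analyst"),
    ("Software Engineer", r"software|programmer|developer"),
]


def bucket_title(raw_title):
    """Map a free-text job title to one of the five categories, or 'Other'."""
    text = raw_title.lower()
    for bucket, pattern in BUCKET_RULES:
        if re.search(pattern, text):
            return bucket
    return "Other"


for title in ["Senior Data Scientist", "C++ Programmer", "Caterer"]:
    print(title, "->", bucket_title(title))
```

Rule order matters in a sketch like this: the more specific categories are checked first so that a broad keyword does not claim a title that belongs elsewhere.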

Data Scientists

Data scientists have the highest average education level of any of the job titles we examined.

  • Data scientists have more PhDs than any of the other job titles. However, a PhD is not required for becoming a data scientist; only 20% of data scientists have them.
  • Advanced degrees (master’s or PhD) are held by 75% of data scientists.
  • Less than 5% of data scientists have only a high school diploma or associate’s degree.

Machine Learning, Data, and Software Engineers

Software and data engineers have more bachelor’s degrees than advanced degrees, while machine learning engineers are more likely to hold advanced degrees.

  • Machine learning engineers have a similar distribution of education levels to data scientists, but are about 30% less likely to hold a PhD. These results seem roughly in line with a similar study by Stitch Data.
  • Engineering-focused roles tend to favor bachelor’s degrees with some master’s degrees, but very few (<5%) PhDs.
  • One in four data engineers has a high school diploma or associate’s degree as their highest level of education.

Data Analysts

Data analysts have a very different distribution of degrees than data scientists, and more closely resemble software engineers in their levels of academic achievement³.

  • Data scientists have PhDs at almost 10 times the rate of data analysts, and are twice as likely to hold a graduate degree.
  • As we’ll see later, this may be due in part to an emerging pattern of software engineers transitioning into data analysis.
  • This could also mean that PhDs are being treated as relevant work experience by employers, who may be seeing data scientists as having more senior roles. Or perhaps the training in a master’s or PhD program uniquely prepares individuals for research-oriented data science work.

Field of study

Looking at the distribution of fields of study between job titles reveals some intriguing results.

Stacked bar chart detailing degree field of study by job title. Further description below.

The “data scientist” job title exhibits the most diversity in field of study of any of the titles we looked at, and no one field seems to dominate. We can quantify the diversity by calculating the Gini impurity of each job title: the probability that two people drawn at random from that title studied different fields.

Gini Impurity (Larger means more diverse fields of study)

  • Data Scientist — 85%
  • Machine Learning Engineer — 73%
  • Software Engineer — 53%
  • Data Analyst — 78%
  • Data Engineer — 79%
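For readers unfamiliar with the measure, Gini impurity is one minus the sum of the squared shares of each category. The sketch below shows the calculation. The two sample lists are invented for illustration and are not Indeed's resume data.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i ** 2), where p_i is the share of each distinct label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one label")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical fields of study for two made-up groups of ten people.
concentrated = ["computer science"] * 8 + ["mathematics"] * 2
spread_out = ["computer science", "mathematics", "physics",
              "economics", "biology"] * 2

print(f"concentrated: {gini_impurity(concentrated):.0%}")
print(f"spread out:   {gini_impurity(spread_out):.0%}")
```

A group in which everyone studied the same field scores 0%, and the score rises toward 100% as people spread evenly across more fields.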

Data Scientists

Data scientists clearly have the most diverse fields of study among the job titles we’ve looked at, while software engineers have the least diverse educational backgrounds. While the social sciences are somewhat under-represented in the data science population, they still make up about 5% of data scientists. Data science majors make up a slightly larger portion of data scientists (9%), which is somewhat surprising given how new most university data science programs are.

Machine Learning Engineers

Our data also shows a pronounced distinction between data scientists and machine learning engineers. Over 60% of machine learning engineers come from a computer science or engineering background, and are almost twice as likely to be from these backgrounds as someone holding the title of “data scientist.” There were effectively no social scientists with the title of “machine learning engineer” in our sample.

Software Engineers

Software engineers are — unsurprisingly — even more heavily focused on computer science and engineering majors. It’s been proposed that machine learning engineers are a merger between software engineers and data scientists. Our data appears to support this assertion.

Data Analysts

Like data scientists, data analysts seem to come from a diverse educational background. They differ from data scientists in that they are more often business, economics, and social science majors, and less often have mathematics, statistics, and natural science degrees. It’s also interesting to note that those with data science degrees represent more of the data scientist population than the analyst population.

Data Engineers

Data engineers show a field of study distribution that is somewhere between data scientists and machine learning engineers. However, as noted above, many data engineers don’t have any degree beyond a high school diploma!

Which jobs do data scientists hold prior to data science?

Unsurprisingly, many individuals (approximately 25% of our sample) held the same title in their previous role as in their current one.

Stacked bar chart detailing prior job title by current job title. Further description below.

This is especially true of software engineers, who are very likely (71%) to have held a software engineering role previously. This is probably due to the relative maturity of the field of software engineering as opposed to data science, which didn’t even have its own title until fairly recently.

“Academic” here means actually being employed by a university, or as a researcher in an academic environment. Graduate students in particular are likely to have held such positions, and we see that the most graduate-degree heavy fields (data science, machine learning engineer, data analyst) have the most transitions from academia.

Perhaps a more interesting question is, what was the last different job title that data scientists held?

Stacked bar chart detailing prior job transition by current job title. Further description below.

Here we see some interesting patterns: data scientists, machine learning engineers, and software engineers are more likely to start straight out of academia. Many of the “other” previous jobs are unrelated, such as catering, tutoring, store clerks, and other positions people can often hold while completing their degrees.
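The idea of the "last different job title" can be made concrete with a short sketch. It assumes each resume has already been reduced to a list of bucketed titles ordered from oldest to newest, with the current role last. The function name and the sample histories are hypothetical.

```python
def last_different_title(history):
    """Return the most recent earlier title that differs from the current one.

    history: job titles ordered oldest to newest; the last item is the
    current role. Returns None when there is no earlier, different title.
    """
    if not history:
        return None
    current = history[-1]
    # Walk backwards through earlier roles, skipping repeats of the current title.
    for title in reversed(history[:-1]):
        if title != current:
            return title
    return None


# Hypothetical job histories.
print(last_different_title(
    ["Tutor", "Data Analyst", "Data Scientist", "Data Scientist"]))
print(last_different_title(["Software Engineer", "Software Engineer"]))
```

Counting the resulting (previous title, current title) pairs across all resumes would give the kind of transition tallies that the charts in this section summarize.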

Many roles transition into data scientists or machine learning engineers, but rarely do we see data scientists and machine learning engineers transitioning into any of the other roles. This is likely due in part to the relative sizes of the fields, the infancy of the “data scientist” and “machine learning engineer” titles, and the recent growth in popularity of those titles. However, I believe we are also observing an interesting phenomenon that speaks to how individuals are moving between and progressing⁴ through each role.

This chord diagram illustrates the main transitions we see between these roles. The color of the chord indicates which role people are transitioning from.

Chord diagram detailing employment transitions. Further description below.

Software engineers make up a big slice of the pie. Many transition to analyst roles, while others hop straight to data science.

Data science is equally fed by academia, analysts, and software engineers. Software engineers are far more likely to hop into a data analyst role, although this is in part due to the larger number of analyst roles than data scientist roles.

Again, we see few individuals leaving data science at this moment. It’s unclear if this pattern will change in the future. The key takeaway here is that the data science field is fed by a wide variety of backgrounds, and it is relatively common to see software engineers become data analysts, and data analysts become data scientists. This may represent a viable path for anyone looking to transition out of a software engineering role.

Transitions into data engineering come almost exclusively from software engineering⁵.

Conclusion

Where do data scientists come from? Everywhere! Although the field is predominantly populated by individuals with master’s and PhD degrees, there are still plenty of individuals with bachelor’s degrees (26%) in the role. No field of study seems to dominate data science at this time; instead, we see a great diversity in backgrounds for data scientists, especially compared to fields like software engineering. In addition, we see a large number of individuals moving from other tech roles, such as software engineering and data analytics, into data science.

While machine learning engineers reflect data scientists in their levels of academic achievement, they seem to be more heavily focused in engineering backgrounds, and are more likely to have transitioned from a software engineer role. Data engineers also have more of an engineering focus, but tend to have lower levels of degree achievement when compared to the other roles in this study.

What does this mean for data science job seekers?

Graduate school is still the dominant way data scientists get into the field. Data science degrees have a growing presence, and now appear to be a somewhat common way to gain entry into the field. Any field of study seems viable if one has obtained an advanced degree. If you’re in a graduate program now, there’s almost certainly someone in your field of study working in data science. I suggest you reach out to them and find out how they made the leap!

Software engineers and data analysts seem to transition into data science roles quite regularly, and represent substantial portions of new data scientists. Future job seekers should consider these routes as well.

What does this mean for employers looking for data scientists?

If you’re looking for a generalist data scientist, don’t throw out a resume just because the field or degree isn’t what you expect. Data scientists are diverse in their education and background. Although most have an advanced degree in some field, there is no one field that dominates the job market.

If you’re having difficulty hiring experienced data scientists or scientists out of academia, consider bringing in individuals from software engineering or data analyst roles, as that is clearly a common pathway to data science.

Also, as we’ll discuss in a later article, make sure you know the role you’re actually hiring for. Do you think you need a data scientist, but feel your role is heavier on engineering? Consider introducing a “machine learning engineer” role. Do you think you need a data scientist, but with more focus on a business background? Consider hiring an analyst. Do you need someone with a focus on database and infrastructure skills? Consider a data engineer, and don’t focus as much on their educational background.

Finally, if you think you do need generalist data scientists for your team, consider looking for a variety of educational backgrounds. At Indeed, the members of our data science and product science teams span a wide range of fields, including astronomy, sociology, biology, mathematics, economics, and business. Having a diverse data science team, both in demographics and in field of study, is essential for doing great work⁶ ⁷.


Footnotes

¹Note that there is almost certainly a bias here, in that we’re looking at the resumes of job seekers who have already added “data scientist” to their resume. This means we’re going to be looking at individuals who have likely already been in the field for several years, and may not be entirely representative of more recent trends.

²For each job title, we’ve bucketed related job titles as well, e.g. “Senior Data Scientist” will be in the Data Scientist category, and “C++ Programmer” will be in the Software Engineer category.

³Paula Leonova has a good, data-driven discussion of the difference between data science and data analyst roles.

⁴To be absolutely clear, I do not mean to imply a hierarchy of roles. Many software engineering roles, for example, are far more senior than many data scientist roles. I am simply referring to the directional pattern that seems to be emerging.

⁵Stitch Data did a nice breakdown of data engineering roles, and also noted the major overlap with software engineering.

⁶For more information on the importance of diversity in the workplace, see also The Difference by Scott E. Page; “Why diversity matters” by Hunt, Layton, and Prince; and “Evidence for a Collective Intelligence Factor in the Performance of Human Groups” by Woolley et al.

⁷It is not my intention to conflate “diversity in field of study” with broader diversity topics. I strongly believe diversity in all dimensions is essential for doing great work and creating a better society, and it will take far more than focusing on degree of study to overcome the overwhelming lack of diversity in tech workers in the US right now. As argued by an article from Stitch, Data Science does not appear to be doing any better than engineering roles in many aspects of diversity.


Where Do Data Scientists Come From? cross-posted on Medium.