Indeed adds over 30 million jobs online every month, which helps connect 250 million job seekers to prospective employers. How do we keep our services available, fast, and scalable? That’s the ongoing challenge for our site reliability engineering (SRE) team.
What is SRE?
The idea behind SRE is simple: The team ensures that a company’s core infrastructure works effectively. SRE originated in 2003 when Google formed a small production engineering team to address reliability issues. Its initial focus was on-call support, monitoring, release pipelines, and other operations work. The team established service-level indicators and objectives (SLIs and SLOs) to improve infrastructure across the company. Other companies took note, and SRE soon became an industry standard.
SRE is distinct from other engineering roles. Team members work across business areas to ensure that services built by software engineering (SWE) teams remain scalable, performant, and resilient. Working with platform teams, SRE helps manage and monitor infrastructure like Kubernetes. SRE teams build frameworks to automate processes for operations teams. They might also develop applications to handle DNS, load balancing, and service connections for network engineering teams.
These functions are crucial for any company competing in today’s tech world. However, because of the vast range of technologies and methods available, each SRE team takes a different approach.
SRE at Indeed
At Indeed, we established an SRE team in 2017 to increase attention on reliability goals and optimize value delivery for product development teams. Our SRE team uses an embedded model, where each team member works with a specific organization. They code custom solutions to automate critical processes and reduce toil for engineers.
Indeed SRE focuses on these key goals:
Promote reliability best practices. SRE helps product teams adopt and iterate on reliability practices such as SLIs, SLOs, and error budget policies. They promote an Infrastructure as Code (IaC) model: they write code to automate the management of data centers, SLOs, and other assets. They also drive important initiatives to improve reliability and velocity, like Indeed’s effort to migrate products to AWS.
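SLOs and error budgets come down to simple arithmetic. The Python sketch below assumes a request-based availability SLI and shows how an error budget, and a simple policy built on it, might be computed. `SloWindow`, `release_decision`, and the 99.9% target are hypothetical illustrations, not Indeed’s tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts for one SLO window (a hypothetical structure for illustration)."""

    target: float         # SLO target, e.g. 0.999 means "99.9% of requests are good"
    total_requests: int   # all requests observed in the window
    failed_requests: int  # requests that failed the SLI (errors, timeouts, ...)

    def sli(self) -> float:
        """Availability SLI: the fraction of requests that were good."""
        if self.total_requests == 0:
            return 1.0  # no traffic, so nothing failed
        return 1 - self.failed_requests / self.total_requests

    def error_budget(self) -> float:
        """How many failed requests the SLO tolerates in this window."""
        return (1 - self.target) * self.total_requests

    def budget_remaining(self) -> float:
        """Fraction of the error budget left; 0 or below means it is exhausted."""
        if self.total_requests == 0:
            return 1.0  # no traffic yet, so none of the budget is spent
        budget = self.error_budget()
        if budget <= 0:
            return 0.0  # a 100% target leaves no budget at all
        return (budget - self.failed_requests) / budget


def release_decision(window: SloWindow) -> str:
    """Hypothetical error budget policy: pause risky releases once the budget is spent."""
    if window.budget_remaining() <= 0:
        return "freeze"  # prioritize reliability work until the budget recovers
    return "ship"        # budget left: keep the normal release cadence


if __name__ == "__main__":
    week = SloWindow(target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"SLI: {week.sli():.5f}")
    print(f"Error budget: {week.error_budget():.0f} failed requests")
    print(f"Budget remaining: {week.budget_remaining():.0%}")
    print(f"Release decision: {release_decision(week)}")
```

With 250 failures against a budget of about 1,000 failed requests, roughly three quarters of the budget remains, so this hypothetical policy keeps shipping. Real error budget policies are agreed on with product teams and usually weigh more than a single threshold.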
Drive the creation of reliability roadmaps. At Indeed, the SRE team spends more than 50% of their time on strategic work for roadmaps. They analyze infrastructure to define how and when to adopt new practices, re-architect systems, switch to new technologies, or build new tools. Once product teams approve these proposals, SRE helps design and implement the necessary code changes.
Strive for operational excellence. SRE works with product teams to identify operational challenges and build more efficient tools. They also guide the process of responding to and learning from critical incidents, adding depth to individual team retrospectives. Their expertise in incident analysis helps them identify patterns and speed up improvements across the company.
Who works in Indeed SRE?
Our SRE team is diverse and global. We asked a few team members to talk about how they arrived at Indeed SRE.
Ted, Staff SRE
I love programming. Coming from a computer science background, I started my career as a software engineer. As I progressed in my role, I became interested in certain infrastructure-related challenges. How can we move a system to the cloud while minimizing costs? How do we scale a legacy service across several machines? What metrics should we collect, and how frequently, to tell whether a service works as intended?
Later, I discovered that these questions are at the intersection of SWE and SRE. Without realizing it, I had implemented SRE methodology in every company I’d worked for! I decided to apply at Indeed, a company with an established SRE culture where I could learn—not only teach.
Working for Indeed SRE gives me more freedom to select my focus than working as a SWE. I can pick from a range of tasks: managing major outages, building internal tools, improving reliability and scalability, cleaning up deprecated infrastructure, and migrating systems to new platforms. My work also has a broad impact. I can improve scalability for 20+ repositories in different programming languages in one go, or migrate them to a new environment in a week. SRE has given me deeper knowledge of how services, from container orchestration tools to front-end applications, are physically managed, which makes me a better engineer.
Jessica, Senior SRE
Before joining Indeed SRE, I tried many roles, from QA to full-stack web developer to back-end engineer. Over time, I realized that I liked being able to fix issues that I identify. I wanted to communicate and empathize with the customer instead of being part of a feature factory. Those interests led me to explore work in operations, infrastructure, and reliability. That’s when I decided on SRE.
Now I support a team that works on a set of role-based access control (RBAC) services for our clients. All our employer-facing services use this RBAC solution to determine whether a particular user is authorized to perform an action. Disruptions can delay our clients’ hiring processes, so we have to make sure those services get fast, consistent responses.
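At its core, a role-based access control check maps users to roles and roles to permissions. The minimal Python sketch below illustrates that idea, assuming hypothetical in-memory `ROLE_PERMISSIONS` and `USER_ROLES` tables and made-up action names. It is not Indeed’s RBAC service; a production system would typically load policies from a data store and cache them to keep responses fast.

```python
# Hypothetical role -> permissions table; a real service would load this from a policy store.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "admin": frozenset({"job.post", "job.edit", "candidate.view", "user.invite"}),
    "recruiter": frozenset({"job.post", "job.edit", "candidate.view"}),
    "viewer": frozenset({"candidate.view"}),
}

# Hypothetical user -> roles assignments.
USER_ROLES: dict[str, frozenset[str]] = {
    "alice": frozenset({"recruiter"}),
    "bob": frozenset({"viewer"}),
}


def is_authorized(user: str, action: str) -> bool:
    """Return True if any of the user's roles grants the action.

    Unknown users and unknown roles get no permissions (default deny).
    """
    roles = USER_ROLES.get(user, frozenset())
    return any(action in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)


if __name__ == "__main__":
    print(is_authorized("alice", "job.post"))
    print(is_authorized("bob", "job.post"))
```

Denying by default means a missing user or role fails closed, which is usually the safer choice for an authorization service.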
The best thing about being on the SRE team is working with a lot of very talented engineers. Together, we solve hard problems that software engineers aren’t often exposed to. The information transfer is amazing, and I get to help.
Xiaoyun, Senior SRE Manager
I joined Indeed in 2015 as a SWE and later became a SWE manager. At first I worked on product features, but gradually my passion shifted toward performance work. I started improving the performance of services, e.g., making cron jobs run in minutes instead of hours. This led me to explore tools for streaming process logs and database technology for improving query latency.
Then I learned about SRE opportunities at Indeed that focused on those subjects. I was attracted to the breadth and depth offered by SRE. Since joining, I have worked with a range of technologies, services, and infrastructure across Indeed. At the same time, I’ve had the opportunity to dive deep into technologies like Kafka and Hadoop. My team has diagnosed and solved issues in several complex AWS managed services.
Indeed also encourages SRE to write reliability-focused code. This makes my background useful: I enjoy using my SWE skills to solve these kinds of challenges.
Yusuke, Staff SRE
I joined Indeed in 2018 as a new university graduate. In school, I studied computer science and did a lot of coding. I learned a range of technologies, from infrastructure to web front ends and mobile apps. Eventually I decided to start my career in SRE, which I felt utilized my broad skill set better than a SWE role would.
I started on a back-end team that builds the platform that enables job search at Indeed. To begin, we defined SLIs and SLOs, set up monitors for them, and established a regular capacity planning process. Soon we were re-architecting the job processing system for better reliability and performance. We improved the deployment process with more resilient tooling. I helped adopt cloud-native technologies and migrate applications to the cloud. To track and share investigation notes, we also started building an internal knowledge base tool.
I enjoy Indeed SRE because I can flex different skills. Given the nature and scale of the systems we support, I get to apply my expertise in coding, technologies, and infrastructure. SRE members with different backgrounds are always helping each other solve problems.
Building your SRE career
Develop a broad skill set
SRE works with a variety of systems, so it’s important to diversify your technical skills. Besides SWE skills, you’ll need an understanding of the underlying infrastructure. A passion for learning and explaining new technologies is helpful when making broader policy and tool recommendations.
Focus on the wider organization
SRE takes a holistic view of reliability practices and core systems. When working with shared infrastructure, your decisions can affect systems across the company. To prioritize changes, you need to understand how others are using those systems and why. Working across different teams is a positive way to achieve personal and professional growth, and it advances your SRE journey.
Join us at Indeed
If you’re a software engineer, pivoting to SRE gives you exposure to the full stack of technologies that enable a service to run. If you’re currently doing operational work (in SRE or elsewhere), Indeed’s broad approach can add variety to your workload. Each team we work with has its own set of reliability challenges. You’ll be able to pick projects that interest you.
Indeed SRE also provides opportunities to grow. Our SRE culture is well established and always expanding. You’ll work with SWE and other roles, learning from each other along the way.
If you’re interested in challenging work that expands your horizons, browse our open positions today.