Indeed’s rapid growth has presented us with many challenges, especially to our release process. Our largely manual process did not scale and became a bottleneck. We decided to develop a custom solution. The lessons we learned in automating our process can be applied to any rapidly growing organization that wants to maintain software quality and developer goodwill.
How did we end up here?
Our software release process has four main goals:
- Understand which features are being released
- Understand cross-product and cross-team dependencies
- Quickly fix bugs in release candidates
- Record release details for tracking, analysis, and repeatability
Our process ended up looking like this:
This process was comprehensive but labor-intensive, with as many as 40 possible workflow states. To put that in perspective, a software release with 4 new features required over 100 clicks and Git actions, and each additional feature added about 13 more.
We identified four primary problems:
- Release management took a lot of time.
- It was hard to understand what exactly was in a release.
- There was a lot of potential for error through so many manual steps.
- Only senior engineers knew enough to handle a release.
We came to a realization: we needed more automation.
But wait — why not just simplify?
Of course, rather than automating our process, we could just simplify it. However, our process provided secondary benefits that we did not want to lose:
- Data. Our process provided us with a lot of data and metrics, which allowed us to make continual improvements.
- History. Our process allowed us to keep track of what was released and when it was released.
- Transparency. Our process, while complicated, allowed us to examine each step.
Automating our way out
We realized that we could automate much of our process and reduce our overhead. To do so, we would need to integrate better with the solutions we already had in place — and be smart about it.
Our process uses multiple systems:
- Atlassian JIRA: issue management and tracking
- Atlassian Crucible: code reviews
- Jenkins: release candidate builds and deploys
- GitLab: source control
- Various build and dependency management tools
Rather than replace these tools, we decided to create a unified release system that could communicate with each of them. We called this unified release system Control Tower.
Slideshow of Control Tower features
Integration with dependency management tools allows release managers (RMs) to track new code coming in through library updates. RMs can quickly assess code interdependencies and see the progress of changes in a release. Finally, when an RM has checked everything, they can trigger a build through Jenkins.
The Control Tower main view allows RMs to see details from all the relevant systems. Changes are organized by JIRA issue key, and each change item includes links to Crucible code review information and Git repo locations.
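To make the idea of organizing changes by issue key concrete, here is a minimal sketch in Python. It is not Control Tower's actual code: the `Commit` fields, the `UNTRACKED` bucket, and the assumption that issue keys appear in commit messages are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Common JIRA key shape: project prefix, hyphen, number (for example "ABC-123").
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


@dataclass
class Commit:
    sha: str
    message: str
    repo_url: str    # hypothetical: where the change lives in Git
    review_url: str  # hypothetical: link to the code review


def group_by_issue(commits):
    """Map each issue key to the commits whose messages mention it."""
    grouped = defaultdict(list)
    for commit in commits:
        # Commits with no recognizable key go into a catch-all bucket
        # so that nothing silently disappears from the release view.
        keys = ISSUE_KEY.findall(commit.message) or ["UNTRACKED"]
        for key in dict.fromkeys(keys):  # de-duplicate, keep first-seen order
            grouped[key].append(commit)
    return dict(grouped)
```

A release manager's view could then iterate over the resulting dictionary and render one row per issue key, with the review and repository links of each commit underneath it.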
By automating, we significantly reduced the amount of human interaction necessary in our release process. In the following image, every grey box represents a manual step that was eliminated.
After automating, we reduced the number of required clicks and Git actions from over 100 to fewer than 15. New features now add no extra work, where each one previously required about 13 extra actions.
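The effect of these numbers is easier to see as a small model. The sketch below assumes the manual cost grew linearly with the number of features, treats "over 100" as exactly 100 and "fewer than 15" as exactly 15, and derives a fixed overhead from those figures. The linear shape and the derived overhead are assumptions for illustration, not measurements.

```python
ACTIONS_PER_FEATURE = 13   # "about 13 actions" per feature, from the text
MANUAL_TOTAL_AT_4 = 100    # "over 100" for 4 features, used here as a lower bound
AUTOMATED_TOTAL = 15       # "fewer than 15", used here as an upper bound

# Fixed overhead implied by the two manual figures (an estimate, not a measurement).
MANUAL_FIXED = MANUAL_TOTAL_AT_4 - 4 * ACTIONS_PER_FEATURE


def manual_actions(features: int) -> int:
    """Estimated clicks and Git actions under the old manual process."""
    return MANUAL_FIXED + ACTIONS_PER_FEATURE * features


def automated_actions(features: int) -> int:
    """After automation, extra features no longer add manual actions."""
    return AUTOMATED_TOTAL


for n in (1, 4, 8):
    print(f"{n} features: manual ~{manual_actions(n)}, automated <= {automated_actions(n)}")
```

Under these assumptions the gap widens with release size: the manual estimate keeps climbing by 13 per feature while the automated figure stays flat.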
To learn even more about Control Tower, see our Indeed Engineering tech talk. We talk about Control Tower starting at 32:45.
In the process of creating our unified release system, we learned some valuable lessons.
Lesson 1: Automate the process you have, not the one you want
When we first set out to automate our release process, we did what engineers naturally do in such a situation: we studied the process to understand it as well as we could before starting. Then we did the other thing engineers naturally do: we tried to improve it.
While it seemed obvious to “fix” the process while we were automating it, we learned that a tested, working process — even one with problems — is preferable to an untested one, no matter how slick. Our initial attempts at automation met with resistance because developers were unfamiliar with the new way.
Lesson 2: Automation can mean more than you think
When most people think of “automating” a process, they assume it means removing decisions from human actors: “set it and forget it.” But sometimes you can’t remove human interaction from a process. It might be too difficult technically, or you might want a human eye on a process to ensure a correct outcome. Even in these situations, automation can come into play.
Sometimes automation means collecting and displaying data to help humans make decisions faster. We found that, even when we needed a human to make a choice, we were able to provide better data to help them make a more informed choice.
Deciding on the proper balance between human and machine action is key to automating. We see future opportunities for improvement by applying machine learning techniques to help humans make decisions even faster.
Lesson 3: Transparency, transparency, transparency
Engineers might not like inefficiency, but they also don’t like mystery. We wanted to avoid a “black box” process that does everything without giving insight as to how and why.
We provide as much transparency as we can through logging and messaging. Letting developers examine what the process had done, and why, helped them trust and adopt the automation. The logs also help with diagnosis when something goes wrong.
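One way to keep an automated step from becoming a black box is to log both what it did and why. The following is a minimal sketch using Python's standard `logging` module. The `run_step` helper, the logger name, and the message format are hypothetical and are not taken from Control Tower.

```python
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s %(message)s",
    level=logging.INFO,
)
log = logging.getLogger("release")  # hypothetical logger name


def run_step(name, reason, action):
    """Run one automated step, recording what happened and why."""
    log.info("START %s (reason: %s)", name, reason)
    try:
        result = action()
    except Exception:
        # log.exception records the full traceback, which helps with diagnosis.
        log.exception("FAILED %s", name)
        raise
    log.info("DONE %s -> %r", name, result)
    return result
```

For example, `run_step("merge", "code review approved", do_merge)` would leave a start line, the stated reason, and either the result or a traceback in the log, where `do_merge` stands in for whatever callable performs the step.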
Where do we go from here?
Even with our new system in place, we know that we can improve it. We are already working behind the scenes on the next steps.
We are developing algorithms that monitor issue statuses, completed code reviews, build and test statuses, and other external factors. With that information, a system can determine programmatically when a feature is ready for release, then automatically make the proper merge requests and set the release process in motion. This further reduces the time between creating and shipping a feature.
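A readiness check of this kind could start as a simple set of rules. The sketch below is illustrative only: the `FeatureStatus` fields, the `"Resolved"` status name, and the specific rules are assumptions, and a real system would populate them from the issue tracker, code review tool, and build server.

```python
from dataclasses import dataclass


@dataclass
class FeatureStatus:
    issue_key: str
    issue_status: str   # status names are hypothetical, e.g. "Resolved"
    open_reviews: int   # code reviews not yet completed
    build_passed: bool
    tests_passed: bool


def blockers(feature, ready_status="Resolved"):
    """Return the reasons a feature is not ready; an empty list means ready."""
    reasons = []
    if feature.issue_status != ready_status:
        reasons.append(f"issue status is {feature.issue_status!r}, not {ready_status!r}")
    if feature.open_reviews > 0:
        reasons.append(f"{feature.open_reviews} code review(s) still open")
    if not feature.build_passed:
        reasons.append("build failed")
    if not feature.tests_passed:
        reasons.append("tests failed")
    return reasons


def is_ready(feature):
    """A feature is ready for release when nothing blocks it."""
    return not blockers(feature)
```

Returning the list of reasons, rather than a bare yes or no, keeps the decision explainable to the developers who depend on it.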
We can use machine learning techniques to take in vast amounts of data for use in our decision-making process. This can flag risky deploys and tell us whether we need to spend extra effort on testing or can deploy with minimal oversight.
Our release management system is an important step toward increasing our software output while maintaining the quality our customers expect. This system is a step, not the final goal. By continually improving our process, by learning as we go, we work toward our ultimate goal — helping even more people get jobs.