Imhotep: Scalable, Efficient, and Fast

This post is the first in a five-part series on improving the development process (and coaching developers) with metrics-driven insights.

Move fast and try things — that’s how we develop products at Indeed. We don’t believe in betting on a small number of great ideas. Instead, we bet on exploring lots of ideas as quickly as possible.

To be successful in this approach, we need innovative team members with diverse perspectives. We hire people who are excited to quickly explore ideas in service of our mission — to help people get jobs. Once they’re on board, we give them ownership and autonomy to do exactly that. And we give them the tools to track and analyze their experiments.

The right tools for the job

We’ve developed and open sourced some of these tools, including Imhotep, our data analytics platform. Imhotep enables rapid exploration and analysis of large time-series datasets. It includes a query language (IQL), a web-based UI, and a distributed backend. It is scalable, efficient, and fast.

Graphic displays an Imhotep motto: measure. question. learn. improve.

Imhotep is scalable

Imhotep scales horizontally by adding daemon instances that can run on commodity hardware or in the cloud. Indeed’s internal Imhotep cluster handles up to 5 million queries each week across thousands of datasets. Roughly 90% of those queries come from automated systems.

Our most popular dataset includes about 39 billion events just for the last year. That dataset alone receives around 25,000 distinct queries each month.

Imhotep is efficient

Because the data structure underlying Imhotep is an inverted index, disk utilization is remarkably low for most time-series datasets. The dataset mentioned above, with 39 billion events and 384 possible fields per event, takes up 5.7 terabytes on disk. That works out to about 146 bytes per event, or well under one byte per possible field.
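To give a feel for why an inverted index can stay small when events are sparse, here is a toy sketch in Python. It is not Imhotep's storage format or API. The field names, the sample events, and the helpers `build_inverted_index` and `count_matching` are all made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(events):
    """Map each (field, value) pair to the ascending list of event ids that contain it."""
    index = defaultdict(list)
    for event_id, event in enumerate(events):
        for field, value in event.items():
            index[(field, value)].append(event_id)
    return dict(index)


def count_matching(index, *terms):
    """Count events that match every (field, value) term by intersecting posting lists."""
    postings = [set(index.get(term, ())) for term in terms]
    return len(set.intersection(*postings)) if postings else 0


# Hypothetical events: most of them leave the "clicked" field unset.
events = [
    {"country": "us", "clicked": 1},
    {"country": "us"},
    {"country": "jp", "clicked": 1},
    {"country": "us"},
]

index = build_inverted_index(events)
us_clicks = count_matching(index, ("country", "us"), ("clicked", 1))
```

In this toy version, a field that an event leaves unset adds nothing to the index, so a dataset with hundreds of possible fields only pays for the values that actually occur. Counting matches is a matter of intersecting short lists of event ids rather than scanning every event.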

That kind of storage efficiency allows us to keep all the data for analysis and avoid sampling. Sampling is fine when you only want to look at aggregate trends. But if you want to dig into your data and examine the outliers, a sample can’t reliably surface them or show their effects.
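A small simulation makes the point about outliers concrete. This is a sketch on synthetic data, not Indeed's data. The metric (`latencies_ms`), the five planted outliers, the 1% sample rate, and the 10-second threshold are all assumptions chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic metric: 100,000 ordinary values around 300 ms.
N = 100_000
latencies_ms = [rng.gauss(300, 50) for _ in range(N)]

# Plant five extreme values at random positions.
outlier_ids = rng.sample(range(N), 5)
for i in outlier_ids:
    latencies_ms[i] = 60_000.0


def find_outliers(event_ids, threshold_ms=10_000):
    """Return the ids whose value exceeds the threshold."""
    return [i for i in event_ids if latencies_ms[i] > threshold_ms]


# Looking at every event finds all five planted outliers.
found_in_full_data = find_outliers(range(N))

# A 1% random sample sees only 1,000 events, so each outlier
# has just a 1-in-100 chance of being included.
sample_ids = rng.sample(range(N), N // 100)
found_in_sample = find_outliers(sample_ids)

print(len(found_in_full_data), "outliers found in the full data")
print(len(found_in_sample), "outliers found in the 1% sample")
```

With five outliers and a 1% sample, the expected number of outliers in the sample is 0.05, so most samples contain none of them. Aggregate figures such as the median barely move, which is why sampling works for trends and fails for digging into the unusual cases.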

Imhotep is fast

Imhotep’s speed lets us rapidly iterate and collaborate. Over a recent 90-day period at Indeed, our internal cluster saw around 2 million interactive Imhotep queries (queries done from the webapp). The median response time for those queries was 276 milliseconds.

A powerful cache implementation contributes to this blazing speed, with nearly 60% of interactive queries served from the cache. But even uncached queries are quite fast, with a median response time of around 4 seconds. An uncached query over a long time span takes longer, but not much longer: for uncached queries with a 365-day time span, the median response time is about 9 seconds.
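It may look odd that the overall median is a few hundred milliseconds while the uncached median is several seconds. The two are consistent because more than half of the queries are cache hits, so the middle of the sorted list falls among the cached responses. Here is a minimal sketch using made-up response times. Only the 60/40 split comes from the numbers above, and the individual timings are invented.

```python
from statistics import median

# Invented timings in milliseconds: 600 cache hits and 400 cache misses,
# matching the roughly 60/40 split described above.
cached_ms = [50.0 + i for i in range(600)]          # fast responses
uncached_ms = [2000.0 + 10 * i for i in range(400)]  # slow responses

overall_median = median(cached_ms + uncached_ms)
uncached_median = median(uncached_ms)

# Because cache hits are more than half of all queries, the overall
# median cannot be slower than the slowest cached response.
median_is_a_cached_time = overall_median <= max(cached_ms)

print("overall median (ms):", overall_median)
print("uncached median (ms):", uncached_median)
print("overall median falls among cache hits:", median_is_a_cached_time)
```

The same reasoning explains why the median alone hides the uncached experience, and why it helps to report the uncached median separately, as I did above.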

How do we know all these stats about Imhotep performance? Because we have an Imhotep dataset for Imhotep usage. In just a few minutes, I was able to iteratively query that dataset to understand recent cluster performance.

Imhotep drives insight and improvement

Imhotep empowers us to experiment and quickly improve our products. We’ve also applied this data-driven approach to improving development processes. In the next post in this series, I explain more about how we use metrics to improve process.

Imhotep: Scalable, Efficient, and Fast is cross-posted on Medium.