We are excited to announce the open-source availability of Imhotep, the interactive data analytics platform that powers data-driven decision-making at Indeed. When we test changes to our applications and services, whether to our user interface or our backend algorithms, we measure how those changes affect job seekers. We built Imhotep to allow our engineering and product organizations to focus on key metrics at scale.
The Imhotep platform and tools allow you to:
- Perform fast, interactive, ad hoc queries and aggregate results for large datasets
- Combine results from multiple time-series datasets
- Build your own data tools for analysis, monitoring, reporting, and automated data processing on top of the Imhotep platform
At its core, Imhotep is a distributed inverted index on time-series data that runs across a cluster of servers. We’ve made it easy to set up an Imhotep cluster on Amazon Web Services (AWS). Once you’ve set up your cluster, you can upload your data and then interactively query that data using IQL, the Imhotep Query Language. The IQL web client enables you to answer all sorts of questions about your data, and iterate quickly on those questions to get to important insights.
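To make the idea of an inverted index on time-series data concrete, here is a minimal single-process sketch in Python. It is not Imhotep's implementation: the document fields (`time`, `country`, `q`), the sample data, and the function names are all assumptions chosen for illustration, and the sketch ignores distribution across servers entirely.

```python
from collections import defaultdict

# Hypothetical documents: each is one logged event with a Unix-style timestamp.
DOCS = [
    {"time": 100, "country": "us", "q": "nurse"},
    {"time": 160, "country": "us", "q": "architect"},
    {"time": 220, "country": "gb", "q": "nurse"},
    {"time": 280, "country": "us", "q": "nurse"},
]


def build_inverted_index(docs):
    """Map each (field, term) pair to the list of doc ids that contain it."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for field, term in doc.items():
            if field != "time":  # the timestamp is filtered by range, not by term
                index[(field, term)].append(doc_id)
    return index


def count_matches(index, docs, filters, start, end):
    """Count docs with start <= time < end that match every (field, term) filter."""
    postings = [set(index.get(f, [])) for f in filters]
    if postings:
        candidates = set.intersection(*postings)
    else:
        candidates = set(range(len(docs)))
    return sum(1 for d in candidates if start <= docs[d]["time"] < end)


INDEX = build_inverted_index(DOCS)
US_NURSE = count_matches(INDEX, DOCS, [("country", "us"), ("q", "nurse")], 0, 300)
```

The point of the structure is that a filter such as `country=us` becomes a lookup of a precomputed posting list rather than a scan over every document, and combining filters becomes a set intersection.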
For example, at Indeed, we use Imhotep to answer these and many more questions about how people around the world are using our job search engine:
- How many unique job search queries were performed on a specific day in a specific country?
- What are the top 50 queries in a specific country? How many times did job seekers click on a search result for each of those queries?
- Which job titles have the highest click-through rate for the query “Architecture” in the US? Which titles have the lowest click-through rate?
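Questions like these reduce to filter, group, and aggregate steps. The Python sketch below shows that shape on a tiny in-memory log. It is not IQL and not how Imhotep evaluates queries; the row layout, sample values, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical search log rows: (country, query, job_title, searches, clicks)
ROWS = [
    ("us", "architecture", "Software Architect", 200, 30),
    ("us", "architecture", "Landscape Architect", 100, 5),
    ("us", "nurse", "Registered Nurse", 400, 80),
    ("gb", "nurse", "Staff Nurse", 50, 10),
]


def top_queries(rows, country, n):
    """Return up to n (query, searches, clicks) tuples, most-searched first."""
    totals = defaultdict(lambda: [0, 0])
    for c, q, _title, searches, clicks in rows:
        if c == country:
            totals[q][0] += searches
            totals[q][1] += clicks
    ranked = sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(q, searches, clicks) for q, (searches, clicks) in ranked[:n]]


def ctr_by_title(rows, country, query):
    """Return {job_title: clicks / searches} for one query in one country."""
    totals = defaultdict(lambda: [0, 0])
    for c, q, title, searches, clicks in rows:
        if c == country and q == query:
            totals[title][0] += searches
            totals[title][1] += clicks
    # Skip titles with zero searches to avoid dividing by zero.
    return {t: clicks / searches for t, (searches, clicks) in totals.items() if searches}


TOP_US = top_queries(ROWS, "us", 50)
CTR = ctr_by_title(ROWS, "us", "architecture")
```

Sorting `CTR` by value gives the highest and lowest click-through titles for the query, which corresponds to the third question above.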
Getting started with Imhotep
You can use our tools to configure your Imhotep cluster on AWS. These setup tools require that you have an AWS account, two S3 buckets for data storage, and your time-series data in TSV or CSV format for uploading into the system.