Imhotep: Large Scale Analytics and Machine Learning at Indeed
This talk was held on Wednesday, March 26, 2014 at 7:00pm
To scale the building of decision trees on large amounts of Indeed job search data, we created a system called Imhotep. In addition to being a crucial tool for building these machine learning models, Imhotep has proven to be applicable to many different analytics problems. The core of Imhotep is a distributed system that manages the parallel execution of queries across a set of time-sharded inverted indices.
This talk will cover Imhotep’s primitive operations that allow us to build decision trees, drill into data, build graphs, and even execute sql-like queries in IQL (Imhotep Query Language). We will also discuss what makes Imhotep fast, highly available, and fault tolerant.