Data-driven organizations like Indeed need great tools. We built Imhotep, our interactive data analytics platform (released last year), to manage the parallel execution of queries. To balance memory efficiency and performance in Imhotep, we developed a technique called vectorized variable-byte (VByte) decoding. VByte with differential decoding: Many applications use VByte and differential encoding to compress […]
All posts categorized in: Big Data
Memory Mapping with util-mmap
We are excited to highlight the open-source availability of util-mmap, a memory mapping library for Java. It provides an efficient mechanism for accessing large files. Our analytics platform Imhotep (released last year) uses it for managing data access. Why use memory mapping? Our backend services handle large data sets, like LSM trees and Lucene indexes. […]
Serving over 1 billion documents per day with Docstore v2
[Editor’s note: This post is the second installment of a two-part piece accompanying our first @IndeedEng talk.] The number of job searches on Indeed grew at an extremely rapid rate during our first 6 years. We made multiple improvements to our document serving architecture to keep pace with that growing load. A core focus at […]