@IndeedEng 2014 Year In Review

We help people get jobs. That’s our mission at Indeed. And being the #1 job site worldwide challenges each of our engineers to deliver the best job seeker experience at scale. When we launched this blog in 2012, we set out to contribute to the broader software community by sharing what we’ve learned from these challenges.

Response to our @IndeedEng tech talks continues to be strong, with over 700 people attending the series in 2014. And our engineering blog received 59,000 views from 128 countries.

Here’s a brief recap of content we shared in 2014:

Testing and measuring everything is central to our development process at Indeed. We use Proctor, our A/B testing framework, to accomplish this, and we open sourced the framework in 2013. Building on this in 2014, we described Proctor’s features and tools and then wrote about how we integrate Proctor into our development process at Indeed. We also released proctor-pipet, a Java web application that allows you to deploy Proctor as a remote service.

We open sourced util-urlparsing, a Java library we created to parse URL query strings without unnecessary intermediate object creation.

In the first half of 2014, we held several tech talks devoted to Imhotep, our interactive data analytics platform. Imhotep powers data-driven decision making at Indeed and we were excited to talk about the technology: scaling decision trees, building large-scale analytics tools, and using Imhotep to focus on key metrics. Then in November, we open sourced part of the platform and held a tech talk and workshop for attendees to explore their own data in Imhotep.

Being a global company is a challenge we take seriously. We shared our experience of iteratively expanding to new markets and how international success requires solving a diverse set of technical challenges.

Changes coming to the blog and talks pages include translating content into Japanese. Look for more translated posts in the months to come.

We’d like to thank everyone who helped make these accomplishments possible. If you follow the blog and watch our @IndeedEng talks, thank you for your support! We look forward to continuing the conversation in 2015.

Why I Unit Test

If you’ve done any software development in the last fifteen years, you’ve heard people harping on the importance of unit testing. Your manager might have come to you and said “Unit tests are great! They document the code, reduce the risk of adding bugs, and reduce the cost and risk of making changes so we don’t slow down over time! With good unit tests, we can increase overall delivery velocity!” Those are all great reasons to unit test, but they are all fundamentally management reasons. I agree with them, but they don’t go to the core of why I, as a developer, unit test.

The reason I unit test is simple: Unit testing is both an opportunity and a strong incentive to improve new and existing designs, and to improve my skills as a designer of software. The trick is to write as few unit tests as possible and ensure that each test is very simple.

How does that work? It works because writing simple unit tests is intrinsically boring, and the worse your code is, the more difficult and boring it will be to test. The only way to get any traction with unit testing is to drastically improve your implementation to the point where it can be covered with hardly any unit tests at all, and then write those.

Avoiding unit tests by improving your implementation

Here are some approaches for writing fewer unit tests:

  • Refactor out repeated code. Each block of code that you are able to abstract out is one less unit test to write.
  • Delete dead code. You don’t have to write unit tests for code that you can delete instead. If you think this is obvious, then you haven’t seen many large legacy code bases.
  • Externalize framework boilerplate as configuration or annotation. That way, you only have to write unit tests for product logic rather than scaffolding.
  • Every branch of code needs at least one unit test, so every if statement or loop you can remove is one less test to write. Depending on your implementation language, if statements and loops can be removed by subtype polymorphism, code motion, pluggable strategies, aspects, decorators, higher order combinators or a dozen other techniques. Each branch point in your code is both a weakness and a requirement for additional testing. Remove them if at all possible.
  • Identify deeper data-flow patterns and abstract them. Often pieces of code that don’t look similar can be made similar by pulling out some incidental computations. Once you’ve done that, then underlying structures can be merged. That way, more and more of your code becomes trivially testable branch-free computations. In the limit, you end up with a bunch of simple semantic routines (often predicates or simple data transformations) strung together with a double handful of reusable control patterns.
  • Separate out your business logic, persistence, and inter-process communications as much as possible, and you can avoid a bunch of tedious mucking with mock objects. Mock objects are code smells, and overuse of them may indicate that your code has become overly coupled.
  • Figure out how to generalize your logic so that your edge cases are covered by your main flow, and single tests can cover diverse and complex inputs. Too often we write single-purpose code for special cases, when we could instead search for more general solutions that cover those cases without special handling. Note however, that discovering the simpler, more general solutions is often much more difficult than creating a bunch of special cases. You may not have enough time to write small amounts of simple code, and instead have to write large amounts of complex code.
  • Recognize and replace logic that is already implemented as methods in existing libraries, and you can push the trouble of unit testing off onto the library’s author.
  • If you can simplify your data objects so much that they are immutable and their operations follow simple algebraic laws, you can utilize property-based testing, where your unit tests literally write themselves.
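The last point can be sketched without any special framework: pick an algebraic law that your immutable type must satisfy and check it against many generated inputs. The sketch below hand-rolls the idea for a hypothetical immutable Money value (the class and its laws are illustrative examples, not code from any particular library):

```java
import java.util.Random;

// A minimal, hand-rolled sketch of property-based testing.
// Money is a hypothetical immutable value type used for illustration.
final class Money {
    final long cents;
    Money(long cents) { this.cents = cents; }
    Money plus(Money other) { return new Money(this.cents + other.cents); }
    @Override public boolean equals(Object o) {
        return o instanceof Money && ((Money) o).cents == cents;
    }
    @Override public int hashCode() { return Long.hashCode(cents); }
}

public class PropertyTestSketch {
    public static void main(String[] args) {
        Random random = new Random(42);
        Money zero = new Money(0);
        for (int i = 0; i < 1000; i++) {
            Money a = new Money(random.nextInt(1_000_000));
            Money b = new Money(random.nextInt(1_000_000));
            // Commutativity: a + b == b + a
            if (!a.plus(b).equals(b.plus(a))) {
                throw new AssertionError("commutativity failed");
            }
            // Identity: a + 0 == a
            if (!a.plus(zero).equals(a)) {
                throw new AssertionError("identity failed");
            }
        }
        System.out.println("all properties held");
    }
}
```

Dedicated libraries (QuickCheck-style frameworks) add shrinking and smarter generators, but even this loop demonstrates the point: the laws are the tests, and the framework writes the cases.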

But yammering is cheap, let’s see some code!

Finding deep patterns and abstracting out repeated code

A common pattern in data-science code is finding the element of some collection that maximizes some function. The simplest Java code for this might resemble the following:

  double bestValue = Double.MIN_VALUE;
  Job bestJob = null;
  for (Job job : jobs) {
    if (score(job) > bestValue) {
      bestJob = job;
    }
  }
  return bestJob;

This is quick enough to code that you might write it without even thinking about it. Just a loop and an if! What can go wrong? That’s fine the first few times you write it, but you’re building up technical debt every time. Writing unit tests is where the repetition and risk start to really show up. Every block of code like this will need tests not just for correctness in the common case, but also for a bunch of edge cases: what happens if we pass in an empty collection? a single-element collection? null? Even the simple code above has some bugs that unit tests can find, but you have to write a lot of them every time you wish to do an optimization, and I don’t know about you, but frankly I’ve got more useful things to do with my time.

A better solution is to realize that even this small amount of code repetition can and should be abstracted out, coded and tested only once. It also gives us a chance to genericize the code and fix some edge cases.

    public static <J> J argMax(Iterable<J> collection,
                               Function<J, Double> score) {
      double bestValue = Double.NEGATIVE_INFINITY;
      J bestElement = null;
      if (collection != null) {
        for (J element : collection) {
          double value = score.apply(element);
          if (value > bestValue) {
            bestValue = value;
            bestElement = element;
          }
        }
      }
      return bestElement;
    }

This code needs to be unit tested only once. For an even better solution, we can replace all of this logic with a library call (in this case from Google’s Guava library):

  public static <J> J argMax(Iterable<J> collection,
                             Function<J, Double> score) {
    return Ordering.natural().onResultOf(score).max(collection);
  }

After that, you only need unit tests for each different scoring function you use. Everything else has already been handled.
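If Guava is not on your classpath, the Java standard library offers essentially the same one-liner via streams; a sketch (note that max returns an Optional, which makes the empty-collection case explicit instead of returning null):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.ToDoubleFunction;

public class ArgMax {
    // Returns the element with the highest score, or empty for an empty collection.
    public static <J> Optional<J> argMax(List<J> collection,
                                         ToDoubleFunction<J> score) {
        return collection.stream().max(Comparator.comparingDouble(score));
    }

    public static void main(String[] args) {
        List<String> jobs = List.of("dev", "sre", "data");
        // Hypothetical scoring function: longer titles score higher.
        Optional<String> best = argMax(jobs, String::length);
        System.out.println(best.orElse("none")); // prints "data"
    }
}
```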

Avoiding unit tests: a path to understanding great software design

The thing about all of these unit-test avoidance techniques is that they are essential to the process of creating robust and supple designs even if you weren’t going to do any unit testing at all! Too often, in our rush to simply get something working, we don’t follow these techniques, but continual unit testing gives us a time and a reason to do it right. In this way, you can leverage aggressive laziness in implementing unit tests to drive continuous improvement of your project design and implementation.

At least, it can if you let it. If you spend your unit testing time writing unit tests for your code without improving its underlying design, you’ll most likely never learn anything, and you’ll have little reason to create code with quality better than “it mostly works.” If you spend your unit testing time looking to minimize the total amount of testing code that you write (by improving your product code), you’ll quickly learn just what it means for software to be well-designed. I don’t know about you, but that’s why I love programming in the first place.

Dave Griffith has been building software systems for over 20 years.

How Indeed Uses Proctor for A/B Testing

(Editor’s Note: This post is the second in a series about Proctor, Indeed’s open source A/B testing framework.)

Proctor at Indeed

In a previous blog post, we described the features and tools provided by Proctor, our open-source A/B testing framework. In this follow-up, we share details about how we integrate Proctor into Indeed’s development process.

Customized Proctor Webapp

Our internal deployment of the Proctor Webapp integrates with Atlassian JIRA, Subversion, Git, and Jenkins. We use JIRA for issue linking, various sanity checks, and automating issue workflow. For tracking changes over time, we use Subversion (for historical reasons — Git is also an option). We use Jenkins to launch test matrix builds, and the webapp integrates with our internal operational data store to display which versions of a test are in use in which applications.

Figure 1: Screenshot of a test definition’s change history in the Proctor Webapp

Issue tracking with JIRA

At Indeed, we track everything with JIRA issues, including changes to test definitions. Requests for new tests or changes to existing tests are represented by a custom issue type in JIRA that we called “ProTest” (short for “Proctor Test”). We track ProTest issues in the JIRA project for the application to which the test belongs. The ProTest issues also use a custom workflow that is tied into our deployment of the Proctor Webapp.

After accepting an assigned ProTest issue, the issue owner modifies the test definition using Proctor Webapp. When saving the changes, she must provide a ProTest issue key. Before committing to our Proctor test definition repository, the webapp first verifies that the ProTest issue exists and is in a valid state (for example, is not closed). The webapp then commits the change (on behalf of the logged-in user), referencing the issue key in the commit message.

After the issue owner has made all changes for a ProTest issue, the JIRA workflow is usually as follows:

  1. The issue owner resolves the issue, which moves to state QA Ready.
  2. A release manager uses Proctor Webapp to promote the new definition to QA. The webapp moves the issue state to In QA.
  3. A QA analyst verifies the expected test behavior in our QA environment and verifies the issue, which moves to state Production Ready.
  4. A release manager uses Proctor Webapp to promote the new definition to production, triggering worldwide distribution and activation of the test change within one or two minutes. The webapp moves the issue state to In Production.
  5. A QA analyst verifies the expected test behavior in production and moves the issue state to Pending Closure.
  6. The issue owner closes the issue to reflect that all work is complete and in production.

In cases where we are simply adjusting the size of an active test group, Proctor Webapp skips this process and automatically pushes the change to production.

Our QA team verifies test modifications because those modifications can result in unintended behavior or interact poorly with other tests. Rules in test definitions are a form of deployable code and need to be exercised to ensure correctness. The verification step gives our QA analysts one last chance to catch any unintended consequences before the modifications go live. Consider the case of this rule, intended to make a test available only to English-language users in the US and Canada:

    (lang=='en' && country=='US') || country=='CA'

The parentheses are in the wrong place, allowing French-language Canadians to see behavior that may not be ready for them. A developer forcing himself into the desired group might have missed this bug. When we catch bugs right away during QA, we avoid wasting the time it would take to notice that the desired behavior never made it to production.
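For reference, the grouping the rule’s author intended, a minimal correction of the example above:

```
lang=='en' && (country=='US' || country=='CA')
```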

Test definition files

We store test definitions in a single shared project repository called proctor-data. The project contains one file per test definition: test-definitions/<testName>/definition.json

Modifications to tests most often are done via the Proctor Webapp, which makes changes to the JSON in the definition file and commits those changes (on behalf of the logged-in user) to the version control repository.

The definition files are duplicated to two branches in proctor-data: qa and production.  When a test definition revision is promoted to QA, the entire test definition file is copied to the qa branch and committed (as opposed to applying or “cherry-picking” the diff associated with a single revision). Similarly, when a test definition revision is promoted to production, the entire file is copied to the production branch and committed. Since we have one file per test definition, this simple approach maintains the integrity of the JSON definition while avoiding merge conflicts and not requiring us to determine which trunk revision deltas to cherry pick.

Building and deploying the test matrix

Proctor includes a builder that can combine a set of test definition files into a single test matrix file, while also ensuring that the definitions are internally consistent, do not refer to undefined bucket values, and have allocations that sum to 1.0. This builder can be invoked directly from Java or via an Ant task or a Maven plugin. We build a single matrix file using a Jenkins job that invokes Ant in the proctor-data project. An example of building with Maven is available on GitHub.
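As an illustration of the kind of consistency check the builder performs (a hand-rolled sketch, not Proctor’s actual implementation), verifying that allocation range lengths sum to 1.0 takes only a few lines of Java:

```java
import java.util.List;

public class AllocationCheck {
    // Sketch of one builder-style sanity check: range lengths must sum to 1.0.
    // A small epsilon absorbs floating-point rounding in the JSON values.
    public static boolean sumsToOne(List<Double> rangeLengths) {
        double total = rangeLengths.stream().mapToDouble(Double::doubleValue).sum();
        return Math.abs(total - 1.0) < 1e-9;
    }

    public static void main(String[] args) {
        System.out.println(sumsToOne(List.of(0.1, 0.1, 0.4, 0.4))); // prints "true"
        System.out.println(sumsToOne(List.of(0.5, 0.4)));           // prints "false"
    }
}
```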

A continuous integration (CI) Jenkins job builds the test matrix every time a test change is committed to trunk. That matrix file is made available to applications and services in our CI environment.

When a release manager promotes a test change to QA, a QA-specific Jenkins job builds the test matrix using the qa branch. That generated matrix file is then published to all QA servers. The services and applications that consume the matrix periodically reload it. An equivalent production-specific Jenkins job handles new changes on the production branch.

Proctor in the application

Each project’s Proctor specification JSON file is stored with each project’s source code in a standard path (for example, src/main/resources/proctor). At build time, we invoke the code generator (via a Maven plugin or Ant task) to generate code that is then built with the project’s source code.

When launching a new test, we typically deploy the test matrix before the application code that depends on it. However, if the application code goes out first, Proctor will “fall back” and treat the test as inactive, provided you follow our convention of mapping your inactive bucket to value -1.

You can change the fallback behavior by setting fallbackValue to the desired bucket value in the test specification. We follow the convention of falling back on the unlogged inactive group to help ensure that test and control groups do not change size unexpectedly. Suppose that you have groups 0 (control) and 1 (test) for a test that runs Monday-Thursday with fallback to group 0. If your test matrix is broken as a result of a change from Tuesday 2pm to Tuesday 5pm, summing your metrics across the whole period from Monday to Thursday will skew the results for the control group. If your fallback was -1 (inactive), there would be no skew for your control and test groups.

When adding a new bucket to a test, we typically take this sequence of actions:

  1. Deploy the test matrix with no allocation for the new bucket.
  2. Deploy the application code that is aware of the new bucket.
  3. Redeploy the matrix with an allocation for that bucket.

If the matrix is deployed with an allocation for a new bucket of which the application is unaware, Proctor errs on the side of safety by using the fallback value for all cases. We made Proctor work that way to avoid telling the application to apply an unknown bucket in some cases for some period of time, which could skew analysis.

We take similar precautions when deleting an entire test from the matrix.

Testing group membership, not non-membership

Proctor’s code generation provides easy-to-use methods for testing group membership. We have found it best to always use these methods to test for membership rather than non-membership. If you’ve made your code conditional on non-membership, you run the risk of getting that conditional behavior in unintended circumstances.

As an example, suppose you have a [50% control, 50% test] split, and in your code you use the conditional expression !groups.isControl(), which is equivalent to groups.isTest(). Then, to reduce the footprint of your test while keeping an equal-sized control group for comparison, you change your test split to [25% control, 50% inactive, 25% test]. Now your conditional expression is equivalent to groups.isTest() || groups.isInactive(). That logic is probably not what you intended, which is to keep the same behavior for control and inactive. In this example, using groups.isTest() in the first place would have prevented you from introducing unintended behavior.
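A minimal sketch of why the non-membership test goes wrong (the Group enum and method names here are illustrative stand-ins, not Proctor’s generated API):

```java
public class MembershipSketch {
    enum Group { INACTIVE, CONTROL, TEST }

    static boolean isControl(Group g) { return g == Group.CONTROL; }
    static boolean isTest(Group g) { return g == Group.TEST; }

    public static void main(String[] args) {
        // With only control and test, the two conditions agree:
        System.out.println(!isControl(Group.TEST) == isTest(Group.TEST)); // prints "true"
        // After an inactive bucket is added, they diverge:
        System.out.println(!isControl(Group.INACTIVE)); // prints "true": inactive users get test behavior
        System.out.println(isTest(Group.INACTIVE));     // prints "false": membership test stays correct
    }
}
```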

Evolving bucket allocations

We recognize that assigning users to test buckets may affect how the site behaves for them. Proctor on its own cannot ensure consistency of experience across successive page views or visits as a test evolves. When growing or shrinking allocations, we consider carefully how users will be affected. 

Usually, once a user is assigned to a bucket, we’d like for that user to continue to see the behavior associated with that bucket as long as that behavior is being tested. If your allocations started as [10% control, 10% test, 80% inactive], you would not want to grow to [50% control, 50% test], because users initially in the test bucket would be moved to the control bucket.

There are two strategies for stable growth of buckets. In the “split bucket” strategy (Figure 2), you add new ranges for the existing buckets, moving from 10/10 to 50/50 by taking two additional 40% chunks from the inactive range. The resulting JSON is shown in Figure 3.

Figure 2: Growing control and test by splitting buckets into multiple ranges


  "allocations": [
  {
      "ranges": [
      {
          "length": 0.1,
          "bucketValue": 0
      },
      {
          "length": 0.1,
          "bucketValue": 1
      },
      {
          "length": 0.4,
          "bucketValue": 0
      },
      {
          "length": 0.4,
          "bucketValue": 1
      }
      ]
  }
  ]

Figure 3: JSON for “split bucket” strategy; 0 is control and 1 is test

In the “room-to-grow” strategy, you leave enough inactive space between buckets so that you can adjust the size of the existing ranges, as in Figure 4.

Figure 4: Growing control and test by updating range lengths to grow into the inactive middle

We use the “room-to-grow” strategy whenever possible, as it results in more readable test definitions, both in JSON and the Proctor Webapp.
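For comparison with Figure 3, the starting JSON for the “room-to-grow” layout might look like this (a sketch, following the convention above of mapping the inactive bucket to value -1):

```json
  "allocations": [
  {
      "ranges": [
      {
          "length": 0.1,
          "bucketValue": 0
      },
      {
          "length": 0.8,
          "bucketValue": -1
      },
      {
          "length": 0.1,
          "bucketValue": 1
      }
      ]
  }
  ]
```

Growing to [50% control, 50% test] then amounts to editing the three lengths in place to 0.5, 0.0, and 0.5, with no new ranges and no reordering of buckets.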

Useful helpers

Proctor includes some utilities that make it easier to work with Proctor in web application deployments:

  • a Spring controller that provides three views: the groups for the current request, a condensed version of the current test matrix, and the JSON test matrix containing only those tests in the application’s specification;
  • a Java servlet that provides a view of the application’s specification; and
  • support for a URL parameter that allows you to force yourself into a test bucket (persistent via a browser cookie)

We grant access to these utilities in our production environment only to privileged IP addresses, and we recommend you do the same.

It works for Indeed, it can work for you

Proctor has become a crucial part of Indeed’s data-driven approach to product development, with over 100 tests and 300 test variations currently in production. To get started with Proctor, dive into our Quick Start guide. To peruse the source code or contribute your own enhancements, visit our GitHub page.