Efficient Query String Parsing with util-urlparsing

We’re excited to announce the open source release of util-urlparsing, a Java library we created to parse URL query strings without unnecessary intermediate object creation. It also includes number parsing methods in ParseUtils that are faster than Java’s equivalent methods like Integer.parseInt and Float.parseFloat.

Java versions 1.6 and lower have a significant flaw that leads to inefficient memory usage when using the String.substring method. When processing data from our log repository, we need to extract small substrings from much larger strings containing event data key/value pairs. The primary class in util-urlparsing, QueryStringParser, was written to efficiently parse this data without generating any intermediate string objects. It does this via a callback mechanism that lets you only parse the keys you are interested in from the larger query string.

Our query parsing benchmark shows nearly 4X speedup over a naive Java implementation using String.split under significant heap space constraints. It can parse a million key-value pairs in under 3 seconds given a max heap of only 64MB. Our number parsing benchmark shows over 2X speedup compared to equivalent methods like Integer.parseInt and Float.parseFloat.

util-urlparsing is available for download on GitHub (github.com/indeedeng/util/tree/master/urlparsing) or maven.org (search.maven.org/#browse%7C-525259937), and we have documentation which includes usage examples to help you get started. If you have any questions, check out the Q&A forum for our open-source Java utilities.

If you’re interested in learning more about Indeed’s log repository, check out the video and slides of our January 26th talk “Logrepo: Enabling Data-Driven Decisions.”

Tweet about this on TwitterShare on FacebookShare on LinkedInShare on Google+Share on RedditEmail this to someone

join the conversation