We’re excited to announce the open-source release of util-urlparsing, a Java library we created to parse URL query strings without unnecessary intermediate object creation. It also includes number-parsing methods in ParseUtils that are faster than Java’s equivalent methods.
Java versions 1.6 and lower have a significant flaw that leads to inefficient memory usage when using the String.substring method: each substring keeps a reference to the full character array of the string it was taken from, so a small substring can keep a much larger string in memory. When processing data from our log repository, we need to extract small substrings from much larger strings containing event data key/value pairs. The primary class in util-urlparsing, QueryStringParser, was written to parse this data efficiently without generating any intermediate string objects. It does this via a callback mechanism that lets you parse only the keys you are interested in from the larger query string.
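To make the callback idea concrete, here is a minimal sketch of the general technique. It is not the util-urlparsing API: the class QueryStringSketch, the ValueHandler interface, and both method signatures are hypothetical names invented for illustration. The scanner walks the query string once and hands the callback only the start and end offsets of each matching value, so no substring or split call is needed, and a range-based integer parser reads digits straight from the original string.

```java
// Hypothetical sketch of callback-based parsing; NOT the util-urlparsing API.
final class QueryStringSketch {

    /** Receives the offsets of one value inside the original, unmodified string. */
    interface ValueHandler {
        void handle(String source, int valueStart, int valueEnd);
    }

    /**
     * Scans "k1=v1&k2=v2" once and invokes the handler for every pair whose
     * key equals wantedKey. No intermediate String objects are created.
     */
    static void parse(String query, String wantedKey, ValueHandler handler) {
        final int len = query.length();
        int pos = 0;
        while (pos < len) {
            int pairEnd = query.indexOf('&', pos);
            if (pairEnd < 0) {
                pairEnd = len;
            }
            // Find '=' inside this pair; eq == pairEnd means the key has no value.
            int eq = pos;
            while (eq < pairEnd && query.charAt(eq) != '=') {
                eq++;
            }
            int keyLen = eq - pos;
            if (keyLen == wantedKey.length()
                    && query.regionMatches(pos, wantedKey, 0, keyLen)) {
                handler.handle(query, Math.min(eq + 1, pairEnd), pairEnd);
            }
            pos = pairEnd + 1;
        }
    }

    /**
     * Parses a base-10 int from s[start, end) without creating a substring.
     * Simplification for this sketch: overflow is not detected.
     */
    static int parseInt(CharSequence s, int start, int end) {
        if (start >= end) {
            throw new NumberFormatException("empty input");
        }
        int i = start;
        boolean negative = s.charAt(i) == '-';
        if (negative) {
            i++;
        }
        if (i == end) {
            throw new NumberFormatException("no digits");
        }
        int result = 0;
        for (; i < end; i++) {
            int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            result = result * 10 + digit;
        }
        return negative ? -result : result;
    }

    public static void main(String[] args) {
        final int[] total = {0};
        parse("a=1&clicks=42&b=x&clicks=8", "clicks",
                (src, s, e) -> total[0] += parseInt(src, s, e));
        System.out.println(total[0]); // prints 50
    }
}
```

The sketch skips URL decoding and overflow checks to stay short; see the library’s documentation for what the real QueryStringParser and ParseUtils provide.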
Our query-parsing benchmark shows a nearly 4X speedup over a naive Java implementation using String.split under significant heap space constraints. The library can parse a million key-value pairs in under 3 seconds given a max heap of only 64MB. Our number-parsing benchmark shows a speedup of over 2X compared to Java’s equivalent methods.
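For contrast, a naive String.split approach might look like the sketch below. This is an assumption about what such a baseline typically looks like, not the benchmark’s actual code, and the class name NaiveQueryParser is hypothetical. Every pair produces several short-lived objects (the split arrays plus a String for each key and value), which is the kind of allocation that hurts under a small heap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical baseline for illustration; NOT the benchmark's actual code.
final class NaiveQueryParser {

    /** Parses every pair eagerly, whether or not the caller needs it. */
    static Map<String, String> parse(String query) {
        Map<String, String> result = new HashMap<>();
        for (String pair : query.split("&")) {   // allocates an array and one String per pair
            if (pair.isEmpty()) {
                continue;                        // skip empty segments such as "a=1&&b=2"
            }
            String[] kv = pair.split("=", 2);    // allocates another array and up to two Strings
            result.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return result;
    }
}
```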
util-urlparsing is available for download on GitHub (github.com/indeedeng/util/tree/master/urlparsing) or maven.org (search.maven.org/#browse%7C-525259937), and we have documentation which includes usage examples to help you get started. If you have any questions, check out the Q&A forum for our open-source Java utilities.
If you’re interested in learning more about Indeed’s log repository, check out the video and slides of our January 26th talk “Logrepo: Enabling Data-Driven Decisions.”