Jackson: More than JSON for Java

Jackson is a mature and feature-rich open source project that we use, support, and contribute to here at Indeed. As Jackson’s creator and primary maintainer, I want to highlight Jackson’s core competencies, extensions, and challenges in this two-part series.

Photo of Cape Flattery by Tatu Saloranta, Jackson's creator


Jackson’s core competency

If you’re creating a web service in Java that reads or returns JSON, you need a library to convert Java objects into JSON and vice versa. Since this functionality is not available in the Java standard library (JDK), you can use the Jackson library for its core competency: writing Java objects as JSON and reading JSON as Java objects. Later in this post, I introduce Jackson’s additional features. 

As a Java JSON library, Jackson is:

Jackson is downloaded 25 million times per month via Maven Central, and 16,000 projects depend on jackson-databind. It’s the second most widely used Java library, after Guava, according to the Core Infrastructure Initiative’s census results.

You can use Jackson directly, but nowadays users are more likely to encounter its functionality exposed by libraries and frameworks. These libraries and frameworks use Jackson by default to expose JSON requests and responses, or JSON configuration files, as Java objects.

Some examples are:

Users of these frameworks, libraries, and clients may not even be aware that they utilize Jackson. We are more likely to learn this when troubleshooting usage issues or making changes to default handling.

Examples of Jackson usage

The following example shows how to annotate request and response models in the Spring Boot framework.

// Model of updated and accessed content
public class User {
  private Long id;
  private String name;
  @JsonProperty("dob") // customize name used in JSON
  private LocalDate dateOfBirth;
  private LocalDateTime lastLogin;
  // plus public Getters, Setters
}

// Service endpoint definition
@RestController
@RequestMapping("/users")
public class UserController {
  // ...
  @GetMapping("/{id}")
  public User findUserById(@PathVariable Long id) {
    return userStorage.findUserById(id);
  }
  @PostMapping(consumes = MediaType.APPLICATION_JSON_VALUE)
  @ResponseStatus(HttpStatus.CREATED)
  public Long createUser(@RequestBody User user) {
    return userStorage.createUser(user);
  }
}

In this example, the Spring Boot framework reads JSON as an instance of the User class. In the POST method, it passes a User instance to a storage handler. It also fetches the User instance via a storage handler for a given ID and writes serialized JSON of that instance. For a full explanation, see Jackson JSON Request and Response Mapping in Spring Boot.

You can also read and write JSON directly via the Jackson API, without any framework or library, as follows:

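// Note: User has java.time fields (LocalDate, LocalDateTime). A plain ObjectMapper
// needs the JavaTimeModule from jackson-datatype-jsr310 registered to handle them.
final ObjectMapper mapper = new ObjectMapper();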
// Read JSON as a Java object:
User user;
try (final InputStream in = requestContext.getInputStream()) {
  user = mapper.readValue(in, User.class);
}
// Write a Java object as JSON
try (final OutputStream out = requestContext.getOutputStream()) {
  mapper.writeValue(out, user);
}

Jackson is more than JSON for Java

While Jackson’s core functionality originally started as JSON for Java, it quickly expanded through modules, which are pluggable extension components you can add to the Jackson core. These modules support features that the core does not handle by default. Jackson also has extensions for other JVM languages.

These are some of the types of Jackson modules and extensions that are especially helpful:

  • Data format modules
  • Data type modules
  • Modules for Kotlin, Scala, and Clojure

Data format modules

The data format modules allow you to read and write content that is encoded in formats other than JSON. The low-level encoding details of JSON differ from those of YAML, CSV, or Protobuf. However, much of the higher-level data-binding functionality is similar or identical. The higher-level data binding deals with the structure of Java objects and with expressing them as token streams (or a similar abstraction).

Most of the code in the Jackson core is format-independent, and only a small part is truly JSON-specific. So, these so-called data format modules can easily extend Jackson to read and write content in other data formats. The modules implement the low-level streaming part of the Jackson API, but they share common data-binding functionality when it comes to converting content to and from Java objects.

Each of these modules implements a factory for streaming parsers and generators. The module uses that format-specific factory to construct an ObjectMapper that handles the format-specific details, while you only interact with a (mostly) format-agnostic mapper abstraction.
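The layering described above can be illustrated with a toy sketch that uses only the JDK. This is not Jackson code: `TokenWriter`, `JsonLikeWriter`, `YamlLikeWriter`, and `ToyBinder` are hypothetical names invented for this illustration, and the writers skip escaping and nesting entirely. The only point is that the binding layer is written once, while the encoding layer is swapped per format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration only: hypothetical types, NOT part of Jackson.
// Format-specific layer: knows how to encode a flat stream of field tokens.
interface TokenWriter {
    void startObject();
    void field(String name, Object value);
    void endObject();
    String result();
}

// Encodes tokens in a JSON-like syntax (no escaping; single use per instance).
class JsonLikeWriter implements TokenWriter {
    private final StringBuilder out = new StringBuilder();
    private boolean first = true;

    public void startObject() { out.append('{'); }

    public void field(String name, Object value) {
        if (!first) out.append(',');
        first = false;
        out.append('"').append(name).append("\":");
        // Numbers are written bare, everything else as a quoted string.
        out.append(value instanceof Number ? value.toString() : "\"" + value + "\"");
    }

    public void endObject() { out.append('}'); }

    public String result() { return out.toString(); }
}

// Encodes the same tokens in a YAML-like syntax.
class YamlLikeWriter implements TokenWriter {
    private final StringBuilder out = new StringBuilder();

    public void startObject() { out.append("---\n"); }

    public void field(String name, Object value) {
        out.append(name).append(": ").append(value).append('\n');
    }

    public void endObject() { /* nothing to close in this toy syntax */ }

    public String result() { return out.toString(); }
}

// Format-independent layer: walks the properties and emits tokens.
// It never needs to know which encoding is in use.
class ToyBinder {
    static String write(Map<String, Object> properties, TokenWriter writer) {
        writer.startObject();
        properties.forEach(writer::field);
        writer.endObject();
        return writer.result();
    }

    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("id", 7L);
        user.put("name", "Ada");
        System.out.println(ToyBinder.write(user, new JsonLikeWriter()));
        System.out.print(ToyBinder.write(user, new YamlLikeWriter()));
    }
}
```

In this sketch, supporting another format means adding one more `TokenWriter` implementation and leaving `ToyBinder` untouched, which mirrors, in a very simplified way, why a format module only has to supply the streaming layer.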

The very first data format module added to Jackson is jackson-dataformat-xml for XML. Supported data formats now include:

The usage for data format modules is similar to the Jackson API JSON usage. You only need to change JsonMapper (or the generic ObjectMapper) to a format-specific alternative like XmlMapper. There are format-specific features that you can enable and disable. Also, some data formats, such as Avro, CSV, and Protobuf, require additional schema information to map content into a JSON-like token representation. But across all formats, the API usage is similar.

Examples

Compared to the previous example that simply reads and writes JSON, these are alternatives for reading and writing other data formats:

// XML usage is almost identical except it uses a different mapper object
ObjectMapper xmlMapper = new XmlMapper();
String doc = xmlMapper.writeValueAsString(user);
User user1 = xmlMapper.readValue(doc, User.class);

// YAML usage is almost identical except it uses a different mapper object
ObjectMapper yamlMapper = new YAMLMapper();
byte[] asBytes = yamlMapper.writeValueAsBytes(user);
User user2 = yamlMapper.readValue(asBytes, User.class);

// CSV requires Schema for most operations (to convert between property
// names and column positions)
// You can compose the schema manually or generate it from POJO.
CsvMapper csvMapper = new CsvMapper();
CsvSchema userSchema = csvMapper.schemaFor(User.class);
String csvDoc = csvMapper.writer(userSchema)
    .writeValueAsString(user);
User user3 = csvMapper.readerFor(User.class)
    .with(userSchema)
    .readValue(csvDoc);

Data type modules

Jackson core allows you to read and write Plain Old Java Objects (POJOs) and most standard JDK types (strings, numbers, booleans, arrays, collections, maps, dates, calendars, URLs, and UUIDs). However, many Java projects also use value types defined by third-party libraries, such as Guava, Hibernate, and Joda. Handling instances of these value types as simple POJOs does not work well, especially for collection types. Without explicit support, you would have to implement your own handlers as serializers and deserializers to extend Jackson functionality. Adding such explicit support to Jackson core, even for only the most common Java libraries, would be a huge undertaking and could also lead to problematic dependencies.

To solve this challenge, Jackson added a concept called data type modules. These are extensions built and packaged separately from core Jackson and added to the ObjectMapper instance during construction. Data type modules are released by both the Jackson team and external contributors such as authors of third-party libraries. Authors usually add these modules because they want to solve a specific use case and then share the fruits of their labors with others.

Due to the pluggability of these modules, it is possible to use data type modules with different formats and to mix different value types. For example, you can read and write Hibernate-backed Guava ImmutableLists that contain Joda-defined Period values as JSON, CSV, or XML.
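The pluggable-module idea can be sketched in a few lines of plain JDK code. This is a toy model, not Jackson's API: `ToyMapper`, `ToyModule`, and `PeriodModule` are hypothetical names, and `java.time.Period` merely stands in for a third-party value type such as the Joda Period mentioned above. The sketch only shows how a separately packaged module can teach a mapper about a type that the core does not handle specially.

```java
import java.time.Period;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy illustration only: hypothetical types, NOT part of Jackson.
// A module is a separately packaged unit that registers handlers on a mapper.
interface ToyModule {
    void setup(ToyMapper mapper);
}

class ToyMapper {
    private final Map<Class<?>, Function<Object, String>> serializers = new HashMap<>();

    // Modules are added while the mapper is being set up.
    ToyMapper register(ToyModule module) {
        module.setup(this);
        return this;
    }

    <T> void addSerializer(Class<T> type, Function<T, String> serializer) {
        serializers.put(type, value -> serializer.apply(type.cast(value)));
    }

    // Uses a registered handler if one exists, else falls back to a quoted toString().
    String write(Object value) {
        Function<Object, String> serializer = serializers.get(value.getClass());
        return serializer != null ? serializer.apply(value) : "\"" + value + "\"";
    }
}

// Lives outside the "core" and adds explicit support for one value type.
class PeriodModule implements ToyModule {
    public void setup(ToyMapper mapper) {
        mapper.addSerializer(Period.class, p -> "{\"days\":" + p.getDays() + "}");
    }
}

class ToyModuleDemo {
    public static void main(String[] args) {
        Period threeDays = Period.ofDays(3);
        // Without the module, the mapper only has the generic fallback.
        System.out.println(new ToyMapper().write(threeDays));
        // With the module registered, the type gets its dedicated handling.
        System.out.println(new ToyMapper().register(new PeriodModule()).write(threeDays));
    }
}
```

Because the handler is keyed by type rather than by output format, the same registration would apply whatever encoding sits underneath, which is the property that lets data type modules and data format modules be mixed.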

The list of known data type modules is long—see Jackson Portal. Here are some examples:

In addition to the data type module implementations, many frameworks also directly support Jackson data type module usage. In particular, various “immutable values” libraries offer such support, such as:

Modules for Kotlin, Scala, and Clojure

With Jackson, you’re not limited to POJOs and JDK types when you work in other JVM languages. Jackson has extensions that handle the custom types of many other JVM languages.

The following Jackson modules support Kotlin and Scala.

And for Clojure, there are a few libraries that use Jackson under the hood to implement similar support, such as:

This simplifies interoperability further. It makes it easier to use Java functionality from Kotlin, Scala, Clojure, or vice versa.

What’s next

In my next post, I will share my observations on the challenges that the Jackson project faces and how you can contribute to help.


About the Author

Tatu Saloranta is a staff software engineer at Indeed. He leads the team that is integrating the next-generation continuous deployment system. Outside of Indeed, he is best known for his open source activities. Under the handle @cowtowncoder, he has authored many popular open source Java libraries, such as Jackson, Woodstox, lzf compression codec, and Java ClassMate. For a complete list see FasterXML Org.


Cross-posted on Medium.


Want to Code as an Engineering Manager? Time to Find a Unicorn

Coding as an engineering manager is an exercise in cognitive dissonance.

If you’ve just become a manager, you’ve likely been measuring success by the quantity and quality of the code you ship as an individual contributor (IC). Suddenly, you have new metrics for success and your day-to-day work looks wildly different. One mentor tells you that coding as a manager is futile. Another tells you to stay in the code or risk dulling your technical skills. Some companies encourage engineering managers to conduct IC work as part of their culture, even codifying this behavior in promotion criteria. Others penalize managers for the exact same behavior. It’s confusing and stressful!

Since I started managing, I’ve tried to deliver IC work at least once a quarter (some quarters more successfully than others). In this post, I share some of what I’ve learned: the challenges of coding as an engineering manager [1], the benefits, and the ways to identify well-scoped unicorn projects to work on.

Unicorn

Ruby star rainbow sparkle via SVG SILH

While there is no I in Team, there is both a U and an IC in unicorn, just sayin’.

Coding as a manager is hard

Doing IC work as an engineering manager boils down to two challenges: constantly shifting contexts and priorities.

Management requires a lot of context shifting. One meeting you’re coming up with a strategy for headcount, another you’re listening to someone grapple with giving tough feedback, and in yet another you’re leading a kanban meeting. By contrast, coding requires deep work, with an uninterrupted sense of focus for several hours. If you’ve ever tried to code in the half-hour between meetings, you know it looks something like this:

import numpy as np
import pandas as pd
# pull data here
If n in list(TODO):
# look this up, used to know how to do this

As a leader, your job is to prioritize supporting your teammates. That means having career conversations, providing feedback, and helping them deliver innovative and impactful work. While this often takes place in one-on-ones, there are follow-up meetings and check-ins with relevant collaborators, too. As you increase the size of your team, it becomes harder and harder to find any breathing room in a 40-hour work week, let alone the several hours needed to code even a small project from beginning to end. 

Why you should consider coding anyway [2]

First, coding an IC project can help build empathy for your teammates, the tools that they use, and the challenges they encounter. For instance, I worked on a project with a teammate and learned firsthand about several services around Indeed and their challenges. As a result, in later meetings I was able to speak more specifically and with greater confidence about them. I submitted requests to the maintaining teams and, consequently, got updates prioritized that greatly helped my team. 

Second, coding (sometimes for the first time in weeks or even months) means you’ll need to ask your teammates for help. Recently, I was trying to solve a data wrangling problem that ended up with not one, not two, but three nested for loops. My teammates joked that it put the “OH NO” in Big O Notation. I humbly asked for help and together we figured out a way to solve the problem with much less complexity (and had a good laugh). In my experience, people like feeling helpful, and there’s something special about helping your boss when they’re struggling. We want our teammates to ask others for help. Being vulnerable and asking for help as a leader helps you model that behavior for your teammates, too. 

Finally, and most important to your mental health as a manager: shipping code can make you feel good. Management rarely lends itself to that feeling of “doneness” and is often riddled with self-doubt. Doing IC work, by contrast, is usually a discrete task. You can build something, point at it, and say, “Look! I built that. Sweet.”

How to code as a manager

Broadly speaking, doing IC work as a manager looks something like this: 

  1. Make sure nothing is on fire.
  2. Find a narrowly scoped unicorn project.
  3. Block off time for deep work. 
  4. Start with delegation in mind. 

I’ll walk through each of these in the sections below. 

Make sure nothing is on fire

Coding while someone on your team is struggling with something urgent is like fiddling while Rome burns. It also means that you’re not doing the core functions of your job, i.e., helping your teammates. So, before you even think about scoping an IC project, make sure things on your team feel relatively stable. 

Find a narrowly scoped unicorn project

Let me show you what scoping the right projects doesn’t look like. 

One time, in my eagerness to help a new product team, I took on running several A/B tests that we wanted to roll out by the end of the quarter. The A/B tests were simple enough to keep an eye on until I needed to spin other managerial plates. Meanwhile, my product manager had to pick up the slack. In the end, we delegated the tests to someone else who ran them to completion. It wasn’t a good feeling knowing I was letting my PM down. 

By contrast, a well-scoped IC project for managers: 

  • is not time-sensitive 
  • is fairly small
  • does not have any dependencies 
  • is a "nice-to-have" or quality-of-life improvement that won’t get prioritized by your teammates and might have some nice impact
  • plays to your strengths 

That is, a unicorn. Unicorn IC projects are not going to come up all the time. You can’t find them at all if you don’t know what to look for, though. 

For instance, I was in a design jam a few years back, where some UX teammates said to one another, “Yeah, we don’t always know X about Y queries. It would be nice if we had a tool that could do that.” What they were asking for was fairly small. Before they even knew it, a couple hours later I had built an Ishbook that they still use to help them understand user behaviors on the site. 

Alas, ye have yeself a unicorn! 

It’s also important that your IC project plays to your strengths. It’s already going to be easy to fall into the trap of feeling bad about your coding skills because they will likely be rusty and you’ll code more slowly than you used to. Consequently, you probably won’t keep feeling motivated to do more IC work and this blog post and I will have failed you. 

My IC projects usually are some kind of analysis of survey or measurement bias, helping A/B testing, or building well-designed graphs. Why? Because I like these things, I’m good at them, and they give me energy. When you choose an IC project that feels the same way, you’ll be able to get it done more quickly and at a higher quality. 

Block off time for deep work

Half an hour here and there is usually not enough to get meaningful deep work done. Since becoming a manager, I have blocked off several hours in the morning to do deep work every week, whether that’s coding or reading the latest in my field. This encourages others to message me first before booking over my deep work block. Sometimes I need to join the meeting anyway, but more often than not, I avoid meeting during my peak coding hours. Google Calendar’s new Out of Office feature makes this even more aggressive, by auto-declining meetings booked over your block. 

Some weeks, I can’t prioritize deep work as much; other weeks, I have more time than usual. I’ve seen managers beat themselves up about not having time to do IC work every week. Stop. It’s not a realistic goal. In his discussion of coding as a manager, Ben Edmunds writes, “Redefine what success looks like for yourself […] understand that day-to-day tasks aren’t set in stone. As a manager you need to be fluid.” Amen. 

Start with delegation in mind

Coding as a manager means that you’re going to need to spin your other managerial plates again fairly quickly. You won’t have a ton of time to code projects, so whatever you build will likely need to be a prototype of some kind. When coding an IC project as a manager, figure out what a delegatable minimum viable product (MVP) might look like. Hint: it likely includes well-commented code and pair programming. Keep that MVP as your end goal. 

One of the tools I built as a manager helped our job search product run multivariate A/B tests more rigorously. I knew it was janky (heck, it pulled in data from a Google spreadsheet), but it could get the job done for the team and was better than nothing. I was then able to delegate it to my teammate. 

This was great for two reasons. First, it gave my teammate the chance to learn from my expertise. She got to deepen her understanding of measuring statistical significance in multivariate A/B tests. Second, she took what I had originally built and made it way better. While my prototype effectively shipped with Comic Sans as its font, her version had these beautiful, easily digestible graphs and an even more rigorous statistical approach. Her V.1 of the tool is a much better finished project that’s still in use today. 

To sum up

Engineering managers get a lot of conflicting messages about whether they should code. Coding can help you build empathy and trust with your teammates, thereby making you a more effective leader. You set yourself up for failure, though, if you take on the same kinds of projects you had as an IC and try to stuff in coding between meetings. Instead, reframe the kinds of code you ship. Carve out dedicated time for deep work. And keep an eye out for small, non-urgent, delegatable unicorn projects that play to your strengths and can bring value to the team. 


Notes

  1. By the way, when I refer to “engineering,” I don’t just mean software engineers: I also include data scientists and QA, i.e., anyone whose IC work involves some degree of coding.
  2. Probably. The field of engineering management is still in its infancy: up until recently, very few books were published on the subject. A lot of the evidence presented here is anecdotal, so your mileage may vary. For instance, one hypothesis I’ve heard about coding as a manager is that it helps you build "street cred." I honestly don’t know if I’ve helped my cred or I just made myself look foolish in the Git repository here at work, so I chose not to touch on this point in this article, but I’m curious about others’ experiences with this. The scientist in me wants more rigorously collected qualitative and quantitative data around the benefits and drawbacks of doing IC work in the fields of software engineering and data science. So, reach out to me if you’re interested. I have some ideas.

Cross-posted on Medium.


Making Our Code More Inclusive

At Indeed, inclusion extends beyond employee resource groups and celebrations. Diversity of background, experience, and thought makes for a stronger workforce, more effective decision-making, and powerful innovation. To foster inclusion, we want to build a psychologically safe environment at every level and in every area of the business. That’s why we’re removing terminology that works against such inclusion from our codebase.

Image of five Indeed inclusion group members smiling and wearing shirts labeled "Women at Indeed" in a relaxed office setting

Diversity and inclusion is an ingredient for success. Leaders of Indeed Amsterdam’s Women at Indeed employee resource group (l-r): Edwin Moses, Trudy Danso-Osei, Freek van Schaik, Renske van der Linden, and Valerie Sampimon.

What does technical terminology have to do with inclusion?

Like all words, technical terms have connotations that give them immense expressive power. Some connotations are well known and generally understood. Others depend on context and are understood differently by people with varying lived experiences. The original etymology of a term often has little to do with the connotations it accrues over time.

Computer science and software engineering employ many terms that are convenient, meaningful, and useful. However, some terms ask groups of people to ignore the powerful negative and exclusive connotations they carry.

The terms “master” and “slave” exemplify this. Some engineers see these words and are privileged to deduce a benign connotation—a slave is a subordinate process that acts in accordance with the demands of the master. However, for many people, particularly people of color, these terms immediately conjure images of human slavery’s horrors. This connotation doesn’t just exist in the context of one country’s history, such as American slavery. With an estimated 21-45 million people currently enslaved worldwide, the terms master and slave represent both an historic and current global humanitarian crisis.

Many other terms have similar negative connotations. Words that associate colors with value judgments, such as “blacklist,” and language around the exploitation or denigration of cultures, such as “tribal knowledge,” represent just a couple. Ableist language such as “lame” and “blind” used in the wrong context can negatively impact people with disabilities. People continually fight bigotry and prejudice based on these characteristics, and these terms invoke and perpetuate those injustices.

Some of these terms might surprise those of us who don’t share the lived experiences of marginalized individuals. But when our colleagues tell us we are using terms that exclude or hurt them, we should trust them and find new words to use.

Starting the conversation

Even before Indeed officially introduced inclusion and belonging as one of our company values in 2019, our engineers began removing problematic terms from our technology.

We started by opening up the discussion on our internal wiki, with internal blog posts and a dedicated content hub for identifying and deprecating exclusive terminology. All engineers can contribute to and comment on the Inclusive Terminology wiki page. From contributions made there, we created a non-exhaustive quick reference guide to help each other make responsible terminology decisions.

Instead of | Use | Why
master* | primary, main, trunk | These terms represent an inherently oppressive relationship.
slave | replica, secondary | These terms represent an inherently oppressive relationship.
whitelist | allowlist, unblocklist | These terms imply a culturally specific binary of good versus evil.
blacklist | denylist, blocklist, banned | These terms imply a culturally specific binary of good versus evil.
backlog grooming | backlog refinement | “Grooming” is a term with specific legal meaning in British English.
tribal knowledge | institutional knowledge | “Tribe” is a loaded term with negative connotations for First Nations and African communities.
grandfathered | legacy, pre-existing | Grandfather clauses originated from Jim Crow era discrimination laws in the United States.

*The removal of “slave” from the set in common usage does not remove the implied oppressive relationship. Historically, the usage of the term “master” in relation to a Git branch stems from a master / slave relationship.

Each engineering team chose how to implement the new language in their code. Then, teams shared best practices and processes. We continue these conversations today.

Case study: Replacing “master” with “primary” in a Git project

Renaming the master branch of a Git project is not a trivial exercise, especially for projects with lengthy histories. Recently, our Design Systems team completed this work for one of their projects. To do this, the team:

  1. Cloned the master branch and named the clone “primary.”
  2. Updated the default branch in GitLab from master to primary.
  3. Locked down the master branch. It still exists for historical purposes, but it can no longer be used.
  4. Applied the former settings for the master branch to the new primary branch.

A couple of issues could arise in this scenario. For example, a user could create a branch off master before the team created the new primary branch. Because primary and master share a common history, the user could theoretically merge the feature into primary. To mitigate such issues, the team enacted a code freeze while they made the change. They also tested their process on a smaller project before renaming the main project.

Tangible results

To track this work, Indeed engineers leaned on Atlassian’s Jira, our tool for software development tracking. We added a label to Jira tickets that involve inclusive terminology so we can filter and sort them. This gives us a high-level view of where exclusive language exists, our ongoing efforts to remove that language, and our progress. To date, we’ve closed 97 of 113 issues and counting.

Pie graph showing the number of Jira issues labeled "inclusive-terminology" by status, with 97 closed, 1 deferred, 9 on backlog, 1 pending review, 2 pending triage, and 3 in wish list status.

Jira issues labeled “inclusive-terminology” by status

Challenges to making this happen

This work sparked lots of discussion among our engineers. The last thing we wanted to do was turn these language changes into a policing and shaming process. So, we decided to make this a grassroots effort instead of a top-down mandate. That way, everyone is empowered to respectfully discuss terminology changes while learning from one another. Leadership provides support and guidance when necessary and actively participates in the conversation.

One subject that came up in these discussions was cost and level of effort. Changing terminology throughout all our products is a long-term project that requires many engineer hours. In fact, as of today we still need to remove over 5000 instances of the term “slave” from our codebase. We’re committed, and the psychological safety generated by this work far outweighs the time and effort required to remove exclusive terminology.

A way forward

Language constantly evolves to meet the needs of those who use it, and words fall out of fashion as we progress. Because of this, we know changing the terms in our codebase is an ongoing practice, not a one-time effort.

We continue to document words we want to replace and offer suitable alternatives. We avoid using those terms in any new code and ask our vendors to avoid those terms in their products as well. As we change our codebase, we methodically and carefully locate and replace the existing usages.
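As a rough illustration of the "locate" half of that work, here is a minimal sketch in plain Java. It is not the tooling Indeed uses; `TermScanner` is a hypothetical name, and the term map holds only a few entries from the quick reference guide earlier in this post, each with its first suggested alternative. The sketch reports candidates and deliberately replaces nothing, since a plain substring match will also flag harmless words (for example, "mastery") and every hit still needs human review.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; reports candidate terms, never rewrites code.
class TermScanner {
    // A few terms from the quick reference guide, mapped to one suggested alternative.
    static final Map<String, String> SUGGESTIONS = new LinkedHashMap<>();
    static {
        SUGGESTIONS.put("master", "primary");
        SUGGESTIONS.put("slave", "replica");
        SUGGESTIONS.put("whitelist", "allowlist");
        SUGGESTIONS.put("blacklist", "denylist");
    }

    // Returns one finding per (line, term) pair, using 1-based line numbers.
    // Matching is a case-insensitive substring test, so identifiers such as
    // "slaveHost" or "ipWhitelist" are caught as well.
    static List<String> scan(List<String> lines) {
        List<String> findings = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String lowered = lines.get(i).toLowerCase(Locale.ROOT);
            for (Map.Entry<String, String> entry : SUGGESTIONS.entrySet()) {
                if (lowered.contains(entry.getKey())) {
                    findings.add("line " + (i + 1) + ": \"" + entry.getKey()
                            + "\" -> consider \"" + entry.getValue() + "\"");
                }
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "String slaveHost = config.get(\"db.slave\");",
                "int retries = 3;",
                "Set<String> ipWhitelist = load();");
        scan(sample).forEach(System.out::println);
    }
}
```

A report like this could feed the labeled Jira tickets described above, leaving the actual rename to the team that owns the code.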

We still have work to do. We constantly increase our awareness of exclusive terms and their implications, and we engage in respectful conversations about these topics with each other. Together, we want to create a work environment that is psychologically safe, inclusive, and welcoming for all people at Indeed. By sharing these practices, we hope to model inclusivity and improve the tech industry as well.


Cross-posted on Medium.
