Jackson: A Growing User Base Presents New Challenges

Jackson is a mature and feature-rich open source project that we use, support, and contribute to here at Indeed. In my previous post, I introduced Jackson’s core competency as a JSON library for Java. I went on to describe the additional data formats, data types, and JVM languages Jackson supports. In this post, I will present challenges resulting from Jackson’s expansion, in my view, as Jackson’s creator and primary maintainer. I’ll also share our plans to address these challenges as a community.

The other side of Cape Flattery by Tatu Saloranta

Growing pains

Over its 13 years, the Jackson project added a lot of new functionality with the help of a large community of contributors. Users report issues and developers contribute fixes, new features, and even new modules. For example, the Java 8 date/time datatype module—which supports Java 8 date/time types, java.time.*, specified in JSR-310—was developed concurrently with the specification itself by a member of the community.

Because of its ever-expanding functionality and user base, Jackson is experiencing growing pains and could use help with:

  • Improving the documentation
  • Enhancing the project’s website and branding
  • Managing and prioritizing large-scale changes and features
  • Refining the details of reported issues
  • Testing the compatibility of new Jackson versions against  downstream dependencies

Documentation’s structure and organization

The availability, accessibility, and freshness of documentation is always a challenge for widely used libraries and frameworks. Jackson is no exception. It may even suffer from more documentation debt than other libraries with a comparable user base.

The problem is not so much a lack of documentation per se. A lot of content exists already:

3rd-party tutorials
Jackson project repos
StackOverflow

Jackson also has a Javadoc reference that documents most classes of interest. However, Jackson is a vast library. The Javadoc reference can seem overwhelming, especially to new users. And the Javadoc reference does not include usage examples.

Our first priority is creating How-To guides that walk library consumers through the most typical use cases, such as:

  • How to convert the f.ex JSON structure to match Java objects.
  • How to format other data types such as XML.
  • How to use Jackson within frameworks, such as Spring/Spring Boot, DropWizard, JAX-RS/Jersey, Lombok, and Immutables.

We also want to improve the Javadoc reference. Some classes and methods have no descriptions, while descriptions for others are incomplete. In addition to the auto-generated Javadoc reference, we publish manually created wiki pages that detail the most used features and objects. We’d like to find ways to auto-update these wiki pages.

To implement these modifications, we need the following documentation structure, tooling, and process changes:

  • A super structure for adding inline content and links to external content
  • Access permissions for contributors to add and correct documentation
  • A documentation feedback mechanism to help focus improvement and maintenance efforts
  • Documentation templates to make it easier to add information

Project website and branding

Even though the project is 13 years old and widely used, the main project home page still has the stock GitHub project look. This repo contains a lot of helpful information about Jackson, but it doesn’t have a distinct style, brand, or logo.

For Hacktoberfest 2020, we created an issue for a Jackson logo design. Examples we like:

Managing large-scale changes

We appreciate the healthy stream of code contributions—mostly to fix reported issues, but for new features as well.

The most valuable type of contribution for the Jackson project has historically been user bug reports filed as GitHub issues. We get a dozen issues each week, leading to fixes and quality improvements. Without these reports, Jackson would not be half as good as it is. The new feature requests we receive regularly are another valuable source of improvements. New feature requests also help us prioritize new development work.

Better management of larger changes and features is an area for improvement. These efforts take a long time to execute and benefit from careful planning and collaboration. While it’s possible to handle bigger changes under a single issue, this can get unwieldy quickly.

The Jackson project could use help with setting the priority of new feature requests.

  • Which issues should we focus on improving?
  • Can we gauge the consensus of the whole community instead of relying on a small number of active and vocal participants? This helps to avoid “squeaky wheel problems” getting priority.
  • What’s the best way to obtain early feedback on API and implementation plans? Using existing mailing lists, issue trackers, and chat all have their challenges.

To address the need for more community feedback, I introduced the idea of “mini-specifications” called Jackson STrategic Enhancement Plan: JSTEP. Similar to KIPs in Kafka, JSTEPs are intended to foster collaboration. So far this has had limited success, partly due to the limitations of the tool used: GitHub wiki pages do not have the same feature set as GitHub issues or Google documents.

I also started keeping a personal but public todo or work-in-progress (WIP) list—Jackson WIP—as a GitHub wiki page. I wanted a low-maintenance mechanism for me to track and share my own near-term plans.

Refining reported issues, collaboratively

As mentioned above, bug reports have historically been the most useful type of feedback. But while useful, managing reported issues and new feature requests takes a lot of effort. This has become especially challenging now that the data formats, data types, JVM languages, and user base have grown.

Basic issue triage has become time consuming and includes the following steps:

  • Validating the reported issues.
  • Determining if the reported issues resulted from documentation that is missing or unclear.
  • Asking for more information.

The triage time takes away from the limited time we have to work on functionality and documentation. It can also slow down responses to issue reports, which in turn leads to a poor reporter experience.

We would like to find ways to train and include volunteer issue triagers. They can help with refining issues in a timely manner and improving user engagement by:

  • Adding labels
  • Asking questions to get more information
  • Adding findings of their own
  • Getting module and domain experts engaged where applicable

Starting with jackson-databind issues, we improved the workflow and refined GitHub issues by creating:

  • Issue templates to help users submit better issue reports
  • The “to-evaluate” label—set automatically for new issues
  • The “most-wanted” label—which indicates issues that have consistently come up on mailing lists or up-voted using the GitHub thumbs-up emoticon

If these changes are successful, we’ll propagate them to other project repos. Similarly, we can make other incremental changes along these lines.

Compatibility testing of new versions with dependencies

The unit test coverage in the Jackson project’s code is reasonable, and coverage for core components is quite good. There’s even a nascent effort to write more inter-component functional tests in the jackson-integration-tests repo. However, there is currently a sizable gap in testing the compatibility between the minor Jackson version that is under development and downstream dependencies, such as frameworks like Spring Boot.

One challenge is the difference between the development lifecycles of Jackson and the libraries and frameworks that use Jackson. Existing libraries and frameworks depend on the stable Jackson minor version and test everything only against that version. Version 2.11 is the current stable version. They report any issues they find. We may fix the issues in a patch release, like 2.11.1.

Simultaneously, the Jackson project is developing a new minor version, 2.12, which will be released and published as version 2.12.0. Working on versions 2.11.1 and 2.12 concurrently should not be an issue. We use the standard semantic versioning scheme for the Jackson public API, which should guard against creating new issues.

In practice, standard semantic versioning cannot protect against all actual issues. Semantic versioning is used for Jackson’s public API, but internal APIs across core components do not have the same level of backwards compatibility. While Jackson maintainers consider the compatibility between the published and documented API, users tend to rely on the API’s actual observed behavior rather than its intended behavior. Sometimes what maintainers consider bugs or unexpected behaviors, users consider the behavior an actual feature or working as expected.

That said, we can catch and address such issues during the development cycle of the new version if we test downstream systems using the under-development SNAPSHOT version of Jackson. At the time of writing this article, the current version of Spring Boot ships with a dependency on Jackson 2.11.2, the latest patch version. We can create an alternative set of Spring Boot unit and integration tests that instead rely on Jackson 2.12.0-SNAPSHOT. The snapshot version is published regularly, and the Jackson project provides automatic snapshot builds.

Such cross-project integration tests would benefit all involved projects. This prevents the release of some (or perhaps most) regression issues, reducing the need for patch versions. Over time it should also significantly improve compatibility across Jackson versions, closing the gap between expected and actual behavior of the API.

How to get involved

Jackson’s growing user base and features have presented us with documentation, branding, issue triaging, issue prioritization, and code testing challenges. I’ve shared some of the steps we’ve taken to address them. You can help us by implementing some of these solutions or suggesting alternates. Together we can lay a foundation for Jackson’s future growth. For contribution opportunities, visit Jackson Hacktoberfest 2020. To find out more, join us on Gitter chat or sign up for the jackson-dev mailing list.


About the author

Tatu Saloranta is a staff software engineer at Indeed, leading the team that is integrating the next-generation continuous deployment system. Outside of Indeed, he is best known for his open source activities. Under the handle @cowtowncoder, he has authored many popular open source Java libraries, such as Jackson, Woodstox, lzf compression codec, and Java ClassMate. For a complete list see FasterXML Org.

Cross-posted on Medium.

Tweet about this on TwitterShare on FacebookShare on LinkedInShare on RedditEmail this to someone

Jackson: More than JSON for Java

Jackson is a mature and feature-rich open source project that we use, support, and contribute to here at Indeed. As Jackson’s creator and primary maintainer, I want to highlight Jackson’s core competencies, extensions, and challenges in this two-part series.

Photo of Cape Flattery by Tatu Saloranta, Jackson's creator

   Photo of Cape Flattery by Tatu Saloranta

Jackson’s core competency

If you’re creating a web service in Java that reads or returns JSON, you need a library to convert Java objects into JSON and vice versa. Since this functionality is not available in the Java standard library (JDK), you can use the Jackson library for its core competency: writing Java objects as JSON and reading JSON as Java objects. Later in this post, I introduce Jackson’s additional features. 

As a Java JSON library, Jackson is:

Jackson is downloaded 25 million times per month via Maven Central, and 16,000 projects depend on jackson-databind. It’s the second most widely used Java library, after Guava, according to the Core Infrastructure Initiative’s census results.

You can use Jackson directly, but nowadays users are more likely to encounter its functionality exposed by libraries and frameworks. These libraries and frameworks use Jackson by default for JSON handling to expose JSON requests and responses; or JSON configuration files as Java objects.

Some examples are:

Users of these frameworks, libraries, and clients may not even be aware that they utilize Jackson. We are more likely to learn this when troubleshooting usage issues or making changes to default handling.

Examples of Jackson usage

The following example shows how to annotate request and response models in the Spring Boot framework.

// Model of updated and accessed content
public class User {
  private Long id;
  private String name;
  @JsonProperty(“dob”) // customize name used in JSON
  private LocalDate dateOfBirth;
  private LocalDateTime lastLogin;
  // plus public Getters, Setters
}

// Service endpoint definition
@RestController
@RequestMapping("/users")
public class UserController {
  // ...
  @GetMapping("/{id}")
  public User findUserById(@PathVariable Long id) {
    return userStorage.findUserById(id);
  }
  @PostMapping(consumes = MediaType.APPLICATION_JSON_VALUE)
  @ResponseStatus(HttpStatus.CREATED)
  public Long createUser(@RequestBody User user) {
    return userStorage.createUser(user);
  }
}

In this example, the Spring Boot framework reads JSON as an instance of the User class. In the POST method, it passes a User instance to a storage handler. It also fetches the User instance via a storage handler for a given ID and writes serialized JSON of that instance. For a full explanation, see Jackson JSON Request and Response Mapping in Spring Boot.

You can also read and write JSON directly via the Jackson API, without any framework or library, as follows:

final ObjectMapper mapper = new ObjectMapper();
// Read JSON as a Java object:
User user;
try (final InputStream in = requestContext.getInputStream()) {
  user = mapper.readValue(in, User.class);
}
// Write a Java object as JSON
try (final OutputStream out = requestContext.getOutputStream()) {
  mapper.writeValue(out, user);
}

Jackson is more than JSON for Java

While Jackson’s core functionality originally started as JSON for Java, it quickly expanded through modules, which are pluggable extension components you can add to the Jackson core. These modules support features that the core does not handle by default. Jackson also has extensions for other JVM languages.

These are some of the types of Jackson modules and extensions that are especially helpful:

  • Data format modules
  • Data type modules
  • Modules for Kotlin, Scala, and Clojure

Data format modules

The data format modules allow you to read and write content that is encoded in formats other than JSON. The low-level encoding details of JSON differ from those of YAML, CSV, or Protobuf. However, much of the higher-level data-binding functionality is similar or identical. The higher-level data binding deals with the structure of Java Objects and expressing them as token streams (or a similar abstraction).

Most of the code in the Jackson core is format-independent and only a small part is truly JSON-format specific. So, these so-called data format modules can easily extend Jackson to read and write content in other data formats. The modules implement low-level streaming from the Jackson API, but they share common data-binding functionality when it comes to converting content to and from Java objects.

In the implementation of these modules, you find factories for streaming parsers and generators. They use a specific factory to construct an ObjectMapper that handles format-specific details, while you only interact with a (mostly) format-agnostic mapper abstraction.

The very first data format module added to Jackson is jackson-dataformat-xml for XML. Supported data formats now include:

The usage for data format modules is similar to the Jackson API JSON usage. You only need to change  JsonMapper (or the generic ObjectMapper) to a format-specific alternative like XmlMapper. There are format-specific features that you can enable and disable. Also, some data formats require additional schema information to map content into JSON-like token representation, such as Avro, CSV and Protobuf. But across all formats, the API usage is similar.

Examples

Compared to the previous example that simply reads and writes JSON, these are alternatives for reading and writing other data formats:

// XML usage is almost identical except it uses a different mapper object
ObjectMapper xmlMapper = new XmlMapper();
String doc = xmlMapper.writeValueAsString(user);
User user = xmlMapper.readValue(doc, User.class);

// YAML usage is almost identical except it uses a different mapper object
ObjectMapper yamlMapper = new YAMLMapper();
byte[] asBytes =  yamlMapper.writeValueAsString(user);
User user2 = yamlMapper.readValue(asBytes, User.class);

// CSV requires Schema for most operations (to convert between property
// names and column positions)
// You can compose the schema manually or generate it from POJO.
CsvMapper csvMapper = new CsvMapper();
CsvSchema userSchema = csvMapper.schemaFor(User.class);
String csvDoc = csvMapper.writer(schema)
.writeValueAsString(user);
User user3 = csvMapper.readerFor(User.class)
.with(schema)
.readValue(csvDoc);

Data type modules

Jackson core allows you to read and write Plain Old Java Objects (POJOs) and most standard JDK types (strings, numbers, booleans, arrays, collections, maps, date, calendar, URLs, and UUIDs). However, many Java projects also use value types defined by third-party libraries, such as Guava, Hibernate and Joda. It doesn’t work well if you handle instances of these value types as simple POJOs, especially collection types. Without explicit support, you would have to implement your own handlers as serializers and deserializers to extend Jackson functionality. It’s a huge undertaking to add such explicit support in Jackson core for even the most common Java libraries and could also lead to problematic dependencies.

To solve this challenge, Jackson added a concept called data type modules. These are extensions built and packaged separately from core Jackson and added to the ObjectMapper instance during construction. Data type modules are released by both the Jackson team and external contributors such as authors of third-party libraries. Authors usually add these modules because they want to solve a specific use case and then share the fruits of their labors with others.

Due to the pluggability of these modules, it is possible to use data type modules with different formats and to mix different value types. For example, you can read and write Hibernate-backed Guava ImmutableLists that contain Joda-defined Period values as JSON, CSV, or XML.

The list of known data type modules is long—see Jackson Portal. Here are some examples:

In addition to the data type module implementations, many frameworks also directly support Jackson data type module usage. In particular, various “immutable values” libraries offer such support, such as:

Modules for Kotlin, Scala, Clojure

If you’re using Jackson, you’re not limited to only using POJOs and JDK types with other JVM languages. Jackson has extensions to handle custom types of many other JVM languages.

The following Jackson modules support Kotlin and Scala.

And for Clojure, there are a few libraries that use Jackson under the hood to implement similar support, such as:

This simplifies interoperability further. It makes it easier to use Java functionality from Kotlin, Scala, Clojure, or vice versa.

What’s next

In my next post, I will share my observations on the challenges that the Jackson project faces and how you can contribute to help.


About the author

Tatu Saloranta is a staff software engineer at Indeed. He leads the team that is integrating the next-generation continuous deployment system. Outside of Indeed, he is best known for his open source activities. Under the handle @cowtowncoder, he has authored many popular open source Java libraries, such as Jackson, Woodstox, lzf compression codec, and Java ClassMate. For a complete list see FasterXML Org.


Cross-posted on Medium.

Tweet about this on TwitterShare on FacebookShare on LinkedInShare on RedditEmail this to someone

Want to Code as an Engineering Manager? Time to Find a Unicorn

Coding as an engineering manager is an exercise in cognitive dissonance.

If you’ve just become a manager, you’ve likely been measuring success by the quantity and quality of the code you ship as an individual contributor (IC). Suddenly, you have new metrics for success and your day-to-day work looks wildly different. One mentor tells you that coding as a manager is futile. Another tells you to stay in the code or risk dulling your technical skills. Some companies encourage engineering managers to conduct IC work as part of their culture, even codifying this behavior in promotion criteria. Others penalize managers for the exact same behavior. It’s confusing and stressful!

Since I started managing, I’ve tried to deliver IC work at least once a quarter (some quarters more successfully than others). In this post, I share some of what I’ve learned: the challenges of coding as an engineering manager1, the benefits, and the ways to identify well-scoped unicorn projects to work on.

Unicorn

Ruby star rainbow sparkle via SVG SILH

While there is no I in Team, there is both a U and an IC in unicorn, just sayin’.

Coding as a manager is hard

Doing IC work as an engineering manager boils down to two challenges: constantly shifting contexts and priorities.

Management requires a lot of context shifting. One meeting you’re coming up with a strategy for headcount, another you’re listening to someone grapple with giving tough feedback, and in yet another you’re leading a kanban meeting. By contrast, coding requires deep work, with an uninterrupted sense of focus for several hours. If you’ve ever tried to code in the half-hour between meetings, you know it looks something like this:

import numpy as np
import pandas as pd
# pull data here
If n in list(TODO):
# look this up, used to know how to do this

As a leader, your job is to prioritize supporting your teammates. That means having career conversations, providing feedback, and helping them deliver innovative and impactful work. While this often takes place in one-on-ones, there are follow-up meetings and check-ins with relevant collaborators, too. As you increase the size of your team, it becomes harder and harder to find any breathing room in a 40-hour work week, let alone the several hours needed to code even a small project from beginning to end. 

Why you should consider coding anyway2

First, coding an IC project can help build empathy for your teammates, the tools that they use, and the challenges they encounter. For instance, I worked on a project with a teammate and learned firsthand about several services around Indeed and their challenges. As a result, in later meetings I was able to speak more specifically and with greater confidence about them. I submitted requests to the maintaining teams and, consequently, got updates prioritized that greatly helped my team. 

Second, coding (sometimes for the first time in weeks or even months) means you’ll need to ask your teammates for help. Recently, I was trying to solve a data wrangling problem that ended up with not one, not two, but three nested for loops. My teammates joked that it put the “OH NO” in Big O Notation. I humbly asked for help and together we figured out a way to solve the problem with much less complexity (and had a good laugh). In my experience, people like feeling helpful, and there’s something special about helping your boss when they’re struggling. We want our teammates to ask others for help. Being vulnerable and asking for help as a leader helps you model that behavior for your teammates, too. 

Finally, and most important to your mental health as a manager: shipping code can make you feel good. Management rarely lends itself to that feeling of “doneness” and is often riddled with self-doubt. Doing IC work, by contrast, is usually a discrete task. You can build something, point at it, and say, “Look! I built that. Sweet.”

How to code as a manager

Broadly speaking, doing IC work as a manager looks something like this: 

  1. Make sure nothing is on fire.
  2. Find a narrowly scoped unicorn project.
  3. Block off time for deep work. 
  4. Start with delegation in mind. 

I’ll walk through each of these in the sections below. 

Make sure nothing is on fire

Coding while someone on your team is struggling with something urgent is like fiddling while Rome burns. It also means that you’re not doing the core functions of your job, i.e., helping your teammates. So, before you even think about scoping an IC project, make sure things on your team feel relatively stable. 

Find a narrowly scoped unicorn project

Let me show you what scoping the right projects doesn’t look like. 

One time, in my eagerness to help a new product team, I took on running several A/B tests that we wanted to roll out by the end of the quarter. The A/B tests were simple enough to keep an eye on until I needed to spin other managerial plates. Meanwhile, my product manager had to pick up the slack. In the end, we delegated the tests to someone else who ran them to completion. It wasn’t a good feeling knowing I was letting my PM down. 

By contrast, a well-scoped IC project for managers: 

  • is not time-sensitive 
  • is fairly small
  • does not have any dependencies 
  • is a "nice-to-have" or quality-of-life improvement that won’t get prioritized by your teammates and might have some nice impact
  • plays to your strengths 

That is, a unicorn. Unicorn IC projects are not going to come up all the time. You can’t find them at all if you don’t know what to look for, though. 

For instance, I was in a design jam a few years back, where some UX teammates said to one another, “Yeah, we don’t always know X about Y queries. It would be nice if we had a tool that could do that.” What they were asking for was fairly small. Before they even knew it, a couple hours later I had built an Ishbook that they still use to help them understand user behaviors on the site. 

Alas, ye have yeself a unicorn! 

It’s also important that your IC project plays to your strengths. It’s already going to be easy to fall into the trap of feeling bad about your coding skills because they will likely be rusty and you’ll code more slowly than you used to. Consequently, you probably won’t keep feeling motivated to do more IC work and this blog post and I will have failed you. 

My IC projects usually are some kind of analysis of survey or measurement bias, helping A/B testing, or building well-designed graphs. Why? Because I like these things, I’m good at them, and they give me energy. When you choose an IC project that feels the same way, you’ll be able to get it done more quickly and at a higher quality. 

Block off time for deep work

Half an hour here and there is usually not enough to get meaningful deep work done. Since becoming a manager, I have blocked off several hours in the morning to do deep work every week, whether that’s coding or reading the latest in my field. This encourages others to message me first before booking over my deep work block. Sometimes I need to join the meeting anyway, but more often than not, I avoid meeting during my peak coding hours. Google Calendar’s new Out of Office feature makes this even more aggressive, by auto-declining meetings booked over your block. 

Some weeks, I can’t prioritize deep work as much, other weeks I have more time than usual. I’ve seen managers beat themselves up about not having time to do IC work every week. Stop. It’s not a realistic goal. In his discussion of coding as a manager, Ben Edmunds writes, “Redefine what success looks like for yourself […] understand that day-to-day tasks aren’t set in stone. As a manager you need to be fluid.” Amen. 

Start with delegation in mind

Coding as a manager means that you’re going to need to spin your other managerial plates again fairly quickly. You won’t have a ton of time to code projects, so whatever you build will likely need to be a prototype of some kind. When coding an IC project as a manager, figure out what a delegatable minimum viable product (MVP) might look like. Hint: it likely includes well-commented code and pair programming. Keep that MVP as your end goal. 

One of the tools I built as a manager helped our job search product run multivariate A/B tests more rigorously. I knew it was janky (heck, it pulled in data from a Google spreadsheet), but it could get the job done for the team and was better than nothing. I was then able to delegate it to my teammate. 

This was great for two reasons. First, it gave my teammate the chance to learn from my expertise. She got to deepen her understanding of measuring statistical significance in multivariate A/B tests. Second, she took what I had originally built and made it way better. While my prototype effectively shipped with Comic Sans as its font, her version had these beautiful, easily digestible graphs and an even more rigorous statistical approach. Her V.1 of the tool is a much better finished project that’s still in use today. 

To sum up

Engineering managers get a lot of conflicting messages about whether they should code. Coding can help you build empathy and trust with your teammates, thereby making you a more effective leader. You set yourself up for failure, though, if you take on the same kinds of projects you had as an IC and try to stuff in coding between meetings. Instead, reframe the kinds of code you ship. Carve out dedicated time for deep work. And keep an eye out for small, non-urgent, delegatable unicorn projects that play to your strengths and can bring value to the team. 


Notes

  1. By the way, when I refer to “engineering,” I don’t just mean software engineers: I also include data scientists, QA, i.e., those whose IC work involves some degree of coding. (back to top)
  2. Probably. The field of engineering management is still in its infancy—up until recently, very few books were published on the subject. A lot of the evidence presented here is anecdotal, so your mileage may vary. For instance, one hypothesis I’ve heard about coding as a manager is that it helps you build "street cred." I honestly don’t know if I’ve helped my cred or I just made myself look foolish in the Git repository here at work, so I chose not to touch on this point in this article, but I’m curious about others’ experiences with this. The scientist in me wants more rigorously collected qualitative and quantitative data around the benefits and drawbacks of doing IC work in the fields of software engineering and data science. So, reach out to me if you’re interested. I have some ideas. (back to top)

Cross-posted on Medium.

Tweet about this on TwitterShare on FacebookShare on LinkedInShare on RedditEmail this to someone