SECRET OF CSS

No More MultipleBagFetchException Thanks to Multiset Nested Collections – Java, SQL and jOOQ.


I’ve recently stumbled upon this interesting Stack Overflow question about Hibernate’s popular MultipleBagFetchException. The question is super popular, and the answers are plenty. The various limitations are discussed throughout the question, it all boils down to a simple fact:

Joins are the wrong tool to nest collections.

Given a schema like the Sakila database:

image 1

There are many to many relationships between:

  • ACTOR and FILM
  • FILM and CATEGORY

Nothing extraordinary. The problem is that when using an ORM, given the nature of the O (Object, Object Orientation), we want to represent this data in a hierarchy, as a graph, or at least a tree. The same holds true when we want to represent this in JSON or XML.

For example, in Java, the following DTOs are a natural representation of the above schema:

record Actor(
    String firstName, 
    String lastName
) {}

record Category(
    String name
) {}

record Film(
    String title,
    List<Actor> actors,
    List<Category> categories
) {}

In JSON, the data might look something like this:

[
  {
    "title": "ACADEMY DINOSAUR",
    "actors": [
      {
        "first_name": "PENELOPE",
        "last_name": "GUINESS"
      },
      {
        "first_name": "CHRISTIAN",
        "last_name": "GABLE"
      },
      {
        "first_name": "LUCILLE",
        "last_name": "TRACY"
      },
      {
        "first_name": "SANDRA",
        "last_name": "PECK"
      },

      ...
    ],
    "categories": [
      { "name": "Documentary" }
    ]
  },
  {
    "title": "ACE GOLDFINGER",
    "actors": [
      {
        "first_name": "BOB",
        "last_name": "FAWCETT"
      },
  ...

Emulated nesting using joins

But the problem in Hibernate and in SQL in general is the fact that joins create cartesian products. It’s not actually a problem. It’s a feature of SQL and relational algebra. We have an entire blog post about how our industry has been teaching joins wrong, using Venn Diagrams.

Joins are filtered cartesian products. Here’s an example of a cartesian product (without filter):

venn-cross-product

Now, if you want to produce the previous nested collection representation using joins only, then you’ll probably write something like this:

SELECT *
FROM film AS f
  JOIN film_actor AS fa USING (film_id)
    JOIN actor AS a USING (actor_id)
  JOIN film_category AS fc USING (film_id)
    JOIN category AS c USING (category_id)

I’ve indented the joins deliberately to illustrate the tree structure of this denormalisation. For each film, we join:

  • Many actors (e.g. M)
  • Many categories (e.g. N)

This means, that we duplicate the film M * N times due to the nature of joins being cartesian products. Not just that, but worse, we also duplicate:

  • Each actor N times (once per category)
  • Each category M times (once per actor)

Eventually, this might even lead to wrong results, e.g. when aggregating, as some combinations shouldn’t be combined.

Apart from the potential correctness problem, this is a very big performance problem. As the ubiquitous Vlad has explained in his answer, JOIN FETCH syntax is being suggested along with DISTINCT and multiple queries as a workaround. You then have to re-assemble the results manually (I stand corrected, Hibernate takes care of re-assembling, see Vlad’s comment below) and take proper care about eager and lazy loading (while you generally have to be careful with these, they don’t apply here, see again Vlad’s comment below). Quite the chore if you ask me!

This is my most favourite Google search on the subject matter:

To be fair, the chore used to be also present with jOOQ, in the past – at least you couldn’t shoot yourself in the foot with accidentally loading all the entire database.

Actual nesting

Ever since ORDBMS were introduced (e.g. Informix, Oracle, PostgreSQL), and the more popular SQL/XML and SQL/JSON extensions were added, it is possible to perform actual nesting directly in SQL. I’ve blogged about this many times now on this blog:

The correct way to nest collections is with SQL via one of the above 3 serialisation formats (native, JSON, XML).

With the above techniques, you can nest your data to any nested DTO structure in Java, or to any nested JSON format. This is possible with native SQL or with jOOQ. It might also be possible with Hibernate in the future, or with other ORMs that follow jOOQ’s lead in this area.

Given the popularity of this Stack Overflow question, it is hard to ignore how important of a problem the nesting of multiple to-many relationships is, and how both SQL (the language) and ORMs have ignored this problem for so long, offering only quirky workarounds that leave users to implement their serialisation manually, when jOOQ has shown how simple and transparent it could be.

Try jOOQ’s MULTISET operator today, no need to wait. It’s as simple as this:

List<Film> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("categories").convertFrom(r -> r.map(Record1::value1))
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch(mapping(Film::new));

And the above query is type safe! As soon as you modify your DTO, the query no longer compiles. Not just that! jOOQ also has a parser, so you can pretend your favourite SQL dialect already has MULTISET support today. Try this query here: https://www.jooq.org/translate/

SELECT
  f.title,
  MULTISET(
    SELECT a.first_name, a.last_name
    FROM film_actor AS fa
    JOIN actor AS a USING (actor_id)
    WHERE fa.film_id = f.film_id
  ) AS actors,
  MULTISET(
    SELECT c.name
    FROM film_category AS fc
    JOIN category AS c USING (category_id)
    WHERE fc.film_id = f.film_id
  ) AS categories
FROM film AS f
ORDER BY f.title

jOOQ’s translator will translate this to the following on PostgreSQL:

SELECT
  f.title,
  (
    SELECT coalesce(
      jsonb_agg(jsonb_build_array("v0", "v1")),
      jsonb_build_array()
    )
    FROM (
      SELECT
        a.first_name AS "v0",
        a.last_name AS "v1"
      FROM film_actor AS fa
        JOIN actor AS a
          USING (actor_id)
      WHERE fa.film_id = f.film_id
    ) AS "t"
  ) AS actors,
  (
    SELECT coalesce(
      jsonb_agg(jsonb_build_array("v0")),
      jsonb_build_array()
    )
    FROM (
      SELECT c.name AS "v0"
      FROM film_category AS fc
        JOIN category AS c
          USING (category_id)
      WHERE fc.film_id = f.film_id
    ) AS "t"
  ) AS categories
FROM film AS f
ORDER BY f.title



News Credit

%d bloggers like this: