SECRET OF CSS

Setting the JDBC Statement.setFetchSize() to 1 for Single Row Queries – Java, SQL and jOOQ.


An interesting hint by Vladimir Sitnikov has made me think about a new benchmark for jOOQ:

The benchmark should check whether single row queries should have a JDBC Statement.setFetchSize(1) call made to them by default. The Javadoc of the method says:

Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement. If the value specified is zero, then the hint is ignored. The default value is zero.

If an ORM (e.g. jOOQ) knows that it will fetch only 1 row, or if it knows that there can be only 1 row, then that hint certainly makes sense. Examples in jOOQ include:

  • When users call ResultQuery.fetchSingle(), or fetchOne(), or fetchOptional(), or any similar method, then it is reasonable to expect only 0 – 1 rows to be returned. In the case of those methods returning more than 1 row, an exception is thrown, so even if there are more rows, 2 rows will be fetched at most.
  • When users add a LIMIT 1 clause on a top level query, there can never be more than 1 row.
  • When the query is trivial (no joins, or only to-one joins, no GROUP BY GROUPING SETS, no UNION, etc.) and an equality predicate on a UNIQUE constraint is present, there can also be no more than 1 row.

The database optimiser knows all of these things as well. If you add LIMIT 1 to a query, then the optimiser can be reasonably expected to take that as a strong hint about the result set size. But the JDBC driver doesn’t know these things (or at least, it shouldn’t be expected to), because it’s unlikely that it parses the SQL and calculates statistics on it, or considers meta data for such optimisations.

So, the user could hint. And because that would be very tedious for users, even better, the ORM (e.g. jOOQ) should hint. Or so it seems.

Benchmarks

But should it? Is it really worth the trouble? Here’s Vladimir’s assessment about the pgjdbc driver, where he wouldn’t expect an improvement now, but perhaps in the future.

Better than making assumptions, let’s measure, using a JMH benchmark. JMH is normally used for microbenchmarking things on the JVM, to test assumptions about JIT runtime behaviour. This is obviously not a microbenchmark, but I still like JMH’s approach and output, which includes standard deviations and errors, as well as ignores warmup penalties, etc.

First off, the results:

Because benchmark results can’t be published for some commercial RDBMS (at least not when comparing between RDBMS), I have normalised the results so a comparison of actual execution speed between RDBMS is not possible. I.e. for each RDBMS, the faster execution is 1, and the slower one is some fraction of 1. That way, the RDBMS is only benchmarked against itself, which is fair.

The results are below. We’re measuring throughput, so lower is worse.

Db2
---
Benchmark                            Mode   Score 
JDBCFetchSizeBenchmark.fetchSize1   thrpt   0.677
JDBCFetchSizeBenchmark.noFetchSize  thrpt   1.000

MySQL
-----
Benchmark                            Mode   Score 
JDBCFetchSizeBenchmark.fetchSize1   thrpt   0.985
JDBCFetchSizeBenchmark.noFetchSize  thrpt   1.000

Oracle
------
Benchmark                            Mode   Score 
JDBCFetchSizeBenchmark.fetchSize1   thrpt   0.485
JDBCFetchSizeBenchmark.noFetchSize  thrpt   1.000

PostgreSQL
----------
Benchmark                            Mode   Score 
JDBCFetchSizeBenchmark.fetchSize1   thrpt   1.000
JDBCFetchSizeBenchmark.noFetchSize  thrpt   0.998

SQL Server
----------
Benchmark                            Mode   Score 
JDBCFetchSizeBenchmark.fetchSize1   thrpt   0.972
JDBCFetchSizeBenchmark.noFetchSize  thrpt   1.000

For each RDBMS, I have run a trivial query producing a single row with 1 column. Each time, I have re-created a JDBC Statement, and fetched the ResultSet. In fetchSize1, I have specified the fetch size hint. In noFetchSize, I left the default untouched. As can be summarised:

In these RDBMS, there was no effect

  • MySQL
  • PostgreSQL
  • SQL Server

In these RDBMS, things got significantly worse (not better!):

This is quite surprising, as the benchmark includes running the entire statement on the server, so I would have expected, at best, a negligible result.

For this benchmark, I was using these server and JDBC driver versions:

  • Db2 11.5.6.0 with jcc-11.5.6.0
  • MySQL 8.0.29 with mysql-connector-java-8.0.28
  • Oracle 21c with ojdbc11-21.5.0.0
  • PostgreSQL 14.1 with postgresql-42.3.3
  • SQL Server 2019 with mssql-jdbc-10.2.0

The benchmark logic is here:

package org.jooq.test.benchmarks.local;

import java.sql.*;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@Fork(value = 1)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 7, time = 3)
public class JDBCFetchSizeBenchmark {

    @State(Scope.Benchmark)
    public static class BenchmarkState {

        Connection connection;

        @Setup(Level.Trial)
        public void setup() throws Exception {
            Class.forName("org.postgresql.Driver");
            connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/postgres",
                "postgres",
                "test"
            );
        }

        @TearDown(Level.Trial)
        public void teardown() throws Exception {
            connection.close();
        }
    }

    @FunctionalInterface
    interface ThrowingConsumer<T> {
        void accept(T t) throws SQLException;
    }

    private void run(
        Blackhole blackhole,
        BenchmarkState state,
        ThrowingConsumer<Statement> c
    ) throws SQLException {
        try (Statement s = state.connection.createStatement()) {
            c.accept(s);

            try (ResultSet rs = s.executeQuery(
                "select title from t_book where id = 1")
            ) {
                while (rs.next())
                    blackhole.consume(rs.getString(1));
            }
        }
    }

    @Benchmark
    public void fetchSize1(Blackhole blackhole, BenchmarkState state)
    throws SQLException {
        run(blackhole, state, s -> s.setFetchSize(1));
    }

    @Benchmark
    public void noFetchSize(Blackhole blackhole, BenchmarkState state)
    throws SQLException {
        run(blackhole, state, s -> {});
    }
}

A few remarks:

  • The query is by no means representative of a production workload. But if things did get improved by the fetchSize flag, the improvement should have manifested
  • The benchmark didn’t use prepared statements, which could have removed some side-effects, or added some side-effects. Feel free to repeat the benchmark using prepared statements.
  • It is not yet understood why things didn’t matter in some drivers, or why they did in others. For the conclusion, the “why” is not too important, because nothing will be changed as a result of this blog post. If you know why (the db2 driver and ojdbc code isn’t open source, regrettably), I’d be curious.

Conclusion

Optimisations are a tricky beast. Some things seem to make a lot of sense when reasoning about them, but in actual measurements, the seemingly more optimal thing is actually worse, or irrelevant.

In this case, at first, it looked as though we should hint the JDBC driver about our intentions of fetching only 1 row. I don’t know why the JDBC driver behaved worse than if I didn’t hint it. Perhaps it allocated a buffer that was too small, and had to increase it, rather than allocating a buffer that was too large, but large enough.

I’ve done similar benchmarks in the past, trying to “optimise” initial sizes of ArrayList or StringBuilder. I was hardly able to consistently outperform the defaults. Sometimes, the “improvement” did seem to improve things. Sometimes, it worsened things.

With no clear wins (which are understood, don’t blindly trust benchmark results either, even if you’re winning!), I lost confidence in these improvements, and didn’t implement them in the end. This case here is the same. I have not been able to achieve improvements, but in 2/5 cases, things got significantly worse.

Follow up

On /r/java, there had been a discussion about this article. It suggested 2 additional checks:

1. Try using a fetchSize of 2

You’d be tempted to think that other fetch sizes could still be appropriate, e.g. 2, to prevent that potential buffer size increment. I just tried that with Oracle only, producing:

JDBCFetchSizeBenchmark.fetchSize1   thrpt  0.513
JDBCFetchSizeBenchmark.fetchSize2   thrpt  0.968
JDBCFetchSizeBenchmark.noFetchSize  thrpt  1.000

While the penalty of setting the fetchSize to 1 has disappeared, there is again no improvement over the default value.

2. Try using PreparedStatements

In my opinion, PreparedStatement usage shouldn’t matter for this specific benchmark, which is why I had originally left them out. Someone on the reddit discussion was eager to put all their money on the single PreparedStatement card, so here’s an updated result, again with Oracle only, comparing static statements with prepared ones (updated benchmark code below):

Benchmark                                    Mode     Score
JDBCFetchSizeBenchmark.fetchSizePrepared1    thrpt    0.503
JDBCFetchSizeBenchmark.fetchSizeStatic1      thrpt    0.518
JDBCFetchSizeBenchmark.fetchSizePrepared2    thrpt    0.939
JDBCFetchSizeBenchmark.fetchSizeStatic2      thrpt    0.994
JDBCFetchSizeBenchmark.noFetchSizePrepared   thrpt    1.000
JDBCFetchSizeBenchmark.noFetchSizeStatic     thrpt    0.998

The result is the same for both. Not just that, it can be seen that in my particular setup (Querying Oracle XE 21c in docker, locally), there’s absolutely no difference between using a static statement and a prepared statement in this case.

It would again be interesting to investigate why that is, hypotheses may include e.g.

  • ojdbc caches also static statements in the prepared statement cache
  • the effect of caching a prepared statement is negligible in a benchmark that runs only a single statement, which is nowhere near representative of a production workload
  • the client side effect of preparing statements is irrelevant compared to the benefits of the cursor cache on the server side, or compared to the detrimental effect of setting the fetchSize to 1

The updated benchmark code:

package org.jooq.test.benchmarks.local;

import java.sql.*;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@Fork(value = 1)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 7, time = 3)
public class JDBCFetchSizeBenchmark {

    @State(Scope.Benchmark)
    public static class BenchmarkState {

        Connection connection;

        @Setup(Level.Trial)
        public void setup() throws Exception {
            Class.forName("oracle.jdbc.OracleDriver");
            connection = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521/XEPDB1",
                "TEST",
                "TEST"
            );
        }

        @TearDown(Level.Trial)
        public void teardown() throws Exception {
            connection.close();
        }
    }

    @FunctionalInterface
    interface ThrowingConsumer<T> {
        void accept(T t) throws SQLException;
    }

    private void runPrepared(
        Blackhole blackhole,
        BenchmarkState state,
        ThrowingConsumer<Statement> c
    ) throws SQLException {
        try (PreparedStatement s = state.connection.prepareStatement(
            "select title from t_book where id = 1")
        ) {
            c.accept(s);

            try (ResultSet rs = s.executeQuery()) {
                while (rs.next())
                    blackhole.consume(rs.getString(1));
            }
        }
    }

    private void runStatic(
        Blackhole blackhole,
        BenchmarkState state,
        ThrowingConsumer<Statement> c
    ) throws SQLException {
        try (Statement s = state.connection.createStatement()) {
            c.accept(s);

            try (ResultSet rs = s.executeQuery(
                "select title from t_book where id = 1")
            ) {
                while (rs.next())
                    blackhole.consume(rs.getString(1));
            }
        }
    }

    @Benchmark
    public void fetchSizeStatic1(Blackhole blackhole, BenchmarkState state)
    throws SQLException {
        runStatic(blackhole, state, s -> s.setFetchSize(1));
    }

    @Benchmark
    public void fetchSizeStatic2(Blackhole blackhole, BenchmarkState state)
    throws SQLException {
        runStatic(blackhole, state, s -> s.setFetchSize(2));
    }

    @Benchmark
    public void noFetchSizeStatic(Blackhole blackhole, BenchmarkState state)
    throws SQLException {
        runStatic(blackhole, state, s -> {});
    }

    @Benchmark
    public void fetchSizePrepared1(Blackhole blackhole, BenchmarkState state)
    throws SQLException {
        runPrepared(blackhole, state, s -> s.setFetchSize(1));
    }

    @Benchmark
    public void fetchSizePrepared2(Blackhole blackhole, BenchmarkState state)
    throws SQLException {
        runPrepared(blackhole, state, s -> s.setFetchSize(2));
    }

    @Benchmark
    public void noFetchSizePrepared(Blackhole blackhole, BenchmarkState state)
    throws SQLException {
        runPrepared(blackhole, state, s -> {});
    }
}





News Credit

%d bloggers like this: