Support fetching query results in chunks #682

Open
staticlibs wants to merge 1 commit into duckdb:main from staticlibs:chunked_result

This PR makes it possible to read a query result as a lazily fetched sequence of
[data chunks](https://github.com/duckdb/duckdb-java/blob/32d68a448d27f00e0e86f59e454dcf7a674e9cc8/src/duckdb/src/include/duckdb/common/types/data_chunk.hpp#L26-L30).

It effectively exposes the [duckdb_fetch_chunk](https://github.com/duckdb/duckdb-java/blob/32d68a448d27f00e0e86f59e454dcf7a674e9cc8/src/duckdb/src/include/duckdb.h#L5376-L5386)
call to Java, so results can be read in batches, avoiding the
per-row overhead mandated by the JDBC specification.

Chunk contents are accessed through the same `DuckDBDataChunkReader` API
that is used with [Java user-defined functions](https://github.com/duckdb/duckdb-java/blob/32d68a448d27f00e0e86f59e454dcf7a674e9cc8/UDF.MD).

Usage example:

```java
try (DuckDBConnection conn = DriverManager.getConnection("jdbc:duckdb:").unwrap(DuckDBConnection.class);
     DuckDBPreparedStatement ps = conn.prepare("SELECT ? AS col1")) {

    ps.setInt(1, 42); // statement parameters are still 1-based

    try (DuckDBChunkedResult res = ps.query()) {

        // advance to the next chunk, returns true on success
        while (res.nextChunk()) {

            // get the current chunk from the result
            DuckDBDataChunkReader chunk = res.chunk();

            // iterate over the chunk columns, all indices are 0-based
            for (long columnIndex = 0; columnIndex < chunk.columnCount(); columnIndex++) {

                // get a vector for the specified column
                DuckDBReadableVector vector = chunk.vector(columnIndex);

                // iterate over vector rows
                for (long rowIndex = 0; rowIndex < chunk.rowCount(); rowIndex++) {

                    // get a value in the vector on the specified row
                    int val = vector.getInt(rowIndex);
                    System.out.println(val);
                }
            }
        }
    }
}
```
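
To show the access pattern in isolation, here is a minimal, self-contained sketch of a lazily produced sequence of chunks in plain Java. `IntChunk`, `LazyChunks`, and `ChunkedDemo` are hypothetical names used only for illustration and are not part of the DuckDB JDBC driver; the sketch only mirrors the `nextChunk()` / `chunk()` shape of the example above and assumes a single `int` column.

```java
// Hypothetical stand-in for a single-column data chunk; NOT a DuckDB class.
final class IntChunk {
    private final int[] values;

    IntChunk(int[] values) {
        this.values = values;
    }

    long rowCount() {
        return values.length;
    }

    int getInt(long rowIndex) {
        return values[(int) rowIndex];
    }
}

// Hypothetical lazy source: yields the values 0..totalRows-1 in chunks
// of at most chunkSize rows, materializing one chunk at a time.
final class LazyChunks implements AutoCloseable {
    private final int totalRows;
    private final int chunkSize;
    private int produced = 0;
    private IntChunk current = null;
    int chunksFetched = 0;

    LazyChunks(int totalRows, int chunkSize) {
        this.totalRows = totalRows;
        this.chunkSize = chunkSize;
    }

    // Advance to the next chunk; returns false once the source is exhausted.
    boolean nextChunk() {
        if (produced >= totalRows) {
            current = null;
            return false;
        }
        int n = Math.min(chunkSize, totalRows - produced);
        int[] vals = new int[n];
        for (int i = 0; i < n; i++) {
            vals[i] = produced + i;
        }
        produced += n;
        chunksFetched++;
        current = new IntChunk(vals);
        return true;
    }

    // Return the chunk made current by the last successful nextChunk() call.
    IntChunk chunk() {
        if (current == null) {
            throw new IllegalStateException("no current chunk");
        }
        return current;
    }

    @Override
    public void close() {
        current = null;
    }
}

public class ChunkedDemo {
    // Sum all values, touching the source once per chunk rather than once per row.
    static long sum(LazyChunks res) {
        long total = 0;
        while (res.nextChunk()) {
            IntChunk chunk = res.chunk();
            for (long row = 0; row < chunk.rowCount(); row++) {
                total += chunk.getInt(row);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        try (LazyChunks res = new LazyChunks(5000, 2048)) {
            System.out.println(sum(res));
            System.out.println(res.chunksFetched);
        }
    }
}
```

In this sketch the consumer calls the source once per chunk (three times for 5000 rows with a chunk size of 2048) instead of once per row, which is the kind of per-row overhead the chunked API is meant to avoid.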

Note 1: Currently only basic data types are supported; support for composite types
(`LIST`, `STRUCT`) will be added in the future.

Note 2: The `query()` method can only be used on prepared statements;
there is currently no `query(String)` overload.