Skip to content

Adapt read and write paths to async I/O #3

@psarna

Description

@psarna

SQLite already exposes an interface quite friendly to asynchronous I/O, because sqlite3_step function used to proceed with sqlite3_stmt (https://www.sqlite.org/c3ref/stmt.html) state machine is capable of returning SQLITE_BUSY or other error types if it decides that an operation cannot proceed immediately. sqlite3_stmt is stateful, so we're free to keep whatever state we need, assuming that users are going to keep asking for the new step until they get an unrecoverable error or SQLITE_DONE.

Thus, it should be quite possible to modify sqlite3_stmt so that it works with asynchronous I/O, e.g. backed by io_uring.

The new flow could be based on a new implementation of vfs (https://www.sqlite.org/vfs.html), backed by io_uring or libaio for Linux, which returns SQLITE_BUSY when an operation is queued, but not yet finished. Imagine transforming the following high level flow:

  1. The user needs to select rows from the database
  2. sqlite3_step performs a (potentially) blocking read from the filesystem
  3. sqlite3_step returns the results

Into the following flow:

  1. The user needs to select rows from the database
  2. sqlite3_step issues an async read, and it's not immediately ready, so:
    • the handle for this async operation is saved in sqlite3_stmt state
    • SQLITE_BUSY is returned
  3. sqlite3_step is called again as a natural consequence of seeing SQLITE_BUSY (we need to triple-check it's indeed what drivers/apps do)
    • if the operation is still in progress, SQLITE_BUSY is returned
    • otherwise, sqlite3_stmt can move to the next state
  4. sqlite3_step returns the results

With that design, sqlite3_step becomes capable of effectively polling the async i/o backend, e.g. io_uring. This might be beneficial for the app. Let's imagine a node.js program which uses sqlite3. This program is written asynchronously and has several fibers of execution, and one of which involves communication with the sqlite database. In the original design, the thread/process which executes the sqlite query is going to sometimes block on reads/writes. In the new design, the thread/process will instead receive SQLITE_BUSY, and try again later - and in that window of opportunity, it could safely run other asynchronous tasks, thereby improving the overall latency of the whole system.

A very naïve, hardcoded and simplified proof of concept patch which simulates returning SQLITE_BUSY for async I/O can be found here: https://gist.github.com/psarna/2363798cfe023a3a379af8d4a4d3ac44

And it works in harmony with this example, which calls sqlite3_step in a loop until it sees SQLITE_DONE, and ends up properly inserting given value into the database: https://gist.github.com/psarna/d546e327d1c431d3b5e035858441af5e . It does not pass tests, because they generally expect operations to return results instead of SQLITE_BUSY out of the blue. The async VFS flavor will be opt-in anyway, and it is going to need a separate test suite.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions