| 1 | +# SQLsmith SQLLogicTest Corpus Generator |
| 2 | + |
| 3 | +This repo builds a container image that compiles [SQLsmith](https://github.com/anse1/sqlsmith) and turns its generated queries into [SQLLogicTest](https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki) cases. Everything runs inside Docker; the host machine only needs the Docker CLI. |
| 4 | + |
| 5 | +## Build the image |
| 6 | + |
| 7 | +```bash |
| 8 | +# from the repository root |
| 9 | +docker build -t sqlsmith-slt . |
| 10 | +``` |
| 11 | + |
| 12 | +The build compiles SQLsmith from the `master` branch and installs the helper scripts under `/usr/local/bin` inside the image. |
| 13 | + |
| 14 | +## Generate a corpus |
| 15 | + |
| 16 | +```bash |
| 17 | +mkdir -p out |
| 18 | + |
| 19 | +docker run --rm \ |
| 20 | + -v "$(pwd)/out":/out \ |
| 21 | + sqlsmith-slt |
| 22 | +``` |
| 23 | + |
| 24 | +The container writes results into `/out` (mapped to `./out` on the host): |
| 25 | + |
| 26 | +- `seeds.sql` – raw SQLsmith statements (aggregated across all batches) for reproducibility. |
| 27 | +- `case_*.test` – SQLLogicTest files containing expected results computed by SQLite. |
| 28 | + |
| 29 | +## Customize generation |
| 30 | + |
| 31 | +Override the environment variables below with `-e NAME=value` flags when running the container: |
| 32 | + |
| 33 | +| Variable | Default | Description | |
| 34 | +| --- | --- | --- | |
| 35 | +| `TARGET_ENGINE` | `sqlite` | Execution backend used for result materialization. Only `sqlite` is supported today. | |
| 36 | +| `SQLSMITH_BATCH_QUERIES` | `250` | Number of statements per SQLsmith batch. `SQLSMITH_MAX_QUERIES` is still accepted as an alias. | |
| 37 | +| `SQLSMITH_SEED` | `1` | Base seed passed to SQLsmith; each batch increments it by one for variety. | |
| 38 | +| `OUTPUT_MODE` | `slt` | Set to `slt` for SQLLogicTest cases or `statements` for a plain SQL file. | |
| 39 | +| `SQLLOGICTEST_ROWSORT` | `rowsort` | Switch to `nosort` to omit the `rowsort` directive. | |
| 40 | +| `SQLITE_TIMEOUT` | `1.0` | Seconds allowed for each SQLite execution (not yet enforced). | |
| 41 | +| `SEED_FILENAME` | `seeds.sql` | Name of the raw SQL dump written to `/out`. | |
| 42 | +| `SQLITE_INIT_SQL` | `/usr/local/share/sqlsmith/init.sql` | SQL script executed once to seed the SQLite database before SQLsmith runs. Set it to an empty string to skip seeding. | |
| 43 | +| `SQLSMITH_PASS_TARGET` | _(unset)_ | Minimum number of passing cases (`query` + `statement ok`) to retain. When set, the container keeps running SQLsmith until the target is met. | |
| 44 | +| `SQLSMITH_MAX_ERRORS` | _(unset)_ | Maximum number of `statement error` cases to keep. Excess failures are discarded. | |
| 45 | +| `SQLSMITH_MAX_CASES` | _(unset)_ | Optional cap on the number of new cases admitted per SQLsmith batch. | |
| 46 | + |
| 47 | +The SQLite connection defaults to `/tmp/sqlsmith.db`; set `ENGINE_URI` (or the legacy `SQLITE_URI`) when you need to target a different database file or URI. |
| 48 | + |
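The fallback between the two variables can be pictured with nested parameter expansion. This is only a sketch: the function name `resolve_engine_uri` is hypothetical, and the precedence (`ENGINE_URI` first, then `SQLITE_URI`, then the default path) is an assumption rather than something confirmed by the image's entrypoint.

```bash
# Sketch only: the precedence below is an assumption, not the image's code.
# resolve_engine_uri is a hypothetical helper name.
resolve_engine_uri() {
  # Prefer ENGINE_URI, then the legacy SQLITE_URI, then the documented default.
  # The ":-" form also falls through when a variable is set but empty.
  printf '%s\n' "${ENGINE_URI:-${SQLITE_URI:-/tmp/sqlsmith.db}}"
}
```

With neither variable set, `resolve_engine_uri` prints `/tmp/sqlsmith.db`.
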
| 49 | +When `SQLSMITH_PASS_TARGET` is set, the entrypoint runs SQLsmith in batches of `SQLSMITH_BATCH_QUERIES` statements until the accumulated corpus contains at least that many passing cases. `SQLSMITH_MAX_ERRORS` caps how many failure cases are retained. Each batch increments the SQLsmith seed by one, which broadens coverage while keeping the run reproducible. |
| 50 | + |
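The batching behaviour described above can be sketched as a small loop. This is not the image's actual entrypoint: `run_batch` and `generate_until_target` are hypothetical names, and `run_batch` fakes a passing-case count from the seed so the control flow can be followed without SQLsmith. Running a single batch when the target is zero is likewise an assumption.

```bash
# Sketch of the batching loop; NOT the image's actual entrypoint.
# run_batch is a hypothetical stand-in for "run SQLsmith once and count the
# passing cases it produced". Here it fakes a count of 1..3 from the seed.
run_batch() {
  echo $(( $1 % 3 + 1 ))
}

# Usage: generate_until_target BASE_SEED PASS_TARGET
generate_until_target() {
  seed=$1
  target=$2
  passing=0
  batches=0
  while :; do
    passing=$(( passing + $(run_batch "$seed") ))
    batches=$(( batches + 1 ))
    seed=$(( seed + 1 ))   # each batch bumps the seed by one
    [ "$passing" -ge "$target" ] && break
  done
  echo "batches=$batches passing=$passing next_seed=$seed"
}
```

With the fake batch function, `generate_until_target 1 5` prints `batches=2 passing=5 next_seed=3`. Because the seed sequence is fixed by the base seed, rerunning with the same inputs visits the same batches.
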
| 51 | +Example: generate a corpus with at least 20 passing cases and at most 3 expected failures: |
| 52 | + |
| 53 | +```bash |
| 54 | +docker run --rm \ |
| 55 | + -v "$(pwd)/out":/out \ |
| 56 | + -e SQLSMITH_PASS_TARGET=20 \ |
| 57 | + -e SQLSMITH_MAX_ERRORS=3 \ |
| 58 | + -e SQLSMITH_BATCH_QUERIES=50 \ |
| 59 | + sqlsmith-slt |
| 60 | +``` |
| 61 | + |
| 62 | +To capture non-empty result sets, point SQLsmith (and the executor) at a populated SQLite database, for example: |
| 63 | + |
| 64 | +```bash |
| 65 | +docker run --rm \ |
| 66 | + -v "$(pwd)/northwind.db":/data/northwind.db:ro \ |
| 67 | + -v "$(pwd)/out":/out \ |
| 68 | + -e ENGINE_URI="file:/data/northwind.db?mode=ro" \ |
| 69 | + sqlsmith-slt |
| 70 | +``` |
| 71 | + |
| 72 | +By default the container seeds `/tmp/sqlsmith.db` using `SQLITE_INIT_SQL`, provisioning sample commerce-style tables so the generated queries have data to read from. Replace that script or mount your own to tailor the schema. |
| 73 | + |
| 74 | +Pass extra flags straight through to SQLsmith by appending them after the image name, for example `docker run … sqlsmith-slt --exclude-catalog`. |
| 75 | + |
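One way to picture the pass-through is a helper that assembles the SQLsmith argument list and appends whatever follows the image name. The helper `build_sqlsmith_args` is hypothetical, and the mapping from the container's variables onto SQLsmith's `--sqlite`, `--seed`, and `--max-queries` options is an assumption about how the entrypoint might work, not a description of it.

```bash
# Hypothetical helper: prints one argument per line so quoting stays visible.
# The variable-to-flag mapping is an assumption, not the image's actual code.
# Usage: build_sqlsmith_args URI SEED BATCH_QUERIES [extra SQLsmith flags...]
build_sqlsmith_args() {
  uri=$1
  seed=$2
  batch=$3
  shift 3
  # "$@" now holds only the extra flags given after the image name.
  printf '%s\n' "--sqlite=$uri" "--seed=$seed" "--max-queries=$batch" "$@"
}
```

For instance, `build_sqlsmith_args /tmp/sqlsmith.db 1 250 --exclude-catalog` prints the three base options followed by `--exclude-catalog` on its own line.
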
| 76 | +## Verify the output |
| 77 | + |
| 78 | +After running the container you should see files in `out/`: |
| 79 | + |
| 80 | +```bash |
| 81 | +ls out | head |
| 82 | +# case_000001.test |
| 83 | +# case_000002.test |
| 84 | +# ... |
| 85 | +# seeds.sql |
| 86 | +``` |
| 87 | + |
| 88 | +Each `.test` file follows SQLLogicTest formatting and can be executed with your preferred SLT runner. |
| 89 | + |
| 90 | +Count the generated cases: |
| 91 | + |
| 92 | +```bash |
| 93 | +ls out/case_*.test | wc -l |
| 94 | +``` |
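Beyond the file count, it can help to see how many records of each kind the corpus holds, for example when checking a run against `SQLSMITH_PASS_TARGET` and `SQLSMITH_MAX_ERRORS`. The sketch below assumes the generated files use the standard SQLLogicTest record headers (`query …`, `statement ok`, `statement error`); `tally_cases` is a hypothetical helper, not something shipped in the image.

```bash
# Hypothetical helper: tallies SQLLogicTest record headers across case files.
# Usage: tally_cases [DIR]   (DIR defaults to ./out)
tally_cases() {
  dir=${1:-out}
  # cat merges the files so grep -c reports one total instead of per-file counts.
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the sketch
  # usable under "set -e".
  queries=$(cat "$dir"/case_*.test | grep -c '^query ' || true)
  ok=$(cat "$dir"/case_*.test | grep -c '^statement ok' || true)
  errors=$(cat "$dir"/case_*.test | grep -c '^statement error' || true)
  echo "query=$queries statement_ok=$ok statement_error=$errors"
}
```

Calling `tally_cases out` prints a single summary line in the form `query=N statement_ok=N statement_error=N`.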