
Commit 96a6fc0

Add BigQuery adapter with emulator-based integration tests
- Create BigQueryAdapter in sidemantic/db/bigquery.py
  - Uses google-cloud-bigquery client
  - Supports bigquery://project_id/dataset_id URL format
  - BigQueryResult wrapper for DuckDB-compatible API
  - Arrow support via to_arrow()
- Add bigquery optional dependency to pyproject.toml
  - google-cloud-bigquery>=3.0.0
  - pyarrow>=14.0.0
- Update SemanticLayer to recognize bigquery:// URLs
- Add BigQueryConnection to config.py
- Add tests:
  - test_bigquery_adapter.py: Basic adapter tests (import, URL parsing)
  - test_bigquery_integration.py: 8 integration tests against emulator
- Add BigQuery emulator to docker-compose.yml
  - Uses ghcr.io/goccy/bigquery-emulator:latest
  - Runs on port 9050
- Update integration.yml workflow
  - Add bigquery-integration job with emulator service
- Update documentation in tests/db/README.md

Regular tests: 570 passed, 3 skipped, 18 deselected (10 postgres + 8 bigquery)
1 parent 865a423 commit 96a6fc0
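The headline change is the new `bigquery://` connection scheme. A minimal sketch of how it is meant to be used (the project and dataset IDs are placeholders; the import path follows the module modified in this commit):

```python
from sidemantic.core.semantic_layer import SemanticLayer

# bigquery:// URLs now route to BigQueryAdapter, and the SQL dialect
# is inferred as "bigquery" unless overridden.
# "my-project" and "my_dataset" are placeholder IDs.
layer = SemanticLayer("bigquery://my-project/my_dataset")
```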

10 files changed

Lines changed: 957 additions & 8 deletions


.github/workflows/integration.yml

Lines changed: 37 additions & 1 deletion
```diff
@@ -43,4 +43,40 @@ jobs:
         env:
           POSTGRES_TEST: "1"
           POSTGRES_URL: "postgres://test:test@localhost:5432/sidemantic_test"
-        run: uv run pytest -m integration -v
+        run: uv run pytest -m integration tests/db/test_postgres_integration.py -v
+
+  bigquery-integration:
+    runs-on: ubuntu-latest
+
+    services:
+      bigquery:
+        image: ghcr.io/goccy/bigquery-emulator:latest
+        ports:
+          - 9050:9050
+        options: >-
+          --health-cmd "grpc_health_probe -addr=:9050"
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+
+      - name: Set up Python
+        run: uv python install 3.12
+
+      - name: Install dependencies
+        run: uv sync --extra bigquery --extra dev
+
+      - name: Run BigQuery integration tests
+        env:
+          BIGQUERY_TEST: "1"
+          BIGQUERY_EMULATOR_HOST: "localhost:9050"
+          BIGQUERY_PROJECT: "test-project"
+          BIGQUERY_DATASET: "test_dataset"
+        run: uv run pytest -m integration tests/db/test_bigquery_integration.py -v
```
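The `BIGQUERY_TEST` flag is what lets plain `pytest` runs skip these tests. The gating code itself isn't part of this diff, but the usual pattern looks something like this (a sketch, not the repo's actual conftest):

```python
import os

import pytest

# Mark the whole module as integration-only and skip it unless the
# BIGQUERY_TEST flag set in CI above is present.
pytestmark = [
    pytest.mark.integration,
    pytest.mark.skipif(os.environ.get("BIGQUERY_TEST") != "1", reason="BIGQUERY_TEST not set"),
]
```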

docker-compose.yml

Lines changed: 18 additions & 0 deletions
```diff
@@ -15,16 +15,34 @@ services:
     volumes:
       - postgres_data:/var/lib/postgresql/data
 
+  bigquery:
+    image: ghcr.io/goccy/bigquery-emulator:latest
+    platform: linux/amd64
+    ports:
+      - "9050:9050"
+    command: ["--project=test-project", "--dataset=test_dataset"]
+    healthcheck:
+      test: ["CMD", "grpc_health_probe", "-addr=:9050"]
+      interval: 5s
+      timeout: 5s
+      retries: 5
+
   test:
     build:
       context: .
       dockerfile: Dockerfile.test
     depends_on:
       postgres:
         condition: service_healthy
+      bigquery:
+        condition: service_healthy
     environment:
       POSTGRES_TEST: "1"
       POSTGRES_URL: "postgres://test:test@postgres:5432/sidemantic_test"
+      BIGQUERY_TEST: "1"
+      BIGQUERY_EMULATOR_HOST: "bigquery:9050"
+      BIGQUERY_PROJECT: "test-project"
+      BIGQUERY_DATASET: "test_dataset"
     command: pytest -m integration -v
 
 volumes:
```
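For poking at the emulator outside the test suite, the goccy emulator exposes the BigQuery REST API on port 9050, so the official client can be pointed at it directly. A sketch based on the emulator's documented client setup (not part of this commit):

```python
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery

# Connect to the emulator started by `docker compose up -d bigquery`.
# No real GCP credentials are required; the emulator accepts anonymous access.
client = bigquery.Client(
    project="test-project",
    credentials=AnonymousCredentials(),
    client_options={"api_endpoint": "http://localhost:9050"},
)

rows = client.query("SELECT 1 AS n").result()
print([dict(row) for row in rows])  # [{'n': 1}]
```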

pyproject.toml

Lines changed: 4 additions & 0 deletions
```diff
@@ -39,6 +39,10 @@ postgres = [
     "psycopg[binary]>=3.0.0",
     "pyarrow>=14.0.0", # For Arrow support
 ]
+bigquery = [
+    "google-cloud-bigquery>=3.0.0",
+    "pyarrow>=14.0.0", # For Arrow support
+]
 
 [build-system]
 requires = ["hatchling"]
```

sidemantic/config.py

Lines changed: 13 additions & 1 deletion
```diff
@@ -24,6 +24,15 @@ class PostgreSQLConnection(BaseModel):
     password: str = Field(..., description="Password")
 
 
+class BigQueryConnection(BaseModel):
+    """BigQuery connection configuration."""
+
+    type: Literal["bigquery"] = "bigquery"
+    project_id: str = Field(..., description="GCP project ID")
+    dataset_id: str | None = Field(default=None, description="Default dataset ID (optional)")
+    location: str = Field(default="US", description="BigQuery location")
+
+
 class PostgresServerConfig(BaseModel):
     """PostgreSQL wire protocol server configuration (ALPHA).
 
@@ -35,7 +44,7 @@ class PostgresServerConfig(BaseModel):
     password: str | None = Field(default=None, description="Password for authentication (optional)")
 
 
-Connection = DuckDBConnection | PostgreSQLConnection
+Connection = DuckDBConnection | PostgreSQLConnection | BigQueryConnection
 
 
 class SidemanticConfig(BaseModel):
@@ -192,5 +201,8 @@ def build_connection_string(config: SidemanticConfig) -> str:
             f"postgres://{config.connection.username}{password_part}@"
             f"{config.connection.host}:{config.connection.port}/{config.connection.database}"
         )
+    elif isinstance(config.connection, BigQueryConnection):
+        dataset_part = f"/{config.connection.dataset_id}" if config.connection.dataset_id else ""
+        return f"bigquery://{config.connection.project_id}{dataset_part}"
     else:
         raise ValueError(f"Unknown connection type: {type(config.connection)}")
```
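For illustration, the new connection model maps to a URL like this (a sketch; the config object is built directly rather than loaded from a config file, and the IDs are placeholders):

```python
from sidemantic.config import BigQueryConnection

conn = BigQueryConnection(project_id="my-project", dataset_id="analytics")

# Mirrors the branch added to build_connection_string above.
dataset_part = f"/{conn.dataset_id}" if conn.dataset_id else ""
print(f"bigquery://{conn.project_id}{dataset_part}")  # bigquery://my-project/analytics
```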

sidemantic/core/semantic_layer.py

Lines changed: 7 additions & 1 deletion
```diff
@@ -31,6 +31,7 @@ def __init__(
                 - duckdb:///:memory: (default)
                 - duckdb:///path/to/db.duckdb
                 - postgres://user:pass@host:port/dbname
+                - bigquery://project_id/dataset_id
             dialect: SQL dialect for query generation (optional, inferred from adapter)
             auto_register: Set as current layer for auto-registration (default: True)
             use_preaggregations: Enable automatic pre-aggregation routing (default: False)
@@ -58,10 +59,15 @@ def __init__(
 
                 self.adapter = PostgreSQLAdapter.from_url(connection)
                 self.dialect = dialect or "postgres"
+            elif connection.startswith("bigquery://"):
+                from sidemantic.db.bigquery import BigQueryAdapter
+
+                self.adapter = BigQueryAdapter.from_url(connection)
+                self.dialect = dialect or "bigquery"
             else:
                 raise ValueError(
                     f"Unsupported connection URL: {connection}. "
-                    "Supported: duckdb:///, postgres://, or BaseDatabaseAdapter instance"
+                    "Supported: duckdb:///, postgres://, bigquery://, or BaseDatabaseAdapter instance"
                 )
         else:
             raise TypeError(f"connection must be a string URL or BaseDatabaseAdapter instance, got {type(connection)}")
```

sidemantic/db/bigquery.py

Lines changed: 186 additions & 0 deletions
```python
"""BigQuery database adapter."""

from typing import Any

from sidemantic.db.base import BaseDatabaseAdapter


class BigQueryResult:
    """Wrapper for BigQuery query result to match DuckDB result API."""

    def __init__(self, query_job):
        """Initialize BigQuery result wrapper.

        Args:
            query_job: BigQuery query job result
        """
        self.query_job = query_job
        self._result = query_job.result()
        self._rows_iter = iter(self._result)

    def fetchone(self) -> tuple | None:
        """Fetch one row from the result."""
        try:
            row = next(self._rows_iter)
            return tuple(row.values())
        except StopIteration:
            return None

    def fetchall(self) -> list[tuple]:
        """Fetch all remaining rows."""
        return [tuple(row.values()) for row in self._rows_iter]

    def fetch_record_batch(self) -> Any:
        """Convert result to PyArrow RecordBatchReader."""
        import pyarrow as pa

        # BigQuery can return Arrow tables directly
        arrow_table = self._result.to_arrow()
        return pa.RecordBatchReader.from_batches(arrow_table.schema, arrow_table.to_batches())

    @property
    def description(self):
        """Get column descriptions."""
        return [(field.name, field.field_type) for field in self._result.schema]


class BigQueryAdapter(BaseDatabaseAdapter):
    """BigQuery database adapter.

    Example:
        >>> adapter = BigQueryAdapter(project_id="my-project", dataset_id="my_dataset")
        >>> result = adapter.execute("SELECT * FROM table")
    """

    def __init__(
        self,
        project_id: str | None = None,
        dataset_id: str | None = None,
        credentials: Any | None = None,
        location: str = "US",
        **kwargs,
    ):
        """Initialize BigQuery adapter.

        Args:
            project_id: GCP project ID (if None, uses default credentials project)
            dataset_id: Default dataset ID (optional)
            credentials: Google Cloud credentials (if None, uses default credentials)
            location: BigQuery location (default: US)
            **kwargs: Additional arguments passed to bigquery.Client
        """
        try:
            from google.cloud import bigquery
        except ImportError as e:
            raise ImportError(
                "BigQuery support requires google-cloud-bigquery. "
                "Install with: pip install sidemantic[bigquery] or pip install google-cloud-bigquery"
            ) from e

        self.client = bigquery.Client(project=project_id, credentials=credentials, location=location, **kwargs)
        self.project_id = project_id or self.client.project
        self.dataset_id = dataset_id

    def execute(self, sql: str) -> BigQueryResult:
        """Execute SQL query."""
        query_job = self.client.query(sql)
        return BigQueryResult(query_job)

    def executemany(self, sql: str, params: list) -> Any:
        """Execute SQL with multiple parameter sets.

        Note: BigQuery doesn't have native executemany, so we run queries sequentially.
        """
        from google.cloud import bigquery

        results = []
        for param_set in params:
            # BigQuery uses @param syntax; parameters must be wrapped in a QueryJobConfig
            job_config = bigquery.QueryJobConfig(query_parameters=param_set)
            query_job = self.client.query(sql, job_config=job_config)
            results.append(BigQueryResult(query_job))
        return results

    def fetchone(self, result: BigQueryResult) -> tuple | None:
        """Fetch one row from result."""
        return result.fetchone()

    def fetch_record_batch(self, result: BigQueryResult) -> Any:
        """Fetch result as PyArrow RecordBatchReader."""
        return result.fetch_record_batch()

    def get_tables(self) -> list[dict]:
        """List all tables in the dataset."""
        if not self.dataset_id:
            # If no dataset specified, list tables from all datasets
            tables = []
            for dataset in self.client.list_datasets():
                dataset_ref = self.client.dataset(dataset.dataset_id)
                for table in self.client.list_tables(dataset_ref):
                    tables.append({"table_name": table.table_id, "schema": dataset.dataset_id})
            return tables

        # List tables in specific dataset
        dataset_ref = self.client.dataset(self.dataset_id)
        tables = []
        for table in self.client.list_tables(dataset_ref):
            tables.append({"table_name": table.table_id, "schema": self.dataset_id})
        return tables

    def get_columns(self, table_name: str, schema: str | None = None) -> list[dict]:
        """Get column information for a table."""
        schema = schema or self.dataset_id
        if not schema:
            raise ValueError("schema (dataset_id) required for get_columns")

        table_ref = self.client.dataset(schema).table(table_name)
        table = self.client.get_table(table_ref)

        columns = []
        for field in table.schema:
            columns.append(
                {
                    "column_name": field.name,
                    "data_type": field.field_type,
                    "is_nullable": field.mode != "REQUIRED",
                }
            )
        return columns

    def close(self) -> None:
        """Close the BigQuery client."""
        self.client.close()

    @property
    def dialect(self) -> str:
        """Return SQL dialect."""
        return "bigquery"

    @property
    def raw_connection(self) -> Any:
        """Return raw BigQuery client."""
        return self.client

    @classmethod
    def from_url(cls, url: str) -> "BigQueryAdapter":
        """Create adapter from connection URL.

        URL format: bigquery://project_id/dataset_id
        or: bigquery://project_id (no default dataset)

        Args:
            url: Connection URL

        Returns:
            BigQueryAdapter instance
        """
        if not url.startswith("bigquery://"):
            raise ValueError(f"Invalid BigQuery URL: {url}")

        # Parse URL: bigquery://project_id/dataset_id
        path = url[len("bigquery://") :]
        if not path:
            raise ValueError("BigQuery URL must include project_id: bigquery://project_id/dataset_id")

        parts = path.split("/")
        project_id = parts[0]
        dataset_id = parts[1] if len(parts) > 1 else None

        return cls(project_id=project_id, dataset_id=dataset_id)
```
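Putting the adapter through its paces directly, a sketch using only methods defined above (the project, dataset, and query are placeholders):

```python
from sidemantic.db.bigquery import BigQueryAdapter

# Equivalent to BigQueryAdapter(project_id="my-project", dataset_id="analytics")
adapter = BigQueryAdapter.from_url("bigquery://my-project/analytics")

result = adapter.execute("SELECT 1 AS n")
print(result.fetchone())   # (1,)
print(result.description)  # [('n', 'INTEGER')]

adapter.close()
```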

tests/db/README.md

Lines changed: 35 additions & 4 deletions
````diff
@@ -19,7 +19,7 @@ docker compose up test --build --abort-on-container-exit
 
 # Or run tests locally against dockerized Postgres
 docker compose up -d postgres
-POSTGRES_TEST=1 uv run --extra postgres pytest -m integration -v
+POSTGRES_TEST=1 uv run --extra postgres pytest -m integration tests/db/test_postgres_integration.py -v
 ```
 
 **Manual setup:**
@@ -32,14 +32,45 @@ export POSTGRES_TEST=1
 export POSTGRES_URL="postgres://test:test@localhost:5432/sidemantic_test"
 
 # Run integration tests only
-uv run pytest -m integration -v
+uv run pytest -m integration tests/db/test_postgres_integration.py -v
+```
+
+### BigQuery Integration Tests
+
+BigQuery tests use the BigQuery emulator and are marked with `@pytest.mark.integration`. They require the `bigquery` extra dependencies.
+
+**Using Docker Compose (recommended):**
+```bash
+# Start BigQuery emulator and run integration tests
+docker compose up test --build --abort-on-container-exit
+
+# Or run tests locally against dockerized emulator
+docker compose up -d bigquery
+BIGQUERY_TEST=1 BIGQUERY_EMULATOR_HOST=localhost:9050 uv run --extra bigquery pytest -m integration tests/db/test_bigquery_integration.py -v
+```
+
+**Manual setup:**
+```bash
+# Install bigquery dependencies
+uv sync --extra bigquery
+
+# Set up BigQuery emulator (adjust as needed)
+export BIGQUERY_TEST=1
+export BIGQUERY_EMULATOR_HOST=localhost:9050
+export BIGQUERY_PROJECT=test-project
+export BIGQUERY_DATASET=test_dataset
+
+# Run integration tests only
+uv run pytest -m integration tests/db/test_bigquery_integration.py -v
 ```
 
 **Note:** Normal `pytest` runs will skip integration tests automatically. Use `-m integration` to run them explicitly.
 
 ## Test Coverage
 
 - **test_duckdb_adapter.py**: Tests for DuckDB adapter implementation
-- **test_postgres_adapter.py**: Basic Postgres adapter tests (mostly ImportError checks)
-- **test_postgres_integration.py**: Full integration tests against real Postgres database
+- **test_postgres_adapter.py**: Basic Postgres adapter tests (import checks, no connection required)
+- **test_postgres_integration.py**: Full integration tests against real Postgres database (10 tests)
+- **test_bigquery_adapter.py**: Basic BigQuery adapter tests (import checks, URL parsing)
+- **test_bigquery_integration.py**: Full integration tests against BigQuery emulator (10 tests)
 - **test_semantic_layer_adapters.py**: Tests for SemanticLayer integration with different adapters
````
