This guide explains how to configure GitHub Actions for running BigQuery integration tests automatically.
Before the CI/CD workflow can run, you need to add secrets and variables to your GitHub repository:
- Go to your repository on GitHub
- Click Settings → Secrets and variables → Actions
Click New repository secret for each of the following:
| Secret Name | Description | Example |
|---|---|---|
| GCP_SA_KEY | Google Cloud Service Account JSON key | {"type": "service_account", "project_id": "your-project"...} |
| GCP_PROJECT_ID | Google Cloud Project ID | my-bigquery-project |
Click on the Variables tab, then New repository variable for each:
| Variable Name | Description | Example |
|---|---|---|
| BIGQUERY_DATABASE | BigQuery dataset name for tests (optional, defaults to 'sqltesting') | sqltesting |
Create a dedicated service account for GitHub Actions with minimal permissions:
```bash
# Set your project ID
export PROJECT_ID="your-project-id"

# Create service account
gcloud iam service-accounts create github-actions-bigquery \
  --display-name="GitHub Actions BigQuery" \
  --description="Service account for BigQuery integration tests" \
  --project=$PROJECT_ID

# BigQuery Data Editor (to create/manage datasets and tables)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:github-actions-bigquery@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

# BigQuery Job User (to run queries)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:github-actions-bigquery@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# BigQuery User (to access BigQuery resources)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:github-actions-bigquery@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.user"

# Create service account key
gcloud iam service-accounts keys create github-actions-key.json \
  --iam-account=github-actions-bigquery@$PROJECT_ID.iam.gserviceaccount.com \
  --project=$PROJECT_ID

# Display the key content for copying to GitHub secrets
cat github-actions-key.json
```

For more granular control, create a custom role with only the required permissions:

```json
{
"title": "GitHub Actions BigQuery Policy",
"description": "Minimal permissions for BigQuery integration tests",
"stage": "GA",
"includedPermissions": [
"bigquery.datasets.create",
"bigquery.datasets.get",
"bigquery.datasets.update",
"bigquery.datasets.delete",
"bigquery.tables.create",
"bigquery.tables.get",
"bigquery.tables.list",
"bigquery.tables.update",
"bigquery.tables.delete",
"bigquery.tables.getData",
"bigquery.jobs.create",
"bigquery.jobs.get",
"bigquery.jobs.list"
]
}
```

Tests are organized using pytest markers for precise execution control:
- Unit tests: No markers (default)
- Integration tests: `@pytest.mark.integration`
- BigQuery-specific: `@pytest.mark.bigquery`
- Combined: `@pytest.mark.integration` + `@pytest.mark.bigquery`

The two workflows split the suite accordingly:
- `tests.yaml`: Runs `pytest -m "not integration"` (unit tests only)
- `bigquery-integration.yml`: Runs `pytest -m "integration and bigquery"` (BigQuery integration only)
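For illustration, a BigQuery integration test might be marked like this (the test name and body are hypothetical; only the marker usage follows the convention above):

```python
import pytest


@pytest.mark.integration
@pytest.mark.bigquery
def test_bigquery_adapter_round_trip():
    # Selected by `pytest -m "integration and bigquery"`;
    # excluded by `pytest -m "not integration"` in the unit-test workflow.
    assert True  # placeholder; a real test would run a query against BigQuery
```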
This ensures:
- ✅ Unit tests run on every commit (free)
- ✅ Integration tests run only for specific adapters
- ✅ No cross-contamination between test types
- ✅ Clear cost attribution per adapter
The CI/CD workflow runs automatically on:
On pull requests:
- Triggers: Any PR to the `master` or `main` branch
- File changes: Only when BigQuery-related files are modified:
  - `src/sql_testing_library/_adapters/bigquery.py`
  - `src/sql_testing_library/_core.py`
  - `src/sql_testing_library/_mock_table.py`
  - `tests/test_bigquery*.py`
  - `tests/integration/test_bigquery_integration.py`
  - Workflow file itself

On pushes:
- Triggers: Direct pushes to the `master` or `main` branch
- Runs: Full test suite including real BigQuery tests

On manual dispatch:
- Access: Repository → Actions → "BigQuery Integration Tests" → "Run workflow"
- Use case: Testing without making commits
- Runs mocked BigQuery adapter tests
- No GCP resources used
- Fast execution (~2-3 minutes)
- Tests adapter logic with mocked BigQuery responses
- Validates SQL generation and parsing
- No GCP costs
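A minimal sketch of the mocking approach, assuming the adapter accepts an injected BigQuery client (the interface shown here is illustrative, not the library's actual API):

```python
from unittest.mock import MagicMock


def test_adapter_logic_without_gcp():
    # Stand-in for google.cloud.bigquery.Client; no network calls are made.
    mock_client = MagicMock()
    mock_client.query.return_value.result.return_value = [{"id": 1}]

    rows = list(mock_client.query("SELECT 1 AS id").result())

    # Assert on the generated SQL and returned rows instead of real results.
    mock_client.query.assert_called_once_with("SELECT 1 AS id")
    assert rows == [{"id": 1}]
```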
- Connects to real Google BigQuery
- Executes actual queries
- Cost: ~$0.01-0.10 per test run (depending on data processed)
- Includes automatic cleanup
- Every PR: Real BigQuery tests run (~$0.05-0.10 per PR)
- Every merge: Real BigQuery tests run (~$0.05-0.10 per merge)
- Estimated monthly cost: $5-30 (depending on PR frequency)
- Query processing: $5 per TB processed
- Storage: $0.02 per GB per month
- Streaming inserts: $0.01 per 200MB
- Test dataset: Usually processes <1MB per test run
When ready to optimize costs, update the workflow condition:
```yaml
# Change this condition in .github/workflows/bigquery-integration.yml
if: |
  (github.event_name == 'push' && (github.ref == 'refs/heads/master' || github.ref == 'refs/heads/main')) ||
  (github.event_name == 'workflow_dispatch')
# Remove: (github.event_name == 'pull_request')
```

This will:
- ✅ Run real tests on main branch pushes
- ✅ Allow manual triggers
- ✅ Skip real tests on PRs (only unit/mock tests)
- 💰 Reduce costs by ~80%
- View detailed logs in: Repository → Actions → Workflow run
- Each step shows detailed output
- Failed tests include full error messages
- Automatic summary appears in PR comments
- Shows which test stages passed/failed
- Includes cost estimation
❌ GCP credentials not configured in GitHub secrets
Solution: Add GCP_SA_KEY and GCP_PROJECT_ID to repository secrets

❌ GCP_SA_KEY: Set but invalid JSON format
Solution: Ensure the service account key is valid JSON and properly formatted

❌ Access denied. Check service account permissions
Solution: Review IAM roles and ensure all required BigQuery permissions are granted

❌ Dataset 'project.dataset' not found
Solution: The workflow will automatically create the dataset, or check permissions

❌ Query failed: Quota exceeded
Solution: Check BigQuery quotas and limits in GCP Console
- Service account keys stored as GitHub secrets (encrypted)
- Minimal IAM permissions (BigQuery only)
- Automatic cleanup of test resources
- No credentials in logs or code
- Temporary credentials file cleanup
- Use Workload Identity Federation instead of service account keys
- Restrict service account access by IP/time
- Enable Google Cloud Audit Logs
- Use Google Cloud Security Command Center
- Auto-creation: Workflow creates test dataset if it doesn't exist
- Location: US region (configurable)
- Cleanup: Tables with 'test' or 'mock' in name are automatically removed
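A sketch of what the auto-creation and cleanup behavior could look like with the google-cloud-bigquery client (the function and naming rules here are assumptions about the workflow, not its exact code):

```python
from google.cloud import bigquery


def ensure_test_dataset(project_id: str, dataset_name: str = "sqltesting") -> None:
    client = bigquery.Client(project=project_id)

    # Auto-creation: create the test dataset if it doesn't exist.
    dataset = bigquery.Dataset(f"{project_id}.{dataset_name}")
    dataset.location = "US"  # configurable region
    client.create_dataset(dataset, exists_ok=True)

    # Cleanup: drop leftover tables with 'test' or 'mock' in the name.
    for table in client.list_tables(dataset):
        if "test" in table.table_id or "mock" in table.table_id:
            client.delete_table(table.reference, not_found_ok=True)
```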
- Dry runs: Validate queries without processing data
- Slot usage: Monitor and limit concurrent query slots
- Partitioning: Use partitioned tables for large test datasets
- Query caching: BigQuery automatically caches results
- Approximate aggregation: Use APPROX functions for large datasets
- Sampling: Use TABLESAMPLE for testing with large tables
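For example, a dry run can validate a query and report how many bytes it would process without actually executing it (a sketch; the helper name is an assumption):

```python
from google.cloud import bigquery


def dry_run_bytes(client: bigquery.Client, sql: str) -> int:
    # dry_run validates the query and estimates bytes processed without
    # running it; disable the cache so the estimate is not skewed.
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed
```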
Use the validation script to test your setup locally:
```bash
# Set up environment variables
export GCP_SA_KEY="$(cat path/to/service-account-key.json)"
export GCP_PROJECT_ID="your-project-id"
export BIGQUERY_DATABASE="sqltesting"  # optional

# Run validation
python scripts/validate-bigquery-setup.py
```

The script checks:
- ✅ Environment variables
- ✅ GCP connectivity
- ✅ BigQuery permissions
- ✅ Dataset access (creates if needed)
- ✅ Query execution
- ✅ Service account permissions
When running multiple cloud adapters, consider:
```yaml
strategy:
  matrix:
    adapter: [athena, bigquery, redshift, snowflake]
    include:
      - adapter: bigquery
        cloud: gcp
        secrets_prefix: GCP
```

For cost monitoring:
- BigQuery: Use GCP Billing alerts
- Cross-cloud: Set up cost monitoring per adapter
- Usage tracking: Monitor test frequency and data processed
- Separate service accounts per adapter
- Different projects/datasets for different test types
- Environment-specific configurations
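One simple way to keep configuration environment-specific is to derive it from the same variables the workflows use (variable names match the GitHub settings above; the helper itself is illustrative):

```python
import os


def bigquery_test_config() -> dict:
    # Mirrors the repository secrets/variables; the dataset default matches CI.
    return {
        "project_id": os.environ["GCP_PROJECT_ID"],
        "dataset": os.environ.get("BIGQUERY_DATABASE", "sqltesting"),
    }
```

The test-time client configuration below can then be built from these values.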
```python
from google.cloud import bigquery

# In test configuration (project_id comes from GCP_PROJECT_ID)
client = bigquery.Client(
    project=project_id,
    default_query_job_config=bigquery.QueryJobConfig(
        dry_run=False,
        use_query_cache=True,
        maximum_bytes_billed=1000000,  # 1MB limit
        job_timeout=30,  # 30 second timeout
    ),
)

# Use table sampling for performance tests
query = """
SELECT * FROM `project.dataset.large_table`
TABLESAMPLE SYSTEM (0.1 PERCENT)
LIMIT 1000
"""

# Monitor query performance
job = client.query(query)
print(f"Bytes processed: {job.total_bytes_processed}")
print(f"Slot time: {job.slot_millis}")
print(f"Creation time: {job.created}")
```