
bundle: Add genie_space resource (direct engine only)#5282

Merged: janniklasrose merged 55 commits into main from janniklasrose/genie on Jun 10, 2026

janniklasrose commented May 20, 2026

Summary

Introduces a new genie_space bundle resource (direct-engine only) that mirrors the existing dashboard pattern.

Resolves #3008
Closes #4191

Test plan

  • CI
  • Cloud tests pass
  • Manual live workspace verification

This pull request and its description were written by Isaac.


Example

A bundle that deploys a Genie space for the samples.nyctaxi.trips table:

# databricks.yml
bundle:
  name: nyc-taxi-genie
  engine: direct # genie_spaces require the direct deployment engine

resources:
  genie_spaces:
    nyc_taxi_genie:
      title: "NYC Taxi Trip Analysis"
      description: "Ask questions about NYC taxi trip data in natural language"
      warehouse_id: <warehouse-id>
      file_path: ./nyc_taxi_genie.geniespace.json
      permissions:
        - level: CAN_RUN
          group_name: users

The `.geniespace.json` file holds the serialized definition of the space: its data sources, instructions, and sample questions. (It can also be inlined in YAML under `serialized_space` instead; the two fields are mutually exclusive.)

{
  "version": 2,
  "config": {
    "sample_questions": [
      {
        "id": "11111111111111111111111111111111",
        "question": ["What is the average fare per trip?"]
      }
    ]
  },
  "data_sources": {
    "tables": [
      {
        "identifier": "samples.nyctaxi.trips",
        "column_configs": [
          { "column_name": "fare_amount" },
          { "column_name": "pickup_zip" },
          { "column_name": "tpep_pickup_datetime" },
          { "column_name": "trip_distance" }
        ]
      }
    ]
  },
  "instructions": {
    "text_instructions": [
      {
        "id": "22222222222222222222222222222222",
        "content": ["Fare amounts are in USD. When asked about revenue, use SUM(fare_amount)."]
      }
    ]
  }
}

Run `databricks bundle deploy` to create the space and `databricks bundle open` to view it.

To import a space that was authored in the Databricks UI into a bundle:

databricks bundle generate genie-space --existing-id <space-id>

To pull edits made in the UI back into the local `.geniespace.json` file, run the same command with `--resource <key>` instead of `--existing-id`, optionally adding `--watch` to poll continuously.

Complete examples will be made available in databricks/bundle-examples.

@longchass


Hi, just proposing a fix for the permission URL. Thank you.

break
}

time.Sleep(1 * time.Second)

Contributor Author

Note to reviewers: this aligns dashboards with the other resources. I'm including it here because Genie spaces follow dashboards closely.

Comment thread bundle/direct/dresources/permissions.go Outdated
Comment thread bundle/direct/dresources/resources.yml Outdated
Comment thread bundle/direct/dresources/resources.yml Outdated
Comment thread bundle/direct/dresources/resources.yml Outdated
Adds first-class support for Genie spaces as a bundle resource,
complete with CRUD via direct-mode deploy, `bundle generate genie-space`
to import an existing space, permissions handling, and acceptance tests.

The resource configuration follows the dashboards pattern: a `file_path`
field points to a local `.genie.json` file whose contents are inlined
into `serialized_space` during deployment. The parent_path defaults to
`${workspace.resource_path}` and is normalized to the `/Workspace`
prefix, matching the API's expected form.
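A minimal sketch of the `/Workspace` normalization, assuming a helper named `ensureWorkspacePrefix` (a later commit refers to a shared helper with that name, but its actual signature is not shown in this PR, so the one below is an assumption):

```go
package main

import "strings"

// ensureWorkspacePrefix (hypothetical signature) prepends "/Workspace" to an
// absolute workspace path unless that prefix is already present.
// Relative paths are returned unchanged.
func ensureWorkspacePrefix(p string) string {
	if !strings.HasPrefix(p, "/") {
		return p
	}
	if p == "/Workspace" || strings.HasPrefix(p, "/Workspace/") {
		return p
	}
	return "/Workspace" + p
}
```

Matching on `/Workspace/` rather than the bare string `/Workspace` keeps a folder such as `/Workspaces/x` from being mistaken for one that already carries the prefix.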

Co-authored-by: Isaac
…ng-parent-path errors

Genie surfaces a missing parent folder inconsistently across
environments: some workspaces return a standard 404 missing-resource
error, while others return 400 INVALID_PARAMETER_VALUE with a NOT_FOUND
"Tree node ... does not exist" message embedded in the text. Treat both
forms as "create the parent directory and retry once".

Co-authored-by: Isaac
…sitor

VisitGenieSpacePaths existed but was never called by VisitPaths, so
NormalizePaths did not rewrite genie_space file_path values from
"relative to YAML location" to "relative to bundle root" before
applyGenieSpaceTranslations resolved them. The result was that
generator output like "../src/<name>.genie.json" failed on deploy
with "path ... is not contained in sync root path".

Co-authored-by: Isaac
Inline YAML serialized_space stayed as a structured value (map[string]any
with int leaves) in the config struct, while state held the JSON string
that was sent to the API. structdiff compared an `any` field with
reflect.DeepEqual, which reports map != string, so every plan after
deploy showed a false update for the genie_space.

Marshal inline serialized_space to its JSON string in
ConfigureGenieSpaceSerializedSpace, mirroring the file_path code path,
so config-side and state-side carry the same type. The
genie_space_complex validate test is updated to reflect that
serialized_space is now a string regardless of input form, and a new
acceptance test under resources/genie_spaces/inline asserts that a
deploy + plan cycle is drift-free for inline serialized_space.
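A small sketch of why the false diff appeared and how marshalling removes it. `normalizeSerializedSpace` and `sameValue` are hypothetical names; `sameValue` only mimics a structdiff-style `reflect.DeepEqual` comparison:

```go
package main

import (
	"encoding/json"
	"reflect"
)

// normalizeSerializedSpace (hypothetical) converts an inline YAML value,
// decoded as map[string]any, into the JSON string form sent to the API.
// A string value passes through unchanged.
func normalizeSerializedSpace(v any) (string, error) {
	if s, ok := v.(string); ok {
		return s, nil
	}
	b, err := json.Marshal(v) // encoding/json writes map keys in sorted order
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// sameValue compares two `any` fields the way a DeepEqual-based differ would.
func sameValue(config, state any) bool {
	return reflect.DeepEqual(config, state)
}
```

Because map keys are emitted in sorted order, marshalling the same inline YAML twice produces the same string, so the comparison stays stable across plans.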

Co-authored-by: Isaac
Databricks workspaces do not expose a permissions endpoint for Genie
Spaces (PUT /permissions/genie/spaces/<id> returns 404
ENDPOINT_NOT_FOUND). Without an upfront check the deploy creates the
space first and then errors when applying permissions, leaving partial
state behind.

Add ValidateGenieSpacePermissions to the PreDeployChecks pipeline so
both per-resource permissions and bundle-level permissions propagated
by ApplyBundlePermissions surface a clear validation error before any
API call is made.

Co-authored-by: Isaac
Two minor follow-ups to the genie_spaces work:

- ConfigureGenieSpaceSerializedSpace silently let file_path win when a
  user also set serialized_space inline. Emit a warning that points at
  the inline block so the user knows their YAML is being dropped on
  the floor.

- ValidateDirectOnlyResources only mentioned the
  DATABRICKS_BUNDLE_ENGINE env var as a way to opt into direct mode,
  even though 'bundle.engine: direct' in databricks.yml is the more
  common entry point. Mention both.

Co-authored-by: Isaac
When the Genie API returns parent_path on GetSpace, propagate it
through bundle generate genie-space so the produced YAML deploys back
to the same workspace folder. The testserver is updated to mirror
that response shape so the acceptance fixture exercises the new path.

Filter ParentPath out of ForceSendFields in DoRead and
responseToGenieSpaceConfig: we deliberately clear ParentPath in the
returned GenieSpaceConfig because the GET API does not reliably
include it, but the SDK still surfaces it in ForceSendFields when the
field appeared on the wire. Without this filter, deploy state
serialization force-emits parent_path: "" even though the field is
logically unset, producing spurious output diffs.

Co-authored-by: Isaac
- Replace switch-with-fallthrough on dyn.Kind with a guard clause to
  satisfy the exhaustive linter without listing every Kind variant.
- Use http.StatusBadRequest in isMissingGenieParentPathError instead
  of a magic 400 (auto-fix from golangci-lint).

Co-authored-by: Isaac
…cally

serialized_space is in ignore_remote_changes because we cannot diff a
structured local YAML body against a remote JSON string. That makes UI
edits invisible at plan time, but the unconditional UpdateSpace request
was still sending the local body, so any later update to title or
description would silently overwrite UI changes.

Use the plan entry to detect whether the user actually changed
serialized_space locally; only include it in the update request when the
change is an Update action (not a Skip from ignore_remote_changes).

Co-authored-by: Isaac
The previous implementation polled w.Workspace.GetStatusByPath using
resource.FilePath, which is the local relative path (e.g.
"src/foo.genie.json"). Both lookups (with and without the "/Workspace"
prefix) were invalid for the workspace API, so currentModified stayed at
0 and the file never updated past the first iteration.

Genie has no remote modification timestamp on the response, so use
content comparison instead: canonicalize the just-fetched
serialized_space and compare against the on-disk body, re-saving only
when they differ. The first iteration still always saves, preserving
the prior unconditional initial sync.

Co-authored-by: Isaac
The parent generate command exposes --key as a persistent flag, but the
genie-space subcommand was always deriving the key from the remote
title. Read the flag value and fall back to the title-derived key only
when not provided.
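The flag-with-fallback logic might look like the following. `resourceKey` is hypothetical, and the title-to-key rule (lowercase, runs of non-alphanumeric characters collapsed to `_`) is an assumption; the CLI's actual normalization may differ:

```go
package main

import (
	"strings"
	"unicode"
)

// resourceKey (hypothetical) prefers the --key flag value and otherwise
// derives a key from the space title.
func resourceKey(flagKey, title string) string {
	if flagKey != "" {
		return flagKey
	}
	var b strings.Builder
	lastUnderscore := false
	for _, r := range strings.ToLower(title) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
			lastUnderscore = false
		} else if !lastUnderscore && b.Len() > 0 {
			b.WriteByte('_') // collapse separators into a single underscore
			lastUnderscore = true
		}
	}
	return strings.TrimSuffix(b.String(), "_")
}
```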

Co-authored-by: Isaac
Calling json.Unmarshal on an empty serialized_space surfaces a confusing
"unexpected end of JSON input" error and writes nothing useful. Bail
out early with a clear message that names the target file.

Co-authored-by: Isaac
DoRead duplicated the field copy and the ParentPath-drop comment that
already lives in responseToGenieSpaceConfig. Reuse the helper directly
so the two stay in sync.

Co-authored-by: Isaac
The user-facing fields (title, description, warehouse_id, parent_path,
file_path, serialized_space) had PLACEHOLDER descriptions, leaving the
generated reference and resources docs blank. Fill them in with short
descriptions and regenerate the schema and docs output.

Co-authored-by: Isaac
Lowercase the genie_space error message to satisfy ST1005 and let the
linter convert an empty []string{} to a nil slice.

Co-authored-by: Isaac
The simple acceptance test fixture was a v1 serialized_space sample that
the Genie backend now rejects with 409 ABORTED ("The export format has
changed since this export was taken"). Bumps version to 2 and replaces
get_example_values / build_value_dictionary with the v2-equivalent
enable_format_assistance / enable_entity_matching, matching the format
that bundle generate genie-space now produces.

Co-authored-by: Isaac
The state DB API gained context, withRecovery and withWrite arguments
on origin/main; mirror the dashboard generate command and use the same
arguments. Also regenerates the simple acceptance plan output to pick
up the WAL-implementation serial increment.

Co-authored-by: Isaac
Adapt genie_space to changes that landed on main while this branch
was open:

- ResourceGenieSpace.DoDelete now takes the third state parameter
  (_ *resources.GenieSpaceConfig) to match main's IResource.DoDelete
  signature. Without it the adapter failed to initialize at runtime
  ("param count mismatch: interface 3, concrete 2").
- Bump the structwalk config.Root field-count guard from 5800 to 6000
  to account for the new genie_space fields (count is now 5814).
- Regenerate bundle docs, JSON schema field reference, and the genie
  plan fixture against the rebased tree.

Co-authored-by: Isaac
Apply the unresolved self-review comments on PR #5282:

- Permissions use object type "genie" (endpoint /permissions/genie/{id})
  rather than "genie/spaces" (permissions.go, all_test.go, and the
  testserver object-type map).

- parent_path is a normal, updatable field now that the Genie GET API
  returns it (without the /Workspace prefix) and the update API accepts it:
    * DoRead re-adds the /Workspace prefix via the shared ensureWorkspacePrefix
      (mirrors ResourceDashboard) instead of clearing the field.
    * DoUpdate sends parent_path so the backend can move the space.
    * Removed parent_path from recreate_on_changes and ignore_remote_changes.
    * The testserver strips the /Workspace prefix on write to match the real
      API, and `generate genie-space` re-adds it so the generated config keeps
      the conventional form.
    * Renamed the parent_path_recreate acceptance test to parent_path_update:
      changing parent_path now plans an update, not a recreate.

- serialized_space uses reason: etag_based (matching dashboards) since drift
  is detected via etag.
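A sketch of the endpoint and prefix handling described in the bullets above. `permissionsPath` and `stripWorkspacePrefix` are hypothetical helpers; the URL shape follows the `/api/2.0/permissions/genie/<id>` path used by the acceptance test:

```go
package main

import (
	"net/url"
	"strings"
)

// permissionsPath (hypothetical) uses the "genie" object type, not "genie/spaces".
func permissionsPath(spaceID string) string {
	return "/api/2.0/permissions/genie/" + url.PathEscape(spaceID)
}

// stripWorkspacePrefix (hypothetical) produces the form the GET API is
// described as returning: parent_path without the leading "/Workspace".
func stripWorkspacePrefix(p string) string {
	if strings.HasPrefix(p, "/Workspace/") {
		return strings.TrimPrefix(p, "/Workspace")
	}
	return p
}
```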

Also switches isMissingGenieParentPathError to errors.AsType[T] (forbidigo
rule from the SDK bump on main) and lets testifylint rewrite the genie
assertions in state_load_test.go to assert.Empty.

Co-authored-by: Isaac
The genie_spaces permissions acceptance test is direct-only, so
analyze_requests.py lists its request files as DIRECT_ONLY. Regenerate
bundle/resources/permissions/output.txt to include those entries (the
aggregate test wasn't refreshed when the genie_spaces permissions case
was added).

Co-authored-by: Isaac
janniklasrose force-pushed the janniklasrose/genie branch from d633f4a to a595ca4 on June 4, 2026 06:45
space_id is output-only: responseToGenieSpaceConfig wrote it into state and RemapState immediately cleared it, with nothing reading config.SpaceId (BaseResource.ID is the canonical id, and generate reads the SDK response). Remove the field and its read/clear sites. RemapState no longer needs a cleared-output block, so rename its parameter state -> remote to reflect that it receives the remote view. Regenerate schema, docs, and the reference field listing.

Co-authored-by: Isaac
Genie's GetSpace returns 403 (not 404) for a missing space, so Exists previously surfaced a raw error to the bind flow instead of reporting absence. Branch on apierr.IsMissing/ErrPermissionDenied and return (false, nil), mirroring resolveFromID and genieSpaceGoneError.

Co-authored-by: Isaac
ConfigureGenieSpaceSerializedSpace normalizes serialized_space to a JSON string (from file_path or inline YAML) before the deploy engine runs, so by this point the value is always a string or unset. Replace the string/marshal branch with a type switch that returns the string directly and errors loudly on any other type, instead of silently re-marshalling. Drops the now-unused encoding/json import.
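A minimal sketch of such a type switch; `serializedSpaceString` is a hypothetical name, and treating an unset (nil) value as an empty string is an assumption:

```go
package main

import "fmt"

// serializedSpaceString (hypothetical) returns the already-normalized JSON
// string and fails loudly on any other type instead of re-marshalling it.
func serializedSpaceString(v any) (string, error) {
	switch s := v.(type) {
	case nil:
		return "", nil // unset
	case string:
		return s, nil
	default:
		return "", fmt.Errorf("serialized_space: expected a JSON string, got %T", v)
	}
}
```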

Co-authored-by: Isaac
The genieSpace command struct carried out/err io.Writer fields that were initialized but never read (logging goes through cmdio/logdiag). Remove them and the now-unused io import.

Co-authored-by: Isaac
Relocate the genie generate helper into a shared utils.go in the package so it can be reused, rather than living in genie_space.go.

Co-authored-by: Isaac
Wrap the commands in musterr (and drop exec) so the expected non-zero exit is asserted explicitly rather than relying on the trailing Exit code: 1. Regenerated outputs.

Co-authored-by: Isaac
Genie spaces carry conversation history, so deleting or recreating one is destructive. Add genie_spaces to the deploy and destroy approval groups with messages that call out the chat-history loss, mirroring the other data-bearing resources. Adds a delete_warning acceptance test for the deploy path; existing tests' cleanup-destroys cover the destroy message.

Co-authored-by: Isaac
The Genie backend now returns 404 (not 403) when a space does not exist, matching dashboards. Drop the interim 403-as-gone handling: remove genieSpaceGoneError and let DoRead/DoDelete return the raw error (the direct engine's isResourceGone already recognizes 404), and key Exists and generate's resolveFromID on apierr.IsMissing alone. Update the testserver mock to return 404, and remove the now-obsolete 403-translation unit tests. Behavior is unchanged (404 -> gone), so acceptance outputs are unaffected.

Co-authored-by: Isaac
Comment on lines +2 to +4
# Local-only: the inline serialized_space references tables (main.sales.orders)
# that do not exist on a real workspace, so the backend rejects the create (403).
Cloud = false

Contributor Author

In general, there's no need for a complicated serialized_space if it prevents the test from running on cloud. This one, however, specifically tests inline serialized_space, so that's OK.

@@ -0,0 +1,3 @@
Local = true
Cloud = false

Contributor Author

This should be a cloud test because it relies on permissions API behaviour.

Comment thread bundle/direct/dresources/genie_space.go
Comment thread acceptance/bundle/invariant/configs/genie_space.yml.tmpl Outdated
Comment thread bundle/direct/dresources/genie_space.go Outdated
Comment thread bundle/direct/dresources/genie_space.go Outdated
Comment thread bundle/direct/dresources/genie_space.go Outdated
Comment thread cmd/bundle/generate/genie_space.go Outdated
`$CLI api get /api/2.0/permissions/genie/<id>` is passed a leading-slash path; on Windows, MSYS (Git Bash) rewrites it into a path under the Git install root before the CLI sees it, so the request misses the testserver stub and the assertion fails. Set MSYS_NO_PATHCONV=1 for the test (inert on Linux/macOS), matching the cmd/api and export-dir tests.

Co-authored-by: Isaac
DoUpdate previously sent serialized_space only when the user changed it locally (a hasUpdate check against the plan entry), to avoid clobbering out-of-band UI edits. But that meant a deploy could not restore the bundle's intended body after a UI edit -- it adopted the remote edit instead, contrary to config-is-source-of-truth (pietern, #5282).

Always send serialized_space so deploy converges the space to the bundle config, mirroring dashboards. State records the sent body plus the backend-returned etag; drift is still surfaced on read via OverrideChangeDesc. We still cannot send the etag as an If-Match guard: the backend bumps it on transparent serialized_space schema migration, so a stale etag would 409 a legitimate update. Aligned with the Genie backend team; a migration-stable etag (tracked as a follow-up) would let us restore the If-Match guard.

Removes the hasUpdate helper, pathSerializedSpace, and the respSerialized reconstruction. Unit tests updated; no acceptance output changes (the skip only affected the update request payload, which no test asserts).

Co-authored-by: Isaac
janniklasrose added this pull request to the merge queue Jun 10, 2026
Merged via the queue into main with commit 2c0a2f0 Jun 10, 2026
27 checks passed
janniklasrose deleted the janniklasrose/genie branch June 10, 2026 12:49
deco-sdk-tagging Bot added a commit that referenced this pull request Jun 10, 2026
## Release v1.3.0

### Notable Changes
* The `direct` deployment engine is now Generally Available and the default for new deployments. To opt out, set `engine: terraform` under `bundle` in your `databricks.yml` or set `DATABRICKS_BUNDLE_ENGINE=terraform`. Existing deployments keep their current engine; see https://docs.databricks.com/aws/en/dev-tools/bundles/direct to migrate.

### CLI
* Added the `databricks quickstart` command, a short introduction to the CLI that prints a human-friendly guide interactively and an agent-oriented version when run non-interactively ([#5464](#5464)).
* Add `databricks version --check` to report whether a newer CLI version is available and print the upgrade command for the detected install method ([#5469](#5469)).
* `databricks auth describe` now verifies credentials against both the workspace and account endpoints before reporting a failure, fixing false "Unable to authenticate" errors for account console profiles ([#5479](#5479)).
* `databricks auth login` no longer prompts for workspace selection when logging in to an account console host (`https://accounts.*`). Pass `--workspace-id` explicitly to store a workspace ID on such a profile ([#5504](#5504)).
* `databricks auth profiles --skip-validate` no longer makes any network calls; the host metadata fetch is skipped along with validation ([#5530](#5530)).

### Bundles
* Set the default `data_security_mode` to `DATA_SECURITY_MODE_AUTO` in bundle templates ([#5452](#5452)).
* Mark vector search index index_subtype as backend_default to prevent drift after deployment ([#5454](#5454)).
* `bundle deployment migrate`: handle resources added to or removed from `databricks.yml` since the last Terraform deploy ([#5463](#5463)).
* Add the `genie_spaces` bundle resource for managing Databricks Genie spaces as code, plus `bundle generate genie-space` to import an existing space. Direct deployment engine only ([#5282](#5282)).
* Fix spurious recreate of schemas and volumes whose names use mixed case ([#5531](#5531)).
janniklasrose added a commit to databricks/bundle-examples that referenced this pull request Jun 11, 2026
## Summary

The Databricks CLI supports the `genie_space` bundle resource since
[databricks/cli#5282](databricks/cli#5282)
(upcoming v1.3.0 release, direct deployment engine only). This PR adds
examples for it:

- **`knowledge_base/genie_space_nyc_taxi`** — a minimal bundle that
deploys a Genie space for the `samples.nyctaxi.trips` table. The README
also covers importing an existing space with `databricks bundle generate
genie-space` and keeping the local `.geniespace.json` in sync with UI
edits.
- **`knowledge_base/app_with_genie_space`** — a Databricks app that
answers questions through the Genie Conversation API. The bundle
declares the Genie space as an app resource (granting the app's service
principal `CAN_RUN`) and injects the space ID into the app via
`valueFrom`.

## Test plan

- [x] `databricks bundle validate` passes for all bundles with a CLI
build that includes databricks/cli#5282 (verified `file_path` is read
and inlined into `serialized_space`, dev-mode prefixes, default
`parent_path`, and permissions)
- [x] The resource/JSON layout matches what `databricks bundle generate
genie-space` produces (`resources/<key>.genie_space.yml` +
`src/<key>.geniespace.json`)
- [x] `ruff format --check` passes on the new app code
- [x] Genie space deployment and the app resource wiring were verified
against a live workspace during development of databricks/cli#5282

Deployed app:

<img width="964" height="379" alt="Screenshot 2026-06-10 at 16 37 20"
src="https://github.com/user-attachments/assets/5104ced3-de79-4baa-869f-68ffb4272e3e"
/>


This pull request and its description were written by Isaac.
Development

Successfully merging this pull request may close these issues.

Add Genie Space resource support to Databricks asset bundles

6 participants