Strip INPUT_ONLY fields from CLI response output#5575
Open
Divyansh-db wants to merge 5 commits into
Open
Conversation
New package with Strip(v any, paths []string) (any, error) that round-trips v through encoding/json into a generic representation, removes the dotted paths from the tree, and returns the masked value for the caller to marshal in its preferred format. Used by generated CLI commands (cmd/account/**, cmd/workspace/**) to drop fields the OpenAPI spec marks INPUT_ONLY before the response flows into cmdio.Render. The SDK transport struct serializes these fields unconditionally (REQUIRED on the request side, no omitempty), so the CLI sees them as empty strings even though the server never populates them on responses; the stripped any rendering matches what the spec promises. Path semantics: arrays are traversed transparently (each element visited), and dynamically-keyed maps (proto map<string, V>) are too — when no literal key matches the next path component, every value is visited with the same path. This handles list responses and map-valued fields with the same path expression as singletons. Co-authored-by: Isaac
cliv1's .codegen/cli.json already carries x-databricks-field-behaviors in its schemas block, so the CLI generator can pull the INPUT_ONLY paths per method without any genkit-side change. (Cf. discussion at go/slack DECO-27296.) What's added: - internal/cligen/input_only.go: walk the schema graph rooted at the method's response type, return sorted dotted paths of every field with the INPUT_ONLY behavior. Singleton message refs are followed; array/map element types aren't (cli.json's SchemaFieldJSON carries only one ref slot, populated for singleton fields). A field that is itself INPUT_ONLY emits a single path and stops — the whole subtree is stripped at runtime by libs/inputonly.Strip. - internal/cligen/cligen.go: populateInputOnlyPaths runs after Resolve(); it skips paginated, LRO, wait, byte-stream, and empty responses because each renders a different shape and needs its own path source (deferred follow-up). - internal/cligen/model.go: MethodJSON gains InputOnlyPaths []string (transient, not in JSON); ServiceJSON gains HasInputOnlyPaths() so the template can gate the libs/inputonly import. - templates/service.go.tmpl: at the standard render emit site, emit inputonly.Strip(response, paths) into a `masked` local and pass that to the existing cmdio.Render. cmdio stays untouched. If paths is empty the existing call is byte-identical. Tests cover flat, nested-via-ref, INPUT_ONLY-message (no recursion), cycle, and unknown-root cases. Co-authored-by: Isaac
Run of ./task generate-cligen after the previous commit. Six service files change; everything else is byte-identical (most response types have no INPUT_ONLY fields). Each modified service has at least one method whose response type declares an INPUT_ONLY field in cli.json's schemas block. Examples: - disaster-recovery: `get-stable-url` / `create-stable-url` strip `initial_workspace_id`; failover-group methods strip `initial_primary_region`. - clean-rooms: nested paths like `remote_detailed_info.creator.invite_recipient_workspace_id`. - database / postgres / secrets-uc / quality-monitor-v2: multi-field paths including container fields whose entire subtree is INPUT_ONLY (e.g. `spec` on secrets-uc). Paginated list methods (e.g. `list-stable-urls`) still go through cmdio.RenderIterator unchanged. cli.json does not currently surface the iterator element type, so per-element path computation is a follow-up — the leak persists on those commands until then. Co-authored-by: Isaac
Contributor
Approval status: pending
|
DatabaseInstance.stopped is INPUT_ONLY in cli.json, so the regenerated update-database-instance command now strips it from the response. The fixture was captured before that change and still expected `stopped` in the rendered JSON. Regenerated via ./task test-update. Co-authored-by: Isaac
Two bugs in the original Strip implementation, both pointed out on the
prior PR's review and inherited here:
1. The json.Marshal -> json.Unmarshal(any) round-trip decoded every
number as float64, silently losing precision above 2^53 — i.e. on
real SDK fields like spark_context_id (int64). Switch to
json.NewDecoder(...).UseNumber() so numbers stay as json.Number
strings and re-marshal verbatim.
2. The map-handling branch in deletePath fell through to iterate every
value when the literal key didn't match. For INPUT_ONLY fields the
server always omits the field at its expected location, so the
literal miss was the common case, and the fallback would silently
recurse into every nested object and strip any same-named leaf —
turning anchored paths into match-anywhere expressions.
Repro: path "name" against {"id":"123","details":{"name":"x"}}
stripped details.name. The same fallback also created a map-key
collision foot-gun (a proto-map entry literally named the leaf
would be removed, leaving the mask unapplied elsewhere).
Fix: strict literal lookup on maps, return on miss. Arrays stay
transparent because []any is type-distinguishable at runtime so
the path semantic there is unambiguous. cligen does not emit paths
that descend through proto maps today; if cli.json grows map value
refs later, the path language should grow an explicit marker
("tags.*.field") rather than reintroducing implicit fallback.
Adds two regression tests:
- TestStripPreservesLargeInt64: value 2^53+1 survives a strip on a
sibling field.
- TestStripDoesNotMatchAnywhere: path "name" leaves
details.name untouched.
Drops two tests from the previous commit that relied on the map
fallback (TestStripMapValues, TestStripNestedInMapValue) and replaces
them with TestStripDoesNotDescendIntoMaps, which documents the strict
behavior.
Also notes in the doc comment that filtering reorders JSON keys
alphabetically (map[string]any vs struct declaration order) so
downstream consumers and acceptance fixtures expect that.
Co-authored-by: Isaac
Collaborator
Integration test reportCommit: 7b43980
22 interesting tests: 15 SKIP, 7 KNOWN
Top 28 slowest tests (at least 2 minutes):
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Replaces #5450 with the cligen-based approach: pull the
INPUT_ONLYfield-behaviors that are already in.codegen/cli.json, run the masking in generated CLI command code to produce anany, and pass that to plaincmdio.Render.libs/cmdiostays untouched.Why
The Databricks Go SDK uses one Go struct per resource for both request and response (transport-layer pattern). Some fields are
REQUIREDon the request side — so the SDK emits them withoutomitempty— butINPUT_ONLYper the OpenAPI spec, meaning the server never returns them. The CLI hands the SDK response straight tojson.MarshalIndent, so those fields leak into JSON output as empty strings. For example,databricks account disaster-recovery get-stable-urlreturns\"initial_workspace_id\": \"\".The SDK behavior is intentional and not changing. The fix has to live in the CLI.
How
Three commits:
libs/inputonly(new package).Strip(v, paths) (any, error)round-trips throughencoding/json, deletes the listed dotted paths from the generic tree, and returns the masked value for the caller to marshal. Arrays are traversed transparently; dynamically-keyed maps are too (when no literal key matches the next path component, every map value is visited with the same path — same semantic as protomap<string, V>fields). Self-contained, no dependency on cmdio.internal/cligenreads INPUT_ONLY paths fromcli.jsonand emitsStripcalls.input_only.gowalks the schema graph rooted at each method's response type and collects the dotted paths of everyINPUT_ONLYfield. Eligible methods are the standard sync render path only: paginated, LRO, wait, byte-stream, and empty responses are skipped. The template at the existing render emit site forks on the precomputedInputOnlyPaths: when non-empty, generate```go
masked, err := inputonly.Strip(response, []string{"initial_workspace_id"})
if err != nil {
return err
}
return cmdio.Render(ctx, masked)
```
otherwise the existing `return cmdio.Render(ctx, response)` is byte-identical. The `libs/inputonly` import is gated on `ServiceJSON.HasInputOnlyPaths()` so services that don't strip anything don't carry an unused import.
Regenerate. `./task generate-cligen` updates six service files (disaster-recovery, clean-rooms, database, postgres, quality-monitor-v2, secrets-uc); everything else is byte-identical.
Scope (what's not in this PR)
Test plan