Skip to content

Add cmdio.RenderFiltered to strip fields from rendered output#5450

Closed
Divyansh-db wants to merge 2 commits into
mainfrom
input-only-field-handling
Closed

Add cmdio.RenderFiltered to strip fields from rendered output#5450
Divyansh-db wants to merge 2 commits into
mainfrom
input-only-field-handling

Conversation

@Divyansh-db

Copy link
Copy Markdown
Contributor

Summary

Adds new cmdio.RenderFiltered and cmdio.RenderIteratorFiltered entry points that mirror Render / RenderIterator but accept a []string of dotted JSON paths to strip from the value before it is marshaled. The path list is consulted only by the JSON render path; text/template rendering is unchanged.

// before
return cmdio.Render(ctx, response)

// after, when the response type has input-only fields
return cmdio.RenderFiltered(ctx, response, []string{"initial_workspace_id"})

Path syntax is dotted (a.b.c); arrays are traversed transparently, so the same path expression works for singleton and list responses without an items[] marker.

Motivation

The Databricks SDK uses a single Go struct per resource for both request and response (transport-layer pattern). Some fields are REQUIRED on the request side, so the SDK marshals them unconditionally, while also being INPUT_ONLY per the OpenAPI spec — they're never populated on responses. Generated CLI commands hand the SDK response struct directly to json.MarshalIndent, so those fields leak into user-visible output as empty strings even when the server omits them.

Existing in-repo precedents for honoring x-databricks-field-behaviors:

  • bundle/direct/tools/generate_resources.py reads INPUT_ONLY / OUTPUT_ONLY from the OpenAPI spec and propagates them into the DABs direct engine's ignore_remote_changes config.
  • bundle/internal/schema/main.go's removeOutputOnlyFields strips OUTPUT_ONLY from the bundle JSON schema.

RenderFiltered is the equivalent seam on the CLI response render path. This PR adds only the entry points; the genkit template change that starts calling them will land separately, followed by the generated-code regeneration in cmd/account/** and cmd/workspace/**.

Design notes

  • Render is unchanged in behavior (now a thin wrapper around RenderFiltered(_, _, nil)). Existing callers don't need to change.
  • Internal helper applyInputOnlyMask in libs/cmdio/filter.go round-trips the value through json.Marshal into a generic map[string]any tree, removes the listed leaf keys, and returns the masked value for the caller to marshal in its preferred format. Operating on the generic tree (rather than on raw bytes) lets defaultRenderer.renderJson and iteratorRenderer.renderJson reuse the same masking logic despite using different MarshalIndent prefixes.
  • Filtering happens before colorizeJSON, so the colorized output stays consistent.

Test plan

  • go test ./libs/cmdio/... — pass (new and existing tests)
  • gofmt -l clean on changed files; go vet ./libs/cmdio/... clean
  • ./task fmt-q / ./task lint-q — 0 issues
  • Unit tests cover: empty paths no-op, flat field strip, nested field strip, slice element strip, multiple paths, missing-path no-op, and an integration test confirming RenderFiltered matches plain Render when no paths are supplied.

Adds new cmdio.RenderFiltered and cmdio.RenderIteratorFiltered entry
points that mirror Render / RenderIterator but accept a list of dotted
JSON paths to strip from the value before it is marshaled. The list is
consulted only on the JSON render path; text/template rendering is
unchanged.

Motivation: the Databricks SDK uses a single transport struct per
resource for both request and response. Some fields are required on the
request side (so the SDK marshals them unconditionally) but
input-only on the response side per the OpenAPI spec. The CLI today
hands the SDK response struct directly to json.MarshalIndent, so those
fields leak into user-visible output even though the server doesn't
populate them. RenderFiltered gives generated CLI commands a way to
strip such fields without modifying the SDK or introducing a separate
view layer.

Nothing in the generated CLI surface calls these new entry points yet;
that switch will land alongside the corresponding codegen change.

Co-authored-by: Isaac
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Waiting for approval

Based on git history, these people are best suited to review:

  • @pietern -- recent work in libs/cmdio/

Eligible reviewers: @chrisst, @hectorcast-db, @mihaimitrea-db, @parthban-db, @rauchy, @renaudhartert-db, @simonfaltum, @tanmay-db, @tejaskochar-db

Suggestions based on git history. See OWNERS for ownership rules.

@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

Commit: e3d5421

Run: 27016619624

@Divyansh-db Divyansh-db requested a review from hectorcast-db June 5, 2026 12:52
deletePath was traversing arrays transparently but not maps. When the
generator emits a path like "tags.initial_workspace_id" for a
map[string]V field whose value type has an INPUT_ONLY field, the literal
"initial_workspace_id" key doesn't exist directly under "tags" — it
lives inside each map value — so the path was silently a no-op at
render time.

Mirror the array behavior: when a path component doesn't match a
literal key on the current object, descend into every value with the
same key list. The literal-match path is still preferred, so struct
field paths keep working unchanged.

Adds two tests: a top-level map field and a map nested inside another
struct, both checking that the inner field is stripped from every
value.

Co-authored-by: Isaac

@hectorcast-db hectorcast-db left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no concerns with the code quality or test coverage, but I don't have enough context on the implementation of the CLI to be sure this is the best approach.

E.g., on the SDK we may want to generate methods for each struct instead of using a generic one, but here it is likely not possible due to CLI design.

Can you get a second opinion?

@Divyansh-db Divyansh-db requested review from andrewnester, shreyas-goenka and simonfaltum and removed request for andrewnester and shreyas-goenka June 8, 2026 08:30

@simonfaltum simonfaltum left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a deep look as the second opinion @hectorcast-db asked for. I ran this through multiple independent review passes and they converged on the same issues, so I'm fairly confident in the findings.

The approach itself is fine imo: a generic render-side seam fits how the CLI consumes SDK structs, and per-struct generated methods would be a much bigger change for little gain. Two things need fixing before generated code starts calling this though:

  1. The JSON round-trip corrupts large int64 values (inline comment, blocking).
  2. The path semantics are ambiguous for maps vs structs (two inline comments). The genkit follow-up bakes these paths into every generated command, so I think we should make the path language explicit now rather than live with fuzzy matching.

Smaller things, no inline comments:

  • RenderIteratorFiltered with non-empty paths has no test (the masking inside iteratorRenderer.renderJson).
  • RenderFiltered on an io.Reader silently ignores the paths (newRenderer returns readerRenderer first). Worth a doc line.

Everything else looks good: test coverage for the happy paths is solid, and Render with nil paths is provably unchanged.

Comment thread libs/cmdio/filter.go
return nil, fmt.Errorf("input-only mask: marshal: %w", err)
}
var out any
if err := json.Unmarshal(b, &out); err != nil {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: this round-trip corrupts large int64 fields whenever a path is filtered. json.Unmarshal into any decodes every JSON number as float64 (53-bit mantissa), and it hits every numeric field in the response, not just the stripped one. Repro:

input : {"cluster_name":"c","spark_context_id":7189401748684612345}
output: {"cluster_name":"c","spark_context_id":7189401748684613000}

spark_context_id is a real cluster response field that exceeds 2^53. The fix is small:

var out any
dec := json.NewDecoder(bytes.NewReader(b))
dec.UseNumber()
if err := dec.Decode(&out); err != nil {
    return nil, fmt.Errorf("input-only mask: unmarshal: %w", err)
}

json.Number re-marshals verbatim and deletePath never touches it. A regression test with a value above 2^53 (i.e. int64(9007199254740993)) would catch this.

One more side effect worth a deliberate decision: the map[string]any tree marshals keys alphabetically while structs marshal in field order, so turning filtering on for a command reorders its whole JSON output. UseNumber doesn't change that. I think sorted keys are probably acceptable, but it should be a conscious choice, not a surprise.

Comment thread libs/cmdio/filter.go
}
switch t := v.(type) {
case map[string]any:
if child, ok := t[keys[0]]; ok {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The literal-match-wins rule here is worse than the doc comment describes: for tags.initial_workspace_id against a proto map, a user entry literally named initial_workspace_id gets deleted and traversal stops, so the input-only field survives in every other map value. Both failure modes at once (user data dropped, mask not applied). Same root cause as the fallback descent below; fix proposal there. A regression test with a colliding map key would be good to have either way.

Comment thread libs/cmdio/filter.go
}
return
}
for _, child := range t {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern with this fallback: after the round-trip, nested structs and proto maps are both map[string]any, so this branch can't tell them apart. When keys[0] doesn't literally match, the full path is re-applied to every value at any depth, which turns an anchored path into match-anywhere. Repro: path name against {"id":"123","details":{"name":"keep-me","size":"L"}} deletes details.name.

This bites exactly in the INPUT_ONLY case: when the server omits the field at its expected location, the descent falls through and strips same-named legitimate output elsewhere in the response.

I think the fix for this and the map collision above is the same: make map traversal explicit in the path language. genkit knows where the proto maps are, so it can emit tags.*.initial_workspace_id, and deletePath becomes strict: literal segments match literally, * fans out over map values, arrays stay transparent. No ambiguity left, and the paths stay schema-derived.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants