docs(rfc): add driver config passthrough proposal#1589

Open
elezar wants to merge 2 commits into main from 1492-driver-config-rfc/elezar

Member

@elezar elezar commented May 27, 2026

Summary

Add RFC 0005 proposing a generic driver_config passthrough for driver-owned sandbox creation settings.

Related Issue

Addresses #1492

Changes

  • Defines the public SandboxTemplate.driver_config envelope and driver-side DriverSandboxTemplate.driver_config forwarding model.
  • Documents gateway forwarding, exact driver-name matching, portable multi-driver configs, and driver-owned validation.
  • Captures security guardrails, schema evolution expectations, schema discovery follow-ups, and Kubernetes use cases that should inform the first nested config shape.
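
As a rough illustration of the forwarding model listed above, the sketch below shows how a gateway might pick the block addressed to the active driver by exact name and pass it on without inspecting it. The function name `select_driver_config`, the `podman` block, and the choice to silently ignore blocks for other drivers are assumptions made for illustration; the RFC text is the authority on the actual behavior.

```python
from collections.abc import Mapping
from typing import Any, Optional


def select_driver_config(
    driver_config: Optional[Mapping[str, Any]],
    active_driver: str,
) -> dict[str, Any]:
    """Return the block addressed to `active_driver`, or an empty dict.

    The gateway treats the block as opaque: it does not look at the nested
    keys, and blocks addressed to other drivers are ignored (assumption).
    """
    if not driver_config:
        return {}
    block = driver_config.get(active_driver)  # exact, case-sensitive match
    if block is None:
        return {}
    if not isinstance(block, Mapping):
        raise TypeError(f"driver_config[{active_driver!r}] must be an object")
    return dict(block)


# A portable template carrying settings for two drivers (values are illustrative).
template = {
    "driver_config": {
        "kubernetes": {"runtimeClass": "foo"},
        "podman": {"hypothetical_option": True},
    }
}
forwarded = select_driver_config(template["driver_config"], "kubernetes")
```

With a gateway connected to the `kubernetes` driver, only the `kubernetes` block would be forwarded; the same template could be submitted unchanged to a gateway backed by another driver.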

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

mise run pre-commit passed locally. In the Codex shell, the default Nix-provided clang/SDK failed while compiling aws-lc-sys; the successful run used the Apple Command Line Tools compiler/linker environment explicitly.

This is a docs-only RFC PR, so no unit or E2E tests were added.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

elezar added 2 commits May 27, 2026 10:56
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the 1492-driver-config-rfc/elezar branch from 24ceb12 to 9ec6a42 Compare May 27, 2026 09:09
Comment on lines +305 to +306
shape, but the nested Kubernetes schema should not be finalized from a single
GPU resource example.

but the nested Kubernetes schema should not be finalized from a single
GPU resource example.
What do you mean by that? What does GPU have to do with this case?

Member Author

This is poorly worded and should be updated. The point is that:

  1. This is not intended as a mechanism to bypass resource requests that are exposed as first-class in the API (GPUs, CPU, memory).
  2. We should consider more use cases than just a non-standard resource request to drive the design of the API. We need to answer the question: what k8s-specific properties could a user want to set?
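
One way a driver could enforce the first point is to reject driver config keys that duplicate settings the public API already exposes. This is only a sketch: the key names in `FIRST_CLASS_KEYS` and the function `check_no_first_class_overrides` are hypothetical and do not come from the RFC.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical names for settings the public API already exposes as first-class.
FIRST_CLASS_KEYS = frozenset({"cpu", "memory", "gpu"})


def check_no_first_class_overrides(config: Mapping[str, Any]) -> list[str]:
    """Return one error message per key that duplicates a first-class setting."""
    return [
        f"{key!r} is a first-class API field; set it on the sandbox request instead"
        for key in sorted(config)
        if key.lower() in FIRST_CLASS_KEYS
    ]


# "runtimeClass" passes through; "GPU" is flagged regardless of letter case.
errors = check_no_first_class_overrides({"runtimeClass": "foo", "GPU": 1})
```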


This example is illustrative, not the final required schema.

The Kubernetes driver should prefer raw Kubernetes resource names and

I would love to talk about the k8s implementation, but why bind it to this RFC? The k8s part is simply an implementation detail and other than a reference

Member Author

I think we can get started on the k8s part in parallel to this RFC. I was initially just going to comment on or update your issue, but the content (after iterating through some design decisions) got to the point where I thought an RFC made more sense.

```json
{
  "driver_config": {
    "kubernetes": {
```

Regarding this and the later mentions, did you consider making the enveloped field something more generic, e.g. compute_config, rather than the Kubernetes-specific key?

Member Author

The top-level "kubernetes" here maps to a concrete driver name. We are trying to add a mechanism for specifying driver-specific configs.

Could you provide an example of what you're expecting? What would you expect to be present in the compute_config?


@kon-angelo kon-angelo May 27, 2026

I am asking whether the concrete driver name is really important.

  • Does the gateway need to know that it is talking to a "kubernetes" driver?
  • Do we expect to have multiple driver configurations nested (e.g. kubernetes and podman)?
    • If not, is it not good enough to just pass the part inside kubernetes and skip the extra nesting? Maybe even consider a named field, e.g.

    {
        "driver_config": {
            "type": "kubernetes", // just validation
            "config": {...} // driver only gets the value passed
        }
    }

or

{
    "driver_config": {
        "runtimeClass": "foo", // directly passing the `driver_config`
        ...
    }
}

The compute_config thing did not help to convey the idea very well 😅

Member Author

I think the concrete driver name is important because it defines the spec for the allowed config. Although the content is arbitrary and opaque from the gateway's point of view, it needs to be understood by the driver itself. This also opens up support for a gateway being connected to multiple drivers in the future, without REQUIRING it at this stage.

Note that although most (if not all) drivers are currently in-tree, it is reasonable to assume that third-party drivers could be written at some stage. Since these are not tied to the release cadence of the gateway itself, the config object can be used to allow users to set driver-specific options without aligning with the gateway's release cadence. This also allows the OpenShell developers to further decouple the gateway from the driver if that makes sense.
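
To make the division of responsibility concrete, here is a minimal sketch in which the driver name selects the block and the driver alone decides which keys are valid. `KubernetesDriver`, its `allowed_keys`, and `create_sandbox` are hypothetical names used for illustration and do not reflect the OpenShell codebase; in a real deployment the validation would run inside the driver after it receives the forwarded block.

```python
from collections.abc import Mapping
from typing import Any


class KubernetesDriver:
    """Hypothetical driver; `name` is the key it claims in driver_config."""

    name = "kubernetes"
    allowed_keys = frozenset({"runtimeClass"})  # illustrative, not the real schema

    def validate(self, config: Mapping[str, Any]) -> None:
        """Reject keys this driver does not understand."""
        unknown = sorted(set(config) - self.allowed_keys)
        if unknown:
            raise ValueError(f"{self.name}: unknown driver_config keys {unknown}")


def create_sandbox(driver: Any, driver_config: Mapping[str, Mapping[str, Any]]) -> dict[str, Any]:
    """Gateway side: pick the block by driver name and forward it untouched."""
    block = dict(driver_config.get(driver.name, {}))
    driver.validate(block)  # the driver, not the gateway, owns the schema
    return block


accepted = create_sandbox(KubernetesDriver(), {"kubernetes": {"runtimeClass": "foo"}})
```

Because the gateway never interprets the nested keys, a third-party driver could add or change options on its own release schedule, and only its `validate` step would need to change.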

Let me try to find better examples here -- possibly with a first PR for k8s.

@kon-angelo

Since we are at it, would a driver config make sense for all top-level resources created by OpenShell? They are to a certain degree managed by drivers, e.g. provider secrets. Would it not make sense to have a similar capability in all of these objects and keep the modeling somewhat similar? I do understand that there are more applications for the compute drivers, but still.
