| **`requestSent` flag** | Adds a parameter to the API, but avoids unnecessary round-trips when input is absorbed entirely into the local buffer |
| **Snapshot/rollback on send failure** | Small CPU cost to copy the partial buffer, but guarantees SHA state consistency even on transport failures |

## RNG: Single-Shot with Caller-Driven Chunking

The RNG generate operation is the second algorithm to receive the async
treatment. Unlike SHA, RNG is **single-shot**: there is no intermediate
state to carry, no partial-block buffering, and no multi-call Init/Update/Final
sequence. Each Request asks for N random bytes, and the matching Response
delivers them.

RNG is still interesting because the existing blocking API silently splits
large requests into multiple round trips when the caller asks for more bytes
than fit in one comm-buffer message. The async split has to decide where
that chunking logic lives.

### Chunking Policy

The async Request/Response pair is **single-shot per call**: one Request
produces one Response. Callers requesting more bytes than fit in a single
inline message must loop themselves. The per-call inline cap, which is the
comm-buffer data length minus the two response headers that precede the
random bytes, is exposed as:

```c
#define WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ \
    (WOLFHSM_CFG_COMM_DATA_LEN - \
     (uint32_t)sizeof(whMessageCrypto_GenericResponseHeader) - \
     (uint32_t)sizeof(whMessageCrypto_RngResponse))
```

Requests for zero bytes or for more than this cap are rejected with
`WH_ERROR_BADARGS` before anything is sent to the server.
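As a worked example of that arithmetic, the sketch below reproduces the cap and the size check under stated assumptions: a hypothetical 1280-byte comm data area and stand-in structs `DemoGenericResponseHeader` (12 bytes) and `DemoRngResponse` (4 bytes), which give a cap of 1280 - 12 - 4 = 1264 bytes. None of the `Demo*`/`DEMO_*` names are wolfHSM APIs; the real sizes come from `WOLFHSM_CFG_COMM_DATA_LEN` and the actual message headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: real values come from the wolfHSM config and
 * message headers. */
#define DEMO_COMM_DATA_LEN 1280u
#define DEMO_ERROR_OK      0
#define DEMO_ERROR_BADARGS (-1)

typedef struct {
    uint32_t algoType;
    uint32_t algoSubType;
    int32_t  rc;
} DemoGenericResponseHeader; /* 12 bytes, no padding */

typedef struct {
    uint32_t sz;
} DemoRngResponse; /* 4 bytes */

/* Same shape as WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ */
#define DEMO_RNG_MAX_INLINE_SZ                     \
    (DEMO_COMM_DATA_LEN -                          \
     (uint32_t)sizeof(DemoGenericResponseHeader) - \
     (uint32_t)sizeof(DemoRngResponse))

static_assert(DEMO_RNG_MAX_INLINE_SZ > 0u,
              "comm buffer too small to carry any RNG payload");

/* Reject zero-length and over-cap requests before anything is sent. */
static int demo_RngCheckRequestSize(uint32_t size)
{
    if (size == 0u || size > DEMO_RNG_MAX_INLINE_SZ) {
        return DEMO_ERROR_BADARGS;
    }
    return DEMO_ERROR_OK;
}
```

Because the check runs before anything is serialized, a rejected call leaves the transport untouched, and the caller can simply retry with a smaller size.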

The existing blocking `wh_Client_RngGenerate()` function is retained as a
thin wrapper that chunks internally against the cap, so application code
using the wolfCrypt RNG callback path continues to work without changes.
A simplified version of the loop:

```c
int wh_Client_RngGenerate(whClientContext* ctx, uint8_t* out, uint32_t size)
{
    uint32_t remaining = size;
    int      ret       = WH_ERROR_OK;

    while (remaining > 0) {
        /* Never request more than one inline message can carry */
        uint32_t chunk = (remaining < WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ)
                             ? remaining
                             : WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ;
        uint32_t got = chunk; /* in: capacity, out: bytes written */
        ret = wh_Client_RngGenerateRequest(ctx, chunk);
        if (ret == WH_ERROR_OK) {
            do { /* poll until the server's reply arrives */
                ret = wh_Client_RngGenerateResponse(ctx, out, &got);
            } while (ret == WH_ERROR_NOTREADY);
        }
        if (ret != WH_ERROR_OK) {
            break;
        }
        out += got;
        remaining -= got;
    }
    return ret;
}
```

This keeps the async primitives predictable (each call is bounded by a single
round trip) and pushes the scheduling decision ("when should I yield
between chunks?") up to the async caller, who is the only one with enough
context to answer it.
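The sketch below shows what pushing that decision up to the caller can look like: a caller-owned job that issues one Request or performs one poll per call and returns `NOTREADY` in between, so an event loop can run other work between chunks. The Request/Response pair is replaced by a hypothetical in-memory mock (`mock_RngRequest`, `mock_RngResponse`) with a 16-byte cap and deterministic output bytes so the behavior is testable; none of these names are wolfHSM APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_OK         0
#define DEMO_NOTREADY   1
#define DEMO_BADARGS    (-1)
#define DEMO_MAX_INLINE 16u /* hypothetical tiny cap so chunking is visible */

/* Hypothetical mock of a single-shot Request/Response pair. */
static uint32_t pendingSz; /* 0 = nothing in flight */
static int      pollsLeft; /* NOTREADY polls before the reply "arrives" */
static uint8_t  nextByte;  /* deterministic stand-in for random data */

static int mock_RngRequest(uint32_t size)
{
    if (size == 0u || size > DEMO_MAX_INLINE || pendingSz != 0u) {
        return DEMO_BADARGS;
    }
    pendingSz = size;
    pollsLeft = 1;
    return DEMO_OK;
}

static int mock_RngResponse(uint8_t* out, uint32_t* inout_size)
{
    uint32_t i;
    if (pollsLeft > 0) {
        pollsLeft--;
        return DEMO_NOTREADY;
    }
    for (i = 0; i < pendingSz && i < *inout_size; i++) {
        out[i] = nextByte++;
    }
    *inout_size = i;
    pendingSz   = 0u;
    return DEMO_OK;
}

/* Caller-owned job state: at most one chunk in flight. */
typedef struct {
    uint8_t* out;
    uint32_t remaining;
    int      inFlight;
} DemoRngJob;

/* Do one non-blocking step, then return so the caller can yield.
 * Returns DEMO_NOTREADY while bytes remain, DEMO_OK when done. */
static int DemoRngJob_Step(DemoRngJob* job)
{
    uint32_t chunk;
    int      ret;

    assert(job != NULL);
    chunk = (job->remaining < DEMO_MAX_INLINE) ? job->remaining
                                                : DEMO_MAX_INLINE;
    if (!job->inFlight) {
        if (job->remaining == 0u) {
            return DEMO_OK;
        }
        ret = mock_RngRequest(chunk);
        if (ret == DEMO_OK) {
            job->inFlight = 1;
            ret = DEMO_NOTREADY; /* reply not collected yet */
        }
        return ret;
    }
    ret = mock_RngResponse(job->out, &chunk); /* capacity in, bytes out */
    if (ret != DEMO_OK) {
        return ret; /* NOTREADY, or a real error */
    }
    job->inFlight = 0;
    job->out += chunk;
    job->remaining -= chunk;
    return (job->remaining == 0u) ? DEMO_OK : DEMO_NOTREADY;
}
```

A scheduler drives the job by calling `DemoRngJob_Step()` until it stops returning `DEMO_NOTREADY`, running other tasks between calls; the blocking wrapper is essentially the same loop with nothing in between.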

### Response Size Negotiation

The Response function takes an `inout_size` parameter: on entry it is the
capacity of the output buffer; on exit it is the number of bytes the
server actually wrote. This lets the caller detect a short reply instead of
assuming the whole buffer was filled:

```c
uint32_t got = requested;
ret = wh_Client_RngGenerateResponse(ctx, out, &got);
/* got may be < requested if the server returned a shorter reply */
```

If the server reports more bytes than the caller's buffer can hold (which a
correct server never does, but the client defends against it anyway), the
Response returns `WH_ERROR_ABORTED` instead of overflowing the buffer.
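A minimal sketch of that guard, assuming a hypothetical already-decoded reply (`DemoRngReply`) and placeholder error values; the real Response parses the byte count and payload out of the comm buffer. It also illustrates the `inout_size` contract (capacity in, bytes written out). Leaving `*inout_size` untouched when the reply is refused is a choice of this sketch, not documented behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ERROR_OK      0
#define DEMO_ERROR_ABORTED (-2) /* placeholder value */

/* Hypothetical decoded reply: byte count reported by the server + payload. */
typedef struct {
    uint32_t       sz;
    const uint8_t* data;
} DemoRngReply;

/* In: *inout_size = capacity of out. Out: bytes actually written.
 * A reply larger than the capacity is refused instead of copied. */
static int demo_RngCopyReply(const DemoRngReply* reply, uint8_t* out,
                             uint32_t* inout_size)
{
    assert(reply != NULL && out != NULL && inout_size != NULL);
    if (reply->sz > *inout_size) {
        return DEMO_ERROR_ABORTED; /* would overflow the caller's buffer */
    }
    memcpy(out, reply->data, reply->sz);
    *inout_size = reply->sz;
    return DEMO_ERROR_OK;
}
```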

### DMA Variant

The DMA variant bypasses the comm buffer entirely for the data payload: the
server writes random bytes directly into the client's output buffer via
translated DMA addresses. The Request/Response split introduces the same
address-stashing pattern used by SHA DMA:

```c
typedef struct {
    uintptr_t outAddr;    /* translated DMA address */
    uintptr_t clientAddr; /* original client address (for POST) */
    uint64_t  outSz;      /* DMA'd size (0 means "nothing to clean up") */
} whClientDmaAsyncRng;
```

This context is stored in `whClientContext.dma.asyncCtx.rng` and carries the
translated address across the Request/Response boundary, so the Response can
perform the matching POST cleanup.
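The stash/consume cycle can be sketched against a mock DMA layer. Everything below is hypothetical: `demo_DmaPre`/`demo_DmaPost` stand in for the client's address-translation hooks (translation is modeled as a fixed offset), a counter tracks live mappings so leaks are visible, and a flag stands in for the transport's "reply received" state. Only the struct layout mirrors `whClientDmaAsyncRng`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_OK       0
#define DEMO_NOTREADY 1

/* Mirrors whClientDmaAsyncRng. */
typedef struct {
    uintptr_t outAddr;    /* translated DMA address */
    uintptr_t clientAddr; /* original client address (for POST) */
    uint64_t  outSz;      /* 0 means "nothing to clean up" */
} DemoDmaAsyncRng;

static int liveMappings; /* hypothetical: count of un-POSTed mappings */
static int replyReady;   /* hypothetical: set once the server has replied */

/* Hypothetical PRE/POST hooks: translation is a fixed offset here. */
static uintptr_t demo_DmaPre(uintptr_t clientAddr)
{
    liveMappings++;
    return clientAddr + 0x1000u;
}

static void demo_DmaPost(uintptr_t outAddr, uintptr_t clientAddr)
{
    (void)outAddr;
    (void)clientAddr;
    liveMappings--;
}

static int demo_RngDmaRequest(DemoDmaAsyncRng* st, uint8_t* out, uint32_t size)
{
    assert(st != NULL && out != NULL);
    st->clientAddr = (uintptr_t)out;
    st->outAddr    = demo_DmaPre(st->clientAddr);
    st->outSz      = size;
    /* ...a real Request would now send st->outAddr and size... */
    return DEMO_OK;
}

static int demo_RngDmaResponse(DemoDmaAsyncRng* st)
{
    assert(st != NULL);
    if (!replyReady) {
        return DEMO_NOTREADY; /* mapping stays live for the next poll */
    }
    if (st->outSz != 0u) {
        demo_DmaPost(st->outAddr, st->clientAddr);
        st->outSz = 0u; /* consumed: a repeat call skips POST */
    }
    return DEMO_OK;
}
```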

Two points are worth calling out:

- **Fail-fast on occupied transport**: the DMA Request checks
  `wh_CommClient_IsRequestPending()` *before* acquiring the DMA mapping.
  Without this check, a request rejected by `SendRequest` would still
  leave a leaked DMA mapping behind, because the Response (which normally
  releases the mapping) would never run.
- **POST runs on every non-NOTREADY exit**: once the Response receives a
  reply, whether success or error, it performs the POST cleanup
  unconditionally, so the client buffer is safe to read regardless of the
  final return code.
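Both rules can be exercised against a small mock. The sketch is hypothetical throughout: `requestPending` stands in for `wh_CommClient_IsRequestPending()`, `DEMO_BUSY` for whatever error the transport reports when it is occupied, and a counter stands in for real DMA mappings. It checks that a rejected Request maps nothing and that an error reply still releases the mapping.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_OK       0
#define DEMO_NOTREADY 1
#define DEMO_BUSY     (-3) /* hypothetical "transport occupied" code */

static int      liveMappings;   /* un-POSTed DMA mappings */
static int      requestPending; /* stand-in for the comm client's state */
static int      replyReady;     /* set when the mock server replies */
static int      replyRc;        /* rc the mock server reports */
static uint64_t stashedSz;      /* stand-in for the async DMA context */

static int demo_RngDmaRequest(uint32_t size)
{
    /* 1. Fail fast: nothing is mapped yet, so a rejection cannot leak. */
    if (requestPending) {
        return DEMO_BUSY;
    }
    /* 2. Only now acquire the mapping and stash it for the Response. */
    liveMappings++;
    stashedSz      = size;
    requestPending = 1; /* mock send */
    return DEMO_OK;
}

static int demo_RngDmaResponse(void)
{
    if (!replyReady) {
        return DEMO_NOTREADY; /* the mapping must outlive the wait */
    }
    requestPending = 0;
    replyReady     = 0;
    /* A reply arrived: POST regardless of the server's rc. */
    if (stashedSz != 0u) {
        assert(liveMappings > 0);
        liveMappings--;
        stashedSz = 0u;
    }
    return replyRc;
}
```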

Unlike the non-DMA variant, the DMA variant has no per-call size cap: the
server writes directly to client memory, so a single DMA call can fulfill an
arbitrarily large request (bounded only by its `uint32_t size` parameter) in
one round trip.
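The practical difference shows up in round-trip counts: a blocking non-DMA caller needs ceil(size / cap) round trips, while the DMA variant needs one. The sketch below assumes a hypothetical cap of 1264 bytes (the real `WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ` depends on configuration); with it, a 4096-byte request takes four inline round trips (3 × 1264 = 3792 < 4096) but a single DMA call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap; the real one is WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ. */
#define DEMO_RNG_MAX_INLINE_SZ 1264u

static_assert(DEMO_RNG_MAX_INLINE_SZ > 0u, "cap must be non-zero");

/* Round trips for the chunking non-DMA path: ceil(size / cap), computed
 * without (size + cap - 1) so it cannot overflow near UINT32_MAX. */
static uint32_t demo_RoundTripsInline(uint32_t size)
{
    return size / DEMO_RNG_MAX_INLINE_SZ +
           ((size % DEMO_RNG_MAX_INLINE_SZ) != 0u ? 1u : 0u);
}

/* The DMA path has no per-call cap: one round trip for any non-zero size. */
static uint32_t demo_RoundTripsDma(uint32_t size)
{
    return (size != 0u) ? 1u : 0u;
}
```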

### API Reference

```c
/* Non-DMA */
int wh_Client_RngGenerateRequest(whClientContext* ctx, uint32_t size);
int wh_Client_RngGenerateResponse(whClientContext* ctx, uint8_t* out,
                                  uint32_t* inout_size);

/* DMA (requires WOLFHSM_CFG_DMA) */
int wh_Client_RngGenerateDmaRequest(whClientContext* ctx, uint8_t* out,
                                    uint32_t size);
int wh_Client_RngGenerateDmaResponse(whClientContext* ctx);

/* Blocking (signatures unchanged; both now wrap the async primitives, and
 * only the non-DMA wrapper needs to chunk) */
int wh_Client_RngGenerate(whClientContext* ctx, uint8_t* out, uint32_t size);
int wh_Client_RngGenerateDma(whClientContext* ctx, uint8_t* out, uint32_t size);
```

## Roadmap: Remaining Algorithms

The async split pattern will be applied algorithm by algorithm to all crypto
| SHA-224 | Update/Final Request/Response | Shares SHA-256 wire format |
| SHA-384 | Update/Final Request/Response | Shares SHA-512 wire format |
| SHA-512 | Update/Final Request/Response | Non-DMA and DMA variants |
| RNG Generate | `wh_Client_RngGenerate{Request,Response}` and DMA variants | Single-shot per call; non-DMA callers chunk against `WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ`; DMA has no per-call cap |

**Planned:**

| CMAC | `wh_Client_Cmac{Request,Response}` | Low | Already has partial split pattern |
| ML-DSA Sign | `wh_Client_MlDsaSign{Request,Response}` | Low | Post-quantum; single-shot |
| ML-DSA Verify | `wh_Client_MlDsaVerify{Request,Response}` | Low | Post-quantum; single-shot |

Most remaining algorithms are **single-shot** operations (one request, one
response) and are straightforward to split compared to SHA's streaming