
Commit 8fc1e4b: added async RNG

1 parent: 655e1c9

6 files changed

Lines changed: 676 additions & 98 deletions


docs/draft/async-crypto.md

Lines changed: 125 additions & 1 deletion
@@ -416,6 +416,130 @@ int wh_Client_Sha256Dma(whClientContext* ctx, wc_Sha256* sha, const uint8_t* in,
| **`requestSent` flag** | Adds a parameter to the API, but avoids unnecessary round-trips when input is absorbed entirely into the local buffer |
| **Snapshot/rollback on send failure** | Small CPU cost to copy the partial buffer, but guarantees SHA state consistency even on transport failures |

## RNG: Single-Shot with Caller-Driven Chunking

The RNG generate operation is the second algorithm to receive the async
treatment. Unlike SHA, RNG is **single-shot**: there is no intermediate
state to carry, no partial-block buffering, and no multi-call Init/Update/Final
sequence. Each Request asks for N random bytes, and the matching Response
delivers them.

RNG is still interesting because the existing blocking API silently splits
large requests into multiple round trips when the caller asks for more bytes
than fit in one comm-buffer message. The async split has to decide where
that chunking logic lives.

### Chunking Policy

The async Request/Response pair is **single-shot per call**: one Request
produces one Response. Callers requesting more bytes than fit in a single
inline message must loop themselves. The per-call inline cap is exposed as:

```c
#define WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ                    \
    (WOLFHSM_CFG_COMM_DATA_LEN -                               \
     (uint32_t)sizeof(whMessageCrypto_GenericResponseHeader) - \
     (uint32_t)sizeof(whMessageCrypto_RngResponse))
```

Requests exceeding this cap (or of size zero) are rejected with
`WH_ERROR_BADARGS` before any bytes hit the wire.
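
The cap arithmetic and the size check fit in a few lines. What follows is a minimal, self-contained sketch, not wolfHSM code: the 1280-byte comm buffer (`SKETCH_COMM_DATA_LEN`), the hypothetical `Sketch*` header layouts, and the error values are all assumptions, chosen only so `sizeof()` has something concrete to measure.

```c
#include <stdint.h>

/* Placeholder error values for this sketch; not wolfHSM's real constants. */
#define WH_ERROR_OK       0
#define WH_ERROR_BADARGS (-2000)

/* Assumed comm buffer size; in wolfHSM this comes from WOLFHSM_CFG_COMM_DATA_LEN. */
#define SKETCH_COMM_DATA_LEN 1280u

/* Hypothetical header layouts, only so sizeof() has something to measure. */
typedef struct {
    uint32_t algoType;
    uint32_t rc;
} SketchGenericResponseHeader;

typedef struct {
    uint32_t sz; /* number of random bytes that follow this header */
} SketchRngResponse;

/* Same shape as WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ: the payload space left
 * once both response headers are accounted for. */
#define SKETCH_RNG_MAX_INLINE_SZ                     \
    (SKETCH_COMM_DATA_LEN -                          \
     (uint32_t)sizeof(SketchGenericResponseHeader) - \
     (uint32_t)sizeof(SketchRngResponse))

/* Argument check a Request would run before touching the transport. */
static int sketch_RngCheckRequestSize(uint32_t size)
{
    if ((size == 0) || (size > SKETCH_RNG_MAX_INLINE_SZ)) {
        return WH_ERROR_BADARGS;
    }
    return WH_ERROR_OK;
}
```

With these assumed sizes the cap is 1280 - 8 - 4 = 1268 bytes, so a 4096-byte request through the blocking wrapper would take four round trips: three full 1268-byte chunks and a final 292-byte chunk.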

The existing blocking `wh_Client_RngGenerate()` function is retained as a
thin wrapper that chunks internally against the cap, so application code
using the wolfCrypt RNG callback path continues to work without changes:
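
```c
int wh_Client_RngGenerate(whClientContext* ctx, uint8_t* out, uint32_t size)
{
    int      ret       = WH_ERROR_OK;
    uint32_t remaining = size;
    while ((ret == WH_ERROR_OK) && (remaining > 0)) {
        /* Cap each round trip at what one inline message can carry */
        uint32_t chunk = (remaining < WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ)
                             ? remaining
                             : WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ;
        uint32_t got   = chunk; /* in: capacity, out: bytes written */
        ret = wh_Client_RngGenerateRequest(ctx, chunk);
        if (ret == WH_ERROR_OK) {
            do {
                ret = wh_Client_RngGenerateResponse(ctx, out, &got);
            } while (ret == WH_ERROR_NOTREADY);
        }
        if (ret == WH_ERROR_OK) {
            /* Short replies are allowed: advance by what was written */
            out += got;
            remaining -= got;
        }
    }
    return ret;
}
```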

This keeps the async primitives predictable (each call is bounded by a single
round trip) and pushes the scheduling decision, namely when to yield between
chunks, up to the async caller, the only layer with enough context to make it.
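
Caller-driven chunking on the async side can be sketched as a small state machine that never blocks. Each `RngJob_Step()` call does one non-blocking unit of work and returns `WH_ERROR_NOTREADY` until all bytes have arrived, so the caller's scheduler can run other tasks between steps. The transport is a hypothetical in-memory mock (`MockClient`, `mock_RngGenerateRequest()`, `mock_RngGenerateResponse()`) standing in for the real client calls; the 16-byte cap, the simulated latency, and the error values are placeholders.

```c
#include <stdint.h>

/* Placeholder error values for this sketch; not wolfHSM's real constants. */
#define WH_ERROR_OK        0
#define WH_ERROR_BADARGS  (-2000)
#define WH_ERROR_NOTREADY (-2001)

/* Tiny hypothetical inline cap so chunking is easy to observe. */
#define SKETCH_MAX_INLINE_SZ 16u

/* Hypothetical mock transport: a reply becomes ready after a few polls. */
typedef struct {
    int      pending;
    uint32_t requestedSz;
    int      pollsUntilReady;
    uint8_t  nextByte; /* deterministic stand-in for random bytes */
    unsigned roundTrips;
} MockClient;

static int mock_RngGenerateRequest(MockClient* c, uint32_t size)
{
    if ((size == 0) || (size > SKETCH_MAX_INLINE_SZ) || c->pending) {
        return WH_ERROR_BADARGS;
    }
    c->pending         = 1;
    c->requestedSz     = size;
    c->pollsUntilReady = 2; /* simulated server latency */
    return WH_ERROR_OK;
}

static int mock_RngGenerateResponse(MockClient* c, uint8_t* out,
                                    uint32_t* inout_size)
{
    uint32_t i, n;
    if (!c->pending) {
        return WH_ERROR_BADARGS;
    }
    if (c->pollsUntilReady > 0) {
        c->pollsUntilReady--;
        return WH_ERROR_NOTREADY;
    }
    n = (c->requestedSz < *inout_size) ? c->requestedSz : *inout_size;
    for (i = 0; i < n; i++) {
        out[i] = c->nextByte++;
    }
    *inout_size = n;
    c->pending  = 0;
    c->roundTrips++;
    return WH_ERROR_OK;
}

/* Caller-side job: the chunking loop, unrolled into resumable steps. */
typedef struct {
    uint8_t* out;
    uint32_t remaining;
    uint32_t chunk;    /* size of the in-flight request */
    int      inFlight; /* 1 between Request and a completed Response */
} RngJob;

/* One non-blocking unit of work. Returns WH_ERROR_NOTREADY while bytes
 * remain, WH_ERROR_OK when done, or an error code. */
static int RngJob_Step(RngJob* job, MockClient* c)
{
    uint32_t got;
    int      ret;

    if (!job->inFlight) {
        if (job->remaining == 0) {
            return WH_ERROR_OK;
        }
        job->chunk = (job->remaining < SKETCH_MAX_INLINE_SZ)
                         ? job->remaining
                         : SKETCH_MAX_INLINE_SZ;
        ret = mock_RngGenerateRequest(c, job->chunk);
        if (ret != WH_ERROR_OK) {
            return ret;
        }
        job->inFlight = 1;
        return WH_ERROR_NOTREADY; /* yield right after sending */
    }

    got = job->chunk;
    ret = mock_RngGenerateResponse(c, job->out, &got);
    if (ret != WH_ERROR_OK) {
        return ret; /* NOTREADY: request stays in flight, poll again later */
    }
    job->inFlight = 0;
    job->out += got; /* short replies are allowed */
    job->remaining -= got;
    return (job->remaining == 0) ? WH_ERROR_OK : WH_ERROR_NOTREADY;
}
```

A cooperative scheduler would call `RngJob_Step()` from its event loop and run other work whenever it returns `WH_ERROR_NOTREADY`. The blocking wrapper is the degenerate case where the "scheduler" just spins.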

### Response Size Negotiation

The Response function takes an `inout_size` parameter: on entry it is the
capacity of the output buffer; on exit it is the number of bytes the server
actually wrote. This lets the caller detect a short reply instead of assuming
the buffer was filled:

```c
uint32_t got = requested;
ret = wh_Client_RngGenerateResponse(ctx, out, &got);
/* got may be < requested if the server returned a shorter reply */
```

If the server reports more bytes than the caller's buffer can hold (which
should not happen, but is defended against), the Response returns
`WH_ERROR_ABORTED` instead of overflowing the buffer.
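
The `inout_size` contract and the overflow guard come down to one check before the copy. The sketch below assumes the reply has already been received and decoded into a hypothetical `SketchRngReply` (a size plus a payload pointer); the error values are placeholders, and leaving `*inout_size` untouched on failure is a choice made for this sketch rather than documented behavior.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder error values for this sketch; not wolfHSM's real constants. */
#define WH_ERROR_OK        0
#define WH_ERROR_ABORTED  (-2002)

/* Hypothetical decoded reply: a size field and the payload it describes. */
typedef struct {
    uint32_t       sz;
    const uint8_t* data;
} SketchRngReply;

/* Copies a reply under the inout_size contract: on entry *inout_size is the
 * caller's capacity; on success it becomes the number of bytes written. */
static int sketch_RngCopyReply(const SketchRngReply* reply, uint8_t* out,
                               uint32_t* inout_size)
{
    if (reply->sz > *inout_size) {
        /* Server claims more than fits: refuse rather than overflow */
        return WH_ERROR_ABORTED;
    }
    memcpy(out, reply->data, reply->sz);
    *inout_size = reply->sz;
    return WH_ERROR_OK;
}
```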

### DMA Variant

The DMA variant bypasses the comm buffer entirely for the data payload: the
server writes random bytes directly into the client's output buffer via
translated DMA addresses. The Request/Response split introduces the same
address-stashing pattern used by SHA DMA:

```c
typedef struct {
    uintptr_t outAddr;    /* translated DMA address */
    uintptr_t clientAddr; /* original client address (for POST) */
    uint64_t  outSz;      /* DMA'd size (0 means "nothing to clean up") */
} whClientDmaAsyncRng;
```

Stored in `whClientContext.dma.asyncCtx.rng`, this context carries the
translated address across the Request/Response boundary so the Response can
perform the matching POST cleanup.

Two points are worth calling out:

- **Fail-fast on occupied transport**: the DMA Request checks
  `wh_CommClient_IsRequestPending()` *before* acquiring the DMA mapping.
  Without this check, a request that `SendRequest` would reject could still
  leak a DMA mapping, because the Response (which normally releases the
  mapping) would never run.
- **POST runs on every non-NOTREADY exit**: once the Response receives a
  reply, whether success or failure, it performs the POST cleanup
  unconditionally, so the client buffer is safe to read regardless of the
  final return code.
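
The two rules can be checked against a mock that counts live DMA mappings. What follows is a sketch under stated assumptions: `MockDmaClient`, `mock_DmaPre()`/`mock_DmaPost()`, `SKETCH_ERROR_BUSY`, and all error values are hypothetical stand-ins. A plain flag replaces `wh_Client_CommClient_IsRequestPending()`-style transport state, and `SketchDmaAsyncRng` only mirrors the fields of `whClientDmaAsyncRng`.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder error values for this sketch; not wolfHSM's real constants. */
#define WH_ERROR_OK         0
#define WH_ERROR_BADARGS   (-2000)
#define WH_ERROR_NOTREADY  (-2001)
#define SKETCH_ERROR_BUSY  (-2003) /* hypothetical "transport occupied" code */

/* Mirrors the fields of whClientDmaAsyncRng. */
typedef struct {
    uintptr_t outAddr;    /* translated DMA address */
    uintptr_t clientAddr; /* original client address (for POST) */
    uint64_t  outSz;      /* 0 means "nothing to clean up" */
} SketchDmaAsyncRng;

/* Hypothetical mock client; liveMappings counts PREs not yet POSTed. */
typedef struct {
    int               requestPending; /* stands in for the pending check */
    int               replyReady;     /* set externally to simulate a reply */
    int               serverRc;       /* return code the mock server reports */
    int               liveMappings;
    SketchDmaAsyncRng rng;
} MockDmaClient;

static uintptr_t mock_DmaPre(MockDmaClient* c, uint8_t* clientBuf)
{
    c->liveMappings++;
    return (uintptr_t)clientBuf; /* identity "translation" in this mock */
}

static void mock_DmaPost(MockDmaClient* c)
{
    c->liveMappings--;
}

static int sketch_RngDmaRequest(MockDmaClient* c, uint8_t* out, uint32_t size)
{
    if ((out == NULL) || (size == 0)) {
        return WH_ERROR_BADARGS;
    }
    /* Fail fast: refuse before PRE so a busy transport cannot leak a mapping */
    if (c->requestPending) {
        return SKETCH_ERROR_BUSY;
    }
    c->rng.clientAddr = (uintptr_t)out;
    c->rng.outAddr    = mock_DmaPre(c, out);
    c->rng.outSz      = size;
    c->requestPending = 1; /* the mock "send" always succeeds */
    return WH_ERROR_OK;
}

static int sketch_RngDmaResponse(MockDmaClient* c)
{
    if (!c->replyReady) {
        return WH_ERROR_NOTREADY; /* mapping stays live across the boundary */
    }
    c->replyReady     = 0;
    c->requestPending = 0;
    /* POST on every non-NOTREADY exit, whatever the server reported */
    if (c->rng.outSz != 0) {
        mock_DmaPost(c);
        c->rng.outSz = 0;
    }
    return c->serverRc;
}
```

Checking before PRE, rather than undoing the mapping after a rejected send, keeps the busy path free of cleanup. A real implementation would still need to POST if the send itself failed after PRE; this mock sidesteps that case by never failing the send.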

Unlike the non-DMA variant, the DMA variant has no per-call size cap: the
server writes directly to client memory rather than through the comm buffer,
so a single DMA call can fulfill a request of any size that fits in its
`uint32_t size` argument.

### API Reference

```c
/* Non-DMA */
int wh_Client_RngGenerateRequest(whClientContext* ctx, uint32_t size);
int wh_Client_RngGenerateResponse(whClientContext* ctx, uint8_t* out,
                                  uint32_t* inout_size);

/* DMA (requires WOLFHSM_CFG_DMA) */
int wh_Client_RngGenerateDmaRequest(whClientContext* ctx, uint8_t* out,
                                    uint32_t size);
int wh_Client_RngGenerateDmaResponse(whClientContext* ctx);

/* Blocking (signatures unchanged; both now wrap the async primitives, and
 * the non-DMA version chunks internally against the inline cap) */
int wh_Client_RngGenerate(whClientContext* ctx, uint8_t* out, uint32_t size);
int wh_Client_RngGenerateDma(whClientContext* ctx, uint8_t* out, uint32_t size);
```

## Roadmap: Remaining Algorithms

The async split pattern will be applied algorithm by algorithm to all crypto
@@ -430,6 +554,7 @@ the full set of operations and their planned async status.
430554
| SHA-224 | Update/Final Request/Response | Shares SHA-256 wire format |
431555
| SHA-384 | Update/Final Request/Response | Shares SHA-512 wire format |
432556
| SHA-512 | Update/Final Request/Response | Non-DMA and DMA variants |
557+
| RNG Generate | `wh_Client_RngGenerate{Request,Response}` and DMA variants | Single-shot per call; non-DMA callers chunk against `WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ`, DMA has no per-call cap |
433558
434559
**Planned:**
435560
@@ -450,7 +575,6 @@ the full set of operations and their planned async status.
450575
| CMAC | `wh_Client_Cmac{Request,Response}` | Low | Already has partial split pattern |
451576
| ML-DSA Sign | `wh_Client_MlDsaSign{Request,Response}` | Low | Post-quantum; single-shot |
452577
| ML-DSA Verify | `wh_Client_MlDsaVerify{Request,Response}` | Low | Post-quantum; single-shot |
453-
| RNG Generate | `wh_Client_RngGenerate{Request,Response}` | Medium | Chunking needed for large requests; async callers must handle chunking themselves |
454578
455579
Most remaining algorithms are **single-shot** operations (one request, one
456580
response) and are straightforward to split compared to SHA's streaming
