
Add DTLS throughput benchmark tool and optimize send path #10551

Open
julek-wolfssl wants to merge 1 commit into wolfSSL:master from julek-wolfssl:dtls-perf-benchmark

@julek-wolfssl
Member

Add examples/benchmark/dtls_bench, a DTLS throughput benchmark that completes a handshake and then measures bulk-send throughput. It supports DTLS 1.2 and 1.3, selectable cipher suites, an end-to-end mode, and a -z sink mode that discards records on the server after the handshake to isolate the sender's record-layer cost. The socket is set up with wolfSSL_set_dtls_fd_connected.
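The measurement phase described above (send a fixed number of records after the handshake, then divide bytes by elapsed time) can be sketched roughly as follows. This is not the code in dtls_bench.c: every `demo_` name is hypothetical, and the send hook stands in for whatever record-send call the tool actually uses.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <stddef.h>
#include <time.h>

/* Hypothetical send hook: returns bytes sent, or a negative value on error. */
typedef int (*demo_send_fn)(void* ctx, const void* buf, size_t len);

/* Hypothetical result record for one bulk-send run. */
typedef struct {
    size_t bytes;    /* total payload bytes accepted by the send hook */
    double seconds;  /* wall-clock time spent in the send loop */
} demo_result;

/* Stand-in sender that only counts calls; ctx points to a size_t counter. */
static int demo_null_send(void* ctx, const void* buf, size_t len)
{
    (void)buf;
    ++*(size_t*)ctx;
    return (int)len;
}

/* Send `count` records of `len` bytes and time the loop with a monotonic
 * clock. Returns 0 on success, -1 on a clock or send error. */
static int demo_bulk_send(demo_send_fn send_fn, void* ctx, const void* buf,
                          size_t len, size_t count, demo_result* out)
{
    struct timespec t0, t1;
    size_t i;

    out->bytes = 0;
    out->seconds = 0.0;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1;
    for (i = 0; i < count; i++) {
        int n = send_fn(ctx, buf, len);
        if (n < 0)
            return -1;
        out->bytes += (size_t)n;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1;
    out->seconds = (double)(t1.tv_sec - t0.tv_sec)
                 + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return 0;
}

/* Convert a result to megabits per second; 0.0 if no time was measured. */
static double demo_mbit_per_sec(const demo_result* r)
{
    if (r->seconds <= 0.0)
        return 0.0;
    return (double)r->bytes * 8.0 / r->seconds / 1e6;
}
```

In a real run the hook would presumably wrap a call such as wolfSSL_write() on the established session; keeping the handshake outside the timed loop is what lets the number reflect record-layer and socket cost only.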

Optimize the send path exercised by the benchmark:

  • wolfio (EmbedSendTo): cache the per-descriptor socket-type probe (getsockopt SO_TYPE) in WOLFSSL_DTLS_CTX instead of running it on every send, removing a syscall from the record send path. The cache is invalidated whenever rfd/wfd is reassigned.

  • internal (BuildMessage): for AEAD suites whose explicit nonce is the 8-byte record sequence number, write the sequence number directly as nonce_explicit instead of drawing it from the RNG. This covers AES-GCM (RFC 5288 sec 3), AES-CCM (RFC 6655 sec 3), SM4-GCM/CCM (RFC 8998 sec 3), and Camellia-/ARIA-GCM, which inherit the RFC 5288 construction; ChaCha20 uses an implicit nonce and is excluded. A new read-only PeekSEQ() helper reads the sequence number without advancing the per-direction counter, leaving the single mandated increment to writeAeadAuthData().
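As a rough illustration of the nonce change, the sketch below builds a 12-byte AES-GCM style nonce as a 4-byte implicit salt followed by the 8-byte sequence number, using a read-only peek so that the counter is advanced in exactly one other place. The struct layout and all `demo_` names are hypothetical and do not mirror wolfSSL's internals; the sequence number is treated as a plain 64-bit value split into two 32-bit halves.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_SALT_SZ     4  /* implicit part of the nonce (from the key block) */
#define DEMO_EXPLICIT_SZ 8  /* nonce_explicit length in the RFC 5288 layout */
#define DEMO_NONCE_SZ    (DEMO_SALT_SZ + DEMO_EXPLICIT_SZ)

/* Hypothetical per-direction write state. */
typedef struct {
    uint32_t seq_hi;
    uint32_t seq_lo;
    uint8_t  salt[DEMO_SALT_SZ];
} demo_write_state;

/* Store a 32-bit value in network (big-endian) byte order. */
static void demo_store_be32(uint32_t v, uint8_t* out)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Read-only peek: serialize the current sequence number without
 * advancing it (the state is const on purpose). */
static void demo_peek_seq(const demo_write_state* st,
                          uint8_t out[DEMO_EXPLICIT_SZ])
{
    demo_store_be32(st->seq_hi, out);
    demo_store_be32(st->seq_lo, out + 4);
}

/* The one place where the counter moves, called once per record. */
static void demo_advance_seq(demo_write_state* st)
{
    if (++st->seq_lo == 0)
        st->seq_hi++;
}

/* nonce = salt || sequence number; no RNG call involved. */
static void demo_build_nonce(const demo_write_state* st,
                             uint8_t nonce[DEMO_NONCE_SZ])
{
    memcpy(nonce, st->salt, DEMO_SALT_SZ);
    demo_peek_seq(st, nonce + DEMO_SALT_SZ);
}
```

Because the sequence number never repeats under a given key, using it as the explicit nonce keeps nonces unique per record, which is the property the AEAD construction needs; separating the peek from the increment avoids advancing the counter twice for one record.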

Also ignore the built dtls_bench binary in .gitignore.

Copilot AI review requested due to automatic review settings May 28, 2026 14:07
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a new DTLS throughput benchmark under examples/benchmark/ and makes two optimizations in the DTLS send path to better measure (and reduce) per-record overhead in wolfSSL’s record layer and socket I/O glue.

Changes:

  • Add examples/benchmark/dtls_bench.c: a DTLS 1.2/1.3 throughput benchmark with cipher selection, plain-UDP baseline mode, and a client-side “sink send” mode.
  • Optimize DTLS send path by caching the SO_TYPE (datagram vs stream) probe in WOLFSSL_DTLS_CTX instead of calling getsockopt() on every send.
  • Optimize AEAD explicit-nonce construction by writing the record sequence number directly for suites where the explicit nonce is defined as the seq number, using a new read-only PeekSEQ() helper.
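The socket-type caching idea from the list above can be sketched with plain POSIX calls. The context struct, its field names, and the `demo_` functions are hypothetical and are not the fields the PR adds to WOLFSSL_DTLS_CTX; the `probes` counter exists only to make the saved syscalls visible.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical context holding the descriptor and the cached probe result. */
typedef struct {
    int fd;
    int type_cached;  /* 0 until SO_TYPE has been probed for this fd */
    int is_dgram;     /* valid only when type_cached is nonzero */
    int probes;       /* number of getsockopt() calls, for demonstration */
} demo_dtls_ctx;

/* Assigning a descriptor invalidates the cache, since the new fd
 * may be a different kind of socket. */
static void demo_set_fd(demo_dtls_ctx* ctx, int fd)
{
    ctx->fd = fd;
    ctx->type_cached = 0;
}

/* Returns 1 for a datagram socket, 0 otherwise, -1 if the probe fails.
 * getsockopt() runs only on the first call after demo_set_fd(). */
static int demo_is_dgram(demo_dtls_ctx* ctx)
{
    if (!ctx->type_cached) {
        int type = 0;
        socklen_t len = sizeof(type);

        ctx->probes++;
        if (getsockopt(ctx->fd, SOL_SOCKET, SO_TYPE, &type, &len) != 0)
            return -1;  /* cache stays empty so a later call retries */
        ctx->is_dgram = (type == SOCK_DGRAM);
        ctx->type_cached = 1;
    }
    return ctx->is_dgram;
}
```

The invalidation in demo_set_fd() is the part that keeps the cache safe: without it, a context reused with a different descriptor would keep reporting the old socket type.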

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| wolfssl/internal.h | Adds DTLS context fields for caching socket type probe results. |
| tests/api.c | Resets new DTLS context cache fields when copying SSL state in an API test helper. |
| src/wolfio.c | Changes datagram-vs-stream detection to cache SO_TYPE results. |
| src/ssl.c | Invalidates the DTLS socket-type cache when read/write fds are (re)assigned. |
| src/internal.c | Adds PeekSEQ() and uses it to derive AEAD explicit nonce from sequence number for applicable suites. |
| examples/benchmark/include.am | Adds dtls_bench to Automake build outputs. |
| examples/benchmark/dtls_bench.c | New DTLS benchmark tool implementation. |
| .gitignore | Ignores the built examples/benchmark/dtls_bench binary. |


Comment thread src/wolfio.c
Comment thread examples/benchmark/include.am
Comment thread examples/benchmark/dtls_bench.c
Comment thread examples/benchmark/dtls_bench.c
@julek-wolfssl julek-wolfssl self-assigned this May 28, 2026
@julek-wolfssl julek-wolfssl force-pushed the dtls-perf-benchmark branch from 8d445f1 to 32c7f0b Compare May 28, 2026 17:02
@julek-wolfssl julek-wolfssl marked this pull request as ready for review May 28, 2026 17:19
@github-actions

retest this please

@julek-wolfssl julek-wolfssl force-pushed the dtls-perf-benchmark branch from 32c7f0b to 7b5387d Compare May 28, 2026 19:24