
Parallelize volume header key derivation across layouts on mount #1793

Open
damianrickard wants to merge 1 commit into veracrypt:master from damianrickard:perf/parallel-header-keyderivation


damianrickard commented Jun 21, 2026

Contributor

Summary

On mount, Volume::Open auto-detects the volume layout by probing each candidate layout's header, deriving each header's key independently. Two things made this slow:

  1. The derivations for the different candidate layouts ran one at a time.
  2. When the mount is performed by the elevated core-service process (e.g. macOS device mounts), that process does not start EncryptionThreadPool, so the derivations ran single-threaded on a single core.

With Argon2id (memory-hard, single-lane, seconds-to-minutes per derivation) this is slow — worst for hidden volumes, where the outer and hidden headers are both probed.

This change derives the candidate layouts' header keys concurrently via the thread pool, so the expensive KDFs overlap and use all available cores.

Change

  • New VolumeHeader::DecryptHeaderParallel(candidates, password, pim): dispatches every (candidate layout × KDF) derivation to EncryptionThreadPool at once. Candidates are then resolved strictly in their original (priority) order — candidate N is considered only once every higher-priority candidate is known not to decrypt and not to throw — so the serial detection's layout priority and exception ordering (e.g. HigherVersionRequired) are preserved. Within a candidate, the first KDF whose derived key decrypts the header wins, so a fast match does not wait on that candidate's slower KDFs.
  • Volume::Open gathers the candidate headers and uses the parallel path when no specific KDF is requested; the existing serial per-layout path is the unchanged fallback (specific KDF requested, or thread pool unavailable).
  • Volume::Open starts the pool for the derivation only if it is not already running, and stops it again before returning (see below). This is what lets the core-service mount path use the pool at all, without keeping pool threads alive across the subsequent FUSE fork().
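
The priority-ordered resolution described above can be sketched as a pure function over a table of per-derivation outcomes, called each time a worker reports a result. This is a minimal illustration, not the PR's code: `Outcome`, `Decision`, and `ResolveInPriorityOrder` are hypothetical names, and the handling of an error that sits next to a match within the same candidate (lowest KDF index wins here) is an assumption. The sketch avoids lambdas and other C++11 syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical state of one (candidate layout, KDF) derivation.
enum Outcome { Pending, NoMatch, Match, Error };

// Hypothetical result of one resolution pass.
struct Decision {
    enum Kind { Wait, Mounted, Throw, Failed };
    Kind kind;
    std::size_t candidate; // meaningful for Mounted, Throw, and Wait
    std::size_t kdf;       // meaningful for Mounted and Throw
};

Decision MakeDecision(Decision::Kind kind, std::size_t c, std::size_t k)
{
    Decision d;
    d.kind = kind;
    d.candidate = c;
    d.kdf = k;
    return d;
}

// results[c][k] holds the outcome of KDF k for candidate c.
// Candidates are stored in priority order, highest priority first.
Decision ResolveInPriorityOrder(const std::vector<std::vector<Outcome> >& results)
{
    for (std::size_t c = 0; c < results.size(); ++c) {
        assert(!results[c].empty());
        bool pending = false;
        for (std::size_t k = 0; k < results[c].size(); ++k) {
            const Outcome o = results[c][k];
            // A finished match wins at once, even if slower KDFs of the
            // same candidate are still running.
            if (o == Match) return MakeDecision(Decision::Mounted, c, k);
            // Assumption: an error surfaces as soon as it is seen.
            if (o == Error) return MakeDecision(Decision::Throw, c, k);
            if (o == Pending) pending = true;
        }
        // Lower-priority candidates are not considered until this one
        // is known not to decrypt and not to throw.
        if (pending) return MakeDecision(Decision::Wait, c, 0);
    }
    // Every derivation finished without a match.
    return MakeDecision(Decision::Failed, 0, 0);
}
```

In this sketch a match on a lower-priority candidate is ignored while any higher-priority candidate still has a pending derivation, which is what keeps the outcome identical to probing the layouts one after another.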

Thread-pool lifetime / fork safety

fork() in a multithreaded process is unsafe, and the mount path fork()s to launch the FUSE service. So Volume::Open only starts the pool if it is not already running, and stops it before returning:

  • The GUI's persistent pool (started in main()) is left untouched.
  • A process that does not keep a pool running (e.g. the elevated core service) gets a transient pool scoped to the derivation and is single-threaded again at the FUSE fork. The FUSE service starts its own pool after that fork, as before.
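
The start-only-if-not-running rule above can be pictured as a scope guard. The sketch below is an illustration under stated assumptions: `DemoPool` is a hypothetical stand-in with a trivial running flag, not the real EncryptionThreadPool, whose lifecycle is more involved, and `ScopedPool` is an invented name.

```cpp
#include <cassert>

// Hypothetical stand-in for a process-wide thread pool. Only the
// running flag is modelled; no threads are created.
class DemoPool {
public:
    static bool IsRunning() { return running; }
    static void Start() { assert(!running); running = true; }
    static void Stop()  { assert(running);  running = false; }
private:
    static bool running;
};
bool DemoPool::running = false;

// Starts the pool only if nobody else did, and stops it on scope exit
// only in that case, so a persistent pool owned by the caller survives.
class ScopedPool {
public:
    ScopedPool() : startedHere(!DemoPool::IsRunning())
    {
        if (startedHere) DemoPool::Start();
    }
    ~ScopedPool()
    {
        if (startedHere) DemoPool::Stop();
    }
    bool StartedHere() const { return startedHere; }
private:
    ScopedPool(const ScopedPool&);            // non-copyable (C++03 style)
    ScopedPool& operator=(const ScopedPool&);
    const bool startedHere;
};
```

Declaring a `ScopedPool` at the top of the derivation scope would leave the process in the same pool state on every exit path, including exceptions, which is the property the fork-safety argument relies on.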

Scope

Platform-independent; touches only src/Volume/ (Volume.cpp, VolumeHeader.cpp, VolumeHeader.h). No on-disk format change. The serial detection path and its results are unchanged.

Testing

Built and tested on Apple Silicon (macOS, FUSE-T). The real-world impact is large: a hidden volume on an external device that previously took over 15 minutes to mount (in one attempt it had still not completed when I cancelled it) now mounts in under a minute — contents verified, clean dismount. Normal/PBKDF2 volumes mount without regression (a fast KDF match returns without waiting on Argon2). The serial fallback path is exercised via specific-KDF mounts.

Volume mounting auto-detects the layout by probing each candidate layout's
header, each deriving its key (PBKDF2 or Argon2) independently. This ran one
layout at a time, so with Argon2 (memory-hard, single-lane, seconds-to-
minutes per pass) the probe is slow -- worst for hidden volumes, where the
outer and hidden headers are derived serially.

- Add VolumeHeader::DecryptHeaderParallel: dispatch every (candidate layout
  x KDF) derivation to the encryption thread pool at once, so different
  layouts' expensive KDFs run concurrently. Candidates are then resolved in
  their original (priority) order -- candidate N is considered only once every
  higher-priority candidate is known not to decrypt and not to throw -- so the
  serial detection's layout priority and exception ordering are preserved.
  Within a candidate the first KDF whose key decrypts the header wins, so a
  fast match does not wait on that candidate's slow KDFs.
- Volume::Open gathers the candidate headers and uses it when no specific KDF
  is requested; the serial per-layout path is otherwise unchanged.
- If the pool is not already running (e.g. in the elevated core service),
  Volume::Open starts it only for the derivation and stops it before
  returning -- so it is NOT running when the caller fork()s the FUSE daemon
  (fork() in a multithreaded process is unsafe; the FUSE daemon starts its
  own pool after that fork). The GUI's persistent pool is left untouched.

idrassi commented Jun 22, 2026

Member

Thanks for the PR. The performance goal is interesting, but I can’t merge this as-is.

First, the new lambda in Volume.cpp breaks the Linux compatibility baseline, especially CentOS 6 / GCC 4.4, where lambdas are not supported.

More importantly, starting/stopping the global EncryptionThreadPool from Volume::Open is risky. The current pool lifecycle is not designed for repeated local start/stop cycles, and this can leave stale thread handles/global state across repeated elevated mount requests and before the FUSE fork.

Also, parallelizing across layouts can run multiple Argon2 derivations at the same time, significantly increasing peak memory use and potentially causing correct mounts to fail under memory pressure.

I think this needs a redesign: avoid lambdas, avoid ad-hoc global pool lifecycle changes in Volume::Open, and keep Argon2 layout probing serial.
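
For the "avoid lambdas" point, one conventional option on pre-C++11 compilers is a named function object that carries the state a lambda would otherwise capture. The sketch below only illustrates that general technique; `Candidate`, `MarkIfLayout`, and `MarkLayout` are hypothetical names and do not reflect the code in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical work item: one header candidate to probe.
struct Candidate {
    std::size_t layoutIndex;
    bool decrypted;
};

// Named function object doing what a lambda such as
//   [wanted, &hits](Candidate& c) { ... }
// would do, written in syntax that GCC 4.4 accepts.
class MarkIfLayout {
public:
    MarkIfLayout(std::size_t wanted, std::size_t* hits)
        : wanted_(wanted), hits_(hits)
    {
        assert(hits != 0);
    }
    void operator()(Candidate& c) const
    {
        if (c.layoutIndex == wanted_) {
            c.decrypted = true;
            ++*hits_;
        }
    }
private:
    std::size_t wanted_; // captured by value
    std::size_t* hits_;  // captured by pointer, so copies share the counter
};

// Marks every candidate with the wanted layout index and returns the count.
std::size_t MarkLayout(std::vector<Candidate>& candidates, std::size_t wanted)
{
    std::size_t hits = 0;
    std::for_each(candidates.begin(), candidates.end(), MarkIfLayout(wanted, &hits));
    return hits;
}
```

The counter is held by pointer because `std::for_each` takes its function object by value, so state stored directly in the object would be updated on a copy.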
