v4.3 release notes #774
Open
TysonRayJones wants to merge 33 commits into
…ibility with CUDA 13. Fixes #693.
gpu_thrust.cuh: removed thrust::[unary|binary]_function, which have been removed from CCCL.
* Simplify installation path configuration
  Removed unnecessary path normalization and appending for installation.
* Updates CMake config for conditional installs
  Modifies CMakeLists to conditionally build shared libraries and install binaries only at the top-level project. Introduces the INSTALL_BINARIES option to control the inclusion of example binaries in the installation process. Corrects a typo from 'RATH' to 'RPATH' for build configurations.
* docs/cmake.md: fixed formatting of non-default options for mt and distribution
* cmake: wrapped user source install in if(INSTALL_BINARIES)
* docs/cmake.md: added INSTALL_BINARIES option
* gpu_thrust.cuh: modified initial thrust counting iterator declarations to use long long to avoid overflow at >30 qubits. Fixes #698.
* patched test of rightapplyCompMatr distributed validation
  The operation validation tests previously always used a statevector to test the "targeted amps fit in node" validation, though the rightapply*() functions cannot accept statevectors, only density matrices. Because the "was given a density matrix" validation happens before the "targeted amps fit in node" validation, the intended error was pre-empted by the earlier, unintended one. Now, we are careful to pass a density matrix Qureg to the validation of "targeted amps fit in node" when triggered by a function which 'right-applies' (and is ergo only compatible with density matrices).
* changed literals to defensive type

---------

Co-authored-by: Tyson Jones <tyson.jones.input@gmail.com>
* Implement PauliStrSum random permutations inspired by [arXiv:1805.08385](https://arxiv.org/abs/1805.08385)
* Add randomisation to Trotter functions
* Document random Pauli permutations for Trotterisation
* Add test for Trotter randomisation
* Added unit tests requested by Quantum Motion
* tests/unit/trotterisation.cpp: updated time evo calls to new API
* tests/unit/trotterisation.cpp: updated authorlist
* Fixed valgrind errors
* tests/unit/trotterisation.cpp: tuned floating-point comparison epsilon to account for worst-case scenario which is single precision, single thread
* added get-arbitrary-qureg test util since it will be used frequently by new input validation
* removed Qureg creation in validation tests so that test failures do not cause memory leaks and e.g. add to valgrind noise. Tests now instead use getArbitraryCachedStatevec() or getArbitraryCachedDensmatr() to obtain an existing qureg with an arbitrary deployment.
* restoring missing-validation comments since the validation for these functions wasn't added. Such functions have additional tests to their tested counterparts; for example, validating that matrix elements are non-zero when given a negative exponent
* fixing test category
* added missing operation validation tests
* fixed indentation
* making spacing consistent and adding a missing Hermiticity validation to applyTrotterizedUnitaryTimeEvolution test
* added warning about untested deployments
* removed defunct signature
* patching C++ validation err msg
  Previously, an error message of the C++ API was not substituting in values for its placeholder variables. This affected the C++ variants of the below functions when passing vectors for the targets and outcome parameters of mismatching length:
  - calcProbOfMultiQubitOutcome
  - leftapplyMultiQubitProjector
  - rightapplyMultiQubitProjector
  - applyMultiQubitProjector
  - applyForcedMultiQubitMeasurement
* added missing C++ API signatures
* added C++-API validation tests
* updated doc warnings
* added Vasco to Trotter API authorlist
* merged Tyson's patches

---------

Co-authored-by: Tyson Jones <tyson.jones.input@gmail.com>
Co-authored-by: Maurice Jamieson <m.jamieson@epcc.ed.ac.uk>
Co-authored-by: Oliver Thomson Brown <otbrown@users.noreply.github.com>
Co-authored-by: Oliver Thomson Brown <8394906+otbrown@users.noreply.github.com>
CI updated to use the latest AMD ROCm install instructions, which as of this commit correspond to ROCm 7.2.

---------

Co-authored-by: Oliver Thomson Brown <8394906+otbrown@users.noreply.github.com>
Remove numControls argument from applyMultiStateControlledSqrtSwap overloaded definition taking std::vector<int>

(cherry picked from commit 9c20792)

Co-authored-by: D-Exposito <dexposito@cesga.es>
* tests/unit/trotterisation.cpp: updated to use REQUIRE_AGREE and cached statevecs and densmats, and both permutePaulis options
* tests/utils/compare.hpp/cpp: added setters for test epsilon
* tests/unit/trotterisation.cpp: adjusted test epsilon for quad precision imaginary time evolution tests
* tests/unit/trotterisation.cpp: moved unitary time evo test to REQUIRE_AGREE
* tests/utils/cache.hpp/cpp: added additional utilities for creating and destroying temp caches (which I guess makes them not caches?) with a set number of qubits
* tests/unit/trotterisation.cpp: updated unitary time evo test to test across deployments
* tests/unit/trotterisation.cpp: reduced number of qubits and increased number of steps to admit the possibility of testing density matrices too
* tests/unit/trotterisation.cpp: added density matrix tests
* reduce test precision to lazily pass CPU clang quad-precision
* skip Trotter tests in paid CI
* changing varname convention
* renaming cache funcs

---------

Co-authored-by: Oliver Thomson Brown <8394906+otbrown@users.noreply.github.com>
Co-authored-by: Tyson Jones <tyson.jones.input@gmail.com>
--------- Co-authored-by: Oliver Thomson Brown <otbrown@users.noreply.github.com>
Formerly, the Trotter functions (such as applyTrotterizedPauliStrSumGadget()), when passed permutePaulis=true, would randomly permute the order of the passed PauliStrSum, mutating it and affecting the outputs of subsequent functions like reportPauliStrSum(). The function also contained superfluous memory allocs/copies equal in size to the PauliStrSum. Now, the PauliStrSum is never mutated, and an internally allocated ordering list keeps track of the randomised permutation. We also updated the doc, renamed permutePaulis to permuteTerms, and improved validation. Note that 'permuteTerms' had not yet reached main/release, so these changes do not need to be documented in the v4.3 release notes.
Created cpu_qcomp and gpu_qcomp (from a shared base_qcomp) to avoid std::complex arithmetic operators in hot loops, which caused performance issues. Removed all prior compiler flags and related scaffolding attempting to mitigate the performance issue. Also gave the MSVC build the params `/Zc:preprocessor -Xcompiler=/Zc:preprocessor /bigobj` as needed for compilation of the unit tests on my Windows machines.
Optimisations include:
- Adopted SmallView (const SmallList&) to avoid superfluous SmallList copies
- Made internally created matrices static
- Change accelerator dynamic function vectors to static arrays
- Exit all validators early when validation is disabled

Additional cleanup includes:
- Tidied accelerator macros (replaced param-specific macros like "numCtrls" and "numTargs" with "param")
- Fill ctrlStates vectors with default before localiser
- Renamed getBitsFromInteger to setToBitsOfInteger
- Adopted const in bitwise.hpp to better express intent

Note that the naming of SmallList and SmallView will be subsequently changed to List64 and ConstList64.
such that they all begin with QUEST, but some have additional changes
so that we can compile MPI tests on systems which cannot actually run with MPI, because they are missing an MPI or UCX library file, as is witnessed in the CI (when compiling with MPICH). It's generally irksome too to trigger an execution of the test binary (which itself initialises QuEST) during build when on an HPC platform with distinct submit and compute nodes.
* Added ENABLE_SUBCOMM build option
* Moved from MPI_COMM_WORLD to mpiQuestComm
* Decided passing *MPI_Comm was probably overly cautious, and updated function name to comm_getMpiComm
* environment.cpp: added methods to reset rank and numNodes, and reporting for subcomm compiled
* comm_config.hpp/cpp: added comm_setMpiComm
* CMakeLists.txt: PUBLIC MPI::MPI_CXX turned out to be unhelpful, even for SubComm, because of course it enforces CXX
* Added new custom QuESTEnv initialiser which allows the user to positively declare that they take ownership of MPI
* validation.cpp: updated comm_end call
* comm_config.hpp: added config.h include so COMPILE_MPI is actually defined
* subcommunicator.h/cpp: implemented QuESTEnv initialiser with custom MPI_Comm
* CMake: added subcommunicator.cpp
* comm_config.hpp: added missing config.h include...
* comm_config.cpp: explicitly initialise mpiCommQuest to MPI_COMM_NULL, updated setComm for init only workflow
* quest.h: added subcommunicator header
* CMake: added MPI to application binaries when SUBCOMM is enabled
* comm_routines.cpp: post Irecv before Isend which probably won't do anything but it makes MPI library implementers less nervous
* tests: added new env test for initCustomMpiQuESTEnv
* Added error throws to comm_config to cover new scenarios of badness with user owned MPI
* subcommunicator.cpp: updated var names to match QuEST style
* tests/unit/initialisations.cpp: slightly modified setQuregAmps test to avoid unexpected test failure due to range checking when compiled in Debug configuration
* Updated validation in comm_setMpiComm
  Co-authored-by: iarejula-bsc <inigo.arejula@bsc.es>
* userOwnsMpi int->bool
* comm_config.cpp: corrected call to MPI_Comm_free
* subcommunicator.cpp: userOwnsMpi int->bool
* subcommunicator.cpp: added comm_isInit guard around comm_setMpiComm
* environment.cpp: USER_OWNS_MPI -> userOwnsMpi
* comm_init: fixed case where useDistrib = 0 and userOwnsMpi = true
* comm_init: moved (recently) misplaced MPI_Init
* AUTHORS.txt: added iarejula-bsc
* Added placeholder docstrings to new initialisers
* docs/cmake.md: added ENABLE_SUBCOMM to list of QuEST CMake vars
* Newly added COMPILE_MPI -> QUEST_COMPILE_MPI
* ENABLE_SUBCOMM -> QUEST_ENABLE_SUBCOMM
* CMake: corrected OpenMP and subcommunicator pre-processor definitions

---------

Co-authored-by: Oliver Thomson Brown <8394906+otbrown@users.noreply.github.com>
Co-authored-by: iarejula-bsc <inigo.arejula@bsc.es>
to reduce the likelihood of users printing from non-root nodes interrupting QuEST root output. This is not bullet-proof; we sync the active communicator rather than MPI_COMM_WORLD so the user-controlled non-participating processes may still be printing. Furthermore, even if all processes participate, some may have outstanding non-root prints that are not aggregated to the user screen by the time MPI_Barrier finishes. But these syncs greatly reduce the chance of corruption, and are effectively free!
This enables CRAY MPICH platforms to leverage GPU-awareness, greatly accelerating distributed GPU simulation.

Co-authored-by: JPRichings <james.richings@ed.ac.uk>
Important changes:
- permit user initialisation of MPI when QuEST is not distributed
- changed QuESTEnv fields from int to bool (e.g. isMultithreaded)
- add user-input validation for custom MPI calls
- disambiguated comm_config.cpp concepts of "MPI is initialised" (comm_isMpiInit) from "QuEST communication is active" (comm_isActive)
- refactored comm_config.cpp flow, especially related to pre-quest-init flow (during validation)
- added Oliver's custom-MPI examples (from #712)
- moved new API functions to experimental.h
- tweaked reportQuESTEnv output grouping
Added:
- QUEST_DEFAULT_NUM_GPU_THREADS_PER_BLOCK CMake option
- QUEST_DEFAULT_NUM_GPU_THREADS_PER_BLOCK environment variable
- setQuESTNumGpuThreadsPerBlock() API function
- getQuESTNumGpuThreadsPerBlock() API function
- set_num_gpu_threads examples in examples/extended

---------

Co-authored-by: Oliver Thomson Brown <8394906+otbrown@users.noreply.github.com>
Co-authored-by: Tyson Jones <tyson.jones.input@gmail.com>
Beware this included removing the superfluous `numControls` argument from the C++-only `std::vector` overload of `applyMultiStateControlledCompMatr2`, which is technically a teeny tiny API break ¯\_(ツ)_/¯
…de new validation (#771) Updated number of seeds test to use a valid pointer and added a separate NULL pointer test.
test_free.yml: added Release config to ctest commands (#773)
@otbrown @JPRichings Draft of release notes above, feel free to edit directly!
Added a note about the change from |
(Ignore the diff; using the commit history to prepare the release notes)
Overview
This release accelerates few-qubit, CPU and multi-GPU simulation (on some platforms), adds control and performance tuning utilities for MPI and GPU superusers, adds randomised Trotter simulation, and reduces the likelihood of QuEST symbols colliding with other software stacks.
Optimisations
New features
- `initCustomMpiCommQuESTEnv()` and `initCustomMpiCommQuESTEnv()` functions which permit users to retain control of MPI during QuEST simulation, and dedicate only a sub-communicator (e.g. only some MPI processes) to QuEST. Users can now also disable QuEST distribution through `initCustomQuESTEnv()` while still themselves using MPI, even when QuEST was compiled with MPI.
- `setQuESTNumGpuThreadsPerBlock()` function to override QuEST's GPU parallelisation granularity at runtime, permitting simple performance tuning. This accompanies a getter (`get...`), an environment variable `QUEST_DEFAULT_NUM_GPU_THREADS_PER_BLOCK` to override the parallelisation at launch time, and a CMake option of the same name to override at build-time.
- `sortPauliStrSumLexicographic()` and `sortPauliStrSumMagnitude()` to reorder the terms within a `PauliStrSum`, affecting the numerical accuracy of Trotterisation.
- `permuteTerms` boolean, which when `true`, sees every Trotter repetition randomise the ordering of the Trotter terms, often improving numerical accuracy. This affects:
- `inverse` boolean, to apply the inverse QFT.
- `QUEST_INSTALL_BINARIES` to enable including examples and user-source in the QuEST installation directory.
- `QUEST_` prefix to (almost) all CMake options, to avoid collision with other projects (see API breaks below).

API breaks
- `permuteTerms` boolean. This affects:
  - `apply(Multi)(State)(Controlled)TrotterizedPauliStrSumGadget()`
  - `applyTrotterizedNonUnitaryPauliStrSumGadget()`
  - `applyTrotterized(Unitary|Imaginary|Noisy)TimeEvolution()`
- `inverse` boolean. This affects:
  - `apply(Full)QuantumFourierTransform()`
- `numControls` argument was removed from the `std::vector` (C++ only) overloads of `applyMultiStateControlledSqrtSwap()` and `applyMultiStateControlledCompMatr2()`.
- `QuEST` (for example, `setQuESTSeeds()`) as the second word, indicated by `[QuEST]` below. This affects functions:
  - `(set|get)[QuEST]Seeds(ToDefault)()`
  - `get[QuEST]NumSeeds()`
  - `set[QuEST]InputErrorHandler()`
  - `set[QuEST]Validation(On|Off)()`
  - `set[QuEST]ValidationEpsilon(ToDefault)()`
  - `set[QuEST]MaxNumReportedItems()`
  - `set[QuEST]MaxNumReportedSigFigs()`
  - `set[QuEST]NumReportedNewlines()`
  - `set[QuEST]ReportedPauli(Chars|StrStyle)()`
  - `get[QuEST]GpuCacheSize()`
  - `clear[QuEST]GpuCache()`
  - `get[QuEST]EnvironmentString()`
- `QUEST_`, as indicated by `[QUEST_]` below. This affects environment variables:
  - `[QUEST_]PERMIT_NODES_TO_SHARE_GPU`
  - `[QUEST_]DEFAULT_VALIDATION_EPSILON`
  - `[QUEST_]TEST_NUM_QUBITS_IN_QUREG`
  - `[QUEST_]TEST_MAX_NUM_QUBIT_PERMUTATIONS`
  - `[QUEST_]TEST_MAX_NUM_SUPEROP_TARGETS`
  - `[QUEST_]TEST_NUM_MIXED_DEPLOYMENT_REPETITIONS`
  - `TEST_ALL_DEPLOYMENTS` has become `QUEST_TEST_TRY_ALL_DEPLOYMENTS`
- `QUEST_` or `USER_` (to disambiguate whether they relate to the QuEST library or the user's optional source files)
- `QUEST_`, and several have been made more explicit. The changes are:
  - `USER_SOURCE` -> `USER_SOURCE_NAMES`
  - `OUTPUT_EXE` -> `USER_OUTPUT_EXE_NAME`
  - `LIB_NAME` -> `QUEST_OUTPUT_LIB_NAME`
  - `VERBOSE_LIB_NAME` -> `QUEST_APPEND_CONFIG_TO_LIB_NAME`
  - `FLOAT_PRECISION` -> `QUEST_FLOAT_PRECISION`
  - `BUILD_EXAMPLES` -> `QUEST_BUILD_EXAMPLES`
  - `ENABLE_MULTITHREADING` -> `QUEST_ENABLE_OMP`
  - `ENABLE_DISTRIBUTION` -> `QUEST_ENABLE_MPI`
  - `ENABLE_TESTING` -> `QUEST_BUILD_TESTS`
  - `DOWNLOAD_CATCH2` -> `QUEST_TESTS_DOWNLOAD_CATCH2`
  - `[QUEST_]ENABLE_CUDA`
  - `[QUEST_]ENABLE_CUQUANTUM`
  - `[QUEST_]ENABLE_HIP`
  - `[QUEST_]ENABLE_DEPRECATED_API`
  - `[QUEST_]DISABLE_DEPRECATION_WARNINGS`

Minor changes
- `reportQuESTEnv()` has been rearranged and reordered.
- `setSeeds` (now called `setQuESTSeeds()`) validates that its given list of integers is non-null.
- `int` fields of the `QuESTEnv` struct (such as `isMultithreaded`) have been changed to type `bool` (exposed by `<stdbool.h>` in C).
- `extended` examples `set_num_gpu_threads.(c|cpp)` and `user_owned_(sub)mpi.(c|cpp)`

Patches
- `-Ofast` to be applied to CPU subroutines.
- `64 GiB` or more of memory in a single GPU (i.e. a non-distributed 32-qubit statevector, or 16-qubit density matrix). Formerly, the below functions would induce a crash, or set all amplitudes to zero, or output zero.
  - `(set|create)FullStateDiagMatrFromPauliStrSum()`
  - `setQuregToPauliStrSum()`
  - `calcTotalProb()`
  - `calcProbOf(Multi)QubitOutcome()`
  - `calcFidelity()`
  - `calcExpecPauliStr()`
  - `calcExpecPauliStrSum()`
  - `calcExpecFullStateDiagMatr()`
  - `apply(Multi)QubitProjector()`
  - `initRandomPureState()`

Notable internal changes
- `std::complex` arithmetic overloads, and instead make use of custom `cpu_qcomp` and `gpu_qcomp` types.
- `std::vector<int>`. They now instead use `List64` - a custom, stack-based, light-weight, fixed-capacity list - and pass constant references thereof (`ConstList64`) where possible.
- `MPI_COMM_WORLD` to avoid collisions with user messages.

New contributors
This release contained contributions from new contributors: