A C++17 Thread Pool for High-Performance Scientific Computing
We present a modern C++17-compatible thread pool implementation, built from
scratch with high-performance scientific computing in mind. The thread pool is
implemented as a single lightweight, self-contained class and has no
dependencies other than the C++17 standard library, which makes it highly
portable. In particular, our implementation does not use OpenMP or any other
high-level multithreading API. It therefore gives the programmer precise
low-level control over the details of the parallelization, which in turn
permits more robust optimizations. The thread pool was extensively tested on
both AMD and Intel CPUs with up to 40 cores and 80 threads. This paper
provides motivation, detailed usage instructions, and performance tests.
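To illustrate what a single self-contained class built only on the C++17
standard library can look like, here is a minimal sketch. It is not the
implementation described in this paper: the class name `simple_thread_pool` and
its `submit` member are hypothetical. It assumes the simplest possible design,
in which a fixed set of worker threads pulls type-erased tasks from one
mutex-protected FIFO queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool (illustrative sketch only): a fixed set of
// worker threads pulls type-erased tasks from a shared FIFO queue guarded by a mutex.
class simple_thread_pool {
public:
    explicit simple_thread_pool(std::size_t num_threads) {
        assert(num_threads > 0 && "need at least one worker thread");
        workers_.reserve(num_threads);
        for (std::size_t i = 0; i < num_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Lets the workers drain every queued task, then joins them.
    ~simple_thread_pool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    simple_thread_pool(const simple_thread_pool&) = delete;
    simple_thread_pool& operator=(const simple_thread_pool&) = delete;

    // Enqueues a callable and returns a future for its result; exceptions
    // thrown by the task are rethrown from future::get().
    template <typename F>
    auto submit(F&& f) -> std::future<std::invoke_result_t<std::decay_t<F>>> {
        using R = std::invoke_result_t<std::decay_t<F>>;
        // packaged_task is move-only, but std::function needs a copyable
        // target, so the task is shared through a shared_ptr.
        auto task = std::make_shared<std::packaged_task<R()>>(std::forward<F>(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.emplace([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

For example, `simple_thread_pool pool(4); auto f = pool.submit([] { return 6 * 7; });`
followed by `f.get()` yields 42. A single shared queue keeps the sketch short,
but under very many tiny tasks every worker contends for the same mutex.
Low-level control over such details is exactly what a hand-written pool
exposes and a higher-level API such as OpenMP hides.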

