It is possible to set the number of threads used by OpenBLAS via `openblas_set_num_threads`. With the "custom thread" solution (the non-OpenMP build) this works quite well: regardless of what the application does, OpenBLAS uses the number of threads that was set.
However, with OpenMP this is not the case. An application might want OpenBLAS to use 4 of 16 threads while using OpenMP itself to schedule other work, or to run 4 OpenBLAS operations in parallel, each using 4 threads (whether the latter is possible is up to the OpenMP runtime, but the former should be). Another use case: OpenBLAS should use only 4 threads (e.g. for performance reasons or because of the typical matrix size), while the application uses OpenMP with all 16 threads at other times, not concurrently with OpenBLAS calls.
Currently, OpenBLAS does something nasty: it queries the maximum number of OpenMP threads and sets its own thread count to that value. As a result, it is impossible to make OpenBLAS use fewer threads than the OpenMP maximum.
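To make the reported behaviour concrete, here is a minimal C model of it. It is not OpenBLAS source: apart from `blas_cpu_number` and `openmp_nthreads` (names taken from this issue), the `model_*` functions are hypothetical stand-ins, and the OpenMP maximum is a plain variable rather than a call into the OpenMP runtime.

```c
#include <assert.h>

/* Hypothetical, simplified model; NOT the actual OpenBLAS code.
 *   blas_cpu_number - thread count requested via openblas_set_num_threads()
 *   openmp_nthreads - stand-in for the maximum number of OpenMP threads */
static int blas_cpu_number = 1;
static int openmp_nthreads = 1;

/* Models openblas_set_num_threads(): just records the requested count. */
static void model_set_num_threads(int n)
{
    assert(n > 0);
    blas_cpu_number = n;
}

/* Models the reported behaviour: instead of only reporting a value,
 * the "query" overwrites the user's setting with the OpenMP maximum
 * whenever the two differ. */
static int model_num_cpu_avail_current(void)
{
    if (blas_cpu_number != openmp_nthreads)
        blas_cpu_number = openmp_nthreads; /* the user's request is lost */
    return blas_cpu_number;
}
```

With `openmp_nthreads = 16`, calling `model_set_num_threads(4)` and then `model_num_cpu_avail_current()` returns 16 and leaves `blas_cpu_number` at 16, which is the "cannot use fewer threads than OpenMP" effect described above.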
In code the problem is 2-fold:
`num_cpu_avail`, which should only query the thread count, modifies it instead: https://github.com/xianyi/OpenBLAS/blob/develop/common_thread.h#L154-L156

So as a first fix I'd suggest making `num_cpu_avail` return the lesser of `blas_cpu_number` and `openmp_nthreads` instead of setting anything.
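A minimal sketch of the suggested fix, under a similar simplification: `blas_cpu_number` and `openmp_nthreads` are plain globals here (the latter standing in for the OpenMP maximum), and `model_num_cpu_avail_proposed` is a hypothetical name and signature, not the real function in `common_thread.h`.

```c
#include <assert.h>

/* Hypothetical, simplified model; NOT the actual OpenBLAS code. */
static int blas_cpu_number = 1; /* set via openblas_set_num_threads() */
static int openmp_nthreads = 1; /* stand-in for the OpenMP maximum */

/* Proposed semantics: a pure query that returns the lesser of the two
 * values and never writes to blas_cpu_number. */
static int model_num_cpu_avail_proposed(void)
{
    assert(blas_cpu_number > 0 && openmp_nthreads > 0);
    return blas_cpu_number < openmp_nthreads ? blas_cpu_number
                                             : openmp_nthreads;
}
```

With `blas_cpu_number = 4` and `openmp_nthreads = 16` it returns 4, covering the "4 of 16 threads" use case; if the OpenMP limit drops to 2 it returns 2. In both cases `blas_cpu_number` is left untouched, so the user's setting survives later changes to the OpenMP configuration.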