From bf84d874fe0d06b4955316709b56272d74fb60af Mon Sep 17 00:00:00 2001
From: Itamar Turner-Trauring
Date: Wed, 17 Jun 2026 09:57:10 -0400
Subject: [PATCH 1/6] Update docs

---
 README.md | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 648143ad..b8004166 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,16 @@ Fine control of the underlying thread-pool size can be useful in
 workloads that involve nested parallelism so as to mitigate
 oversubscription issues.
 
+> **Important:** In its current state, `threadpoolctl` is only designed for situations where BLAS/OpenMP are called from the main thread.
+> Once you start calling code from a Python thread pool, behavior will be very inconsistent.
+>
+> Examples where it will work fine:
+>
+> * When you're using it to configure a worker in a process pool (as long as the workers don't starts their own Python thread pool.)
+> * A Jupyter notebook, again so long as you don't call BLAS/OpenMP from a Python thread pool.
+>
+> For more details and a plan to fix this, see https://github.com/joblib/threadpoolctl/issues/208
+
 ## Installation
 
 - For users, install the last published version from PyPI:
@@ -322,11 +332,16 @@ https://github.com/xianyi/OpenBLAS/issues/2985).
   and workarounds:
   https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md
 
-- Setting the maximum number of threads of the OpenMP and BLAS libraries has a global
-  effect and impacts the whole Python process. There is no thread level isolation as
-  these libraries do not offer thread-local APIs to configure the number of threads to
-  use in nested parallel calls.
+- Setting the maximum number of threads of the OpenMP and BLAS libraries has
+  inconsistent scope (thread-local vs process-wide) and semantics (thread-local
+  vs process-wide) depending on the underlying library. For more details see
+  https://github.com/joblib/threadpoolctl/issues/208
 
+  For example, if you're using OpenMP with libgomp (gcc) or libomp (clang), the
+  setting is thread-local and sets how many OpenMP threads will be started in
+  the current thread. On the other hand, with OpenBLAS with pthreads backend or
+  on Windows, the setting is process-wide and impacts the size of a process-wide
+  thread pool shared across all threads in the process.
 ## Maintainers

From 9ca94947d45750f589f17de4a52c837fdae1e88c Mon Sep 17 00:00:00 2001
From: Itamar Turner-Trauring
Date: Wed, 17 Jun 2026 09:58:21 -0400
Subject: [PATCH 2/6] Clarify.

---
 README.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index b8004166..9f63489b 100644
--- a/README.md
+++ b/README.md
@@ -8,13 +8,16 @@ Fine control of the underlying thread-pool size can be useful in
 workloads that involve nested parallelism so as to mitigate
 oversubscription issues.
 
-> **Important:** In its current state, `threadpoolctl` is only designed for situations where BLAS/OpenMP are called from the main thread.
-> Once you start calling code from a Python thread pool, behavior will be very inconsistent.
+> **Important:** In its current state, `threadpoolctl` is only designed for
+> situations where BLAS/OpenMP are called from the main thread. Once you start
+> calling BLAS or OpenMP from a Python thread pool, the impact of the
+> `threadpoolctl` limiting APIs will be very inconsistent.
 >
 > Examples where it will work fine:
 >
-> * When you're using it to configure a worker in a process pool (as long as the workers don't starts their own Python thread pool.)
-> * A Jupyter notebook, again so long as you don't call BLAS/OpenMP from a Python thread pool.
+> * When you're using it to configure a worker in a process pool (as long as the
+> workers don't starts their own Python thread pool.) * A Jupyter notebook,
+> again so long as you don't call BLAS/OpenMP from a Python thread pool.
 >
 > For more details and a plan to fix this, see https://github.com/joblib/threadpoolctl/issues/208
 

From b14ed4b035d5b33767d6b6ea5e6822395db13d6f Mon Sep 17 00:00:00 2001
From: Itamar Turner-Trauring
Date: Wed, 17 Jun 2026 09:59:13 -0400
Subject: [PATCH 3/6] Rewrap

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 9f63489b..0cd4b131 100644
--- a/README.md
+++ b/README.md
@@ -16,8 +16,9 @@ oversubscription issues.
 > Examples where it will work fine:
 >
 > * When you're using it to configure a worker in a process pool (as long as the
-> workers don't starts their own Python thread pool.) * A Jupyter notebook,
-> again so long as you don't call BLAS/OpenMP from a Python thread pool.
+> workers don't starts their own Python thread pool.)
+> * A Jupyter notebook, again so long as you don't call BLAS/OpenMP from a
+> Python thread pool.
 >
 > For more details and a plan to fix this, see https://github.com/joblib/threadpoolctl/issues/208
 

From d72047072b0a670788385d5799aa799a89c2217d Mon Sep 17 00:00:00 2001
From: Itamar Turner-Trauring
Date: Thu, 18 Jun 2026 08:20:57 -0400
Subject: [PATCH 4/6] Rephrase to be more accurate.

---
 README.md | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index 0cd4b131..99a5a210 100644
--- a/README.md
+++ b/README.md
@@ -9,18 +9,13 @@ workloads that involve nested parallelism so as to mitigate
 oversubscription issues.
 
 > **Important:** In its current state, `threadpoolctl` is only designed for
-> situations where BLAS/OpenMP are called from the main thread. Once you start
-> calling BLAS or OpenMP from a Python thread pool, the impact of the
-> `threadpoolctl` limiting APIs will be very inconsistent.
+> situations where BLAS and OpenMP are called from the main Python thread. For
+> example:
 >
-> Examples where it will work fine:
+> * When you're using it to configure a worker in a process pool, which then calls BLAS or OpenMP APIs directly in the main thread.
+> * A Jupyter notebook, where the BLAS or OpenMP APIs are being called from code running in the cell's thread.
 >
-> * When you're using it to configure a worker in a process pool (as long as the
-> workers don't starts their own Python thread pool.)
-> * A Jupyter notebook, again so long as you don't call BLAS/OpenMP from a
-> Python thread pool.
->
-> For more details and a plan to fix this, see https://github.com/joblib/threadpoolctl/issues/208
+> However, once you start calling BLAS or OpenMP APIs from another, new Python thread, the impact of the `threadpoolctl` limiting APIs will be very inconsistent. For more details and a plan to fix this, see https://github.com/joblib/threadpoolctl/issues/208
 
 ## Installation
 
@@ -337,8 +332,8 @@ https://github.com/xianyi/OpenBLAS/issues/2985).
   https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md
 
 - Setting the maximum number of threads of the OpenMP and BLAS libraries has
-  inconsistent scope (thread-local vs process-wide) and semantics (thread-local
-  vs process-wide) depending on the underlying library. For more details see
+  inconsistent scope and semantics (thread-local vs process-wide) depending on
+  the underlying library. For more details see
   https://github.com/joblib/threadpoolctl/issues/208
 
   For example, if you're using OpenMP with libgomp (gcc) or libomp (clang), the

From 1c4dc7ae94990d1aa850a958b501af5c388aabd0 Mon Sep 17 00:00:00 2001
From: Itamar Turner-Trauring
Date: Thu, 18 Jun 2026 08:27:03 -0400
Subject: [PATCH 5/6] Try to be even more accurate

---
 README.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 99a5a210..ba833f83 100644
--- a/README.md
+++ b/README.md
@@ -9,11 +9,12 @@ workloads that involve nested parallelism so as to mitigate
 oversubscription issues.
 
 > **Important:** In its current state, `threadpoolctl` is only designed for
-> situations where BLAS and OpenMP are called from the main Python thread. For
-> example:
+> situations where BLAS and OpenMP are only called from the main Python thread.
+> Or, to be more accurate, `threadpoolctl` and BLAS/OpenMP APIs should only ever
+> called from the same, single Python thread. For example:
 >
 > * When you're using it to configure a worker in a process pool, which then calls BLAS or OpenMP APIs directly in the main thread.
-> * A Jupyter notebook, where the BLAS or OpenMP APIs are being called from code running in the cell's thread.
+> * A Jupyter notebook, where the BLAS or OpenMP APIs are being called from code running in the cell's main thread.
 >
 > However, once you start calling BLAS or OpenMP APIs from another, new Python thread, the impact of the `threadpoolctl` limiting APIs will be very inconsistent. For more details and a plan to fix this, see https://github.com/joblib/threadpoolctl/issues/208
 

From 11d935732d81d55a4aaf9d6a837d4f2c3307b5bf Mon Sep 17 00:00:00 2001
From: Itamar Turner-Trauring
Date: Thu, 18 Jun 2026 08:28:31 -0400
Subject: [PATCH 6/6] Try to be even more accurate

---
 README.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ba833f83..65fba02c 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,10 @@ oversubscription issues.
 > * When you're using it to configure a worker in a process pool, which then calls BLAS or OpenMP APIs directly in the main thread.
 > * A Jupyter notebook, where the BLAS or OpenMP APIs are being called from code running in the cell's main thread.
 >
-> However, once you start calling BLAS or OpenMP APIs from another, new Python thread, the impact of the `threadpoolctl` limiting APIs will be very inconsistent. For more details and a plan to fix this, see https://github.com/joblib/threadpoolctl/issues/208
+> However, once you start calling BLAS or OpenMP APIs and `threadpoolctl` from
+> multiple different Python threads, the impact of the `threadpoolctl` limiting
+> APIs will be very inconsistent. For more details and a plan to fix this, see
+> https://github.com/joblib/threadpoolctl/issues/208
 
 ## Installation