Crash report
What happened?
Hello
I don't have a good unit test for this. I'm running a server on the free-threading build of CPython 3.14.3 (3.14.3t). At higher QPS, RAM usage grows rapidly and without bound, while CPU load stays normal.
Based on a dump of active threads, I suspect a deadlock somewhere around _Py_qsbr_reserve and Stop the World:
E.g.
- Thread 1: this thread successfully initiated the Stop The World event and is waiting for all other threads to acknowledge and pause (park).
stop_the_world indicates it is the one trying to stop everything.
PyEvent_WaitTimed shows it is sitting there waiting for a signal that everyone has stopped.
do_futex_wait
__new_sem_wait_slow
_PySemaphore_Wait
_PyParkingLot_Park
PyEvent_WaitTimed
stop_the_world
type_set_abstractmethods
type_setattro
PyObject_SetAttr
_abc__abc_init
... etc
- Thread 2: this thread also needed to stop the world (to grow an internal array) but got blocked because another thread already held the master lock.
_Py_qsbr_reserve shows it was trying to reserve space in the memory management system.
_PyMutex_LockTimed shows it is blocked waiting for a lock inside stop_the_world. This is the lock held by Thread 1.
do_futex_wait
__new_sem_wait_slow
_PySemaphore_Wait
_PyParkingLot_Park
_PyMutex_LockTimed
stop_the_world
_Py_qsbr_reserve
PyGILState_Ensure
... etc
I suspect a modification of _Py_qsbr_reserve could help, but I don't know enough about this piece of infrastructure to make the change myself, so please help. Specifically, changing _Py_qsbr_reserve to the following seemed to help: with it, the server's thread dump no longer shows threads waiting on Stop the World:
Py_ssize_t
_Py_qsbr_reserve(PyInterpreterState *interp)
{
    struct _qsbr_shared *shared = &interp->qsbr;

    PyMutex_Lock(&shared->mutex);
    // Try allocating from our internal freelist
    struct _qsbr_thread_state *qsbr = qsbr_allocate(shared);

    while (qsbr == NULL) {
        // Unlock before stopping the world to avoid deadlocks.
        // If we hold shared->mutex while waiting for the world to stop,
        // we might block a thread that needs to acquire shared->mutex to park.
        PyMutex_Unlock(&shared->mutex);
        _PyEval_StopTheWorld(interp);
        PyMutex_Lock(&shared->mutex);

        // Try allocating again, as another thread might have grown the array
        // or freed an entry while we were waiting.
        qsbr = qsbr_allocate(shared);
        if (qsbr != NULL) {
            _PyEval_StartTheWorld(interp);
            break;
        }

        // Still NULL, we must grow it
        if (grow_thread_array(shared) == 0) {
            qsbr = qsbr_allocate(shared);
        } else {
            // Failed to grow array (e.g. OOM). Break to avoid infinite loop.
            _PyEval_StartTheWorld(interp);
            break;
        }
        _PyEval_StartTheWorld(interp);
    }

    // Return an index rather than the pointer because the array may be
    // resized and the pointer invalidated.
    Py_ssize_t index = -1;
    if (qsbr != NULL) {
        index = (struct _qsbr_pad *)qsbr - shared->array;
    }
    PyMutex_Unlock(&shared->mutex);
    return index;
}
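To make the lock ordering in the change above easier to discuss, here is a minimal standalone sketch of the same idea using plain pthreads. It is not CPython code: struct slot_pool, pool_init, pool_allocate, pool_grow and pool_reserve are hypothetical names, and an ordinary mutex (world) stands in for the stop-the-world pause. The only point it illustrates is "drop the array mutex before taking the global lock, then retake it and re-check".

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the shared QSBR state; not CPython code. */
struct slot_pool {
    pthread_mutex_t mutex;   /* plays the role of shared->mutex */
    pthread_mutex_t world;   /* plays the role of the stop-the-world lock */
    unsigned char *used;     /* used[i] != 0 means slot i is taken */
    size_t size;             /* number of slots in the array */
};

int pool_init(struct slot_pool *p, size_t initial)
{
    assert(initial > 0);
    p->used = calloc(initial, 1);
    if (p->used == NULL) {
        return -1;
    }
    p->size = initial;
    pthread_mutex_init(&p->mutex, NULL);
    pthread_mutex_init(&p->world, NULL);
    return 0;
}

/* Take the first free slot, or return -1. Caller must hold p->mutex. */
long pool_allocate(struct slot_pool *p)
{
    for (size_t i = 0; i < p->size; i++) {
        if (!p->used[i]) {
            p->used[i] = 1;
            return (long)i;
        }
    }
    return -1;
}

/* Double the array. Caller must hold both p->world and p->mutex. */
int pool_grow(struct slot_pool *p)
{
    size_t new_size = p->size * 2;
    unsigned char *grown = realloc(p->used, new_size);
    if (grown == NULL) {
        return -1;
    }
    memset(grown + p->size, 0, new_size - p->size);
    p->used = grown;
    p->size = new_size;
    return 0;
}

long pool_reserve(struct slot_pool *p)
{
    pthread_mutex_lock(&p->mutex);
    long index = pool_allocate(p);
    if (index < 0) {
        /* Never wait for the global lock while holding p->mutex. */
        pthread_mutex_unlock(&p->mutex);
        pthread_mutex_lock(&p->world);
        pthread_mutex_lock(&p->mutex);
        /* Re-check: another thread may have grown the array meanwhile. */
        index = pool_allocate(p);
        if (index < 0 && pool_grow(p) == 0) {
            index = pool_allocate(p);
        }
        pthread_mutex_unlock(&p->world);
    }
    pthread_mutex_unlock(&p->mutex);
    return index;
}
```

In this model pool_reserve returns -1 only when growing the array fails. Whether the same ordering is safe inside CPython depends on how stop_the_world interacts with shared->mutex, which this sketch does not model.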
Similar issues in the past:
CPython versions tested on:
3.14
Operating systems tested on:
Linux
Output from running 'python -VV' on the command line:
3.14.3 (free-threading)