Python's Thread Locals Are Weird
Source: Internet. Editor: 程序博客网. Time: 2024/06/12 19:52
The Weirdness
What do you think this script prints?

    import thread, threading, sys

    class Weeper(object):
        def __del__(self):
            sys.stdout.write('oh cruel world %s\n' % thread.get_ident())

    local = threading.local()

    def target():
        local.weeper = Weeper()

    t = threading.Thread(target=target)
    t.start()
    t.join()
    sys.stdout.write('done %s\n' % thread.get_ident())
    getattr(local, 'whatever', None)
If you guessed something like this:
    oh cruel world 4475731968
    done 140735297751392
...then you'd be right, in Python 2.7.1 and later. In Python 2.7.0 and older (including the whole 2.6 series), the order of messages is reversed:
    done 140735297751392
    oh cruel world 140735297751392
In New Python (2.7.1 and later), the Weeper is deleted as soon as its thread dies, and __del__ runs on the dying thread. In Old Python (2.7.0 and older), the Weeper isn't deleted until the thread is dead and a different thread accesses the local's __dict__. Thus the Weeper is deleted at the line getattr(local, 'whatever', None), after the thread dies, and Weeper.__del__ runs on the main thread.
What if we remove the getattr call? In Old Python, this happens:
    done 140735297751392
    Exception AttributeError: "'NoneType' object has no attribute 'get_ident'" in <bound method Weeper.__del__ of <__main__.Weeper object at 0x104f95590>> ignored
Without getattr, the Weeper isn't deleted until interpreter shutdown. The shutdown sequence is complex and hard to predict. In this case the thread module has been set to None before the Weeper is deleted, so Weeper.__del__ can't call thread.get_ident().
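The script above is Python 2 only, since the thread module no longer exists in Python 3. Here is a minimal sketch of the same experiment for CPython 3, assuming threading.get_ident() as the replacement for thread.get_ident(). The run_demo() helper and the events list are my own additions for illustration, and they record events instead of printing them so the order can be inspected. On CPython 3 I'd expect the New Python ordering, with the finalizer running on the worker thread before join() returns, but that relies on CPython's reference counting and isn't a language guarantee.

```python
import threading


class Weeper(object):
    def __init__(self, events):
        self.events = events

    def __del__(self):
        # Record which thread the finalizer runs on.
        self.events.append(('finalized', threading.get_ident()))


def run_demo():
    """Hypothetical helper: returns (events, worker_ident)."""
    events = []
    worker_ident = []
    local = threading.local()

    def target():
        worker_ident.append(threading.get_ident())
        # The only reference to the Weeper lives in the thread local.
        local.weeper = Weeper(events)

    t = threading.Thread(target=target)
    t.start()
    t.join()
    events.append(('done', threading.get_ident()))
    return events, worker_ident[0]


if __name__ == '__main__':
    events, worker = run_demo()
    for name, ident in events:
        where = 'worker thread' if ident == worker else 'main thread'
        print('%s on %s (%s)' % (name, where, ident))
```

If 'finalized' shows up after 'done', or on the main thread, the interpreter you're running behaves more like Old Python than I assumed.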
Thread Locals in Old Python
To understand why locals act this way in Old Python, let's look at the implementation in C. The core interpreter's PyThreadState struct has a dict attribute, and each threading.local object has a key attribute formatted like "thread.local.<memory address of self>". Each local has a __dict__ of attributes per thread, stored in PyThreadState's dict with the local's key.
threadmodule.c includes a function _ldict(localobject *self) which takes a local and finds its __dict__ for the current thread. _ldict() returns that __dict__, and also stores it in self->dict.
This architecture has, in my opinion, a bug. Here's the implementation of _ldict():
    static PyObject *
    _ldict(localobject *self)
    {
        PyObject *tdict = PyThreadState_GetDict(); // get PyThreadState->dict for this thread
        PyObject *ldict = PyDict_GetItem(tdict, self->key);
        if (ldict == NULL) {
            ldict = PyDict_New(); /* we own ldict */
            PyDict_SetItem(tdict, self->key, ldict);
            Py_CLEAR(self->dict);
            Py_INCREF(ldict);
            self->dict = ldict; /* still borrowed */
            if (Py_TYPE(self)->tp_init != PyBaseObject_Type.tp_init) {
                Py_TYPE(self)->tp_init((PyObject*)self, self->args, self->kw);
            }
        }

        /* The call to tp_init above may have caused another thread to run.
           Install our ldict again. */
        if (self->dict != ldict) {
            Py_CLEAR(self->dict);
            Py_INCREF(ldict);
            self->dict = ldict;
        }
        return ldict;
    }
I've edited for brevity. There are a few interesting things here. One is the check for a custom __init__ method: if this object is an instance of a subclass of local which overrides __init__, then __init__ is called whenever a new thread accesses this local's attributes for the first time.
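The per-thread __init__ call is easy to observe from Python. This is a small sketch for Python 3. TrackedLocal, init_threads and run_init_demo are names I made up for the illustration, and the thread name "worker-1" is arbitrary.

```python
import threading

# Names of the threads on which __init__ has run, in order.
init_threads = []


class TrackedLocal(threading.local):
    def __init__(self, label):
        init_threads.append(threading.current_thread().name)
        self.label = label


def run_init_demo():
    """Hypothetical helper: returns (threads that ran __init__, labels seen)."""
    del init_threads[:]
    loc = TrackedLocal('x')  # __init__ runs on the creating thread
    seen = []

    def worker():
        # First attribute access from this thread: __init__ runs again,
        # with the same arguments, before the lookup completes.
        seen.append(loc.label)

    t = threading.Thread(target=worker, name='worker-1')
    t.start()
    t.join()
    return list(init_threads), seen


if __name__ == '__main__':
    print(run_init_demo())
```

The worker never assigns loc.label itself. It can read the value only because __init__ was re-run for that thread with the arguments saved at construction time.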
But the main thing I'm showing you is the two calls to Py_CLEAR(self->dict), each of which decrements the refcount of the dict that self->dict points to and sets the pointer to NULL. Py_CLEAR is called when a thread accesses this local's attributes for the first time, or when this thread accesses the local's attributes after a different thread has accessed them, that is, if self->dict != ldict.
So now we clearly understand why a thread's locals aren't deleted immediately after it dies:
- The worker thread stores a Weeper in local.weeper. _ldict() creates a new __dict__ for this thread and stores it as a value in PyThreadState->dict, and stores it in local->dict. So there are two references to this thread's __dict__: one from PyThreadState, one from local.
- The worker thread dies, and the interpreter deletes its PyThreadState. Now there's one reference to the dead thread's __dict__: local->dict.
- Finally, we do getattr(local, 'whatever', None) from the main thread. In _ldict(), self->dict != ldict, so self->dict is dereferenced and replaced with the main thread's __dict__. Now the dead thread's __dict__ has finally been completely dereferenced, and the Weeper is deleted.
The bug is that _ldict() both returns the local's __dict__ for the current thread, and stores a reference to it. This is why the __dict__ isn't deleted as soon as its thread dies: there's a useless but persistent reference to the __dict__ until another thread comes along and clears it.
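The three steps can be replayed with a toy model in pure Python. This is not the real implementation. OldStyleLocal, its cached attribute (playing the role of self->dict), and the plain dicts standing in for each PyThreadState's dict are all my own simplifications, and the result depends on CPython freeing objects as soon as their refcount hits zero.

```python
import weakref


class Payload(object):
    """Stands in for the Weeper; we watch its lifetime with a weakref."""


class OldStyleLocal(object):
    """Toy model of Old Python's local: it caches the last dict it returned."""

    def __init__(self):
        self.key = 'thread.local.%d' % id(self)
        self.cached = None  # plays the role of self->dict

    def ldict(self, thread_state_dict):
        d = thread_state_dict.get(self.key)
        if d is None:
            d = {}
            thread_state_dict[self.key] = d
        if self.cached is not d:
            # Drops the previously cached dict, keeps a reference to this one.
            self.cached = d
        return d


def run_old_model():
    local = OldStyleLocal()
    worker_state, main_state = {}, {}
    payload = Payload()
    probe = weakref.ref(payload)
    local.ldict(worker_state)['weeper'] = payload
    del payload
    worker_state.clear()  # "the worker thread dies"
    alive_after_thread_death = probe() is not None
    local.ldict(main_state)  # "the main thread touches the local"
    alive_after_main_access = probe() is not None
    return alive_after_thread_death, alive_after_main_access


if __name__ == '__main__':
    print(run_old_model())
```

In the model, as in Old Python, the payload survives the death of its "thread" and is only released once a different "thread" replaces the cached dict.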
Thread Locals in New Python
In New Python, the architecture's a little more complex. Each PyThreadState's dict contains a dummy for each local, and each local holds a dict mapping weak references of dummies to a per-thread __dict__.
When a thread is dying and its PyThreadState is deleted, weakref callbacks fire immediately on that thread, removing the thread's __dict__ for each local. Conversely, when a local is deleted, it removes its dummy from PyThreadState->dict.
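Here is a rough pure-Python sketch of the dummy-and-weakref arrangement, with plain dicts standing in for each PyThreadState's dict. NewStyleLocal, Dummy and the dicts attribute are hypothetical names for the illustration, and the real C code differs in detail (for example, it reads the __dict__ from a field on the dummy). As with any refcounting demonstration, it assumes CPython.

```python
import weakref


class Payload(object):
    """Stands in for the Weeper; we watch its lifetime with a weakref."""


class Dummy(object):
    """Stands in for the per-thread dummy kept in the thread state's dict."""


class NewStyleLocal(object):
    """Toy model of New Python's local: weakref(dummy) -> per-thread dict."""

    def __init__(self):
        self.key = 'thread.local.%d' % id(self)
        self.dicts = {}

    def ldict(self, thread_state_dict):
        dummy = thread_state_dict.get(self.key)
        if dummy is None:
            dummy = Dummy()
            thread_state_dict[self.key] = dummy
            owner = weakref.ref(self)  # avoid a cycle through the callback

            def dummy_died(wr):
                local = owner()
                if local is not None:
                    local.dicts.pop(wr, None)

            self.dicts[weakref.ref(dummy, dummy_died)] = {}
        # Weak references to the same live object hash and compare equal.
        return self.dicts[weakref.ref(dummy)]


def run_new_model():
    local = NewStyleLocal()
    worker_state = {}
    payload = Payload()
    probe = weakref.ref(payload)
    local.ldict(worker_state)['weeper'] = payload
    del payload
    worker_state.clear()  # "the worker thread dies": its dummy is freed
    return probe() is not None, len(local.dicts)


if __name__ == '__main__':
    print(run_new_model())
```

Nothing outside the dying "thread" has to touch the local: the dummy's weakref callback removes the per-thread dict, and the payload goes with it.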
_ldict() in New Python acts more sanely than in Old Python. It finds the current thread's dummy in the PyThreadState, and gets the __dict__ for this thread from the dummy. But unlike in Old Python, it doesn't store an extra reference to __dict__ anywhere. It simply returns it:
    static PyObject *
    _ldict(localobject *self)
    {
        PyObject *tdict, *ldict, *dummy;
        tdict = PyThreadState_GetDict();
        dummy = PyDict_GetItem(tdict, self->key);
        if (dummy == NULL) {
            ldict = _local_create_dummy(self);
            if (Py_TYPE(self)->tp_init != PyBaseObject_Type.tp_init) {
                Py_TYPE(self)->tp_init((PyObject*)self, self->args, self->kw);
            }
        } else {
            ldict = ((localdummyobject *) dummy)->localdict;
        }
        return ldict;
    }
This whole weakrefs-to-dummies technique is, apparently, intended to deal with some cyclic garbage collection problem I don't understand very well. I believe the real reason why New Python acts as expected when executing my script, and why Old Python acts weird, is that Old Python stores the extra useless reference to the __dict__ and New Python does not.
Update: I finally found the bug reports that describe Old Python's weirdness and 2.7.1's solution. See:
- Issue 1868: threading.local doesn't free attrs when assigning thread exits
- Issue 3757: threading.local doesn't support cyclic garbage collecting