Why is 'x' in ('x',) faster than 'x' == 'x'?
Source: Internet | Published: C language software, Zhihu | Editor: 程序博客网 | Time: 2024/06/04 19:57
Question:
>>> timeit.timeit("'x' in ('x',)")
0.04869917374131205
>>> timeit.timeit("'x' == 'x'")
0.06144205736110564
This also works for tuples with multiple elements; both versions seem to grow linearly:
>>> timeit.timeit("'x' in ('x', 'y')")
0.04866674801541748
>>> timeit.timeit("'x' == 'x' or 'x' == 'y'")
0.06565782838087131
>>> timeit.timeit("'x' in ('y', 'x')")
0.08975995576448526
>>> timeit.timeit("'x' == 'y' or 'x' == 'y'")
0.12992391047427532
Based on this, I think I should start using the in operator everywhere instead of ==!
Answer:
As I mentioned to David Wolever, there's more to this than meets the eye; both methods end up dispatching to an identity check (the is operator). You can prove this by doing
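from timeit import Timer

# Same object on both sides: the comparison can stop at the identity check.
min(Timer("x == x", setup="x = 'a' * 1000000").repeat(10, 10000))
#>>> 0.00045456900261342525

# Two distinct but equal strings: the contents have to be compared.
min(Timer("x == y", setup="x = 'a' * 1000000; y = 'a' * 1000000").repeat(10, 10000))
#>>> 0.5256857610074803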
The first can only be so fast because it checks by identity.
To find out why one would take longer than the other, let's trace through execution.
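Before reading the C source, it can help to look at the bytecode from Python itself. The following is a minimal sketch using the standard dis module; the helper name opnames is made up for this example. Note the assumption about versions: the walkthrough here follows an interpreter in which both expressions compile to COMPARE_OP, whereas CPython 3.9 and later emit a separate CONTAINS_OP for in, so the output depends on the interpreter you run it on.

```python
import dis


def opnames(source):
    """Return the opcode names CPython emits for an expression (hypothetical helper)."""
    code = compile(source, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


# Plain names are used instead of constants so the compiler has nothing to fold.
eq_ops = opnames("a == b")
in_ops = opnames("a in b")

print(eq_ops)
print(in_ops)
```

Whichever opcode shows up for in, the comparison itself is a single instruction in both cases, so any timing difference has to come from what that instruction does internally.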
They both start in ceval.c, from COMPARE_OP, since that is the bytecode involved:
TARGET(COMPARE_OP) {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *res = cmp_outcome(oparg, left, right);
    Py_DECREF(left);
    Py_DECREF(right);
    SET_TOP(res);
    if (res == NULL)
        goto error;
    PREDICT(POP_JUMP_IF_FALSE);
    PREDICT(POP_JUMP_IF_TRUE);
    DISPATCH();
}
This pops the values from the stack (technically it only pops one):
PyObject *right = POP();
PyObject *left = TOP();
and runs the compare:
PyObject *res = cmp_outcome(oparg, left, right);
cmp_outcome is this:
static PyObject *
cmp_outcome(int op, PyObject *v, PyObject *w)
{
    int res = 0;
    switch (op) {
    case PyCmp_IS: ...
    case PyCmp_IS_NOT: ...
    case PyCmp_IN:
        res = PySequence_Contains(w, v);
        if (res < 0)
            return NULL;
        break;
    case PyCmp_NOT_IN: ...
    case PyCmp_EXC_MATCH: ...
    default:
        return PyObject_RichCompare(v, w, op);
    }
    v = res ? Py_True : Py_False;
    Py_INCREF(v);
    return v;
}
This is where the paths split. The PyCmp_IN branch does
int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL)
        return (*sqm->sq_contains)(seq, ob);
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}
Note that a tuple is defined as
static PySequenceMethods tuple_as_sequence = {
    ...
    (objobjproc)tuplecontains,                  /* sq_contains */
};

PyTypeObject PyTuple_Type = {
    ...
    &tuple_as_sequence,                         /* tp_as_sequence */
    ...
};
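The two routes through PySequence_Contains can also be observed from Python. The sketch below assumes that the sq_contains slot is what surfaces as the __contains__ method on a type; OnlyIter is a hypothetical class invented for the example. It defines __iter__ but no __contains__, so a membership test on it has to go through the iteration fallback instead.

```python
class OnlyIter:
    """Hypothetical container: iterable, but with no __contains__ of its own."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


# tuple fills the sq_contains slot, which is visible as tuple.__contains__.
tuple_has_contains = "__contains__" in vars(tuple)

# OnlyIter does not, so "in" falls back to searching by iteration.
only_iter_has_contains = "__contains__" in vars(OnlyIter)
fallback_hit = 2 in OnlyIter([1, 2, 3])
fallback_miss = 5 in OnlyIter([1, 2, 3])

print(tuple_has_contains, only_iter_has_contains, fallback_hit, fallback_miss)
```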
So the branch
if (sqm != NULL && sqm->sq_contains != NULL)
will be taken and *sqm->sq_contains, which is the function (objobjproc)tuplecontains, will be called.
This does
static int
tuplecontains(PyTupleObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyTuple_GET_ITEM(a, i),
                                       Py_EQ);
    return cmp;
}
...Wait, wasn't that PyObject_RichCompareBool what the other branch took? Nope, that was PyObject_RichCompare.
That code path was short so it likely just comes down to the speed of these two. Let's compare.
int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
    PyObject *res;
    int ok;

    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }

    ...
}
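A rough Python model of these two functions makes the identity shortcut visible. This is only a sketch: rich_compare_bool_eq and tuple_contains are illustrative names rather than real APIs, and error handling is left out. NaN is a handy probe because it is not equal to itself, so a membership test on it can only succeed through the identity check.

```python
def rich_compare_bool_eq(v, w):
    """Rough model of PyObject_RichCompareBool(v, w, Py_EQ)."""
    if v is w:  # quick result when both names refer to the same object
        return True
    return bool(v == w)


def tuple_contains(a, el):
    """Rough model of tuplecontains: stop at the first matching item."""
    return any(rich_compare_bool_eq(el, item) for item in a)


nan = float("nan")

print(nan == nan)                  # a direct == comparison
print(nan in (nan,))               # the built-in membership test
print(tuple_contains((nan,), nan)) # the model of the membership test
```

The first line prints False while the other two print True: the membership test never asks NaN whether it equals itself.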
The code path in PyObject_RichCompareBool pretty much immediately terminates. For PyObject_RichCompare, it does
PyObject *
PyObject_RichCompare(PyObject *v, PyObject *w, int op)
{
    PyObject *res;

    assert(Py_LT <= op && op <= Py_GE);
    if (v == NULL || w == NULL) { ... }
    if (Py_EnterRecursiveCall(" in comparison"))
        return NULL;
    res = do_richcompare(v, w, op);
    Py_LeaveRecursiveCall();
    return res;
}
The Py_EnterRecursiveCall/Py_LeaveRecursiveCall combo is not taken in the previous path, but these are relatively quick macros that'll short-circuit after incrementing and decrementing some globals.
do_richcompare does:
static PyObject *
do_richcompare(PyObject *v, PyObject *w, int op)
{
    richcmpfunc f;
    PyObject *res;
    int checked_reverse_op = 0;

    if (v->ob_type != w->ob_type && ...) { ... }
    if ((f = v->ob_type->tp_richcompare) != NULL) {
        res = (*f)(v, w, op);
        if (res != Py_NotImplemented)
            return res;
        ...
    }
    ...
}
This does some quick checks to call v->ob_type->tp_richcompare, which is
PyTypeObject PyUnicode_Type = {
    ...
    PyUnicode_RichCompare,      /* tp_richcompare */
    ...
};
which does
PyObject *
PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
{
    int result;
    PyObject *v;

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))
        Py_RETURN_NOTIMPLEMENTED;

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)
        return NULL;

    if (left == right) {
        switch (op) {
        case Py_EQ:
        case Py_LE:
        case Py_GE:
            /* a string is equal to itself */
            v = Py_True;
            break;
        case Py_NE:
        case Py_LT:
        case Py_GT:
            v = Py_False;
            break;
        default:
            ...
        }
    }
    else if (...) { ... }
    else { ...}
    Py_INCREF(v);
    return v;
}
Namely, this shortcuts on left == right... but only after doing
if (!PyUnicode_Check(left) || !PyUnicode_Check(right))
if (PyUnicode_READY(left) == -1 || PyUnicode_READY(right) == -1)
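Notice where the identity shortcut sits on this path: inside the string type's own tp_richcompare, not in PyObject_RichCompare or do_richcompare. One way to see the consequence from Python is a class that counts how often its __eq__ runs. CountingEq is a hypothetical class written for this sketch.

```python
class CountingEq:
    """Hypothetical class that records how many times __eq__ is invoked."""

    def __init__(self):
        self.calls = 0

    def __eq__(self, other):
        self.calls += 1
        return self is other


obj = CountingEq()

in_result = obj in (obj,)    # membership test: identity is checked first
calls_after_in = obj.calls

eq_result = obj == obj       # plain ==: always dispatches to the type's comparison
calls_after_eq = obj.calls

print(in_result, calls_after_in)
print(eq_result, calls_after_eq)
```

The membership test succeeds without ever calling __eq__, while == has to go all the way to the type's comparison function, even when both operands are the same object.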
All in all the paths then look something like this (manually recursively inlining, unrolling and pruning known branches)
POP()                           # Stack stuff
TOP()

case PyCmp_IN:                  # Dispatch on operation

sqm != NULL                     # Dispatch to builtin op
sqm->sq_contains != NULL
*sqm->sq_contains

cmp == 0                        # Do comparison in loop
i < Py_SIZE(a)
v == w
op == Py_EQ
++i
cmp == 0

res < 0                         # Convert to Python-space
res ? Py_True : Py_False
Py_INCREF(v)

Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)
SET_TOP(res)
res == NULL
DISPATCH()
vs
POP()                           # Stack stuff
TOP()

default:                        # Dispatch on operation

Py_LT <= op                     # Checking operation
op <= Py_GE
v == NULL
w == NULL
Py_EnterRecursiveCall(...)      # Recursive check

v->ob_type != w->ob_type        # More operation checks
f = v->ob_type->tp_richcompare  # Dispatch to builtin op
f != NULL

!PyUnicode_Check(left)          # ...More checks
!PyUnicode_Check(right)
PyUnicode_READY(left) == -1
PyUnicode_READY(right) == -1
left == right                   # Finally, doing comparison
case Py_EQ:                     # Immediately short circuit
Py_INCREF(v);

res != Py_NotImplemented

Py_LeaveRecursiveCall()         # Recursive check

Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)
SET_TOP(res)
res == NULL
DISPATCH()
Now, PyUnicode_Check and PyUnicode_READY are pretty cheap since they only check a couple of fields, but it should be obvious that the top one is a smaller code path: it has fewer function calls, only one switch statement, and is just a bit thinner.
TL;DR:
Both dispatch to if (left_pointer == right_pointer); the difference is just how much work they do to get there. The in operator just does less.
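To check whether the gap still exists on a particular interpreter, the measurement from the question can be rerun with a small script like the one below. This is a sketch: best_time is a helper invented for the example, and the repeat and number values are arbitrary. The C code traced above belongs to one specific CPython version, so the numbers, and even which expression wins, may differ on other versions.

```python
import timeit


def best_time(stmt, repeat=5, number=100_000):
    """Best of several timeit runs for one statement (hypothetical helper)."""
    return min(timeit.repeat(stmt, repeat=repeat, number=number))


statements = ("'x' in ('x',)", "'x' == 'x'")
timings = {stmt: best_time(stmt) for stmt in statements}

for stmt, seconds in timings.items():
    print(f"{stmt:<16} {seconds:.6f}")
```

Taking the minimum of several repeats, as the answer does with Timer.repeat, reduces the influence of background noise on the result.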