调试代码(Python)

来源：互联网发布：淘宝服装摄影人像抠图编辑：程序博客网时间：2024/06/10 20:16

调试代码

翻译自：http://scipy-lectures.github.com/advanced/debugging/index.html

作者：Gaël Varoquaux

本教程探索工具，以更好地理解你的编程基础：调试，找到并修复错误。

这并不特定适合Python科学计算社区，但是我们采取的策略量身定制。

先决条件

Numpy
IPython
nosetests (http://readthedocs.org/docs/nose/en/latest/)
line_profiler (http://packages.python.org/line_profiler/)
pyflakes (http://pypi.python.org/pypi/pyflakes)
gdb 用于C调试部分

避免错误
- 避免麻烦最佳编码法
- pyflakes:快速静态分析
  - 在当前编辑文件运行pyflakes
  - 一个在线拼写检查集成
调试工作流
使用Python调试器
- 调用调试器
  - 尸检
  - 单步执行
  - 其它开始调试的方法
- 调试器命令和交互
使用gdb调试段错误
总结练习
Footnotes

避免错误

避免麻烦最佳编码法

我们都写有bug的代码。接受它，解决它。
写程序时，在心里测试和调试。
Keep it Simple, Stupid(KISS).
- 可能有效的最简单方法
Don‘t Repeat Yourself (DRY).
- 每个知识都必须有个单独、明确、在一个系统中的正式表示
- 常量、算法，等等。
尝试限制你的代码间的相互关系(松耦合)
赋予你的变量、函数和模块有意义的名字(不是数学名称)

pyflakes:快速静态分析

这些是几个Python中的静态分析工具：pylint、pychecker和pyflakes。这里我们重点放在pyflakes上，这是最简单的工具。

快速，简单
探测语法错误，漏掉的导入，名称的拼写错误。

高度推荐在你的编辑器中集成pyflakes，它确实产生巨大生产力。

在当前编辑文件运行pyflakes

你可以绑定一个键来在当前缓冲区运行pyflakes。

kate中菜单：设置 -> 配置kate
- 在插件中勾选’外部工具’1
- 在外部工具中，添加pyflakes:
```
kdialog --title "pyflakes %filename" --msgbox "$(pyflakes %filename)"
```
在TextMate中菜单：TextMate -> 首选项 -> 高级 -> Shell变量，添加一个shell变量：
```
  TM_PYCHECKER=/Library/Frameworks/Python.framework/Versions/Current/bin/pyflakes
```
然后Ctrl-Shift-V被绑定到pyflakes报告上。

在Vim中，在你的vimrc中(将F5绑定到pyflakes上)：

  autocmd FileType python let &mp = 'echo "*** running % ***" ; pyflakes %'  autocmd FileType tex,mp,rst,python imap <Esc>[15~ <C-O>:make!^M  autocmd FileType tex,mp,rst,python map  <Esc>[15~ :make!^M  autocmd FileType tex,mp,rst,python set autowrite]]

在emacs中，在你的.emacs中(绑定F5到pyflakes)：

  (defun pyflakes-thisfile () (interactive)     (compile (format "pyflakes %s" (buffer-file-name)))  )      (define-minor-mode pyflakes-mode      "Toggle pyflakes mode.      With no argument, this command toggles the mode.      Non-null prefix argument turns on the mode.      Null prefix argument turns off the mode."      ;; The initial value.      nil      ;; The indicator for the mode line.      " Pyflakes"      ;; The minor mode bindings.      '( ([f5] . pyflakes-thisfile) )  )      (add-hook 'python-mode-hook (lambda () (pyflakes-mode t)))

一个在线拼写检查集成

在vim中使用vim.pyflakes插件：
1. 从http://www.vim.org/scripts/script.php?script_id=2441下载zip文件2
2. 将文件解压到.vim/ftplugin/python
3. 确定你的vimrc中有’filetype plugin indent on’

在emacs中在flymake模式中使用pyflakes，文档在http://www.plope.com/Members/chrism/flymake-mode:添加以下内容到你的.emacs文件：

  (when (load "flymake" t)          (defun flymake-pyflakes-init ()          (let* ((temp-file (flymake-init-create-temp-buffer-copy                              'flymake-create-temp-inplace))              (local-file (file-relative-name                          temp-file                          (file-name-directory buffer-file-name))))              (list "pyflakes" (list local-file))))              (add-to-list 'flymake-allowed-file-name-masks                  '("\\.py\\'" flymake-pyflakes-init)))      (add-hook 'find-file-hook 'flymake-find-file-hook)

调试工作流

你确实有个并不微不足道的bug，这时调试策略就关键了。这没有银色子弹。然而，策略有帮助：

对调试一个给定的问题，最受欢迎的情况是当问题被隔离成少量几行的代码，在框架和应用之外，伴随的更改-运行-失败循环

让出错可靠。找到一个测试实例让代码每次都出错(fail)
分解并且克服。一旦你有了一个失败测试实例，隔绝出错代码。
- 哪个模块
- 哪个函数
- 哪行代码 = > 隔离小块可再生错误：测试样例。
一次更改一样东西，并且重新运行错误测试样例
使用调试器来理解哪里出错了
记笔记，保持耐心。这将花很久。

注意：一旦你已经完成这个过程：严格隔离一片再生bug的代码并使用这片代码修复它，向你的测试套件中添加相应的代码。

使用Python调试器

Python调试器，pdb：http://docs.python.org/library/pdb.html，允许你交互地检查你的代码。

具体来说它允许你：

查看源代码
向上或向下查看调用堆栈
检查变量值
更改变量值
设置断点

是的，print确实可以作为一个调试工具。然而在运行时检查的话，使用调试器通常更有效率。

调用调试器

开启调试器的方法：

尸检(post-mortem)，在模块错误后开启调试器
用调试器运行模块
在模块中调用调试器

尸检

情形：你使用ipython并且你获得一个回溯(traceback)。

现在让我们调试文件index_error.py。当运行它时，一个IndexError被引发。键入%debug将进入调试器：

In [1]: %run index_error.py---------------------------------------------------------------------------IndexError                                Traceback (most recent call last)/usr/lib/python2.7/site-packages/IPython/utils/py3compat.py in execfile(fname, *where)    176             else:    177                 filename = fname--> 178             __builtin__.execfile(filename, *where)/home/lyy/index_error.py in <module>()      6       7 if __name__ == '__main__':----> 8     index_error()      9 /home/lyy/index_error.py in index_error()      3 def index_error():      4     lst = list('foobar')----> 5     print lst[len(lst)]      6       7 if __name__ == '__main__':IndexError: list index out of rangeIn [2]: %debug> /home/lyy/index_error.py(5)index_error()      4     lst = list('foobar')----> 5     print lst[len(lst)]      6 ipdb> list      1 """Small snippet to raise an IndexError."""      2       3 def index_error():      4     lst = list('foobar')----> 5     print lst[len(lst)]      6       7 if __name__ == '__main__':      8     index_error()      9 ipdb> len(lst)6ipdb> print lst[len(lst)-1]ripdb> quitIn [3]:

在IPython之外尸检3调试

在一些情形下你无法使用IPython，例如调试一个想要从命令行调用的脚本。这中情况下，你可以使用python -m pdb script.py来调用脚本：

lyy@arch ~ % python2 -m pdb index_error.py> /home/lyy/index_error.py(1)<module>()-> """Small snippet to raise an IndexError."""(Pdb) continueTraceback (most recent call last):  File "/usr/lib/python2.7/pdb.py", line 1314, in main    pdb._runscript(mainpyfile)  File "/usr/lib/python2.7/pdb.py", line 1233, in _runscript    self.run(statement)  File "/usr/lib/python2.7/bdb.py", line 387, in run    exec cmd in globals, locals  File "<string>", line 1, in <module>  File "index_error.py", line 1, in <module>    """Small snippet to raise an IndexError."""  File "index_error.py", line 5, in index_error    print lst[len(lst)]IndexError: list index out of rangeUncaught exception. Entering post mortem debuggingRunning 'cont' or 'step' will restart the program> /home/lyy/index_error.py(5)index_error()-> print lst[len(lst)](Pdb) print lst[len(lst)]*** IndexError: list index out of range

单步执行

情形：你相信在模块中存在一个bug但不确定在哪里。

例如我们尝试调试wiener_filtering.py。确实代码运行了，但是滤波效果并不好。

用调试器运行脚本：

In [1]: %run -d wiener_filtering.py*** Blank or comment*** Blank or comment*** Blank or commentBreakpoint 1 at /home/lyy/wiener_filtering.py:4NOTE: Enter 'c' at the ipdb>  prompt to start your script.> <string>(1)<module>()

进入wiener_filtering.py并且在34行设置断点：

ipdb> n> /home/lyy/wiener_filtering.py(4)<module>()      3 1---> 4 import numpy as np      5 import scipy as spipdb> b 34Breakpoint 2 at /home/lyy/wiener_filtering.py:34

用c(out(inue))继续执行到下一个断点:

  ipdb> c  > /home/lyy/wiener_filtering.py(34)iterated_wiener()       33     """  2--> 34     noisy_img = noisy_img       35     denoised_img = local_mean(noisy_img, size=size)

用n(ext)和s(tep)回到代码：next跳到当前执行文本的下一个语句，而step将穿越执行文本，也就是，可以深入探索函数调用：4

  ipdb> s  > /home/lyy/wiener_filtering.py(35)iterated_wiener()  2    34     noisy_img = noisy_img  ---> 35     denoised_img = local_mean(noisy_img, size=size)       36     l_var = local_var(noisy_img, size=size)      ipdb> n  > /home/lyy/wiener_filtering.py(36)iterated_wiener()       35     denoised_img = local_mean(noisy_img, size=size)  ---> 36     l_var = local_var(noisy_img, size=size)       37     for i in range(3):

执行几行然后探索局部变量：

  ipdb> n  > /home/lyy/wiener_filtering.py(37)iterated_wiener()       36     l_var = local_var(noisy_img, size=size)  ---> 37     for i in range(3):       38         res = noisy_img - denoised_img      ipdb> print l_var  [[5868 5379 5316 ..., 5071 4799 5149]   [5013  363  437 ...,  346  262 4355]   [5379  410  344 ...,  392  604 3377]   ...,    [ 435  362  308 ...,  275  198 1632]   [ 548  392  290 ...,  248  263 1653]   [ 466  789  736 ..., 1835 1725 1940]]  ipdb> print l_var.min()  0

Oh,亲爱的，除了整数什么也没有，还有0变动。这就是我们的bug所在，我们在做整数算术。

引发数值错误例外

当我们运行wiener_filtering.py文件时，以下警告被引发(raise)：

In [1]: %run wiener_filtering.pywiener_filtering.py:40: RuntimeWarning: divide by zero encountered in divide  noise_level = (1 - noise/l_var )

我们可以将这些警告转化为例外(exception)，这让我们能对它们尸检调试，并且迅速找到我们的问题：

In [2]: np.seterr(all='raise')Out[2]: {'divide': 'warn', 'invalid': 'warn', 'over': 'warn', 'under': 'ignore'}

其它开始调试的方法

引发一个例外作为”可怜的”断点
如果你发现几下行号设置断点很无聊，你可以简单地在你想要检查的地方引发一个例外，使用ipython的%debug。注意在这种情况下你不能step和continue执行。
使用nosetests调试测试失败
你可以运行nosetests --pdb对例外(exception)尸检调试，并且nosetests --pdb-failure使用调试器来检查测试失败。
另外，你可以通过安装nose插件ipdbplugin，在nose中使用IPython调试接口。你可以传递--ipdb和--ipdb-failure选项给nosetests。
显式调用调试器
在你想要调试的地方放入以下一行：
```
  import pdb; pdb.set_trace()
```

警告：当运行nosetests时，输出被捕捉，并且因此似乎调试器不工作了。只要加上-s标志运行nosetests

图形化调试器

为了单步执行和检查变量，你也许觉得用图形化的调试器如winpdb更加方便。

另外，pudb是一个很棒的半图形化调试器，它在终端里有一个文字用户界面。

调试器命令和交互

命令作用l(list)列出当前运行的代码u(p)上一个调用堆栈d(own)下一个调用堆栈n(ext)执行下一行(不到新的函数)s(tep)执行下一个语句(到新的函数)bt打印调用堆栈a打印局部变量!command执行特定的Python命令(相对于pdb命令)

警告：Debugger 命令不是Python代码

你不能想当然的命名一个变量。例如，例如，如果在其中，你不能在当前情形下覆盖用同名变量:当在调试器中敲代码时，使用不同的名字而不是你的局部变量。

使用gdb调试段错误

如果你有一个段错误，你不能通过pdb调试它，因为它会让Python解释器在进入调试器之前崩溃。同样，如果你在Python中嵌入的C代码有bug，pdb也没用。对这种情况我们使用在linux上可用的gnu调试器gdb。

在我们用gdb之前，让我们向它添加一些Python特有的工具。对此我们向我们的gdbinit添加一些宏。宏优化的选择依赖于你的Python版本和gdb版本。我已经在gdbinit里添加了一个简化的版本，但是请自由地阅读DebuggingWithGdb。

为用gdb调试脚本segfault.py，我们可以在gdb中如下运行：

$ gdb python...(gdb) run segfault.pyStarting program: /usr/bin/python segfault.py[Thread debugging using libthread_db enabled]Program received signal SIGSEGV, Segmentation fault._strided_byte_copy (dst=0x8537478 "\360\343G", outstrides=4, src=    0x86c0690 <Address 0x86c0690 out of bounds>, instrides=32, N=3,    elsize=4)        at numpy/core/src/multiarray/ctors.c:365365            _FAST_MOVE(Int32);(gdb)

我们得到一个段错误，并且gdb事后在C级别的堆栈中(不是python调用堆栈)捕捉到了它。我们可以使用使用gdb的命令调试C调用堆栈：

(gdb) up#1  0x004af4f5 in _copy_from_same_shape (dest=<value optimized out>,    src=<value optimized out>, myfunc=0x496780 <_strided_byte_copy>,    swap=0)at numpy/core/src/multiarray/ctors.c:748748         myfunc(dit->dataptr, dest->strides[maxaxis],

如你所见，现在我们在numpy的C代码中。我们想要知道什么Python代码触发了这个段错误，因此我们向上查看堆栈直到我们发现Python执行循环：

(gdb) up#8  0x080ddd23 in call_function (f=    Frame 0x85371ec, for file /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py, line 156, in _leading_trailing (a=<numpy.ndarray at remote 0x85371b0>, _nc=<module at remote 0xb7f93a64>), throwflag=0)    at ../Python/ceval.c:37503750    ../Python/ceval.c: No such file or directory.        in ../Python/ceval.c(gdb) up#9  PyEval_EvalFrameEx (f=    Frame 0x85371ec, for file /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py, line 156, in _leading_trailing (a=<numpy.ndarray at remote 0x85371b0>, _nc=<module at remote 0xb7f93a64>), throwflag=0)    at ../Python/ceval.c:24122412    in ../Python/ceval.c(gdb)

一旦我们在python执行循环中时，我们可以使用特殊的python帮助函数。例如我们找到相应的Python代码：

(gdb) pyframe/home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py (158): _leading_trailing(gdb)

这是numpy的代码，我们需要向上直到找到我们自己写的代码：

(gdb) up...(gdb) up#34 0x080dc97a in PyEval_EvalFrameEx (f=    Frame 0x82f064c, for file segfault.py, line 11, in print_big_array (small_array=<numpy.ndarray at remote 0x853ecf0>, big_array=<numpy.ndarray at remote 0x853ed20>), throwflag=0) at ../Python/ceval.c:16301630    ../Python/ceval.c: No such file or directory.        in ../Python/ceval.c(gdb) pyframesegfault.py (12): print_big_array

相应的代码是：

def print_big_array(small_array):    big_array = make_big_array(small_array)    print big_array[-10:]    return big_array

所以当打印big_array[-10:]时发生段错误。原因很简单，big-array的末尾被分配到程序内存之外了。5

注意：在gdbinit中定义的专用于python的命令，请阅读这个文件。6

总结练习

以下脚本清晰易读，它致力于解决数值计算感兴趣的实际问题。但是它不能工作。你能调试它吗？

Python源码to_debug.py

"""A script to compare different root-finding algorithms.This version of the script is buggy and does not execute. It is your taskto find an fix these bugs.The output of the script sould look like:    Benching 1D root-finder optimizers from scipy.optimize:                brenth:   604678 total function calls                brentq:   594454 total function calls                ridder:   778394 total function calls                bisect:  2148380 total function calls"""from itertools import productimport numpy as npfrom scipy import optimizeFUNCTIONS = (np.tan,  # Dilating map             np.tanh, # Contracting map             lambda x: x**3 + 1e-4*x, # Almost null gradient at the root             lambda x: x+np.sin(2*x), # Non monotonous function             lambda x: 1.1*x+np.sin(4*x), # Fonction with several local maxima            )OPTIMIZERS = (optimize.brenth, optimize.brentq, optimize.ridder,              optimize.bisect)def apply_optimizer(optimizer, func, a, b):    """ Return the number of function calls given an root-finding optimizer,         a function and upper and lower bounds.    """    return optimizer(func, a, b, full_output=True)[1].function_calls,def bench_optimizer(optimizer, param_grid):    """ Find roots for all the functions, and upper and lower bounds        given and return the total number of function calls.    """    return sum(apply_optimizer(optimizer, func, a, b)               for func, a, b in param_grid)def compare_optimizers(optimizers):    """ Compare all the optimizers given on a grid of a few different        functions all admitting a signle root in zero and a upper and        lower bounds.    """    random_a = -1.3 + np.random.random(size=100)    random_b =   .3 + np.random.random(size=100)    param_grid = product(FUNCTIONS, random_a, random_b)    print "Benching 1D root-finder optimizers from scipy.optimize:"    for optimizer in OPTIMIZERS:        print '% 20s: % 8i total function calls' % (                    optimizer.__name__,                     bench_optimizer(optimizer, param_grid)                )if __name__ == '__main__':    compare_optimizers(OPTIMIZERS)

Footnotes

我是没找到↩
关于插件的安装，我更倾向于pathogen↩
这词……↩
区别在于step停在函数调用，next立即执行函数调用↩
也许你得将shape从20000改的大点，2e5或更大。↩
提醒下，在gdb7或更高版本中，不再单独需要gdbinit文件。另外，要完成用gdb对python程序的调试，我想你的python程序都应该用-g来编译。↩