Scrapy troubleshooting notes

Source: Internet  Editor: 程序博客网  Time: 2024/06/13 18:33

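I had been writing crawlers with Scrapy on a server, and it had always worked fine. Then, two days ago, a classmate installed NLTK on the same machine, and Scrapy stopped working entirely (whether through shell or crawl). The error was: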

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 9, in <module>
    load_entry_point('Scrapy==0.24.4', 'console_scripts', 'scrapy')()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/shell.py", line 46, in run
    self.crawler_process.start_crawling()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 124, in start_crawling
    return self._start_crawler() is not None
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 139, in _start_crawler
    crawler.configure()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 46, in configure
    self.extensions = ExtensionManager.from_crawler(self)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 50, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 29, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 42, in load_object
    raise ImportError("Error loading object '%s': %s" % (path, e))
ImportError: Error loading object 'scrapy.telnet.TelnetConsole': No module named conch

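The message says the conch module could not be found, which suggested that sys.path had been changed. Luckily I still had a Python interactive session running in tmux from before, so I could read the old sys.path from it. Compared with the old one, the current sys.path had two extra entries: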

'/usr/local/lib/python2.7/dist-packages/jieba-0.36.1-py2.7.egg',
'/usr/local/lib/python2.7/dist-packages/setuptools-15.0-py2.7.egg'
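The comparison of the two sys.path values can be done mechanically rather than by eye. The following is only a minimal sketch: the function names are made up for illustration, and the two lists are shortened stand-ins for the real output of sys.path in each session.

```python
def added_entries(old_path, new_path):
    """Return entries of new_path that are absent from old_path, in order."""
    old = set(old_path)
    return [p for p in new_path if p not in old]


def removed_entries(old_path, new_path):
    """Return entries of old_path that are absent from new_path, in order."""
    new = set(new_path)
    return [p for p in old_path if p not in new]


# Shortened, illustrative lists; paste the real sys.path values here.
old_path = ['',
            '/usr/lib/python2.7',
            '/usr/local/lib/python2.7/dist-packages']
new_path = ['',
            '/usr/local/lib/python2.7/dist-packages/setuptools-15.0-py2.7.egg',
            '/usr/local/lib/python2.7/dist-packages/jieba-0.36.1-py2.7.egg',
            '/usr/lib/python2.7',
            '/usr/local/lib/python2.7/dist-packages']

print(added_entries(old_path, new_path))
print(removed_entries(old_path, new_path))
```

Note that a set-style comparison like this only reports entries that appeared or disappeared. It says nothing about entries whose position changed.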

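By rights, a module that cannot be found should mean sys.path is missing something, not that it gained something, so this comparison did not explain anything yet.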

So I followed the Python traceback and tried to reproduce the error in the simplest possible way.
The error is raised in /usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py, by this code:

    try:
        mod = import_module(module)
    except ImportError as e:
        raise ImportError("Error loading object '%s': %s" % (path, e))
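To see why the final message names scrapy.telnet.TelnetConsole while the real failure is a nested import, here is a simplified, self-contained imitation of that pattern. It is a sketch for illustration, not Scrapy's actual implementation; load_object_sketch and the package name no_such_package_xyz are hypothetical.

```python
from importlib import import_module


def load_object_sketch(path):
    """Import 'pkg.module.Name' and return the attribute Name."""
    module, _, name = path.rpartition('.')
    try:
        mod = import_module(module)
    except ImportError as e:
        # Any ImportError raised while importing `module`, including one
        # from an import statement deep inside it, is re-raised with the
        # requested path in front. On Python 2 the original traceback is
        # dropped here; only its message survives.
        raise ImportError("Error loading object '%s': %s" % (path, e))
    return getattr(mod, name)


try:
    load_object_sketch('no_such_package_xyz.Thing')  # hypothetical name
except ImportError as e:
    print(e)
```

The wrapped message therefore points at the object being loaded, and the part after the colon is the only clue to the import that actually failed.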

So the error can be reproduced like this:

>>> from importlib import import_module
>>> import_module('scrapy.telnet')

This gives the same No module named conch error. The telnet.py file in the Scrapy package starts with this line:

from twisted.conch import manhole, telnet

This line fails because the conch module cannot be found. Importing twisted.conch directly fails in the same way.
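A quick check that fits at this point, although it was not part of the original investigation, is to ask Python which file a package is actually loaded from. The helper below is a sketch with a made-up name, package_location; it relies only on importlib.import_module and the module's __file__ attribute.

```python
from importlib import import_module


def package_location(name):
    """Return the file a module is loaded from, or None if it cannot be imported."""
    try:
        mod = import_module(name)
    except ImportError:
        return None
    # Built-in modules have no __file__, so fall back to None for them.
    return getattr(mod, '__file__', None)


for name in ('twisted', 'twisted.conch'):
    print('%s -> %s' % (name, package_location(name)))
```

If twisted resolves to a path under an unexpected directory while twisted.conch comes back as None, a different copy of the package is being picked up.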

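On this system, Python's third-party packages live in dist-packages directories. In /usr/local/lib/python2.7/dist-packages I found a twisted directory, and it did contain conch!
So I used the locate command to see where else twisted directories existed on the system, since something newly installed might have taken the place of the copy that used to work.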

$ locate twisted
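locate searches the whole filesystem. A narrower view, sketched below under the assumption that the package is installed as a plain directory, walks the entries of sys.path in order and reports every one that contains a directory with the package's name. The first hit is the copy an import would normally use. find_copies is a name invented for this example, and the sketch does not look inside zipped eggs or handle single-file modules.

```python
import os
import sys


def find_copies(package, search_path=None):
    """List directories on search_path that contain `package`, in import order."""
    if search_path is None:
        search_path = sys.path
    hits = []
    for entry in search_path:
        # An empty entry in sys.path stands for the current directory.
        candidate = os.path.join(entry or os.curdir, package)
        if os.path.isdir(candidate) and candidate not in hits:
            hits.append(candidate)
    return hits


for location in find_copies('twisted'):
    print(location)
```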

It turned out that /usr/lib/python2.7/dist-packages also contains a twisted directory, and that one really has no conch subdirectory. Its _version.py contains this line:

     version = versions.Version('twisted', 11, 1, 0)

In the _version.py of the copy I had been using, /usr/local/lib/python2.7/dist-packages/twisted, the same line reads:

     version = versions.Version('twisted', 14, 0, 2)

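This means the twisted now found along sys.path is the old version (probably installed by someone long ago): after sys.path was changed, imports resolve to that old copy. Comparing the two sys.path values more carefully, two entries had swapped their relative order.
This is the earlier, working sys.path: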

''
'/usr/local/lib/python2.7/dist-packages/requests-2.0.0-py2.7.egg'
'/usr/local/lib/python2.7/dist-packages/kafka_python-0.8.1_1-py2.7.egg'
'/usr/local/lib/python2.7/dist-packages/tox-1.6.1-py2.7.egg'
'/usr/local/lib/python2.7/dist-packages/py-1.4.19-py2.7.egg'
'/usr/local/lib/python2.7/dist-packages/virtualenv-1.10.1-py2.7.egg'
'/usr/local/lib/python2.7/dist-packages/pymongo-2.6.3-py2.7-linux-x86_64.egg'
'/usr/lib/python2.7'
'/usr/lib/python2.7/plat-linux2'
'/usr/lib/python2.7/lib-tk'
'/usr/lib/python2.7/lib-old'
'/usr/lib/python2.7/lib-dynload'
'/usr/local/lib/python2.7/dist-packages'        ### note this line
'/usr/lib/python2.7/dist-packages'              ### and this one
'/usr/lib/python2.7/dist-packages/PIL'
'/usr/lib/python2.7/dist-packages/gst-0.10'
'/usr/lib/python2.7/dist-packages/gtk-2.0'
'/usr/lib/pymodules/python2.7'
'/usr/lib/python2.7/dist-packages/ubuntu-sso-client'
'/usr/lib/python2.7/dist-packages/ubuntuone-client'
'/usr/lib/python2.7/dist-packages/ubuntuone-control-panel'
'/usr/lib/python2.7/dist-packages/ubuntuone-couch'
'/usr/lib/python2.7/dist-packages/ubuntuone-installer'
'/usr/lib/python2.7/dist-packages/ubuntuone-storage-protocol'

And this is the current sys.path, after the new packages were installed:

''
'/usr/local/lib/python2.7/dist-packages/requests-2.0.0-py2.7.egg'
'/usr/local/lib/python2.7/dist-packages/kafka_python-0.8.1_1-py2.7.egg'
'/usr/local/lib/python2.7/dist-packages/tox-1.6.1-py2.7.egg'
'/usr/local/lib/python2.7/dist-packages/py-1.4.19-py2.7.egg'
'/usr/local/lib/python2.7/dist-packages/virtualenv-1.10.1-py2.7.egg'
'/usr/local/lib/python2.7/dist-packages/pymongo-2.6.3-py2.7-linux-x86_64.egg'
'/usr/local/lib/python2.7/dist-packages/setuptools-15.0-py2.7.egg'
'/usr/lib/python2.7/dist-packages'          # this line has moved up
'/usr/local/lib/python2.7/dist-packages/jieba-0.36.1-py2.7.egg'
'/usr/lib/python2.7'
'/usr/lib/python2.7/plat-linux2'
'/usr/lib/python2.7/lib-tk'
'/usr/lib/python2.7/lib-old'
'/usr/lib/python2.7/lib-dynload'
'/usr/local/lib/python2.7/dist-packages'    # this line now comes after it
'/usr/lib/python2.7/dist-packages/PIL'
'/usr/lib/python2.7/dist-packages/gst-0.10'
'/usr/lib/python2.7/dist-packages/gtk-2.0'
'/usr/lib/pymodules/python2.7'
'/usr/lib/python2.7/dist-packages/ubuntu-sso-client'
'/usr/lib/python2.7/dist-packages/ubuntuone-client'
'/usr/lib/python2.7/dist-packages/ubuntuone-control-panel'
'/usr/lib/python2.7/dist-packages/ubuntuone-couch'
'/usr/lib/python2.7/dist-packages/ubuntuone-installer'
'/usr/lib/python2.7/dist-packages/ubuntuone-storage-protocol'

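Look at the two annotated lines: in the new sys.path, /usr/lib/python2.7/dist-packages has moved ahead of /usr/local/lib/python2.7/dist-packages, so the old version of twisted is found first! At last the cause was clear, and I could breathe a sigh of relief.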
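The order check that finally exposed the problem can also be written down. This is a minimal sketch: searched_first is a made-up helper, and the two lists keep only the entries relevant to the comparison.

```python
LOCAL = '/usr/local/lib/python2.7/dist-packages'
SYSTEM = '/usr/lib/python2.7/dist-packages'


def searched_first(search_path, a, b):
    """Return whichever of a and b comes first in search_path.

    Raises ValueError if either entry is not in search_path.
    """
    return a if search_path.index(a) < search_path.index(b) else b


# Shortened stand-ins for the two sys.path values listed above.
old_path = ['', '/usr/lib/python2.7', LOCAL, SYSTEM]
new_path = ['', SYSTEM, '/usr/lib/python2.7', LOCAL]

print(searched_first(old_path, LOCAL, SYSTEM))
print(searched_first(new_path, LOCAL, SYSTEM))
```

When two directories both contain a package with the same name, the directory that is searched first determines which copy gets imported.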
The fix is to delete (or rename) the old twisted directory, together with its egg-info files. The names are:

twisted  Twisted_Core-11.1.0.egg-info  Twisted_Names-11.1.0.egg-info  Twisted_Web-11.1.0.egg-info

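Alternatively, the default sys.path could be changed so that /usr/local/lib/python2.7/dist-packages comes first. But since that might affect other users as well, I simply got rid of the old version, which nobody needed anyway.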
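For completeness, a reordering can also be limited to a single process instead of changing the system default. The sketch below is an assumption-laden illustration, not what was done here: move_before is a hypothetical helper, and the reordering only helps if it is applied before the first import of twisted.

```python
LOCAL = '/usr/local/lib/python2.7/dist-packages'
SYSTEM = '/usr/lib/python2.7/dist-packages'


def move_before(search_path, entry, other):
    """Return a copy of search_path in which entry is searched before other."""
    result = list(search_path)
    if result.index(entry) > result.index(other):
        result.remove(entry)
        result.insert(result.index(other), entry)
    return result


# Shortened stand-in for the broken sys.path.
new_path = ['', SYSTEM, '/usr/lib/python2.7', LOCAL]
print(move_before(new_path, LOCAL, SYSTEM))

# In a real script, this would be applied at the very top, for example:
#   import sys
#   sys.path[:] = move_before(sys.path, LOCAL, SYSTEM)
```

This leaves other users and other programs untouched, but every affected script has to carry the workaround, which is one more reason removing the stale copy was the simpler choice.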

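The final fix was simple, but tracking the problem down took a lot of time. Troubleshooting a server is a test of patience!
This post may not be directly useful as a reference, because everyone's environment is different and so are the causes of their errors. But the troubleshooting approach may help someone who feels stuck: when I first hit this problem I was very frustrated too, and could not find anything helpful online. In the end I had to calm down and deepen my understanding of Python. Above all, keep this conviction: the problem can always be solved!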
