Installing Scrapy on Ubuntu

Source: Internet. Editor: 程序博客网. Time: 2024/05/18 00:51

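Scrapy is a fast, high-level screen scraping and web crawling framework written in Python. It crawls websites and extracts structured data from their pages. Scrapy has a wide range of uses, including data mining, monitoring, and automated testing. Official website: http://www.scrapy.org/

1. Install the following packages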

sudo apt-get install build-essential
sudo apt-get install python-dev
sudo apt-get install libxml2-dev
sudo apt-get install libxslt1-dev
sudo apt-get install python-setuptools
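As a rough sanity check after installing the packages above, the following sketch asks the system loader whether the libxml2 and libxslt shared libraries can be located. It is only an illustration: the helper name find_shared_libraries is hypothetical, and the check sees the runtime libraries only, so a hit does not prove that the -dev headers needed to compile lxml are present.

```python
from ctypes.util import find_library


def find_shared_libraries(names):
    """Map each short library name to the file name the loader finds, or None."""
    return {name: find_library(name) for name in names}


# "xml2" and "xslt" are the short linker names for libxml2 and libxslt.
for name, found in find_shared_libraries(["xml2", "xslt"]).items():
    print(name, "->", found if found else "not found")
```

A "not found" result suggests the corresponding package is missing; a match only shows that the runtime library is installed.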

2. Install Scrapy

sudo easy_install Scrapy

wang@ubuntu:/usr/local/lib/python2.7/dist-packages$ sudo easy_install Scrapy
Searching for Scrapy
Best match: Scrapy 0.16.1
Processing Scrapy-0.16.1-py2.7.egg
Scrapy 0.16.1 is already the active version in easy-install.pth
Installing scrapy script to /usr/local/bin
Using /usr/local/lib/python2.7/dist-packages/Scrapy-0.16.1-py2.7.egg
Processing dependencies for Scrapy
Searching for lxml
Reading http://pypi.python.org/simple/lxml/
Reading http://codespeak.net/lxml
Best match: lxml 3.0.1
Downloading http://pypi.python.org/packages/source/l/lxml/lxml-3.0.1.tar.gz#md5=0f2b1a063ab3b6b0944cbc4a9a85dcfa
Processing lxml-3.0.1.tar.gz
Running lxml-3.0.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-qibAzL/lxml-3.0.1/egg-dist-tmp-mSvUVN
Building lxml version 3.0.1.
Building without Cython.
Using build configuration of libxslt 1.1.26
Building against libxml2/libxslt in the following directory: /usr/lib/x86_64-linux-gnu
warning: no files found matching '*.txt' under directory 'src/lxml/tests'
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree__getFilenameForFile’:
src/lxml/lxml.etree.c:26310:7: warning: variable ‘__pyx_clineno’ set but not used [-Wunused-but-set-variable]
src/lxml/lxml.etree.c:26309:15: warning: variable ‘__pyx_filename’ set but not used [-Wunused-but-set-variable]
src/lxml/lxml.etree.c:26308:7: warning: variable ‘__pyx_lineno’ set but not used [-Wunused-but-set-variable]
src/lxml/lxml.etree.c: In function ‘__pyx_pf_4lxml_5etree_4XSLT_18__call__’:
src/lxml/lxml.etree.c:132608:81: warning: passing argument 1 of ‘__pyx_f_4lxml_5etree_12_XSLTContext__copy’ from incompatible pointer type [enabled by default]
src/lxml/lxml.etree.c:130569:52: note: expected ‘struct __pyx_obj_4lxml_5etree__XSLTContext *’ but argument is of type ‘struct __pyx_obj_4lxml_5etree__BaseContext *’
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree__copyXSLT’:
src/lxml/lxml.etree.c:133997:79: warning: passing argument 1 of ‘__pyx_f_4lxml_5etree_12_XSLTContext__copy’ from incompatible pointer type [enabled by default]
src/lxml/lxml.etree.c:130569:52: note: expected ‘struct __pyx_obj_4lxml_5etree__XSLTContext *’ but argument is of type ‘struct __pyx_obj_4lxml_5etree__BaseContext *’
src/lxml/lxml.etree.c: At top level:
src/lxml/lxml.etree.c:12128:13: warning: ‘__pyx_f_4lxml_5etree_displayNode’ defined but not used [-Wunused-function]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseDocFromFile’:
src/lxml/lxml.etree.c:86715:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseDoc’:
src/lxml/lxml.etree.c:86403:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseUnicodeDoc’:
src/lxml/lxml.etree.c:86093:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseDocFromFilelike’:
src/lxml/lxml.etree.c:86925:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
Adding lxml 3.0.1 to easy-install.pth file
Installed /usr/local/lib/python2.7/dist-packages/lxml-3.0.1-py2.7-linux-x86_64.egg
Searching for w3lib>=1.2
Reading http://pypi.python.org/simple/w3lib/
Reading http://github.com/scrapy/w3lib
Best match: w3lib 1.2
Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=f929d5973a9fda59587b09a72f185a9e
Processing w3lib-1.2.tar.gz
Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-ZAXTgy/w3lib-1.2/egg-dist-tmp-aU3vpc
zip_safe flag not set; analyzing archive contents...
Adding w3lib 1.2 to easy-install.pth file
Installed /usr/local/lib/python2.7/dist-packages/w3lib-1.2-py2.7.egg
Searching for Twisted>=8.0
Reading http://pypi.python.org/simple/Twisted/
Reading http://www.twistedmatrix.com
Reading http://twistedmatrix.com/products/download
Reading http://twistedmatrix.com/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/9.0/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/10.0/
Reading http://twistedmatrix.com/projects/core/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/8.2/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/8.1/
Best match: Twisted 12.2.0
Downloading http://pypi.python.org/packages/source/T/Twisted/Twisted-12.2.0.tar.bz2#md5=9a321b904d01efd695079f8484b37861
Processing Twisted-12.2.0.tar.bz2
Running Twisted-12.2.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-kw897y/Twisted-12.2.0/egg-dist-tmp-sZWFYb
In file included from /usr/include/python2.7/Python.h:8:0,
                 from twisted/internet/_sigchld.c:9:
/usr/include/python2.7/pyconfig.h:1161:0: warning: "_POSIX_C_SOURCE" redefined [enabled by default]
/usr/include/features.h:215:0: note: this is the location of the previous definition
twisted/internet/_sigchld.c: In function ‘got_signal’:
twisted/internet/_sigchld.c:15:13: warning: variable ‘ignored_result’ set but not used [-Wunused-but-set-variable]
Adding Twisted 12.2.0 to easy-install.pth file
Installing mailmail script to /usr/local/bin
Installing conch script to /usr/local/bin
Installing pyhtmlizer script to /usr/local/bin
Installing twistd script to /usr/local/bin
Installing lore script to /usr/local/bin
Installing tkconch script to /usr/local/bin
Installing tapconvert script to /usr/local/bin
Installing ckeygen script to /usr/local/bin
Installing tap2rpm script to /usr/local/bin
Installing manhole script to /usr/local/bin
Installing trial script to /usr/local/bin
Installing cftp script to /usr/local/bin
Installing tap2deb script to /usr/local/bin
Installed /usr/local/lib/python2.7/dist-packages/Twisted-12.2.0-py2.7-linux-x86_64.egg
Finished processing dependencies for Scrapy

The final line, "Finished processing dependencies for Scrapy", indicates that the installation succeeded.
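Besides reading the log, one way to confirm the result is to ask Python whether Scrapy and the dependencies named in the output (lxml, w3lib, Twisted) can be located. The sketch below is a minimal illustration, not part of the original procedure: check_modules is a hypothetical helper, and it assumes a Python 3 interpreter, whereas the log above comes from Python 2.7.

```python
import importlib.util


def check_modules(names):
    """Return {module name: True if the interpreter can locate it}."""
    # find_spec only locates a top-level module; it does not import it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Module names taken from the install log; Twisted imports as "twisted".
for name, ok in check_modules(["scrapy", "lxml", "w3lib", "twisted"]).items():
    print(name, "importable" if ok else "missing")
```

Any module reported as missing points to the dependency whose build step most likely failed.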

3. Test

scrapy shell http://ziki.cn

Select all <a> tags:

hxs.select('//a').extract()
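The XPath expression //a matches every <a> element anywhere in the document. The hxs object belongs to the Scrapy 0.16 shell used here; in current Scrapy releases the shell exposes response instead, and the equivalent call is response.xpath('//a').getall(). To show what the expression selects without needing Scrapy or network access, here is a small sketch using only the standard library. The sample markup and the extract_links helper are made up for illustration, and ElementTree supports only a subset of XPath, so './/a' stands in for '//a'. Unlike extract(), which returns the serialized markup of each element, this helper returns (href, text) pairs.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (ElementTree needs valid XML).
SAMPLE = """<html><body>
<p><a href="/first">First</a> and <a href="/second">Second</a></p>
<div><span>no link here</span></div>
</body></html>"""


def extract_links(markup):
    """Return (href, text) for every <a> element, in document order."""
    root = ET.fromstring(markup)
    # './/a' finds <a> descendants at any depth below the root.
    return [(a.get("href"), a.text) for a in root.findall(".//a")]


for href, text in extract_links(SAMPLE):
    print(href, text)
```

Real pages are rarely well-formed XML, which is one reason Scrapy relies on lxml rather than a strict XML parser.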

References

http://doc.scrapy.org/en/latest/intro/install.html
http://doc.scrapy.org/en/latest/intro/tutorial.html

Original article. If you repost it, please credit the source: reposted from 海波无痕

0 0