Python crawler

Source: Internet  Published by: js object empty  Editor: 程序博客网  Time: 2024/06/03 14:53

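<pre name="code" class="html">#!/usr/bin/python
# Python 2 script: download every .jpg image linked from a Baidu Tieba thread.
import re
import urllib


def getHtml(url):
    # Download the page at the given URL and return its raw HTML.
    page = urllib.urlopen(url)
    html = page.read()
    return html


def getImg(html):
    # Capture the URL in every src="....jpg" attribute that is followed by " width".
    reg = r'src="(.*?\.jpg)" width'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    x = 0
    for imgurl in imglist:
        # Save the images as 0.jpg, 1.jpg, ... in the order they were found.
        urllib.urlretrieve(imgurl, '%s.jpg' % x)
        x += 1
    return imglist


html = getHtml("http://tieba.baidu.com/p/1898043927")
print getImg(html)
</pre>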

Explanation:

getHtml(url) downloads the web page at the given URL and returns its HTML.
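The script targets Python 2, where `urllib.urlopen` exists. On Python 3 the equivalent call is `urllib.request.urlopen`, so the download step could be sketched roughly as follows. The function name `get_html`, the `encoding` parameter, and the UTF-8 default are assumptions made for illustration, not part of the original script.

```python
import urllib.request


def get_html(url, encoding="utf-8"):
    """Fetch the resource at `url` and return its body as text."""
    # urlopen returns a response object that can be used as a context manager.
    with urllib.request.urlopen(url) as page:
        raw = page.read()  # bytes in Python 3
    # Assumption: the page is UTF-8 encoded; undecodable bytes are replaced.
    return raw.decode(encoding, errors="replace")
```

Calling `get_html("http://tieba.baidu.com/p/1898043927")` would then play the same role as `getHtml` in the script, except that it returns a decoded string rather than raw bytes.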

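getImg(html) extracts from the page the links that match the regular expression, downloads each image, and saves them in order as 0.jpg, 1.jpg, and so on.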
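To see what the regular expression picks up without touching the network, the matching step can be tried on a small HTML fragment. This is only a sketch: the helper names `find_image_urls` and `numbered_filenames`, the constant `IMG_PATTERN`, and the `example.com` URLs are made up for the example. The pattern itself is the one used in the script.

```python
import re

# Same pattern as the script: capture the URL in src="....jpg" followed by ' width'.
IMG_PATTERN = re.compile(r'src="(.*?\.jpg)" width')


def find_image_urls(html):
    """Return the .jpg URLs matched by IMG_PATTERN, in page order."""
    return IMG_PATTERN.findall(html)


def numbered_filenames(urls):
    """Pair each URL with the sequential file name it would be saved under."""
    return [(url, "%d.jpg" % index) for index, url in enumerate(urls)]


# Hypothetical fragment standing in for a downloaded page.
sample = (
    '<img src="http://example.com/a.jpg" width="560">'
    '<p>some text</p>'
    '<img src="http://example.com/b.jpg" width="560">'
)
urls = find_image_urls(sample)
print(numbered_filenames(urls))
```

Keeping the matching separate from the download makes it possible to check the pattern against saved HTML before fetching any images.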

Writing a crawler in Python really is more concise than in Java; a Java crawler with the same functionality takes a lot more effort.

0 0

python爬虫-->爬虫基础
[爬虫] Python爬虫技巧
Python爬虫
python 爬虫
python 爬虫
python 爬虫
python爬虫
Python爬虫
Python爬虫
python 爬虫
Python爬虫
python爬虫
python 爬虫
python 爬虫
python爬虫
python爬虫
python爬虫
python 爬虫
Linux进程通信之共享内存
CheckIO The Longest Palindromic
ios developer tiny share-20160822
Office2013企业版23&64位官方镜像
图片360旋转
python 爬虫
opencv3.1
最短路径
各种View刷新
Java IO BufferedInputStream和BufferedOutputStream
strcat
#150 Best Time to Buy and Sell Stock II
2016年最有效的贴吧引流策略
Oracle表锁或行锁问题解决办法