Python Learning Journey: Web Scraping

Source: Internet  Editor: 程序博客网  Time: 2024/04/24 09:06
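# -*- coding: utf-8 -*-
"""
Spyder Editor

This is a temporary script file.
"""
import re
import urllib.request


def getHtml(url):
    # Fetch the page and return its raw bytes.
    page = urllib.request.urlopen(url)
    html = page.read()
    return html


def getImg(html):
    # Decode the bytes, then collect every .jpg URL found in a src
    # attribute that is directly followed by a width attribute.
    html = html.decode('utf-8')
    reg = r'src="(.*?\.jpg)" width'
    imgre = re.compile(reg)
    imglist = imgre.findall(html)
    return imglist


html = getHtml('https://movie.douban.com/')
imglist = getImg(html)

# Save each image to the current directory as 0.jpg, 1.jpg, ...
for x, imgurl in enumerate(imglist):
    urllib.request.urlretrieve(imgurl, '%s.jpg' % x)

print(imglist)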
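The extraction step can be tried offline before touching the network. The sketch below is not part of the original script: `extract_jpg_urls`, `build_request`, the sample HTML, and the `Mozilla/5.0` User-Agent value are all hypothetical names and values chosen for illustration. It swaps in a slightly stricter pattern, `[^"]+` instead of `.*?`, so that a match cannot run past the closing quote of one `src` attribute into the next tag. It also shows how a request with an explicit User-Agent header could be built, since some sites may reject the default urllib client.

```python
import re
import urllib.request

# Stricter variant of the script's pattern (an assumption, not the original):
# [^"]+ keeps the match inside a single src="..." attribute.
IMG_PATTERN = re.compile(r'src="([^"]+\.jpg)" width')


def extract_jpg_urls(html_text):
    """Return all .jpg URLs matched by IMG_PATTERN, in document order."""
    return IMG_PATTERN.findall(html_text)


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a Request carrying an explicit User-Agent (illustrative value)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical HTML fragment, used only to exercise the pattern.
sample_html = (
    '<img src="https://example.com/a.jpg" width="100">'
    '<img src="https://example.com/b.png" width="100">'
    '<img src="https://example.com/c.jpg" width="80">'
)
urls = extract_jpg_urls(sample_html)
print(urls)
```

To use it against a live page, the result of `build_request(url)` can be passed to `urllib.request.urlopen` in place of the bare URL string, and the decoded response passed to `extract_jpg_urls`.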
 