Python Web Crawler

Source: Internet | Published by: ff14捏脸动漫数据 | Editor: 程序博客网 | Time: 2024/06/06 03:53

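from urllib import request
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = 'http://python.org/'

# Download the page
print("Connecting")
with request.urlopen(url) as html:
    print("Downloading the page")
    content = html.read().decode('utf-8')
print("Page downloaded")

# Find the images with BeautifulSoup (the 'lxml' parser requires the lxml package)
html_soup = BeautifulSoup(content, 'lxml')
# Compared with matching by regular expression, BeautifulSoup offers a simpler, more flexible approach
all_img_links = html_soup.find_all('img')
print(all_img_links)

# Next comes the familiar step of downloading the images
img_counter = 1
for img_link in all_img_links:
    src = img_link.get('src')
    if not src:
        continue  # skip <img> tags that have no src attribute
    img_name = '%s.jpg' % img_counter
    # Files are saved relative to the current working directory.
    # urljoin resolves both relative and absolute src values against the page URL.
    request.urlretrieve(urljoin(url, src), img_name)
    img_counter += 1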
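
An image's src may be a site-relative path, a full URL on another host, or missing altogether, and not every image is a JPEG. The following is a minimal sketch of one way to prepare the download list before fetching anything. The helper name build_image_targets and the sample src values are hypothetical and not part of the original script. It only relies on urljoin and urlparse from the standard library and makes no network requests.

```python
import os
from urllib.parse import urljoin, urlparse


def build_image_targets(page_url, srcs):
    """Pair each usable src with an absolute URL and a numbered local file name.

    Hypothetical helper for illustration; not part of the original script.
    """
    targets = []
    counter = 1
    for src in srcs:
        if not src:
            continue  # skip missing or empty src values
        # Resolve relative paths against the page URL; absolute URLs pass through.
        absolute = urljoin(page_url, src)
        # Keep the extension from the URL path when there is one, else assume .jpg.
        ext = os.path.splitext(urlparse(absolute).path)[1] or '.jpg'
        targets.append((absolute, '%s%s' % (counter, ext)))
        counter += 1
    return targets


# Sample src values (assumed for illustration only)
sample_srcs = ['/static/img/logo.png', 'https://example.com/a.gif', None, 'banner']
targets = build_image_targets('http://python.org/', sample_srcs)
for absolute_url, file_name in targets:
    print(absolute_url, '->', file_name)
```

Each (absolute_url, file_name) pair could then be passed to request.urlretrieve in place of the string concatenation and the fixed .jpg name.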
