[Web Scraping] Python Web Scraping with urllib and BeautifulSoup
Source: Internet  Editor: 程序博客网  Date: 2024/05/21 06:43
Documentation and source code
Open-source scraping code: https://github.com/REMitchell/python-scraping
urllib documentation: https://docs.python.org/3/library/urllib.html
BeautifulSoup documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Required files: http://pan.baidu.com/s/1i55olGL Password: 1985
Installing BeautifulSoup
BeautifulSoup helps you parse the documents you fetch, in either HTML or XML format.
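To make "parsing a document" concrete, here is a minimal sketch that assumes BeautifulSoup is already installed. The HTML string, its heading, and its link paths are made up for illustration; they stand in for a page you would normally download.

```python
from bs4 import BeautifulSoup

# A small hypothetical document, used here instead of a downloaded page.
html = """
<html><body>
<h1>Sample page</h1>
<a href="/first">First link</a>
<a href="/second">Second link</a>
</body></html>
"""

# "html.parser" is the parser bundled with Python, so nothing else is needed.
soup = BeautifulSoup(html, "html.parser")

# soup.h1 is the first <h1> tag; get_text() returns its text content.
heading = soup.h1.get_text()

# find_all("a") returns every <a> tag; tag["href"] reads an attribute.
links = [a["href"] for a in soup.find_all("a")]

print(heading)
print(links)
```

The same object works for pages fetched with urlopen, since BeautifulSoup accepts either a string or an open file-like response.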
- Download a release: https://www.crummy.com/software/BeautifulSoup/bs4/download/
- Extract it into Python's lib directory
In cmd, change into the beautifulsoup folder and run these commands:
setup.py build
setup.py install
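A quick way to check that the installation worked is to import the package from Python. This is only a sketch of a sanity check, not part of the original steps. If the import fails, installing from PyPI with pip (the package is named beautifulsoup4) is another common route.

```python
import bs4

# bs4.__version__ holds the installed Beautiful Soup version as a string.
version = bs4.__version__
print("Beautiful Soup version:", version)
```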
If you see the error "You are trying to run the Python 2 version of Beautiful Soup under Python 3. This will not work.":
- Extract the bs4 folder into python/lib
- Copy python/Tools/scripts/2to3.py into the lib directory as well
- In cmd, change to the python/lib folder and run
2to3.py bs4 -w
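Notes:
2to3.py param1 (-w)
param1 can be a single .py file to convert or a folder (every .py file inside it is converted)
-w is optional. Without it, 2to3 only prints the proposed changes to the screen; with it, the converted code is written back to the original files.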
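As a rough illustration of the kind of rewriting 2to3 performs, the sketch below shows a few Python 2 constructs as comments next to their Python 3 equivalents. The snippet is a made-up example, not code from bs4, and it covers only a handful of the conversions the tool handles.

```python
# Python 2 original (kept as comments; it is a syntax error under Python 3):
#     counts = {"h1": 2}
#     if counts.has_key("h1"):
#         print "found", counts["h1"]
#     try:
#         int("x")
#     except ValueError, e:
#         print e

# Python 3 equivalent, in the style 2to3 rewrites toward:
counts = {"h1": 2}
if "h1" in counts:                      # has_key() becomes the "in" operator
    print("found", counts["h1"])        # print statement becomes a function call
try:
    int("x")
except ValueError as e:                 # "except X, e" becomes "except X as e"
    message = str(e)
    print(message)
```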
A simple, safe scraper
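from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

def getTitle(url):
    # Return None if the page cannot be fetched
    try:
        html = urlopen(url)
    except (HTTPError, URLError) as e:
        return None
    # A missing <body> raises AttributeError; a missing <h1> simply yields None
    try:
        bsObj = BeautifulSoup(html, "html.parser")
        title = bsObj.body.h1
    except AttributeError as e:
        return None
    return title

# "url" is a placeholder; replace it with the address of the page to scrape
title = getTitle("url")
if title is None:
    print("Title could not be found")
else:
    print(title)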
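One possible variation, sketched here under a few assumptions, separates parsing from downloading so the parsing step can be tried on a plain string without any network access. The names extract_title and get_title, the 10-second timeout, and the extra ValueError handling (raised by urlopen for a malformed address such as the bare placeholder "url") are illustrative choices, not part of the original code.

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

def extract_title(html):
    """Return the text of the first <h1> inside <body>, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    try:
        # soup.body or soup.body.h1 is None when the tag is missing,
        # so the attribute access below raises AttributeError.
        return soup.body.h1.get_text(strip=True)
    except AttributeError:
        return None

def get_title(url, timeout=10):
    """Download url and return its <h1> text, or None on any failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            html = response.read()
    except (HTTPError, URLError, ValueError):
        # ValueError covers strings that are not valid URLs at all.
        return None
    return extract_title(html)

# The parsing step can be exercised offline with hypothetical HTML strings.
print(extract_title("<html><body><h1> Hello </h1></body></html>"))
print(extract_title("<html><body><p>No heading</p></body></html>"))

# A malformed address is rejected before any request is sent.
print(get_title("url"))
```

Keeping the two steps apart also makes it easier to swap in saved HTML files while debugging a selector.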
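Result
For a reachable page, the script prints the first <h1> tag found inside <body>; if the page cannot be fetched or has no such tag, it prints the not-found message instead.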