Parsing data with Beautiful Soup in Python
Source: Internet | Editor: 程序博客网 | Date: 2024/06/06 00:04
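Installing Beautiful Soup
The following describes how to install Beautiful Soup on Windows:
1. Download it from http://www.crummy.com/software/BeautifulSoup/. The latest version at the time of writing is 4.1.3.
2. Extract the downloaded archive, for example into D:/python.
3. Open cmd and change to the D:/python/beautifulsoup4-4.1.3/ directory (adjust the path to match where you extracted the archive and the version you downloaded):
cd /d D:/python/beautifulsoup4-4.1.3
4. Run the commands:
setup.py build
setup.py install
5. In your IDE, run from bs4 import BeautifulSoup. If no error is raised, the installation succeeded.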
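Step 5 can also be checked from a script rather than by hand. The following is a minimal sketch, assuming Python 3; the helper name is_installed is made up for this illustration and is not part of Beautiful Soup. It only looks for the bs4 package on the import path and prints the version if it is there.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path.

    Hypothetical helper for this sketch, not a Beautiful Soup function.
    """
    return importlib.util.find_spec(module_name) is not None


if is_installed("bs4"):
    import bs4
    print("Beautiful Soup version:", bs4.__version__)
else:
    print("bs4 was not found; re-check the installation steps")
```

On current Python installations, pip install beautifulsoup4 is the more common way to install the package, and the same check applies afterwards.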
Using Beautiful Soup
#!/usr/bin/python
#coding:utf-8
from bs4 import BeautifulSoup
import urllib
import urllib2
import re

html = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html)

# Pretty-print the parsed document
print "-------soup格式化打印------"
print soup.prettify()

# Accessing a tag by name returns the first matching tag in the whole document
print "它查找的是在所有内容中的第一个符合要求的标签"
print soup.title
print soup.head
print soup.a
print soup.p

# A tag has two important attributes: name and attrs
print "-----对于标签,它有两个重要的属性,是 name 和 attrs----"
print soup.name
print soup.head.name
print soup.title.name
print soup.p.attrs
print soup.p.string

# Select by tag name
print "--通过标签名查找--"
print soup.select('title')

# Select by class name
print "--通过类名查找--"
print soup.select('.sister')

# Select by id
print "--通过 id 名查找--"
print soup.select('#link1')

# Combined selector: the element with id link1 inside a p tag
print "-- 组合查找 查找 p 标签中,id 等于 link1的内容--"
print soup.select('p #link1')

# Direct child selector
print "--直接子标签查找--"
print soup.select("head > title")

# Select by attribute value
print "--属性查找--"
print soup.select('a[class="sister"]')
print soup.select('a[href="http://example.com/elsie"]')
print soup.select('p a[href="http://example.com/elsie"]')
The output is as follows:
E:\python\python_jdk\python.exe E:/python/py_pro/safly/Python_Demo.py
-------soup格式化打印------
<html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>
 <body>
  <p class="title" name="dromouse">
   <b>
    The Dormouse's story
   </b>
  </p>
  <p class="story">
   Once upon a time there were three little sisters; and their names were
   <a class="sister" href="http://example.com/elsie" id="link1">
    <!-- Elsie -->
   </a>
   ,
   <a class="sister" href="http://example.com/lacie" id="link2">
    Lacie
   </a>
   and
   <a class="sister" href="http://example.com/tillie" id="link3">
    Tillie
   </a>
   ; and they lived at the bottom of a well.
  </p>
  <p class="story">
   ...
  </p>
 </body>
</html>
它查找的是在所有内容中的第一个符合要求的标签
<title>The Dormouse's story</title>
<head><title>The Dormouse's story</title></head>
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
-----对于标签,它有两个重要的属性,是 name 和 attrs----
[document]
head
title
{u'class': [u'title'], u'name': u'dromouse'}
The Dormouse's story
--通过标签名查找--
[<title>The Dormouse's story</title>]
--通过类名查找--
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
--通过 id 名查找--
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
-- 组合查找 查找 p 标签中,id 等于 link1的内容--
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
--直接子标签查找--
[<title>The Dormouse's story</title>]
--属性查找--
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
Process finished with exit code 0
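The script above only prints whole tags. In practice you usually want the attribute values and text inside them. Here is a rough sketch of that, written for Python 3 rather than the Python 2 used in the script; passing "html.parser" as the parser is a choice made for this sketch (it ships with Python), not something the original script does.

```python
from bs4 import BeautifulSoup, Comment

html = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Assumption of this sketch: name the parser explicitly.
soup = BeautifulSoup(html, "html.parser")

# select() returns a list of Tag objects; a Tag can be indexed like a dict
# to read its attributes.
links = [(a["id"], a["href"]) for a in soup.select("a.sister")]
for link_id, href in links:
    print(link_id, href)

# Multi-valued attributes such as class come back as a list.
print(soup.p["class"])

# get_text() returns the text inside a tag; the first link is skipped here
# because it contains only an HTML comment.
names = [a.get_text() for a in soup.select("a.sister")[1:]]
print(names)

# .string on the first link is a Comment object rather than ordinary text,
# so check the type before treating it as visible content.
first = soup.a.string
print(isinstance(first, Comment))
```

The Comment check matters because soup.a.string prints as Elsie, which looks like normal link text even though it comes from a comment in the markup.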