Parsing HTML pages with the BeautifulSoup library

Source: Internet. Published by: linux socket编程教程. Editor: 程序博客网. Date: 2024/04/30 09:51

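BeautifulSoup is an open-source library for working with HTML and XML. It is built on top of third-party HTML and XML parsers and is responsible for navigating and manipulating the parse tree they produce.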

The optional third-party HTML/XML parsers are lxml and html5lib.
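Because the parser is pluggable, code can select one at run time. The following is a minimal sketch, not part of the original article: the helper name make_soup and the preference order are assumptions made for illustration. It relies on bs4.FeatureNotFound, which BeautifulSoup raises when the requested parser is not installed, and falls back to html.parser, the parser that ships with Python.

```python
import bs4


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Hypothetical helper: parse with the first parser in `preferred` that is installed."""
    for name in preferred:
        try:
            return bs4.BeautifulSoup(markup, name), name
        except bs4.FeatureNotFound:
            # This parser is not installed; try the next one.
            continue
    raise RuntimeError("no usable parser found")


soup, parser_used = make_soup("<p>Hello, <b>world</b></p>")
print(parser_used, soup.b.text)
```

Which parser name is printed depends on what is installed in the environment; the extracted text is the same in each case.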

1. Installation

pip install beautifulsoup4

2. Usage

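import urllib.request
import bs4

# Option 1: parse a fetched page with an explicit parser (html5lib must be installed)
soup = bs4.BeautifulSoup(urllib.request.urlopen("http://www.example.com/1.html"), "html5lib", from_encoding="gbk")
# Option 2: let BeautifulSoup choose the best installed parser (recent versions warn when none is named)
soup = bs4.BeautifulSoup(urllib.request.urlopen("http://www.example.com/1.html"), from_encoding="gbk")
# Option 3: parse markup already in memory; from_encoding only applies to bytes input
soup = bs4.BeautifulSoup(b"<html>... ....</html>", from_encoding="gbk")

result = {}
# Find every <div> with the given CSS class
catlog = soup.find_all('div', class_="globalCrumbs")
title = soup.find_all('div', class_="articleTitle2011")
for e in title:
    print(e)
    # Text of the <h1> inside the title <div>
    result["title"] = e.h1.text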
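To try the find_all calls without fetching a page, here is a small self-contained sketch. The HTML fragment and its text are made up for illustration; only the class names globalCrumbs and articleTitle2011 come from the code above. It uses Python's built-in html.parser so that no extra parser needs to be installed.

```python
import bs4

# Hypothetical page fragment; only the class names come from the article.
html = """
<html><body>
  <div class="globalCrumbs"><a href="/">Home</a> &gt; <a href="/news">News</a></div>
  <div class="articleTitle2011"><h1>Sample headline</h1></div>
</body></html>
"""

soup = bs4.BeautifulSoup(html, "html.parser")
result = {}

# Collect the link text inside every breadcrumb <div>.
result["crumbs"] = [
    a.get_text(strip=True)
    for div in soup.find_all("div", class_="globalCrumbs")
    for a in div.find_all("a")
]

# Take the <h1> text from the title <div>, skipping any <div> that has no <h1>.
for div in soup.find_all("div", class_="articleTitle2011"):
    if div.h1 is not None:
        result["title"] = div.h1.get_text(strip=True)

print(result)
```

The check for div.h1 being None guards against an AttributeError on pages where a matching div lacks a heading, which the shorter loop above would not handle.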
