Basic Usage of BeautifulSoup
Source: Internet  Published by: 网络棋牌信息  Editor: 程序博客网  Date: 2024/06/16 11:41
from bs4 import BeautifulSoup
import re

# A sample HTML document
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# Parse the document with the built-in html.parser
soup = BeautifulSoup(html_doc, "html.parser")
# Print the whole document, pretty-printed
print(soup.prettify())

# Access a tag directly by name (returns the first match)
print(soup.title)
# <title>The Dormouse's story</title>
print(soup.title.string)
# The Dormouse's story
print(soup.a)
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
print(soup.p)
# <p class="title"><b>The Dormouse's story</b></p>
print(soup.p['class'])
# ['title']

# find_all returns every matching tag (findAll is the older spelling of the same method)
print(soup.find_all('a'))
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
for link in soup.find_all('a'):
    print(link.string)
# Elsie
# Lacie
# Tillie

# find returns only the first match
print(soup.find(id="link3"))
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
print(soup.find('p'))
# <p class="title"><b>The Dormouse's story</b></p>

# Filter by attribute with a dict
print(soup.find('p', {"class": "story"}))
# <p class="story">Once upon a time there were three little sisters; and their names were
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
# and they lived at the bottom of a well.</p>

# get_text() strips the markup and keeps only the text
print(soup.find('p', {'class': 'story'}).get_text())
# Once upon a time there were three little sisters; and their names were
# Elsie,
# Lacie and
# Tillie;
# and they lived at the bottom of a well.
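A common next step after finding tags is reading their attributes. The sketch below is an illustration, not part of the original post: it repeats a shortened copy of the sample HTML so it runs on its own, and the variable names `links` and `story` are made up for the example. It pairs each link's text with its `href`, and shows the `class_` keyword as an alternative to the `{"class": "story"}` dict.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document, repeated so this block runs standalone
html_doc = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Collect (text, href) pairs for every <a> tag.
# tag.get("href") returns None when the attribute is missing,
# whereas tag["href"] would raise KeyError.
links = [(a.get_text(), a.get("href")) for a in soup.find_all("a")]
for text, href in links:
    print(text, href)

# class_ (trailing underscore, because "class" is a Python keyword)
# filters by CSS class, like passing {"class": "story"}.
story = soup.find("p", class_="story")
print(story["class"])
```

Using `.get()` is usually the safer choice when scraping real pages, where some tags may lack the attribute you expect.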
Regular expressions:
# A compiled pattern is matched against tag names; this one finds names containing "t"
for tag in soup.find_all(re.compile("t")):
    print(tag.name)
# html
# title

# Tag names starting with "b"
for tag in soup.find_all(re.compile("^b")):
    print(tag.name)
# body
# b

# <a> tags whose href starts with http://example.com/
data = soup.find_all('a', href=re.compile(r"^http://example\.com/"))
print(data)
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
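Compiled patterns can be passed for other attributes and for text content as well. The following is a small sketch under the same assumptions as the sample document above (a shortened copy is repeated so the block is self-contained); the variable names `first_two`, `l_links` and `matches` are illustrative only.

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of the sample document, repeated so this block runs standalone
html_doc = """
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Match on the id attribute: only link1 and link2
first_two = soup.find_all("a", id=re.compile(r"^link[12]$"))
print([a["id"] for a in first_two])

# Match on href: only the URL whose last path segment starts with "l"
l_links = soup.find_all("a", href=re.compile(r"/l\w+$"))
print([a.get_text() for a in l_links])

# Match on text nodes: string= returns the matching strings, not tags
matches = soup.find_all(string=re.compile("sisters"))
print(len(matches))
```

Note that BeautifulSoup applies the pattern with `search()` rather than `match()`, so a pattern matches anywhere in the value unless it is anchored with `^` or `$`.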