The find_all() method in the BeautifulSoup library
Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 05:46
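Today I read up on BeautifulSoup's find_all() method and want to summarize it here. BeautifulSoup is a library for parsing, traversing, and maintaining tag trees. After fetching a web page, we can use it to parse the page's content.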
find_all(name, attrs, recursive, string, limit, **kwargs)
1. name: the tag name
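import re  # re.compile is used in the attrs and string examples
import requests
from bs4 import BeautifulSoup

url = 'http://python123.io/ws/demo.html'
try:
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # Response method: raises HTTPError if the status code signals an error
    r.encoding = r.apparent_encoding  # encoding is guessed from the HTTP headers; apparent_encoding is guessed from the content
    demo = r.text
    print(demo)
except requests.RequestException:
    print('there is a mistake')
    raise  # without the page there is nothing to parse
soup = BeautifulSoup(demo, 'html.parser')
soup.find_all('a')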
Output:
soup.find_all('a')
Out[1]: [<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>, <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>]
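The name argument is not limited to a single tag name. Here is a minimal offline sketch showing that it also accepts a list of names, a compiled regular expression, or True. SAMPLE_HTML is a hypothetical stand-in loosely modeled on the demo-page outputs in this post (not the real page), so the example runs without network access.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the demo page (assumption), so no network is needed
SAMPLE_HTML = (
    '<html><head><title>This is a python demo page</title></head><body>'
    '<p class="title"><b>The demo python introduces several python courses.</b></p>'
    '<p class="course">Courses: '
    '<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a>'
    ' and '
    '<a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>'
    '.</p></body></html>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A plain string matches exactly one tag name
a_tags = soup.find_all("a")
# A list matches any of the listed names; results come back in document order
a_or_b = [t.name for t in soup.find_all(["a", "b"])]
# A compiled regex is searched against each tag name: both 'body' and 'b' start with b
b_names = [t.name for t in soup.find_all(re.compile("^b"))]
# True matches every tag in the tree
all_names = [t.name for t in soup.find_all(True)]

print(len(a_tags), a_or_b, b_names)
```

Note that the regex is matched against tag names, not tag contents, which is why it picks up body as well as b.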
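2. attrs: filters tags by their attributes (attribute filters can also be passed as keyword arguments, as with id= below)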
soup.find_all(id=re.compile('link'))
Out[1]: [<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>, <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>]
This finds the tags whose id attribute contains "link".
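The call id=re.compile('link') goes through **kwargs; the attrs parameter itself takes a dictionary. A minimal sketch comparing the two forms, plus the class_ keyword (class is a reserved word in Python) and a bare string in the attrs position, which BeautifulSoup treats as a CSS class. SAMPLE_HTML is again a hypothetical offline stand-in, not the real demo page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the demo page (assumption), so no network is needed
SAMPLE_HTML = (
    '<html><head><title>This is a python demo page</title></head><body>'
    '<p class="title"><b>The demo python introduces several python courses.</b></p>'
    '<p class="course">Courses: '
    '<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a>'
    ' and '
    '<a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>'
    '.</p></body></html>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Keyword form: an unrecognized keyword argument becomes an attribute filter
by_kwarg = soup.find_all(id=re.compile("link"))
# Dictionary form via attrs: same filter, also usable for names like "data-x"
by_attrs = soup.find_all(attrs={"id": re.compile("link")})
# class is a Python keyword, so BeautifulSoup accepts class_ instead
py2_links = soup.find_all("a", class_="py2")
# A plain string in the attrs position is matched against the CSS class
course_ps = soup.find_all("p", "course")

print([t["id"] for t in by_attrs], [t["id"] for t in py2_links])
```

The dictionary form is handy when an attribute name is not a valid Python identifier and so cannot be written as a keyword argument.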
3. recursive: whether to search all descendants (True, the default) or only the direct children (False)
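soup.find_all('a')
Out[2]: [<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>, <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>]
soup.find_all('a',recursive=False)
Out[3]: []
The second call returns an empty list because the a tags are nested deeper in the page and are not direct children of the soup object.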
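To see recursive=False actually return something, call find_all() on the tag that directly contains the links. A minimal offline sketch, where SAMPLE_HTML is a hypothetical stand-in for the demo page in which the links sit directly inside a p tag with class "course":

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the demo page (assumption), so no network is needed
SAMPLE_HTML = (
    '<html><head><title>This is a python demo page</title></head><body>'
    '<p class="title"><b>The demo python introduces several python courses.</b></p>'
    '<p class="course">Courses: '
    '<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a>'
    ' and '
    '<a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>'
    '.</p></body></html>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The BeautifulSoup object's only direct child tag is <html>
top_children = [t.name for t in soup.find_all(True, recursive=False)]
# So a non-recursive search for <a> from the root finds nothing
from_root = soup.find_all("a", recursive=False)
# The <a> tags are direct children of <p class="course">
course_p = soup.find("p", class_="course")
from_parent = course_p.find_all("a", recursive=False)
# <body> only has <p> children, so a non-recursive search there also finds no <a>
from_body = soup.body.find_all("a", recursive=False)

print(top_children, [a["id"] for a in from_parent])
```

In practice, recursive=False is mostly useful when you already hold the parent tag and want to skip anything nested further down.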
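4. string: searches the text strings in the document
soup.find_all(string=re.compile('python'))
Out[4]: ['This is a python demo page', 'The demo python introduces several python courses.']
This returns every text string that contains "python".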
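The regex match on string is case-sensitive, which is why link texts such as "Basic Python" do not appear in the output above. A minimal offline sketch showing a case-insensitive regex (re.IGNORECASE), an exact string match, and string combined with a tag name. SAMPLE_HTML is a hypothetical stand-in for the demo page, not its real content.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the demo page (assumption), so no network is needed
SAMPLE_HTML = (
    '<html><head><title>This is a python demo page</title></head><body>'
    '<p class="title"><b>The demo python introduces several python courses.</b></p>'
    '<p class="course">Courses: '
    '<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a>'
    ' and '
    '<a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>'
    '.</p></body></html>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Case-sensitive regex: only text containing lowercase "python" matches
lower_only = soup.find_all(string=re.compile("python"))
# re.IGNORECASE also picks up "Basic Python" and "Advanced Python"
any_case = soup.find_all(string=re.compile("python", re.IGNORECASE))
# A plain string must equal the whole text node exactly
exact = soup.find_all(string="Basic Python")
# With a tag name, string filters tags by their text and returns the tags themselves
link = soup.find_all("a", string="Advanced Python")

print(any_case)
```

Without a tag name, the results are the text strings themselves (NavigableString objects, which compare equal to ordinary str values).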