python_Lightweight Web Crawler Development 4
Source: Internet | Published by: 顾毓琇 知乎 | Editor: 程序博客网 | Date: 2024/05/02 05:06
```python
import re

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# Parse with Python's built-in HTML parser. html_doc is already a str,
# so no from_encoding argument is needed (bs4 ignores it for str input).
soup = BeautifulSoup(html_doc, 'html.parser')

print('Get all links')
links = soup.find_all('a')  # every <a> tag in the document
for link in links:
    print(link.name, link['href'], link.get_text())

print("Get Lacie's link")
# find() returns the first tag whose attributes match exactly
link_node = soup.find('a', href='http://example.com/lacie')
print(link_node.name, link_node['href'], link_node.get_text())

print('Regex match')
# A compiled regex matches any href containing "ill" (here: .../tillie)
link_node = soup.find('a', href=re.compile(r"ill"))
print(link_node.name, link_node['href'], link_node.get_text())

print('Get the <p> paragraph text')
# class_ (with underscore) because "class" is a Python keyword
p_node = soup.find('p', class_="title")
print(p_node.name, p_node.get_text())
```
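BeautifulSoup's `'html.parser'` backend is built on Python's standard `html.parser.HTMLParser`. As a rough, dependency-free sketch of what `find_all('a')` with `link['href']` and `get_text()` (plus the `href=re.compile(...)` filter) amount to, the hypothetical `LinkCollector` class and `extract_links` helper below walk the markup with `HTMLParser` directly. This is a swapped-in illustration, not BeautifulSoup's implementation: it builds no tree, and it assumes `<a>` tags without an `href` can simply be skipped.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (href, text) pairs for every <a> tag,
    roughly what soup.find_all('a') plus link['href'] / link.get_text() give."""

    def __init__(self):
        super().__init__()
        self.links = []     # collected (href, text) tuples, in document order
        self._href = None   # href of the <a> currently open, or None
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # Assumption: anchors without an href are ignored entirely
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        # Only keep text while inside an <a> (including nested tags like <b>)
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text)))
            self._href = None


def extract_links(html, href_pattern=None):
    """Return (href, text) pairs; if href_pattern is given, keep only hrefs
    where the regex is found anywhere (re.search), like href=re.compile(...)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    if href_pattern is None:
        return parser.links
    regex = re.compile(href_pattern)
    return [(href, text) for href, text in parser.links if regex.search(href)]
```

Fed the three-sisters markup, `extract_links(html_doc, r"ill")` should keep only the Tillie link, mirroring the regex lookup in the BeautifulSoup example. In a real crawler the BeautifulSoup version is usually preferable, since it tolerates messier HTML and gives you a navigable tree rather than a flat list.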