BeautifulSoup4
Source: Internet   Editor: 程序博客网   Time: 2024/06/05 20:45
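1. When C++ calls a Python script and an exception goes unhandled, later calls to BeautifulSoup() also raise and the function exits. Handle exceptions inside the Python script itself.
For example: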
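import logging

def tableparser(server, strkey, htmltext):
    # Fallback value returned to the C++ caller if parsing fails
    restext = 'exception error'
    logtext = "testing exception....." + strkey
    logging.debug(logtext)
    try:
        # tableparserEx is the actual parser, defined elsewhere in the script
        restext = tableparserEx(server, strkey, htmltext)
    except Exception as e:
        # Swallow the exception so it never propagates back into C++
        print(e)
    return restext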
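If several functions are called from C++, the try/except in tableparser can be factored into a reusable decorator. A minimal sketch in Python 3; `guard` and `parse_demo` are hypothetical names, and `parse_demo` only stands in for a real parser such as tableparserEx:

```python
import functools
import logging

def guard(default):
    """Hypothetical helper: log any exception and return `default` instead."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # logging.exception logs at ERROR level and includes the traceback
                logging.exception("error in %s", func.__name__)
                return default
        return wrapper
    return decorator

@guard('exception error')
def parse_demo(htmltext):
    # Hypothetical stand-in for tableparserEx: fails on empty input
    if not htmltext:
        raise ValueError("empty html")
    return htmltext.upper()
```

Usage: `parse_demo('')` returns `'exception error'` rather than raising, so the embedding C++ code always receives a value.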
2. Searching with a regular expression
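import re
from bs4 import BeautifulSoup

htmltext = u"<td name = '123'><id>5中国123</id><font>my</font></td>"
htmltext = htmltext.replace('\n', '')
# In Python 2, a UTF-8 byte string would first need htmltext.decode("utf8")
# Name the parser explicitly to avoid bs4's "no parser specified" UserWarning
soup = BeautifulSoup(htmltext, "html.parser")
# Optional: [script.extract() for script in soup.find_all('script')] drops <script> tags
# string= (named text= before bs4 4.4.0) returns the first matching text node
tag_select = soup.find(string=re.compile(u'中国'))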
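A search on `string=` returns the matching text node (a NavigableString), not the tag. A small sketch in Python 3, with made-up markup, showing how to reach the enclosing tag through `.parent` and how `find_all` collects every match:

```python
import re
from bs4 import BeautifulSoup

# Made-up markup: two text nodes contain the pattern
html = '<td name="123"><id>5中国123</id><font>my</font><b>中国</b></td>'
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile('中国')  # bs4 applies pattern.search to each string

first = soup.find(string=pattern)            # first matching text node
all_matches = soup.find_all(string=pattern)  # every matching text node

# .parent gives the tag that directly contains each text node
parents = [s.parent.name for s in all_matches]
print(first, first.parent.name, parents)
```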
3. Making findAll (find_all) search only direct children instead of recursing
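from bs4 import BeautifulSoup

htmltext = "<table><tr>1<tr>11</tr></tr><tr>2</tr><tr>3</tr></table>"
htmltext = htmltext.replace('\n', '')
soup = BeautifulSoup(htmltext, "html.parser")
tagtable = soup.find('table')
# The default (recursive=True) searches all descendants;
# recursive=False looks only at the direct children of tagtable
trs = tagtable.find_all('tr', recursive=False)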
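A common real case is a table nested inside another table's cell. A sketch in Python 3 with made-up markup, comparing the default recursive search with `recursive=False` on the outer table:

```python
from bs4 import BeautifulSoup

# Made-up markup: one outer row whose cell holds an inner table
html = ("<table id='outer'><tr><td>"
        "<table id='inner'><tr><td>inner</td></tr></table>"
        "</td></tr></table>")

soup = BeautifulSoup(html, "html.parser")
outer = soup.find('table', id='outer')

all_rows = outer.find_all('tr')                   # also descends into the inner table
own_rows = outer.find_all('tr', recursive=False)  # direct children of outer only

print(len(all_rows), len(own_rows))
```

Note that this depends on the parser: html.parser keeps the markup as written, while html5lib inserts `<tbody>` the way a browser does, so there the direct children of the table would be `tbody` elements rather than `tr`.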
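4. Crash when returning Chinese text to the C++ caller
import sys
reload(sys)  # Python 2 only: restores sys.setdefaultencoding, which site.py deletes
sys.setdefaultencoding("utf-8")  # Python 3 dropped this; its default is already utf-8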
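Python 3 has no sys.setdefaultencoding. One alternative, sketched under the assumption that the C++ side expects UTF-8 bytes: encode the result explicitly before returning it, so the caller can read it with `PyBytes_AsString` instead of relying on a default codec. `to_utf8` is a hypothetical helper name:

```python
def to_utf8(text):
    """Hypothetical helper: return text as UTF-8 bytes for a C/C++ caller."""
    if isinstance(text, bytes):
        # Assume bytes are already UTF-8 and pass them through unchanged
        return text
    return text.encode('utf-8')

result = to_utf8('中国')
print(result)
```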