BeautifulSoup4

来源:互联网 发布:如何保持精力充沛 知乎 编辑:程序博客网 时间:2024/06/05 20:45

1. C++调用python脚本时,如果有异常没有处理,之后再调用BeautifulSoup()函数会出现异常,导致函数退出,所以在python脚本上要处理异常

如:

def tableparser(server, strkey, htmltext):
restext = 'exception error'
logtext = "testing exception....." + strkey
logging.debug(logtext)
try:
restext = tableparserEx(server, strkey, htmltext)
except Exception , e:
print e
return restext

 

2.使用正则表达式查找

htmltext = "<td name = '123'><id>5中国123</id><font>my</font><td>"
 htmltext = htmltext.replace('\n', '')
 #htmltext = htmltext.decode("utf8")
 soup = BeautifulSoup(htmltext)
 #[script.extract() for script in soup.findAll('script')] 
 tag_select = soup.find(text = re.compile(u'中国'))

 

3.finaAll不要递归查找,只查找直接子节点

htmltext = "<table><tr>1<tr>11</tr></tr><tr>2</tr><tr>3</tr></table>"
 htmltext = htmltext.replace('\n', '')
 soup = BeautifulSoup(htmltext)
 tagtable = soup.find('table')
 trs = tagtable.findAll('tr', recursive=False) # 默认是递归查找所有的子节点



4.C++调用后,返回中文遇到崩溃

reload(sys) 
sys.setdefaultencoding("utf-8")

0 0
原创粉丝点击