Some small questions about using HTMLParser in Python

Source: Internet | Published by: 阿里云 vpn | Editor: 程序博客网 | Time: 2024/05/14 13:53

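Over the past few days I have been learning how to use HTMLParser. The sample code I found all works by overriding the methods whose names start with handle. handle_starttag and handle_endtag are called when the parser encounters a start tag and an end tag, respectively. handle_data, however, was less clear to me. The documentation I found online says: "This method is called to process arbitrary data". I did not really understand what "arbitrary data" meant here. In the spirit of learning by doing, I wrote a short piece of code to test it: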

from HTMLParser import HTMLParser  # Python 2 module; in Python 3 it is html.parser

class MyParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)

    def handle_starttag(self, tag, attrs):
        # Called for each opening tag
        print tag

    def handle_data(self, data):
        # Called for the text found between tags
        print 'process'

    def handle_endtag(self, tag):
        # Called for each closing tag
        print tag

file = open('hehehe.txt', 'r')
data = file.read()
file.close()

mp = MyParser()
mp.feed(data)

The contents of hehehe.txt:

<html>
<head>
<title>sdfadf</title>
</head>
<body>
<span>123</span>
<div>123</div>
<span>123</span>
</body>
</html>

Result:

html
process
head
process
title
process
title
process
head
process
body
process
span
process
span
process
div
process
div
process
span
process
span
process
body
process
html

But when I put all the tags on a single line (and left the div empty), that is:

<html><head><title>sdfadf</title></head><body><span>123</span><div></div><span>123</span></body></html>

the output is:

html
head
title
process
title
head
body
span
process
span
div
div
span
process
span
body
html

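Observation: when there is nothing between two adjacent tags, the program does not print "process". Conversely, whenever there is any content between two adjacent tags (a newline counts too), the program prints "process", which means handle_data was called. If anything here is wrong, corrections are welcome.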
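The observation can be checked more directly by recording the text that handle_data receives instead of just printing "process". The following is a minimal sketch, not the code from the post: it assumes Python 3, where the module is named html.parser, and the names RecordingParser and parse_events are hypothetical helpers introduced only for this illustration.

```python
from html.parser import HTMLParser  # Python 3 name of the Python 2 HTMLParser module


class RecordingParser(HTMLParser):
    """Hypothetical helper: records each callback instead of printing."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Store the raw text so whitespace-only runs become visible
        self.events.append(("data", data))


def parse_events(markup):
    """Feed markup to a fresh parser and return the recorded callbacks."""
    parser = RecordingParser()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end of the input
    return parser.events


one_line = "<div></div><span>123</span>"
multi_line = "<div>\n</div>\n<span>123</span>"

for label, markup in (("one line", one_line), ("multi-line", multi_line)):
    print(label)
    for kind, value in parse_events(markup):
        # repr() shows a newline as '\n' instead of an empty-looking line
        print("  ", kind, repr(value))
```

In the multi-line case the recorded data includes the '\n' strings between tags, which is what produced the extra "process" lines. If whitespace-only runs are not wanted, handle_data can skip them with a check such as `if data.strip():`.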

0 0