Python: A Neat Way to Generate HTML Markup

Source: Internet | Posted by: 夜来风雨声花落知多少 | Editor: 程序博客网 | Time: 2024/05/17 05:07

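This is the first project I worked through while learning Python: the text markup project, which is the first project at the end of the Python tutorial. I am writing it up here to share it with everyone.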

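1. First we need a file, util.py, with the following code:

'''
Created on 2012-7-30

@author: mars
'''
def lines(file):
    # Yield every line of the file, then one extra blank line as a sentinel
    for line in file:
        yield line
    yield '\n'

def blocks(file):
    # Collect consecutive non-blank lines; a blank line closes the block
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []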

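Its main job is to split the text into blocks. Note the use of yield here, which makes both functions generators. The lines function takes a file as its argument, yields each line (for line in file: yield line), and finally runs

yield '\n'

This appends one blank line to the end of the text. Without it, blocks would never see a blank line after the final block, so the final block would never be yielded.

The blocks function does the actual splitting. block=[] starts out as an empty list, and the for loop examines each line. A non-blank line is appended to the current block. A blank line, when the block is non-empty (the elif branch), ends the block: the collected lines are joined with ''.join (an empty separator, since every line still carries its own newline), stripped, and yielded, and then block is reset to an empty list.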
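To see the splitting in action, here is a minimal sketch that feeds the two generators from an in-memory file. It assumes Python 3, and the sample text and the use of io.StringIO in place of a real file are my own choices for illustration, not part of the original project.

```python
import io

def lines(file):
    # Pass every line through, then add one blank line as a sentinel
    for line in file:
        yield line
    yield '\n'

def blocks(file):
    # Collect non-blank lines; a blank line closes the current block
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []

# Hypothetical sample input: three blocks, with no trailing blank line
sample = io.StringIO("Welcome\n\nFirst line\nsecond line\n\n- an item")
result = list(blocks(sample))
print(result)
```

The last block is still produced even though the sample does not end with a blank line, which is exactly what the extra yield '\n' in lines is for.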

2. Next up is the handler, in handlers.py (the file name has to match the import in markup.py). The code is as follows:

'''
Created on 2012-7-12  
 
@author: mars  
''' 
class Handler:  
    def callback(self,prefix,name,*args):  
        method=getattr(self,prefix+name,None)  
        if callable(method):  
            return method(*args)  
    def start(self,name):  
        self.callback('start_', name)   
    def end(self,name):     
        self.callback("end_", name)    
      
class HTMLRenderer(Handler):  
    def start_document(self):  
        print('<html><head><title>..</title></head><body>')  
    def end_document(self):  
        print('</body></html>')  
    def start_paragraph(self):  
        print('<p>')  
    def end_paragraph(self):  
        print('</p>')  
    def start_heading(self):  
        print('<h2>')  
    def end_heading(self):  
        print('</h2>')   
    def start_listitem(self):  
        print('<li>')  
    def end_listitem(self):  
        print('</li>')      
    def start_list(self):  
        print('<ul>')  
    def end_list(self):  
        print('</ul>')  
    def start_title(self):  
        print('<h1>')  
    def end_title(self):  
        print('</h1>')  
    def feed(self,data):  
        print(data)
 
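    def sub(self,name):
        # Build a replacement function that re.sub can call for each match
        def substitution(match):
            result=self.callback("sub_", name,match)
            # No matching sub_ method: leave the matched text unchanged
            if result is None:
                return match.group(0)
            return result
        return substitution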
       
    def sub_emphasis(self,match):  
        return '<em>%s</em>' %match.group(1)   
    def sub_url(self,match):  
        return '<a href="%s">%s</a>' %(match.group(1) ,match.group(1))  
    def sub_mail(self,match):
        return '<a href="mailto:%s">%s</a>' %(match.group(1) ,match.group(1))

The part most worth discussing here is callback:

    def callback(self,prefix,name,*args):
        method=getattr(self,prefix+name,None)
        if callable(method):
            return method(*args)

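The getattr on the right-hand side of method is a built-in: getattr(object, name[, default]) -> value. It returns the attribute called name, or default (here None) when the object has no such attribute.

The callable below it is also a built-in: callable(object) -> bool. When the check passes, callback returns method(*args). For example, if prefix is 'start_' and name is 'document', callback looks up start_document and returns the result of calling it. If no such method exists, callback does nothing and returns None.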

HTMLRenderer below inherits from Handler, and its methods should be clear at a glance.
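The name-based dispatch can be tried in isolation. The following is a small sketch, not the project's own renderer: DemoRenderer is a hypothetical subclass that returns its tag instead of printing it, so that the return value of callback is visible.

```python
class Handler:
    def callback(self, prefix, name, *args):
        # Look up e.g. "start_" + "paragraph"; None if there is no such attribute
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)

class DemoRenderer(Handler):
    # Hypothetical subclass for illustration: returns the tag instead of printing
    def start_paragraph(self):
        return '<p>'

renderer = DemoRenderer()
found = renderer.callback('start_', 'paragraph')   # calls start_paragraph()
missing = renderer.callback('start_', 'table')     # no start_table: returns None
print(found, missing)
```

Because unknown names are silently ignored, a renderer only has to define the start_, end_ and sub_ methods it actually cares about.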

3. Then come the rules, in rules.py. The code is as follows:

'''
Created on 2012-7-12  
 
@author: mars  
''' 
class Rule:  
    def action(self,block,handler):  
        handler.start(self.type)  
        handler.feed(block)  
        handler.end(self.type)  
        return True 
class HeadingRule(Rule):  
    type='heading' 
    def condition(self,block):  
        return '\n' not in block and len(block)<=70 and not block[-1]==':' 
class TitleRule(HeadingRule):  
    type='title' 
    first=True 
    def condition(self,block):  
        if not self.first: return False 
        self.first=False 
        return HeadingRule.condition(self, block)  
class ListItemRule(Rule):  
    type='listitem' 
    def condition(self,block):  
        return block[0]=='-' 
    def action(self,block,handler):  
        handler.start(self.type)  
        handler.feed(block[1:].strip())  
        handler.end(self.type)  
        return True              
class ListRule(ListItemRule):  
    type='list' 
    inside=False 
    def condition(self,block):  
        return True     
    def action(self,block,handler):  
        if  not self.inside and ListItemRule.condition(self, block):  
            handler.start(self.type)  
            self.inside=True 
        elif self.inside and not ListItemRule.condition(self, block):  
            handler.end(self.type)  
            self.inside=False 
        return False   
class ParagraphRule(Rule):  
    type='paragraph'     
    def condition(self,block):  
        return True

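This part is simple too if you read it carefully. Rule is the base class: its action calls the handler's start, feed (which hands over the block's data), and end, and then returns True to signal that no further rules should be tried for this block. The subclasses inherit from it and set type to the kind of tag they need.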
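As a rough illustration of how a rule's condition and action fit together, the sketch below runs HeadingRule against a few sample blocks. RecordingHandler is a hypothetical stand-in for HTMLRenderer that records the calls it receives instead of printing HTML, and the sample strings are made up.

```python
class Rule:
    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block)
        handler.end(self.type)
        return True

class HeadingRule(Rule):
    type = 'heading'
    def condition(self, block):
        # A heading is a single line, at most 70 characters, not ending in a colon
        return '\n' not in block and len(block) <= 70 and not block[-1] == ':'

class RecordingHandler:
    # Hypothetical handler for illustration: records calls instead of printing
    def __init__(self):
        self.events = []
    def start(self, name):
        self.events.append('start ' + name)
    def feed(self, data):
        self.events.append('feed ' + data)
    def end(self, name):
        self.events.append('end ' + name)

rule = HeadingRule()
handler = RecordingHandler()
checks = [rule.condition('A short heading'),
          rule.condition('Ends with a colon:'),
          rule.condition('two\nlines')]
stop = rule.action('A short heading', handler)
print(checks, stop, handler.events)
```

Only the first sample qualifies as a heading, and the action wraps the block in a start/feed/end sequence for the rule's type.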

4. Finally there is markup.py, the main program, which is mostly about the filters:

'''
Created on 2012-7-12  
 
@author: mars  
''' 
import sys,re  
from handlers import *  
from rules import *  
from util import *  
class Parser:  
    def __init__(self,handler):  
        self.handler=handler  
        self.rules=[]  
        self.filters=[]  
    def addRule(self,rule):  
        self.rules.append(rule)  
    def addFilter(self,pattern,name):  
        def filter(block,handler):  
            return re.sub(pattern,handler.sub(name),block)  
        self.filters.append(filter)  
    def parse(self,file):  
        self.handler.start('document')  
        for block in blocks(file):  
            for filter in self.filters:  
                block=filter(block,self.handler)  
            for rule in self.rules:  
                if rule.condition(block):  
                    last=rule.action(block,self.handler)  
                    if last:  
                        break    
        self.handler.end('document')  
class BasicTextParser(Parser):  
    def __init__(self,handler):  
        Parser.__init__(self,handler)  
        self.addRule(ListRule())  
        self.addRule(ListItemRule())  
        self.addRule(TitleRule())  
        self.addRule(HeadingRule())  
        self.addRule(ParagraphRule())  
          
        self.addFilter(r'\*(.+?)\*', 'emphasis')  
        self.addFilter(r'(http://[\.a-zA-Z/]+)', 'url')  
        self.addFilter(r'([\.a-zA-Z]+@[\.a-zA-Z]+[a-zA-Z]+)', 'mail')  
handler=HTMLRenderer()  
parser=BasicTextParser(handler)  
parser.parse(sys.stdin)

The main thing I want to discuss is this:

def parse(self,file):
        self.handler.start('document')
        for block in blocks(file):
            for filter in self.filters:
                block=filter(block,self.handler)  # rebind block to the filtered result
            for rule in self.rules:
                if rule.condition(block):
                    last=rule.action(block,self.handler)
                    if last:
                        break
        self.handler.end('document')

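This method starts by calling the handler's start('document') and finishes by calling end('document'). In between, it iterates over every block in the text file and applies the filters and then the rules to each one. Applying a filter just means calling the filter function with the block and the handler, and then rebinding block to the result.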
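The filters rely on the fact that re.sub accepts a function as the replacement and calls it once per match. Here is a minimal sketch of that mechanism using the emphasis and URL patterns from the listing. It uses plain functions in place of handler.sub(name), and the sample sentence is made up.

```python
import re

def emphasis(match):
    # Replacement function: re.sub calls it once for every match
    return '<em>%s</em>' % match.group(1)

def url(match):
    return '<a href="%s">%s</a>' % (match.group(1), match.group(1))

# Same patterns as in BasicTextParser, paired with plain functions
filters = [
    (r'\*(.+?)\*', emphasis),
    (r'(http://[\.a-zA-Z/]+)', url),
]

block = 'Visit http://www.python.org for *more* info.'
for pattern, func in filters:
    block = re.sub(pattern, func, block)   # rebind block to the filtered text
print(block)
```

Each filter sees the output of the previous one, which is why parse rebinds block inside the loop rather than keeping the original text.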

This article comes from the "LuoZhengWu" blog. Please keep this source attribution: http://brucemars.blog.51cto.com/5288106/960816

0 0