Reading the mako source (1): Python code formatting

Source: Internet. Editor: 程序博客网. Time: 2024/04/28 11:40

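mako is a high-performance Python template library. It renders a template by compiling it into Python code and executing that code. mako's git repository starts from an empty repository, so let's walk through it stage by stage and see how mako reached its current maturity.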

The first mako commit does two things:

Analyze variables
Format the generated code

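This is the directory listing, and there is very little in it. Today's powerful mako grew out of just these few files, and you can watch the whole process unfold. I'm thrilled. Thanks, open source! Thanks, github! Thanks to the Party! Thanks to the Celestial Dynasty!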

The test directory contains the unit test cases.

class AstParseTest(unittest.TestCase):
    def setUp(self):
        pass
    def tearDown(self):
        pass
    def test_locate_identifiers(self):
        """test the location of identifiers in a python code string"""
        code = """
a = 10
b = 5
c = x * 5 + a + b + q
(g,h,i) = (1,2,3)
[u,k,j] = [4,5,6]
foo.hoho.lala.bar = 7 + gah.blah + u + blah
for lar in (1,2,3):
    gh = 5
    x = 12
print "hello world, ", a, b
print "Another expr", c
"""
        parsed = ast.PythonCode(code)
        assert parsed.declared_identifiers == util.Set(['a','b','c', 'g', 'h', 'i', 'u', 'k', 'j', 'gh', 'lar'])
        assert parsed.undeclared_identifiers == util.Set(['x', 'q', 'foo', 'gah', 'blah'])

        parsed = ast.PythonCode("x + 5 * (y-z)")
        assert parsed.undeclared_identifiers == util.Set(['x', 'y', 'z'])
        assert parsed.declared_identifiers == util.Set()

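The code under test has to work out which variables are declared and which are undeclared. I had no idea how to approach that. Syntax analysis? (That reminds me of the compiler principles course I'm taking right now. Kneeling and quietly burning a stick of incense so I don't fail the final.)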

Let's see how mako implements this.

class PythonCode(object):
    """represents information about a string containing Python code"""
    def __init__(self, code):
        self.code = code
        self.declared_identifiers = util.Set()
        self.undeclared_identifiers = util.Set()

        expr = parse(code, "exec")
        class FindIdentifiers(object):
            def visitAssName(s, node, *args, **kwargs):
                if node.name not in self.undeclared_identifiers:
                    self.declared_identifiers.add(node.name)
            def visitName(s, node, *args, **kwargs):
                if node.name not in __builtins__ and node.name not in self.declared_identifiers:
                    self.undeclared_identifiers.add(node.name)
        f = FindIdentifiers()
        visitor.walk(expr, f)

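It simply leans on modules from the Python standard library to find the declared and undeclared variables in the code. Truly cle(la)ver(zy). And here I was, getting ready to apply the whole compiler theory toolkit (┬＿┬)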

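Both visitor and parse come from the standard library (the Python 2 compiler package). This also shows what counts as each kind of variable. Declared: a name that gets assigned to, unless it was already recorded as undeclared. Undeclared: a name that is read and is found neither in __builtins__ nor among the names declared so far.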
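For readers on Python 3, where the Python 2 compiler package no longer exists, roughly the same idea can be sketched with the standard ast module. This is my own approximation, not mako's code: the names IdentifierFinder and find_identifiers are made up for illustration, and the sample uses print(...) calls because the Python 2 print statement no longer parses.

```python
import ast
import builtins


class IdentifierFinder(ast.NodeVisitor):
    """Hypothetical Python 3 counterpart of the FindIdentifiers visitor."""

    def __init__(self):
        self.declared = set()
        self.undeclared = set()

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store):
            # Assignment target: declared, unless it was already read earlier.
            if node.id not in self.undeclared:
                self.declared.add(node.id)
        elif isinstance(node.ctx, ast.Load):
            # A read: undeclared if it is neither a builtin nor declared so far.
            if not hasattr(builtins, node.id) and node.id not in self.declared:
                self.undeclared.add(node.id)


def find_identifiers(code):
    """Return (declared, undeclared) name sets for a string of Python code."""
    finder = IdentifierFinder()
    finder.visit(ast.parse(code, mode="exec"))
    return finder.declared, finder.undeclared


SAMPLE = """
a = 10
b = 5
c = x * 5 + a + b + q
(g, h, i) = (1, 2, 3)
[u, k, j] = [4, 5, 6]
foo.hoho.lala.bar = 7 + gah.blah + u + blah
for lar in (1, 2, 3):
    gh = 5
    x = 12
print("hello world, ", a, b)
"""

declared, undeclared = find_identifiers(SAMPLE)
print(sorted(declared))
print(sorted(undeclared))
```

Note that x ends up undeclared even though it is assigned inside the loop, because it was read before any assignment. That mirrors the check in visitAssName.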

On to the next test case.

def test_generate_normal(self):
    stream = StringIO()
    printer = PythonPrinter(stream)
    printer.print_python_line("import lala")
    printer.print_python_line("for x in foo:")
    printer.print_python_line("print x")
    printer.print_python_line(None)
    printer.print_python_line("print y")
    assert stream.getvalue() == \
"""import lala
for x in foo:
    print x
print y
"""

We feed it code one line at a time, and it formats (indents) the code for us automatically.

Let's find the PythonPrinter class.

import re, string

class PythonPrinter(object):
    ......

    def print_python_line(self, line, is_comment=False):
        """print a line of python, indenting it according to the current indent level.

        this also adjusts the indentation counter according to the content of the line."""
        if not self.in_indent_lines:
            self._flush_adjusted_lines()
            self.in_indent_lines = True

        decreased_indent = False

        if (line is None or
            re.match(r"^\s*#",line) or
            re.match(r"^\s*$", line)
            ):
            hastext = False
        else:
            hastext = True

        if (not decreased_indent and
            not is_comment and
            (not hastext or self._is_unindentor(line))
            ):

            if self.indent > 0:
                self.indent -=1
                if len(self.indent_detail) == 0:
                    raise "Too many whitespace closures"
                self.indent_detail.pop()

        if line is None:
            return

        self.stream.write(self._indent_line(line) + "\n")

        if re.search(r":[ \t]*(?:#.*)?$", line):
            match = re.match(r"^\s*(if|try|elif|while|for)", line)
            if match:
                indentor = match.group(1)
                self.indent +=1
                self.indent_detail.append(indentor)
            else:
                indentor = None
                m2 = re.match(r"^\s*(def|class|else|elif|except|finally)", line)
                if m2:
                    self.indent += 1
                    self.indent_detail.append(indentor)
    ......

in_indent_lines is set to False in __init__, so let's first look at what self._flush_adjusted_lines() does.

def _flush_adjusted_lines(self):
    stripspace = None
    self._reset_multi_line_flags()

    for entry in self.line_buffer:
        if self._in_multi_line(entry):
            self.stream.write(entry + "\n")
        else:
            entry = string.expandtabs(entry)
            if stripspace is None and re.search(r"^[ \t]*[^# \t]", entry):
                stripspace = re.match(r"^([ \t]*)", entry).group(1)
            self.stream.write(self._indent_line(entry, stripspace) + "\n")

    self.line_buffer = []
    self._reset_multi_line_flags()

This method formats and writes out everything in self.line_buffer. Its main job is to handle multi-line constructs and indentation.

So what is this line_buffer? Let's search the whole file.

def print_adjusted_line(self, line):
    self.in_indent_lines = False
    for l in re.split(r'\r?\n', line):
        self.line_buffer.append(l)

This is the only place where elements are added to it. The method is mainly responsible for queuing up a whole block of code.

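Let's look at multi-line handling first. self._in_multi_line(entry) decides whether a line belongs to a multi-line construct. So when can Python code span several lines?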

value = \
False

And also:

"""I can
span lines"""
'''I can
span lines too'''

(That is 跨行 as in Python code spanning lines, not the inter-bank 跨行 of withdrawing cash from another bank.)

So~~ let's see how mako detects the multi-line state.

def _in_multi_line(self, line):
    current_state = (self.backslashed or self.triplequoted)

    if re.search(r"\\$", line):
        self.backslashed = True
    else:
        self.backslashed = False

    triples = len(re.findall(r"\"\"\"|\'\'\'", line))
    if triples == 1 or triples % 2 != 0:
        self.triplequoted = not self.triplequoted

    return current_state

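Note that it returns the state recorded by the previous call. The function answers whether the current line sits inside a multi-line construct, and along the way it uses regular expressions to work out whether the next line will.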
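To make the "return the old state, then update it" behaviour concrete, here is a small standalone sketch of the same logic. The class name MultiLineTracker and the sample lines are my own and are not part of mako.

```python
import re


class MultiLineTracker:
    """Hypothetical standalone version of the _in_multi_line bookkeeping."""

    def __init__(self):
        self.backslashed = False
        self.triplequoted = False

    def in_multi_line(self, line):
        # The answer for THIS line is the state left behind by the previous one.
        current_state = self.backslashed or self.triplequoted

        # A trailing backslash means the NEXT line is a continuation.
        self.backslashed = bool(re.search(r"\\$", line))

        # An odd number of triple quotes opens or closes a multi-line string.
        triples = len(re.findall(r"\"\"\"|\'\'\'", line))
        if triples % 2 != 0:
            self.triplequoted = not self.triplequoted

        return current_state


tracker = MultiLineTracker()
lines = ['value = \\', 'False', 'doc = """first', 'second"""', 'x = 1']
states = [tracker.in_multi_line(line) for line in lines]
print(states)  # [False, True, False, True, False]
```

The line that opens a continuation is itself reported as a normal line. Only the lines after it are flagged, which is exactly what the indentation code needs.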

If a line is inside a multi-line construct, its indentation is left alone. That is why this check is needed.

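Now for the branch where the line is not part of a multi-line construct, which is where indentation has to be controlled strictly.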

entry = string.expandtabs(entry)
if stripspace is None and re.search(r"^[ \t]*[^# \t]", entry):
    stripspace = re.match(r"^([ \t]*)", entry).group(1)
self.stream.write(self._indent_line(entry, stripspace) + "\n")

And what is self._indent_line?

def _indent_line(self, line, stripspace = ''):
    return re.sub(r"^%s" % stripspace, self.indentstring * self.indent, line)

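While stripspace is still None, meaning the for entry in self.line_buffer: loop has not yet reached the first real line of code, the following statement runs on the first line that contains code:

stripspace = re.match(r"^([ \t]*)", entry).group(1)

After that, stripspace holds the leading whitespace of the first line of our code block.

With that leading whitespace extracted, _indent_line replaces it with the indentation that is currently required, self.indentstring * self.indent, that is, self.indent * 4 spaces.

The following lines are shifted by the same amount, so they keep their indentation relative to the first line.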
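The re-indentation step can be tried in isolation. The sketch below is a simplified illustration under my own assumptions: the function name reindent_block is hypothetical, the indent string is assumed to be four spaces, and the multi-line check is left out.

```python
import re


def reindent_block(block, indent_level, indentstring="    "):
    """Shift a block so its first code line sits at the given indent level.

    Simplified sketch: unlike _flush_adjusted_lines, this ignores
    backslash continuations and triple-quoted strings.
    """
    stripspace = None
    out = []
    for entry in block.split("\n"):
        entry = entry.expandtabs()
        # Remember the leading whitespace of the first non-comment code line.
        if stripspace is None and re.search(r"^[ \t]*[^# \t]", entry):
            stripspace = re.match(r"^([ \t]*)", entry).group(1)
        # Swap that prefix for the indentation we actually want.
        out.append(re.sub(r"^%s" % (stripspace or ""),
                          indentstring * indent_level, entry))
    return "\n".join(out)


block = "        x = 1\n        if x:\n            y = 2"
result = reindent_block(block, 1)
print(result)
```

Here the block arrives indented by eight spaces and leaves indented by four, while y = 2 keeps its extra level relative to the if line.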

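This probably explains something you notice when using mako: the Python code in different <% %> blocks does not have to be carefully aligned, which is very convenient.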

So the overall job of _flush_adjusted_lines is to shift a whole block of code to the indentation where it belongs.

That method settles old accounts, since pending work shouldn't be left to pile up. Now let's look at the code that follows.

if (line is None or
    re.match(r"^\s*#",line) or
    re.match(r"^\s*$", line)
    ):
    hastext = False
else:
    hastext = True

# see if this line should decrease the indentation level
if (not decreased_indent and
    not is_comment and
    (not hastext or self._is_unindentor(line))
    ):

    if self.indent > 0:
        self.indent -=1
        # if the indent_detail stack is empty, the user
        # probably put extra closures - the resulting
        # module wont compile.
        if len(self.indent_detail) == 0:
            raise "Too many whitespace closures"
        self.indent_detail.pop()

if line is None:
    return

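If the current line has no text (it is None, blank, or, as far as this code is concerned, a comment), the current block is closed. Think of how the interactive interpreter decides that a function definition has ended. It works the same way.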

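There is one more case. Certain keywords, such as else and except, have to sit at the same indentation as the statement that opened the block. That is what the _is_unindentor method checks.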

def _is_unindentor(self, line):
    """return true if the given line is an 'unindentor', relative to the last 'indent' event received."""

    if len(self.indent_detail) == 0:
        return False

    indentor = self.indent_detail[-1]

    if indentor is None:
        return False

    match = re.match(r"^\s*(else|elif|except|finally)", line)
    if not match:
        return False

    return True

In both cases the current indent level is decreased by one,

and then the line is written out at the indentation it should have~

Once the line has been written, there is one more piece of code.

if re.search(r":[ \t]*(?:#.*)?$", line):
    match = re.match(r"^\s*(if|try|elif|while|for)", line)
    if match:
        indentor = match.group(1)
        self.indent +=1
        self.indent_detail.append(indentor)
    else:
        indentor = None
        m2 = re.match(r"^\s*(def|class|else|elif|except|finally)", line)
        if m2:
            self.indent += 1
            self.indent_detail.append(indentor)

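We still have to check whether the current line ends with a colon and starts with one of the keywords that require the next line to be indented.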

After all of that, self.stream contains neatly formatted Python code.
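Putting the dedent check, the write, and the indent check together, the line-by-line flow can be reproduced with a stripped-down printer. This is a sketch under my own assumptions, not mako's PythonPrinter. The class MiniPrinter is hypothetical, it drops the line_buffer flushing and the is_comment flag, it raises ValueError instead of a string, and the sample uses Python 3 print(...) calls.

```python
import io
import re


class MiniPrinter:
    """Hypothetical, simplified stand-in for PythonPrinter."""

    def __init__(self, stream, indentstring="    "):
        self.stream = stream
        self.indentstring = indentstring
        self.indent = 0
        self.indent_detail = []  # stack of keywords that opened each level

    def _is_unindentor(self, line):
        if not self.indent_detail or self.indent_detail[-1] is None:
            return False
        return re.match(r"^\s*(else|elif|except|finally)", line) is not None

    def print_line(self, line):
        hastext = not (line is None
                       or re.match(r"^\s*#", line)
                       or re.match(r"^\s*$", line))

        # Step 1: close a block on an empty line or on else/except/...
        if not hastext or self._is_unindentor(line):
            if self.indent > 0:
                self.indent -= 1
                if not self.indent_detail:
                    raise ValueError("Too many whitespace closures")
                self.indent_detail.pop()

        if line is None:
            return

        # Step 2: write the line at the current indentation.
        self.stream.write(self.indentstring * self.indent + line + "\n")

        # Step 3: a trailing colon after a block keyword opens a new level.
        if re.search(r":[ \t]*(?:#.*)?$", line):
            match = re.match(r"^\s*(if|try|elif|while|for)", line)
            if match:
                self.indent += 1
                self.indent_detail.append(match.group(1))
            elif re.match(r"^\s*(def|class|else|elif|except|finally)", line):
                self.indent += 1
                self.indent_detail.append(None)


stream = io.StringIO()
printer = MiniPrinter(stream)
for source_line in ["import lala", "for x in foo:", "print(x)", None, "print(y)"]:
    printer.print_line(source_line)
output = stream.getvalue()
print(output)
```

Feeding it the same sequence as test_generate_normal gives the same shape of output: the body of the for loop is indented, and the None closes the block again.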

0 0