python实现简易模板引擎

本文章翻译自A-template-engine.

该项目的作者为Ned Batchelder，他是一位经验丰富的软件工程师，是coverage.py(一个代码覆盖度检测工具)以及很多开源库的作者，同时也是一位享誉盛名的教育家。在这个项目中，他很短短几百行，实现了一个功能齐全的模板引擎。

什么是模板引擎

模板引擎（特指用于web开发的模板引擎），是为了使用户界面与业务数据分离而产生的，用于网站的模板引擎会根据传入的参数生成标准HTML文档。

通俗点讲，模板引擎就是动态生成HTML文件的东西，以博客网站为例，每个用户看到的界面内容可能都是不一样的，但大体框架是相同的，如果这些包含内容的HTML文件全是以静态形式储存在服务器的，那么当网站用户足够多的时候，服务器将会不堪重负。而模板引擎就是一个能根据用户动态生成HTML文件的东西。

初步探索

假设我们想要生成以下HTML内容：

<p>Welcome, Charlie!</p>
<p>Products:</p>
<ul>
    <li>Apple: $1.00</li>
    <li>Fig: $1.50</li>
    <li>Pomegranate: $3.25</li>
</ul>

在这里，用户的名字（这里是 Charlie）和商品以及商品价格都是动态生成的。那么我们如何使用python生成这些内容呢？我们可以把要生成的html当成一个字符串，而其中动态生成的部分使用参数传入：

PAGE_HTML = """
<p>Welcome, {name}!</p>
<p>Products:</p>
<ul>
{products}
</ul>
"""

# The HTML for each product displayed.
PRODUCT_HTML = "<li>{prodname}: {price}</li>\n"

def make_page(username, products):
    product_html = ""
    for prodname, price in products:
        product_html += PRODUCT_HTML.format(
            prodname=prodname, price=format_price(price))
    html = PAGE_HTML.format(name=username, products=product_html)
    return html

这是可行的，但这样的方法很容易造成混乱，尤其是当网页很大结构复杂时，维护困难。

使用模板引擎

为了避免出现上述问题，我们可以使用模板引擎动态生成页面。也就是说，网站的一个页面对应一个html模板，其中动态部分的数据会使用特殊的语法表示，我们使用python处理这些语法。以上面的例子，一个模板引擎应该如下：

<p>Welcome, !</p>
<p>Products:</p>
<ul>

</ul>

比起我们实现的第一个版本，这个版本大部分都是html代码，大括号表示动态生成数据的部分。

支持的语法

编写模板之前，我们要编写的模板支持以下语法：

变量：
```
Welcome, {{user_name}}!
```
两个大括号之间的值表示变量，模板生成html时应该将其作为参数传入。
通过点号访问字典值，对象属性以及对象方法：
```
The price is: {{product.price}}, with a {{product.discount}}% discount.
```
比如上述price和discount都是product的属性
使用函数修改值：
```
Short name: {{story.subject|slugify|lower}}
```
其中管道连接的字符串表示为调用函数修改变量值。

支持if语句和for语句：

{% if user.is_logged_in %}
    <p>Welcome, {{ user.name }}!</p>
{% endif %}

if语句可用来判断用户是否登录。

<p>Products:</p>
<ul>
{% for product in product_list %}
    <li>{{ product.name }}: {{ product.price|format_price }}</li>
{% endfor %}
</ul>

for语句用于循环遍历

支持注释：
```
{# This is the best template ever! #}
```

实现方法

如何实现模板引擎呢？一个常用的方法是读入模板，然后将模板生成python代码，执行这些代码时，就会生成我们想要的结果。以上面的模板为例：

<p>Welcome, {{user_name}}!</p>
<p>Products:</p>
<ul>
{% for product in product_list %}
    <li>{{ product.name }}:
        {{ product.price|format_price }}</li>
{% endfor %}
</ul>

我们的引擎将会生成类似以下的代码：

def render_function(context, do_dots):
    c_user_name = context['user_name']
    c_product_list = context['product_list']
    c_format_price = context['format_price']

    result = []
    append_result = result.append
    extend_result = result.extend
    to_str = str

    extend_result([
        '<p>Welcome, ',
        to_str(c_user_name),
        '!</p>\n<p>Products:</p>\n<ul>\n'
    ])
    for c_product in c_product_list:
        extend_result([
            '\n    <li>',
            to_str(do_dots(c_product, 'name')),
            ':\n        ',
            to_str(c_format_price(do_dots(c_product, 'price'))),
            '</li>\n'
        ])
    append_result('\n</ul>\n')
    return ''.join(result)

每一个模板都会转化成一个render_function函数，这个函数context参数是一个字典，

函数的开头三行是读入参数的过程，每个context数据值被读到以’c_‘前缀开头的局部变量中，使用这样的前缀是为了避免与后面的变量发生变量名重复。

result列表是一个字符串列表，是我们最后要返回的数据，也就是生成html文件所需要的内容。使用类似下面风格的代码：

append_result = result.append
extend_result = result.extend
to_str = str

是为了将函数存储在局部变量里，减少函数调用所需要的时间，当需要大量调用该函数时，可以有效的提高代码运行的效率。

后面我们通过不断调用append_result和extend_result函数生成result，最后返回它的字符串形式。

开始动手

模板类

我们的模板引擎的核心是模板类，我们的目标是可以使用一个字符串以及参数组成的字典来初始化它，同时调用它的render方法就能得到渲染结果：

# Make a Templite object.
templite = Templite('''
    <h1>Hello !</h1>
    
    ''',
    {'upper': str.upper},
)

# Later, use it to render some data.
text = templite.render({
    'name': "Ned",
    'topics': ['Python', 'Geometry', 'Juggling'],
})

下面是模板类的具体实现过程，为了使代码模块化，我们再定义另一个类：CodeBuilder。

CodeBuilder类的实现

该类主要是为了方便我们创建生成python代码，它可以添加多行代码，处理缩进，最后还可以返回编译后的python代码：

class CodeBuilder(object):
    """Build source code conveniently."""

    def __init__(self, indent=0):
        self.code = []
        self.indent_level = indent

add_line函数添加一个新行代码，同时注意缩进的处理：

    def add_line(self, line):
        """Add a line of source to the code.

        Indentation and newline will be added for you, don't provide them.

        """
        self.code.extend([" " * self.indent_level, line, "\n"])

indent和dedent函数增大和减少缩进：

    INDENT_STEP = 4      

    def indent(self):
        """Increase the current indent for following lines."""
        self.indent_level += self.INDENT_STEP

    def dedent(self):
        """Decrease the current indent for following lines."""
        self.indent_level -= self.INDENT_STEP

add_section函数返回另一个CodeBuilder对象，这样当我们想在特定的位置留出空间时，调用该方法即可。

    def add_section(self):
        """Add a section, a sub-CodeBuilder."""
        section = CodeBuilder(self.indent_level)
        self.code.append(section)
        return section

__str__连接self.code产生单个字符串，注意到self.code可能包括其他CodeBuilder对象，遇到这种情况时，它将会递归调用该方法。

get_globals方法通过执行生成代码返回一个字典，包含变量名和变量的值。

    def get_globals(self):
        """Execute the code, and return a dict of globals it defines."""
        # A check that the caller really finished all the blocks they started.
        assert self.indent_level == 0
        # Get the Python source as a single string.
        python_source = str(self)
        # Execute the source, defining globals, and return them.
        global_namespace = {}
        exec(python_source, global_namespace)
        return global_namespace

Templite类的具体实现

将一个模板成功编译成python代码的大部分过程会在这个类中实行。首先是初始化函数：

    def __init__(self, text, *contexts):
        """Construct a Templite with the given `text`.

        `contexts` are dictionaries of values to use for future renderings.
        These are good for filters and global values.

        """
        self.context = {}
        for context in contexts:
            self.context.update(context)

我们创建两个集合用于存放变量：

        self.all_vars = set()
        self.loop_vars = set()

然后就是调用CodeBuilder生成代码的部分了：

        code = CodeBuilder()

        code.add_line("def render_function(context, do_dots):")
        code.indent()
        # 留出空间，因为我们还不知道要有多少变量
        vars_code = code.add_section()
        code.add_line("result = []")
        code.add_line("append_result = result.append")
        code.add_line("extend_result = result.extend")
        code.add_line("to_str = str")

以上是比较简单的部分，无论模板如何，这些代码总会被生成。接下来我们定义一个内部函数帮助我们缓存输出的字符串:

        buffered = []
        def flush_output():
            """Force `buffered` to the code builder."""
            if len(buffered) == 1:
                code.add_line("append_result(%s)" % buffered[0])
            elif len(buffered) > 1:
                code.add_line("extend_result([%s])" % ", ".join(buffered))
            del buffered[:]

buffered列表存放还没有被写到我们要生成的函数的源代码里面，在模板引擎处理模板的时候，我们将会添加字符串到buffered，在遇到控制流语句时，flush_output函数将会把字符串添加到函数源代码。比如，当我们执行（假设buffered为空）：

buffered.append("'hello'")

这时候执行flush_output函数，下面这个语句

append_result('hello')

将会被添加到源代码。

回到我们的Templite类，当我们处理控制结构时，我们应该判断他们的语法是否符合我们的模板，ops_stack是一个用来处理这些情况的堆栈。

        ops_stack = []

当我们遇到控制语句，比如``时，就将if出栈，如果没有if，就可以返回错误。利用ops_stack我们可以很快的检查错误。

接下来就是真正的解析语句了，我们将用到以下正则表达式：

        tokens = re.split(r"(?s)({{.*?}}|{%.*?%}|{#.*?#})", text)

关于正则表达式的用法，这里就不赘述了，有需要可以自己查看文档。

对于以下字符串：

<p>Topics for {{name}}: {% for t in topics %}{{t}}, {% endfor %}</p>

若将其作为text传入，将会被分成：

[
    '<p>Topics for ',               # literal
    '{{name}}',                     # expression
    ': ',                           # literal
    '{% for t in topics %}',        # tag
    '',                             # literal (empty)
    '{{t}}',                        # expression
    ', ',                           # literal
    '{% endfor %}',                 # tag
    '</p>'                          # literal
]

文本被分割成列表后，我们就可以对它进行遍历：

        for token in tokens:

处理注释的情况，直接忽略：

            if token.startswith('{#'):
                # Comment: ignore it and move on.
                continue

处理{{...}}之类的语句：

            elif token.startswith('{{'):
                # An expression to evaluate.
                expr = self._expr_code(token[2:-2].strip())
                buffered.append("to_str(%s)" % expr)

上面用到的_expr_code方法会将一个模板表达式转换成python表达式，我们在后面定义它。

处理控制流语句{%....%}：

            elif token.startswith('{%'):
                # Action tag: split into words and parse further.
                flush_output()
                words = token[2:-2].strip().split()

针对if语句处理：

                if words[0] == 'if':
                    # An if statement: evaluate the expression to determine if.
                    if len(words) != 2:
                        self._syntax_error("Don't understand if", token)
                    ops_stack.append('if')
                    code.add_line("if %s:" % self._expr_code(words[1]))
                    code.indent()

针对for语句处理：

                elif words[0] == 'for':
                    # A loop: iterate over expression result.
                    if len(words) != 4 or words[2] != 'in':
                        self._syntax_error("Don't understand for", token)
                    ops_stack.append('for')
                    self._variable(words[1], self.loop_vars)
                    code.add_line(
                        "for c_%s in %s:" % (
                            words[1],
                            self._expr_code(words[3])
                        )
                    )
                    code.indent()

我们首先检查了语法的正确性，然后调用_variable方法检查变量语法，然后添加到我们提供的集合。

处理结束语句比如{% endif %}或者{% endfor %}：

                elif words[0].startswith('end'):
                    # Endsomething.  Pop the ops stack.
                    if len(words) != 1:
                        self._syntax_error("Don't understand end", token)
                    end_what = words[0][3:]
                    if not ops_stack:
                        self._syntax_error("Too many ends", token)
                    start_what = ops_stack.pop()
                    if start_what != end_what:
                        self._syntax_error("Mismatched end tag", end_what)
                    code.dedent()

错误处理，当遇到非if，for以及end语句时：

                else:
                    self._syntax_error("Don't understand tag", words[0])

最后一种情况，处理正常字符串（非模板语句）：

            else:
                # Literal content.  If it isn't empty, output it.
                if token:
                    buffered.append(repr(token))

遍历结束后，所有模板都已经被处理，但别忘了，我们还有一个ops_stack，此时它应该为空：

        if ops_stack:
            self._syntax_error("Unmatched action tag", ops_stack[-1])

        flush_output()

现在，我们可以回过头来处理刚刚空下的vars_code区域了，模板所有的变量被存放在all_vars中，循环使用的变量存放在loop_vars中：

        for var_name in self.all_vars - self.loop_vars:
            vars_code.add_line("c_%s = context[%r]" % (var_name, var_name))

至此，解析工作算是完全完成了，最后补上函数最后一行：

        code.add_line("return ''.join(result)")
        code.dedent()

定义渲染函数，它由CodeBuilder对象的get_globals方法获得：

       self._render_function = code.get_globals()['render_function']

接下来定义_expr_code方法，它将模板中的变量，表达式转化成python的表达式，比如：

{{user_name}}

或者

{{user.name.localized|upper|escape}}

将会被处理成python的表达方式。

    def _expr_code(self, expr):
        """Generate a Python expression for `expr`."""
        if "|" in expr:
            pipes = expr.split("|")
            code = self._expr_code(pipes[0])
            for func in pipes[1:]:
                self._variable(func, self.all_vars)
                code = "c_%s(%s)" % (func, code)
        elif "." in expr:
            dots = expr.split(".")
            code = self._expr_code(dots[0])
            args = ", ".join(repr(d) for d in dots[1:])
            code = "do_dots(%s, %s)" % (code, args)
            
        else:
            self._variable(expr, self.all_vars)
            code = "c_%s" % expr
        return code

自此，模板引擎的核心部分已基本完成，下面是一些辅助函数。

一些辅助函数

    def _syntax_error(self, msg, thing):
        """Raise a syntax error using `msg`, and showing `thing`."""
        raise TempliteSyntaxError("%s: %r" % (msg, thing))

    def _variable(self, name, vars_set):
        """Track that `name` is used as a variable.

        Adds the name to `vars_set`, a set of variable names.

        Raises an syntax error if `name` is not a valid name.

        """
        if not re.match(r"[_a-zA-Z][_a-zA-Z0-9]*$", name):
            self._syntax_error("Not a valid name", name)
        vars_set.add(name)    
        

渲染函数

    def render(self, context=None):
        """Render this template by applying it to `context`.

        `context` is a dictionary of values to use in this rendering.

        """
        # Make the complete context we'll use.
        render_context = dict(self.context)
        if context:
            render_context.update(context)
        return self._render_function(render_context, self._do_dots)

_do_dots函数：

    def _do_dots(self, value, *dots):
        """Evaluate dotted expressions at runtime."""
        for dot in dots:
            try:
                value = getattr(value, dot)
            except AttributeError:
                value = value[dot]
            if callable(value):
                value = value()
        return value

全部代码请访问：项目源码

更多细节可以关注：A-template-engine.