Halide Study Notes: Reading the Halide Tutorial Source, Lesson 13
Source: Internet  Editor: 程序博客网  Date: 2024/06/04 22:06
Getting Started with Halide, Lesson 13
// Halide tutorial lesson 13: Tuples

// This lesson describes how to write Funcs that evaluate to multiple
// values.

// On linux, you can compile and run it like so:
// g++ lesson_13*.cpp -g -I ../include -L ../bin -lHalide -lpthread -ldl -o lesson_13 -std=c++11
// LD_LIBRARY_PATH=../bin ./lesson_13

#include "Halide.h"
#include <stdio.h>
#include <algorithm>
using namespace Halide;

int main(int argc, char **argv) {

    // So far Funcs (such as the one below) have evaluated to a single
    // scalar value for each point in their domain.
    Func single_valued;
    Var x, y;
    single_valued(x, y) = x + y;

    // One way to write a Func that returns a collection of values is
    // to add an additional dimension that indexes that
    // collection. This is how we typically deal with color. For
    // example, the Func below represents a collection of three values
    // for every x, y coordinate indexed by c.
    // Note: in other words, c selects among the per-pixel values
    // r(x, y), g(x, y), and b(x, y).
    Func color_image;
    Var c;
    color_image(x, y, c) = select(c == 0, 245,  // Red value
                                  c == 1, 42,   // Green value
                                  132);         // Blue value

    // This method is often convenient because it makes it easy to
    // operate on this Func in a way that treats each item in the
    // collection equally:
    Func brighter;
    brighter(x, y, c) = color_image(x, y, c) + 10;

    // However this method is also inconvenient for three reasons.
    //
    // 1) Funcs are defined over an infinite domain, so users of this
    // Func can for example access color_image(x, y, -17), which is
    // not a meaningful value and is probably indicative of a bug.
    //
    // 2) It requires a select, which can impact performance if not
    // bounded and unrolled:
    // brighter.bound(c, 0, 3).unroll(c);
    //
    // 3) With this method, all values in the collection must have the
    // same type. While the above two issues are merely inconvenient,
    // this one is a hard limitation that makes it impossible to
    // express certain things in this way.

    // It is also possible to represent a collection of values as a
    // collection of Funcs:
    Func func_array[3];
    func_array[0](x, y) = x + y;
    func_array[1](x, y) = sin(x);
    func_array[2](x, y) = cos(y);

    // This method avoids the three problems above, but introduces a
    // new annoyance. Because these are separate Funcs, it is
    // difficult to schedule them so that they are all computed
    // together inside a single loop over x, y.

    // A third alternative is to define a Func as evaluating to a
    // Tuple instead of an Expr. A Tuple is a fixed-size collection of
    // Exprs. Each Expr in a Tuple may have a different type. The
    // following function evaluates to an integer value (x+y), and a
    // floating point value (sin(x*y)).
    Func multi_valued;
    multi_valued(x, y) = Tuple(x + y, sin(x * y));

    // Realizing a tuple-valued Func returns a collection of
    // Buffers. We call this a Realization. It's equivalent to a
    // std::vector of Buffer objects:
    // Note: "realize" here means that Halide JIT-compiles and runs the
    // pipeline. Each Buffer in the Realization may have a different
    // element type.
    {
        Realization r = multi_valued.realize(80, 60);
        assert(r.size() == 2);
        Buffer<int> im0 = r[0];
        Buffer<float> im1 = r[1];
        assert(im0(30, 40) == 30 + 40);
        assert(im1(30, 40) == sinf(30 * 40));
    }

    // All Tuple elements are evaluated together over the same domain
    // in the same loop nest, but stored in distinct allocations. The
    // equivalent C++ code to the above is (x spans the 80-wide
    // dimension and y the 60-high one, matching realize(80, 60)):
    {
        int multi_valued_0[80*60];
        float multi_valued_1[80*60];
        for (int y = 0; y < 60; y++) {
            for (int x = 0; x < 80; x++) {
                multi_valued_0[x + 80*y] = x + y;
                multi_valued_1[x + 80*y] = sinf(x*y);
            }
        }
    }

    // When compiling ahead-of-time, a Tuple-valued Func evaluates
    // into multiple distinct output buffer_t structs. These appear in
    // order at the end of the function signature:
    // int multi_valued(...input buffers and params...,
    //                  buffer_t *output_1, buffer_t *output_2);

    // You can construct a Tuple by passing multiple Exprs to the
    // Tuple constructor as we did above. Perhaps more elegantly, you
    // can also take advantage of C++11 initializer lists and just
    // enclose your Exprs in braces:
    Func multi_valued_2;
    multi_valued_2(x, y) = {x + y, sin(x*y)};

    // Calls to a multi-valued Func cannot be treated as Exprs. The
    // following is a syntax error:
    // Func consumer;
    // consumer(x, y) = multi_valued_2(x, y) + 10;

    // Instead you must index a Tuple with square brackets to retrieve
    // the individual Exprs:
    Expr integer_part = multi_valued_2(x, y)[0];
    Expr floating_part = multi_valued_2(x, y)[1];
    Func consumer;
    consumer(x, y) = {integer_part + 10, floating_part + 10.0f};

    // Tuple reductions.
    {
        // Tuples are particularly useful in reductions, as they allow
        // the reduction to maintain complex state as it walks along
        // its domain. The simplest example is an argmax.

        // First we create a Buffer to take the argmax over.
        Func input_func;
        input_func(x) = sin(x);
        Buffer<float> input = input_func.realize(100);

        // Then we define a 2-valued Tuple which tracks the index of
        // the maximum value and the value itself.
        Func arg_max;

        // Pure definition.
        arg_max() = {0, input(0)};

        // Update definition.
        RDom r(1, 99);
        Expr old_index = arg_max()[0];
        Expr old_max = arg_max()[1];
        Expr new_index = select(old_max < input(r), r, old_index);
        Expr new_max = max(input(r), old_max);
        arg_max() = {new_index, new_max};

        // The equivalent C++ is:
        int arg_max_0 = 0;
        float arg_max_1 = input(0);
        for (int r = 1; r < 100; r++) {
            int old_index = arg_max_0;
            float old_max = arg_max_1;
            int new_index = old_max < input(r) ? r : old_index;
            float new_max = std::max(input(r), old_max);
            // In a tuple update definition, all loads and computation
            // are done before any stores, so that all Tuple elements
            // are updated atomically with respect to recursive calls
            // to the same Func.
            arg_max_0 = new_index;
            arg_max_1 = new_max;
        }

        // Let's verify that the Halide and C++ found the same maximum
        // value and index.
        {
            Realization r = arg_max.realize();
            Buffer<int> r0 = r[0];
            Buffer<float> r1 = r[1];
            assert(arg_max_0 == r0(0));
            assert(arg_max_1 == r1(0));
        }

        // Halide provides argmax and argmin as built-in reductions
        // similar to sum, product, maximum, and minimum. They return
        // a Tuple consisting of the point in the reduction domain
        // corresponding to that value, and the value itself. In the
        // case of ties they return the first value found. We'll use
        // one of these in the following section.
    }

    // Tuples for user-defined types.
    {
        // Tuples can also be a convenient way to represent compound
        // objects such as complex numbers. Defining an object that
        // can be converted to and from a Tuple is one way to extend
        // Halide's type system with user-defined types.
        // Note: the example below combines Tuples with operator
        // overloading on Halide Exprs. A struct here is simply a class
        // whose data members and member functions are all public.
        struct Complex {
            Expr real, imag;

            // Construct from a Tuple
            Complex(Tuple t) : real(t[0]), imag(t[1]) {}

            // Construct from a pair of Exprs
            Complex(Expr r, Expr i) : real(r), imag(i) {}

            // Construct from a call to a Func by treating it as a Tuple
            Complex(FuncRef t) : Complex(Tuple(t)) {}

            // Convert to a Tuple
            operator Tuple() const {
                return {real, imag};
            }

            // Complex addition
            Complex operator+(const Complex &other) const {
                return {real + other.real, imag + other.imag};
            }

            // Complex multiplication
            Complex operator*(const Complex &other) const {
                return {real * other.real - imag * other.imag,
                        real * other.imag + imag * other.real};
            }

            // Complex magnitude, squared for efficiency
            Expr magnitude_squared() const {
                return real * real + imag * imag;
            }

            // Other complex operators would go here. The above are
            // sufficient for this example.
        };

        // Let's use the Complex struct to compute a Mandelbrot set.
        Func mandelbrot;

        // The initial complex value corresponding to an x, y coordinate
        // in our Func.
        Complex initial(x/15.0f - 2.5f, y/6.0f - 2.0f);

        // Pure definition.
        Var t;
        mandelbrot(x, y, t) = Complex(0.0f, 0.0f);

        // We'll use an update definition to take 12 steps.
        // Note: the RDom bounds the number of iterations.
        RDom r(1, 12);
        Complex current = mandelbrot(x, y, r-1);

        // The following line uses the complex multiplication and
        // addition we defined above.
        mandelbrot(x, y, r) = current*current + initial;

        // We'll use another tuple reduction to compute the iteration
        // number where the value first escapes a circle of radius 4.
        // This can be expressed as an argmin of a boolean - we want
        // the index of the first time the given boolean expression is
        // false (we consider false to be less than true). The argmax
        // would return the index of the first time the expression is
        // true.
        Expr escape_condition = Complex(mandelbrot(x, y, r)).magnitude_squared() < 16.0f;
        Tuple first_escape = argmin(escape_condition);

        // We only want the index, not the value, but argmin returns
        // both, so we'll index the argmin Tuple expression using
        // square brackets to get the Expr representing the index.
        Func escape;
        escape(x, y) = first_escape[0];

        // Realize the pipeline and print the result as ascii art.
        Buffer<int> result = escape.realize(61, 25);
        const char *code = " .:-~*={}&%#@";
        for (int y = 0; y < result.height(); y++) {
            for (int x = 0; x < result.width(); x++) {
                printf("%c", code[result(x, y)]);
            }
            printf("\n");
        }
    }

    printf("Success!\n");

    return 0;
}
Compile and run:
$ g++ lesson_13*.cpp -g -I ../include -L ../bin -lHalide -lpthread -ldl -o lesson_13 -std=c++11
$ ./lesson_13
Output:
[ASCII-art rendering of the Mandelbrot set, 61 columns by 25 rows, drawn with the characters " .:-~*={}&%#@"]
Success!
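As a cross-check on the output, the Mandelbrot pipeline can be mirrored in plain C++ without Halide. The following is only a sketch under stated assumptions: `first_escape` and `render` are hypothetical names that do not appear in the tutorial, and the built-in argmin is modeled as "start at index 0 and take the first r at which the condition turns false", which is how the tutorial comments describe it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of Halide): returns the first step r in
// [1, 12] at which |z|^2 < 16 fails, or 0 if it never fails.
int first_escape(float c_re, float c_im) {
    float re = 0.0f, im = 0.0f;  // mirrors mandelbrot(x, y, 0) = Complex(0, 0)
    int index = 0;               // assumed initial index of the argmin
    bool best = true;            // assumed initial value: the largest bool
    for (int r = 1; r <= 12; r++) {
        // z = z * z + c, written like Complex::operator* in the tutorial
        const float next_re = re * re - im * im + c_re;
        const float next_im = re * im + im * re + c_im;
        re = next_re;
        im = next_im;
        const bool inside = re * re + im * im < 16.0f;
        // false counts as less than true, so only the first false wins
        if (best && !inside) {
            best = false;
            index = r;
        }
    }
    return index;
}

// Hypothetical helper: renders the same 61 x 25 grid as rows of characters.
std::vector<std::string> render(int width = 61, int height = 25) {
    assert(width > 0 && height > 0);
    const std::string code = " .:-~*={}&%#@";
    std::vector<std::string> rows;
    for (int y = 0; y < height; y++) {
        std::string row;
        for (int x = 0; x < width; x++) {
            row += code[first_escape(x / 15.0f - 2.5f, y / 6.0f - 2.0f)];
        }
        rows.push_back(row);
    }
    return rows;
}
```

Printing each string returned by `render()` on its own line gives the picture. Points that never escape keep index 0 and are drawn as a space, while an escape at step r is drawn with the r-th character of the code string.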
Key takeaways from this lesson:
1. Build a multi-valued Func from a Tuple:
multi_value(x, y) = Tuple(x+y, x*y);
or, in C++11 style, with an initializer list:
multi_value(x, y) = {x+y, x*y};
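2. Index a call to a Tuple-valued Func to read its individual components:
multi_value(x, y)[0]; // first component of the multi-valued Func
multi_value(x, y)[1]; // second component of the multi-valued Func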
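3. The built-in reductions argmin/argmax return a Tuple: the first element is the coordinate of the extremum in the reduction domain, and the second is the minimum or maximum value itself.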
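4. Extending Halide's type system with a user-defined type (such as Complex) requires overloading the operators that type needs and converting it to and from a Tuple.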
5. An RDom variable can be used to run a loop a fixed number of times.
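The tutorial comments state that argmax returns the first value found when there are ties. A minimal plain C++ sketch of that behavior follows; `arg_max_pair` is a hypothetical helper, not a Halide function, and it copies the strict `old_max < input(r)` comparison from the tutorial's hand-written reduction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Halide): returns {index, value} of the
// largest element, keeping the earliest index when values tie.
std::pair<int, float> arg_max_pair(const std::vector<float> &input) {
    assert(!input.empty());
    int best_index = 0;
    float best_value = input[0];
    for (std::size_t r = 1; r < input.size(); r++) {
        // Load both parts of the old state before storing either one,
        // as a tuple update definition does.
        const int old_index = best_index;
        const float old_max = best_value;
        // Strict comparison: an equal value does not replace the old index.
        best_index = old_max < input[r] ? static_cast<int>(r) : old_index;
        best_value = std::max(input[r], old_max);
    }
    return {best_index, best_value};
}
```

For the input {1, 3, 2, 3} the maximum 3 occurs at indices 1 and 3, and the strict comparison keeps index 1.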