Writing TensorFlow Documentation

Source: Internet | Publisher: 淘宝账号权重查询 | Editor: 程序博客网 | Time: 2024/06/16 06:32

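We welcome contributions to the TensorFlow documentation from the community. This document explains how you can contribute to that documentation. In particular, it explains the following:

Where the documentation is located.
How to make conformant edits.
How to build and test your documentation changes before you submit them.

You can view the TensorFlow documentation on tensorflow.org, and you can view and edit the raw files on GitHub.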

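Versions

tensorflow.org, at root, shows documentation for the latest stable binary. This is the documentation you should be reading if you are using pip to install TensorFlow.

However, most developers will contribute documentation into the master GitHub branch, which is published occasionally at tensorflow.org/versions/master.

If you want a documentation change to appear at root, you also need to contribute that change to the current stable binary branch (and/or cherry-pick it).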

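Reference vs. non-reference documentation

The following reference documentation is automatically generated from comments in the code:

C++ API reference docs
Java API reference docs
Python API reference docs

To modify the reference documentation, edit the appropriate code comments.

Non-reference documentation (for example, the TensorFlow installation guides) is authored by humans. This documentation is located in the tensorflow/docs_src directory. Each subdirectory of docs_src contains a set of related TensorFlow documentation. For example, the TensorFlow installation guides are all in the docs_src/install directory.

The C++ documentation is generated from XML files produced by doxygen; however, those tools are not currently available in open source.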

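Markdown

Editable TensorFlow documentation is written in Markdown. With a few exceptions, TensorFlow uses the standard Markdown rules.

This section explains the primary differences between the standard Markdown rules and the Markdown rules that editable TensorFlow documentation uses.

Math in Markdown

You may use MathJax within TensorFlow when editing Markdown files, but note the following:

MathJax renders properly on tensorflow.org.
MathJax does not render properly on GitHub.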

When writing MathJax, you can use $$ or \\( and \\) to surround your math. $$ guards cause line breaks, so within running text use \\( \\) instead.
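As an illustration of the rule above, here is a hypothetical lint (not part of the TensorFlow tooling) that flags prose lines where a $$ ... $$ span shares the line with other text, which is exactly where the forced line break would hurt. It is a sketch that assumes each display-math span opens and closes on one line.

```python
import re

# Hypothetical lint: a $$...$$ span on a line that also contains prose
# forces an unwanted line break; inline math should use \\( ... \\).
DISPLAY_MATH = re.compile(r"\$\$.+?\$\$")

def inline_display_math(lines):
    """Return 1-based numbers of lines that mix $$...$$ with other text."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        if DISPLAY_MATH.search(line):
            # Remove the math span(s); anything left over is surrounding prose.
            rest = DISPLAY_MATH.sub("", line).strip()
            if rest:
                flagged.append(number)
    return flagged

doc = [
    "The loss is $$L = \\sum_i (y_i - \\hat{y}_i)^2$$ over the batch.",  # flagged
    "$$L = \\sum_i (y_i - \\hat{y}_i)^2$$",                                # own line: OK
    "The loss \\\\(L\\\\) is summed over the batch.",                      # inline guard: OK
]
print(inline_display_math(doc))  # [1]
```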

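Links in Markdown

Links fall into a few categories:

Links to a different part of the same file
Links to a URL outside of tensorflow.org
Links from a Markdown file (or code comment) to another file within tensorflow.org

For the first two link categories, you may use standard Markdown links, but put the link entirely on one line rather than splitting it across lines. For example: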

[text](link) # Good link
[text]\n(link) # Bad link
[text](\nlink) # Bad link
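The one-line rule for standard links can be checked mechanically. The following is a minimal heuristic sketch, not an official tool: it treats a "]" followed by a newline and "(", or a "](" followed directly by a newline, as a split link.

```python
import re

# Heuristic for the rule above: "[text](link)" must stay on one line.
# Matches "]<newline>(" (first bad case) or "](<newline>" (second bad case).
SPLIT_LINK = re.compile(r"\]\s*\n\s*\(|\]\(\s*\n")

def has_split_link(markdown: str) -> bool:
    return SPLIT_LINK.search(markdown) is not None

print(has_split_link("[text](link)"))    # False
print(has_split_link("[text]\n(link)"))  # True
print(has_split_link("[text](\nlink)"))  # True
```

Because it is only a pattern match, it can also fire on prose that happens to end a line with "]" and start the next with "(", so treat hits as candidates to review.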

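For the final link category (links to another file within tensorflow.org), use a special link parameterization mechanism. This mechanism enables authors to move and reorganize files without breaking links.

The parameterization scheme is as follows. Use:

@{tf.symbol} to link to the reference page for a Python symbol. Note that class members don't get their own page, but the syntax still works, since @{tf.MyClass.method} links to the proper part of the tf.MyClass page.
@{tensorflow::symbol} to link to the reference page for a C++ symbol.
@{$doc_page} to link to another (non-API-reference) doc page. To link to
- red/green/blue/index.md use @{$blue} or @{$green/blue},
- foo/bar/baz.md use @{$baz} or @{$bar/baz}.
The shorter one is preferred, so we can move pages around without breaking these references. The main exception is that the Python API guides should be referred to using @{$python/} to avoid ambiguity.
@{$doc_page#anchor-tag$link-text} to link to an anchor in that doc and use different link text (by default, the link text is the title of the target page).
To override only the link text, omit the #anchor-tag.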
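To show why the shorter @{$doc_page} forms keep working after pages move, here is a hypothetical resolver; it is not the real doc generator. It assumes a reference matches any page whose path (with ".md" and a trailing "index" dropped) ends with the reference's components, and it ignores the #anchor-tag and $link-text parts.

```python
# Hypothetical resolver for the text after "$" in @{$doc_page}.
def page_key(path):
    """'red/green/blue/index.md' -> ['red', 'green', 'blue']."""
    parts = path[:-len(".md")].split("/")
    if parts[-1] == "index":
        parts = parts[:-1]
    return parts

def resolve(ref, pages):
    """Return every page whose path ends with the components of ref."""
    wanted = ref.strip("/").split("/")
    return [p for p in pages if page_key(p)[-len(wanted):] == wanted]

pages = ["red/green/blue/index.md", "foo/bar/baz.md"]
print(resolve("blue", pages))        # ['red/green/blue/index.md']
print(resolve("green/blue", pages))  # ['red/green/blue/index.md']
print(resolve("baz", pages))         # ['foo/bar/baz.md']
print(resolve("bar/baz", pages))     # ['foo/bar/baz.md']
```

A real resolver would also have to reject ambiguous references that match several pages; this sketch simply returns all candidates, which is where a longer form such as @{$python/} helps.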

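To link to source code, use a link starting with https://www.github.com/tensorflow/tensorflow/blob/r1.1/, followed by the file name starting at the GitHub root. For example, a link to the file you are currently reading should be written as https://www.github.com/tensorflow/tensorflow/blob/r1.1/tensorflow/docs_src/community/documentation.md.

This URL naming scheme ensures that tensorflow.org can forward the link to the branch of the code corresponding to the version of the documentation you're viewing. Do not include URL parameters in the source code URL.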
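A minimal sketch of building such a link, assuming the r1.1 prefix given above; the helper name source_link is hypothetical. It rejects paths that carry URL parameters, per the rule above.

```python
# Prefix taken from the text above; source_link is a hypothetical helper.
SOURCE_PREFIX = "https://www.github.com/tensorflow/tensorflow/blob/r1.1/"

def source_link(repo_path: str) -> str:
    """Build a source-code link from a path relative to the GitHub root."""
    if "?" in repo_path:
        raise ValueError("do not include URL parameters in source links")
    return SOURCE_PREFIX + repo_path.lstrip("/")

url = source_link("tensorflow/docs_src/community/documentation.md")
print(url)
```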

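Generating docs and previewing links

Before building the documentation, you must first set up your environment by doing the following:

1. If pip isn't installed on your machine, install it now by issuing the following command:
$ sudo easy_install pip
2. Use pip to install codegen, mock, and pandas by issuing the following command (Note: If you are using virtualenv to manage your dependencies, you may not want to use sudo for these installations):
$ sudo pip install codegen mock pandas
3. If bazel is not installed on your machine, install it now. If you are on Linux, install bazel by issuing the following command:
$ sudo apt-get install bazel # Linux

If you are on Mac OS, find the bazel installation instructions for Mac OS on the bazel installation page.

4. Change directory to tensorflow, the top-level directory of the TensorFlow source code.
5. Run the configure script and answer its prompts appropriately for your system.
$ ./configure

Then, change to the tensorflow directory that contains docs_src (cd tensorflow). Run the following command to compile TensorFlow and generate the documentation in the /tmp/tfdocs directory: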

bazel run tools/docs:generate -- \
          --src_dir=`pwd`/tensorflow/docs_src/ \
          --output_dir=/tmp/tfdocs/

Note: You must set src_dir and output_dir to absolute file paths.
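If you script the doc build, one way to honor the absolute-path requirement is to resolve both directories before assembling the command. This sketch only builds the argument list from the bazel target and flags shown above; actually running it still requires bazel and a configured TensorFlow checkout.

```python
import os
import shlex

def generate_docs_command(src_dir, output_dir):
    """Build the doc-generation command with absolute --src_dir/--output_dir."""
    return [
        "bazel", "run", "tools/docs:generate", "--",
        "--src_dir=" + os.path.abspath(src_dir),
        "--output_dir=" + os.path.abspath(output_dir),
    ]

cmd = generate_docs_command("tensorflow/docs_src", "/tmp/tfdocs")
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True)
```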

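Generating Python API documentation

Ops, classes, and utility functions are defined in Python modules, such as image_ops.py. Each Python module contains a module docstring. For example: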

"""Image processing and decoding ops."""

The doc generator places this module docstring at the beginning of the Markdown file generated for the module, in this case tf.image.
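For illustration only (this is not the generator's code): the module docstring is the first string literal in the file, which Python exposes as the module's `__doc__`. The standard `ast.get_docstring` reads the same string without importing the module; the function in the sample source is a made-up placeholder.

```python
import ast

# A stand-in for the top of image_ops.py; flip_left_right is a placeholder.
source = '''"""Image processing and decoding ops."""

def flip_left_right(image):
    return image
'''

print(ast.get_docstring(ast.parse(source)))  # Image processing and decoding ops.
```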

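It used to be a requirement to list every member of a module at the beginning of the module file, putting @@ before each member. The @@member_name syntax is deprecated and no longer generates any docs. However, depending on how a module is sealed, it may still be necessary to mark elements of the module's contents as public. The called-out op, function, or class does not have to be defined in the same file. The next few sections of this document discuss sealing and how to add elements to the public docs.

The new documentation system automatically documents public symbols, except for the following:

Private symbols whose names start with an underscore.
Symbols originally defined in object or protobuf's Message.
Some class members, such as __base__ and __class__, which are dynamically created but generally have no useful documentation.

Only top-level modules (currently just tf and tfdbg) need to be added manually to the generate script.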
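The exclusion rules above can be approximated in a few lines. This is a hypothetical sketch, not the generator's implementation, and the Message class here is a stand-in so the example runs without protobuf installed.

```python
import inspect

_MISSING = object()  # sentinel: "attribute not present on this base"

class Message:
    """Stand-in for protobuf's Message base class (hypothetical)."""
    def SerializeToString(self):
        return b""

def public_members(cls, excluded_bases=(object, Message)):
    """Approximate the exclusion rules listed above for one class."""
    names = []
    for name, value in inspect.getmembers(cls):
        if name.startswith("_"):
            continue  # private symbols, and dunders such as __class__
        # Skip members inherited unchanged from an excluded base class.
        if any(getattr(base, name, _MISSING) is value for base in excluded_bases):
            continue
        names.append(name)
    return names

class ImageSpec(Message):
    def resize(self, size):
        """A public method that should be documented."""

    def _cache(self):
        """Private helper; skipped."""

print(public_members(ImageSpec))  # ['resize']
```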

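Sealing modules

Because the doc generator walks all visible symbols and descends into anything it finds, it will document any accidentally exposed symbols. If a module exposes only the symbols that are meant to be part of the public API, we call it sealed. Because of Python's loose import and visibility conventions, naively written Python code will inadvertently expose many modules that are implementation details. Improperly sealed modules may expose other unsealed modules, which will typically cause the doc generator to fail. This failure is the intended behavior. It ensures that our API is well defined, and it allows us to change implementation details (including which modules are imported where) without fear of accidentally breaking users.

If a module is accidentally imported, it typically breaks the doc generator (generate_test). This is a clear sign that you need to seal your modules. However, even if the doc generator succeeds, unwanted symbols may show up in the docs. Check the generated docs to make sure that every documented symbol is expected. If there are symbols that shouldn't be there, you have the following options for dealing with them:

Private symbols and imports
The remove_undocumented filter
A traversal blacklist

We'll discuss these options in detail below.

Private symbols and imports

The easiest way to conform to the API sealing expectations is to make non-public symbols private (by prepending an underscore _). The doc generator respects private symbols. This also applies to modules. If the only problem is that a small number of imported modules show up in the docs (or break the generator), you can simply rename them on import, e.g.: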

import sys as _sys

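Because Python considers all files to be modules, this applies to files as well. Suppose you have a directory containing the following two files/modules: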

module/__init__.py
module/private_impl.py

Then, after module is imported, module.private_impl will be accessible. Renaming private_impl.py to _private_impl.py solves the problem. If renaming modules is awkward, read on.
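The effect of a leading underscore can be seen with throwaway module objects. This illustration assumes a generator that simply skips underscore-prefixed names; the visible helper is hypothetical.

```python
import types

# "import sys" leaks `sys` as a public attribute of the module...
leaky = types.ModuleType("leaky")
exec("import sys\ndef helper():\n    return sys.platform\n", leaky.__dict__)

# ...while "import sys as _sys" keeps it private.
sealed = types.ModuleType("sealed")
exec("import sys as _sys\ndef helper():\n    return _sys.platform\n", sealed.__dict__)

def visible(module):
    """Names a generator that skips underscore-prefixed names would see."""
    return sorted(n for n in vars(module) if not n.startswith("_"))

print(visible(leaky))   # ['helper', 'sys']
print(visible(sealed))  # ['helper']
```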

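Using the `remove_undocumented` filter

Another way to seal a module is to split your implementation from the API. To do so, consider using remove_undocumented, which takes a list of allowed symbols and deletes everything else from the module. For example, the following snippet demonstrates how to put remove_undocumented in the __init__.py file for a module:

__init__.py: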

# Use * imports only if __all__ defined in some_file
from tensorflow.some_module.some_file import *
 
# Otherwise import symbols directly
from tensorflow.some_module.some_other_file import some_symbol
 
from tensorflow.platform.all_util import remove_undocumented
 
_allowed_symbols = ['some_symbol', 'some_other_symbol']
 
remove_undocumented(__name__, allowed_exception_list=_allowed_symbols)

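The @@member_name syntax is deprecated, but it still exists in some places in the documentation as an indicator to remove_undocumented that those symbols are public. All @@s will eventually be removed. If you see them, however, please do not randomly delete them, as some of our systems still use them.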
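To make the allow-list idea concrete, here is a simplified re-implementation for illustration only. TensorFlow's own remove_undocumented takes a module name and has its own signature and behavior; this sketch takes a module object and deletes every public attribute that is not allowed.

```python
import types

def remove_undocumented_sketch(module, allowed):
    """Delete public attributes of `module` that are not in `allowed`."""
    for name in list(vars(module)):  # copy: we mutate while iterating
        if not name.startswith("_") and name not in allowed:
            delattr(module, name)

mod = types.ModuleType("some_module")
mod.some_symbol = 1
mod.some_other_symbol = 2
mod.implementation_detail = 3

remove_undocumented_sketch(mod, allowed=["some_symbol", "some_other_symbol"])
print(sorted(n for n in vars(mod) if not n.startswith("_")))
# ['some_other_symbol', 'some_symbol']
```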

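Traversal blacklist

If all else fails, you may add entries to the traversal blacklist in generate_lib.py. Almost all entries in this list are an abuse of its purpose; avoid adding to it if you can!

The traversal blacklist maps qualified module names (without the leading tf.) to local names that are not to be descended into. For instance, the following entry will exclude some_module from traversal.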

{ ...
  'contrib.my_module': ['some_module']
  ...
} 

This means that the doc generator will show that some_module exists, but it will not enumerate its contents.
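The behavior just described, where a blacklisted module is listed but not entered, can be sketched as a small recursive walk. This is a hypothetical model using throwaway module objects, not the code in generate_lib.py.

```python
import types

# Keys: qualified module names without the leading "tf."; values: child
# names that are listed but not descended into.
BLACKLIST = {"contrib.my_module": ["some_module"]}

def walk(module, path="", out=None):
    """Record every public submodule path reachable from `module`."""
    out = [] if out is None else out
    skip = BLACKLIST.get(path, [])
    for name, value in sorted(vars(module).items()):
        if name.startswith("_") or not isinstance(value, types.ModuleType):
            continue
        child = f"{path}.{name}" if path else name
        out.append(child)            # the module itself is still listed...
        if name not in skip:
            walk(value, child, out)  # ...but blacklisted ones are not entered
    return out

tf = types.ModuleType("tf")
tf.contrib = types.ModuleType("tf.contrib")
tf.contrib.my_module = types.ModuleType("tf.contrib.my_module")
tf.contrib.my_module.some_module = types.ModuleType("some_module")
tf.contrib.my_module.some_module.internals = types.ModuleType("internals")

print(walk(tf))
# ['contrib', 'contrib.my_module', 'contrib.my_module.some_module']
```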

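This blacklist was originally intended to make sure that system modules used for platform abstraction (mock, flags, ...) can be documented without documenting their interiors. Using it beyond this purpose is a shortcut that may be acceptable for contrib, but not for core TensorFlow.

Op documentation style guide

Long, descriptive module-level documentation for modules should go in the API guides in docs_src/api_guides/python.

For classes and ops, ideally, you should provide the following information, in order of presentation:

A short sentence that describes what the op does.
A short description of what happens when you pass arguments to the op.
An example showing how the op works (pseudocode is best).
Requirements, caveats, and important notes (if there are any).
A description of the inputs, outputs, and Attrs or other parameters of the op constructor.

Each of these is described in more detail below.

Write your text in Markdown format, using basic Markdown syntax. You may use MathJax notation for equations (see above for restrictions).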

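Writing about code

Put backticks around these things when they're used in text:

Argument names (e.g., `input`, `x`, `tensor`)
Returned tensor names (e.g., `output`, `idx`, `out`)
Data types (e.g., `int32`, `float`, `uint8`)
Other op names referenced in text (e.g., `list_diff()`, `shuffle()`)
Class names (e.g., `Tensor` when you actually mean a Tensor object; don't capitalize or use backticks if you're just explaining what an op does to a tensor, a graph, or an operation in general)
File names (e.g., `image_ops.py`, or `/path-to-your-data/xml/example-name`)
Math expressions or conditions (e.g., `-1-input.dims() <= dim <= input.dims()`)

Put three backticks around sample code and pseudocode examples. Use ==> instead of a single equal sign when you want to show what an op returns. For example: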

```
# 'input' is a tensor of shape [2, 3, 5]
tf.shape(tf.expand_dims(input, 0)) ==> [1, 2, 3, 5]
``` 

If you're providing a Python code sample, add the python style label to ensure proper syntax highlighting:

```python
# some Python code
```

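Two notes about backticks for code samples in Markdown:

You can use backticks to pretty-print languages other than Python, if necessary; many languages are supported.
Markdown also allows you to indent four spaces to specify a code sample. However, do NOT indent four spaces and use backticks at the same time. Use one or the other.

Tensor dimensions

When you're talking about a tensor in general, don't capitalize the word tensor. When you're talking about the specific object that's provided to an op as an argument or returned by an op, write the word Tensor and add backticks around it, because you're talking about a Tensor object.

Don't use the word Tensors to describe multiple Tensor objects unless you really are talking about a Tensors object. It's better to say "a list of Tensor objects."

Use the term "dimension" to refer to the size of a tensor. If you need to be specific about the size, use these conventions:

Refer to a scalar as a "0-D tensor"
Refer to a vector as a "1-D tensor"
Refer to a matrix as a "2-D tensor"
Refer to tensors with 3 or more dimensions as 3-D tensors or n-D tensors. Use the word "rank" only if it makes sense, but try to use "dimension" instead. Never use the word "order" to describe the size of a tensor.

Use the word "shape" to detail the dimensions of a tensor, and show the shape in square brackets with backticks. For example: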

If `input` is a 3-D tensor with shape `[3, 4, 3]`, this operation
returns a 3-D tensor with shape `[6, 8, 6]`.
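To illustrate the naming convention, the sketch below uses nested Python lists as a stand-in for tensors (an assumption; real code would ask the tensor for its shape) and produces the "N-D tensor with shape `[...]`" phrasing recommended above.

```python
def shape_of(value):
    """Read a shape off nested lists, e.g. [[1, 2], [3, 4]] -> [2, 2]."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape

def describe(value):
    shape = shape_of(value)
    return f"{len(shape)}-D tensor with shape `{shape}`"

print(describe(7))                         # 0-D tensor with shape `[]`
print(describe([1, 2, 3]))                 # 1-D tensor with shape `[3]`
print(describe([[1, 2], [3, 4], [5, 6]]))  # 2-D tensor with shape `[3, 2]`
```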

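Ops defined in C++

All ops defined in C++ (and accessible from other languages) must be documented with a REGISTER_OP declaration. The docstring in the C++ file is processed to automatically add some information about the input types, output types, and Attr types and default values.

For example: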

```c++
REGISTER_OP("PngDecode")
  .Input("contents: string")
  .Attr("channels: int = 0")
  .Output("image: uint8")
  .Doc(R"doc(
Decodes the contents of a PNG file into a uint8 tensor.
 
contents: PNG file contents.
channels: Number of color channels, or 0 to autodetect based on the input.
  Must be 0 for autodetect, 1 for grayscale, 3 for RGB, or 4 for RGBA.
  If the input has a different number of channels, it will be transformed
  accordingly.
image:= A 3-D uint8 tensor of shape `[height, width, channels]`.
  If `channels` is 0, the last dimension is determined
  from the png contents.
)doc");
``` 

This results in the following piece of Markdown:

### tf.image.png_decode(contents, channels=None, name=None) {#png_decode}
 
Decodes the contents of a PNG file into a uint8 tensor.
 
#### Args:
 
*  <b>contents</b>: A string Tensor. PNG file contents.
*  <b>channels</b>: An optional int. Defaults to 0.
   Number of color channels, or 0 to autodetect based on the input.
   Must be 0 for autodetect, 1 for grayscale, 3 for RGB, or 4 for RGBA.  If the
   input has a different number of channels, it will be transformed accordingly.
*  <b>name</b>: A name for the operation (optional).
 
#### Returns:
A 3-D uint8 tensor of shape `[height, width, channels]`.  If `channels` is
0, the last dimension is determined from the png contents. 

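Much of the argument description is added automatically. In particular, the doc generator automatically adds the name and type of all inputs, attrs, and outputs. In the above example, <b>contents</b>: A string Tensor. was added automatically. You should write your additional text so that it flows naturally after that description.

For inputs and outputs, you can prefix your additional text with an equal sign to prevent the automatically added name and type. In the above example, the description for the output named image starts with = to prevent A uint8 Tensor. from being added before our text A 3-D uint8 Tensor.... You cannot prevent the addition of the name, type, and default value of attrs this way, so write that text carefully.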
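The "=" rule can be modeled in a few lines. This is a hypothetical sketch of the behavior described above, not the real doc generator: the type phrase is prepended unless the description starts with "=", in which case the author's text is used as is.

```python
def render_io(name, type_phrase, doc):
    """Render an input/output description the way the text above describes."""
    doc = doc.strip()
    if doc.startswith("="):
        body = doc[1:].strip()           # "=" suppresses the type phrase
    else:
        body = f"{type_phrase} {doc}"    # default: type phrase comes first
    return f"{name}: {body}"

print(render_io("contents", "A string Tensor.", "PNG file contents."))
# contents: A string Tensor. PNG file contents.
print(render_io("image", "A uint8 Tensor.",
                "= A 3-D uint8 tensor of shape `[height, width, channels]`."))
# image: A 3-D uint8 tensor of shape `[height, width, channels]`.
```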

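Ops defined in Python

If your op is defined in a python/ops/*.py file, you need to provide text for all of the arguments and output (returned) tensors. The doc generator does not auto-generate any text for ops defined in Python, so what you write is what you get.

You should follow the usual Python docstring conventions, except that you should use Markdown in the docstring.

Here's a simple example: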

def foo(x, y, name="bar"):
  """Computes foo.

  Given two 1-D tensors `x` and `y`, this operation computes the foo.

  Example:

      # x is [1, 1]
      # y is [2, 2]
      tf.foo(x, y) ==> [3, 3]

  Args:
    x: A `Tensor` of type `int32`.
    y: A `Tensor` of type `int32`.
    name: A name for the operation (optional).

  Returns:
    A `Tensor` of type `int32` that is the foo of `x` and `y`.

  Raises:
    ValueError: If `x` or `y` are not of type `int32`.
  """
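Because nothing is generated for Python-defined ops, a missing argument description simply never appears. The following hypothetical lint (not part of TensorFlow) compares a function's signature with the names listed under "Args:"; it assumes the Google-style layout used in the example above.

```python
import inspect

def undocumented_args(func):
    """Return parameters of `func` that are missing from its Args: section."""
    doc = inspect.getdoc(func) or ""  # getdoc normalizes indentation
    documented, in_args = set(), False
    for line in doc.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):          # section header or prose line
            in_args = line.strip() == "Args:"
            continue
        if in_args and ":" in line:
            documented.add(line.split(":", 1)[0].strip())
    return [p for p in inspect.signature(func).parameters if p not in documented]

def foo(x, y, name="bar"):
    """Computes foo.

    Args:
      x: A `Tensor` of type `int32`.
      name: A name for the operation (optional).
    """

print(undocumented_args(foo))  # ['y']
```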

Description of the docstring sections

This section details each of the elements in docstrings.

Short sentence describing what the op does

Examples:

Concatenates tensors.
 
Flips an image horizontally from left to right.
 
Computes the Levenshtein distance between two sequences.
 
Saves a list of tensors to a file.
 
Extracts a slice from a tensor.

Short description of what happens when you pass arguments to the op

Examples:

Given a tensor input of numerical type, this operation returns a tensor of
the same type and size with values reversed along dimension `seq_dim`. A
vector `seq_lengths` determines which elements are reversed for each index
within dimension 0 (usually the batch dimension).
 
This operation returns a tensor of type `dtype` and dimensions `shape`, with
all elements set to zero.

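Example showing how the op works

Good code samples are short and easy to understand, typically containing a brief snippet of code that clarifies what the example is demonstrating. When an op manipulates the shape of a Tensor, it is often useful to include a before-and-after example as well.

The squeeze() op has a nice pseudocode example: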

# 't' is a tensor of shape [1, 2, 1, 3, 1, 1]
shape(squeeze(t)) ==> [2, 3] 

The tile() op provides a good example in descriptive text:

For example, tiling `[a, b, c, d]` by `[2]` produces `[a b c d a b c d]`.
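Before publishing a ==> line, it can help to sanity-check it. These pure-Python helpers are hypothetical models of the two examples above (squeeze() with no axis, and tile() on a 1-D input), not TensorFlow itself.

```python
def squeezed_shape(shape):
    """Drop every dimension of size 1, as squeeze() does with no axis."""
    return [d for d in shape if d != 1]

def tile_1d(values, multiple):
    """Repeat a 1-D sequence `multiple` times, like tile() with [multiple]."""
    return list(values) * multiple

print(squeezed_shape([1, 2, 1, 3, 1, 1]))  # [2, 3]
print(tile_1d(["a", "b", "c", "d"], 2))   # ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
```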

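It is often helpful to show code samples in Python. Never put them in the C++ ops file, and avoid putting them in the Python ops doc. If possible, we recommend putting code samples in the API guides. Otherwise, add them to the module or class docstring where the ops constructors are called out.

Here's an example from the module docstring in api_guides/python/math_ops.md: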

## Segmentation
 
TensorFlow provides several operations that you can use to perform common
math computations on tensor segments.
...
In particular, a segmentation of a matrix tensor is a mapping of rows to
segments.
 
For example:
 
```python
c = tf.constant([[1,2,3,4], [-1,-2,-3,-4], [5,6,7,8]])
tf.segment_sum(c, tf.constant([0, 0, 1]))
  ==>  [[0 0 0 0]
        [5 6 7 8]]
```
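The expected output above can be double-checked with a pure-Python model of segment_sum; this is for checking the example only and is not TensorFlow's implementation. Row i of the input is added into output row segment_ids[i]; the sketch assumes sorted segment_ids, as the op requires, so the output has segment_ids[-1] + 1 rows.

```python
def segment_sum(rows, segment_ids):
    """Sum rows that share a segment id (assumes sorted segment_ids)."""
    out = [[0] * len(rows[0]) for _ in range(segment_ids[-1] + 1)]
    for row, seg in zip(rows, segment_ids):
        out[seg] = [a + b for a, b in zip(out[seg], row)]
    return out

c = [[1, 2, 3, 4], [-1, -2, -3, -4], [5, 6, 7, 8]]
print(segment_sum(c, [0, 0, 1]))  # [[0, 0, 0, 0], [5, 6, 7, 8]]
```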

Requirements, caveats, important notes

Examples:

This operation requires that: `-1-input.dims() <= dim <= input.dims()`
 
Note: This tensor will produce an error if evaluated. Its value must
be fed using the `feed_dict` optional argument to `Session.run()`,
`Tensor.eval()`, or `Operation.run()`. 

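Descriptions of arguments and output (returned) tensors

Keep the descriptions brief and to the point. You should not have to explain how the operation works in the argument sections.

Mention whether the op has strong constraints on the dimensions of the input or output tensors. Remember that for C++ ops, the type of the tensor is automatically added as either "A ..type.. Tensor" or "A Tensor with type in {...list of types...}". In such cases, if the op has a constraint on the dimensions, either add text such as "Must be 4-D", or start the description with = (to prevent the tensor type from being added) and write something like "A 4-D float tensor".

For example, here are two ways to document an image argument of a C++ op (note the "=" sign):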

image: Must be 4-D. The image to resize.
 
image:= A 4-D `float` tensor. The image to resize. 

In the documentation, these will be rendered as:

image: A `float` Tensor. Must be 4-D. The image to resize.
 
image: A 4-D `float` Tensor. The image to resize.

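Description of optional arguments ("attrs")

The doc generator always describes the type of each attr and its default value, if any. You cannot override that with an equal sign, because the description is very different in the C++ and Python generated docs.

Phrase any additional attr description so that it flows well after the type and default value. The type and default are displayed first, and your additional description follows. Therefore, complete sentences work best.

Here's an example from image_ops.cc: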

REGISTER_OP("DecodePng")
    .Input("contents: string")
    .Attr("channels: int = 0")
    .Attr("dtype: {uint8, uint16} = DT_UINT8")
    .Output("image: dtype")
    .SetShapeFn(DecodeImageShapeFn)
    .Doc(R"doc(
Decode a PNG-encoded image to a uint8 or uint16 tensor.
 
The attr `channels` indicates the desired number of color channels for the
decoded image.
 
Accepted values are:
 
*   0: Use the number of channels in the PNG-encoded image.
*   1: output a grayscale image.
*   3: output an RGB image.
*   4: output an RGBA image.
 
If needed, the PNG-encoded image is transformed to match the requested
number of color channels.
 
contents: 0-D.  The PNG-encoded image.
channels: Number of color channels for the decoded image.
image: 3-D with shape `[height, width, channels]`.
)doc");

This generates the following Args section in api_docs/python/tf/image/decode_png.md:

#### Args:
 
* <b>`contents`</b>: A `Tensor` of type `string`. 0-D.  The PNG-encoded
  image.
* <b>`channels`</b>: An optional `int`. Defaults to `0`. Number of color
  channels for the decoded image.
* <b>`dtype`</b>: An optional `tf.DType` from: `tf.uint8,
  tf.uint16`. Defaults to `tf.uint8`.
* <b>`name`</b>: A name for the operation (optional).
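The ordering in the Args section above (type first, then the default, then your text) is why complete sentences read best. Here is a hypothetical model of that attr rendering; it is not the real generator, and it reproduces only the plain-text parts of the output, not the HTML markup.

```python
def render_attr(name, type_name, default, doc):
    """Type and default always come first; the author's text is appended."""
    parts = [f"An optional `{type_name}`."]
    if default is not None:
        parts.append(f"Defaults to `{default}`.")
    parts.append(doc.strip())  # a leading "=" has no special meaning here
    return f"`{name}`: " + " ".join(parts)

print(render_attr("channels", "int", 0,
                  "Number of color channels for the decoded image."))
# `channels`: An optional `int`. Defaults to `0`. Number of color channels for the decoded image.
```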
