TensorFlow moving average model: ExponentialMovingAverage
Source: Internet. Editor: 程序博客网. Time: 2024/06/05 23:57
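tz_zs study notes
Maintaining moving averages of the parameters tends to improve, to some degree, the performance of neural networks trained with GradientDescent or Momentum.
Principle: while training the network, keep and continually update a moving average of every parameter. During validation and testing, use each parameter's moving average in place of its last trained value. This often improves the accuracy of the network.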
tf.train.ExponentialMovingAverage
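TensorFlow official documentation: https://www.tensorflow.org/versions/r0.12/api_docs/python/train/moving_averages
TensorFlow provides tf.train.ExponentialMovingAverage to implement the moving average model. It uses exponential decay to compute moving averages of variables.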
tf.train.ExponentialMovingAverage.__init__(self, decay, num_updates=None, zero_debias=False, name="ExponentialMovingAverage"):
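decay is the decay rate.
num_updates is an optional parameter that ExponentialMovingAverage provides for adjusting decay dynamically. If it is supplied at construction time (that is, it is not None), the decay rate actually used at each update is:
min { decay , ( 1 + num_updates ) / ( 10 + num_updates ) }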
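The effect of num_updates is easiest to see with numbers. The sketch below is plain Python rather than the TensorFlow implementation, and the helper name effective_decay is made up for illustration. It simply evaluates the formula above for a few step counts.

```python
def effective_decay(decay, num_updates=None):
    """Decay actually used for one update, following the formula above.

    `effective_decay` is a hypothetical helper for illustration only;
    it is not part of TensorFlow.
    """
    if num_updates is None:
        return decay
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))


# Early steps use a small decay, so the average follows the variable closely;
# later steps are capped at the configured decay.
for step in (0, 10, 100, 1000, 10000):
    print(step, effective_decay(0.99, step))
```

With decay = 0.99 the ratio (1 + num_updates) / (10 + num_updates) reaches 0.99 at num_updates = 890, so the configured decay only takes over from that point on.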
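The apply() method adds shadow copies of the trained variables and adds ops that maintain a moving average of the trained variables in those shadow copies. Run this op after each training step to update the moving averages.
The average() and average_name() methods give access to the shadow variables and their names.
When creating an ExponentialMovingAverage object you specify the decay rate (decay), which controls how quickly the averages are updated. The shadow variables are initialized with the same initial values as the trained variables. When the update op runs, each shadow variable is updated as: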
shadow_variable = decay * shadow_variable + (1 - decay) * variable
Reasonable values for decay are close to 1, typically 0.999, 0.9999, and so on.
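To get a feel for what "close to 1" means in practice, here is a small plain-Python sketch (not TensorFlow code; all function names are illustrative assumptions). It checks that the update rule above matches the subtractive form quoted in the class docstring in Appendix 1, and estimates how many updates the shadow value needs to cover half the distance to a variable that is held constant.

```python
import math


def ema_update(shadow, variable, decay):
    """Classic form: decay * shadow + (1 - decay) * variable."""
    return decay * shadow + (1.0 - decay) * variable


def ema_update_sub(shadow, variable, decay):
    """Subtractive form: shadow -= (1 - decay) * (shadow - variable)."""
    return shadow - (1.0 - decay) * (shadow - variable)


def updates_to_close_half_gap(decay):
    """Smallest n with decay ** n <= 0.5, for a variable held constant."""
    return math.ceil(math.log(0.5) / math.log(decay))


# Three updates towards a constant 1.0, starting from 0.0.
shadow = 0.0
for _ in range(3):
    shadow = ema_update(shadow, 1.0, 0.9)
print(shadow)  # approximately 1 - 0.9 ** 3

for decay in (0.9, 0.99, 0.999, 0.9999):
    print(decay, updates_to_close_half_gap(decay))
```

Each extra nine in decay makes the average respond roughly ten times more slowly, which is why the choice of decay is usually tied to the number of training steps.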
Understanding how the moving average works
# -*- coding: utf-8 -*-
"""
@author: tz_zs
Moving average model
"""
import tensorflow as tf

# Define a variable whose moving average will be computed
v1 = tf.Variable(0, dtype=tf.float32)
# Define a variable step that holds the iteration count; it controls the decay rate dynamically
step = tf.Variable(0, trainable=False)
# Create the moving average object
ema = tf.train.ExponentialMovingAverage(0.99, step)
# Define the op that maintains the moving average; the argument is a list
maintain_average_op = ema.apply([v1])

with tf.Session() as sess:
    # Initialize all variables
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    # ema.average(v1) returns the moving average of the variable
    # print(sess.run(v1))  # 0.0
    # print(sess.run([ema.average_name(v1), ema.average(v1)]))  # [None, 0.0]
    print(sess.run([v1, ema.average(v1)]))  # [0.0, 0.0]

    # Set v1 to 5
    sess.run(tf.assign(v1, 5))
    # Update the moving average of v1. The decay rate is
    # min { decay , ( 1 + num_updates ) / ( 10 + num_updates ) } = 0.1,
    # so the moving average of v1 becomes 0.1*0 + 0.9*5 = 4.5
    sess.run(maintain_average_op)
    # print(sess.run(v1))  # 5.0
    # print(sess.run([ema.average_name(v1), ema.average(v1)]))  # [None, 4.5]
    print(sess.run([v1, ema.average(v1)]))  # [5.0, 4.5]

    # Set step to 10000 to simulate the iteration count
    sess.run(tf.assign(step, 10000))
    # Set v1 to 10
    sess.run(tf.assign(v1, 10))
    # Update the moving average of v1. The decay rate is now
    # min { decay , ( 1 + num_updates ) / ( 10 + num_updates ) } = 0.99,
    # so the moving average of v1 becomes 0.99*4.5 + 0.01*10 = 4.555
    sess.run(maintain_average_op)
    print(sess.run([v1, ema.average(v1)]))  # [10.0, 4.5549998]

    # Update the moving average again: 0.99*4.555 + 0.01*10 = 4.60945
    sess.run(maintain_average_op)
    print(sess.run([v1, ema.average(v1)]))  # [10.0, 4.6094499]
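The arithmetic in the comments above can be replayed without TensorFlow. The following is a minimal pure-Python sketch under the assumption that one update is exactly the documented formula; step_ema is a hypothetical helper, not a TensorFlow function, and the results are float64 rather than the float32 values printed by the session.

```python
def step_ema(shadow, variable, decay, num_updates):
    """One update using the dynamic decay min(decay, (1 + n) / (10 + n))."""
    d = min(decay, (1.0 + num_updates) / (10.0 + num_updates))
    return d * shadow + (1.0 - d) * variable


shadow = 0.0  # v1 and its shadow both start at 0

# v1 = 5, step = 0: decay is 0.1
first = step_ema(shadow, 5.0, 0.99, 0)
# v1 = 10, step = 10000: decay is capped at 0.99
second = step_ema(first, 10.0, 0.99, 10000)
# Same values again, one more update
third = step_ema(second, 10.0, 0.99, 10000)

print(first, second, third)  # roughly 4.5, 4.555, 4.60945
```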
# -*- coding: utf-8 -*-
"""
@author: tz_zs
"""
import tensorflow as tf

v1 = tf.Variable(10, dtype=tf.float32, name="v")

for variables in tf.global_variables():  # all_variables is deprecated
    print(variables)  # <tf.Variable 'v:0' shape=() dtype=float32_ref>

ema = tf.train.ExponentialMovingAverage(0.99)
print(ema)  # <tensorflow.python.training.moving_averages.ExponentialMovingAverage object at 0x00000218AE5720F0>

maintain_averages_op = ema.apply(tf.global_variables())
for variables in tf.global_variables():
    print(variables)
    # <tf.Variable 'v:0' shape=() dtype=float32_ref>
    # <tf.Variable 'v/ExponentialMovingAverage:0' shape=() dtype=float32_ref>

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    sess.run(tf.assign(v1, 1))
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))  # [1.0, 9.9099998]
Saving and loading moving averages (persistence)
# -*- coding: utf-8 -*-
"""
@author: tz_zs
Saving and loading moving averages (persistence)
"""
import tensorflow as tf

v1 = tf.Variable(10, dtype=tf.float32, name="v1")

for variables in tf.global_variables():  # all_variables is deprecated
    print(variables)  # <tf.Variable 'v1:0' shape=() dtype=float32_ref>

ema = tf.train.ExponentialMovingAverage(0.99)
print(ema)  # <tensorflow.python.training.moving_averages.ExponentialMovingAverage object at 0x00000218AE5720F0>

maintain_averages_op = ema.apply(tf.global_variables())
for variables in tf.global_variables():
    print(variables)
    # <tf.Variable 'v1:0' shape=() dtype=float32_ref>
    # <tf.Variable 'v1/ExponentialMovingAverage:0' shape=() dtype=float32_ref>

saver = tf.train.Saver()
print(saver)  # <tensorflow.python.training.saver.Saver object at 0x0000026B7E591940>

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    sess.run(tf.assign(v1, 1))
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))  # [1.0, 9.9099998]
    # Save to disk; save() returns the path /path/to/model.ckpt
    print(saver.save(sess, "/path/to/model.ckpt"))

#################################################################################################
print("#####" * 10)
print("Loading")
#################################################################################################

var2 = tf.Variable(0, dtype=tf.float32, name="v2")
print(var2)  # <tf.Variable 'v2:0' shape=() dtype=float32_ref>

# Map the saved shadow variable name onto var2
saver2 = tf.train.Saver({"v1/ExponentialMovingAverage": var2})
with tf.Session() as sess2:
    saver2.restore(sess2, "/path/to/model.ckpt")
    print(sess2.run(var2))  # 9.91, so the moving average of v1 was loaded successfully
The loading can also be done with the variables_to_restore function that TensorFlow provides:
import tensorflow as tf

# Run this in a fresh process (or after tf.reset_default_graph()),
# so that the variable below really gets the name v1
var3 = tf.Variable(0, dtype=tf.float32, name="v1")
print(var3)  # <tf.Variable 'v1:0' shape=() dtype=float32_ref>

ema = tf.train.ExponentialMovingAverage(0.99)
print(ema.variables_to_restore())
# {'v1/ExponentialMovingAverage': <tf.Variable 'v1:0' shape=() dtype=float32_ref>}

saver = tf.train.Saver(ema.variables_to_restore())
with tf.Session() as sess:
    saver.restore(sess, "/path/to/model.ckpt")
    print(sess.run(var3))  # 9.91
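The name map returned by variables_to_restore can be pictured with plain strings. The sketch below is a simplification and not the TensorFlow implementation: restore_name_map and its arguments are hypothetical, and variables are represented by their names only. It follows the rule described in the docstring in Appendix 1: a variable that has a moving average is restored from the shadow name, any other variable from its own name.

```python
def restore_name_map(variable_names, averaged_names,
                     ema_name="ExponentialMovingAverage"):
    """Map checkpoint names to variable names (illustrative sketch only)."""
    name_map = {}
    for name in variable_names:
        if name in averaged_names:
            # Restore this variable from its saved shadow value.
            name_map[name + "/" + ema_name] = name
        else:
            # No moving average: restore from the variable's own name.
            name_map[name] = name
    return name_map


mapping = restore_name_map(["v1", "global_step"], {"v1"})
print(mapping)
```

Passing such a map to a Saver is what makes var3 above receive the value that was saved under v1/ExponentialMovingAverage.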
Appendix 1: source code of moving_averages.py in TensorFlow 1.2
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Maintain moving averages of parameters."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.python.framework import dtypes
from tensorflow.python.framework import ops
from tensorflow.python.ops import control_flow_ops
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import state_ops
from tensorflow.python.ops import variable_scope
from tensorflow.python.ops import variables
from tensorflow.python.training import slot_creator


# TODO(touts): switch to variables.Variable.
def assign_moving_average(variable, value, decay, zero_debias=True, name=None):
  """Compute the moving average of a variable.

  The moving average of 'variable' updated with 'value' is:
    variable * decay + value * (1 - decay)

  The returned Operation sets 'variable' to the newly computed moving average.

  The new value of 'variable' can be set with the 'AssignSub' op as:
     variable -= (1 - decay) * (variable - value)

  Since variables that are initialized to a `0` value will be `0` biased,
  `zero_debias` optionally enables scaling by the mathematically correct
  debiasing factor of
    1 - decay ** num_updates
  See `ADAM: A Method for Stochastic Optimization` Section 3 for more details
  (https://arxiv.org/abs/1412.6980).

  Args:
    variable: A Variable.
    value: A tensor with the same shape as 'variable'.
    decay: A float Tensor or float value.  The moving average decay.
    zero_debias: A python bool. If true, assume the variable is 0-intialized and
      unbias it, as in https://arxiv.org/abs/1412.6980. See docstring in
      `_zero_debias` for more details.
    name: Optional name of the returned operation.

  Returns:
    A reference to the input 'variable' tensor with the newly computed
    moving average.
  """
  with ops.name_scope(name, "AssignMovingAvg",
                      [variable, value, decay]) as scope:
    with ops.colocate_with(variable):
      decay = ops.convert_to_tensor(1.0 - decay, name="decay")
      if decay.dtype != variable.dtype.base_dtype:
        decay = math_ops.cast(decay, variable.dtype.base_dtype)
      if zero_debias:
        update_delta = _zero_debias(variable, value, decay)
      else:
        update_delta = (variable - value) * decay
      return state_ops.assign_sub(variable, update_delta, name=scope)


def weighted_moving_average(value,
                            decay,
                            weight,
                            truediv=True,
                            collections=None,
                            name=None):
  """Compute the weighted moving average of `value`.

  Conceptually, the weighted moving average is:
    `moving_average(value * weight) / moving_average(weight)`,
  where a moving average updates by the rule
    `new_value = decay * old_value + (1 - decay) * update`
  Internally, this Op keeps moving average variables of both `value * weight`
  and `weight`.

  Args:
    value: A numeric `Tensor`.
    decay: A float `Tensor` or float value.  The moving average decay.
    weight:  `Tensor` that keeps the current value of a weight.
      Shape should be able to multiply `value`.
    truediv:  Boolean, if `True`, dividing by `moving_average(weight)` is
      floating point division.  If `False`, use division implied by dtypes.
    collections:  List of graph collections keys to add the internal variables
      `value * weight` and `weight` to.
      Defaults to `[GraphKeys.GLOBAL_VARIABLES]`.
    name: Optional name of the returned operation.
      Defaults to "WeightedMovingAvg".

  Returns:
    An Operation that updates and returns the weighted moving average.
  """
  # Unlike assign_moving_average, the weighted moving average doesn't modify
  # user-visible variables. It is the ratio of two internal variables, which are
  # moving averages of the updates.  Thus, the signature of this function is
  # quite different than assign_moving_average.
  if collections is None:
    collections = [ops.GraphKeys.GLOBAL_VARIABLES]
  with variable_scope.variable_scope(name, "WeightedMovingAvg",
                                     [value, weight, decay]) as scope:
    value_x_weight_var = variable_scope.get_variable(
        "value_x_weight",
        shape=value.get_shape(),
        dtype=value.dtype,
        initializer=init_ops.zeros_initializer(),
        trainable=False,
        collections=collections)
    weight_var = variable_scope.get_variable(
        "weight",
        shape=weight.get_shape(),
        dtype=weight.dtype,
        initializer=init_ops.zeros_initializer(),
        trainable=False,
        collections=collections)
    numerator = assign_moving_average(
        value_x_weight_var, value * weight, decay, zero_debias=False)
    denominator = assign_moving_average(
        weight_var, weight, decay, zero_debias=False)

    if truediv:
      return math_ops.truediv(numerator, denominator, name=scope.name)
    else:
      return math_ops.div(numerator, denominator, name=scope.name)


def _zero_debias(unbiased_var, value, decay):
  """Compute the delta required for a debiased Variable.

  All exponential moving averages initialized with Tensors are initialized to 0,
  and therefore are biased to 0. Variables initialized to 0 and used as EMAs are
  similarly biased. This function creates the debias updated amount according to
  a scale factor, as in https://arxiv.org/abs/1412.6980.

  To demonstrate the bias the results from 0-initialization, take an EMA that
  was initialized to `0` with decay `b`. After `t` timesteps of seeing the
  constant `c`, the variable have the following value:

  ```
    EMA = 0*b^(t) + c*(1 - b)*b^(t-1) + c*(1 - b)*b^(t-2) + ...
        = c*(1 - b^t)
  ```

  To have the true value `c`, we would divide by the scale factor `1 - b^t`.

  In order to perform debiasing, we use two shadow variables. One keeps track of
  the biased estimate, and the other keeps track of the number of updates that
  have occurred.

  Args:
    unbiased_var: A Variable representing the current value of the unbiased EMA.
    value: A Tensor representing the most recent value.
    decay: A Tensor representing `1-decay` for the EMA.

  Returns:
    The amount that the unbiased variable should be updated. Computing this
    tensor will also update the shadow variables appropriately.
  """
  with variable_scope.variable_scope(
      unbiased_var.op.name, values=[unbiased_var, value, decay]) as scope:
    with ops.colocate_with(unbiased_var):
      with ops.control_dependencies(None):
        biased_initializer = init_ops.zeros_initializer(
            dtype=unbiased_var.dtype)(unbiased_var.get_shape())
        local_step_initializer = init_ops.zeros_initializer()
      biased_var = variable_scope.get_variable(
          "biased", initializer=biased_initializer, trainable=False)
      local_step = variable_scope.get_variable(
          "local_step",
          shape=[],
          dtype=unbiased_var.dtype,
          initializer=local_step_initializer,
          trainable=False)

      # Get an update ops for both shadow variables.
      update_biased = state_ops.assign_sub(biased_var,
                                           (biased_var - value) * decay,
                                           name=scope.name)
      update_local_step = local_step.assign_add(1)

      # Compute the value of the delta to update the unbiased EMA. Make sure to
      # use the new values of the biased variable and the local step.
      with ops.control_dependencies([update_biased, update_local_step]):
        # This function gets `1 - decay`, so use `1.0 - decay` in the exponent.
        unbiased_ema_delta = (unbiased_var - biased_var.read_value() /
                              (1 - math_ops.pow(
                                  1.0 - decay, local_step.read_value())))

      return unbiased_ema_delta


class ExponentialMovingAverage(object):
  """Maintains moving averages of variables by employing an exponential decay.

  When training a model, it is often beneficial to maintain moving averages of
  the trained parameters.  Evaluations that use averaged parameters sometimes
  produce significantly better results than the final trained values.

  The `apply()` method adds shadow copies of trained variables and add ops that
  maintain a moving average of the trained variables in their shadow copies.
  It is used when building the training model.  The ops that maintain moving
  averages are typically run after each training step.
  The `average()` and `average_name()` methods give access to the shadow
  variables and their names.  They are useful when building an evaluation
  model, or when restoring a model from a checkpoint file.  They help use the
  moving averages in place of the last trained values for evaluations.

  The moving averages are computed using exponential decay.  You specify the
  decay value when creating the `ExponentialMovingAverage` object.  The shadow
  variables are initialized with the same initial values as the trained
  variables.  When you run the ops to maintain the moving averages, each
  shadow variable is updated with the formula:

    `shadow_variable -= (1 - decay) * (shadow_variable - variable)`

  This is mathematically equivalent to the classic formula below, but the use
  of an `assign_sub` op (the `"-="` in the formula) allows concurrent lockless
  updates to the variables:

    `shadow_variable = decay * shadow_variable + (1 - decay) * variable`

  Reasonable values for `decay` are close to 1.0, typically in the
  multiple-nines range: 0.999, 0.9999, etc.

  Example usage when creating a training model:

  ```python
  # Create variables.
  var0 = tf.Variable(...)
  var1 = tf.Variable(...)
  # ... use the variables to build a training model...
  ...
  # Create an op that applies the optimizer.  This is what we usually
  # would use as a training op.
  opt_op = opt.minimize(my_loss, [var0, var1])

  # Create an ExponentialMovingAverage object
  ema = tf.train.ExponentialMovingAverage(decay=0.9999)

  # Create the shadow variables, and add ops to maintain moving averages
  # of var0 and var1.
  maintain_averages_op = ema.apply([var0, var1])

  # Create an op that will update the moving averages after each training
  # step.  This is what we will use in place of the usual training op.
  with tf.control_dependencies([opt_op]):
      training_op = tf.group(maintain_averages_op)

  ...train the model by running training_op...
  ```

  There are two ways to use the moving averages for evaluations:

  *  Build a model that uses the shadow variables instead of the variables.
     For this, use the `average()` method which returns the shadow variable
     for a given variable.
  *  Build a model normally but load the checkpoint files to evaluate by using
     the shadow variable names.  For this use the `average_name()` method.  See
     the @{tf.train.Saver} for more
     information on restoring saved variables.

  Example of restoring the shadow variable values:

  ```python
  # Create a Saver that loads variables from their saved shadow values.
  shadow_var0_name = ema.average_name(var0)
  shadow_var1_name = ema.average_name(var1)
  saver = tf.train.Saver({shadow_var0_name: var0, shadow_var1_name: var1})
  saver.restore(...checkpoint filename...)
  # var0 and var1 now hold the moving average values
  ```
  """

  def __init__(self, decay, num_updates=None, zero_debias=False,
               name="ExponentialMovingAverage"):
    """Creates a new ExponentialMovingAverage object.

    The `apply()` method has to be called to create shadow variables and add
    ops to maintain moving averages.

    The optional `num_updates` parameter allows one to tweak the decay rate
    dynamically. It is typical to pass the count of training steps, usually
    kept in a variable that is incremented at each step, in which case the
    decay rate is lower at the start of training.  This makes moving averages
    move faster.  If passed, the actual decay rate used is:

      `min(decay, (1 + num_updates) / (10 + num_updates))`

    Args:
      decay: Float.  The decay to use.
      num_updates: Optional count of number of updates applied to variables.
      zero_debias: If `True`, zero debias moving-averages that are initialized
        with tensors.
      name: String. Optional prefix name to use for the name of ops added in
        `apply()`.
    """
    self._decay = decay
    self._num_updates = num_updates
    self._zero_debias = zero_debias
    self._name = name
    self._averages = {}

  def apply(self, var_list=None):
    """Maintains moving averages of variables.

    `var_list` must be a list of `Variable` or `Tensor` objects.  This method
    creates shadow variables for all elements of `var_list`.  Shadow variables
    for `Variable` objects are initialized to the variable's initial value.
    They will be added to the `GraphKeys.MOVING_AVERAGE_VARIABLES` collection.
    For `Tensor` objects, the shadow variables are initialized to 0 and zero
    debiased (see docstring in `assign_moving_average` for more details).

    shadow variables are created with `trainable=False` and added to the
    `GraphKeys.ALL_VARIABLES` collection.  They will be returned by calls to
    `tf.global_variables()`.

    Returns an op that updates all shadow variables as described above.

    Note that `apply()` can be called multiple times with different lists of
    variables.

    Args:
      var_list: A list of Variable or Tensor objects. The variables
        and Tensors must be of types float16, float32, or float64.

    Returns:
      An Operation that updates the moving averages.

    Raises:
      TypeError: If the arguments are not all float16, float32, or float64.
      ValueError: If the moving average of one of the variables is already
        being computed.
    """
    # TODO(touts): op_scope
    if var_list is None:
      var_list = variables.trainable_variables()
    zero_debias_true = set()  # set of vars to set `zero_debias=True`
    for var in var_list:
      if var.dtype.base_dtype not in [dtypes.float16, dtypes.float32,
                                      dtypes.float64]:
        raise TypeError("The variables must be half, float, or double: %s" %
                        var.name)
      if var in self._averages:
        raise ValueError("Moving average already computed for: %s" % var.name)

      # For variables: to lower communication bandwidth across devices we keep
      # the moving averages on the same device as the variables. For other
      # tensors, we rely on the existing device allocation mechanism.
      with ops.control_dependencies(None):
        if isinstance(var, variables.Variable):
          avg = slot_creator.create_slot(var,
                                         var.initialized_value(),
                                         self._name,
                                         colocate_with_primary=True)
          # NOTE(mrry): We only add `tf.Variable` objects to the
          # `MOVING_AVERAGE_VARIABLES` collection.
          ops.add_to_collection(ops.GraphKeys.MOVING_AVERAGE_VARIABLES, var)
        else:
          avg = slot_creator.create_zeros_slot(
              var,
              self._name,
              colocate_with_primary=(var.op.type in ["Variable", "VariableV2"]))
          if self._zero_debias:
            zero_debias_true.add(avg)
      self._averages[var] = avg

    with ops.name_scope(self._name) as scope:
      decay = ops.convert_to_tensor(self._decay, name="decay")
      if self._num_updates is not None:
        num_updates = math_ops.cast(self._num_updates,
                                    dtypes.float32,
                                    name="num_updates")
        decay = math_ops.minimum(decay,
                                 (1.0 + num_updates) / (10.0 + num_updates))
      updates = []
      for var in var_list:
        zero_debias = self._averages[var] in zero_debias_true
        updates.append(assign_moving_average(
            self._averages[var], var, decay, zero_debias=zero_debias))
      return control_flow_ops.group(*updates, name=scope)

  def average(self, var):
    """Returns the `Variable` holding the average of `var`.

    Args:
      var: A `Variable` object.

    Returns:
      A `Variable` object or `None` if the moving average of `var`
      is not maintained.
    """
    return self._averages.get(var, None)

  def average_name(self, var):
    """Returns the name of the `Variable` holding the average for `var`.

    The typical scenario for `ExponentialMovingAverage` is to compute moving
    averages of variables during training, and restore the variables from the
    computed moving averages during evaluations.

    To restore variables, you have to know the name of the shadow variables.
    That name and the original variable can then be passed to a `Saver()` object
    to restore the variable from the moving average value with:
      `saver = tf.train.Saver({ema.average_name(var): var})`

    `average_name()` can be called whether or not `apply()` has been called.

    Args:
      var: A `Variable` object.

    Returns:
      A string: The name of the variable that will be used or was used
      by the `ExponentialMovingAverage class` to hold the moving average of
      `var`.
    """
    if var in self._averages:
      return self._averages[var].op.name
    return ops.get_default_graph().unique_name(
        var.op.name + "/" + self._name, mark_as_used=False)

  def variables_to_restore(self, moving_avg_variables=None):
    """Returns a map of names to `Variables` to restore.

    If a variable has a moving average, use the moving average variable name as
    the restore name; otherwise, use the variable name.

    For example,

    ```python
      variables_to_restore = ema.variables_to_restore()
      saver = tf.train.Saver(variables_to_restore)
    ```

    Below is an example of such mapping:

    ```
      conv/batchnorm/gamma/ExponentialMovingAverage: conv/batchnorm/gamma,
      conv_4/conv2d_params/ExponentialMovingAverage: conv_4/conv2d_params,
      global_step: global_step
    ```

    Args:
      moving_avg_variables: a list of variables that require to use of the
        moving variable name to be restored. If None, it will default to
        variables.moving_average_variables() + variables.trainable_variables()

    Returns:
      A map from restore_names to variables. The restore_name can be the
      moving_average version of the variable name if it exist, or the original
      variable name.
    """
    name_map = {}
    if moving_avg_variables is None:
      # Include trainable variables and variables which have been explicitly
      # added to the moving_average_variables collection.
      moving_avg_variables = variables.trainable_variables()
      moving_avg_variables += variables.moving_average_variables()
    # Remove duplicates
    moving_avg_variables = set(moving_avg_variables)
    # Collect all the variables with moving average,
    for v in moving_avg_variables:
      name_map[self.average_name(v)] = v
    # Make sure we restore variables without moving average as well.
    for v in list(set(variables.global_variables()) - moving_avg_variables):
      if v.op.name not in name_map:
        name_map[v.op.name] = v
    return name_map
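The bias described in the `_zero_debias` docstring can be checked numerically. The sketch below is plain Python under the assumptions stated in that docstring (an EMA started at 0 that sees the same constant at every step). The function names are illustrative and not part of TensorFlow, and it does not reproduce the two-shadow-variable mechanism of the real code.

```python
def biased_ema(constant, decay, steps):
    """EMA started at 0 that sees the same value `constant` at every step."""
    biased = 0.0
    for _ in range(steps):
        biased = decay * biased + (1.0 - decay) * constant
    return biased


def debias(biased, decay, steps):
    """Divide by the scale factor 1 - decay ** steps."""
    return biased / (1.0 - decay ** steps)


c, b, t = 5.0, 0.99, 10
raw = biased_ema(c, b, t)
corrected = debias(raw, b, t)

print(raw)        # close to c * (1 - b ** t), far below c after only 10 steps
print(corrected)  # close to c
```

After ten steps with decay 0.99 the raw average has covered less than a tenth of the way to c, which is why averages that start from 0 benefit from the correction early in training.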
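Appendix 2: background on the moving average method (reposted)
Source: http://wiki.mbalib.com/wiki/%E7%A7%BB%E5%8A%A8%E5%B9%B3%E5%9D%87%E6%B3%95
The moving average method is also known as the sliding average method or the sliding average model method (Moving average, MA).
What is the moving average method?
Types of moving average method
1. Simple moving average
2. Weighted moving average
Advantages and disadvantages of the moving average method
Case studies of the moving average method
Case 1: the moving average method applied to predicting bus running times
Case 2: the simple moving average method applied to real estate[2]
Case 3: the weighted moving average method applied to computing sales figures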