Understanding Convolutions
In a previous post, we built up an understanding of convolutional neural networks, without referring to any significant mathematics. To go further, however, we need to understand convolutions.
If we just wanted to understand convolutional neural networks, it might suffice to roughly understand convolutions. But the aim of this series is to bring us to the frontier of convolutional neural networks and explore new options. To do that, we’re going to need to understand convolutions very deeply.
Thankfully, with a few examples, convolution becomes quite a straightforward idea.
Lessons from a Dropped Ball
Imagine we drop a ball from some height onto the ground, where it only has one dimension of motion. How likely is it that a ball will go a distance $c$ if you drop it and then drop it again from above the point at which it landed?
Let’s break this down. After the first drop, it will land $a$ units away from the starting point with probability $f(a)$, where $f$ is the probability distribution.
Now after this first drop, we pick the ball up and drop it from another height above the point where it first landed. The probability of the ball rolling $b$ units away from the new starting point is $g(b)$, where $g$ may be a different probability distribution if it’s dropped from a different height.
If we fix the result of the first drop so we know the ball went distance $a$, for the ball to go a total distance $c$, the distance traveled in the second drop is fixed at $b$, where $a+b=c$. The probability of this happening is simply $f(a) \cdot g(b)$.¹
Let’s think about this with a specific discrete example. We want the total distance $c$ to be 3. If the first time it rolls, $a=2$, the second time it must roll $b=1$ in order to reach our total distance $a+b=3$. The probability of this is $f(2) \cdot g(1)$.
However, this isn’t the only way we could get to a total distance of 3. The ball could roll 1 unit the first time, and 2 the second. Or 0 units the first time and all 3 the second. It could go any $a$ and $b$, as long as they add to 3.
The probabilities are $f(1) \cdot g(2)$ and $f(0) \cdot g(3)$, respectively.
In order to find the total likelihood of the ball reaching a total distance of $c$, we can’t consider only one possible way of reaching $c$. Instead, we consider all the possible ways of partitioning $c$ into two drops $a$ and $b$, and sum over the probability of each way.
We already know that the probability for each case of $a+b=c$ is simply $f(a) \cdot g(b)$. So, summing over every solution to $a+b=c$, we can denote the total likelihood as:

$$\sum_{a+b=c} f(a) \cdot g(b)$$
Turns out, we’re doing a convolution! In particular, the convolution of $f$ and $g$, evaluated at $c$, is defined:

$$(f\ast g)(c) = \sum_{a+b=c} f(a) \cdot g(b)$$
If we substitute $b = c-a$, we get:

$$(f\ast g)(c) = \sum_{a} f(a) \cdot g(c-a)$$
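This sum-based definition can be written out directly in code. A minimal sketch, assuming two small, made-up drop distributions represented as lists of probabilities indexed from 0:

```python
# Direct implementation of (f*g)(c) = sum over a of f(a) * g(c - a).
def convolve(f, g):
    """Discrete convolution of two finite sequences."""
    out = [0.0] * (len(f) + len(g) - 1)
    for c in range(len(out)):
        for a in range(len(f)):
            b = c - a                     # enforce a + b = c
            if 0 <= b < len(g):
                out[c] += f[a] * g[b]
    return out

# Hypothetical probabilities of rolling 0, 1, or 2 units on each drop.
f = [0.25, 0.5, 0.25]
g = [0.25, 0.5, 0.25]

print(convolve(f, g))  # distribution over total distance; still sums to 1
```

Since each drop's distribution sums to 1, the convolved distribution over total distance also sums to 1, as we'd expect of a probability distribution.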
This is the standard definition² of convolution.
To make this a bit more concrete, we can think about this in terms of positions the ball might land. After the first drop, it will land at an intermediate position $a$ with probability $f(a)$. If it lands at $a$, it has probability $g(c-a)$ of landing at a position $c$.
To get the convolution, we consider all intermediate positions.
Visualizing Convolutions
There’s a very nice trick that helps one think about convolutions more easily.
First, an observation. Suppose the probability that a ball lands a certain distance $x$ from where it started is $f(x)$. Then, afterwards, the probability that it started a distance $x$ from where it landed is $f(-x)$.
If we know the ball lands at a position $c$ after the second drop, what is the probability that the previous position was $a$?
So the probability that the previous position was $a$ is $g(-(a-c)) = g(c-a)$.
Now, consider the probability each intermediate position contributes to the ball finally landing at $c$. We know the probability of the first drop putting the ball into the intermediate position $a$ is $f(a)$. We also know that the probability of it having been in $a$, if it lands at $c$, is $g(c-a)$.
Summing over the $a$’s, we get the convolution.
The advantage of this approach is that it allows us to visualize the evaluation of a convolution at a value $c$ in a single picture. By shifting the bottom half around, we can evaluate the convolution at other values of $c$.
For example, we can see that it peaks when the distributions align.
And shrinks as the intersection between the distributions gets smaller.
By using this trick in an animation, it really becomes possible to visually understand convolutions.
Below, we’re able to visualize the convolution of two box functions:
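The box-function result can also be reproduced numerically. A small sketch, assuming NumPy is available and using two identical boxes of width 5 (arbitrary values):

```python
import numpy as np

# A box function: constant on an interval, zero elsewhere.
box = np.ones(5) / 5.0

# Sliding one box across the other and multiply-adding at each offset.
tri = np.convolve(box, box)

print(np.round(tri, 3))
```

The output ramps up linearly, peaks when the two boxes fully overlap, and ramps back down: convolving two boxes gives a triangle, just as the animation shows.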
Armed with this perspective, a lot of things become more intuitive.
Let’s consider a non-probabilistic example. Convolutions are sometimes used in audio manipulation. For example, one might use a function with two spikes in it, but zero everywhere else, to create an echo. As our double-spiked function slides, one spike hits a point in time first, adding that signal to the output sound, and later, another spike follows, adding a second, delayed copy.
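The echo example can be sketched in a few lines. This assumes NumPy, a toy "signal" consisting of a single click, and made-up spike positions and amplitudes:

```python
import numpy as np

# Double-spike impulse response: a spike now, and a smaller spike 5 samples later.
impulse = np.zeros(6)
impulse[0] = 1.0   # the original sound, at full volume
impulse[5] = 0.5   # the echo: half as loud, delayed by 5 samples

# A toy signal: one unit "click" at time 0.
signal = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

# Convolving adds a delayed, attenuated copy of the signal to itself.
echoed = np.convolve(signal, impulse)
print(echoed)
```

The output contains the original click at sample 0 and its half-strength copy at sample 5, which is exactly the echo the double-spiked function describes.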
Higher Dimensional Convolutions
Convolutions are an extremely general idea. We can also use them in a higher number of dimensions.
Let’s consider our example of a falling ball again. Now, as it falls, its position shifts not only in one dimension, but in two.
Convolution is the same as before:

$$(f\ast g)(c) = \sum_{a+b=c} f(a) \cdot g(b)$$
Except, now $a$, $b$ and $c$ are vectors. To be more explicit:

$$(f\ast g)(c_1, c_2) = \sum_{\substack{a_1+b_1=c_1 \\ a_2+b_2=c_2}} f(a_1, a_2) \cdot g(b_1, b_2)$$
Or in the standard definition:

$$(f\ast g)(c_1, c_2) = \sum_{a_1, a_2} f(a_1, a_2) \cdot g(c_1-a_1,\, c_2-a_2)$$
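The two-dimensional sum translates to code just as literally as the one-dimensional one did. A minimal sketch over lists of lists, with tiny made-up inputs:

```python
# The value at (c1, c2) sums f(a1, a2) * g(c1 - a1, c2 - a2)
# over every (a1, a2) for which both indices are in range.
def convolve2d(f, g):
    rows = len(f) + len(g) - 1
    cols = len(f[0]) + len(g[0]) - 1
    out = [[0.0] * cols for _ in range(rows)]
    for c1 in range(rows):
        for c2 in range(cols):
            for a1 in range(len(f)):
                for a2 in range(len(f[0])):
                    b1, b2 = c1 - a1, c2 - a2
                    if 0 <= b1 < len(g) and 0 <= b2 < len(g[0]):
                        out[c1][c2] += f[a1][a2] * g[b1][b2]
    return out

print(convolve2d([[1, 2], [3, 4]], [[1, 0], [0, 1]]))
```

The quadruple loop is hopelessly slow for real images, but it makes the correspondence to the formula exact: each output entry is one evaluation of the sum above.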
Just like one-dimensional convolutions, we can think of a two-dimensional convolution as sliding one function on top of another, multiplying and adding.
One common application of this is image processing. We can think of images as two-dimensional functions. Many important image transformations are convolutions where you convolve the image function with a very small, local function called a “kernel.”
The kernel slides to every position of the image and computes a new pixel as a weighted sum of the pixels it floats over.
For example, by averaging a 3x3 box of pixels, we can blur an image. To do this, our kernel takes the value $1/9$ on each pixel in the box.
We can also detect edges by taking the values $-1$ and $1$ on two horizontally adjacent pixels, and zero everywhere else. That is, we subtract two adjacent pixels. When side-by-side pixels are similar, this gives us approximately zero. On edges, however, adjacent pixels are very different in the direction perpendicular to the edge.
The gimp documentation has many other examples.
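Both kernels are easy to try on a toy image. A sketch assuming NumPy and SciPy are available, using a made-up 3x5 "image" with a vertical edge between a dark region and a bright one:

```python
import numpy as np
from scipy.signal import convolve2d

# Tiny grayscale image: dark (0) on the left, bright (9) on the right.
image = np.array([
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
], dtype=float)

blur_kernel = np.ones((3, 3)) / 9.0    # average over a 3x3 box of pixels
edge_kernel = np.array([[-1.0, 1.0]])  # subtracts horizontally adjacent pixels

blurred = convolve2d(image, blur_kernel, mode='valid')
edges = convolve2d(image, edge_kernel, mode='valid')

print(blurred)  # the hard edge becomes a smooth ramp of values
print(edges)    # large values only where dark meets bright
```

The blur smears the sharp 0-to-9 jump into a gradual ramp, while the edge kernel is zero everywhere except the one column where the boundary sits.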
Convolutional Neural Networks
So, how does convolution relate to convolutional neural networks?
Consider a 1-dimensional convolutional layer with inputs $\{x_n\}$ and outputs $\{y_n\}$:
As we observed, we can describe the outputs in terms of the inputs:

$$y_n = A(x_n, x_{n+1}, \ldots)$$

Generally, $A$ would be multiple neurons. But suppose it is a single neuron for a moment.
Recall that a typical neuron in a neural network is described by:

$$\sigma(w_0x_0 + w_1x_1 + w_2x_2 \,\ldots\, + b)$$

Where $x_0$, $x_1$, ... are the inputs. The weights, $w_0$, $w_1$, ... describe how the neuron connects to its inputs. A negative weight means that an input inhibits the neuron from firing, while a positive weight encourages it. The weights are the heart of the neuron, controlling its behavior.³ Saying that multiple neurons are identical is the same thing as saying that the weights are the same.
It’s this wiring of neurons, describing all the weights and which ones are identical, that convolution will handle for us.
Typically, we describe all the neurons in a layer at once, rather than individually. The trick is to have a weight matrix, $W$:

$$y = \sigma(Wx + b)$$
For example, we get:

$$y_0 = \sigma(W_{0,0}x_0 + W_{0,1}x_1 + W_{0,2}x_2 \,\ldots)$$
$$y_1 = \sigma(W_{1,0}x_0 + W_{1,1}x_1 + W_{1,2}x_2 \,\ldots)$$
Each row of the matrix describes the weights connecting a neuron to its inputs.
Returning to the convolutional layer, though, because there are multiple copies of the same neuron, many weights appear in multiple positions.
Which corresponds to the equations:

$$y_0 = \sigma(w_0x_0 + w_1x_1 + b)$$
$$y_1 = \sigma(w_0x_1 + w_1x_2 + b)$$
So while, normally, a weight matrix connects every input to every neuron with different weights:

$$W = \begin{bmatrix} W_{0,0} & W_{0,1} & W_{0,2} & W_{0,3} & \cdots \\ W_{1,0} & W_{1,1} & W_{1,2} & W_{1,3} & \cdots \\ W_{2,0} & W_{2,1} & W_{2,2} & W_{2,3} & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$
The matrix for a convolutional layer like the one above looks quite different. The same weights appear in a bunch of positions, and because neurons don’t connect to many possible inputs, there are lots of zeros:

$$W = \begin{bmatrix} w_0 & w_1 & 0 & 0 & \cdots \\ 0 & w_0 & w_1 & 0 & \cdots \\ 0 & 0 & w_0 & w_1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$
Multiplying by the above matrix is the same thing as convolving with $[\,\ldots\, 0, w_1, w_0, 0 \,\ldots\,]$. The function sliding to different positions corresponds to having neurons at those positions.
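This equivalence between a banded weight matrix and a convolution can be checked numerically. A sketch assuming NumPy and made-up weight values $w_0$, $w_1$ (the nonlinearity and bias are omitted, since they apply identically to both sides):

```python
import numpy as np

# Hypothetical shared weights for a two-input neuron.
w0, w1 = 0.3, -0.7
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Banded weight matrix: every row is the same neuron, shifted one input over.
W = np.array([
    [w0, w1, 0,  0,  0 ],
    [0,  w0, w1, 0,  0 ],
    [0,  0,  w0, w1, 0 ],
    [0,  0,  0,  w0, w1],
])

by_matrix = W @ x
# np.convolve flips its second argument, so we pass the weights reversed.
by_convolution = np.convolve(x, [w1, w0], mode='valid')

print(np.allclose(by_matrix, by_convolution))  # True
```

The reversed kernel is exactly the flipping that the $g(c-a)$ in the standard definition performs.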
What about two-dimensional convolutional layers?
The wiring of a two dimensional convolutional layer corresponds to a two-dimensional convolution.
Consider our example of using a convolution to detect edges in an image, above, by sliding a kernel around and applying it to every patch. Just like this, a convolutional layer will apply a neuron to every patch of the image.
Conclusion
We introduced a lot of mathematical machinery in this blog post, but it may not be obvious what we gained. Convolution is obviously a useful tool in probability theory and computer graphics, but what do we gain from phrasing convolutional neural networks in terms of convolutions?
The first advantage is that we have some very powerful language for describing the wiring of networks. The examples we’ve dealt with so far haven’t been complicated enough for this benefit to become clear, but convolutions will allow us to get rid of huge amounts of unpleasant book-keeping for us.
Secondly, convolutions come with significant implementational advantages. Many libraries provide highly efficient convolution routines. Further, while convolution naively appears to be an $O(n^2)$ operation, using some mathematical insight, it is possible to create an $O(n \log n)$ implementation.
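The insight behind the $O(n \log n)$ implementation is the convolution theorem: a Fourier transform turns convolution into pointwise multiplication in frequency space. A sketch using NumPy's FFT (the function name `fft_convolve` is my own; the zero-padding makes the FFT's circular convolution match the linear one):

```python
import numpy as np

def fft_convolve(f, g):
    """Linear convolution computed via the FFT in O(n log n)."""
    n = len(f) + len(g) - 1          # length of the full linear convolution
    F = np.fft.rfft(f, n)            # zero-pads f to length n, then transforms
    G = np.fft.rfft(g, n)
    return np.fft.irfft(F * G, n)    # pointwise product back to the time domain

f = np.random.rand(100)
g = np.random.rand(100)

# Agrees with the direct O(n^2) computation up to floating-point error.
print(np.allclose(fft_convolve(f, g), np.convolve(f, g)))  # True
```

For the small kernels of convolutional layers the direct method often wins in practice, but for long signals the FFT route is dramatically faster.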
In fact, the use of highly-efficient parallel convolution implementations on GPUs has been essential to recent progress in computer vision.
Next Posts in this Series
This post is part of a series on convolutional neural networks and their generalizations. The first two posts will be review for those familiar with deep learning, while later ones should be of interest to everyone. To get updates, subscribe to my RSS feed!
Please comment below or on the side. Pull requests can be made on github.
Acknowledgments
I’m extremely grateful to Eliana Lorch, for extensive discussion of convolutions and help writing this post.
I’m also grateful to Michael Nielsen and Dario Amodei for their comments and support.
Footnotes

1. We want the probability of the ball rolling $a$ units the first time and also rolling $b$ units the second time. The distributions $P(A)=f(a)$ and $P(b)=g(b)$ are independent, with both distributions centered at 0. So $P(a,b) = P(a) \cdot P(b) = f(a) \cdot g(b)$.

2. The non-standard definition, which I haven’t previously seen, seems to have a lot of benefits. In future posts, we will find this definition very helpful because it lends itself to generalization to new algebraic structures. But it also has the advantage that it makes a lot of algebraic properties of convolutions really obvious. For example, convolution is a commutative operation. That is, $f\ast g = g\ast f$. Why?

$$\sum_{a+b=c} f(a) \cdot g(b) \,=\, \sum_{b+a=c} g(b) \cdot f(a)$$

Convolution is also associative. That is, $(f\ast g)\ast h = f\ast(g\ast h)$. Why?

$$\sum_{(a+b)+c=d} \big(f(a) \cdot g(b)\big) \cdot h(c) \,=\, \sum_{a+(b+c)=d} f(a) \cdot \big(g(b) \cdot h(c)\big)$$

3. There’s also the bias, which is the “threshold” for whether the neuron fires, but it’s much simpler and I don’t want to clutter this section talking about it.