Neural Networks for Machine Learning (6)


Types of neural network architectures


In this video I'm going to describe various kinds of architectures for neural networks. What I mean by an architecture is the way in which the neurons are connected together. By far the commonest type of architecture in practical applications is a feed-forward neural network, where the information comes into the input units and flows in one direction through hidden layers until it reaches the output units. A much more interesting kind of architecture is a recurrent neural network, in which information can flow round in cycles. These networks can remember information for a long time. They can exhibit all sorts of interesting oscillations, but they are much more difficult to train, in part because they are so much more complicated in what they can do. Recently, however, people have made a lot of progress in training recurrent neural networks, and they can now do some fairly impressive things. The last kind of architecture that I'll describe is a symmetrically connected network, one in which the weights are the same in both directions between two units.


The commonest type of neural network in practical applications is a feed-forward neural network. This has some input units in the first layer at the bottom, some output units in the last layer at the top, and one or more layers of hidden units. If there is more than one layer of hidden units, we call them deep neural networks. These networks compute a series of transformations between their input and their output. So at each layer, you get a new representation of the input in which things that were similar in the previous layer may have become less similar, or things that were dissimilar in the previous layer may have become more similar. In speech recognition, for example, we'd like the same thing said by different speakers to become more similar, and different things said by the same speaker to become less similar, as we go up through the layers of the network. In order to achieve this, we need the activities of the neurons in each layer to be a non-linear function of the activities in the layer below.
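As a minimal sketch of the forward pass just described, here is a tiny NumPy feed-forward net. The layer sizes, the random weights, and the choice of a logistic non-linearity are illustrative assumptions, not anything specified in the lecture.

```python
import numpy as np

def sigmoid(z):
    """Logistic non-linearity, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, weights, biases):
    """Propagate an input vector through successive layers.

    Each layer computes a non-linear function of the activities in the
    layer below, giving a new representation of the input at every level.
    """
    activity = x
    for W, b in zip(weights, biases):
        activity = sigmoid(W @ activity + b)
    return activity

# Illustrative sizes: 4 input units, two hidden layers of 5, 3 output units.
rng = np.random.default_rng(0)
sizes = [4, 5, 5, 3]
weights = [rng.standard_normal((m, n)) * 0.1 for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

print(feed_forward(rng.standard_normal(4), weights, biases))
```

Each `sigmoid(W @ activity + b)` step is one layer's new, non-linear representation of the input; stacking more than one such hidden layer is what makes the network "deep".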


Recurrent neural networks are much more powerful than feed-forward neural networks. They have directed cycles in their connection graph. What this means is that if you start at a node or a neuron and you follow the arrows, you can sometimes get back to the neuron you started at. They can have very complicated dynamics, and this can make them very difficult to train. There is a lot of interest at present in finding efficient ways of training recurrent nets, because they are so powerful if we can train them. They are also more biologically realistic. Recurrent neural networks with multiple hidden layers are really just a special case of a general recurrent neural network that has some of its hidden-to-hidden connections missing.
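To make the "directed cycles" distinction concrete, here is a small check on a connection graph represented as adjacency lists; the graphs and node names are made up for illustration and are not from the lecture.

```python
def has_directed_cycle(graph):
    """Return True if following the arrows can bring you back to a neuron
    you started at, i.e. the connection graph is recurrent."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on the current path / finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, []):
            if colour[nxt] == GREY:        # back edge: a directed cycle
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)

feed_forward_graph = {"in": ["h1"], "h1": ["h2"], "h2": ["out"], "out": []}
recurrent_graph    = {"in": ["h1"], "h1": ["h2"], "h2": ["h1", "out"], "out": []}

print(has_directed_cycle(feed_forward_graph))  # False
print(has_directed_cycle(recurrent_graph))     # True
```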


Recurrent networks are a very natural way to model sequential data. So what we do is have connections between hidden units, and the hidden units act like a network that is very deep in time. At each time step, the states of the hidden units determine the states of the hidden units at the next time step. One way in which they differ from feed-forward nets is that we use the same weights at every time step. So if you look at those red arrows where the hidden units are determining the next state of the hidden units, the weight matrix depicted by each red arrow is the same at each time step. They also get inputs at every time step, and often give outputs at every time step, and those will use the same weight matrices too. Recurrent networks have the ability to remember information in the hidden state for a long time. Unfortunately, it is quite hard to train them to use that ability. However, recent algorithms have been able to do that.
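A minimal sketch of the unrolling described above, assuming a simple tanh hidden unit; the matrix names (W_xh, W_hh, W_hy) and the sizes are illustrative. The point from the slide is in the loop: the same weight matrices are applied at every time step.

```python
import numpy as np

def run_rnn(inputs, W_xh, W_hh, W_hy, b_h, b_y):
    """Unroll a simple recurrent net over a sequence.

    The hidden state at each step is determined by the input at that step
    and the hidden state at the previous step; the same weight matrices are
    reused at every step.
    """
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # shared weights at every step
        outputs.append(W_hy @ h + b_y)            # output at every step, shared too
    return outputs, h

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 3, 8, 2
W_xh = rng.standard_normal((n_hid, n_in)) * 0.1
W_hh = rng.standard_normal((n_hid, n_hid)) * 0.1
W_hy = rng.standard_normal((n_out, n_hid)) * 0.1
b_h, b_y = np.zeros(n_hid), np.zeros(n_out)

sequence = [rng.standard_normal(n_in) for _ in range(5)]
outputs, final_hidden = run_rnn(sequence, W_xh, W_hh, W_hy, b_h, b_y)
print(len(outputs), final_hidden.shape)
```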


So just to show you what recurrent neural nets can now do, I'm going to show you a net designed by Ilya Sutskever. It's a special kind of recurrent neural net, slightly different from the kind in the diagram on the previous slide, and it's used to predict the next character in a sequence. Ilya trained it on lots and lots of strings from English Wikipedia: it sees English characters and tries to predict the next English character. He actually used a range of different characters to allow for punctuation, digits, capital letters and so on. After you have trained it, one way of seeing how well it can do is to check whether it assigns high probability to the next character that actually occurs. Another way of seeing what it can do is to get it to generate text. What you do is give it a string of characters and get it to predict probabilities for the next character, then pick the next character from that probability distribution. It's no use picking the most likely character: if you do that, after a while it starts saying "the United States of the United States of the United States of the United States". That tells you something about Wikipedia. But if you pick from the probability distribution, so that if the net gives some small probability to a Z you pick a Z with exactly that probability, then you see much more about what it has learned. The next slide shows an example of the text that it generates, and it's interesting to notice how much is learned just by reading Wikipedia and trying to predict the next character.
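A minimal sketch of the sampling step described above: instead of always taking the most likely character, the next character is drawn from the predicted distribution. The tiny hand-made distribution is purely illustrative and is not the output of Sutskever's model.

```python
import numpy as np

def sample_next_char(char_probs, rng):
    """Pick the next character by sampling from the predicted distribution,
    rather than always taking the single most likely character."""
    chars = list(char_probs.keys())
    probs = np.array(list(char_probs.values()))
    probs = probs / probs.sum()              # make sure it is normalised
    return rng.choice(chars, p=probs)

# Illustrative distribution a character-level model might output.
predicted = {"t": 0.40, "A": 0.25, " ": 0.15, "C": 0.10, "z": 0.10}
rng = np.random.default_rng(2)

greedy  = max(predicted, key=predicted.get)              # always the same character
sampled = [sample_next_char(predicted, rng) for _ in range(10)]
print(greedy, sampled)
```

Greedy decoding collapses into repetitive loops like the "United States of the United States" example; sampling in proportion to the predicted probabilities reveals much more of what the model has actually learned.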


So remember, this text was generated one character at a time. Notice that it makes reasonably sensible sentences, and they are almost always composed entirely of real English words. Occasionally it makes a non-word, but they are typically sensible ones. And notice that within a sentence it has some thematic coherence. So the phrase "Several perishing intelligence agents is in the Mediterranean region" has problems, but it's almost good English. Notice also the thing it says at the end, "such that it is the blurring of appearing on any well-paid type of box printer". There's a certain sort of thematic thing there about appearance and printing, and the syntax is pretty good. And remember, that's one character at a time.


Quite different from recurrent nets are symmetrically connected networks. In these, the connections between units have the same weight in both directions. John Hopfield and others realized that symmetric networks are much easier to analyze than recurrent networks. This is mainly because they are more restricted in what they can do, and that's because they obey an energy function. So they cannot, for example, model cycles: you can't get back to where you started in one of these symmetric networks.
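A minimal sketch of the energy function such symmetric networks obey, assuming the standard Hopfield form with binary ±1 units, symmetric weights, and no self-connections; the random weight matrix is illustrative. Each asynchronous update can only keep the energy the same or lower it, which is why these networks settle rather than cycle.

```python
import numpy as np

def hopfield_energy(state, W):
    """Energy of a symmetric (Hopfield-style) network: E = -1/2 * s^T W s.
    With symmetric W and zero diagonal, single-unit updates never raise it."""
    return -0.5 * state @ W @ state

rng = np.random.default_rng(3)
n = 6
W = rng.standard_normal((n, n))
W = (W + W.T) / 2.0                 # symmetric weights: same in both directions
np.fill_diagonal(W, 0.0)            # no self-connections

state = rng.choice([-1.0, 1.0], size=n)
for _ in range(20):                 # asynchronous updates, one unit at a time
    i = rng.integers(n)
    state[i] = 1.0 if W[i] @ state >= 0 else -1.0
    print(hopfield_energy(state, W))  # non-increasing: the network settles
```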
