Introduction
Effective neural network architectures are specialized to the type of inputs and type of outputs that they deal with.
A convolutional neural network is a specialized architecture to deal with image-based inputs.
Images are rather complex. There are two dimensions, the pixel height and the pixel width of the image, plus for each spatial dimension, there are also often three-color dimensions.
It is not unusual then for image data to be given the vector format [256, 256, 3]. Where 256 and 256 are the height and width, and 3 is for the 3 color channels. Thats 196,608 numbers to represent one image!
Before the age of neural networks.
convolutional layers are very similar to an earlier machine learning trick that was used before the era of neural networks. Back then people would hand design filters or kernels which would take in a small window of input and perform a calculation. This window would then sweep across images to produce feature maps. These feature maps would then be an easier abstract representation of the image to perform calculations on.
How convolutional layers work
With images being so high dimensional using fully connected neural networks to process them was ineffective and inefficient.
So we combined neurons with the kernels/filters trick. Convolutional layers were born. Each neuron in a convolutional layer takes in a small window of values as input and produces output. This then happens over and over again as a window sweeps across the image.
Like all neurons, neurons in a convolutional layer have learned weights that change over training time. These weights are analogous to the old filters/kernels, but they are discovered through the learning process… No more hand designed filters.
Pooling layers
Pooling layers often follow convolutional layers to reduce the dimensionality of the data. They are a simple computation with no learned elements. Common types of pooling include max and average pooling, where the maximum or average value of a window of data in a feature map is taken. This can make it easier for the network to compute and also adding a level of translation invariance.
Convolutional neural network architectures
Convolutional neural network architectures often link many layers of convolutional neural layers together (ResNet has 34 layers). Each of these layers slowly condenses the visual dimensions while increasing the number of neurons recognizing features. At the beginning of the network neurons may fire when they recognize a particular type of line or corner. Further into the network you may get some neurons recognizing different shapes and patterns. Further again, you may find some detecting eyes and hands, etc. Finally, you’ll get neurons at the end of the network detecting different classes that it has been trained to detect. Cats, dogs, people, etc. This is to say, the further into the network the more abstract recognitions an individual neuron may make.
Pooling layers are sprinkled into convolutional network architectures, especially at the beginning to help deal with the high level of random variation often found within images.
Click here to play around with a conv network
Hot comments
about anything