Convolutional Neural Network - CNN - UFPR

Transcription

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesConvolutional Neural Network - CNNEduardo Todt,Bruno Alexandre KrinskiVRI Group - Vision Robotic and ImagesFederal University of ParanáNovember 30, 20191 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesDefinitionConvolutional Neural Networks (CNNs) are Artificial Intelligencealgorithms based on multi-layer neural networks that learns relevantfeatures from images, being capable of performing several tasks likeobject classification, detection, and segmentation.2 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesHistoryAn initial idea of convolution was proposed by Kunihiko Fukushima in1980 and initially was called Neocognitron.3 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesHistoryThe modern concept of Convolutional Neural Networks comes fromthe work of Yann LeCun published in 1998. LeCun proposed a CNNcalled LeNet for hand-write recognition.Figure: Gradient-Based Learning Applied to Document Recognition4 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesHistoryIn 2012, Alex Krizhevsky won the ImageNet Large Scale VisualRecognition Challenge with a CNN model called AlexNet. Krizhevskyused GPUs to train the AlexNet, which enabled faster training of CNNsmodels and started a wave of interest and new works based on CNNs.Figure: ImageNet Classification with Deep Convolutional Neural Networks5 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesWhy CNN is better?The advantage of CNNs over others classification algorithms (SVM,K-NN, Random-Forest, and others) is that the CNNs learns the bestfeatures to represent the objects in the images and has a highgeneralization capacity, being able to precisely classify new exampleswith just a few examples in the training set.6 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesPopular applicationsCar and Plate recognition7 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesPopular applicationsBiometry8 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesPopular applicationsAutonomous car9 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesPopular applicationsPedestrian Detection10 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesPopular applicationsIndoor environments recognition11 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesCNN LayersA CNN is topically composed by four types of layers:ConvolutionalPoolingReluFully Connected12 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesConvolutional LayerA convolutional layer is composed by a set of filters, also calledkernels, that slides over the input data.Each kernel has a width, a height and width height weightsutilized to extract features from the input data.In the training step, the weights in the kernel starts with randomvalues, and will be learning based on the training set.13 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesConvolutional LayerExample of convolution with 1 channel: 7xFbhg.gifExample of convolution with 3 channels: X-EeSrA.gif14 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesConvolutional LayerEach filter in the convolutional layer represents a feature.15 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesConvolutional LayerWhen the filter slides over the image and finds a match.16 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesConvolutional LayerThe convolution operation generates a large number, activating thefilter to that characteristic.17 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesConvolutional LayerWhen the filter slides over the image and finds no match, the filterdoes not activate.The CNN uses this process to learn the best filters to describe theobjects.18 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesConvolutional Layer19 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesReLuThe ReLU (Rectified Linear Units) layers, is an activation layerlinked after a convolutional layer to generate non-linearity in thenetwork.The ReLu helps the network to learn harder decision functionsand reduce the overfitting.The ReLu applies the function y max (x , 0)20 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesReLu21 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesPooling22 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesPoolingThe pooling layer, or down-sampling layer, is applied to reducethe dimensionality of the feature maps in a way to save the mostrelevant information from the feature maps.In the pooling layer, a filter slides over the input data and appliesthe pooling operation (max, min, avg).The max pooling is the most used in the ol animation.gif23 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesFully Connected LayerA CNN is topically divided into two parts: the convolutional and thedense steps. The former learns the best features to extract from theimages and the latter learns how to classify the features in differentcategories.24 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesFully Connected LayerThe Fully Connected layer is a MultiLayer Perceptron (MLP),composed by three types of layers: input, hidden, and output layers.25 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesFully Connected LayerThe input layer receives the features generated by the CNN .The hidden layer is a sequence of neurons with weights that willbe learned in the training step. A MLP is composed by one ormore hidden layers.The output layer is also a sequence of neurons. However, it has adifferent activation function. Usually, the softmax function is usedto generate the probabilities of each category in the problemscope.26 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesFully Connected LayerEach neuron is composed by:A input vector x0 , x1 , ., xn , that represent the featuresA weight vector w0 , w1 , ., wn , that will be learned in the trainingstepThe biasAn activation functionThe output27 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesFully Connected Layer28 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesFully Connected LayerThe Perceptron performs the following operation:output ActivationFunction(x0 w0 x1 w1 . xn wn bias) (1)The most common activation functions used in the literature are:ReLu, Sigmoid, Softmax, Tanh, Hardlim29 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesSigmoid30 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesSoftmax31 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesTahn32 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesHardlim33 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesTrainingIn order to train a CNN model, a training dataset composed by aset of images and labels (classes, bounding boxes, masks) isused.The algorithm used to train a CNN is called back-propagation,that uses the output value of the last layer to measure an errorvalue. This error value is used to update the weights of eachneuron in that layer.The new weights are used to measure an error value and updatethe weights of the previous.The algorithm repeats the process until it reaches the first layer.34 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesTrainingA 3D CNN visualization:https://scs.ryerson.ca/ aharley/vis/conv/35 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesTraining36 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesClassificationPopular CNNs for classificationtasksVGG-16ResNetsInception37 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesDetectionPopular CNNs for detection tasksFaster R-CNNYOLO38 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesSegmentationPopular CNNs for segmentationtasksFCNU-NetMask R-CNN39 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesVGG-16Figure: Very Deep Convolutional Networks for Large-Scale ImageRecognition40 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesVGG-16The VGG-16 was molded with stacks of convolutional layers withsmall kernels of size 3 3 instead of one convolutional layer withlarge kernels of size 7 7 and 11 11 in LeNet and AlexNet.A stack of convolutional layers with small filters sizes generates adecision function more discriminatory by increasing the numberof non-linear rectification layers.VGG16 layers: Tg6fIw.png41 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesResNetFigure: ResNet42 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesResNet VariationsFigure: Deep Residual Learning for Image Recognition43 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesResNetAiming to build deeper neural networks, the ResNet was proposedwith skip connections.Figure: Difference between a convolution block without skip and with skipconnection.44 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesResNetResNet: https://adeshpande3.github.io/assets/ResNet.gif45 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesInception - GoogleLeNet46 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesInception - GoogleLeNetUse a sequence of parallel convolutions of different sizes to learnobjects of different size and position in the figure.47 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesInception - io/assets/GoogleNet.gif48 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesFaster R-CNNThe Faster R-CNN has the Region Proposal Network (RPN) to selectthe regions in the image with the highest probability of being objects.49 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesFaster R-CNNIn the RPN, the Faster R-CNN generate a set of anchor boxes. Theanchor boxes with the highest probability of being objects are used togenerate the bounding boxes of each object in the image.50 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesFaster R-CNNRoiPooling is applied to convert all proposed regions to a same sizefeature map. c/RoI pooling animated.gif51 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesYou only look once (YOLO)52 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesYou only look once (YOLO)In the YOLO, the image is divided into a S S grid, with each cell inthe grid predicting N bounding boxes. The bounding boxes with a verylow probability of being an object are discarded and non-maxsuppression are utilized to remove multiple bounding box to a sameobject.53 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesFully Convolutional Network (FCN)In the FCN, the fully connected layers are replaced by convolutionalones to generate a pixel level classification mask.54 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesFully Convolutional Network (FCN)55 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesFully Convolutional Network (FCN)Three versions of the FCN was proposed.56 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesU-NetSimilar with the FCN. However, each convolutional block has adeconvolutional block.57 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesMask R-CNNThe Mask R-CNN is an evolution of the Faster R-CNN with asegmentation modulo. So, the Mask R-CNN performs the three tasks:classification, detection and segmentation.58 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesTensorflowDeveloped by GoogleSupport for many programming languages: Python, C ,JavaScript, Java, Go, C# and JuliaRuns on mobile platformsWorks with static computation graphs (first defines a graph, thanperforms the training).59 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesPyTorchDeveloped by FacebookSupport for: Python and C Runs on mobile platformsWorks with dynamically updated graphs (the graph can be alteredin the training process).60 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesKerasIt is a high level APIWorks on top of: Tensorflow, Theano, and CNTKVery easy of use61 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesModel ZooThere are high number of online repositories called Model Zoo’s with awide range of frameworks and implementations in many differentframeworks. https://modelzoo.co/62 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesPapersDeep Residual Learning for Image Recognition:https://arxiv.org/abs/1512.03385Fully Convolutional Networks for Semantic ion/303505510 FullyConvolutional Networks for Semantic SegmentationFaster R-CNN: Towards Real-Time Object Detection with Region ProposalNetworks: https://arxiv.org/abs/1506.01497Going Deeper with Convolutions: sed Learning Applied to Document 26791ImageNet Classification with Deep Convolutional Neural works.pdf63 / 68

IntroductionCNN LayersCNN ModelsPopular FrameworksPapersReferencesPapersYou Only Look Once: Unified, Real-Time Object Detection:https://arxiv.org/abs/1506.02640Mask R-CNN: https://arxiv.org/abs/1703.06870U-Net: Convolutional Networks for Biomedical Image Segmentation:https://arxiv.org/abs/1505.04597A Framework for Object Detection, Tracking and Classification in Urban TrafficScenarios Using ion/224602865 A Framework for Object DetectionClassification in Urban Traffic Scenarios Using StereovisionNeocognitron: A Self-organizing Neural Network Model for a Mechanism ofPattern Recognition Unaffected by Shift in /BF0034425164 / 68

IntroductionCNN LayersCNN ModelsPopular -level accident-mapping: Distance based pattern matching usingartificial neural network65 / 68

IntroductionCNN LayersCNN ModelsPopular ship-detection-segmentationa1108b5a08366 / 68

IntroductionCNN LayersCNN ModelsPopular 833c686ae6c67 / 68

IntroductionCNN LayersCNN ModelsPopular 918ehttps://www.medcalc.org/manual/tanh al Neural Networks/Activation Functions68 / 68

network. The ReLu helps the network to learn harder decision functions . Support for many programming languages: Python, C , JavaScript, Java, Go, C# and Julia Runs on mobile platforms Works with static computation graphs (first defines a graph, than. Introduction CNN Layers CNN Models P