Learn Computer Vision Using OpenCV - Archive


Learn Computer Vision Using OpenCV
With Deep Learning CNNs and RNNs
Sunila Gollapudi
Foreword by V Laxmikanth


Learn Computer Vision Using OpenCV: With Deep Learning CNNs and RNNs
Sunila Gollapudi
Hyderabad, Telangana, India

ISBN-13 (pbk): 2-4261-2
ISBN-13 (electronic): 978-1-4842-4261-2

Copyright 2019 by Sunila Gollapudi

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Managing Director, Apress Media LLC: Welmoed Spahr
Acquisitions Editor: Celestin Suresh John
Development Editor: Matthew Moodie
Coordinating Editor: Shrikant Vishwakarma
Cover designed by eStudioCalamar
Cover image designed by Freepik (www.freepik.com)

Distributed to the book trade worldwide by Springer Science Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

For information on translations, please e-mail rights@apress.com, or visit www.apress.com/rights-permissions.

Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Print and eBook Bulk Sales web page at www.apress.com/bulk-sales.

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book’s product page, located at www.apress.com/978-1-4842-4260-5. For more detailed information, please visit www.apress.com/source-code.

Printed on acid-free paper

To my angel, my BFF, my raison d’être, my daughter, Sai Srividya Nikita, for being proud of me always!

Table of Contents

About the Author
About the Technical Reviewer
Acknowledgments
Foreword
Introduction

Chapter 1: Artificial Intelligence and Computer Vision
    Introduction to Artificial Intelligence
    Natural Language Processing
    Robotics
    Machine Learning
    Expert Systems
    Speech and Voice Recognition
    Intelligent Process Automation
    Introduction to Computer Vision
    Scope
    Challenges of Computer Vision
    Real-World Applications of Computer Vision
    Images and Their Features
    Core Building Blocks (Input – Process – Output)
    Conclusion

Chapter 2: OpenCV with Python
    About OpenCV
    Setting Up OpenCV with Python
    Windows Installation
    macOS Installation
    Using Modules
    Working with Images and Videos
    Using NumPy
    Videos
    Conclusion

Chapter 3: Deep Learning for Computer Vision
    Deep Learning: An Overview
    Deep Learning Applications in Computer Vision
    Classification
    Detection and Localization
    (Semantic) Segmentation
    Similarity Learning
    Image Captioning
    Generative Models
    Video Analysis
    Neural Networks at Their Core
    Artificial Neural Networks
    Artificial Neurons or Perceptrons
    Training Neural Networks

    Convolutional Neural Networks
    Convolution Layer
    Pooling Layer
    Fully Connected Layer
    Recurrent Neural Networks
    Backpropagation Through Time
    Conclusion

Chapter 4: Image Manipulation and Segmentation
    Image Manipulations
    Accessing and Manipulating Pixels
    Drawing Geometric Shapes or Writing Text on a Color Image
    Filtering Images
    Transforming Images
    Image Segmentation
    Line Detection
    Circle Detection
    Conclusion

Chapter 5: Object Detection and Recognition
    Basics of Object Detection
    Object Detection vs. Object Recognition
    Template Matching
    Challenges with Template Matching
    Understanding Image “Features”
    Feature Matching
    Image Corners As Features
    Harris Corner Algorithm
    Feature Tracking and Matching Flow

    Scale-Invariant Feature Transform
    Speeded-Up Robust Features
    Features from Accelerated Segment Test
    Binary Robust Independent Elementary Features
    Oriented FAST and Rotated BRIEF
    Conclusion

Chapter 6: Motion Analysis and Object Tracking
    Introduction to Object Tracking
    Challenges of Object Tracking
    Object Detection Techniques for Tracking
    Frame Differentiation
    Background Subtraction
    Optical Flow
    Object Classification
    Shape-Based Classification
    Motion-Based Classification
    Color-Based Classification
    Texture-Based Classification
    Object Tracking Methods
    Point Tracking Method
    Kernel-Based Tracking Methods
    Silhouette-Based Tracking
    Conclusion

Index

About the Author

Sunila Gollapudi is an executive vice president at Broadridge Financial Solutions India (Pvt) Ltd. Sunila is a passionate and pragmatic technology leader with more than 17 years of experience in architecting, designing, and developing client-centric, enterprise-scale, and data-driven solutions. She oversees every stage of the technology implementation and is a thought leader and technology visionary with a proven ability to build the technology road map. Primarily focused on the banking and financial services domain over the past ten years, she is a data connoisseur and an architect, adept at designing an overall data strategy to maximize the value of data through analytics. She is also an author and a mentor with an entrepreneurial mindset who believes in continuous learning as a key to organizational growth.

Her specialties include building overall intelligent automation strategies by synthesizing the business and domain drivers and emerging technology trends in Big Data engineering and analytics; leading cloud migration and DevOps strategies for CI/CD; and steering application (legacy) modernization, reuse, and technology standardization initiatives.

About the Technical Reviewer

Lentin Joseph is an author and robotics entrepreneur from India. He runs a robotics software company called Qbotics Labs in India. He has more than eight years of experience in the robotics domain, primarily in ROS, OpenCV, and PCL.

He has authored several books on ROS, namely, Learning Robotics Using Python, Mastering ROS for Robotics Programming, ROS Robotics Projects, ROS Programming, and Robot Operating System for Absolute Beginners. He is also the technical reviewer of six robotics books.

He completed his master’s in robotics and automation in India and also conducted research work at the Robotics Institute, Carnegie Mellon University, in the United States.

Acknowledgments

My sincere thanks to Broadridge for providing an opportunity to champion the adoption of artificial intelligence in the financial services domain.

Special thanks to my mentor and boss, V. Laxmikanth, the managing director at Broadridge India, for all the support and trust and for taking the time to pen the foreword to this book. I always value and look up to your humility and leadership.

A big thank-you to Apress, the publishing team, and the reviewers for an opportunity to work with you and for being efficient, patient, and professional.

My heartfelt gratitude to Mrs. Radhika Laxmikanth for her unflinching support and to my brothers, Ravi and Sashi, and my close friends for giving the best encouragement and being the best critics.

Finally, kudos to all the technology enthusiasts who constantly experiment and inspire me to be a student for life!

Foreword

Building machines that can see and interpret things around us is an interesting but notoriously complex problem to solve. The human visual system is infallible for tasks such as recognizing a face or a given object.

Computer vision has now become a very important subfield of artificial intelligence. Application areas of computer vision have expanded from reading and interpreting human scripts (handwriting recognition) or analyzing images and videos to using these capabilities in security surveillance and intelligent automation (among other digital usages).

In this book, Sunila Gollapudi articulates the broader vision of artificial intelligence and how computer vision is now a key enabler. She has included a step-by-step hands-on guide to building computer vision applications from scratch using OpenCV and Python. Readers can access the complete code for each of these implementations, which utilize real-world examples and open data sets.

Overall, what is more challenging is how computer vision applications can be integrated as an offering to enhance existing products or applications, and how they can be scaled and deployed as a service. This book has a special focus on operationalizing AI applications and cloud platforms for computer vision.

V Laxmikanth
Managing Director
Broadridge India
www.broadridge.com

Introduction

What artificial intelligence is today is a result of our continuous pursuit to make machines do all that humans can do, be it hearing, seeing, perceiving, thinking, or emoting. The evolution of artificial intelligence has reached an interesting juncture where machines not only are doing intensive work that is beyond a human’s physical capabilities (such as mining harmful chemicals or operating large manufacturing plants) but also are being companions or assistants to humans by helping with day-to-day chores and by being available on small devices like smartphones (for example, Siri, Alexa, and Google Assistant). The key measure for success now is how personalized these machines can be and how well they can operate in collaboration with humans (human-aware AI). While this is reaping bigger benefits by enhancing quality of life and improving the adoption of technology in many businesses, it is also opening up avenues for misuse, prompting the need for governing bodies to define stricter boundaries and controls around adopting artificial intelligence.

Computer vision is one such area of artificial intelligence that has significantly gained adoption in recent times given the advent of the Internet of Things. Computer vision is all about enabling machines to perceive and interpret what is seen.

This book focuses on the field of computer vision in particular and provides step-by-step guidance on how to build computer vision applications to address real-world use cases using OpenCV with Python. This book briefly introduces the overall landscape of artificial intelligence and its purpose and subfields, which include computer vision. That is followed by a detailed introduction to computer vision and its subfields such as OCR, ICR, and OMR that enable computers to view, recognize, and process images and videos the way humans do and provide the necessary interpretations.

This book starts with setting up OpenCV with Python from scratch and then covers implementing specialized image processing, implementing object/feature detection and motion tracking functions, using advanced libraries, and productionizing large-scale deployments using OpenCV.

The high-level objectives of the book are as follows:

- Understand what computer vision is and its overall application in AI and intelligent automation systems
- Learn all the deep learning techniques required and used for building computer vision applications
- Learn how to build complex computer vision applications using the latest techniques in OpenCV using programming skills such as basic Python and NumPy
- See practical applications and implementations such as face detection and recognition (face swapping and filters!), handwriting recognition, object detection, tracking, and motion analysis

This book has six chapters, described here:

Chapter 1, “Artificial Intelligence and Computer Vision,” focuses on introducing you to the landscape of artificial intelligence and the role of computer vision in AI applications. This chapter explains what images are, describes their characteristics, and introduces some computer vision concepts such as manipulation, tracking, detection, and recognition. It also describes some use cases and domains that need this technology.

Chapter 2, “OpenCV with Python,” introduces an open library called OpenCV that provides the tools and necessary frameworks to implement computer vision applications. A brief introduction to Python and the image libraries of Python like NumPy is provided. You will be able to set up an OpenCV/Python environment from scratch and get ready to implement some real-world use cases for the upcoming chapters. Additionally, the chapter talks about some aspects around computer vision as a service and discusses the extended libraries of OpenCV like OpenCV.JS for web and mobile applications and how OpenCV can be deployed on the cloud. A few competing frameworks and tools like the Google Vision API, Textract and Rekognition from Amazon AWS, and the Microsoft Computer Vision API are introduced.

Chapter 3, “Deep Learning for Computer Vision,” describes how building computer vision applications requires creating complex deep learning models with two components: a convolutional neural network (CNN) that transforms an input image into a set of features, and a recurrent neural network (RNN) that turns those features into a rich, descriptive language. This chapter covers how these cutting-edge deep learning architectures work, especially in the context of computer vision.

Chapter 4, “Image Manipulation and Segmentation,” covers image manipulations and segmentation-related functions that are core to image processing in computer vision. For each of the use cases, the syntax and implementations of the built-in functions in OpenCV in Python are covered, and sample implementations are provided. Techniques such as edge detection, rotations, resizing, shape detection, and so on, are covered in depth.

Chapter 5, “Object Detection and Recognition,” provides a deep dive into object detection and then moves on to object recognition followed by face-feature recognition, landmark identification, and finally handwriting recognition. The necessary OpenCV libraries are explained, and sample implementations are provided.

Chapter 6, “Motion Analysis and Object Tracking,” covers motion analysis and tracking of objects in videos. Information about different types of objects in motion is given, with details on how to remove background and foreground information and how to do real-time tracking. The topics in this chapter are an extension to the object detection and recognition techniques in Chapter 5.

CHAPTER 1

Artificial Intelligence and Computer Vision

The field of artificial intelligence, and its application in day-to-day life, has seen remarkable evolution in the past three to five years. Artificial intelligence (AI) is an enabler that potentially facilitates machines doing everything that humans can do. This includes perceiving, reasoning, rationalizing, and problem-solving while working within a context or interacting with the environment with more efficiency and accuracy.

Here, the word context means the domain or the business where the problem is dealt with, for example online shopping, social media, insurance, manufacturing, and others. Interacting with the environment could mean that computers or machines work along with humans or take input from external stimuli and adjust their behaviors accordingly.

Computer vision, which enables computers and machines to see and understand the world around them, specifically has become a game changer for how and where machines can be used and AI can be adopted.

This chapter covers the larger AI dream that is all about touching both the personal and professional lives of humans and how computer vision among other areas is a key enabler. Also, you’ll learn about a few real-world applications, challenges, and technology tools such as OpenCV that help in complex implementations.

Sunila Gollapudi 2019
S. Gollapudi, Learn Computer Vision Using OpenCV, https://doi.org/10.1007/978-1-4842-4261-2 11

The following topics are covered in detail in this chapter:

- Artificial intelligence and its landscape, which includes a basic definition and the usage context of robotics, intelligent automation, natural language processing, expert systems, speech recognition, computer vision, and machine learning
- Computer vision, including its challenges and applications in today’s world
- Computer vision architecture and tools, including what images are and how to understand and manipulate key attributes of images
- A sneak peek into the core building blocks of computer vision and aspects such as image manipulation and segmentation, object detection, motion analysis and tracking, and others
- A brief introduction to optical character recognition, intelligent character recognition, and optical mark recognition

Note: A good understanding of programming and prior knowledge of Python will be helpful to understand the working examples in this book; however, primers will be given for all the hands-on code exercises.

Introduction to Artificial Intelligence

The definition of artificial intelligence has evolved since its first reference in 1956 at a Dartmouth conference, from emulating how the human brain works to solving focused, complex problems to doing all that a human can do such as seeing, hearing, communicating, acting, learning, perceiving, thinking, deciding, demonstrating emotion and compassion, interacting with the environment, and more. The 2012 AI breakthroughs with vision, language recognition, and self-driving vehicles changed the way that AI is looked at today. This section gives a simple and informal definition of artificial intelligence.

Essentially, AI is the field of computer science that involves enabling computers to behave like humans or perform tasks that usually require human intelligence.

The purpose of AI systems is evolving. In this section, we will cover different types of AI systems categorized based on their core purpose. You will also observe how these different types of AI systems signify a step toward building smarter systems.

Figure 1-1 lists different types of AI.

Figure 1-1. Types of AI

- Reactive AI was the first kind of AI that was talked about. These types of machines do not have memory and do not use information from past experiences. In these machines, the current context is directly perceived as it is and acted upon. This makes the machine behave the same way every time it encounters a situation. The benefit of this is a reliable and consistent outcome. An example is Deep Blue (a chess-playing computer developed by IBM that won against Kasparov in the game of chess).
- Limited memory AI machines look into the past and use it as a preprogrammed representation of the world and then apply it to the current data set. For example, in self-driving cars, the decision on when a car should change lanes is based on data such as lane markings, speed limits or road directions, current speed of the car, and relative neighboring car speeds.
- Theory of mind AI machines are intelligent machines that use advanced technologies that have more to do with understanding human emotions. The theory of mind is a psychological term that refers to the fact that living beings have emotions and thoughts that determine their behavior.
- Self-aware AI machines are an extension of theory of mind AI. They can configure representations, which means we will have machines that are conscious and aware given a context. This is also called human-aware AI or human interaction AI. No prototypes of these machines have been built.

Type of AI        | Memory                        | Uses Past Experience                                     | Interaction with Environment | Dynamic and Incremental Learning | Examples
------------------|-------------------------------|----------------------------------------------------------|------------------------------|----------------------------------|--------------------
Reactive AI       | No                            | No                                                       | No                           | No                               | Deep Blue
Limited memory AI | Yes (with little information) | Yes (a limited set that become preprogrammed standards)  | No                           | No                               | Self-driving cars
Theory of mind AI | Yes                           | Yes                                                      | No                           | Yes                              | Efforts in progress
Self-aware AI     | Yes                           | Yes                                                      | Yes                          | Yes                              | Efforts in progress
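The difference between the first two types can be sketched in a few lines of Python. This is only an illustrative toy, not code from the book: the function and class names, the 10 m and 5 m thresholds, and the three-observation window are all assumptions made for the example. The reactive policy decides from the current reading alone, while the limited memory policy also looks at a short window of recent readings.

```python
from collections import deque


def reactive_policy(gap_ahead_m: float) -> str:
    """Reactive: uses only the current observation and keeps no state."""
    # Hypothetical threshold: brake when the car ahead is closer than 10 m.
    return "brake" if gap_ahead_m < 10.0 else "cruise"


class LimitedMemoryPolicy:
    """Limited memory: also consults a short window of recent observations."""

    def __init__(self, window: int = 3) -> None:
        # Only the last `window` gap readings are remembered.
        self.recent_gaps = deque(maxlen=window)

    def decide(self, gap_ahead_m: float) -> str:
        self.recent_gaps.append(gap_ahead_m)
        if gap_ahead_m < 10.0:
            return "brake"
        # Positive when the gap has shrunk across the remembered window.
        closing = self.recent_gaps[0] - gap_ahead_m
        # Hypothetical rule: change lanes if the gap shrank by more than 5 m.
        return "change_lane" if closing > 5.0 else "cruise"


policy = LimitedMemoryPolicy()
history = [policy.decide(gap) for gap in (30.0, 26.0, 22.0)]
print(reactive_policy(22.0), history)
```

Given the same 22 m reading, the reactive policy always answers the same way, whereas the limited memory policy answers differently once its recent readings show the gap closing.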

Another way of categorizing AI systems is based on the degree of complexity of the problem at hand.

Artificial narrow intelligence (ANI) is about solving a problem against a given request with a narrow range of abilities. A feature like Siri in smartphones can be considered an example in this case. This is also called weak AI.

Artificial general intelligence (AGI) is referred to as strong AI and refers to a machine that is as capable as humans. The Pillo robot is an example of a robot that can diagnose an illness and administer pills as well.

Artificial super intelligence (ASI) is about machines that can perform tasks beyond what humans are capable of. The Alpha 2 robot was a first attempt toward this; it is a robot that can manage a smart home and operate things at home. It potentially could be a member of the family.

Most of the existing AI today is ANI. AGI and ASI are still being developed.

Figure 1-2 represents the core functions and features of an AI system at the center and related subfields that support implementing these functions.

Figure 1-2. AI functions

The applications or subfields of AI are as follows:

- Natural language processing
- Robotics
- Machine learning and deep learning
- Expert systems
- Speech or voice recognition
- Intelligent automation
- Computer vision

Each of these subfields is interrelated, and any real-world implementation usually includes one or mo
