Statistical Pattern Recognition

Transcription

Statistical Pattern Recognition
Third Edition

Andrew R. Webb and Keith D. Copsey
Mathematics and Data Analysis Consultancy, Malvern, UK

Statistical pattern recognition relates to the use of statistical techniques for analysing data measurements in order to extract information and make justified decisions. It is a very active area of study and research, which has seen many advances in recent years. Applications such as data mining, web searching, multimedia data retrieval, face recognition and cursive handwriting recognition all require robust and efficient pattern recognition techniques.

This third edition provides an introduction to statistical pattern theory and techniques, with material drawn from a wide range of fields, including the areas of engineering, statistics, computer science and the social sciences. The book has been updated to cover new methods and applications, and includes a wide range of techniques such as Bayesian methods, neural networks, support vector machines, feature selection and feature reduction techniques. Technical descriptions and motivations are provided, and the techniques are illustrated using real examples.

Statistical Pattern Recognition, Third Edition:

- Provides a self-contained introduction to statistical pattern recognition.
- Includes new material presenting the analysis of complex networks.
- Introduces readers to methods for Bayesian density estimation.
- Presents descriptions of new applications in biometrics, security, finance and condition monitoring.
- Provides descriptions and guidance for implementing techniques, which will be invaluable to software engineers and developers seeking to develop real applications.
- Describes mathematically the range of statistical pattern recognition techniques.
- Presents a variety of exercises, including more extensive computer projects.

The in-depth technical descriptions make this book suitable for senior undergraduate and graduate students in statistics, computer science and engineering. Statistical Pattern Recognition is also an excellent reference source for technical professionals. Chapters have been arranged to facilitate implementation of the techniques by software engineers and developers in non-statistical engineering fields.

www.wiley.com/go/statistical pattern recognition

Cover design: Gary Thompson




Statistical Pattern Recognition
Third Edition

Andrew R. Webb
Keith D. Copsey
Mathematics and Data Analysis Consultancy, Malvern, UK

A John Wiley & Sons, Ltd., Publication

This edition first published 2011
© 2011 John Wiley & Sons, Ltd

Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Webb, A. R. (Andrew R.)
Statistical pattern recognition / Andrew R. Webb, Keith D. Copsey. – 3rd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-68227-2 (hardback) – ISBN 978-0-470-68228-9 (paper)
1. Pattern perception–Statistical methods. I. Copsey, Keith D. II. Title.
Q327.W43 2011
006.4–dc23
2011024957

A catalogue record for this book is available from the British Library.

HB ISBN: 978-0-470-68227-2
PB ISBN: 978-0-470-68228-9
ePDF ISBN: 978-1-119-95296-1
oBook ISBN: 978-1-119-95295-4
ePub ISBN: 978-1-119-96140-6
Mobi ISBN: 978-1-119-96141-3

Typeset in 10/12pt Times by Aptara Inc., New Delhi, India

To Rosemary,
Samuel, Miriam, Jacob and Ethan


Contents

Preface
Notation

1 Introduction to Statistical Pattern Recognition
  1.1 Statistical Pattern Recognition
    1.1.1 Introduction
    1.1.2 The Basic Model
  1.2 Stages in a Pattern Recognition Problem
  1.3 Issues
  1.4 Approaches to Statistical Pattern Recognition
  1.5 Elementary Decision Theory
    1.5.1 Bayes' Decision Rule for Minimum Error
    1.5.2 Bayes' Decision Rule for Minimum Error – Reject Option
    1.5.3 Bayes' Decision Rule for Minimum Risk
    1.5.4 Bayes' Decision Rule for Minimum Risk – Reject Option
    1.5.5 Neyman–Pearson Decision Rule
    1.5.6 Minimax Criterion
    1.5.7 Discussion
  1.6 Discriminant Functions
    1.6.1 Introduction
    1.6.2 Linear Discriminant Functions
    1.6.3 Piecewise Linear Discriminant Functions
    1.6.4 Generalised Linear Discriminant Function
    1.6.5 Summary
  1.7 Multiple Regression
  1.8 Outline of Book
  1.9 Notes and References
  Exercises

2 Density Estimation – Parametric
  2.1 Introduction
  2.2 Estimating the Parameters of the Distributions
    2.2.1 Estimative Approach
    2.2.2 Predictive Approach
  2.3 The Gaussian Classifier
    2.3.1 Specification
    2.3.2 Derivation of the Gaussian Classifier Plug-In Estimates
    2.3.3 Example Application Study
  2.4 Dealing with Singularities in the Gaussian Classifier
    2.4.1 Introduction
    2.4.2 Naïve Bayes
    2.4.3 Projection onto a Subspace
    2.4.4 Linear Discriminant Function
    2.4.5 Regularised Discriminant Analysis
    2.4.6 Example Application Study
    2.4.7 Further Developments
    2.4.8 Summary
  2.5 Finite Mixture Models
    2.5.1 Introduction
    2.5.2 Mixture Models for Discrimination
    2.5.3 Parameter Estimation for Normal Mixture Models
    2.5.4 Normal Mixture Model Covariance Matrix Constraints
    2.5.5 How Many Components?
    2.5.6 Maximum Likelihood Estimation via EM
    2.5.7 Example Application Study
    2.5.8 Further Developments
    2.5.9 Summary
  2.6 Application Studies
  2.7 Summary and Discussion
  2.8 Recommendations
  2.9 Notes and References
  Exercises

3 Density Estimation – Bayesian
  3.1 Introduction
    3.1.1 Basics
    3.1.2 Recursive Calculation
    3.1.3 Proportionality
  3.2 Analytic Solutions
    3.2.1 Conjugate Priors
    3.2.2 Estimating the Mean of a Normal Distribution with Known Variance
    3.2.3 Estimating the Mean and the Covariance Matrix of a Multivariate Normal Distribution
    3.2.4 Unknown Prior Class Probabilities
    3.2.5 Summary
  3.3 Bayesian Sampling Schemes
    3.3.1 Introduction
    3.3.2 Summarisation
    3.3.3 Sampling Version of the Bayesian Classifier
    3.3.4 Rejection Sampling
    3.3.5 Ratio of Uniforms
    3.3.6 Importance Sampling
  3.4 Markov Chain Monte Carlo Methods
    3.4.1 Introduction
    3.4.2 The Gibbs Sampler
    3.4.3 Metropolis–Hastings Algorithm
    3.4.4 Data Augmentation
    3.4.5 Reversible Jump Markov Chain Monte Carlo
    3.4.6 Slice Sampling
    3.4.7 MCMC Example – Estimation of Noisy Sinusoids
    3.4.8 Summary
    3.4.9 Notes and References
  3.5 Bayesian Approaches to Discrimination
    3.5.1 Labelled Training Data
    3.5.2 Unlabelled Training Data
  3.6 Sequential Monte Carlo Samplers
    3.6.1 Introduction
    3.6.2 Basic Methodology
    3.6.3 Summary
  3.7 Variational Bayes
    3.7.1 Introduction
    3.7.2 Description
    3.7.3 Factorised Variational Approximation
    3.7.4 Simple Example
    3.7.5 Use of the Procedure for Model Selection
    3.7.6 Further Developments and Applications
    3.7.7 Summary
  3.8 Approximate Bayesian Computation
    3.8.1 Introduction
    3.8.2 ABC Rejection Sampling
    3.8.3 ABC MCMC Sampling
    3.8.4 ABC Population Monte Carlo Sampling
    3.8.5 Model Selection
    3.8.6 Summary
  3.9 Example Application Study
  3.10 Application Studies
  3.11 Summary and Discussion
  3.12 Recommendations
  3.13 Notes and References
  Exercises

4 Density Estimation – Nonparametric
  4.1 Introduction
    4.1.1 Basic Properties of Density Estimators
  4.2 k-Nearest-Neighbour Method
    4.2.1 k-Nearest-Neighbour Classifier
    4.2.2 Derivation
    4.2.3 Choice of Distance Metric
    4.2.4 Properties of the Nearest-Neighbour Rule
    4.2.5 Linear Approximating and Eliminating Search Algorithm
    4.2.6 Branch and Bound Search Algorithms: kd-Trees
    4.2.7 Branch and Bound Search Algorithms: Ball-Trees
    4.2.8 Editing Techniques
    4.2.9 Example Application Study
    4.2.10 Further Developments
    4.2.11 Summary
  4.3 Histogram Method
    4.3.1 Data Adaptive Histograms
    4.3.2 Independence Assumption (Naïve Bayes)
    4.3.3 Lancaster Models
    4.3.4 Maximum Weight Dependence Trees
    4.3.5 Bayesian Networks
    4.3.6 Example Application Study – Naïve Bayes Text Classification
    4.3.7 Summary
  4.4 Kernel Methods
    4.4.1 Biasedness
    4.4.2 Multivariate Extension
    4.4.3 Choice of Smoothing Parameter
    4.4.4 Choice of Kernel
    4.4.5 Example Application Study
    4.4.6 Further Developments
    4.4.7 Summary
  4.5 Expansion by Basis Functions
  4.6 Copulas
    4.6.1 Introduction
    4.6.2 Mathematical Basis
    4.6.3 Copula Functions
    4.6.4 Estimating Copula Probability Density Functions
    4.6.5 Simple Example
    4.6.6 Summary
  4.7 Application Studies
    4.7.1 Comparative Studies
  4.8 Summary and Discussion
  4.9 Recommendations
  4.10 Notes and References
  Exercises

5 Linear Discriminant Analysis
  5.1 Introduction
  5.2 Two-Class Algorithms
    5.2.1 General Ideas
    5.2.2 Perceptron Criterion
    5.2.3 Fisher's Criterion
    5.2.4 Least Mean-Squared-Error Procedures
    5.2.5 Further Developments
    5.2.6 Summary
  5.3 Multiclass Algorithms
    5.3.1 General Ideas
    5.3.2 Error-Correction Procedure
    5.3.3 Fisher's Criterion – Linear Discriminant Analysis
    5.3.4 Least Mean-Squared-Error Procedures
    5.3.5 Regularisation
    5.3.6 Example Application Study
    5.3.7 Further Developments
    5.3.8 Summary
  5.4 Support Vector Machines
    5.4.1 Introduction
    5.4.2 Linearly Separable Two-Class Data
    5.4.3 Linearly Nonseparable Two-Class Data
    5.4.4 Multiclass SVMs
    5.4.5 SVMs for Regression
    5.4.6 Implementation
    5.4.7 Example Application Study
    5.4.8 Summary
  5.5 Logistic Discrimination
    5.5.1 Two-Class Case
    5.5.2 Maximum Likelihood Estimation
    5.5.3 Multiclass Logistic Discrimination
    5.5.4 Example Application Study
    5.5.5 Further Developments
    5.5.6 Summary
  5.6 Application Studies
  5.7 Summary and Discussion
  5.8 Recommendations
  5.9 Notes and References
  Exercises

6 Nonlinear Discriminant Analysis – Kernel and Projection Methods
  6.1 Introduction
  6.2 Radial Basis Functions
    6.2.1 Introduction
    6.2.2 Specifying the Model
    6.2.3 Specifying the Functional Form
    6.2.4 The Positions of the Centres
    6.2.5 Smoothing Parameters
    6.2.6 Calculation of the Weights
    6.2.7 Model Order Selection
    6.2.8 Simple RBF
    6.2.9 Motivation
    6.2.10 RBF Properties
    6.2.11 Example Application Study
    6.2.12 Further Developments
    6.2.13 Summary
  6.3 Nonlinear Support Vector Machines
    6.3.1 Introduction
    6.3.2 Binary Classification
    6.3.3 Types of Kernel
    6.3.4 Model Selection
    6.3.5 Multiclass SVMs
    6.3.6 Probability Estimates
    6.3.7 Nonlinear Regression
    6.3.8 Example Application Study
    6.3.9 Further Developments
    6.3.10 Summary
  6.4 The Multilayer Perceptron
    6.4.1 Introduction
    6.4.2 Specifying the MLP Structure
    6.4.3 Determining the MLP Weights
    6.4.4 Modelling Capacity of the MLP
    6.4.5 Logistic Classification
    6.4.6 Example Application Study
    6.4.7 Bayesian MLP Networks
    6.4.8 Projection Pursuit
    6.4.9 Summary
  6.5 Application Studies
  6.6 Summary and Discussion
  6.7 Recommendations
  6.8 Notes and References
  Exercises

7 Rule and Decision Tree Induction
  7.1 Introduction
  7.2 Decision Trees
    7.2.1 Introduction
    7.2.2 Decision Tree Construction
    7.2.3 Selection of the Splitting Rule
    7.2.4 Terminating the Splitting Procedure
    7.2.5 Assigning Class Labels to Terminal Nodes
    7.2.6 Decision Tree Pruning – Worked Example
    7.2.7 Decision Tree Construction Methods
    7.2.8 Other Issues
    7.2.9 Example Application Study
    7.2.10 Further Developments
    7.2.11 Summary
  7.3 Rule Induction
    7.3.1 Introduction
    7.3.2 Generating Rules from a Decision Tree
    7.3.3 Rule Induction Using a Sequential Covering Algorithm
    7.3.4 Example Application Study
    7.3.5 Further Developments
    7.3.6 Summary
  7.4 Multivariate Adaptive Regression Splines
    7.4.1 Introduction
    7.4.2 Recursive Partitioning Model
    7.4.3 Example Application Study
    7.4.4 Further Developments
    7.4.5 Summary
  7.5 Application Studies
  7.6 Summary and Discussion
  7.7 Recommendations
  7.8 Notes and References
  Exercises

8 Ensemble Methods
  8.1 Introduction
  8.2 Characterising a Classifier Combination Scheme
    8.2.1 Feature Space
    8.2.2 Level
    8.2.3 Degree of Training
    8.2.4 Form of Component Classifiers
    8.2.5 Structure
    8.2.6 Optimisation
  8.3 Data Fusion
    8.3.1 Architectures
    8.3.2 Bayesian Approaches
    8.3.3 Neyman–Pearson Formulation
    8.3.4 Trainable Rules
    8.3.5 Fixed Rules
  8.4 Classifier Combination Methods
    8.4.1 Product Rule
    8.4.2 Sum Rule
    8.4.3 Min, Max and Median Combiners
    8.4.4 Majority Vote
    8.4.5 Borda Count
    8.4.6 Combiners Trained on Class Predictions
    8.4.7 Stacked Generalisation
    8.4.8 Mixture of Experts
    8.4.9 Bagging
    8.4.10 Boosting
    8.4.11 Random Forests
    8.4.12 Model Averaging
    8.4.13 Summary of Methods
    8.4.14 Example Application Study
    8.4.15 Further Developments
  8.5 Application Studies
  8.6 Summary and Discussion
  8.7 Recommendations
  8.8 Notes and References
  Exercises

9 Performance Assessment
  9.1 Introduction
  9.2 Performance Assessment
    9.2.1 Performance Measures
    9.2.2 Discriminability
    9.2.3 Reliability
    9.2.4 ROC Curves for Performance Assessment
    9.2.5 Population and Sensor Drift
    9.2.6 Example Application Study
    9.2.7 Further Developments
    9.2.8 Summary
  9.3 Comparing Classifier Performance
    9.3.1 Which Technique is Best?
    9.3.2 Statistical Tests
    9.3.3 Comparing Rules When Misclassification Costs are Uncertain
    9.3.4 Example Application Study
    9.3.5 Further Developments
    9.3.6 Summary
  9.4 Application Studies
  9.5 Summary and Discussion
  9.6 Recommendations
  9.7 Notes and References
  Exercises

10 Feature Selection and Extraction
  10.1 Introduction
  10.2 Feature Selection
    10.2.1 Introduction
    10.2.2 Characterisation of Feature Selection Approaches
    10.2.3 Evaluation Measures
    10.2.4 Search Algorithms for Feature Subset Selection
    10.2.5 Complete Search – Branch and Bound
    10.2.6 Sequential Search
    10.2.7 Random Search
    10.2.8 Markov Blanket
    10.2.9 Stability of Feature Selection
    10.2.10 Example Application Study
    10.2.11 Further Developments
    10.2.12 Summary
  10.3 Linear Feature Extraction
    10.3.1 Principal Components Analysis
    10.3.2 Karhunen–Loève Transformation
    10.3.3 Example Application Study
    10.3.4 Further Developments
    10.3.5 Summary
  10.4 Multidimensional Scaling
    10.4.1 Classical Scaling
    10.4.2 Metric MDS
    10.4.3 Ordinal Scaling
    10.4.4 Algorithms
    10.4.5 MDS for Feature Extraction
    10.4.6 Example Application Study
    10.4.7 Further Developments
    10.4.8 Summary
  10.5 Application Studies
  10.6 Summary and Discussion
  10.7 Recommendations
  10.8 Notes and References
  Exercises

11 Clustering
  11.1 Introduction
  11.2 Hierarchical Methods
    11.2.1 Single-Link Method
    11.2.2 Complete-Link Method
    11.2.3 Sum-of-Squares Method
    11.2.4 General Agglomerative Algorithm
    11.2.5 Properties of a Hierarchical Classification
    11.2.6 Example Application Study
    11.2.7 Summary
  11.3 Quick Partitions
  11.4 Mixture Models
    11.4.1 Model Description
    11.4.2 Example Application Study
  11.5 Sum-of-Squares Methods
    11.5.1 Clustering Criteria
    11.5.2 Clustering Algorithms
    11.5.3 Vector Quantisation
    11.5.4 Example Application Study
    11.5.5 Further Developments
    11.5.6 Summary
  11.6 Spectral Clustering
    11.6.1 Elementary Graph Theory
    11.6.2 Similarity Matrices
    11.6.3 Application to Clustering
    11.6.4 Spectral Clustering Algorithm
    11.6.5 Forms of Graph Laplacian
    11.6.6 Example Application Study
    11.6.7 Further Developments
    11.6.8 Summary
  11.7 Cluster Validity
    11.7.1 Introduction
    11.7.2 Statistical Tests
    11.7.3 Absence of Class Structure
    11.7.4 Validity of Individual Clusters
    11.7.5 Hierarchical Clustering
    11.7.6 Validation of Individual Clusterings
    11.7.7 Partitions
    11.7.8 Relative Criteria
    11.7.9 Choosing the Number of Clusters
  11.8 Application Studies
  11.9 Summary and Discussion
  11.10 Recommendations
  11.11 Notes and References
  Exercises

12 Complex Networks
  12.1 Introduction
    12.1.1 Characteristics
    12.1.2 Properties
    12.1.3 Questions to Address
    12.1.4 Descriptive Features
    12.1.5 Outline
  12.2 Mathematics of Networks
    12.2.1 Graph Matrices
    12.2.2 Connectivity
    12.2.3 Distance Measures
    12.2.4 Weighted Networks
    12.2.5 Centrality Measures
    12.2.6 Random Graphs
  12.3 Community Detection
    12.3.1 Clustering Methods
    12.3.2 Girvan–Newman Algorithm
    12.3.3 Modularity Approaches
    12.3.4 Local Modularity
    12.3.5 Clique Percolation
    12.3.6 Example Application Study
    12.3.7 Further Developments
    12.3.8 Summary
  12.4 Link Prediction
    12.4.1 Approaches to Link Prediction
    12.4.2 Example Application Study
    12.4.3 Further Developments
  12.5 Application Studies
  12.6 Summary and Discussion
  12.7 Recommendations
  12.8 Notes and References
  Exercises

13 Additional Topics
  13.1 Model Selection
    13.1.1 Separate Training and Test Sets
    13.1.2 Cross-Validation
    13.1.3 The Bayesian Viewpoint
    13.1.4 Akaike's Information Criterion
    13.1.5 Minimum Description Length
  13.2 Missing Data
  13.3 Outlier Detection and Robust Procedures
  13.4 Mixed Continuous and Discrete Variables
  13.5 Structural Risk Minimisation and the Vapnik–Chervonenkis Dimension
    13.5.1 Bounds on the Expected Risk
    13.5.2 The VC Dimension

References

Index


Preface

This book provides an introduction to statistical pattern recognition theory and techniques. Most of the material presented in this book is concerned with discrimination and classification and has been drawn from a wide range of literature including that of engineering, statistics, computer science and the social sciences. The aim of the book is to provide descriptions of many of the most useful of today's pattern processing techniques, including many of the recent advances in nonparametric approaches to discrimination and Bayesian computational methods developed in the statistics literature and elsewhere. The discussions of the motivations and theory behind these techniques will enable the practitioner to gain maximum benefit from their implementations within many of the popular software packages. The techniques are illustrated with examples of real-world application studies. Pointers are also provided to the diverse literature base where further details on applications, comparative studies and theoretical developments may be obtained.

The book grew out of our research on the development of statistical pattern recognition methodology and its application to practical sensor data analysis problems. The book is aimed at advanced undergraduate and graduate courses. Some of the material has been presented as part of a graduate course on pattern recognition and at pattern recognition summer schools. It is also designed for practitioners in the field of pattern recognition as well as researchers in the area. A prerequisite is a knowledge of basic probability theory and linear algebra, together with basic knowledge of mathematical methods (for example, Lagrange multipliers are used to solve problems with equality and inequality constraints in some derivations). Some basic material (which was provided as appendices in the second edition) is available on the book's website.

Scope

The book presents most of the popular methods of statistical pattern recognition. However, many of the important developments in pattern recognition are not confined to the statistics literature and have occurred where the area overlaps with research in machine learning. Therefore, where we have felt that straying beyond the traditional boundaries of statistical pattern recognition would be beneficial, we have done so. An example is the inclusion of some rule induction methods as a complementary approach to rule discovery by decision tree induction.

Most of the methodology is generic – it is not specific to a particular type of data or application. Thus, we exclude preprocessing methods and filtering methods commonly used in signal and image processing.

Approach

The approach in each chapter has been to introduce some of the basic concepts and algorithms and to conclude each section on a technique or a class of techniques with a practical application of the approach from the literature. The main aim has been to introduce the basic concept of an approach. Sometimes this has required some detailed mathematical description, and clearly we have had to draw a line on how much depth we discuss a particular topic. Most of the topics have whole books devoted to them and so we have had to be selective in our choice of material. Therefore, the chapters conclude with a section on the key references. The exercises at the ends of the chapters vary from 'open book' questions to more lengthy computer projects.

New to the third edition

Many sections have been rewritten and new material added. The new features of this edition include the following:

- A new chapter on Bayesian approaches to density estimation (Chapter 3), including expanded material on Bayesian sampling schemes and Markov chain Monte Carlo methods, and new sections on Sequential Monte Carlo samplers and Variational Bayes approaches.
- New sections on nonparametric methods of density estimation.
- Rule induction.
- A new chapter on ensemble methods of classification.
- Revision of the feature selection material, with a new section on stability.
- Spectral clustering.
- A new chapter on complex networks, with relevance to the high-growth field of social and computer network analysis.

Book outline

Chapter 1 provides an introduction to statistical pattern recognition, defining some terminology and introducing supervised and unsupervised classification. Two related approaches to supervised classification are presented: one based on the use of probability density functions and a second based on the construction of discriminant functions. The chapter concludes with an outline of the pattern recognition cycle, putting the remaining chapters of the book into context. Chapters 2, 3 and 4 pursue the density function approach to discrimination. Chapter 2 addresses parametric approaches to density estimation, which are developed further in Chapter 3 on Bayesian methods. Chapter 4 develops classifiers based on nonparametric schemes, including the popular k-nearest-neighbour method, with associated efficient search algorithms.

Chapters 5–7 develop discriminant function approaches to supervised classification. Chapter 5 focuses on linear discriminant functions; much of the methodology of this chapter (including optimisation, regularisation, support vector machines) is used in some of the nonlinear methods described in Chapter 6, which explores kernel-based methods, in particular the radial basis function network and the support vector machine, and projection-based methods (the multilayer perceptron). These are commonly referred to as neural network methods. Chapter 7 considers approaches to discrimination that enable the classification function to be cast in the form of an interpretable rule, important for some applications.

Chapter 8 considers ensemble methods – combining classifiers for improved robustness. Chapter 9 considers methods of measuring the performance of a classifier.

The techniques of Chapters 10 and 11 may be described as methods of exploratory data analysis or preprocessing (and as such would usually be carried out prior to the supervised classification techniques of Chapters 5–7, although they could, on occasion, be post-processors of supervised techniques). Chapter 10 addresses feature selection and feature extraction – the procedures for obtaining a reduced set of variables characterising the original data. Such procedures are often an integral part of classifier design, and it is somewhat artificial to partition the pattern recognition problem into separate processes of feature extraction and classification. However, feature extraction may provide insights into the data structure and the type of classifier to employ; thus, it is of interest in its own right. Chapter 11 considers unsupervised classification or clustering – the process of grouping individuals in a population to discover the presence of structure; its engineering application is to vector quantisation for image and speech coding. Chapter 12 on complex networks introduces methods for analysing data that may be represented using the mathematical concept of a graph. This has great relevance to social and computer networks.

Finally, Chapter 13 addresses some important diverse topics, including model selection.

Book website

The website www.wiley.com/go/statistical pattern recognition contains supplementary material on topics including measures of dissimilarity, estimation, linear algebra, data analysis and basic probability.

Acknowledgements

In preparing the third edition of this book we have been helped by many people. We are especially grateful to Dr Gavin Cawley, University of East Anglia, for help and advice. We are grateful to friends and colleagues (past and present, from RSRE, DERA and QinetiQ) who have provided encouragement and made comments on various parts of the manuscript. In particular, we would like to thank Anna Skeoch for providing figures for Chapter 12; and Richard Davies and colleagues at John Wiley for help in the final production of the manuscript. Andrew Webb is especially thankful to Rosemary for her love, support and patience.

Andrew R. Webb
Keith D. Copsey

Notation

Some of the more commonly used notation is given below. We have used some notational conveniences. For example, we have tended to use the same symbol for a variable as well as a measurement on that variable. The meaning should be obvious from context. Also, we denote the density function of x as p(x) and that of y as p(y), even though the functions differ. A vector is denoted by a lower case quantity in bold face, and a matrix by upper case. Since pattern recognition is very much a multidisciplinary ...
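As a brief illustration of these conventions, a measurement vector, a matrix and a density function might be written as follows. This is an added sketch: the particular symbols (a p-dimensional vector x, mean vector μ and covariance matrix Σ) are chosen for illustration here and are not taken from the book's notation list.

    % a vector: lower case, bold face
    \[ \mathbf{x} = (x_1, \ldots, x_p)^T \]
    % a matrix: upper case (here the covariance matrix of x)
    \[ \Sigma = E\left[ (\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T \right] \]
    % the density function of x
    \[ p(\mathbf{x}) \]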
