Medical Applications Of Pattern Recognition

Medical Applications of Pattern Recognition
by Neşe Yalabık
HIBIT'10, Antalya, April 2010

Outline
- Part 1: Introduction: Definitions and Terminology
- Part 2: Historical Background
- Part 3: PR Techniques Used in Medicine and Application Examples

Part 1: Introduction: Definitions and Terminology

Definitions and Terminology
- Medical Informatics: an interdisciplinary scientific field of research that deals with the use of information and communication technologies and systems in clinical health care, for more accurate and faster service to people.
- Pattern Recognition (PR): automated analysis of collected attributes of objects, events, etc. to classify them into categories.
- Medical Pattern Recognition: all PR techniques used in decision support and in the treatment of illnesses.

Example Applications of Pattern Recognition
- Reading handwritten text to classify it into letters and words
- Analyzing fingerprints to find the owner
- Recognizing the faces of people to name them
- Finding buildings in a satellite image
- Identifying a gun from its bullet marks (ballistics)
- Identifying different objects on a conveyor belt
- Analyzing test results in decision support for any illness

Pattern Recognition and Classification: An Introduction
- We human beings do pattern recognition every day.
- We "recognize" and classify many things, even when they are corrupted by noise, distorted and variable.
- Classification is the result of recognition: categorization, generalization.
- A problem is a PR problem only if it involves "statistical variation".
How do we do it?
- Automatic pattern recognition has 50 years of history.
- Many different approaches have been tried.
- Success has been limited in many problems.
- It is successful only in restricted environments and with limited categories.

Variation in PR Problems
(Figure: handwritten samples of the digits 9 and 4.)
We see here that all 9's are different from each other, and that 9's and 4's can easily be confused.

Unlimited Recognition
It turns out that unlimited recognition is still a dream, for example:
- Continuous speech recognition
- Cursive script
- Unlimited medical diagnosis
- Unlimited fingerprint recognition
Today, applications aim at limiting these to simpler problems.
A more detailed definition of PR: the process of machine perception for the automatic labeling of an object or an event into one of the predefined categories.

Classifiers
(Figure: two classifier diagrams. Unknown data is assigned to one of Letter A, Letter B, Letter C; an unknown fingerprint (F.P.) is assigned to one of Ahmet, Ali, Mehmet.)

Objective in PR
- Minimize the average error (be at least as good as a human being).
- Minimize the risk: a wrong decision can be more costly in some cases, such as medical diagnosis.
Why automate?
- Obvious reason: to save time and effort (e.g. census forms: entering 100 million records into an electronic medium).
How do machines solve it? Many different approaches in history:
- Template matching
- Statistics and decision theory ("statistical pattern recognition")
- "Neural networks": self-learning systems
- Tree classifiers
- Support vector machines
- Multiclassifiers

Learning and Features
Whichever approach is used, there is a classification process.
(Diagram: Data, Learning, Classification, Result.)
- "Learning samples": large data sets used in training, in estimating parameters, etc.
- "Result": a decision on the category the sample belongs to.
- "Test samples": used in testing the classifier's performance. Learning samples and test samples may overlap.
- "Data": raw data, then pre-processing, then a feature set.
- "Feature": a discriminating, easily measurable characteristic of our data.
In all approaches, samples from different categories should give distant numerical values for the features.

Example: a feature for the letter A
(Diagram: the 2-d array of the letter A goes through processing to give the feature vector $[M_0, M_1, \dots, M_k]$.)
- M: moment invariants (such as the center of gravity) obtained from the 2-d array of the A.
- A feature vector! A model of the underlying system that generated it.
(Figure: samples of Letter A and Letter B.)
- There is always an error probability in the decision!
- How many features should we use? Not too few, but not too many either (curse of dimensionality).
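As a rough illustration of moment features, the sketch below computes central moments of a tiny binary image and checks that they do not change when the shape is shifted. The 4x4 arrays, the function names and the choice of the three second-order moments are assumptions made for this example; the slide does not say which moments were used.

```python
def raw_moment(image, p, q):
    """Raw moment M_pq = sum over pixels of x^p * y^q * pixel value."""
    return sum((x ** p) * (y ** q) * value
               for y, row in enumerate(image)
               for x, value in enumerate(row))


def central_moment(image, p, q):
    """Moment taken about the center of gravity, so it ignores translation."""
    mass = raw_moment(image, 0, 0)
    cx = raw_moment(image, 1, 0) / mass
    cy = raw_moment(image, 0, 1) / mass
    return sum(((x - cx) ** p) * ((y - cy) ** q) * value
               for y, row in enumerate(image)
               for x, value in enumerate(row))


def feature_vector(image):
    """Hypothetical feature vector built from the second-order central moments."""
    return [central_moment(image, 2, 0),
            central_moment(image, 0, 2),
            central_moment(image, 1, 1)]


# A crude, made-up "A"-like shape and the same shape shifted by one pixel.
SHAPE = [[0, 1, 0, 0],
         [1, 1, 1, 0],
         [1, 0, 1, 0],
         [0, 0, 0, 0]]
SHIFTED = [[0, 0, 0, 0],
           [0, 0, 1, 0],
           [0, 1, 1, 1],
           [0, 1, 0, 1]]

print(feature_vector(SHAPE))
print(feature_vector(SHIFTED))
```

The two printed vectors agree up to floating-point rounding, while the raw moments of the two images differ. That translation invariance is what makes such quantities usable as features.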

Classification
(Figure: samples of Letter A and Letter B plotted against feature 1 and feature 2.)
How do we separate A's from B's?
- Form a decision boundary.
- Assign the sample to the class on whose side of the boundary it falls.
Many classification methods exist:
- Parametric: Bayes decision theory; the features are modeled as random variables with parameterized probability distributions.
- Non-parametric: discriminant functions and the nearest neighbor rule, which use only the learning samples.
- Tree classifiers
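The nearest neighbor rule mentioned above can be sketched in a few lines. The learning samples and their two feature values are invented for illustration, and Euclidean distance is an assumed choice of metric.

```python
import math


def nearest_neighbor(learning_samples, x):
    """Return the label of the learning sample closest to x (Euclidean distance)."""
    _, label = min(learning_samples, key=lambda sample: math.dist(sample[0], x))
    return label


# Hypothetical learning samples: ((feature 1, feature 2), category)
LEARNING_SAMPLES = [
    ((1.0, 1.2), "A"),
    ((0.8, 1.0), "A"),
    ((3.0, 3.1), "B"),
    ((3.2, 2.9), "B"),
]

print(nearest_neighbor(LEARNING_SAMPLES, (1.1, 0.9)))
print(nearest_neighbor(LEARNING_SAMPLES, (2.9, 3.0)))
```

No parameters are estimated here. The learning samples themselves define the decision boundary, which is what makes the method non-parametric.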

- Given the learning data set: supervised learning (learn the parameters of the PR system) or clustering.
- If we do not have enough data, we incorporate "domain knowledge". For example, we already know that the letter A is written by hand in the form of 2 or 3 strokes.
(Figure: two ways of writing the letter A with strokes.)
- So recognizing the strokes first, rather than the complete letters, may be a better idea. Also consider the surrounding text.

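Statistical Approach to PR
- Feature vector: $X = [X_1, X_2, \dots, X_d]$
- Dimension of the feature space: $d$
- Number of different states of nature (categories): $c$
- Categories: $\{\omega_1, \omega_2, \dots, \omega_c\}$
- Find decision regions $R_i$ such that $R_i \cap R_j = \emptyset$ for $i \neq j$ and $\bigcup_i R_i = \mathbb{R}^d$, where $X \in R_i$ if $g_i(X) > g_j(X)$ for all $j \neq i$.
(Figure: a feature space partitioned into regions $R_1$, $R_2$, $R_3$ by the discriminant functions $g_1$, $g_2$, $g_3$.)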

A Pattern Classifier
(Figure: the input $X$ is fed to the discriminant functions $g_1(X), g_2(X), \dots, g_c(X)$; a Max selector outputs the decision $\alpha_k$.)
So our aim now is to define the functions $g_1, g_2, \dots, g_c$ so as to minimize or optimize a criterion.
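A minimal sketch of such a classifier follows: one discriminant function per category, and the decision is the category whose function gives the maximum. The specific discriminant used here, the negative squared distance to the class mean, is only one simple assumed choice, and the sample values are made up.

```python
def class_means(samples_by_class):
    """Estimate one mean vector per category from its learning samples."""
    return {label: tuple(sum(column) / len(column) for column in zip(*samples))
            for label, samples in samples_by_class.items()}


def discriminant(mean, x):
    """g_i(X): negative squared distance to the class mean (assumed form)."""
    return -sum((a - b) ** 2 for a, b in zip(x, mean))


def classify(means, x):
    """Max selector: pick the category whose discriminant value is largest."""
    return max(means, key=lambda label: discriminant(means[label], x))


# Hypothetical learning samples for three categories in a 2-d feature space.
LEARNING = {
    "w1": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "w2": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
    "w3": [(0.0, 5.0), (1.0, 6.0), (0.0, 6.0)],
}
MEANS = class_means(LEARNING)

print(classify(MEANS, (0.5, 0.5)))
print(classify(MEANS, (5.5, 5.2)))
print(classify(MEANS, (0.2, 5.5)))
```

Other choices of the discriminant functions (for example, posterior probabilities from Bayes decision theory) plug into the same max-selector structure.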

Pattern Recognition in Medical Decision Support
- 50 years ago, we tried to build systems that would "diagnose" an illness without a physician.
- Today, we build "decision support" systems that only give an opinion to the physician.
- They interpret all kinds of collected medical data, which is huge.

Pattern Recognition in Medical Decision Support: Examples
- Interpreting 1-d data such as ECG and EEG
- Interpreting 2-d data: detecting cells, tumors or any other abnormalities in X-ray, MR, tomography images, etc.
- Sequence processing in genetic data
- Processing any collected numerical data such as blood test results
- Processing any collected non-numeric data such as patient history, doctors' interpretations and reports
- Using more than one of these together in decisions and in the treatment of an illness

Part 2: Historical Background

Historical Background
- In the 1960s and 1970s, when computers were thought to be able to solve any problem, medical diagnosis was thought to be easy: enter the symptoms, diagnose the illness.
- Unfortunately, it did not work!
- As in all PR problems, you had to limit yourself to very restricted problems.

Chromosome Analysis
- Karyotyping: ordering and enumerating the chromosomes.
- Detect abnormalities in chromosome spreads in order to detect genetic diseases, cancer, etc.
- Still an unsolved problem.

ECG Analysis
- ECG and EEG analysis: the first automated ECG interpreters became available in the 1970s and were improved later.
- Today, many accurate machines are available.
- PQRST curve: abnormalities are detected by measuring various features.

Medical Diagnosis Decision Support
- In the 1980s and 1990s, "expert systems" were popular.
- Most successful diagnostic application: MYCIN, designed at Stanford University to diagnose infectious blood diseases and recommend antibiotics.
- It used the "expert systems" approach: 500 rules (if-then statements).
- It had a correct diagnosis rate of about 65% (better than most physicians).
- Legal issues: who is responsible for a wrong diagnosis?
- Certainty factors in the rules.
- Never used in practice due to legal and ethical issues.
- There were also technical issues, which are solved today.

Example of a Decision Rule in MYCIN
RULE-507
IF:
1. The infection which requires therapy is meningitis
2. Organisms were not seen on the stain of the culture
3. The type of the infection is bacterial
4. The patient does not have a head injury defect
5. The age of the patient is between 15 and 55 years
THEN:
The organisms that might be causing the infection are diplococcus-pneumoniae and neisseria-meningitidis
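To make the if-then structure concrete, the rule above could be encoded roughly as follows. This is only an illustrative sketch, not MYCIN's actual representation (MYCIN was written in Lisp and attached certainty factors to its rules, which the slide does not give and which are therefore left out). The `Patient` record, its field names and the inclusive age bounds are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record; field names are invented for this sketch.
    infection: str
    organisms_seen_on_stain: bool
    infection_type: str
    has_head_injury_defect: bool
    age_years: int


def rule_507(patient):
    """Return the organisms suggested by RULE-507, or an empty list if it does not fire."""
    premises = [
        patient.infection == "meningitis",
        not patient.organisms_seen_on_stain,
        patient.infection_type == "bacterial",
        not patient.has_head_injury_defect,
        15 <= patient.age_years <= 55,  # assumed inclusive bounds
    ]
    if all(premises):
        return ["diplococcus-pneumoniae", "neisseria-meningitidis"]
    return []


print(rule_507(Patient("meningitis", False, "bacterial", False, 30)))
print(rule_507(Patient("meningitis", False, "bacterial", False, 70)))
```

An expert system chains many such rules through an inference engine; a single rule on its own only shows the premise/conclusion form.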

Medical Diagnosis Decision Support
- 1990s and 2000s: MYCIN-like systems led to clinical "decision support systems", or "diagnostic Clinical Decision Support Systems" (CDSS).
- AI approach to PR: knowledge base, inference engine.
- Non-knowledge-based CDSS: neural networks, Bayesian networks, genetic algorithms, tree classifiers, multiclassifiers, etc.
- Shown to improve physicians' performance in general.

Part 3: PR Techniques Used in Medicine and Application Examples

PR Techniques Used in Clinical Medicine
In the last 20 years, many new approaches to PR have appeared, and many have been successfully applied to medicine:
- Neural networks
- Bayesian belief networks
- Support vector machines
- Tree classifiers
- Multiclassifiers: a combination of the above

Neural Networks
- An old approach: the perceptron was introduced in the 1950s by Rosenblatt.
- Revived in the 1980s with new learning algorithms (back-propagation).
- Used in many scientific problems.

Biological vs. Artificial
Biological neural networks
- A neuron: a nerve cell that is part of the nervous system and the brain.

Biological vs. Artificial
- There are tens of billions of neurons and a huge number of connections in the human brain.
- Thinking, reasoning, learning and recognition are performed through information storage and transfer between neurons.
- Each neuron "fires" when a sufficient amount of electric impulse is received from other neurons.
- Information is transferred through successive firings of many neurons across the network.
Artificial neural networks
- An artificial NN, or ANN (also called a connectionist model or a neuromorphic system), is meant to be:
  - a simple computational model of the biological NN;
  - a simulation of that model for solving problems in pattern recognition, optimization, etc.

An Artificial Neural Net
(Figure: neurons connecting the inputs X1 and X2 to the outputs Y1 and Y2 through weighted links.)
- Y1, Y2: outputs
- X1, X2: inputs
- w: neuron weights
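A single artificial neuron of this kind can be sketched as a weighted sum followed by a threshold. The example below also shows the classic perceptron learning rule adjusting the weights; the logical AND task, the learning rate and the number of epochs are assumptions chosen only to keep the illustration small.

```python
def step(value):
    """Threshold activation: the neuron 'fires' (1) only for a positive net input."""
    return 1 if value > 0 else 0


def neuron(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, passed through the threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, learning_rate=1.0, epochs=20):
    """Perceptron rule: nudge the weights by the error on each learning sample."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - neuron(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Learning samples for logical AND, a linearly separable toy problem.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
WEIGHTS, BIAS = train_perceptron(AND_SAMPLES)

for inputs, target in AND_SAMPLES:
    print(inputs, neuron(WEIGHTS, BIAS, inputs), target)
```

Because AND is linearly separable, the rule settles on weights that classify all four samples correctly; a single neuron cannot do the same for a problem such as XOR.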

Any application that involves
- Classification
- Optimization
- Clustering
- Scheduling
- Feature extraction
may use an ANN!
Why ANN?
- Easy to implement
- Self-learning ability
- Very fast when parallel architectures are used
- Performance at least as good as other approaches; in principle they provide nonlinear discriminants, so they can solve any PR problem.

Multilayer Perceptron
Figure: Fully Connected Multilayer Perceptron (inputs x1 ... xn, hidden layer 1, hidden layer 2, outputs y1 ... ym)

Multilayer Perceptron
- It was shown that an MLP with 2 hidden layers can form any decision boundary.
- Back-propagation learning algorithm: iteratively update the weights to obtain the required input-output pairs.
- Inputs: features. Outputs: one output per class.
- Successfully used in many biomedical decision-making problems.
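The sketch below shows only the forward pass of a small multilayer perceptron, to illustrate how a hidden layer yields a nonlinear decision boundary. The weights are set by hand rather than learned by back-propagation, a single hidden layer is used because it is enough for this toy XOR task, and all names and values are assumptions made for illustration.

```python
def step(value):
    """Threshold activation."""
    return 1 if value > 0 else 0


def layer(weights, biases, inputs):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [step(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-set weights (not learned): hidden neuron 1 acts as OR, hidden neuron 2 as AND.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [-0.5, -1.5]
# Output neuron fires when OR is on and AND is off, which is XOR.
OUTPUT_WEIGHTS = [[1.0, -1.0]]
OUTPUT_BIASES = [-0.5]


def mlp_xor(x1, x2):
    hidden = layer(HIDDEN_WEIGHTS, HIDDEN_BIASES, [x1, x2])
    return layer(OUTPUT_WEIGHTS, OUTPUT_BIASES, hidden)[0]


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mlp_xor(x1, x2))
```

In practice the weights would be found by back-propagation with a differentiable activation instead of the hard threshold used here.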

Tree Classifiers
- Consider the feature vector X = (x1, x2, x3, ..., xn).
- A tree classifier considers the features one by one instead of as a whole, measuring them one at a time while following the branches of a tree. The features are usually binary valued.
- An optimum tree can be constructed using the learning samples. The leaves of the tree correspond to the classes.
- An example follows.

Decision Tree Example
The "play tennis" decision tree, according to the weather.
(Figure: decision tree for the weather data; the surviving labels include "normal", "true" and "false" on the branches and "yes" and "no" on the leaves.)
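The slide's figure did not survive transcription. Assuming it shows the well-known tree for the textbook weather ("play tennis") data, with tests on outlook, humidity and wind, a tree classifier of that shape can be written as nested one-feature tests. The attribute names and values below follow that common textbook version and are an assumption, not a reconstruction of the slide.

```python
def play_tennis(outlook, humidity, windy):
    """Follow one branch per feature test until a leaf (the class) is reached."""
    if outlook == "overcast":
        return "yes"
    if outlook == "sunny":
        # Only humidity is examined on this branch.
        return "yes" if humidity == "normal" else "no"
    if outlook == "rainy":
        # Only wind is examined on this branch.
        return "no" if windy else "yes"
    raise ValueError(f"unknown outlook: {outlook!r}")


print(play_tennis("sunny", "high", False))
print(play_tennis("overcast", "high", True))
print(play_tennis("rainy", "normal", True))
```

Each sample is classified after measuring at most two of the three features, which is the practical appeal of considering features one by one.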

Example Study
"OAGAIT: A Decision Support System for Grading Knee Osteoarthritis Using Gait Data"
N. Köktaş, N. Yalabık, G. Yavuzer, P. Dunn, V. Atalay
A Tübitak project, 2006-2008, and a Ph.D. thesis
METU Computer Engineering Dept. and Ankara University Gait Laboratories

Gait Analysis
- What is gait analysis? The process of collecting and analyzing quantitative information about the walking patterns of people.
- Where is it used? Human identification and clinical applications.
- Why is it important? For diagnosis, developing treatment plans and tracking the progression of diseases.

Osteoarthritis (OA)
- OA is a disorder that affects joint cartilage and the surrounding tissue.
- It shows itself through pain, stiffness and loss of function of the knee.
- The Kellgren-Lawrence method is used for radiological assessment:
  - Grade 0: Normal
  - Grade 1: Doubtful narrowing of joint space and possible outgrowth of the bone
  - Grade 2: Definite outgrowth of the bone and possible narrowing of joint space
  - Grade 3: Moderate multiple outgrowths, definite narrowing of joint space, some hardening and possible deformity of bone contour
  - Grade 4: Large outgrowths, marked narrowing of joint space, severe hardening and definite deformity of bone contour

(Figure: X-ray image showing OA of the knee joint.)

Gait Classification
- The aim is to support physicians' decision making.
- The most popular PR techniques for gait classification are NNs, SVMs, FFT, PCA, etc.
- Gait laboratories in hospitals in Turkey are becoming very popular; there are 5 gait laboratories in Ankara alone.
- The increasing amounts of collected data need to be analyzed intelligently.
- M.D.s are seeking the help of computer scientists for developing tools.

Properties of Gait Data
Three sets of data are gathered in the gait laboratory:
- History and symptoms of the patient: A = {age, BMI, pain, stiffness, history, period, sex}
- Time-distance parameters of the gait: B = {Cadence, Walking Speed, Stride Time, Step Time, Single Support, Double Support, Stride Length, Step Length}
- Temporal changes of the joint angles (kinetic and kinematic gait variables): C = {PTilt, PObliq, PRot APRot}

Implementation and Results
- 80% success rate with 100 test samples

Bayesian Networks (BN)
- A Bayesian belief network is a knowledge-based graphical representation that shows a set of variables and their probabilistic relationships, for example between diseases and symptoms.
- They are based on conditional probabilities: the probability of an event given the occurrence of another event, as in the interpretation of diagnostic tests.
- In the context of CDSS, a Bayesian network can be used to compute the probabilities of the presence of the possible diseases given the symptoms.
- One advantage of Bayesian networks is that they capture the knowledge and conclusions of experts in the form of probabilities, as an aid in decision making.

A Simple Bayes Net
(Figure: a Bayes net relating the sprinkler and rain conditions to the grass being wet.)
- The net shows the probabilistic relationships between the grass being wet and the sprinkler and rain conditions.
- Using the net, we can find the probability of rain given that the grass is wet.
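As a worked sketch of that computation, the code below enumerates the joint distribution of a three-node net (rain, sprinkler, wet grass) and applies the definition of conditional probability. The slide's own probability tables did not survive transcription, so the numbers used here are illustrative values often quoted for this textbook example and should be read as assumptions.

```python
# Assumed conditional probability tables (illustrative values only).
P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}
# Key: (sprinkler, rain) -> probability that the grass is wet
P_WET_GIVEN = {(True, True): 0.99, (True, False): 0.9,
               (False, True): 0.8, (False, False): 0.0}


def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet) as the product of the net's local tables."""
    p_rain = P_RAIN if rain else 1 - P_RAIN
    p_on = P_SPRINKLER_GIVEN_RAIN[rain]
    p_sprinkler = p_on if sprinkler else 1 - p_on
    p_w = P_WET_GIVEN[(sprinkler, rain)]
    p_wet = p_w if wet else 1 - p_w
    return p_rain * p_sprinkler * p_wet


def prob_rain_given_wet():
    """P(rain | wet) = P(rain, wet) / P(wet), summing out the sprinkler."""
    rain_and_wet = sum(joint(True, s, True) for s in (True, False))
    wet = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
    return rain_and_wet / wet


print(round(prob_rain_given_wet(), 4))  # about 0.3577 with the assumed tables
```

With these assumed tables, observing wet grass raises the probability of rain from the prior 0.2 to roughly 0.36; a diagnostic net does the same kind of update from symptoms to diseases.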

Example Study
"Bayesian Networks in Medicine: a Model-based Approach to Medical Decision Making"
Peter Lucas, in K-P. Adlassnig (ed.), Proceedings of the EUNITE Workshop on Intelligent Systems in Patient Care, Vienna, Oct. 2001, pp. 73-97

Bayesian Networks in Medicine
- "The BN formalism offers a natural way to represent the uncertainties involved in medicine when dealing with diagnosis, treatment selection, planning, and prediction of prognosis."
- "A BN model that was developed to assist clinicians in the diagnosis and selection of antibiotic treatment for patients with pneumonia."
- Domain expert knowledge is used in developing the BN.
- Results show a close match between expert opinion and the BN.

(Figure: a BN for pneumonia.)

Support Vector Machines (SVM)
- Support vector machines are extensions of linear discriminant functions.
- Linear discriminant functions have linear decision boundaries and are found using the learning samples only.
- Linear separability: all learning samples are correctly classified by a linear decision boundary. This is not possible in many cases.
- An SVM: an optimum linear discriminant function, where linear separability is provided by extending the feature space to a higher dimension.

Linear Separability
(Figure: a linearly separable sample set with two possible boundaries, Solution 1 and Solution 2; a sample set that is not separable; and the XOR problem in the x-y plane, which is not linearly separable.)
Many or no solutions are possible.

Here we see that carrying the samples to a higher dimension results in separability, which was not the case in the lower dimension.

- An SVM carries the feature space to a higher dimension by processing it with a nonlinear function called the "kernel function".
- It then finds an optimum boundary by making it equally spaced from the samples of the different classes, using samples called "support vectors".
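The idea of gaining separability in a higher dimension can be illustrated on the XOR problem. The sketch below is not an SVM: it uses an explicit, hand-chosen feature map (x, y) -> (x, y, x*y) instead of a kernel function, and hand-picked weights instead of a trained maximum-margin boundary. It only shows that a plane separates the lifted samples, which no line can do for the original ones.

```python
def lift(x, y):
    """Assumed feature map to three dimensions: append the product x*y."""
    return (x, y, x * y)


# Hand-picked plane in the lifted space (not an optimum-margin solution).
WEIGHTS = (1.0, 1.0, -2.0)
BIAS = -0.5


def linear_discriminant(point):
    """Linear function of the lifted features; its sign gives the class."""
    return sum(w * v for w, v in zip(WEIGHTS, point)) + BIAS


def classify_xor(x, y):
    return 1 if linear_discriminant(lift(x, y)) > 0 else 0


for x in (0, 1):
    for y in (0, 1):
        print(x, y, classify_xor(x, y))
```

A real SVM would choose the boundary that maximizes the margin to the nearest samples and would use the kernel to avoid computing the lifted features explicitly.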

SVM in Medical Decision Making
- A newer tool than the others, in medical decision making as well as in other applications.
- Reported in many comparative studies to outperform other approaches such as NN and BN.
- Even though it can be used for any problem, it has been found especially successful in breast cancer studies.

Example Study
"A Support Vector Machine Approach for Detection of Microcalcifications"
Issam El-Naqa et al., IEEE Transactions on Medical Imaging, vol. 21, no. 12, December 2002
- Finds microcalcifications, tiny calcium deposits that can be an early sign of breast cancer, in digital mammograms using an SVM, and compares it with other approaches.

(Figure: microcalcifications in a mammogram.)

Performance Comparison Using an FROC Curve
- The higher the curve, the better the performance.

Conclusions
- We discussed many methods to automatically label illnesses, medical images and plots.
- Recent methods are usually used as part of a decision support system.
- Ethical and legal issues prevent the development of fully automatic systems.
- Today, pattern recognition methods are accepted as useful tools in the service of M.D.s, as consultants in clinical decision making.

References
- "MIN720 Pattern Classification in Biomedical Applications", course lecture notes, METU Informatics Institute, METU, 2010
- "Pattern Classification", Duda, Hart, Stork, Wiley, 2001
- Wikipedia Free Encyclopedia, www.wikipedia.com
- Other references are given on their respective pages
