Transcription
A Cross-Entropy Method thatOptimizes Partially DecomposableProblems:A New Way to Interpret NMR SpectraSiamak Ravanbakhsh,Barnabas Poczos,Russell GreinerComputing Science Department, University of Alberta1
Metabolomics & NMR SpectroscopyMetabolomics study of chemical fingerprints that cellular processes leavebehindComputing Science Department, University of AlbertaAAAI’102
Metabolomics & NMR SpectroscopyMetabolomics study of chemical fingerprints that cellular processes leavebehindMetabolites: end products ofgene expressionComputing Science Department, University of AlbertaAAAI’102
Metabolomics & NMR SpectroscopyMetabolomics study of chemical fingerprints that cellular processes leavebehindMetabolites: end products ofgene expression{Gas chromatographyHigh performance liquid chromatographyCapillary electrophoresisMass Spectrometrysource: wikipediaToolbox:Nuclear Magnetic Resonance (NMR) SpectroscopyComputing Science Department, University of AlbertaAAAI’102
Nuclear Magnetic Resonance (NMR) SpectroscopyComputing Science Department, University of AlbertaAAAI’103
Nuclear Magnetic Resonance (NMR) SpectroscopyNMR SpectrumComputing Science Department, University of AlbertaAAAI’103
Nuclear Magnetic Resonance (NMR) Spectroscopyheight ( a )width (w)Spectrum is made of manyLorentzian peaksoffset ( )NMR SpectrumComputing Science Department, University of AlbertaAAAI’103
3-Hydroxyisobutyric acidComputing Science Department, University of AlbertaAAAI’104
3-Hydroxyisobutyric acidComputing Science Department, University of AlbertaAAAI’104
clusterShift3-Hydroxyisobutyric acidComputing Science Department, University of AlbertaAAAI’104
ConcentrationclusterShift3-Hydroxyisobutyric acidComputing Science Department, University of AlbertaAAAI’104
The Goal is:Given The library of metabolite SignaturesFind Corresponding ConcentrationComputing Science Department, University of AlbertaAAAI’105
The Goal is:Given The library of metabolite SignaturesFind Corresponding ConcentrationOptimization variables: Metabolite Concentrations Chemical ShiftsComputing Science Department, University of AlbertaAAAI’105
The Goal is:Given The library of metabolite SignaturesFind Corresponding ConcentrationOptimization variables: Metabolite Concentrations Chemical ShiftsIt’s a difficult Problem Nonlinear in shift variables Involves hundreds of variables Loss is very expensive to evaluate (100K points) Non-Convex even around local optima Incomplete library results in over-fittingComputing Science Department, University of AlbertaAAAI’105
Exploiting the structureComputing Science Department, University of AlbertaAAAI’106
Exploiting the structureSub-problemsare easier to solvebut theyshare variablesComputing Science Department, University of AlbertaAAAI’106
Exploiting the structureSub-problemsare easier to solvebut theyshare variablesComputing Science Department, University of AlbertaAAAI’106
Coupling Matrixrepresents the structureComputing Science Department, University of AlbertaAAAI’107
Coupling Matrixrepresents the structureSub-problemsVariablesComputing Science Department, University of AlbertaAAAI’107
Coupling Matrixrepresents the structureVariablesSub-problemsSudoku PuzzleComputing Science Department, University of AlbertaAAAI’107
Coupling Matrixrepresents the structureVariablesSATisfiability ProblemComputing Science Department, University of AlbertaAAAI’107
Cross Entropy (CE)Method for OptimizationComputing Science Department, University of AlbertaAAAI’108
Cross Entropy (CE)Method for OptimizationStart from a priorLoss functionPriorComputing Science Department, University of AlbertaAAAI’108
Cross Entropy (CE)Method for OptimizationStart from a priorLoss functionRepeat until convergenceTake samples from current dist.Calculate the loss for samplesPriorComputing Science Department, University of AlbertaAAAI’108
Cross Entropy (CE)Method for OptimizationStart from a priorLoss functionRepeat until convergenceTake samples from current dist.Calculate the loss for samplesSelect Elite samplesPriorComputing Science Department, University of AlbertaAAAI’108
Cross Entropy (CE)Method for OptimizationStart from a priorRepeat until convergenceTake samples from current dist.Calculate the loss for samplesSelect Elite samplesFind maximum likelihood dist. for ElitesComputing Science Department, University of AlbertaAAAI’108
Cross Entropy (CE)Method for OptimizationStart from a priorRepeat until convergenceTake samples from current dist.Calculate the loss for samplesSelect Elite samplesFind maximum likelihood dist. for ElitesA subroutine to be used againComputing Science Department, University of AlbertaAAAI’108
Variation of CEVariables ( i )Sub-problems (k)that Exploits Decomposability (CEED)coupling matrixComputing Science Department, University of AlbertaAAAI’109
Variation of CEVariables ( i )Start from a priorRepeat,Sub-problems (k)that Exploits Decomposability (CEED)coupling matrixUntil convergenceComputing Science Department, University of AlbertaAAAI’109
Variation of CEVariables ( i )Start from a priorRepeat,For each sub-problem kSub-problems (k)that Exploits Decomposability (CEED)Draw samples from marginal of current dist.Calculate the loss for samplescoupling matrixSelect Elite samplesFind maximum likelihood dist. for ElitesUntil convergenceComputing Science Department, University of AlbertaAAAI’109
Variation of CEVariables ( i )Start from a priorRepeat,For each sub-problem kSub-problems (k)that Exploits Decomposability (CEED)Draw samples from marginal of current dist.Calculate the loss for samplescoupling matrixSelect Elite samplesFind maximum likelihood dist. for ElitesFor each variable iCombinedist’s from related sub-problemsUntil convergenceComputing Science Department, University of AlbertaAAAI’109
How toCombineML DistributionsComputing Science Department, University of AlbertaAAAI’1010
How toML DistributionsWe use linear combination by Fisher InformationCombineComputing Science Department, University of AlbertaAAAI’1010
How toML DistributionsWe use linear combination by Fisher InformationCombineSecond distributionCombined distributionGaussian DistributionsFirst distributionComputing Science Department, University of AlbertaAAAI’1010
How toML DistributionsWe use linear combination by Fisher InformationCombineGaussian DistributionsComputing Science Department, University of AlbertaAAAI’1010
How toML DistributionsWe use linear combination by Fisher InformationCombineBeta DistributionsComputing Science Department, University of AlbertaAAAI’1010
How toML DistributionsWe use linear combination by Fisher InformationCombineBeta DistributionsComputing Science Department, University of AlbertaAAAI’1010
How toML DistributionsWe use linear combination by Fisher InformationCombineBernoulli DistributionsComputing Science Department, University of AlbertaAAAI’1010
How toML DistributionsWe use linear combination by Fisher InformationCombineBernoulli DistributionsA Tweak for NMR problemComputing Science Department, University of AlbertaAAAI’1010
Experimental Results800 MHz SpectraComputing Science Department, University of AlbertaAAAI’1011
Experimental Results500 MHz SpectraComputing Science Department, University of AlbertaAAAI’1011
Experimental ResultsSimulated SpectraComputing Science Department, University of AlbertaAAAI’1011
Experimental ResultsComparison with ChenomX Inc. automated fitting softwareQuantification TaskMetabolite Detection TaskThreshold: .02 mMolAvg. Relative ErrorUsThemComputing Science Department, University of AlbertaAAAI’1011
Experimental ResultsComputing Science Department, University of AlbertaAAAI’1011
MaxSATAnalytically update of dist’s is possible.variableConvergence of distributions to the correct assignmentiterationComputing Science Department, University of AlbertaAAAI’1012
MaxSAT#clauses 4250#vars1000# satisfied clausesConvergence of CE and CEEDiterationComputing Science Department, University of AlbertaAAAI’1012
Sudokusolution in redUsing categorical distributionOur Alg. is 5 times faster than CEComputing Science Department, University of AlbertaAAAI’1013
ConclusionOur method successfully exploits partial decomposability intargeted profiling of NMR spectraas well as some combinatorial problems (i.e. SAT & Sudoku)Maximum likelihood estimates could be linearly combined based ontheir certainty using their Fisher Information.Computing Science Department, University of AlbertaAAAI’1014
Thank you!Computing Science Dept. University of AlbertaAlberta Ingenuity Centre for Machine Learning15
that Exploits Decomposability (CEED) coupling matrix Draw samples from marginal of current dist. For each sub-problem k Calculate the loss for samples Select Elite samples Find maximum likelihood dist. for Elites 9. Computing Science Department, University of Alberta AAAIÕ10 Start from a prior