Topological Data Analysis In Python - GitHub Pages

Transcription

Topological Data Analysis in Pythonorganized by:Michael Bleher, Maximilian Schmahl and Daniel SpitzHeidelberg University26th - 28th of October 2020

ContentsHomology of SpacesHomology of DataRipserApplications

ContentsHomology of SpacesHomology of DataRipserApplicationsMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 23/ 36

HomologyAlgebraic Topology: Distinguishing topological spaces via algebraic invariantsMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 24/ 36

HomologyAlgebraic Topology: Distinguishing topological spaces via algebraic invariantsHomology: For each dimension d0 assigns to a topological space a vector spaceX 7! Hd (X )and to a continuous map a linear map(f : X ! Y ) 7! (Hd (f ) : Hd (X ) ! Hd (Y ))Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 24/ 36

HomologyAlgebraic Topology: Distinguishing topological spaces via algebraic invariantsHomology: For each dimension d0 assigns to a topological space a vector spaceX 7! Hd (X )and to a continuous map a linear map(f : X ! Y ) 7! (Hd (f ) : Hd (X ) ! Hd (Y ))Definitiond (X ) dim Hd (X ) is called the d-th Betti number of X .Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 24/ 36

What homology can distinguish I000Mel XMaximilian SchmahlwYBelatßal 4Python Course on Topological Methods in Data Analysis - Day 25/ 36

What homology can distinguish IIß0Maximilian Schmahlß1ß2Python Course on Topological Methods in Data Analysis - Day 26/ 36

What homology can distinguish III0ß1Maximilian SchmahlßDf0ßPython Course on Topological Methods in Data Analysis - Day 217/ 36

What homology cannot distinguish I00SOOMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 28/ 36

What homology cannot distinguish IIgoMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 29/ 36

Homology in a nutshelld (X )is the number of d-dimensional holes in X .Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 210/ 36

Geometric Simplicial Complexes IDefinitionA geometric n-simplex is the convex hull of n 1 affinely independent points in Rm .WMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 211/ 36

Geometric Simplicial Complexes IDefinitionA geometric n-simplex is the convex hull of n 1 affinely independent points in Rm .If conv X is a simplex and Y X , then conv Y is called a face of conv X .IIIMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 211/ 36

Geometric Simplicial Complexes IDefinitionA geometric n-simplex is the convex hull of n 1 affinely independent points in Rm .If conv X is a simplex and Y X , then conv Y is called a face of conv X .A geometric simplicial complex is a finite set K of simplices such thatI if 2 K and is a face of K , then 2 K , andI if , 2 K and \ 6 ;, then \ is a face of and of .AMaximilian SchmahlI ÄPython Course on Topological Methods in Data Analysis - Day 211/ 36

Boundary matrixDefinitionLet K { 1 , . . . ,D (di,j ) asdi,j Maximilian Schmahlm}(10be a simplicial complex. We define its boundary matrixif i is a boundary ofotherwise.jwith dimi dimPython Course on Topological Methods in Data Analysis - Day 2j1,12/ 36

Boundary matrix1 t 1DefinitionLet K { 1 , . . . ,D (di,j ) asdi,j m}(10Obe a simplicial complex. We define its boundary matrixif i is a boundary ofotherwise.jwith dimi dimj1,ExamplecMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 212/ 36

Matrix reduction IDefinitionLet v (v1 , . . . , vm )T be a column vector. We definepivot v max{i 2 {1, . . . , m} vi 6 0}.Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 213/ 36

Matrix reduction IDefinitionLet v (v1 , . . . , vm )T be a column vector. We definepivot v max{i 2 {1, . . . , m} vi 6 0}.If M is a matrix, we write mj for its j-th column and cols M for the set of all itsnon-zero columns.Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 213/ 36

Matrix reduction IDefinitionLet v (v1 , . . . , vm )T be a column vector. We definepivot v max{i 2 {1, . . . , m} vi 6 0}.If M is a matrix, we write mj for its j-th column and cols M for the set of all itsnon-zero columns.We write pivots M {pivot mj mj 2 cols M}.Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 213/ 36

Matrix reduction IDefinitionLet v (v1 , . . . , vm )T be a column vector. We definepivot v max{i 2 {1, . . . , m} vi 6 0}.If M is a matrix, we write mj for its j-th column and cols M for the set of all itsnon-zero columns.We write pivots M {pivot mj mj 2 cols M}.We say that M is reduced if pivot mj mk implies mj mk for all mj , mk 2 cols M.mpiu.tnMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 213/ 36

Matrix reduction IDefinitionLet v (v1 , . . . , vm )T be a column vector. We definepivot v max{i 2 {1, . . . , m} vi 6 0}.If M is a matrix, we write mj for its j-th column and cols M for the set of all itsnon-zero columns.We write pivots M {pivot mj mj 2 cols M}.We say that M is reduced if pivot mj mk implies mj mk for all mj , mk 2 cols M.ExampleReduced:Not reduced:Maximilian SchmahlIcPython Course on Topological Methods in Data Analysis - Day 213/ 36

Matrix reduction IIDefinitionLet M be a matrix. We say that (R, V ) is a reduction of M if R is reduced, V isupper-triangular and invertible and we have R MV .Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 214/ 36

Matrix reduction IIDefinitionLet M be a matrix. We say that (R, V ) is a reduction of M if R is reduced, V isupper-triangular and invertible and we have R MV .Input: MOutput: (R, V ) reduction of MRMVIwhile 9i j with pivot Ri pivot Rj doRjRj RiVjVj Viend whilereturn (R, V )Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 214/ 36

Matrix reduction IIIIMaximilian SchmahlILalPython Course on Topological Methods in Data Analysis - Day 215/ 36

Computing homology ITheoremLet K { 1 , . . . , m } be a simplicial complex with boundary matrix D. Let (R, V ) bea reduction of D. Thend (K )Maximilian Schmahl #{i Ri 0, i 2/ pivots R, dimi d}.Python Course on Topological Methods in Data Analysis - Day 216/ 36

Computing homology ITheoremLet K { 1 , . . . , m } be a simplicial complex with boundary matrix D. Let (R, V ) bea reduction of D. Thend (K )ExampleO #{i Ri 0, i 2/ pivots R, dimEfdirMaximilian SchmahlO2 3ldrini d}.RtPivots RPython Course on Topological Methods in Data Analysis - Day 2L2,316/ 36

Computing homology IITopological spaceTriangulationSimplicial complexMatrix reductionBetti numbersMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 217/ 36

ContentsHomology of SpacesHomology of DataRipserApplicationsMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 218/ 36

Manifold recovery IMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 219/ 36

Manifold recovery IMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 219/ 36

Manifold recovery I0Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 219/ 36

Manifold recovery IMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 219/ 36

Manifold recovery IMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 219/ 36

Manifold recovery IITheorem (Niyogi, Smale, Weinberger)Let M Rk be a manifoldS and P M. There exists c(M) 0 such that for everyc(M) 0 with M x2P B (x) we have![H (M) B (x) H x2PMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 220/ 36

Manifold recovery IITheorem (Niyogi, Smale, Weinberger)Let M Rk be a manifoldS and P M. There exists c(M) 0 such that for everyc(M) 0 with M x2P B (x) we have![H (M) B (x) H x2PHow do we choose ?Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 220/ 36

Manifold recovery IITheorem (Niyogi, Smale, Weinberger)Let M Rk be a manifoldS and P M. There exists c(M) 0 such that for everyc(M) 0 with M x2P B (x) we have![H (M) B (x) H x2PHow do we choose ?We don’t!Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 220/ 36

Complexes from balls IMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 221/ 36

Complexes from balls IMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 221/ 36

Complexes from balls IMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 221/ 36

Complexes from balls IMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 221/ 36

Complexes from Balls IIDefinitionLet P Rk be a finite set. For t 0 define the Čech complex\Čecht (P) {Q P Bt (x) 6 ;}.x2QNote: Čecht (P) Čechu (P) whenever t u.Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 222/ 36

Complexes from Balls IIDefinitionLet P Rk be a finite set. For t 0 define the Čech complex\Čecht (P) {Q P Bt (x) 6 ;}.x2QNote: Čecht (P) Čechu (P) whenever t u.Theorem (Nerve Theorem)TLet P Rk be a finite set and t 0. If x2Q Bt (x) can be deformed to a point for allQ P, then![H Bt (x) H (Čecht (P))x2PMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 222/ 36

Rips complexesDefinitionLet P Rk be a finite set. For t0 we define the Rips complexRipst (P) {Q P sup d(x, y ) t}.x,y 2QNote: Ripst (P) Ripsu (P) whenever t u.Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 2Iaaf23/ 36

Rips complexesDefinitionLet P Rk be a finite set. For t0 we define the Rips complexRipst (P) {Q P sup d(x, y ) t}.x,y 2QNote: Ripst (P) Ripsu (P) whenever t u.TheoremThere exists 0 such that for all finite sets P Rk and t 0 we haveRipst Čech t (P)andČecht Rips t (P)Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 223/ 36

Homology for DataMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 224/ 36

Homology for Data0.1Maximilian Schmahl0.20.4Python Course on Topological Methods in Data Analysis - Day 20.8δ24/ 36

Homology for Data0.1Maximilian Schmahl0.20.4Python Course on Topological Methods in Data Analysis - Day 20.8δ24/ 36

Homology for Data0.1Maximilian Schmahl0.20.4Python Course on Topological Methods in Data Analysis - Day 20.8δ24/ 36

Homology for Data0.1Maximilian Schmahl0.20.4Python Course on Topological Methods in Data Analysis - Day 20.8δ24/ 36

Homology for Data0.1Maximilian Schmahl0.20.4Python Course on Topological Methods in Data Analysis - Day 20.8δ24/ 36

Homology for Data0.1Maximilian Schmahl0.20.4Python Course on Topological Methods in Data Analysis - Day 20.8δ24/ 36

BarcodesDefinitionA family of finite-dimensional vector spaces Vi , i 1, . . . , n with linear mapsVi ! Vi 1 for each i is called a persistence module.Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 225/ 36

BarcodesDefinitionKEKH kEKEEHlkKuH KuA family of finite-dimensional vector spaces Vi , i 1, . . . , n with linear mapsVi ! Vi 1 for each i is called a persistence module.TheoremLet4GabrielV1V24ten.Vn1UsEtVnbe a persistence module. Then there exists a unique family of intervals (Ik )k2K withIk {1, . . . , n} such thatdim Vi #{k 2 K i 2 Ik }andrank(Vi ! Vj ) #{k 2 K i, j 2 Ik }Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 225/ 36

Persistent homologyDefinitionLet K be a simplicial complex. A filtration of K is a family of simplicial complexesK1 , . . . , Kn such that Kn K and Ki Ki 1 for all n.Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 226/ 36

Persistent homologyDefinitionLet K be a simplicial complex. A filtration of K is a family of simplicial complexesK1 , . . . , Kn such that Kn K and Ki Ki 1 for all n.We callH (K1 )H (K2 ).H (Kn1)H (Kn )the persistent homology of the filtration.Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 226/ 36

Computing persistent homologyTheorem (Barannikov, Carlsson & Zomorodian, Edelsbrunner et al.)Let K { 1 , . . . ,subcomplex.Maximilian Schmahln}be a simplicial complex such that Ki {1, . . . ,Python Course on Topological Methods in Data Analysis - Day 2i}is a27/ 36

Computing persistent homologyTheorem (Barannikov, Carlsson & Zomorodian, Edelsbrunner et al.)Let K { 1 , . . . , n } be a simplicial complex such that Ki { 1 , . . . , i } is asubcomplex.Let D be the corresponding boundary matrix and (R, V ) a reduction of D.Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 227/ 36

Computing persistent homologyTheorem (Barannikov, Carlsson & Zomorodian, Edelsbrunner et al.)Let K { 1 , . . . , n } be a simplicial complex such that Ki { 1 , . . . , i } is asubcomplex.Let D be the corresponding boundary matrix and (R, V ) a reduction of D.Then{[i, 1) Ri 0, i 2/ pivots R} [ {[i, j) i pivot Rj }is the barcode ofH (K1 )Maximilian SchmahlH (K2 ).H (Kn1)Python Course on Topological Methods in Data Analysis - Day 2H (Kn )27/ 36

Persistent homology pipelineDataRips/Čech/Delaunay/.Filtration of ComplexesHomologyPersistence ModuleInterval DecompositionBarcodeMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 228/ 36

Stability IDefinitionLet P, Q Rk . We define their Hausdor distance as(dH (P, Q) maxMaximilian Schmahlsup inf d(p, q), sup inf d(p, q)p2P q2Qq2Q p2PPython Course on Topological Methods in Data Analysis - Day 2)29/ 36

Stability IDefinitionLet P, Q Rk . We define their Hausdor distance as(dH (P, Q) maxExampleMaximilian Schmahlsup inf d(p, q), sup inf d(p, q)p2P q2Qq2Q p2P)00Python Course on Topological Methods in Data Analysis - Day 229/ 36

Stability IIDefinitionLet B (I ) 2A and C (J ) 2B be barcodes.A -matching between them consists of subsets A0 A, B 0 B and a bijectionf : A0 ! B 0 such that:Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 230/ 36

Stability IIDefinitionLet B (I ) 2A and C (J ) 2B be barcodes.A -matching between them consists of subsets A0 A, B 0 B and a bijectionf : A0 ! B 0 such that:I If 2/ A0 , 2/ B 0 , then length(I ), length(J ) .I If f (I ) J , then the endpoints of I and J are within of eachother.Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 230/ 36

Stability IIDefinitionLet B (I ) 2A and C (J ) 2B be barcodes.A -matching between them consists of subsets A0 A, B 0 B and a bijectionf : A0 ! B 0 such that:I If 2/ A0 , 2/ B 0 , then length(I ), length(J ) .I If f (I ) J , then the endpoints of I and J are within of eachother.We define the bottleneck distance asdb (B, C ) inf{ 0 there exists a -matching between B and C }.Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 230/ 36

Stability IIDefinitionLet B (I ) 2A and C (J ) 2B be barcodes.A -matching between them consists of subsets A0 A, B 0 B and a bijectionf : A0 ! B 0 such that:I If 2/ A0 , 2/ B 0 , then length(I ), length(J ) 2 .I If f (I ) J , then the endpoints of I and J are within of eachother.We define the bottleneck distance asdb (B, C ) inf{ 0 there exists a -matching between B and C }.ExampleFMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 230/ 36

Stability IIITheorem (Cohen-Steiner et al.)Let P, Q Rk be finite subsets and let B(P) and B(Q) be the barcodes of thepersistent homology of their Rips filtrations. Thendb (B(P), B(Q)) dH (P, Q).Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 231/ 36

Persistence diagramsDefinitionIf B (I ) 2A is a barcode, we define its persistence diagram asdgm(B) ((inf I , sup I )) 2A R2 .Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 232/ 36

Persistence diagramsDefinitionIf B (I ) 2A is a barcode, we define its persistence diagram asdgm(B) ((inf I , sup I )) 2A R2 .Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 232/ 36

ContentsHomology of SpacesHomology of DataRipserApplicationsMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 233/ 36

RipserC library by Ulrich Bauer to compute barcodes of Rips filtrationsMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 234/ 36

RipserC library by Ulrich Bauer to compute barcodes of Rips filtrationsAlso implemented in scikit-tda:from ripser import ripserfrom persim import plot diagrams# Compute the persistence diagram of a Rips filtration# data is numpy array of points in euclidean space or a distance matrixdgm ripser(data, maxdim 1, thresh inf, distance matrix False)# Plot the persistence diagramplot diagrams(dgm, show False)Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 234/ 36

ContentsHomology of SpacesHomology of DataRipserApplicationsMaximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 235/ 36

What can we do with persistent homology?Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 236/ 36

What can we do with persistent homology?I Infer something about the shape of a data set (e.g. Cosmic MicrowaveBackground, Edelsbrunner et al.)Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 236/ 36

What can we do with persistent homology?I Infer something about the shape of a data set (e.g. Cosmic MicrowaveBackground, Edelsbrunner et al.)I Infer something about the complexity of a data set (e.g. lung disease detection,Brodzki et al.)Maximilian SchmahlPython Course on Topological Methods in Data Analysis - Day 236/ 36

What can we do with persistent homology?I Infer something about the shape of a data set (e.g. Cosmic MicrowaveBackground, Edelsbrunner et al.)I Infer something about the complexity of a data set (e.g. lung disease detection,Brodzki et al.)I Use it as an additional layer in machine learning methodsÄMaximilian SchmahlerPython Course on Topological Methods in Data Analysis - Day 236/ 36

Homology in a nutshell d(X) is the number of d-dimensional holes in X. Maximilian Schmahl Python Course on Topological Methods in Data Analysis - Day 2 10/ 36. . Maximilian Schmahl Python Course on Topological Methods in Data Analysis - Day 2 13/ 36 m piu.tn. Matrix reduction I Definition Let v (v 1,.,v m) T be a column vector. We define