Biomed. Data Science: Unsupervised Datamining B: Community .

Transcription

Biomed. Data Science:Unsupervised Datamining B:Community DetectionMark Gerstein, Yale Universitygersteinlab.org/courses/452(last edit in spring ’21, pack #9b, final)

Unsupervised MiningGraph Analysis &Community Detection Approaches

Graph Methods & Community Detection Turn data into a graph Alternate local and global clustering while optimizing for modularity Can discover the number of clusters given a resolution Cell type detection Fast: O(nlogn) E.g. Louvain, Leiden Community detectionImage reference: https://sites.google.com/site/findcommunities/

Network Propagation in Biomedicine (Label propagation & Diffusion distance) Starting 2008Limitations in nearest neighbor (B)and shortest distance measures(B-D)Leverages local and globalnetwork topologyMathematically rigorousEarly methods: function predictionand gene-disease associationCurrent methods: gene ranking,subnetwork detection, gene-drugand TF-target associations,patient sample stratification, etc.Köhler, Bauer, Horn and Robinson (2008)Reyna, Leiserson, and Raphael (2018)Hofree, Shen, Carter, Gross, and Ideker (2013)4

Dimensionality Reduction& Spectral MethodsOutline & Papers Based on affinity matrixPCA/SVDExtensions: biplot, RCA, CCA .Related papers– O Alter et al. (2000). "Singular value decomposition for genomewide expression data processing and modeling." PNAS 97: 10101– Langfelder P, Horvath S (2007) Eigengene networks for studyingthe relationships between co-expression modules. BMC SystemsBiology 2007, 1:54– Z Zhang et al. (2007) "Statistical analysis of the genomicdistribution and correlation of regulatory elements in the ENCODEregions." Genome Res 17: 787– TA Gianoulis et al. (2009) "Quantifying environmental adaptation ofmetabolic pathways in metagenomics." PNAS 106: 1374.

Unsupervised MiningCommunity DetectionApplication to Hi-C

Network modularitynumber of edgesexpected number ofedges between i and jLectures.GersteinLab.orgwhether or noti, j are in thesame module7-adjacency matrixdegree of node iNewman Phy. Rev. E 2013

Network modularitynumber of edgesexpected number ofedges between i and jLectures.GersteinLab.orgwhether or noti, j are in thesame module8-adjacency matrixdegree of node i

Network modularitywhether or noti, j are in thesame modulenumber of edgesexpected number ofedges between i and jLectures.GersteinLab.orgadjacency matrixdegree of node i9-Optimizationproblemfor sim.annealing

image credit: Iyer et al. BMC Biophysics 201110 -image credit: Iyer et al. BMC Biophysics 2011,cartoonist John ChaseLectures.GersteinLab.org3D organization of genome

Lectures.GersteinLab.orgScience 2009, 5950: 289-2931111 -23 -Data:Rao et al. Aiden,Cell 2014Lectures.GersteinLab.orgHi-C contact map

[Yan et al., PLOS Comp. Bio. (in revision, ‘17); bioRxiv 097345]12 -Lectures.GersteinLab.orgIdentifying TADs in multiple resolutions

[Yan et al., PLOS Comp. Bio. (in revision, ‘17); bioRxiv 097345]13 -Lectures.GersteinLab.orgIdentifying TADs in multiple resolutions

Data Science: Unsupervised Datamining B: Community Detection. Unsupervised Mining Graph Analysis & Community Detection Approaches. Graph Methods & Community Detection Turn data into a graph Alternate local and global clustering while optimizing for modularity Can discover the number of clusters given a resolution Cell type detection Fast: O(nlogn) E.g. Louvain, Leiden Community detection Image .