Methods For Network Visualization And Gene Enrichment Analysis


July 17, 2013
Jeremy Miller, Scientist I
jeremym@alleninstitute.org

Outline
Visualizing networks using R
Visualizing networks using outside programs
  – Using VisANT to graph modules
Gene enrichment analyses using R
Gene enrichment analyses using outside programs

Visualizing networks using R
First, run WGCNA and assign modules.
  – This process involves creating a dendrogram.
  – A dendrogram shows the topology of a network but doesn’t directly show gene expression relationships or module correlations.
But what do these modules represent? Which modules are distinct? Do some have similar patterns?

Visualizing module relationships
Calculate the module eigengenes:
    ME = moduleEigengenes(DATA, colors=MODULES)$eigengenes
  – Many of the visualization and enrichment strategies require this value.
  – Think of this as a representative value for each module.
Module eigengenes can be visualized in dendrograms just like genes:
    distance = 1-(1+cor(ME_1A, use="p"))/2
    cluster = hclust(as.dist(distance), method="a")
    plot(cluster, [parameters])
Modules in the same branch contain genes with relatively similar expression patterns.
But note that the genes within a module have higher co-expression than genes between similar modules.
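
The eigengene and dendrogram steps above can be sketched in base R on simulated data. This is only an illustration under stated assumptions: the toy expression matrix, the module labels, and the helper `eigengene()` are hypothetical, and the helper takes the first singular vector of the scaled module expression as a stand-in for `moduleEigengenes` rather than calling the WGCNA function itself.

```r
# Toy data (hypothetical): 20 samples x 60 genes in three modules.
# "blue" and "green" share an underlying signal; "red" is independent.
set.seed(1)
n_samples <- 20
signal_a <- rnorm(n_samples)
signal_b <- rnorm(n_samples)
make_module <- function(signal, n_genes) {
  sapply(seq_len(n_genes), function(i) signal + rnorm(length(signal), sd = 0.5))
}
DATA <- cbind(make_module(signal_a, 20),
              make_module(signal_a + rnorm(n_samples, sd = 0.5), 20),
              make_module(signal_b, 20))
MODULES <- rep(c("blue", "green", "red"), each = 20)

# Stand-in for moduleEigengenes (assumption): first left singular vector
# of the scaled expression of the genes in one module.
eigengene <- function(expr) {
  scaled <- scale(expr)
  me <- svd(scaled, nu = 1, nv = 0)$u[, 1]
  # Orient the eigengene so it correlates positively with mean expression.
  if (cor(me, rowMeans(scaled)) < 0) me <- -me
  me
}
ME <- sapply(split(seq_along(MODULES), MODULES),
             function(idx) eigengene(DATA[, idx]))

# Same distance and clustering as on the slide.
distance <- 1 - (1 + cor(ME, use = "p")) / 2
cluster <- hclust(as.dist(distance), method = "average")
plot(cluster, main = "Module eigengene dendrogram", xlab = "", sub = "")
```

In this toy setup the two modules built from the same signal should end up on the same branch, which is the pattern the slide describes.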

Visualizing module relationships
A multidimensional scaling plot can show similar information about module relationships in two dimensions.
  – This plots the first two principal components of the distance matrix:
    MDS = cmdscale(as.dist(distance), 2)
    plot(MDS, col=MODULES, [parameters])
Modules that group together on this plot tend to contain genes with similar expression patterns.
For example, in a WGCNA study of Alzheimer’s disease, we found four main groups of modules, most of which could be distinguished from one another based on gene expression, enrichment analyses, etc.
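
A minimal base R sketch of the multidimensional scaling plot, assuming four simulated eigengenes. The module names and the simulated values are hypothetical and are used only so the example runs on its own.

```r
# Simulated eigengenes (hypothetical): two pairs of related modules.
set.seed(2)
n_samples <- 30
base1 <- rnorm(n_samples)
base2 <- rnorm(n_samples)
ME <- data.frame(
  turquoise = base1 + rnorm(n_samples, sd = 0.3),
  blue      = base1 + rnorm(n_samples, sd = 0.3),
  brown     = base2 + rnorm(n_samples, sd = 0.3),
  yellow    = base2 + rnorm(n_samples, sd = 0.3)
)

# Same eigengene distance as in the dendrogram example on the slides.
distance <- 1 - (1 + cor(ME, use = "p")) / 2

# Classical MDS into two dimensions; one point per module.
MDS <- cmdscale(as.dist(distance), k = 2)
plot(MDS, col = colnames(ME), pch = 19, cex = 2,
     xlab = "MDS dimension 1", ylab = "MDS dimension 2")
text(MDS, labels = rownames(MDS), pos = 3)
```

Coloring each point by its module name works here only because the hypothetical module names are also valid R color names, as WGCNA module labels usually are.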

Visualizing module relationships
The module eigengenes can be directly plotted using graphs.
  – In this case, each bar is a sample.
  – If you first order these samples in a biologically meaningful way, you can learn a lot about a module just by plotting the eigengene!
    barplot(ME$moduleX, [parameters])
    # (Also verboseBarplot in the WGCNA library)
[Figure panels: unsorted, no formatting; sorted, formatted, and labeled ME from HBA]
With some minor adjustments, we can find modules related to brain subregions just by looking!
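
The effect of ordering samples before plotting can be sketched as follows. The region labels, the simulated eigengene `ME_moduleX`, and the color choices are all hypothetical; the point is only that sorting by a biological annotation makes a region-specific module visible at a glance.

```r
# Hypothetical sample annotation: 18 samples from three brain regions,
# shuffled so the original sample order is not informative.
set.seed(3)
region <- sample(rep(c("cortex", "hippocampus", "striatum"), each = 6))

# Simulated eigengene that is high only in hippocampus samples.
ME_moduleX <- ifelse(region == "hippocampus", 0.3, -0.1) +
  rnorm(length(region), sd = 0.05)

# Order samples by region, then plot one bar per sample.
ord <- order(region)
region_colors <- c(cortex = "grey60", hippocampus = "darkorange",
                   striatum = "steelblue")
barplot(ME_moduleX[ord],
        col = region_colors[region[ord]],
        names.arg = region[ord], las = 2,
        ylab = "Module eigengene", main = "moduleX (sorted by region)")
```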

Other network visualizations using R
There are several other types of standard network visualizations which I will not discuss in detail here.
Heat maps: there are many standard ways of making these plots.
Scatter plots: these are particularly useful for between-study analyses.
Box plots: these are useful for displaying differential expression.

Adapting visualizations to best fit your data
Data visualization is critical, so it is important to make sure your visualizations are appropriate for the analysis at hand.
  – Compact summarizations of complex data sets can be helpful!
  – So can properly ordering samples!

Visualizing networks using outside programs
There are many useful programs outside R for visualizing networks:
Programs with both enrichment and visualization components
  – ChiliBot, Ingenuity, STRING, etc.
  – (These will be discussed with the enrichment analysis section.)
VisANT
  – This is available for both PC and Mac, but I find that the PC version works much better (particularly for reading in the data).
  – (Note that I have not tested this on a Mac for 2 years, so it may be better now.)
Cytoscape
  – This has been discussed in detail already and is another option for plotting modules.
  – See the “exportNetworkToCytoscape” function in the WGCNA library.

Visualizing networks using outside programs - VisANT
Summary of VisANT steps:
Download and install VisANT (http://visant.bu.edu/) and Java
Create a file with your interactions in the appropriate format
Read the interaction data into VisANT
Format your interaction map in VisANT as desired
A step-by-step tutorial for how to use VisANT (with screen shots) is available, either on the course website, or ssionNetwork/WORKSHOP/

Visualizing networks using outside programs - VisANT
There are three ways of making the input file for VisANT:
1. On your own. It must be in the proper format.
2. Using “exportNetworkToVisANT” in the WGCNA library.
3. Using “visantPrepOverall” (which is included in the meta-analysis discussed yesterday).
From there you just copy and paste the interactions you want to show directly into VisANT.
Fields in each interaction line (labels from the example figure):
  – Genes involved in the interaction
  – The number 0
  – Character vector representing the edge (M1003 = orange). (Note that node color is set within VisANT itself.)
  – A numeric value for sorting interactions. In this case, topological overlap is used (not strictly necessary, although a number between 0 and 1 must be here).
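
Option 1 (building the file on your own) might look roughly like the base R sketch below. Several things here are assumptions rather than facts from the slides: the gene names, the random matrix standing in for topological overlap, the 0.5 cutoff, the tab-separated layout, and the column order (gene, gene, 0, edge code, weight), which simply follows the order of the field descriptions above.

```r
# Hypothetical symmetric matrix standing in for topological overlap (TOM).
set.seed(4)
genes <- c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE")
n <- length(genes)
TOM <- matrix(runif(n * n), n, n, dimnames = list(genes, genes))
TOM <- (TOM + t(TOM)) / 2
diag(TOM) <- 1

# One row per gene pair (upper triangle only, so each pair appears once).
idx <- which(upper.tri(TOM), arr.ind = TRUE)
edges <- data.frame(
  from      = genes[idx[, "row"]],
  to        = genes[idx[, "col"]],
  direction = 0,                    # "the number 0"
  method    = "M1003",              # edge code from the slide
  weight    = round(TOM[idx], 3),   # value between 0 and 1 used for sorting
  stringsAsFactors = FALSE
)

# Keep only the stronger connections (cutoff is an arbitrary assumption)
# and sort so the strongest interactions come first.
threshold <- 0.5
edges <- edges[edges$weight >= threshold, ]
edges <- edges[order(-edges$weight), ]

# Write a plain-text file whose contents can be pasted into VisANT.
out_file <- file.path(tempdir(), "visant_input.txt")
write.table(edges, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

For real modules, “exportNetworkToVisANT” is likely the safer route, since it produces the layout VisANT expects without hand-formatting.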

Visualizing networks using outside programs - VisANT
The best way to learn how to use VisANT is just to try it!
Some helpful hints:
(1) FIRST, copy data here and click “Add”.
(2) Turn off “fine arts”.
(3) After highlighting the nodes of interest, change the color and size by clicking “Nodes”, then “Properties”.
(4) Choose one of the “relaxing” options to make the nodes group in a hub-and-spoke manner. After this, you will have to move nodes manually.
(5) This will allow you to display only certain connections. Use this option LAST, if you use it at all.
(6) Finish up by saving your file AND by saving your image (an SVG file will give the highest-quality image).

Gene enrichment analyses using R
There are two basic methods for gene enrichment analysis in R:
1. Enrichment for published or user-defined lists
  – userListEnrichment in the WGCNA library
  – This function performs hypergeometric tests for all of your modules against any user-defined lists.
  – It also includes pre-loaded lists from brain, blood, and stem cell data sets:
      Cell type markers from many publications
      Genes from modules found in several WGCNA analyses
      Known and predicted lists of disease genes
      Lists of genes enriched in particular brain areas
      Immune-related gene lists
2. Gene Ontology Enrichment
  – GOenrichmentAnalysis in the WGCNA library
  – enrichGO in the clusterProfiler library
  – Many more iews.html# GO
  – In my experience the results from DAVID/EASE are better.
An R function combining all of the above will be available soon!
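
The hypergeometric test behind list enrichment can be illustrated for a single module and a single list using base R. The gene identifiers, list sizes, and overlap below are simulated, and this sketch is not the implementation of userListEnrichment; it shows only the underlying test.

```r
# Simulated background, module, and user-defined list (all hypothetical).
set.seed(5)
all_genes    <- paste0("gene", 1:1000)   # every gene in the network
module_genes <- all_genes[1:100]         # genes assigned to one module
user_list    <- c(all_genes[1:30],       # 30 list genes inside the module
                  sample(all_genes[101:1000], 20))  # 20 outside it

overlap <- length(intersect(module_genes, user_list))

# Probability of seeing at least `overlap` list genes in a module of this
# size if module membership were unrelated to the list.
# phyper(q, m, n, k): m = list genes, n = non-list genes, k = module size.
p_value <- phyper(overlap - 1,
                  m = length(user_list),
                  n = length(all_genes) - length(user_list),
                  k = length(module_genes),
                  lower.tail = FALSE)
```

Subtracting 1 from the overlap matters: with `lower.tail = FALSE`, `phyper` returns P(X > q), so `overlap - 1` gives the probability of an overlap at least as large as the one observed. In a real analysis the test is repeated for every module and list, so the p-values need a multiple-testing correction.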

Gene enrichment analyses using outside programs
There are many programs available for annotating modules!
I will be discussing a small subset of these programs:
EASE: http://david.abcc.ncifcrf.gov/ease/ease1.htm
ToppFun: http://toppgene.cchmc.org/
ChiliBot: http://www.chilibot.net/
WebGestalt: http://bioinfo.vanderbilt.edu/webgestalt/
Ingenuity: http://www.ingenuity.com/
GSEA: http://www.broadinstitute.org/gsea/index.jsp
UGET: http://genome.ucla.edu/projects/UGET
STRING: http://string.embl.de/
Galaxy: https://main.g2.bx.psu.edu/

EASE – A GO (etc.) enrichment analysis tool
EASE (Enrichment Analysis Systematic Explorer) is a standalone version of DAVID that can be used to find enrichment of GO, KEGG, etc. in a list, given both the test list and the reference list.
A selection box in the program allows you to choose which databases to search for enrichments (GO, KEGG, etc.).
I find it useful to save the output to an Excel file.
[Figure: typical output from EASE]

ToppFun - A GO (etc.) enrichment analysis tool
ToppFun is a user-friendly website that provides gene list annotations based on enrichments of GO and several other features.
There are tools for candidate gene prioritization here as well.

ChiliBot – A literature search tool
ChiliBot will take a list of up to 50 genes, search the literature for co-occurrences of these terms, then output an interactive plot of the literature connections between these terms.
If you click on the connection between two genes, it will show you text from articles where both terms are present.
Word of caution: since this is a literature search, you should check the references carefully!

WebGestalt – A toolkit of enrichment analyses
WebGestalt can perform several enrichment analyses from a relatively straightforward web-based interface. The output is not as user-friendly as EASE, but the results can sometimes be more informative.

Ingenuity – A hand-curated list of interactions
Ingenuity is a comprehensive program for both annotation and visualization. It requires training and an expensive subscription to use.
[Figure: example output plot]

GSEA – A powerful method for gene enrichment
GSEA takes as input a sorted list of all genes with respect to a parameter (e.g., correlation with age or module membership), and asks whether an a priori defined set of genes is significantly enriched at one end of this distribution.
GSEA is very powerful since it uses all of the data, not just the best subset for enrichments, but the software has a steep learning curve and is very particular about data formats.
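
The idea of testing whether a gene set sits at one end of a ranked list can be sketched with a simplified running-sum statistic. This is an unweighted, Kolmogorov-Smirnov-like variant with a gene-set permutation null, written for illustration only; it is not the GSEA software, and the scores, gene names, and helper `running_es()` are hypothetical.

```r
# Hypothetical ranked list: 500 genes sorted by some score
# (for example, correlation with age).
set.seed(6)
n_genes <- 500
scores <- sort(rnorm(n_genes), decreasing = TRUE)
names(scores) <- paste0("gene", seq_len(n_genes))

# Hypothetical gene set concentrated near the top of the ranking.
gene_set <- names(scores)[c(1:20, sample(21:n_genes, 10))]

# Walk down the ranked list: step up at set members, down otherwise.
# The enrichment score is the largest deviation of the running sum from zero.
running_es <- function(ranked_genes, set) {
  hit <- ranked_genes %in% set
  n_hit <- sum(hit)
  n_miss <- length(ranked_genes) - n_hit
  running <- cumsum(ifelse(hit, 1 / n_hit, -1 / n_miss))
  list(running = running, es = running[which.max(abs(running))])
}
obs <- running_es(names(scores), gene_set)

# Null distribution: random gene sets of the same size.
null_es <- replicate(1000, {
  running_es(names(scores), sample(names(scores), length(gene_set)))$es
})
p_value <- (1 + sum(abs(null_es) >= abs(obs$es))) / (1 + length(null_es))

plot(obs$running, type = "l", xlab = "Rank in sorted gene list",
     ylab = "Running enrichment score")
abline(h = 0, lty = 2)
```

Because every gene contributes a step to the running sum, the statistic uses the whole ranked list rather than a thresholded subset, which is the property highlighted on the slide.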

UGET – A tool for finding other co-expressed genes
UGET isn’t an enrichment analysis method itself, but it can help find other genes correlated with your genes of interest across thousands of microarray samples in the Celsius database (i.e., “guilt by association”). This could be useful either before or after enrichment analysis, depending on your goal.
For example, it can find other genes highly correlated with ribosomal proteins, and likely involved in translational machinery.
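
The “guilt by association” idea can be shown in miniature with base R: rank all genes by their correlation with a gene of interest. The simulated matrix below is a small hypothetical stand-in for a large expression compendium, and the code does not use UGET or the Celsius database.

```r
# Simulated expression matrix (hypothetical): 50 samples x 23 genes.
# Two "partner" genes track the seed gene; the rest are unrelated noise.
set.seed(7)
n_samples <- 50
seed_profile <- rnorm(n_samples)
expr <- cbind(
  seed     = seed_profile,
  partner1 = seed_profile + rnorm(n_samples, sd = 0.3),
  partner2 = seed_profile + rnorm(n_samples, sd = 0.3),
  matrix(rnorm(n_samples * 20), n_samples, 20,
         dimnames = list(NULL, paste0("other", 1:20)))
)

# Correlate every other gene with the seed gene across samples.
others <- colnames(expr) != "seed"
r <- cor(expr[, "seed"], expr[, others])[1, ]

# The most correlated genes are candidates for shared function.
top <- sort(r, decreasing = TRUE)[1:5]
print(round(top, 2))
```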

STRING – A tool for finding networks of one gene
STRING takes a single gene as input and returns a list and a plot of predicted functional partners based on several lines of evidence.
STRING isn’t an enrichment analysis method itself, but is still very useful, particularly for following up on hub genes.

Galaxy – A tool for just about everything
Galaxy can be used for enrichment analysis, and just about any other bioinformatics purpose. It has a rather steep learning curve, but there are several tutorials to get you started.

Summary
Once you have your network, it is useful to visualize it.
Once you have your modules, it is useful to visualize and annotate them to get a better understanding of what these gene lists represent.
There are many different ways of visualizing and annotating modules, both within R and by using additional programs.
Many of these methods will work with any gene list, regardless of origin (not just modules).

Acknowledgements
Steve Horvath
Dan Geschwind
Mike Hawrylycz
Peter Langfelder
Mike Oldham
We wish to thank the Allen Institute founders, Paul G. Allen and Jody Allen, for their vision, encouragement, and support.
Any questions?
