GenomeStudio Data Analysis Software


Data Sheet: Systems and Software

Illumina has created a comprehensive suite of data analysis tools to support a wide range of genetic analysis assays. This single software package provides data visualization and results analysis for all Illumina assay platforms.

GenomeStudio Software Highlights

- Broad Set of Tools: Analyze data generated from all Illumina platforms, sequencing and array.
- Integrated Analysis: Combine data from more than one assay type in the same project.
- Sophisticated Platform: Use high-performance algorithms and statistics calculations for a wide range of supported applications.
- Open Architecture: Leverage plug-ins and application programming interfaces (APIs) to facilitate data export for secondary or tertiary analysis.

Introduction

Illumina leads the industry with a broad spectrum of innovative and powerful genetic assays. The Illumina GenomeStudio Data Analysis Software supports researchers' equally diverse needs for data analysis. Scientists using any of Illumina's platforms (Genome Analyzer, MiSeq, or HiSeq 2500 systems for next-generation sequencing; HiScan or iScan System for BeadArray technology) can use the highly visual and intuitive GenomeStudio software for primary analysis of data generated with Illumina assays (Figure 1).

Figure 1: GenomeStudio Software for Integrated Analysis of Data from All Illumina Platforms
Figure labels: HiSeq 2500 System, iScan System, MiSeq System (platforms); Sequence Data, BeadArray Data (data types); CS, M, RS, DS, GX, GT, PC (modules).
GenomeStudio software consists of several modules (bottom) that support the analysis of data generated using any Illumina platforms (top). Data from separate modules can be combined into a single project (grey arrows).

Integrated Framework

GenomeStudio software consists of several assay-specific modules, integrated into a single platform. The common framework provides a set of intuitive graphical user interface (GUI) and data visualization features for the control and display of results generated by individual modules. Thorough understanding of the massive amounts of data generated by Illumina assays may require multi-modal examination and integration of information gleaned from a 10,000-foot view all the way down to a fine-grained single-feature view. The GenomeStudio framework displays results at all scales to enable researchers to effectively examine high-resolution genome-wide data.

Project Creation Wizard

GenomeStudio software makes it easy to create new projects and identify input data locations with an intuitive wizard interface. Once a project is created, users can easily visualize and display all data associated with each experiment type.

Users with the optional Infinium LIMS, sample tracking, and robotic automation control can take advantage of GenomeStudio software integration for increased efficiency in overall project management for all Infinium genotyping assays. These systems, custom designed for Illumina workflows, allow labs to maximize their throughput.

Data Visualization

Genome-wide orientation and broad trends can be quickly seen when data are displayed in the Illumina Genome Viewer (IGV). Chromosome or region-level trends of sequence or array data are visualized in the Illumina Chromosome Browser (ICB). The ICB can be used to identify structural aberrations, gene expression levels, protein binding sites, or methylation of CpG islands in promoter regions. For higher resolution analysis, particularly with sequencing data, researchers can zoom in to see single base calls in the Illumina Sequence Viewer (ISV) to precisely identify individual SNPs, CpG loci, splice junctions, or cSNPs.

To find trends across samples, markers, or different assays, the GenomeStudio framework provides a wide range of graphical plotting and display tools (Table 1). Researchers can choose to display data as line graphs, histograms, scatter plots, pie charts, dendrograms, box plots, frequency plots, or heat maps. These tools are used to easily compare samples from different experimental conditions in order to identify differential expression levels, protein expression, or methylation levels.

When trends or interesting regions are identified with graphical analysis tools, looking at individual data points becomes essential. GenomeStudio software supports this single-site level of analysis of individual SNP genotypes, splice junctions, gene or exon expression levels, CpG loci methylation status, or protein binding site occupation levels with table displays. Table views are customizable for sorting and to show or hide various data categories. Table data can also be exported in formats compatible with other downstream analysis tools.

Table 1: GenomeStudio Display Options

Global Visualization
- Illumina Genome Viewer (IGV)
- Illumina Chromosome Browser (ICB)
- Illumina Sequencing Viewer (ISV)

Graphs
- Dendrograms and Clustering Analysis
- Heat Maps
- Scatter Plots
- Histograms
- Line Graphs
- Box Plots
- Frequency Plots
- Pie Charts

Tables
- Samples Table
- Sequence or Lane Tables
- SNP Table
- Alleles Table
- Probe Table
- Gene, Exon, or Junction Tables
- Plus other assay results tables

Controls Dashboard

Illumina array-based assays, including Infinium, GoldenGate Genotyping or Methylation Profiling, Direct Hyb, or DASL assays, contain internal sample-dependent and sample-independent controls so researchers have confidence that they are producing the highest quality data. The performance of all controls can be easily monitored with the GenomeStudio software integrated controls dashboard.

Application-Specific Analysis Modules

The modular nature of GenomeStudio software enables powerful assay-specific analysis and allows individual applications to be updated or added as necessary. GenomeStudio modules cover the spectrum of Illumina applications, including microarray-based genotyping, gene expression, methylation, and immunoprotein assay analysis, as well as DNA sequencing, chromatin immunoprecipitation sequencing (ChIP-Seq), and mRNA sequencing.

Array-Based Applications

Genotyping (GT) Module

Genotyping data generated using the GoldenGate or Infinium assays on the iScan System are analyzed in the GenomeStudio Genotyping (GT) Module. This module uses algorithms to perform primary data analyses, such as raw data normalization, clustering, and genotype calling. Data quality is rapidly confirmed with internal controls and other QC functions. Individual SNPs can be viewed as GenoPlots and edited if necessary (Figure 2). Genotype summary statistics and results are automatically reported and exportable for use in third-party downstream analysis software.

Figure 2: Genotyping Module GenoPlot
The graphical display of results in the GenomeStudio GT Module is a GenoPlot with data points color coded for the call (red AA, purple AB, blue BB). Genotypes are called for each sample (dots) by their signal intensity (Norm R, y-axis) and Allele Frequency (Norm Theta, x-axis) relative to canonical cluster positions (dark shading) for a given SNP marker.

Structural variation is identified using the same markers as genotyping and intensity-only probes with algorithms to calculate loss of heterozygosity (LOH) and abnormal copy numbers (CNVs). Identified structural variants can be bookmarked (with auto-bookmarking features) and viewed in the context of the entire chromosomes with the ICB or IGV. In addition, GenomeStudio software provides data plots displaying CNV values, log R ratios, B-allele frequencies, and bookmarks for one or more samples within the IGV.

Polyploid Clustering (PC) Module

GenomeStudio now has the ability to analyze data obtained from genotyping of polyploid organisms. The Polyploid Clustering (PC) Module implements two well-known classic density-clustering algorithms, OPTICS and DBSCAN, to call as many clusters as desired.
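To give a feel for the density-based clustering that the Polyploid Clustering Module is described as using, the following Python sketch implements a minimal textbook DBSCAN over (Norm Theta, Norm R) points. It is an illustration only, not the module's implementation: the sample coordinates, the `eps` radius, and the `min_pts` threshold are hypothetical values chosen for the example.

```python
from math import hypot

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: one label per point (0, 1, 2, ... or NOISE)."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be re-labeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical (Norm Theta, Norm R) values for one locus: three dosage
# groups plus one stray sample.
points = [
    (0.05, 1.00), (0.06, 1.05), (0.04, 0.95),
    (0.30, 1.10), (0.31, 1.05), (0.29, 1.15),
    (0.52, 1.00), (0.50, 1.05), (0.51, 0.95),
    (0.90, 0.20),
]
labels = dbscan(points, eps=0.12, min_pts=3)
print(labels)
```

Because the number of clusters is not fixed in advance, the same call works whether a locus shows three, five, or more dosage groups; samples that fall in no dense region are left unassigned (label -1) rather than forced into a cluster.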

The module (Figure 3) intentionally does not call polyploid genotypes. Instead, it allows the user to factor in experimental design and sample biology, and combine the population-level cluster analysis to intelligently call genotypes in a workflow outside of GenomeStudio. The flexibility built into the module allows for clustering of one, several, or all SNPs simultaneously. Once parameters are selected, they can be saved for automated clustering of new sample sets.

Figure 3: Polyploid Clustering Module
A locus analyzed within the Genotyping Module (a), which assumes diploidy, is compared to the same locus with samples assigned to cluster membership within the Polyploid Clustering Module (b). Poly call rate for this locus is higher in (b) than in (a) due to the higher number of samples with cluster assignment. The Polyploid Clustering Module does not have an a priori assumption of the shape of clusters, allowing for the detection of differences in allele dosing as well as hybridization efficiency. For this reason, the Polyploid Clustering Module does not call genotypes, providing researchers with the flexibility to determine genotype assignment based on the known biology of the organism.

Gene Expression (GX) Module

Data from Direct Hyb, DASL, and Whole-Genome DASL gene expression profiling assays generated using the iScan System are analyzed using the Gene Expression (GX) Module. The results generated using this module provide meaningful conclusions from the continuous expression data on gene-level statistical analysis tools. Differential expression analysis can be visualized as line plots, histograms, dendrograms, box plots, heat maps, scatter plots, frequency plots, pie charts, samples tables, and gene clustering diagrams (Figure 4). Simplified data management tools include hierarchical organization of samples, groups, group sets, and all associated project analysis.

Figure 4: Gene Expression Module Heat Map
Using the heat map function in the GX Module allows easy visualization and analysis of large amounts of data. This heat map dendrogram clusters rows (Target ID) and columns (Differential Scores).

Methylation (M) Module

DNA methylation data from scanned microarray images collected from the iScan System are analyzed with the Methylation (M) Module. This module calculates methylation levels (beta values) and analyzes differences between experimental groups. CpG island methylation status is visualized across the genome with the IGV and ICB. Results from single-site resolution data are visualized as line plots, bar graphs, scatter plots, frequency plots, pie charts, histograms, dendrograms, box plots, or heat maps. Methylation data can also be combined with gene expression profiling experiments within the same GenomeStudio project to study any correlation between levels of methylated sites (beta values) and differential gene expression levels (p-values).

Sequencing Applications

DNA Sequencing (DS) Module

DNA sequencing data generated using the Genome Analyzer or HiSeq instruments and software tools can be analyzed to discover and confirm SNPs and chromosomal breakpoint regions in the DNA Sequencing (DS) Module. Visualization tools display consensus reads in the reassembled genome and indicate SNPs with colored letters (Figure 5). Newly discovered SNPs can be exported to use in customized iSelect genotyping array designs.

Figure 5: SNPs Identified From Aligned Reads Displayed in DNA Sequencing Module
Aligned sequencing reads (yellow and purple blocks) are stacked on a reference genome in the ICB. SNPs are identified with red characters and in the called SNPs data track. Two SNPs are highlighted with a ruler indicating the position of the called SNPs in the aligned reads relative to the reference genome.
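The idea of stacking aligned reads on a reference and flagging positions where the reads consistently disagree with it can be sketched in a few lines of Python. This is a simplified illustration, not the method used by the DS Module or its upstream software: it assumes ungapped alignments given as hypothetical `(start, sequence)` pairs, and the `min_depth` and `min_fraction` thresholds are arbitrary example values.

```python
from collections import Counter


def call_snps(reference, reads, min_depth=3, min_fraction=0.8):
    """Flag positions where the majority read base differs from the reference.

    reads: list of (start, sequence) pairs, 0-based, aligned without gaps.
    Returns a list of (position, reference_base, observed_base, depth).
    """
    # Build a per-position tally of the bases observed in the stacked reads.
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything at this position
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps


# Hypothetical toy data: three reads carry a T where the reference has an A.
reference = "ACGTACGTAC"
reads = [
    (0, "ACGTTCG"),
    (2, "GTTCGTA"),
    (3, "TTCGTAC"),
    (5, "CGTAC"),
]
snps = call_snps(reference, reads)
print(snps)
```

A production SNP caller would also weigh base and alignment quality and handle insertions, deletions, and heterozygous sites; the sketch only shows why read depth and agreement between reads matter when a candidate SNP is confirmed visually.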

ChIP Sequencing (CS) Module

Data from whole-genome chromatin immunoprecipitation sequencing experiments performed using the Genome Analyzer or HiSeq instruments and software can be parsed to the GenomeStudio ChIP Sequencing (CS) Module to create global binding site maps of DNA-associated proteins. Differential binding levels between experimental groups can be identified by comparing sequences, regions, and peaks in table or chromosome views.

RNA Sequencing (RS) Module

Data generated from mRNA sequencing experiments using the Genome Analyzer or HiSeq instruments and software tools are displayed in the RNA Sequencing (RS) Module as expression levels and variants discovered. By aggregating data from the software, the RS Module is able to count the abundance of reads falling within specific exons, genes, and splice junctions. Data are then graphically displayed as tables or plots within GenomeStudio software (Figure 6). Genome views display consensus reads in the transcriptome by aligning reads to known abundant sequences and splice junctions. Coding SNPs and splice variants are identified and confirmed visually with single-base resolution in the ICB.

Figure 6: RNA Sequencing Module Sequencing and Alleles Table
Sequence information for more than 130 million reads from a flow cell is accessible and viewable in real-time in the Sequences Table of the RNA Sequencing Module (left). All called cSNPs can be individually explored in depth with the Alleles Table (right).

Integrated Analysis From Multiple Applications

The research community is taking advantage of the multiple types of assays and platforms offered by Illumina to perform a variety of genetic variation analysis studies. GenomeStudio software supports these powerful integrative studies with the ability to combine data sets from different assay types in a single project requiring minimal data handling and preparation by researchers. For example, data from methylation and gene expression assays can be analyzed together in a single GenomeStudio GX Module project table where combined statistics are shown and integrated plots generated.

Table 2: Minimum GenomeStudio System Recommendations

Parameter          Sequencing data analysis   Microarray data analysis   Microarray and sequencing data analysis
CPU Speed          2.0 GHz or greater         2.0 GHz or greater         2.0 GHz or greater
Processor          64-bit                     64-bit                     64-bit
Memory             8 GB or more               8 GB or more               8 GB or more
Hard Drive         250 GB or larger           250 GB or larger           250 GB or larger
Operating System   Windows XP, Vista, or 7    Windows Vista or 7         Windows XP, Vista, or 7

Sophisticated Analysis Algorithms

GenomeStudio software is part of the overall molecular biology informatics platform supporting all Illumina assays. Primary data analysis involves several algorithms that are either integrated with GenomeStudio software or in the upstream Pipeline software. Primary data analysis functions for genotype calling, CNV identification, sequence read alignment, exon and splice junction counting, and SNP calling tools are provided by algorithms such as GenCall, GenTrain, cnvPartition, ELAND, and CASAVA.

GenCall

By comparing the two-color signal intensities produced by a BeadChip marker to canonical genotype clusters, genotypes can be called. The millions of calls resulting from Infinium BeadChip assays are made quickly and reproducibly for display in GenomeStudio software with the integrated GenCall algorithm. Cluster position identification, when necessary, is performed by the GenTrain algorithm [1].

cnvPartition

cnvPartition uses Illumina BeadChip genotyping array data (signal intensities and genotype calls) to identify regions of unexpected copy number and calculate the copy numbers of those regions with confidence scores. The copy number values are then used to create CNV regions and bookmarks in GenomeStudio software for visualization of aberrant chromosomal regions across the genome.
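The general idea of calling a genotype by comparing two-color intensities to canonical cluster positions, as described for GenCall, can be illustrated with a small Python sketch. This is not the GenCall algorithm. It assumes a polar transform in which Norm Theta is (2/π)·arctan(Y/X) and Norm R is X + Y, and it uses hypothetical cluster positions and a hypothetical `min_r` no-call threshold; a real caller also models cluster spread and reports a confidence score.

```python
from math import atan2, pi


def to_polar(x, y):
    """Map two-channel intensities (x: A allele, y: B allele) to (theta, r).

    Assumed convention: theta runs from 0 (pure A signal) to 1 (pure B
    signal), and r is the summed intensity of both channels.
    """
    theta = (2 / pi) * atan2(y, x)
    r = x + y
    return theta, r


def nearest_cluster_call(x, y, cluster_thetas, min_r=0.2):
    """Assign the genotype whose cluster theta is closest to the sample's.

    cluster_thetas: mapping of genotype label -> canonical theta position.
    Samples with total intensity below min_r are returned as "NC" (no call).
    """
    theta, r = to_polar(x, y)
    if r < min_r:
        return "NC"
    return min(cluster_thetas, key=lambda g: abs(cluster_thetas[g] - theta))


# Hypothetical canonical cluster positions for one SNP marker.
cluster_thetas = {"AA": 0.05, "AB": 0.50, "BB": 0.95}

# Hypothetical normalized (x, y) intensities for four samples.
samples = [(1.00, 0.05), (0.60, 0.60), (0.04, 0.90), (0.05, 0.05)]
calls = [nearest_cluster_call(x, y, cluster_thetas) for x, y in samples]
print(calls)
```

Working in polar coordinates separates the allele ratio (theta) from overall signal strength (r), which is why a GenoPlot can show the three genotype clusters side by side along one axis.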

ELAND and CASAVA

Software outputs processed sequence data that GenomeStudio modules display graphically. Single or paired-end sequence read alignments to a reference sequence are performed by ELAND.

The CASAVA software package performs post-sequencing analysis (including SNP allele calls and counts of exons, genes, and splice junctions from RNA samples) of data from reads aligned to the reference genome.

Open Architecture

GenomeStudio software offers a flexible and open architecture for easy integration with third-party applications and tools. Available application programming interfaces (APIs) ensure that GenomeStudio software serves as a robust core of any analysis workflow.

GenomeStudio software offers an API for each module that permits users to create report plug-ins (DLLs) for parsing data from GenomeStudio software to downstream analysis tools. The illumina Connect third-party partnership program encourages informatics software vendors and the open source community to leverage this open architecture. This program has led to several custom report plug-ins created and supported by various illumina Connect partners [2].

Summary

GenomeStudio software provides a diverse and integrated platform for data analysis of Illumina assays. Researchers doing sequencing or array experiments use the same powerful software package. The graphical display of results generated from primary data analysis with assay-specific modules supports high-level and in-depth views of whole-genome variation. Integrated analysis is directly supported by combining data from different modules into a single project.

Ordering Information

Access to appropriate GenomeStudio modules is included with instrument purchase. Licenses for additional users and applications may be purchased separately. Learn more about this flexible informatics solution and third-party software tools at …

References

1. …mina.com/documents/products/technotes/technote gentrain2.pdf
2. www.illumina.com/illuminaconnect

Ordering Information

Product                                                           Seat License   Catalog No.
GenomeStudio DNA Sequencing Module                                Single Seat    SW-600-1001
                                                                  Five Seat      SW-600-5001
GenomeStudio ChIP-Seq Module                                      Single Seat    SW-500-1001
                                                                  Five Seat      SW-500-5001
GenomeStudio RNA Sequencing Module                                Single Seat    SW-700-1001
                                                                  Five Seat      SW-700-5001
GenomeStudio Genotyping Module                                    Single Seat    SW-100-1001
                                                                  Five Seat      SW-100-5001
GenomeStudio Gene Expression Module                               Single Seat    SW-200-1001
                                                                  Five Seat      SW-200-5001
GenomeStudio Methylation Module                                   Single Seat    SW-300-1001
                                                                  Five Seat      SW-300-5001
GenomeStudio Sequencing Bundle (Includes DS, RS, CS Modules)      Single Seat    SW-820-1001
                                                                  Five Seat      SW-820-5001
                                                                  Enterprise     SW-820-2001
GenomeStudio Microarray Bundle (Includes GT, GX, M Modules)       Single Seat    SW-810-1001
                                                                  Five Seat      SW-810-5001
                                                                  Enterprise     SW-810-2001
GenomeStudio Software Integrated System Bundle (All Modules)      Five Seat      SW-800-5001
                                                                  Enterprise     SW-800-2001

Illumina
1.800.809.4566 toll-free (U.S.)
1.858.202.4566 tel
techsupport@illumina.com
www.illumina.com

For research use only

© 2008–2013 Illumina, Inc. All rights reserved.
Illumina, IlluminaDx, BaseSpace, BeadArray, BeadXpress, cBot, CSPro, DASL, DesignStudio, Eco, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, NuPCR, SeqMonitor, Solexa, TruSeq, TruSight, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
Pub. No. 970-2008-038 Current as of 21 October 2013
