visualizing quantitative information, martin krzywinski
outline: best practices of graphical data design; data-to-ink ratio; chartjunk; circos (the visual display of quantitative information, edward r tufte, 2001, 2nd ed)
graphical displays essentials: show the data; induce the viewer to think about substance rather than methodology; encourage the eye to compare different pieces of data; avoid distorting what the data represents; present many numbers in a small space; make large data sets coherent; reveal data at several levels of detail, from broad overview to fine structure
graphics reveal data and patterns: anscombe’s quartet; each of these sets is described by the same linear model; each of the values below is the same for each set: number of points, average x, average y, regression line, standard error of slope, sum of squares, residual sum of squares, correlation coefficient, r^2
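The claim that the four sets share the same summary values can be checked numerically. The following is a minimal sketch in plain Python using the standard published values of anscombe's quartet; the names `QUARTET`, `summarize` and `SUMMARIES` are illustrative choices and do not come from the slides.

```python
from math import sqrt

# anscombe's quartet; sets I-III share the same x values
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return point count, means, least-squares line and correlation for one set."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "r": sxy / sqrt(sxx * syy),
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

for name, s in SUMMARIES.items():
    # the printed rows are nearly identical, although the scatter plots differ strongly
    print(name, s["n"], round(s["mean_x"], 2), round(s["mean_y"], 2),
          round(s["slope"], 2), round(s["intercept"], 2), round(s["r"], 2))
```

The summary rows agree to about two decimal places, which is the point of the slide: only a plot reveals how different the four sets are.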
graphics organize complex information: some data sets are naturally better represented visually; each of these data maps portrays 21,000 numbers; although very dense, the images draw attention to hot spots; maps shown: death rate from various cancers, females and males
graphics organize dense information: locations and boundaries of 30,000 communes in France; 240,000 numbers
graphics organize dense information: 1,024 x 2,222 sky divisions; 10 grey tones; pixel grey value denotes number of galaxies in corresponding sky region; density of data commensurate with a photograph, but quantitative
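The encoding described on this slide (count objects per sky division, then show the count as one of 10 grey tones) can be sketched as follows. This is an illustrative sketch only: the function names, the unit-square coordinates and the linear quantization are assumptions, not details taken from the original map.

```python
def density_grid(points, n_rows, n_cols):
    """Count points per cell; points are (x, y) pairs with 0 <= x, y < 1 (assumed units)."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        row = min(int(y * n_rows), n_rows - 1)
        col = min(int(x * n_cols), n_cols - 1)
        grid[row][col] += 1
    return grid


def to_grey_levels(grid, n_levels=10):
    """Quantize counts linearly to integer levels 0 .. n_levels - 1 (highest = densest cell)."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    return [[count * (n_levels - 1) // peak for count in row] for row in grid]


# hypothetical points: a dense cluster, a sparse cluster and one stray point
points = [(0.1, 0.1)] * 9 + [(0.9, 0.9)] * 3 + [(0.6, 0.1)]
levels = to_grey_levels(density_grid(points, n_rows=2, n_cols=2))
print(levels)
```

Each cell keeps a quantitative meaning (a count range), which is what separates this kind of image from a photograph of similar density.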
graphics simplify complex information: TGV
when the image is the data: the visual medium is ideal for depicting multivariate data; arguably, univariate and bivariate data should be tabularized, within reason; this example shows a plot for a case where data cannot be easily parametrized
parametrization of multivariate data: the 2D plane can depict high-dimension data; chernoff faces are data encodings designed for easy identification of outliers; parameters are mapped to head shape, eye distance, nose and lip size; smoothly varying data corresponds to a smoothly varying chernoff population
data-to-ink ratio: proportion of a graphic’s ink devoted to the non-redundant display of data information; equivalently, 1.0 minus the proportion of a graphic that can be erased without loss of data information; data-to-ink ratio should always be maximized, within reason
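Written as a formula, the two definitions on this slide describe the same quantity. The block below only restates the slide's wording; the 40% figure in the comment is an assumed number used purely for illustration.

```latex
\text{data-to-ink ratio}
  = \frac{\text{ink devoted to the non-redundant display of data information}}
         {\text{total ink used in the graphic}}
  = 1.0 - \text{proportion of the graphic that can be erased without loss of data information}

% illustrative example (assumed number): if 40\% of the ink can be erased
% without losing data information, the ratio is 1.0 - 0.4 = 0.6
```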
data-to-ink ratio: high; shockingly low
data-to-ink ratio: original; deleted components; modified to increase data-to-ink ratio
shrink your graphics: dense data can be depicted within a small area without loss of clarity, as long as data-to-ink ratio is high; good graphics are informative, dense, multivariate; strive to give your viewer the greatest number of ideas, in the shortest time, with the least ink, in the smallest space
chartjunk: excessive use of grids and patterns causes perceived vibrations; avoid hatched patterns to limit moiré; avoid excessive use of decorative forms
the shimmering statistic: natural eye tremor and dense fill patterns produce a shimmering effect; this is annoying and tiring
circos: there are many genome browsers and visualizers already available, so do we really need another one?; communicating data visually is critical for large data sets; there are certain types of data that obfuscate common diagram formats; standard 2D plots (2 perpendicular axes) are inadequate
scalar mappings: scalar valued mappings are common and easily handled; input genomic position is a scalar input; when the output is real-valued (GC content, conservation, etc) use a histogram, line plot, scatter plot; genome position on x-axis; function value on y-axis; f : g → y
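A scalar mapping of this kind is easy to compute and plot. As a minimal sketch, the function below computes GC content in consecutive windows along a sequence, yielding (genome position, value) pairs that can go straight onto the x- and y-axes of a histogram or line plot. The function name, the window size and the toy sequence are illustrative assumptions.

```python
def gc_content_windows(sequence, window):
    """Return (start_position, gc_fraction) for consecutive non-overlapping windows."""
    values = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        gc = sum(1 for base in chunk if base in "GC")
        values.append((start, gc / len(chunk)))
    return values


# toy sequence (hypothetical), 4-base windows
profile = gc_content_windows("GGCCAATTGCAT", window=4)
for position, value in profile:
    # crude text histogram: genome position on the left, bar length as function value
    print(f"{position:>3} {'#' * round(value * 10)}")
```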
genome-to-genome mappings: output scalar is often a genome position (G2G); range may be the same genome, or a different genome; G2G is also common, but less easily handled; f : g → g′ (genome position → genome position)
drawing G2G mappings
drawing G2G mappings (Genome Res. 2003 Jan;13(1):37-45)
drawing G2G mappings (Genome Res. 2003 Jan;13(1):37-45)
drawing G2G mappings (Genome Res. 2005 May;15(5):629-40)
drawing G2G mappings (figure labels: sc7, sc15, chr04, chr09)
drawing G2G mappings (Genome Res. 2003 Jan;13(1):37-45)
drawing G2G omics
drawing G2G /chr7paper/chr7data/030113/segmental/index.php
drawing G2G mappings
dealing with G2G mappings: reduce information content in figures; plot/colourmap target chromosome, not position; f : g → g′ → c′
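The reduction f : g → g′ → c′ amounts to discarding the target position and keeping only the target chromosome, encoded as a colour. A minimal sketch, assuming a simple tuple representation of links; the link data, the palette and the function name are all hypothetical:

```python
# hypothetical G2G links: (source chr, source position, target chr, target position)
LINKS = [
    ("chr7", 1_200_000, "chr2", 45_000_000),
    ("chr7", 3_500_000, "chr2", 46_100_000),
    ("chr7", 9_800_000, "chr15", 2_300_000),
]

# hypothetical palette: one colour per target chromosome
CHR_COLOUR = {"chr2": "blue", "chr15": "orange"}


def colour_by_target(links, palette, default="grey"):
    """Reduce each link to (source chr, source position, colour of target chromosome)."""
    return [
        (src_chr, src_pos, palette.get(tgt_chr, default))
        for src_chr, src_pos, tgt_chr, _tgt_pos in links
    ]


coloured = colour_by_target(LINKS, CHR_COLOUR)
for src_chr, src_pos, colour in coloured:
    print(src_chr, src_pos, colour)
```

The result is a scalar-like track (position plus a categorical colour) that fits on an ordinary linear axis, at the cost of losing the exact target position.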
dealing with G2G mappings (Genome Res. 2004 Apr;14(4):685-92)
reduce sampling (Genome Res. 2005 Jan;15(1):98-110)
rearrange axes
partition data
recompose axis layout – circos
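Recomposing the axes into a circle means every (chromosome, position) pair maps to an angle, so a G2G link becomes a curve between two points on the same circle. The sketch below shows one way to compute such a layout; it is not Circos's own implementation, and the function names, gap size and chromosome sizes are assumptions made for illustration.

```python
import math


def build_layout(chr_sizes, gap_deg=2.0):
    """Give each chromosome an angular span proportional to its size.

    chr_sizes: list of (name, size) in drawing order.
    Returns {name: (start_angle_deg, degrees_per_base)}.
    """
    total = sum(size for _, size in chr_sizes)
    usable = 360.0 - gap_deg * len(chr_sizes)  # degrees left after inter-chromosome gaps
    layout, angle = {}, 0.0
    for name, size in chr_sizes:
        span = usable * size / total
        layout[name] = (angle, span / size)
        angle += span + gap_deg
    return layout


def to_angle(layout, name, pos):
    """Angle in degrees of a position on a chromosome."""
    start, per_base = layout[name]
    return start + pos * per_base


def to_xy(angle_deg, radius=1.0):
    """Cartesian point on the circle for a given angle."""
    theta = math.radians(angle_deg)
    return radius * math.cos(theta), radius * math.sin(theta)


# hypothetical two-chromosome genome, no gaps, sizes in arbitrary units
layout = build_layout([("a", 100), ("b", 300)], gap_deg=0.0)
link_start = to_xy(to_angle(layout, "a", 50))
link_end = to_xy(to_angle(layout, "b", 100))
print(link_start, link_end)
```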
circos: written in Perl; Apache-style configuration file; plain text data input; PNG output
G2G in circos: display characteristics of most elements are customizable; data-driven formatting rules; support for data layers
2D data in circos
2D data in circos: box; scatter; line
2D data in circos: tiles; histogram; heatmaps; chr2
non-linear scaling: global scaling, where the scale of each ideogram can be adjusted (e.g. chr1 drawn at 8x); local scaling, where any region can be locally expanded or contracted (e.g. 100-150 Mb on chr1 expanded 5x)
non-linear scaling
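Both kinds of scaling can be expressed as a piecewise-linear map from base position to drawn coordinate. The following is a minimal sketch of that idea, not Circos's actual code; the function name and the (start, end, factor) region format are assumptions. Positions are in Mb to match the slide's examples.

```python
def scaled_position(pos, regions=(), global_scale=1.0):
    """Map a position to a drawn coordinate under local and global scaling.

    regions: sorted, non-overlapping (start, end, factor) tuples; positions
    outside every region are drawn at factor 1 before global scaling.
    """
    drawn = 0.0
    cursor = 0
    for start, end, factor in regions:
        if pos <= start:
            break
        drawn += start - cursor                    # unscaled stretch before the region
        drawn += (min(pos, end) - start) * factor  # scaled part of the region up to pos
        cursor = min(pos, end)
    drawn += max(pos - cursor, 0)                  # unscaled remainder after the last region
    return drawn * global_scale


# local scaling: 100-150 Mb expanded 5x
local = [(100, 150, 5)]
print(scaled_position(50, local), scaled_position(125, local), scaled_position(200, local))

# global scaling: the whole ideogram drawn at 8x
print(scaled_position(50, global_scale=8))
```

With the 5x region, the 50 Mb between 100 and 150 Mb occupy 250 drawn units, while positions before the region are unaffected and positions after it are shifted outward.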
circos in comparative genomics: mouse chr3; mouse chr1; human chr1
circos in comparative genomics: chlamydia D fingerprint map vs chlamydia D sequence
circos in comparative genomics: chlamydia L fingerprint map vs chlamydia D sequence
blast of regions of chr14 vs chr22: alignments drawn as ribbons; single alignment
circos is flexible
circos art