Next Generation Sequencing In Genetic Diagnostics Alan . - HSTalks

Transcription

Next Generation Sequencing in Genetic Diagnostics
Alan Pittman, PhD
Institute of Neurology, Faculty of Brain Sciences, University College London

Outline of lecture
- Sanger sequencing
- Next generation sequencing technologies
- Target enrichment
- Analysis of NGS data
- Applications of NGS technology in genetic diagnostics

Sanger sequencing

The screen versions of these slides have full details of copyright and acknowledgements.

Sanger sequencing (2)
- Invented by Fred Sanger in 1977
- Cycle sequencing
- One reaction, one sequence
- Sequence 800 base pairs per reaction
- Accurate (99.999%) but very slow
http://en.wikipedia.org/wiki/DNA sequencing

Human genome project
- Human genome project (1990-2003)
- 3 billion base pairs long
- All done by Sanger sequencing
- Unravelled human genome sequence to drive genetics research
- We can now achieve this amount of sequencing in as little time as one day

Next generation sequencing technologies

The next generation of DNA sequencing (NGS)
- Move from Sanger DNA sequencing to a more high-throughput approach
- Sequence DNA molecules in parallel rather than one molecule at a time
- Driven by new technological advances

The next generation of DNA sequencing (NGS) (2)
- Technological advances leading to a decrease in the cost of DNA sequencing
- Since the end of 2007, the cost has dropped at a rate faster than that of Moore's law
http://www.genome.gov/sequencingcosts/

The next generation of DNA sequencing (NGS) (3)
- Development of new NGS methods began 10 years ago with 454 pyrosequencing
- Solexa/Illumina was developed in 2005
- DNA sequencing throughput jumped 10 orders of magnitude
A decade's perspective on DNA sequencing technology; Elaine R. Mardis. Nature 470, 198–203 (10 February 2011)

Illumina (Solexa) sequencing
- Illumina is the most common sequencer (80%), so this technology will be used for the examples in the remainder of the talk
- Understand the sequencing-by-synthesis (SBS) principle

Illumina (Solexa) sequencing (2)
Step 1: DNA library construction
- In the lab, prepare the DNA sample for sequencing
- Workflow: sheared sample DNA fragments -> end repair -> A-tailing -> adapter ligation -> DNA library
www.Illumina.com

Illumina (Solexa) sequencing (3)
Step 2: cluster generation
- Hybridise DNA library to flowcell
- Perform bridge PCR to generate clusters
- Now ready to perform sequencing
Figure labels: Illumina SBS technology; reversible terminator chemistry foundation
www.Illumina.com

Illumina (Solexa) sequencing (4)
Step 3: sequencing-by-synthesis
- Sequence nucleotides 1 cycle at a time
- 4 bases, 4 different dyes
- Modified nucleotides mean only one base can be incorporated at a time
- Camera images the coloured bases on the surface of the flowcell every cycle
- Each image is converted to a nucleotide base call
Figure (Illumina/Solexa reversible terminators): incorporate all four nucleotides, each labelled with a different dye; wash, four-colour imaging; cleave dye and terminating groups, wash; repeat cycles. Top: CATCGT; Bottom: CCCCCC
Mitchel and Metzker, 2010

Illumina sequencers
- For genome-scale sequencing: Illumina HiSeq
  ‒ Generates approximately 1 Tb of data per machine run
  ‒ 1,600,000,000 short DNA reads
- For smaller-scale sequencing: Illumina MiSeq
  ‒ Generates approximately 5 Gb per run
www.Illumina.com

Target enrichment
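
The idea that each image is converted to a nucleotide base call can be pictured with a toy example. This is only a minimal sketch, assuming each cluster yields four dye intensities per cycle and that the brightest channel is taken as the base. The channel order, the intensity values and the name `call_bases` are hypothetical, and real base callers also correct for effects such as dye crosstalk.

```python
# Toy base caller for a single cluster: four dye intensities per cycle.
# Channel order and intensity values are hypothetical, for illustration only.
CHANNELS = ("A", "C", "G", "T")


def call_bases(cycles):
    """Return the called sequence, picking the brightest channel in each cycle."""
    read = []
    for intensities in cycles:
        if len(intensities) != len(CHANNELS):
            raise ValueError("expected one intensity per channel")
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Three imaging cycles for one cluster (made-up values).
example_cycles = [
    (0.1, 0.9, 0.0, 0.1),  # brightest channel: C
    (0.8, 0.1, 0.1, 0.0),  # brightest channel: A
    (0.0, 0.1, 0.1, 0.7),  # brightest channel: T
]
print(call_bases(example_cycles))
```

Each tuple stands for one imaging cycle, so the read grows by one base per cycle, which mirrors the one-base-at-a-time chemistry described on the slide.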

Target enrichment (2)
- Why target enrichment?
- The haploid human genome is 3 billion base pairs long
- There are 20,000 genes in the human genome
- Often we are only interested in the protein-coding exons of genes, or ‘exome’
- The coding portion of the genome represents 1-2% of the genome
- More efficient to only sequence the bits we are interested in, rather than the entire genome!
- Costs 1,000-5,000 for a genome, but only 500-1,000 for an exome

Target enrichment (3)
- Capture target regions of interest with baits
- Potential to capture several Mb of genomic regions (typically 30-60 Mb)
- Commonly used methods:
  ‒ Illumina TruSeq
  ‒ Agilent SureSelect
  ‒ NimbleGen EZ
Figure (SureSelect workflow): genomic sample (set of chromosomes) -> NGS kit -> genomic sample (prepped) + SureSelect HYB buffer + SureSelect biotinylated RNA library "baits" -> hybridization -> bead capture with streptavidin-coated magnetic beads -> wash beads and digest RNA (unbound fraction discarded) -> amplify
www.agilent.com

Target enrichment (4)
Prepare DNA samples in the lab
Illumina TruSeq Custom Amplicon: multiplex PCR
- Up to 384 simultaneous PCR reactions in one tube
- For sequencing 1-30 genes at a time
- Useful for small, targeted applications such as small gene sequencing projects and genetic diagnostics
Figure (workflow):
1. Use DesignStudio to create custom oligo capture probes flanking each region of interest
2. CAT (custom amplicon tube) probes hybridize to flanking regions of interest in unfragmented gDNA
3. Extension/ligation between custom probes across regions of interest
4. PCR adds indices (Index 1, Index 2) and sequencing primers (P7, P5)
5. Uniquely tagged amplicon library ready for cluster generation and sequencing
www.Illumina.com
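
The case for target enrichment comes down to simple arithmetic on the figures quoted above. The following is a rough sketch, not a costing tool: it uses the 3 billion bp genome and the 30-60 Mb capture sizes from the slides, while the function names and the `on_target_rate` parameter are assumptions added for illustration.

```python
# Arithmetic behind target enrichment, using figures quoted in the talk.
GENOME_BP = 3_000_000_000  # haploid human genome size from the slides


def target_fraction(target_bp, genome_bp=GENOME_BP):
    """Fraction of the genome covered by a capture target."""
    return target_bp / genome_bp


def mean_depth(total_sequenced_bp, target_bp, on_target_rate=1.0):
    """Average read depth over the target.

    on_target_rate is an assumed fraction of sequenced bases that land on
    the target; 1.0 is an idealised upper bound, not a measured value.
    """
    return total_sequenced_bp * on_target_rate / target_bp


for target_mb in (30, 60):
    fraction = target_fraction(target_mb * 1_000_000)
    print(f"{target_mb} Mb target = {fraction:.1%} of the genome")
```

With these numbers a 30-60 Mb capture corresponds to 1-2% of the genome, which matches the coding fraction given on the slide. The same sequencing output therefore gives a much higher depth on a captured target than on a whole genome.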

Analysis of NGS data

NGS data analysis
- How is NGS data analysed?
- NGS data analysis represents the real challenge
- There are 4 main steps:
  1. Primary analysis (process the raw machine data)
  2. Secondary analysis (align the short read sequences to the genome)
  3. Tertiary analysis (call variants in the sequence data)
  4. Quaternary analysis (interpret the genetic data)

Process raw sequence data (primary analysis)
- Machine DNA base calls
- Generate sequence files
- Standard fastq file
- Four lines per read:
  1. Sequence ID
  2. Nucleotide sequence
  3. Separator line beginning with '+'
  4. Per-base quality scores
http://www.Illumina.com
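
The four-lines-per-read layout of a fastq file is easy to see in code. Below is a minimal sketch of a reader, assuming the common layout of an '@' ID line, the sequence, a '+' line and a quality string, and assuming Phred+33 quality encoding. The function name `read_fastq` and the example read are made up for illustration, and production pipelines would normally use an established parser.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from a four-line-per-read FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Assumption: Phred+33 encoding, so score = ASCII code minus 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


# One hypothetical read: ID, bases, separator, per-base qualities.
example = io.StringIO("@read1\nCATCGT\n+\nIIIII#\n")
records = list(read_fastq(example))
print(records)
```

The quality string has one character per base, which is why the reader checks that both lines have the same length before decoding.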

Alignment to the genome (secondary analysis)
- Align short sequence reads (the fastq files) to the reference genome
- Specialist alignment programs: BWA, Novoalign
- Generates a .bam alignment file
Figure labels: short-read sequences (FastQ format); generate overlapping sequence contig

Variant detection (tertiary analysis)
- Look for differences in the aligned data compared to the reference genome
- Specialist programs: SAMTOOLS, GATK
- End result of variant detection is a table of variants found in the sequenced DNA sample
- Standard variant call format (.vcf)
Figure label: DNA variant
http://www.goldenhelix.com/GenomeBrowse/
http://www.1000genomes.org/

Making sense of the variant data (quaternary analysis)
- We need to interpret the variant data
- Typically there are 3 million variants detected in any given human genome
- We detect 23,000 variants in an ‘exome’
- What does all this genetic variation mean?
Figure (Annovar annotation software): variant data (.vcf file) + gene information + variant frequency information from publicly available databases -> annotated variant data for interpretation
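
To make the step from a table of variants to a shortlist more concrete, here is a small sketch that reads VCF-style data lines and keeps only rare variants. It is not how Annovar works internally. It assumes the standard eight fixed VCF columns and an `AF` (allele frequency) entry in the INFO column; the records, the 1% cut-off and the function names are hypothetical.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its fixed columns and an INFO dictionary."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    for entry in info.split(";"):
        if entry in ("", "."):
            continue  # empty INFO column
        key, _, value = entry.partition("=")
        info_dict[key] = value if value else True  # flags carry no value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),
        "filter": flt,
        "info": info_dict,
    }


def rare_variants(lines, max_af=0.01):
    """Keep variants whose annotated allele frequency is at most max_af.

    Variants without an AF annotation are kept, on the assumption that
    absence from a frequency database may itself indicate rarity.
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            continue  # meta-information and header lines
        variant = parse_vcf_line(line)
        af_field = variant["info"].get("AF")
        if af_field is None or min(float(x) for x in af_field.split(",")) <= max_af:
            kept.append(variant)
    return kept


# Hypothetical records, not real variants.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t50\tPASS\tAF=0.25",
    "1\t2000\t.\tC\tT\t60\tPASS\tAF=0.001",
    "2\t3000\t.\tG\tA\t45\tPASS\t.",
]
print([v["pos"] for v in rare_variants(example_vcf)])
```

Filtering on population frequency is one common way to reduce thousands of detected variants to a handful worth interpreting, since a common variant is unlikely to cause a rare disease.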

Applications of NGS technology

Neurogenetics
- Many adult-onset neurodegenerative conditions, such as Alzheimer's disease, are currently untreatable
- Many neurodegenerative diseases have a genetic cause
- Symptoms of some can sometimes be alleviated
- Research: new gene discovery/refinement gives new insights into the molecular pathogenesis
- Diagnostics: diagnosing those at risk may allow for early therapeutic intervention
- Need to identify carriers/individuals at risk

Neurogenetics (2)
- Research: new gene discovery/refinement gives new insights into the molecular pathogenesis
- Families with an unknown genetic basis
- Whole exome sequencing
- Identify shared variants between distantly related relatives
- Identify the causal mutation
Johnson et al. Nature Neuroscience. 2014

Neurogenetic diagnostics
Can we use new NGS approaches to genetic diagnostics of neurological disease?
- Diagnostics: diagnosing those at risk may allow for early therapeutic intervention
- Need to identify carriers/individuals at risk
- I will now present two examples of the use of NGS technology in a genetic diagnostic setting

Applications of NGS technology in genetic diagnostics I
Design and implementation of NGS diagnostic panel for dementia using the Illumina MiSeq

Dementia
- 1 in 6 people will go on to get some form of dementia by the age of 80
- By 2021 there will be 1 million people in the UK living with dementia
- Worldwide, there are currently 36 million people living with dementia
- The estimated cost of dementia worldwide is 380 billion
‒ Alzheimer’s disease 62% - most common
‒ Vascular dementia 17%
‒ Mixed dementia 10%
‒ Dementia with Lewy bodies 4%
‒ Frontotemporal dementia 2%
‒ Parkinson’s dementia 2%
‒ Other 3%
http://www.zmescience.com

Genetic causes of dementia
- Alzheimer's disease - 1-5% of cases
  ‒ PSEN1, PSEN2, APP
- Frontotemporal dementia - 10% of cases
  ‒ MAPT, GRN, VCP, TARDBP and CHMP2b, c9orf72
- CJD - 15% of cases
  ‒ PRNP
- Familial British dementia - 100% of cases
  ‒ ITM2B
- Other:
  ‒ Rare genetic diseases that present clinically as dementia

Design of a NGS dementia diagnostic panel
- Illumina TruSeq custom amplicon assay
- MiSeq platform
- 15 genes
- Exons only
- 50 kb in size (0.001% of genome)
- 99% was designable
- 425 base pair fragments
- 2x250 reads, v2.0 chemistry
- Careful design of PCR [...]
Gene list (partially legible): [...]EN2, TARDBP, TREM2, TYROBP, VCP
Figure (TruSeq Custom Amplicon workflow): use DesignStudio to create custom oligo capture probes flanking each region of interest; CAT (custom amplicon tube) probes hybridize to flanking regions of interest in unfragmented gDNA; extension/ligation between custom probes across regions of interest; PCR adds indices and sequencing primers; uniquely tagged amplicon library ready for cluster generation and sequencing
www.Illumina.com

Design of a NGS dementia diagnostic panel (2)
- Sequence gene exons only, and introns of known splice site mutations
  ‒ e.g. PSEN1 gene
Figure label: targeted amplicons
http://genome.ucsc.edu/
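
The panel design figures can be sanity-checked with a few lines of arithmetic. This is only a sketch: it assumes the 425 bp fragment is the stretch covered by the two 250 bp paired reads, and it reuses the 3 billion bp genome size from earlier in the talk. The function names are hypothetical.

```python
GENOME_BP = 3_000_000_000  # haploid genome size quoted earlier in the talk


def read_overlap(fragment_bp, read_bp):
    """Bases covered by both reads of a pair sequenced from opposite ends."""
    return max(0, 2 * read_bp - fragment_bp)


def panel_fraction(panel_bp, genome_bp=GENOME_BP):
    """Fraction of the genome targeted by the panel."""
    return panel_bp / genome_bp


print(read_overlap(425, 250))           # overlap in base pairs
print(f"{panel_fraction(50_000):.4%}")  # panel size as a share of the genome
```

Under these assumptions the two reads overlap in the middle of each fragment, so every base of an amplicon is read at least once, and the panel as a whole is a tiny share of the genome.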

Design of a NGS dementia diagnostic panel (3)
- VCP: some partial areas in exons NOT designable
- Just have to keep it in mind and get on with it
- Traditional Sanger sequencing to fill in the gaps
- Not ideal really, as we want to do it all with next generation sequencing
http://genome.ucsc.edu/

Dementia diagnostic panel: blind study
- Determine feasibility of technology in clinical setting
- Validation of the technology
- 85 positive control dementia samples and 10 unaffected controls
- Call mutation (or not) in each sample
- MUST prove technology works
- Which DNA sample carries which disease-causing mutation?

Dementia diagnostic panel: are the genes covered?
- 95% covered at least once
- 90% between 20-1500x coverage
- If less than 20x coverage, amplicon judged to be a fail
- Last exon of PRNP
- Not covered in the [...]
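
The 20x rule for judging an amplicon can be expressed as a simple check. The sketch below assumes a mean depth has already been computed for each amplicon; the amplicon names and depth values are hypothetical, and only the 20x threshold comes from the slide.

```python
MIN_DEPTH = 20  # from the talk: below 20x coverage an amplicon is judged a fail


def classify_amplicons(mean_depths, min_depth=MIN_DEPTH):
    """Split amplicons into passed and failed by mean read depth."""
    passed = {name: d for name, d in mean_depths.items() if d >= min_depth}
    failed = {name: d for name, d in mean_depths.items() if d < min_depth}
    return passed, failed


# Hypothetical per-amplicon mean depths.
example_depths = {
    "amplicon_A": 850.0,
    "amplicon_B": 19.5,
    "amplicon_C": 0.0,
    "amplicon_D": 20.0,
}
passed, failed = classify_amplicons(example_depths)
print("pass:", sorted(passed))
print("fail:", sorted(failed))
```

Failed amplicons mark the regions that would need to be filled in by another method, such as the Sanger sequencing mentioned for the undesignable parts of VCP.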

Dementia diagnostic panel: blind study results
Are the blind mutations identified?
- For point mutations:
  ‒ Sensitivity 98%
  ‒ Specificity 100%
- APP duplication missed
- Partial deletion of GRN missed
Figure label: PSEN1 Delta [...]

Dementia diagnostic panel: summary
- A targeted sequencing panel on the Illumina MiSeq has been developed for dementia
- Sequences all currently known genes that can cause dementia
- 98% effective
- Not 100%.
- Research tool to study the genetics of dementia
- Can diagnose the genetic basis of dementia
- All findings still need to be Sanger confirmed

Applications of NGS technology in genetic diagnostics II
Design and implementation of NGS diagnostic panel for Parkinson's disease using the Illumina MiSeq
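
Sensitivity and specificity in a blind study follow from counting how the calls compare with the known truth. The sketch below shows the standard formulas. The counts are hypothetical, chosen only to produce round figures, and are not the study's actual numbers.

```python
def sensitivity(true_positives, false_negatives):
    """Share of real mutations that the panel detected."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of mutation-free samples correctly called negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts for illustration only.
example_sensitivity = sensitivity(true_positives=49, false_negatives=1)
example_specificity = specificity(true_negatives=10, false_positives=0)
print(f"sensitivity = {example_sensitivity:.0%}")
print(f"specificity = {example_specificity:.0%}")
```

A missed mutation counts as a false negative and lowers sensitivity, while a mutation called in an unaffected control counts as a false positive and lowers specificity.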

Parkinson’s disease
- Progressive and degenerative movement disorder of the CNS
- Characterized by tremor, slowness of movements and rigidity
- 1% of people will go on to get Parkinson's disease
- 10 million people worldwide
- Cost of 25 billion in the US alone
Image: William Richard Gowers, A Manual of Diseases of the Nervous System (1886)

Parkinson’s disease genetics
- 15% of people with Parkinson’s disease have a positive family history
- Parkinson’s disease previously considered a non-inherited condition
- Both dominant and recessive
- 10 years research, 10 genes discovered
- Genes in other overlapping syndromes
Sun Ju Chung, MD. Genetics in Parkinson’s Disease. J Korean Med Assoc. 2011 Jan; 54(1): 70-78

Design of a Parkinson’s disease NGS diagnostic panel
- Illumina TruSeq custom amplicon
- MiSeq
- 15 genes
- Exons
- 98% designable
- 425 bp amplicon size
- 2x250 reads, v2.0 chemistry
- 262 amplicons; 241 targets
- Avoid SNP where [...]
Gene list (partially legible): [...]07, PLA2G6, CCH1, TH, SPR, ATP13A2, GBA
Legend: ‘Typical causal genes’; ‘Atypical causal genes’; Risk Factor (5-fold)

Validating Parkinson’s disease diagnostic panel: blind study
- Positive control samples: 49
- Variants in genes: 62
  ‒ 6 exonic rearrangements
  ‒ 56 point mutations
- No exonic gene rearrangement could be detected; still requires an additional screening technique
- Sensitivity for point mutations: 89%
- Discovered 113 rare variants; of these, 105 had coverage of 20 reads
- The remainder need to be confirmed by Sanger sequencing

Parkinson's disease diagnostic panel: blind study results
Reasons for failing to detect mutations:
- PINK1 exon 1 failed amplification
- PARK2 deletion: reads carrying this 40 bp deletion were NOT aligned; as a consequence, the mutation in trans was called homozygous rather than heterozygous
Figure label: Deletion

Parkinson’s disease NGS diagnostic panel: summary
- We have developed a targeted sequencing panel on the Illumina MiSeq
- Screens all currently known genes that can cause Parkinson's disease
- 89% effective; needs a few design tweaks
- New analysis software to better pick up large insertions/deletions
- Research tool to study the genetics of Parkinson's disease
- Can diagnose the genetic basis of Parkinson's disease
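
The PARK2 miscall can be illustrated with a toy genotype caller. This is a simplified sketch, not the method used by SAMTOOLS or GATK: it calls a genotype from the fraction of reads carrying the variant, with hypothetical thresholds and read counts. It shows how losing the reads from the allele that carries the deletion makes a heterozygous point mutation on the other allele look homozygous.

```python
def call_genotype(ref_reads, alt_reads, het_low=0.2, het_high=0.8):
    """Call a genotype from read counts at one site.

    het_low and het_high are assumed cut-offs on the variant read fraction.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return "no call"
    alt_fraction = alt_reads / total
    if alt_fraction < het_low:
        return "homozygous reference"
    if alt_fraction > het_high:
        return "homozygous variant"
    return "heterozygous"


# Hypothetical counts at the point-mutation site.
# Allele 1 carries the point mutation; allele 2 carries the nearby deletion
# and the reference base at this site.
all_reads_aligned = call_genotype(ref_reads=50, alt_reads=50)
deletion_reads_lost = call_genotype(ref_reads=0, alt_reads=50)
print(all_reads_aligned)
print(deletion_reads_lost)
```

When the deletion-carrying reads fail to align, only reads from the other allele remain at the site, so every aligned read shows the point mutation and the caller reports it as homozygous.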

Thank you!
