What Are Genomics And Computational Genomics?

Transcription

What are Genomics and Computational Genomics?
Ben Langmead
Department of Computer Science
Please sign guestbook (www.langmead-lab.org/teaching-materials) to tell me briefly how you are using the slides. For original Keynote files, email me (ben.langmead@gmail.com).

Genomics
What do you know about genomes and genomics?
Where did you hear about them?

1993

1997

Sequencing
When I started graduate school in 2007, sequencing technology was entering a new era.

Figure labels: Adoption of 2nd-generation sequencing; Human Genome Project ends

DNA sequencing instruments from Illumina: www.illumina.com
GA II: 1.6 billion nt/day (2008)
GA IIx: 5 billion nt/day (2009)
HiSeq 2000: 75 billion nt/day (2011)
HiSeq 2500: 120 billion nt/day (2012)
HiSeq 3000/4000: 200-400 billion nt/day (2015)
NovaSeq 5000/6000: 1-3 trillion nt/day (2017)
nt = nucleotide: A, C, G or T
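To put these throughput figures in perspective, here is a rough back-of-the-envelope sketch. The throughput numbers are the ones listed above (upper end of each range). The genome length of about 3.1 billion nt and the 30-fold coverage are assumptions added for illustration and are not on the slide.

```python
# Throughput in nt/day as listed on the slide (upper end of each range).
THROUGHPUT_NT_PER_DAY = {
    "GA II (2008)": 1.6e9,
    "GA IIx (2009)": 5e9,
    "HiSeq 2000 (2011)": 75e9,
    "HiSeq 2500 (2012)": 120e9,
    "HiSeq 3000/4000 (2015)": 400e9,
    "NovaSeq 5000/6000 (2017)": 3e12,
}

GENOME_NT = 3.1e9  # assumption: approximate length of a human genome
COVERAGE = 30      # assumption: each position is sequenced about 30 times


def days_per_genome(nt_per_day, genome_nt=GENOME_NT, coverage=COVERAGE):
    """Days of instrument time needed to sequence one genome at this coverage."""
    return genome_nt * coverage / nt_per_day


def fold_increase(old_nt_per_day, new_nt_per_day):
    """How many times faster the newer instrument is."""
    return new_nt_per_day / old_nt_per_day


if __name__ == "__main__":
    for name, rate in THROUGHPUT_NT_PER_DAY.items():
        print(f"{name:26s} {days_per_genome(rate):8.2f} days per genome")
```

Under these assumptions, the same calculation that takes weeks of instrument time on the 2008 machine takes well under an hour on the 2017 one.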

Sequencing
Sequencing is now a common tool for life scientists
The story echoes that of computing; once computers became fast & cheap, they were adopted everywhere

Genome
“The complete set of genes or genetic material present in a cell or organism.”
Oxford dictionaries
“Blueprint” or “recipe” of life
Self-copying store of read-only information about how to develop and maintain an organism
TAGCCCGACTTG

Genomics
“The branch of molecular biology concerned with the structure, function, evolution, and mapping of genomes.”
Oxford dictionaries
where are the genes and other interesting bits?
how do sequences change over evolutionary time?
what does all the DNA do?
what are the physical shapes of the genome and its products?
“The branch of molecular genetics concerned with the study of genomes, specifically the identification and sequencing of their constituent genes and the application of this knowledge in medicine, pharmacy, agriculture, etc.”
Collins English Dictionary

Genomics: contrast with biology & genetics*
* This slide has gross generalizations
Biology & Genetics
Targeted studies of one or a few genes
Targeted, low-throughput experiments
Clever experimental design, hard part
Genomics
Studies considering all genes in a genome
Global, high-throughput experiments
Tons of data, uncertainty, computation

Genomics: shaped by technology
Sanger DNA sequencing: 1977-1990s
DNA Microarrays: since mid-1990s
2nd-generation DNA sequencing: since 2007
3rd-generation & single-molecule DNA sequencing: since 2010
These provide very high-resolution snapshots of the world of nucleic acids (not just DNA)

Genomics: tool for basic science
“The branch of molecular biology concerned with the structure, function, evolution, and mapping of genomes.”
Oxford dictionaries
Structure / mapping
What is the DNA sequence of the genome?
Where are the genes?
What is the genome’s three dimensional shape in the cell?
Function
What does all the DNA in the genome do?
What genes interact with what other genes?
How does the cell know what DNA is on/off?
Evolution
How did history shape our ethnicities and populations?
What big events shaped our current genetics?
Which portions of the genome are conserved by evolution?

Genomics: tool for medicine
“The branch of molecular genetics concerned with the study of genomes, specifically the identification and sequencing of their constituent genes and the application of this knowledge in medicine, pharmacy, agriculture, etc.”
Collins English Dictionary
How is genotype related to health phenotypes?
What’s the difference between DNA in a tumor vs DNA in healthy tissue?
Can genomic data help predict what drugs might be appropriate for:
a particular cancer patient?
a particular genetic disorder?
Can genomic data reveal weaknesses in the defenses of pathogens?
Can genomic data help us predict what flu strains will prevail next year?

Computational Genomics
Addresses crucial problems at the intersection of genomics and computer science
The intersection:
Key biological models are straight out of computer science: circuits and networks for molecular interactions, trees for evolution and pedigrees, strings for DNA, RNA and proteins
Thanks to sequencers and microarrays, research bottlenecks increasingly hinge on computational issues: speed, scalability, energy, cost
With large, noisy, biased high-throughput datasets comes a critical need for machine learning and statistical reasoning
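As a small illustration of the "strings for DNA, RNA and proteins" point, the sketch below treats DNA as an ordinary Python string. The function names are made up for this example, and the sequence is the one shown on the Genome slide.

```python
# Translation table mapping each nucleotide to its complement (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(dna):
    """RNA string with the same sequence as the given DNA strand (T becomes U)."""
    return dna.replace("T", "U")


if __name__ == "__main__":
    seq = "TAGCCCGACTTG"
    print(reverse_complement(seq))  # CAAGTCGGGCTA
    print(transcribe(seq))          # UAGCCCGACUUG
```

Once sequences are strings, the usual string toolkit (slicing, searching, indexing, compression) applies directly.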

Computational Genomics: computation
How to efficiently analyze the huge quantities of fragmentary evidence that come from DNA sequencers
How to model biological phenomena and make predictions
How to combine data from disparate datasets to reach new conclusions in the presence of error and systematic bias
How to store huge quantities of data economically and securely while also allowing it to be queried
How to visualize large, complicated datasets
Draws on: Algorithms, data structures, pattern matching, indexing, compression, information retrieval, distributed and parallel computing, cloud computing, machine learning, ...
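To make "pattern matching" and "indexing" concrete, here is a minimal sketch of a k-mer index: every length-k substring of a reference is stored with its offsets, so a short fragment can be located without scanning the whole reference. The function names, the toy sequences, and the choice k = 4 are assumptions for illustration; production tools use far more compact index structures.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every length-k substring of the reference to its start offsets."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def find_exact(read, reference, index, k):
    """Offsets where the read occurs exactly, using its first k-mer as a seed."""
    if len(read) < k:
        raise ValueError("read must be at least k characters long")
    hits = []
    for pos in index.get(read[:k], []):
        # The seed matched; verify the rest of the read at this offset.
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    ref = "GATTACAGATTACC"  # toy reference, not real data
    idx = build_kmer_index(ref, 4)
    print(find_exact("GATTACC", ref, idx, 4))  # [7]
    print(find_exact("TTAC", ref, idx, 4))     # [2, 9]
```

The index is built once and then reused for every fragment, which is what makes this pay off when there are millions of fragments to place.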

Computational Genomics: success stories
The BLAST sequence alignment program is a hugely successful tool, a fixture of biological analysis and cited over 50,000 times
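For a feel of what "sequence alignment" computes, here is a minimal sketch of local alignment scoring by dynamic programming (the Smith-Waterman recurrence). This is not how BLAST itself works: BLAST uses fast heuristics to approximate this kind of result on large databases. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b (Smith-Waterman).

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may start anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("GGACGTGG", "ACGT"))  # 8
    print(local_alignment_score("AAAA", "TTTT"))      # 0
```

The table has one cell per pair of positions, so the cost grows with the product of the two lengths, which is why heuristics matter when searching very large sequence databases.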

Computational Genomics: success stories
The Human Genome Project depended crucially on contributions by computer scientists, especially new methods for assembling DNA fragments into chromosomes.
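The idea of assembling fragments can be sketched with a toy greedy method: repeatedly merge the two fragments with the longest suffix-prefix overlap. This is only an illustration under simplifying assumptions (error-free fragments, all from the same strand, no repeats); it is not the method used by the Human Genome Project, and real assemblers rely on graph-based approaches. The function names and the minimum overlap of 3 are made up for this example.

```python
def overlap(a, b, min_length=1):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_length=3):
    """Merge fragments greedily by longest overlap; returns remaining contigs."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_length)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left that overlaps enough to merge
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    reads = ["GACTTG", "TAGCCCG", "CCCGACT"]  # toy fragments of TAGCCCGACTTG
    print(greedy_assemble(reads))  # ['TAGCCCGACTTG']
```

Greedy merging can go wrong when the genome contains repeated sequence, which is one reason assembly was a hard computational problem.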

Computational Genomics: success stories
The idea of using high-throughput DNA sequencing in medical settings is only possible because of novel, extremely efficient software developed in the years after second-generation sequencers arrived.

Links
Past winners of the (Computational Biology) Overton cs and sequencing in the popular press: www.cs.jhu.edu/ langmea/poppress.shtml
The DNA Data Deluge (behind 9
