What Is Bioinformatics? - San Jose State University


June 2016, American University of Armenia
Introduction to Bioinformatics

TWO
What is Bioinformatics?
Sami Khuri
Dept of Computer Science
San José State University
June 2016
© 2016 Sami Khuri

What is Bioinformatics?
- The Human Genome Project (HGP)
- Mapping
- Model Organisms
- Types of Databases
- Applications of Bioinformatics
- Genome Research

From the Preface
Preface and Note to the Reader
- We believe that to perform a proper analysis it is not sufficient to understand how to use a program and the kind of results (and errors!) it can produce.
- It is also necessary to have some understanding of the technique used by the program and the science on which it is based.
- All research workers in the areas of biomolecular science and biomedicine are now expected to be competent in several areas of sequence analysis and often, additionally, in protein structure analysis and other more advanced bioinformatics techniques.
- The book is designed to be accessible both to students who wish to obtain a working knowledge of the bioinformatics applications and to students who want to know how the applications work and maybe write their own.

The Human Genome Project
- The HGP is a multinational effort, begun by the USA in 1988, whose aim is to produce a complete physical map of all human chromosomes, as well as the entire human DNA sequence.
- The ultimate goal of genome research is to find all the genes in the DNA sequence and to develop tools for using this information in the study of human biology and medicine.
- The primary goal of the project is to make a series of descriptive diagrams (called maps) of each human chromosome at increasingly finer resolutions.

Bioinformatics and the Internet
- The recent enormous increase in biological data has made it necessary to use computer information technology to collect, organize, maintain, access, and analyze the data.
- Computer speed, memory, and the exchange of information over the Internet have greatly facilitated bioinformatics.
- The bioinformatics tools available over the Internet are accessible, generally well developed, fairly comprehensive, and relatively easy to use.

Other Species
- As part of the HGP, genomes of other organisms, such as bacteria, yeast, flies and mice, are also being studied.
[Figure: cytogenetic map of chromosome 19. Other figure labels: centromere, telomere, Baker's yeast, C. elegans, p53 gene, pax6 gene, diabetes, DNA repair, cell division]

Other Sequenced Genomes
[Figure]

Model Organisms
- A model organism is an organism that is extensively studied to understand particular biological phenomena.
- Why have model organisms? The hope is that discoveries made in model organisms will provide insight into the workings of other organisms.
- Why is this possible? This works because evolution reuses fundamental biological principles and conserves metabolic, regulatory, and developmental pathways.

Studying Human Diseases
- Chimps are infected with SIV
- Very rarely progress to AIDS
[Figure. Copyright 2006 Pearson Prentice Hall, Inc.]

Goals of the HGP
- To identify all the approximately 20,000-25,000 genes in human DNA,
- To determine the sequences of the 3.2 billion chemical base pairs that make up human DNA,
- To store this information in databases,
- To improve tools for data analysis,
- To address the ethical, legal, and social issues (ELSI) that may arise from the project.
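
The goals above mention 3.2 billion base pairs and storing the sequence in databases. As a rough back-of-the-envelope sketch (not part of the original slides), the snippet below estimates the raw storage needed for one genome. It assumes only the four letters A, C, G and T occur, so 2 bits per base suffice; the function name and the 1 MB = 10^6 bytes convention are illustrative choices.

```python
# Back-of-the-envelope storage estimate for one human genome sequence.
BASE_PAIRS = 3_200_000_000  # figure quoted in the "Goals of the HGP" slide
BITS_PER_BASE = 2           # assumption: only A, C, G, T, so 2 bits per base


def genome_size_megabytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> float:
    """Return the size in megabytes (1 MB = 10**6 bytes) of a packed sequence."""
    total_bits = base_pairs * bits_per_base
    return total_bits / 8 / 1_000_000


if __name__ == "__main__":
    # Packed at 2 bits per base.
    print(genome_size_megabytes(BASE_PAIRS))
    # Stored as plain text, one 8-bit character per base.
    print(genome_size_megabytes(BASE_PAIRS, bits_per_base=8))
```

Under these assumptions the packed sequence is about 800 MB and the plain-text form about 3,200 MB. Real databases store far more than the bare sequence (annotations, quality scores, metadata), so this is only a lower bound.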

HGP Finished Before Deadline
- In 1991, the US Congress was told that the HGP could be done by 2005 for $3 billion.
- It ended in 2003 for $2.7 billion, because of efficient computational methods.

What is Bioinformatics? Set of Tools
- The use of computers to collect, analyze, and interpret biological information at the molecular level.
- A set of software tools for molecular sequence analysis.

What is Bioinformatics? A Discipline
- The field of science in which biology, computer science, and information technology merge into a single discipline. (Definition of NCBI, the National Center for Biotechnology Information)
- The ultimate goal of bioinformatics is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.

Why Study Bioinformatics (I)
- Bioinformatics is intrinsically interesting.
- Bioinformatics offers the prospect of finding better drug targets earlier in the drug development process.
  – By looking for genes in model organisms that are similar to a given human gene, researchers can learn about the protein the human gene encodes and search for drugs to block it.

How can Bioinformatics Help?
- Use docking algorithms to design a molecule that could bind the model structure
- Rational drug design
- Structure-based drug design
[Figure. Source: Scientific American, July 2000]

Why Study Bioinformatics (II)
- Molecular biology is the new frontier of 21st century science.
  – DNA, RNA, genes, stem cells, etc. are everywhere in the news.
- Science Magazine celebrated its 125th anniversary by issuing twenty-five big questions facing science over the h[...]

Science: Top 25 Questions (I)
* What Is the Universe Made Of?
* What is the Biological Basis of Consciousness?
- Why Do Humans Have So Few Genes?
- To What Extent Are Genetic Variation and Personal Health Linked?
* Can the Laws of Physics Be Unified?
* How Much Can Human Life Span Be Extended?
- What Controls Organ Regeneration?
- How Can a Skin Cell Become a Nerve Cell?
- How Does a Single Somatic Cell Become a Whole Plant?
* How Does Earth's Interior Work?
* Are We Alone in the Universe?
* How and Where Did Life on Earth Arise?

Science: Top 25 Questions (II)
- What Determines Species Diversity?
- What Genetic Changes Made Us Uniquely Human?
* How Are Memories Stored and Retrieved?
- How Did Cooperative Behavior Evolve?
- How Will Big Pictures Emerge from a Sea of Biological Data?
* How Far Can We Push Chemical Self-Assembly?
* What Are the Limits of Conventional Computing?
- Can We Selectively Shut Off Immune Responses?
- Do Deeper Principles Underlie Quantum Uncertainty and Nonlocality?
- Is an Effective HIV Vaccine Feasible?
* How Hot Will the Greenhouse World Be?
* What Can Replace Cheap Oil -- and When?

Red Blood Cells
[Figure]

What do Bioinformaticians do?
- They analyze and interpret data
- Develop and implement algorithms
- Design user interfaces
- Design databases
- Automate genome analysis
- They assist molecular biologists in data analysis and experimental design.

Databases for Storage and Analysis
- Databases store data that need to be analyzed
- By comparing sequences, we discover:
  – How organisms are related to one another
  – How proteins function
  – How populations vary
  – How diseases occur
- The improvement of sequencing methods generated a lot of data that need to be:
  – stored
  – organized
  – curated
  – annotated
  – managed
  – networked
  – accessed
  – assessed

Types of Databases
- In 2006 there were 858 databases classified into 14 major categories
[Figure]

Three Major Databases
- GenBank from the NCBI (National Center for Biotechnology Information), National Library of Medicine
  http://www.ncbi.nlm.nih.gov
- EBI (European Bioinformatics Institute) from the European Molecular Biology Laboratory
  http://www.ebi.ac.uk
- DDBJ (DNA DataBank of Japan)
  http://www.ddbj.nig.ac.jp

GenBank
- GenBank is the NIH genetic sequence database of all publicly available DNA and derived protein sequences, with annotations describing the biological information these records contain.

GenBank Taxonomic Sampling
Homo sapiens               62.1%
Mus musculus                7.7%
Drosophila melanogaster     6.1%
Caenorhabditis elegans      3.3%
Arabidopsis thaliana        2.9%
Oryza sativa                1.3%
Rattus norvegicus           0.8%
Danio rerio                 0.6%
Saccharomyces cerevisiae    0.6%
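
The "Databases for Storage and Analysis" slide says that discoveries come from comparing sequences. As a minimal sketch of the simplest such comparison (not taken from the slides), the snippet below computes the percent identity of two sequences that are assumed to be already aligned and of equal length. The function name and the example sequences are made up for illustration; real database searches use alignment tools that also handle gaps.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences agree.

    Assumption: both sequences have the same, non-zero length and no gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare position by position, ignoring letter case.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Hypothetical toy sequences: 5 of the 7 positions match.
    print(round(percent_identity("GATTACA", "GACTATA"), 1))
```

A position-by-position count like this is only meaningful after alignment; it is shown here to make the idea of "comparing sequences" concrete.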

What does NCBI do?
- NCBI: established in 1988 as a national resource for molecular biology information.
  – it creates public databases,
  – it conducts research in computational biology,
  – it develops software tools for analyzing genome data, and
  – it disseminates biomedical information,
  all for the better understanding of molecular processes affecting human health and disease.

Applications of Genome Research
- Current and potential applications of Genome Research include:
  – Molecular Medicine
  – Microbial Genomics
  – Risk Assessment
  – Bioarcheology, Anthropology, Evolution and Human Migration
  – DNA Identification
  – Agriculture, Livestock Breeding and Bioprocessing

Molecular Medicine
- Improve the diagnosis of disease
- Detect genetic predispositions to disease
- Create drugs based on molecular information
- Use gene therapy and control systems as drugs
- Design custom drugs based on individual genetic profiles.

Microbial Genomics
- Swift detection and treatment in clinics of disease-causing microbes: pathogens
- Development of new energy sources: biofuels
- Monitoring of the environment to detect chemical warfare
- Protection of citizens from biological and chemical warfare
- Efficient and safe clean up of toxic waste.

DNA Identification I
- Identify potential suspects whose DNA may match evidence left at crime scenes
- Exonerate persons wrongly accused of crimes
- Establish paternity and other family relationships
- Match organ donors with recipients in transplant programs

Louis XVII
- Louis XVII: son of Louis XVI and Marie-Antoinette, who died from tuberculosis in 1795 at the age of 10
[Figure]

DNA and Human Trafficking
From Haiti to Bolivia
[Figure]

Danish Astronomer: Tycho Brahe (1546 – 1601)
He catalogued more than 1,000 new stars, and his stellar and planetary observations helped lay the foundations of early modern astronomy. He was long thought to have died of a bladder infection, which legend suggests was contracted 11 days previously, when he had been too polite to leave the royal banquet table to go to the toilet. Others have suggested he was poisoned. The finger of suspicion had fallen on his assistant, Johannes Kepler, who later became a renowned astronomer himself. Brahe's body was exhumed in 2010, and in November 2012 scientists concluded that he was probably not poisoned.

DNA in Murder, Suicide Cases and History
- What do these people have in common?
  – Tycho Brahe
  – Salvador Allende
  – Albert DeSalvo
  – Maria Ridulph
  – Luigi Tenco
- They all had their bodies exhumed for DNA testing.

A Possible Evolution Tree For Humans
[Figure: tree with a time axis in millions of years ago (5, 4, 3, 2, 1, 0). Labels: Ardipithecus ramidus, Australopithecus afarensis, A. africanus, A. robustus, A. boisei, Homo habilis, Homo ergaster, H. erectus, H. heidelbergensis, H. neanderthalensis, H. sapiens]

Quagga: Zebra or Horse?
- Died in Amsterdam zoo in 1883.
[Figure]

DNA Identification II
- Identify endangered and protected species as an aid to wildlife officials and also to prosecute poachers
- Detect bacteria and other organisms that may pollute air, water, soil, and food
- Determine pedigree for seed or livestock breeds
- Authenticate consumables such as wine and caviar

Agriculture, Livestock Breeding and Bioprocessing
- Grow disease-resistant, insect-resistant, and drought-resistant crops
- Breed healthier, more productive, disease-resistant farm animals
- Grow more nutritious produce
- Develop biopesticides
- Incorporate edible vaccines into food products

What have we learned from the HGP?
- The small number of genes
[Figure]

What have we learned from the HGP?
- A small portion of the genome codes for proteins, tRNAs and rRNAs
[Figure]

The Alpha-Tropomyosin Gene
Alternative Splicing
[Figure. Source: Genomic Medicine by Guttmacher et al., NEJM, 2002]

Anatomy of an Intron
[Figure]

Building upon the Foundations of HGP
- As we build upon the foundation laid by the Human Genome Project, our ability to explore uncharted frontiers will hinge upon melding biological know-how with expertise in computer science, physics, math, clinical research, bioethics, and many other disciplines.
- A firm understanding of the powerful potential of genomics, proteomics, and bioinformatics will be essential to success in this amazing new world.
Source: Discovering Genomics, Campbell, 2007 – Preface by Francis Collins

Genomics is a Way of Seeing Life
- Genome: the complete (haploid) DNA content of an organism.
- Genomics: the field of genome studies.
- Genomics
  – is not just a collection of methods
  – has become an enhanced way of seeing life.
- Genomics includes the study of interaction of molecules inside the cell:
  – DNA
  – Protein
  – Lipids
  – Carbohydrates
- Genomics requires us to analyze, hypothesize, think, and formulate models.

Pathway to Genomic Medicine
[Figure: Project -> Sequencing of the human DNA -> Interpreting the human genome sequence -> Implicating genetic variants with human disease -> Personalized medicine -> Cure for diseases]

Personalized Medicine
- Personalized medicine is the use of diagnostic and screening methods to better manage the individual patient's disease or predisposition toward a disease.
- Personalized medicine will enable risk assessment, diagnosis, prevention, and therapy specifically tailored to the unique characteristics of the individual, thus enhancing the quality of life and public health.
- Personalized Medicine is Genotype-Specific Treatment.

Origins of African Americans
[Figure. Source: Esteban González Burchard]

Ancestry Informative Marker
SNPs and AIMs
- An Ancestry-Informative Marker (AIM) is a set of polymorphisms for a locus which exhibits substantially different frequencies between populations from different geographical regions.
- By using a number of AIMs one can estimate the geographical origins of the ancestors of an individual and ascertain what proportion of ancestry is derived from each geographical region.
Source: en.wikipedia.org/wiki/

Self-Identified Race: Genetic Ancestry
[Figure. Source: Esteban González Burchard]

Origins of Latinos and African Americans
[Figure. Source: Esteban González Burchard]

The Superior Doctor
- Superior doctors prevent the disease
- Mediocre doctors treat the disease before it is evident
- Inferior doctors treat the full-blown disease
- Huang Dee: Nai - Ching (2600 B.C., 1st Chinese Medical Text)

Preventive Medicine
- Prevent disease from occurring
- Identify the cause of the disease
- Treat the cause of the disease rather than the symptoms
- Genomics identifies the cause of disease
- "All medicine may become pediatrics" (Paul Wise)
- Effects of environment, accidents, aging, penetrance
- Health care costs can be greatly reduced if
  – one invests in preventive medicine
  – one targets the cause of disease rather than symptoms
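
The AIM definition above turns on allele frequencies that differ substantially between populations. As a small illustrative sketch (not from the slides), the snippet below ranks markers by the absolute difference in allele frequency between two populations. The marker names and frequencies are entirely hypothetical, and the absolute difference is only one simple way to score how informative a marker might be; it is not an ancestry-estimation method.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL data: frequency of one allele at each marker in two
# populations (population 1, population 2). Not real measurements.
EXAMPLE_FREQUENCIES: Dict[str, Tuple[float, float]] = {
    "marker_A": (0.90, 0.10),
    "marker_B": (0.55, 0.45),
    "marker_C": (0.20, 0.70),
}


def rank_by_frequency_difference(
    frequencies: Dict[str, Tuple[float, float]],
) -> List[Tuple[str, float]]:
    """Sort markers from largest to smallest absolute frequency difference."""
    scored = [(name, abs(p1 - p2)) for name, (p1, p2) in frequencies.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, delta in rank_by_frequency_difference(EXAMPLE_FREQUENCIES):
        print(f"{name}: difference = {delta:.2f}")
```

In this toy data, marker_A differs most between the two populations and marker_B least, so marker_A would be the strongest candidate for an ancestry-informative marker under this simple score.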

Wellderly: Healthy Aging
[Figure]

Anatomy Lesson of Dr. Nicolaes Tulp
- 1632 oil painting by Rembrandt Harmenszoon van Rijn
[Figure]

If Rembrandt was Around Today
[Figure. Source: Carlos Cordon-Cardo, Columbia University]

Concluding Remarks
- Biology is becoming an information science
- Progression: in vivo to in vitro to in silico
- Are natural languages adequate in predicting quantitative behavior of biological systems?
  – Need to produce biological knowledge and operations in ways that natural languages do not allow
- "Biology easily has 500 years of exciting problems to work on." (Donald Knuth)
- Today's biologists need to think quantitatively and from a multidisciplinary perspective.

The Future
- Convert all this progress into real riches for science, society, and patients
