Introduction To Bioinformatics - CGIAR

Transcription

Introduction to BioinformaticsIMBB 2017RAB, Kigali - RwandaMay 02 – 13, 2017Joyce Nzioki

Plan for the WeekIntroduction to BioinformaticsRaw sanger sequence dataQualityControlDe novoassemblyIntroduction to CLC BioResolving conflictsBLAST and Biological databasesDNA BarcodingNucleotide sequence AnalysisMSA and PhylogeneticsSequence depositing

What is Bioinformatics Bioinformatics is an interdisciplinary sciencethat develops and improves on methods ofstoring, retrieving, organizing and analyzingbiological data. This computational techniques are to solvebiological problems and discover the wealth ofbiological information hidden in biologicaldata.

BioinformaticsThe design, construction and use of software toolsto generate, store, annotate and analyse data andinformation relating to Molecular Biology.

BioinformaticsThe design, construction and use of software toolsto generate, store, annotate and analyse data andinformation relating to Molecular Biology.Here we consider the use of bioinformatics toolsrather than their design and construction.Here we consider the access, storage and analysisof data and information items rather than thegeneration and annotation.

ure xpressionHypothesis

Major types of Bioinformatics DataLiterature and ontologiesGene expressionGenomesProtein sequenceDNA & RNA sequenceProtein structureDNA & RNA structureChemical entitiesProtein families,motifs and domainsProtein interactionsPathwaysSystems

Bioinformatics Research areasInclude but not limited to Organization, classification, dissemination and analysis ofbiological and biomedical dataBiological sequence analysis and phylogenetics.Genome organization and evolutionRegulation of gene expression and epigineticsBiological pathways and network in healthy and disease statesProtein structure prediction from sequenceModelling and prediction of the biophysical properties ofbiomolecules for binding prediction and drug design Design of biomolecular structure and functionWith applications to Biology, Medicine, Agriculture and Industry

Where did bioinformatics come from?Bioinformatics arose as molecular biology begun to be transformedby the emergence of molecular sequence and structural data Recap: The key dogmas of molecular biology DNA sequence determined protein sequenceProtein sequence determines protein structureProtein structure determines protein functionRegulatory mechanisms (e.g. gene expression) determinesthe amount of a particular function in space and timeBioinformatics is now essential for the archiving, organizationand analysis of data related to these processes

Bioinformatics involves the application of computeralgorithms, computer models and computer databaseswith the broad goal of understanding the action of genes,transcripts, proteins and large collections in this entitiesThe integration of information learned about this threebiological processes gives insight Into the biology of organisms

How does it look like on a computer

A cDNA sequence (reading frame) gi 14456711 ref NM 000558.3 Homo sapiens hemoglobin, alpha 1 (HBA1), TGGTCTTTGAATAAAGTCTGAGTGGGCGGCA protein sequence gi 4504347 ref NP 000549.1 alpha 1 globin [Homo RVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

How do we actually do Bioinformatics?Prepackage tools and databasesvMany online and open sourcevSome are commercialTool developmentvMostly on UNIX environmentvKnowledge of programming requires(Python, Perl, R, C, Java)vMay require specialized or high performance computingresources

History of Bioinformatics

History of Bioinformatics

SequencingDNA sequencing is a process of determining theorder of nucleotides within a DNA molecule.

History of DNA sequencing 1976: Maxam – Gilbert sequencing 1977: Sanger sequencing (dideoxy chaintermination) 1986: Flourescently labelled ddNTPs 1987: Applied Biosystems (ABI 370) 1988: Capillary gell electrophoresis 1999: Applied Biosystems ABI 3700 DNA Analyzer 2005 : Next generation sequencing

Next Generation SequencingIllumina MiniSeqIllumina MiSeqIllumina NextSeqIon PGMPacBio RS IIPacBio SequelIon ProtonONT MinIONCTLGH Introduction to Bioinformatics, 13-17 Feb 2017, NairobiIllumina HiSeqIllumina NovaSeqIon S5ONT PromethIONIntro to NGS Sequencing TechnologiesONT SmidgIONBert Overduin14

Applications of Bioinformatics Microbial genomeapplications Molecular medicine Personalized medicine Preventive medicine Gene therapy Drug development Antibiotic resistance Evolutionary studies Biotechnology Climate change studies Crop improvementForensic analysisInsect resistanceImprove nutritional qualityDevelopment of droughtresistant varieties Veterinary science Bioengineering Agriculture biotechnology.

Limitations of Bioinformatics Bioinformatics is a science of inference hence: Quality of bioinformatics predictions depends on thequality of data and sophistication of algorithms. Sequence data may have errors which subsequentlyleads to errors in downstream analysis. Many exhaustive algorithms cannot be used due tocomputational limitations. Trade-off between specificity and sensitivity

Why bioinformatics then In most cases biologics /wet lab is needed to validate bioinformatic predictions Bioinformatics can:–Reduce data to a small set of testable predictions–Assign a degree of confidence to each prediction The biologist will often have to choose the appropriate degree of confidence,depending on:–Cost of validating predictions.–Benefit expected from the right predictions. Data mining - the process by which testable hypothesis are generated regardingthe function or structure of a gene or protein of interest by identifying homologsin better characterized organisms. Bioinformatics as in sillico biology:–Allows for exploration of domains that cannot be addressed manually e.gstudy of past evolutionary events / patterns.

The EndAcknowledging Bert Overduin University or EdinburghandEBI online courses for some slides

Thank youIMBB 2017RAB, Kigali - RwandaMay 02 – 13, 2017Joyce Nziokij.n.njuguna@cgiar.org

Limitations of Bioinformatics Bioinformatics is a science of inference hence: Quality of bioinformatics predictions depends on the quality of data and sophistication of algorithms. Sequence data may have errors which subsequently leads to errors in downstream an