Bioinformatics And Its Applications


Bioinformatics and its applications
Alla L Lapidus, Ph.D.
SPbAU, SPbSU, St. Petersburg

Term Bioinformatics
The term "bioinformatics" was coined by Paulien Hogeweg and Ben Hesper in 1970 as "the study of informatic processes in biotic systems".
Paulien Hogeweg is a Dutch theoretical biologist and complex systems researcher studying biological systems as dynamic information processing systems at many interconnected levels.

Definitions of what is Bioinformatics:
- Bioinformatics is the use of IT in biotechnology for data storage, data warehousing and analysis of DNA sequences. Bioinformatics requires knowledge of many branches, such as biology, mathematics, computer science, the laws of physics and chemistry, and of course a sound knowledge of IT to analyze biotech data. Bioinformatics is not limited to computing data; in reality it can be used to solve many biological problems and find out how living things work.
- Bioinformatics is an interdisciplinary field that develops and improves upon methods for storing, retrieving, organizing and analyzing biological data. A major activity in bioinformatics is to develop software tools to generate useful biological knowledge.
- The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information.
- Bioinformatics develops algorithms and biological software to analyze and record data related to biology, for example data on genes, proteins, drug ingredients and metabolic pathways.
My additions:
1. Bioinformatics is a SCIENCE
2. Not only to develop algorithms, store, retrieve, organize and analyze biological data, but to CURATE data

Bioinformatics is being used in the following fields:
- Microbial genome applications
- Molecular medicine
- Personalised medicine
- Preventative medicine
- Gene therapy
- Drug development
- Antibiotic resistance
- Evolutionary studies
- Waste cleanup
- Biotechnology
- Climate change studies
- Alternative energy sources
- Crop improvement
- Forensic analysis
- Bio-weapon creation
- Insect resistance
- Improve nutritional quality
- Development of drought resistant varieties
- Veterinary science

[Figure: sequencing workflow with the labels Sequencing, QC, fastq, Data, applications, Results, Interpretation]
LIMS - Lab Information Management Software

Microbial genome applications
- Genome assembly
- Re-sequencing
- Comparative analysis
- Evolutionary studies
- Antibiotic resistance
- Waste cleanup
- Biotechnology

Genome Assembly
Genome assembly is a very complex computational problem because of the enormous amount of data to put together, among other reasons.
Ideally, an assembly program should produce one contig for every chromosome of the genome being sequenced. But because of the complex nature of genomes, ideal conditions are never met, which leads to gaps in the assembled genome.

De Novo assembly - puzzle without the picture

Assembly Challenges
- Presence of repeats. Repeats are identical sequences that occur at different locations in the genome, often in varying lengths and in multiple copies. There are several types of repeats, such as tandem repeats and interspersed repeats. Reads originating from different copies of a repeat appear identical to the assembler, causing errors in the assembly.
- Contaminants in samples (e.g. from bacteria or human).
- PCR artefacts (e.g. chimeras and mutations).
- Sequencing errors, such as "homopolymer" errors, e.g. in runs of 2 or more of the same base.
- MIDs (multiplex indexes), primers/adapters still in the raw reads.
- Polyploid genomes.
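As a small illustration of the homopolymer item above, the following sketch locates runs of identical bases in a read, which are the places where homopolymer errors tend to occur. The function name `homopolymer_runs` and the `min_len` threshold are hypothetical and chosen only for this example; this is not taken from any particular sequencing toolkit.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=2):
    """Return (start, base, length) for every run of identical bases
    that is at least min_len long. Positions are 0-based."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


if __name__ == "__main__":
    # Illustrative read, not real data.
    print(homopolymer_runs("ACGGGTTA"))
```

Raising `min_len` restricts the report to longer runs, where length miscalls are more likely to matter.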

Assembly algorithms
Overlap-Layout-Consensus
- Find overlaps between all reads
[Figure: reads and consensus]
Problems caused by new sequencing technologies:
- Hard to find overlaps between short reads
- Impossible to scale up
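The overlap step can be sketched as a naive all-against-all comparison. This is a minimal illustration and not how production assemblers are implemented; the function names and the `min_len` parameter are assumptions made for the example. Every read is compared with every other read, so the work grows quadratically with the number of reads, which hints at why this approach is hard to scale to large sets of short reads.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.
    Returns 0 if no overlap of at least min_len exists."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def all_overlaps(reads, min_len=3):
    """Compare every ordered pair of reads (quadratic in the number of reads).
    Returns {(i, j): overlap_length} for pairs of read indices that overlap."""
    result = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                n = overlap(a, b, min_len)
                if n > 0:
                    result[(i, j)] = n
    return result


if __name__ == "__main__":
    # Illustrative reads, not real data.
    print(all_overlaps(["ACGTC", "GTCGT", "CGTA"]))
```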

De Bruijn graph
[Figure: de Bruijn graph of the sequence ACGTCGTA for k = 2, with nodes AC, CG, GT, TA, TC]
Assemblers: ABySS, ALLPATHS-LG, EULER, IDBA, Velvet
Modified from Andrey Prjibelski
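The example on this slide (sequence ACGTCGTA, k = 2) can be reproduced with a short sketch. It assumes the convention suggested by the slide, in which the nodes are the k-mers and an edge joins two k-mers that are adjacent in a read; some tools instead use (k-1)-mers as nodes. The function name `de_bruijn` is hypothetical, and real assemblers add error handling and graph simplification that are omitted here.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph whose nodes are k-mers.
    An edge left -> right is added when right follows left in a read
    (the two k-mers overlap by k - 1 characters)."""
    graph = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            graph[kmer]  # touching the key registers the node, even with no outgoing edge
        for left, right in zip(kmers, kmers[1:]):
            graph[left].add(right)
    return {node: sorted(successors) for node, successors in graph.items()}


if __name__ == "__main__":
    graph = de_bruijn(["ACGTCGTA"], 2)
    for node in sorted(graph):
        print(node, "->", graph[node])
```

The node GT has two successors (TA and TC) because the repeated substring CGT is collapsed into a single path, which is how repeats show up as branches in the graph.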

Single-cell dataset
[Figure: E. coli isolate dataset and E. coli single-cell dataset; labels: IDBA-UD, SPAdes, Velvet]

SPAdes pipeline
Input data -> Error correction -> Assembly -> Postprocessing -> Contigs


Gene Prediction and Genome Annotation
- Based on similarity to known genes: blastX (NCBI)
- Gene finding programs
  - Glimmer: for most prokaryotic genomes
  - GeneMark: for both prokaryotic and eukaryotic genomes
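To give a feel for what a gene finder starts from, here is a deliberately naive open reading frame (ORF) scan. It is not the method used by Glimmer or GeneMark, which rely on statistical models; it only looks for an ATG followed in the same frame by a stop codon on the forward strand. The function name `find_orfs` and the `min_codons` threshold are assumptions made for this sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs of forward-strand ORFs.
    An ORF runs from the first ATG in a frame to the next in-frame stop codon;
    end is exclusive and min_codons counts the start and stop codons."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    # Illustrative sequence, not real data.
    seq = "CCATGAAATAGGG"
    for start, end in find_orfs(seq):
        print(start, end, seq[start:end])
```

A real annotation pipeline would also scan the reverse strand and would score candidates rather than accept every ORF.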

Re-sequencing
- Projects aimed at characterizing the genetic variations of species or populations.
- Resequencing of bacterial and archaeal isolates etc. is possible if reference genomes are available.
- This approach can help to better understand bacterial community structure and gene function in bacteria under selective pressure or in mutagenized strains.

Climate change studies
Increasing levels of carbon dioxide emission are thought to contribute to global climate change.
One way to work toward decreasing atmospheric carbon dioxide is to study the genomes of microbes that use carbon dioxide as their sole carbon source.

Human microbiome
- MetaHIT - Europe
- Human Microbiome Project - US
The human microbiome includes viruses, fungi and bacteria, their genes and their environmental interactions, and is known to influence human physiology.
There is very broad variation in these bacteria between different people, which severely limits our ability to define a "normal" microflora profile for comparing healthy people with those who have any kind of health issue.
Children with autism harbor significantly fewer types of gut bacteria than those who are not affected by the disorder, researchers have found. Prevotella species were the most dramatically reduced among samples from autistic children, especially P. copri (which helps break down protein and carbohydrate foods).

Bioinformatics: combining biology with computer science
- it can explore the causes of diseases at the molecular level
- explain the phenomena of the diseases on the gene/pathway level
- make use of computer techniques (data mining, machine learning etc.) to analyze and interpret data faster
- enhance the accuracy of the results
Reduce the cost and time of drug discovery

To improve drug discovery we need to discover (read "develop") efficient bioinformatics algorithms and approaches for
- target identification
- target validation
- lead identification
- lead optimization

Advantages of detecting mutations with next-generation sequencing
- High throughput
  - Test many genes at once
- Systematic, unbiased mutation detection
  - All mutation types
  - Single nucleotide variants (SNV), copy number variations (CNV), insertions, deletions and translocations
- Digital readout of mutation frequency
  - Easier to detect and quantify mutations in a heterogeneous sample
- Cost effective precision medicine
  - "Right drug at right dose to the right patient at the right time"
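The "digital readout of mutation frequency" can be illustrated with a few lines of Python: at a given position, the frequency of each allele is simply the fraction of covering reads that carry it. This is a minimal sketch; the function name `allele_frequencies` and the plain string of base calls used as input are assumptions for the example, not the format of any specific variant caller.

```python
from collections import Counter


def allele_frequencies(pileup_bases):
    """Return {base: fraction of reads} for the base calls covering one position.
    pileup_bases is a string with one base call per read, e.g. "AAAG"."""
    counts = Counter(pileup_bases.upper())
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover this position")
    return {base: n / depth for base, n in counts.items()}


if __name__ == "__main__":
    # Illustrative pileup: 8 reads support A and 2 reads support G.
    print(allele_frequencies("AAAAAAAAGG"))
```

In a heterogeneous sample, a minor allele seen in a small fraction of reads (here G) can point to a mutation present in only part of the cells, provided the depth is high enough to separate it from sequencing errors.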

Homozygous SNPs and indel

Poor alignment

Missed SNP?

Bioinformatics and Health Informatics
If bioinformatics is the study of the flow of information in biological sciences, Health Informatics is the study of the information in patient care.

Medicine: Informatics pipeline workflow
[Figure: Patient, Physician, Order, Sample, Sequence, EHR]
- Tier 1: Base Calling, Alignment, Variant Calling
- Tier 2: Genome Annotation, Medical Knowledgebase
- Tier 3: Clinical Report

Huge need for bioinformatics tools
Simple pipelines/protocols and easy to read reports
Cancer: Sample sequencing -> Data analysis -> Patient treatment

Team work to set up cancer ...
[Figure: team roles; legible labels: molecular biologist, molecular oncologists (pathway analysis), data analysts, IT infrastructure (storage, updates etc.)]

http://www.youtube.com

Ion Torrent: Torrent Suite Software

[Figure: HOMEZ]

Each baby to be sequenced at birth: personal reference
"GATTACA", 1997

Funny De Bruijn
Modified from Andrey Prjibelski

THANK YOU!
