Bioinformatics and its applications
Alla L Lapidus, Ph.D.
SPbAU, SPbSU, St. Petersburg
Term Bioinformatics
The term "bioinformatics" was coined by Paulien Hogeweg and Ben Hesper in 1970 as "the study of informatic processes in biotic systems".
Paulien Hogeweg is a Dutch theoretical biologist and complex systems researcher studying biological systems as dynamic information processing systems at many interconnected levels.
Definitions of what is Bioinformatics:
- Bioinformatics is the use of IT in biotechnology for data storage, data warehousing and analysis of DNA sequences. Bioinformatics requires knowledge of many branches, such as biology, mathematics, computer science, the laws of physics and chemistry, and of course sound knowledge of IT to analyze biotech data. Bioinformatics is not limited to computing data; in reality it can be used to solve many biological problems and find out how living things work.
- Bioinformatics is an interdisciplinary field that develops and improves upon methods for storing, retrieving, organizing and analyzing biological data. A major activity in bioinformatics is to develop software tools to generate useful biological knowledge.
- The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information.
- Bioinformatics develops algorithms and biological software to analyze and record data related to biology, for example data on genes, proteins, drug ingredients and metabolic pathways.
My additions:
1. Bioinformatics is a SCIENCE
2. Not only to develop algorithms, store, retrieve, organize and analyze biological data but to CURATE data
Bioinformatics is being used in the following fields:
- Microbial genome applications
- Molecular medicine
- Personalised medicine
- Preventative medicine
- Gene therapy
- Drug development
- Antibiotic resistance
- Evolutionary studies
- Waste cleanup
- Biotechnology
- Climate change studies
- Alternative energy sources
- Crop improvement
- Forensic analysis
- Bio-weapon creation
- Insect resistance
- Improve nutritional quality
- Development of drought resistant varieties
- Veterinary science
[Figure: data flow from sequencing to applications. Legible labels: Sequencing, QC, fastq, Data, Results, Interpretation, applications]
LIMS - Lab Information Management Software
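As a rough illustration of the QC step on fastq data, the sketch below parses FASTQ records from a string and keeps the reads whose mean base quality reaches a threshold. It assumes four-line records and Phred+33 quality encoding; the function names and the threshold of 20 are hypothetical choices for illustration, not something the slides specify.

```python
from typing import Iterator, List, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def mean_quality(qual: str) -> float:
    """Mean Phred score of a quality string (0.0 for an empty string)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - PHRED_OFFSET for ch in qual) / len(qual)


def reads_passing_qc(text: str, min_mean_quality: float = 20.0) -> List[str]:
    """Return the IDs of reads whose mean quality is at least the threshold."""
    return [
        read_id
        for read_id, _seq, qual in parse_fastq(text)
        if mean_quality(qual) >= min_mean_quality
    ]


# Hypothetical two-read example: 'I' encodes Phred 40, '!' encodes Phred 0.
EXAMPLE_FASTQ = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"
print(reads_passing_qc(EXAMPLE_FASTQ))
```

Real QC tools also look at per-position quality, adapter content and duplication; this sketch covers only the mean-quality filter.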
Microbial genome applications
- Genome assembly
- Re-sequencing
- Comparative analysis
- Evolutionary studies
- Antibiotic resistance
- Waste cleanup
- Biotechnology
Genome Assembly
- Genome assembly is a very complex computational problem due to the enormous amount of data to put together, among other reasons.
- Ideally an assembly program should produce one contig for every chromosome of the genome being sequenced. But because of the complex nature of genomes, the ideal conditions are never met, which leads to gaps in the assembled genome.
De Novo assembly - puzzle without the picture
Assembly Challenges
- Presence of repeats. Repeats are identical sequences that occur at different locations in the genome, often in varying lengths and in multiple copies. There are several types of repeats, such as tandem repeats and interspersed repeats. Reads originating from different copies of a repeat appear identical to the assembler, causing errors in the assembly.
- Contaminants in samples (e.g. from bacteria or human).
- PCR artefacts (e.g. chimeras and mutations).
- Sequencing errors, such as "homopolymer" errors, e.g. in a run of 2 or more of the same base.
- MIDs (multiplex indexes), primers/adapters still in the raw reads.
- Polyploid genomes
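To make two of these challenges concrete, the sketch below locates k-mers that occur at more than one position in a sequence (a stand-in for repeats) and lists homopolymer runs. The toy genome, the function names and the choice of k are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, List, Tuple


def repeated_kmers(genome: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def homopolymer_runs(seq: str, min_len: int = 2) -> List[Tuple[int, str, int]]:
    """Return (start, base, length) for each run of identical bases."""
    runs = []
    start = 0
    for base, group in groupby(seq):
        length = len(list(group))
        if length >= min_len:
            runs.append((start, base, length))
        start += length
    return runs


# Toy genome with the 6-base repeat GACCTG at two locations.
toy_genome = "ATGACCTGCAGACCTGTT"
print(repeated_kmers(toy_genome, 6))   # {'GACCTG': [2, 10]}
print(homopolymer_runs("ACGGGTAAC"))   # [(2, 'G', 3), (6, 'A', 2)]
```

A read that falls entirely inside `GACCTG` could come from either copy, which is the ambiguity the assembler faces.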
Assembly algorithms
Overlap-Layout-Consensus
- Find overlaps between all reads
[Figure labels: reads, Consensus]
Problems caused by new sequencing technologies:
- Hard to find overlaps between short reads
- Impossible to scale up
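The overlap step can be sketched as a naive all-against-all comparison, which also shows why it scales poorly: n reads require n(n-1) ordered comparisons. This is a minimal sketch under simplifying assumptions (exact suffix-prefix matches only, no sequencing errors, no reverse complements); the function names and the minimum overlap of 3 are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def overlap_length(a: str, b: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_overlap bases exists.
    """
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def all_overlaps(reads: List[str], min_overlap: int = 3) -> Dict[Tuple[str, str], int]:
    """Compare every ordered pair of reads: n * (n - 1) comparisons."""
    overlaps = {}
    for a, b in permutations(reads, 2):
        length = overlap_length(a, b, min_overlap)
        if length:
            overlaps[(a, b)] = length
    return overlaps


toy_reads = ["ACGTCG", "TCGTAA", "GTAACC"]
for (a, b), length in all_overlaps(toy_reads).items():
    print(f"{a} -> {b}: overlap {length}")
```

With short reads the overlaps are short too, so many of them arise by chance, and the number of pairs grows quadratically with the number of reads.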
De Bruijn graph
[Figure: De Bruijn graph of the sequence ACGTCGTA with k = 2; nodes AC, CG, GT, TA, TC]
ABySS, ALLPATHS-LG, EULER, IDBA, Velvet
Modified from Andrey Prjibelski
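The slide's example can be reproduced with a few lines of code. The sketch below assumes one common convention: nodes are k-mers, and an edge joins two k-mers that appear consecutively in a read (overlapping by k - 1 bases). Other formulations use (k - 1)-mers as nodes and k-mers as edges; the function name is hypothetical, and real assemblers add error correction and graph simplification on top.

```python
from typing import Dict, List, Set


def de_bruijn_graph(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Build a De Bruijn graph as an adjacency map from k-mer to successor k-mers."""
    graph: Dict[str, Set[str]] = {}
    for read in reads:
        for i in range(len(read) - k):
            left = read[i:i + k]           # k-mer starting at position i
            right = read[i + 1:i + k + 1]  # next k-mer, shifted by one base
            graph.setdefault(left, set()).add(right)
            graph.setdefault(right, set())  # make sure the target node exists
    return graph


graph = de_bruijn_graph(["ACGTCGTA"], k=2)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

For ACGTCGTA with k = 2 this yields the five nodes AC, CG, GT, TA and TC. The node GT has two successors (TC and TA) because GT occurs twice in the sequence, which is how repeats show up as branches in the graph.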
Single-cell dataset
[Figure: E. coli isolate dataset and E. coli single-cell dataset; assemblers shown: IDBA-UD, SPAdes, Velvet]
SPAdes pipeline
Input data -> Error correction -> Assembly -> Postprocessing -> Contigs
Gene Prediction and Genome Annotation
- Based on similarity to known genes: BLASTX (NCBI)
- Gene finding programs:
  - Glimmer: for most prokaryotic genomes
  - GeneMark: for both prokaryotic and eukaryotic genomes
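Gene finders such as Glimmer and GeneMark rely on statistical models of coding sequence. The sketch below is deliberately much simpler and is not their method: it scans the three forward reading frames for open reading frames that start with ATG and end at a stop codon. The function name, the forward-strand restriction and the minimum length are assumptions made for illustration.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int, str]]:
    """Return (start, end, orf) for forward-strand ORFs, end exclusive.

    An ORF runs from the first ATG in a frame to the next in-frame stop
    codon (stop included); min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 'ATGAAATTTTAG')]
```

A real annotation pipeline would also search the reverse strand, consider alternative start codons and score candidates against known genes.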
Re-sequencing
- Projects aimed at characterizing the genetic variations of species or populations
- Resequencing of bacterial and archaeal isolates etc. is possible if reference genomes are available
- This approach can help to better understand bacterial community structure, and gene function in bacteria under selective pressure or in mutagenized strains.
Climate change Studies
- Increasing levels of carbon dioxide emission are thought to contribute to global climate change.
- One way to decrease atmospheric carbon dioxide is to study the genomes of microbes that use carbon dioxide as their sole carbon source
Human microbiome
- MetaHIT - Europe
- Human Microbiome Project - US
The human microbiome includes viruses, fungi and bacteria, their genes and their environmental interactions, and is known to influence human physiology.
There's very broad variation in these bacteria in different people, and that severely limits our ability to create a "normal" microflora profile for comparison among healthy people and those with any kind of health issues.
Children with autism harbor significantly fewer types of gut bacteria than those who are not affected by the disorder, researchers have found.
Prevotella species were most dramatically reduced among samples from autistic children, especially P. copri (helps the breakdown of protein and carbohydrate foods).
Bioinformatics combining biology with computer science
- it can explore the causes of diseases at the molecular level
- explain the phenomena of the diseases on the gene/pathway level
- make use of computer techniques (data mining, machine learning etc.) to analyze and interpret data faster
- to enhance the accuracy of the results
Reduce the cost and time of drug discovery
To improve drug discovery we need to discover (read "develop") efficient bioinformatics algorithms and approaches for
- target identification
- target validation
- lead identification
- lead optimization
Advantages of detecting mutations with next-generation sequencing
- High throughput
  - Test many genes at once
- Systematic, unbiased mutation detection
  - All mutation types
  - Single nucleotide variants (SNV), copy number variation (CNV), insertions, deletions and translocations
- Digital readout of mutation frequency
  - Easier to detect and quantify mutations in a heterogeneous sample
- Cost effective precision medicine
  - "Right drug at right dose to the right patient at the right time"
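The "digital readout" idea can be illustrated by counting reads: at a given position, the fraction of reads carrying the non-reference base estimates how common the mutation is in the sample. The sketch below assumes a simplified pileup (one base per read at one position, no quality filtering); the function names are hypothetical.

```python
from typing import Tuple


def allele_counts(pileup_bases: str, ref_base: str) -> Tuple[int, int]:
    """Count reference and non-reference bases in one pileup column."""
    bases = pileup_bases.upper()
    ref_count = bases.count(ref_base.upper())
    return ref_count, len(bases) - ref_count


def variant_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a position that support the variant allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total


# Ten reads cover the position: eight match the reference A, two show G.
ref_n, alt_n = allele_counts("AAAAAAGAAG", "A")
print(ref_n, alt_n, variant_allele_frequency(ref_n, alt_n))  # 8 2 0.2
```

In a heterogeneous sample, such as a tumour mixed with normal tissue, the frequency can be well below the 0.5 or 1.0 expected for a germline variant, which is why read counts are useful for quantifying it.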
Homozygous SNPs and indel
Poor alignment
Missed SNP?
Bioinformatics and Health Informatics
If bioinformatics is the study of the flow of information in biological sciences, Health Informatics is the study of the information in patient care
Medicine: Informatics pipeline workflow
[Figure: workflow linking Patient, Physician, Order, Sample, Sequence and EHR through three tiers]
- Tier 1: Base Calling, Alignment, Variant Calling
- Tier 2: Genome Annotation, Medical Knowledgebase
- Tier 3: Clinical Report
Huge need for bioinformatics tools
- Simple pipelines/protocols and easy to read reports
Cancer
- Sample sequencing
- Data analysis
- Patient treatment
Team work to set up cancer [remainder of title garbled in extraction]
[Figure: team roles. Legible labels: molecular oncologists (pathway analysis), data analysts, IT infrastructure (storage, updates etc.); other labels unrecoverable]
http://www.youtube.com
Ion Torrent: Torrent Suite Software
Each baby to be sequenced at birth: personal reference
“GATTACA”, 1997
Funny De Bruijn
[Figure; caption text garbled in extraction]
Modified from Andrey Prjibelski
THANK YOU!