Molecular Biology Fundamentals

Transcription

Molecular Biology Fundamentals
Robert J. Robbins
Johns Hopkins University
rrobbins@gdb.org

File: N drive:\jhu\class\1995\mol-bio.ppt
1994, 1995 Robert Robbins

Origins of Molecular Biology

The phenotype of an organism denotes its external appearance (size, color, intelligence, etc.). Classical genetics showed that genes control the transmission of phenotype from one generation to the next. Biochemistry showed that within one generation, proteins had a determining effect on phenotype. For many years, however, the relationship between genes and proteins was a mystery. Then it was found that genes contain digitally encoded instructions that direct the synthesis of proteins.

The crucial insight of molecular biology is that hereditary information is passed between generations in a form that is truly, not metaphorically, digital. Understanding how that digital code directs the creation of life is the goal of molecular biology.

Classical Genetics

[Figure: inheritance diagram of a cross followed through the F2 generation]

Regular numerical patterns of inheritance showed that the passage of traits from one generation to the next could be explained with the assumption that hypothetical particles, or genes, were carried in pairs in adults but transmitted individually to progeny.

Classical Genetics

During the first half of this century, classical investigation of the gene established that theoretical objects called genes were the fundamental units of heredity. According to the classical model of the gene:

    Genes behave in inheritance as independent particles.
    Genes are carried in a linear arrangement in the chromosome, where they occupy stable positions.
    Genes recombine as discrete units.
    Genes can mutate to stable new forms.

Basically, genes seemed to be particulate objects, arranged on the chromosome like “beads on a string.”

    The genes are arranged in a manner similar to beads strung on a loose string.

    Sturtevant, A.H., and Beadle, G.W., 1939. An Introduction to Genetics. W. B. Saunders Company, Philadelphia, p. 94.

Classical Genetics

[Figure: beads P, Q, and R on a string whose numbered addresses run up to 123]

The beads can be conceptually separated from the string, which has “addresses” that are independent of the beads.

Mapping involves placing the beads in the correct order and assigning a correct address to each bead. The address assigned to a bead is its locus.

Classical Genetics

Recognizing that the beads have width, mapping could be extended to assigning a pair of numbers to each bead, so that a locus is defined as a region, not a point.

[Figure: beads STU, PQR, VWX, and YZ on a string with numbered addresses from 115 to 123]

In this model, genes are independent, mutually exclusive, nonoverlapping entities, each with its own absolute address.

Classical Genetics

[Figure: genes ABC, JKL, and YZ at map positions 106, 111.9, and 121.8]

In principle, maps of a few genes might be represented by showing the gene names in order, with their relative positions indicated.

Drosophila melanogaster

    Factor   Position   Trait
    B          0.0      yellow body
    C          1.0      white eye
    O          1.0      eosin eye
    P         30.7      vermilion eye
    R         33.7      rudimentary wing
    M         57.6      miniature wing

And, in fact, the first genetic map ever published was of just that type.

    Sturtevant, A.H., 1913. The linear arrangement of six sex-linked factors in Drosophila as shown by their mode of association. Journal of Experimental Zoology, 14:43-59.

    The aim of modern biology is to interpret the properties of the organism by the structure of its constituent molecules.

    Jacob, F. 1973. The Logic of Life. New York: Pantheon Books.

Understanding the molecular basis of life had its beginnings with the advent of biochemistry. Early in the nineteenth century, it was discovered that preparations of fibrous material could be obtained from cell extracts of plants and animals. Mulder concluded in 1838 that this material was:

    without doubt the most important of the known components of living matter, and it would appear that without it life would not be possible. This substance has been named protein.

Later, many wondered whether chemical processes in living systems obeyed the same laws as did chemistry elsewhere. Complex carbon-based compounds were readily synthesized in cells, but seemed impossible to construct in the laboratory.

By the beginning of the twentieth century, chemists had been able to synthesize a few organic compounds and, more importantly, to demonstrate that complex organic reactions could be accomplished in non-living cellular extracts. These reactions were found to be catalyzed by a class of proteins called enzymes.

Early biochemistry, then, was characterized by (1) efforts to understand the structure and chemistry of proteins themselves, and (2) efforts to discover, catalog, and understand enzymatically catalyzed biochemical reactions.

Genetic Fallacies

Before molecular biology began, biochemists believed that DNA was composed of a monotonous rotation of four basic components, the nucleotides adenine, cytosine, guanine, and thymine. Since a repeating polymer consisting of four subunits could not encode information, it was widely held that DNA played only a structural role in chromosomes and that genetic information was stored in protein.

    If the genes are conceived as chemical substances, only one class of compounds need be given to which they can be reckoned as belonging, and that is the proteins in the wider sense, on account of the inexhaustible possibilities for variation which they offer. ... Such being the case, the most likely role for the nucleic acids seems to be that of the structure-determining supporting substance.

    T. Caspersson. 1936. Über den chemischen Aufbau der Strukturen des Zellkernes. Acta Med. Skand., 73, Suppl. 8, 1-151.

At any given time in a particular science, there will be beliefs that are held so strongly that they are considered beyond challenge, yet they will prove to be wildly wrong. This poses a great challenge for the design of scientific databases, which must reflect current beliefs in the field, yet be robust in the face of changes in fundamental concepts or practices.

Molecular Biology

Key Discoveries:

    1928  Heritable changes can be transmitted from bacterium to bacterium through a chemical extract (the transforming factor) taken from other bacteria.
    1944  The transforming factor appears to be DNA.
    1950  The tetranucleotide hypothesis of DNA structure is overthrown.
    1953  The structure of DNA is established to be a double helix.

DNA is constructed as a double-stranded molecule, with absolutely no constraints upon the linear order of subcomponents along each strand, but with the pairing between strands totally constrained according to complementarity rules: A always pairs with T and C always pairs with G.
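The complementarity rules can be illustrated with a minimal Python sketch. The function names below are illustrative choices, not part of the original slides. The point is that one strand may hold any sequence at all, while the partner strand is then fully determined.

```python
# Pairing rules stated on the slide: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner strand, base by base, in the same left-to-right order."""
    return "".join(PAIRING[base] for base in strand.upper())


def is_paired(strand_a: str, strand_b: str) -> bool:
    """Check that two equal-length strands obey the pairing rules at every position."""
    return len(strand_a) == len(strand_b) and all(
        PAIRING[a] == b for a, b in zip(strand_a.upper(), strand_b.upper())
    )


# Any order of bases is allowed on one strand...
one_strand = "ATGGCGCCTATCGGA"
# ...but the other strand is then completely fixed.
other_strand = complement(one_strand)
print(one_strand)
print(other_strand)
```

Because each base has exactly one partner, applying `complement` twice returns the original strand. The sketch ignores strand direction, as the slide does.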

The Fundamental Dogma

[Figure: DNA directs the synthesis of tRNA, mRNA, and rRNA, which lead to protein]

Information coded in DNA (deoxyribonucleic acid) directs the synthesis of different RNA (ribonucleic acid) molecules. RNA molecules fall into several different categories:

    rRNA: ribosomal RNA that is required for building ribosomes, which are structures necessary for protein synthesis.
    tRNA: transfer RNA that serves to transfer individual amino acid molecules from the general cytoplasm to their appropriate location in a growing polypeptide during protein synthesis.
    mRNA: messenger RNA that carries the specific instructions for building a specific protein.

Both rRNA and tRNA are generic groups of molecules in that all types of rRNA and all types of tRNA are involved in the synthesis of every type of protein. However, mRNA is specific in that a different type of mRNA is required for every different type of protein.

The whole system is recursive, in that certain proteins are required for the synthesis of RNAs, as well as for the synthesis of DNA itself.

    DNA:           TAC CGC GGA TAG CCT
                   (transcription)
    mRNA:          AUG GCG CCU AUC GGA
                   (translation)
    Polypeptide:   met ala pro ile gly

DNA directs protein synthesis through a multi-step process. First, DNA is copied to mRNA through the process of transcription. The rules governing transcription are the same as the rules governing the interstrand constraint in DNA. Then translation produces a polypeptide with an amino-acid sequence that is completely specified by the sequence of nucleotides in the RNA. A simple code, the same for all living things on this planet, governs the synthesis of protein from mRNA instructions.

[Figure: gene 1 on DNA, flanked by P1 and T1, is transcribed into a primary transcript; post-transcriptional modification, translation, and post-translational modification yield a modified polypeptide, which self-assembles into the final protein]

Some post-transcriptional processing of the immediate RNA transcript is necessary to produce a finished RNA, and post-translational processing of polypeptides can be needed to produce a final protein.
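The worked example on this slide can be reproduced with a short Python sketch. It assumes the DNA shown is the strand being copied, and it includes only the five codons that appear in the example rather than the full code. The function names are illustrative.

```python
# Transcription follows the DNA pairing rules, with U taking the place of T in RNA.
DNA_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Only the five codons used in the slide's example (not the complete dictionary).
EXAMPLE_CODONS = {"AUG": "met", "GCG": "ala", "CCU": "pro", "AUC": "ile", "GGA": "gly"}


def transcribe(dna: str) -> str:
    """Copy a DNA strand into mRNA, base by base."""
    return "".join(DNA_TO_MRNA[base] for base in dna)


def translate(mrna: str) -> list[str]:
    """Read the mRNA three bases at a time and look up each codon."""
    usable = len(mrna) - len(mrna) % 3  # ignore any incomplete trailing codon
    return [EXAMPLE_CODONS[mrna[i:i + 3]] for i in range(0, usable, 3)]


dna = "TACCGCGGATAGCCT"       # TAC CGC GGA TAG CCT
mrna = transcribe(dna)        # AUG GCG CCU AUC GGA
polypeptide = translate(mrna)
print(mrna, polypeptide)
```

The sketch leaves out strand direction and the processing steps described on the slide; it shows only that each stage is a fixed lookup on the previous one.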

mRNA to Amino Acid Dictionary

    5'      U      C      A      G      3'

    U      phe    ser    tyr    cys     U
           phe    ser    tyr    cys     C
           leu    ser    stop   stop    A
           leu    ser    stop   trp     G

    C      leu    pro    his    arg     U
           leu    pro    his    arg     C
           leu    pro    gln    arg     A
           leu    pro    gln    arg     G

    A      ile    thr    asn    ser     U
           ile    thr    asn    ser     C
           ile    thr    lys    arg     A
           met    thr    lys    arg     G

    G      val    ala    asp    gly     U
           val    ala    asp    gly     C
           val    ala    glu    gly     A
           val    ala    glu    gly     G

This dictionary gives the sixty-four different mRNA codons and the amino acids (or stop signals) for which they code. The 5' nucleotides are given along the left-hand border, the middle nucleotides are given across the top, and the 3' nucleotides are given along the right-hand border. The decoded meaning of a particular codon is given by the entry in the table.

For example, the meaning of the codon 5'AUG3' is determined as follows:

    1. Examine the entries along the left-hand side of the table to locate the horizontal block corresponding to the sixteen codons that have A in the 5' position.
    2. Examine the entries along the top of the table to locate the vertical block corresponding to the sixteen codons that have U in the middle position.
    3. Find the intersection of these two blocks. This intersection represents the four codons that have A in the 5' position and U in the middle position.
    4. Examine the entries along the right-hand side of the table to find the entry for the one codon that has A in the 5' position, U in the middle position, and G in the 3' position. The “met” indicates that the decoded meaning of the codon 5'AUG3' is methionine. That is, the codon 5'AUG3' codes for the amino acid methionine.
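The four lookup steps can be mirrored in a small Python sketch. The nested layout below is an illustrative choice that follows the table's left, top, and right borders; the entries are the standard genetic code, written with the three-letter abbreviations used on the slide.

```python
BASES = "UCAG"  # order of the 3' nucleotides down the right-hand border

# CODE[first][middle] lists the four entries for 3' bases U, C, A, G in that order.
CODE = {
    "U": {"U": ["phe", "phe", "leu", "leu"], "C": ["ser"] * 4,
          "A": ["tyr", "tyr", "stop", "stop"], "G": ["cys", "cys", "stop", "trp"]},
    "C": {"U": ["leu"] * 4, "C": ["pro"] * 4,
          "A": ["his", "his", "gln", "gln"], "G": ["arg"] * 4},
    "A": {"U": ["ile", "ile", "ile", "met"], "C": ["thr"] * 4,
          "A": ["asn", "asn", "lys", "lys"], "G": ["ser", "ser", "arg", "arg"]},
    "G": {"U": ["val"] * 4, "C": ["ala"] * 4,
          "A": ["asp", "asp", "glu", "glu"], "G": ["gly"] * 4},
}


def decode(codon: str) -> str:
    """Decode one mRNA codon, written 5' to 3', by following the four steps."""
    first, middle, third = codon
    horizontal_block = CODE[first]             # step 1: sixteen codons sharing the 5' base
    four_codons = horizontal_block[middle]     # steps 2 and 3: intersect with the middle base
    return four_codons[BASES.index(third)]     # step 4: pick the entry for the 3' base


all_codons = [a + b + c for a in BASES for b in BASES for c in BASES]
stop_codons = [codon for codon in all_codons if decode(codon) == "stop"]
print(decode("AUG"), stop_codons)
```

Storing the code as nested blocks keeps the program's structure identical to the printed table; a flat 64-entry dictionary keyed by codon would work equally well.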

What is a Gene?

Neoclassical Sequence Definitions

    Gene (cistron): the fundamental unit of genetic function.
    Gene (muton): the fundamental unit of genetic mutation.
    Gene (recon): the fundamental unit of genetic recombination.
    Gene (codon): the fundamental unit of genetic coding.

Summary Definitions

    Classical Definition: fundamental unit of heredity, mutation, and recombination (beads on a string).
    Physiological Definition: fundamental unit of function (one gene, one enzyme).
    Cistronic Definition: fundamental unit of expression (cis-trans test).
    Sequence Definition: the smallest segment of the gene string consistently associated with the occurrence of a specific genetic effect.
    Current Definition: ?

What is a Gene?

Current Textbook Definitions

    The unexpected features of eukaryotic genes have stimulated discussion about how a gene, a single unit of hereditary information, should be defined. Several different possible definitions are plausible, but no single one is entirely satisfactory or appropriate for every gene.

    Singer, M., and Berg, P. 1991. Genes & Genomes. University Science Books, Mill Valley, California.

    Gene (cistron) is the segment of DNA involved in producing a polypeptide chain; it includes regions preceding and following the coding region (leader and trailer) as well as intervening sequences (introns) between individual coding segments (exons).

    Allele is one of several alternative forms of a gene occupying a given locus on a chromosome.

    Locus is the position on a chromosome at which the gene for a particular trait resides; locus may be occupied by any one of the alleles for the gene.

    Lewin, Benjamin. 1990. Genes IV. Oxford University Press, New York.

What is a Gene?

Current Textbook Definitions

    DNA molecules (chromosomes) should thus be functionally regarded as linear collections of discrete transcriptional units, each designed for the synthesis of a specific RNA molecule. Whether such “transcriptional units” should now be redefined as genes, or whether the term gene should be restricted to the smaller segments that directly code for individual mature rRNA or tRNA molecules or for individual peptide chains is now an open question.

    Watson, J. D., Hopkins, N. H., Roberts, J. W., Steitz, J. A., and Weiner, A. M. 1992. Molecular Biology of the Gene. Benjamin/Cummings Publishing Company: Menlo Park, California. p. 233.

    For the purposes of this book, we have adopted a molecular definition. A eukaryotic gene is a combination of DNA segments that together constitute an expressible unit, expression leading to the formation of one or more specific functional gene products that may be either RNA molecules or polypeptides.

    Singer, M., and Berg, P. 1991. Genes & Genomes. University Science Books, Mill Valley, California.

The Simplistic View of a Gene as Sequence

[Figure: a coding region flanked by P and T]

A gene is a transcribed region of DNA, flanked by upstream start regulatory sequences and downstream stop regulatory sequences.

[Figure: the same gene drawn against a scale of 100 to 104 kilobases, spanning 100.44 to 104.01]

The location of a gene can be designated by specifying the base-pair location of its beginning and end.

The Simplistic View of a Gene as Sequence

[Figure: two genes, gene1 with P1, T1, and its coding region, and gene2 with P2, T2, and its coding region]

DNA may be transcribed in either direction. Therefore, fully specifying a gene’s position requires noting its orientation as well as its start and stop positions.

[Figure: the same two genes drawn against a scale of base-pair offsets that includes 9,373,940 and 9,373,945]

A naive view holds that a genome can be represented as a continuous linear string of nucleotides, with landmarks identified by the chromosome number followed by the offset number of the nucleotide at the beginning and end of the region of interest. This simplistic approach ignores the fact that chromosomes may vary in length by tens of millions of nucleotides.
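The naive addressing scheme described above can be written down as a small record type. This is a sketch of the simplistic view only, not a recommended database design. The class name, the field names, the "+" and "-" strand labels, the chromosome label, and the gene2 coordinates are all assumptions made for illustration; the gene1 coordinates reuse the 100.44 and 104.01 kilobase positions from the preceding slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneLocation:
    """A gene as (chromosome, start offset, end offset, orientation)."""
    chromosome: str
    start: int   # lower base-pair offset
    end: int     # higher base-pair offset
    strand: str  # "+" or "-": assumed labels for the two transcription directions

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not exceed end")
        if self.strand not in ("+", "-"):
            raise ValueError("strand must be '+' or '-'")

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    @property
    def transcription_start(self) -> int:
        # Orientation decides which end of the interval transcription begins at.
        return self.start if self.strand == "+" else self.end

    def overlaps(self, other: "GeneLocation") -> bool:
        return (self.chromosome == other.chromosome
                and self.start <= other.end and other.start <= self.end)


gene1 = GeneLocation("chrN", 100_440, 104_010, "+")  # "chrN" is a placeholder label
gene2 = GeneLocation("chrN", 105_000, 107_500, "-")  # hypothetical coordinates
print(gene1.length, gene1.transcription_start, gene2.transcription_start)
```

Two genes with identical start and end offsets but opposite strands are distinct records here, which is why orientation has to be stored. The record still inherits the weakness named on the slide: absolute offsets are only meaningful if every copy of the chromosome has the same length.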

The Human Genome Project

[Figure: male and female chromosome sets]

At conception, a normal human receives 23 chromosomes from each parent: 22 autosomes and one sex chromosome. The mother always contributes 22 autosomes and one X chromosome. If the father also contributes an X chromosome, the child will be female. If the father contributes a Y chromosome, the child will be male.

The human genome is believed to consist of 50,000 to 100,000 genes encoded in 3.3 billion base pairs of DNA, which are packaged into 23 chromosomes. The goal of the Human Genome Project (HGP) is to learn the specific order of those 3.3 billion base pairs and to identify and locate all of the genes encoded by that DNA. Databases must be developed to hold, manage, and distribute all of those findings.

The HGP can be logically divided into two components: (1) obtaining the sequence, and (2) understanding the sequence. Neither of them involves a simple 3.3 gigabyte database with straightforward computational requirements.

The Challenge: Consider the DNA sequence of a human genome as equivalent to 3.3 gigabytes of files on the mass-storage device of some computer system of unknown design. Obtaining the sequence is equivalent to obtaining an image of the contents of that mass-storage device. Understanding the sequence is equivalent to reverse engineering that unknown computer system (both the hardware and the 3.3 gigabytes of software) all the way back to a full set of design and maintenance specifications.

Getting the Sequence

Obtaining one full human sequence will be a technical challenge. If the DNA sequence from a single human sperm cell were typed on a continuous ribbon in ten-pitch type, that ribbon could be stretched from San Francisco to Chicago to Washington to Houston to Los Angeles, and back to San Francisco, with about 60 miles of ribbon left over.

The amount of human DNA sequenced so far is equal to less than one third of that left-over 60-mile fragment. We have a long way to go, and getting there will be expensive. Computers will play a crucial role in the entire process, from robotics that control experimental equipment to complex analytical methods for assembling sequence fragments.

    Year   Cost per base        Budget   Bases per year   Cumulative bases   Percent completed
    1995            0.50    16,000,000       10,774,411         10,774,411               0.33%
    1996            0.40    25,000,000       21,043,771         31,818,182               0.96%
    1997            0.30    35,000,000       39,281,706         71,099,888               2.15%
    1998            0.20    50,000,000       84,175,084        155,274,972               4.71%
    1999            0.15    75,000,000      168,350,168        323,625,140               9.81%
    2000            0.10   100,000,000      336,700,337        660,325,477              20.01%
    2001            0.05   100,000,000      673,400,673      1,333,726,150              40.42%
    2002            0.05   100,000,000      673,400,673      2,007,126,824              60.82%
    2003            0.05   100,000,000      673,400,673      2,680,527,497              81.23%
    2004
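The cumulative and percent-completed columns follow from the per-year figures by simple arithmetic, sketched below in Python. The sketch assumes the 3.3 billion base-pair genome size quoted elsewhere in these slides; it does not attempt to derive the per-year figures from cost and budget, since the slide does not state that formula.

```python
from itertools import accumulate

GENOME_SIZE = 3_300_000_000  # base pairs, as quoted in the slides

# Bases sequenced in each year, copied from the table.
bases_per_year = {
    1995: 10_774_411, 1996: 21_043_771, 1997: 39_281_706,
    1998: 84_175_084, 1999: 168_350_168, 2000: 336_700_337,
    2001: 673_400_673, 2002: 673_400_673, 2003: 673_400_673,
}

# Running total of bases, then that total as a percentage of the genome.
cumulative = dict(zip(bases_per_year, accumulate(bases_per_year.values())))
percent_completed = {
    year: round(100 * total / GENOME_SIZE, 2) for year, total in cumulative.items()
}

for year in bases_per_year:
    print(year, f"{cumulative[year]:,}", f"{percent_completed[year]:.2f}%")
```

The running totals for the last two years come out one base lower than the table's entries, presumably because the original per-year figures were rounded; the percentages agree to two decimal places.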

Defective Genes Cause Disease

    Location      Gene    Trait or disease
    1p36.2-p34    RH      Rh blood type
    17q22-q24     GH1     Pituitary dwarfism
    11p15.5       HBB     Sickle-cell anemia
    17q12-q24     BRCA1   Breast cancer (early onset)
    Xq28          F8      Hemophilia

Many human diseases are known to be associated with specific defects in particular genes. These defects are equivalent to coding errors in files on a mass storage system.

A defective copy of the gene for beta-hemoglobin (HBB) can lead to sickle-cell anemia.

Beta-Hemoglobin

[Sequence: the genomic sequence of the beta-hemoglobin gene]

The genomic sequence for the beta-hemoglobin gene is given above. The letters in bold are the exons that are spliced together after initial transcription. The upper case letters are the actual coding region that specifies the amino-acid sequence for beta-hemoglobin. The coding region is excerpted and given below.

[Sequence: the excerpted coding region]

Beta-Hemoglobin

[Sequence: the genomic sequence of the beta-hemoglobin gene, with the coding region in upper case. The annotated start of the coding region reads: ...tcaaacagac accATGGTGC ACCTGACTCC TGAGGAGAAG...]

...just one nucleotide out of 3,000,000,000 is enough to produce a lethal gene, as one incorrect bit can crash an operating system.

A change in this nucleotide from an A to T causes glutamic acid to be replaced with valine. This produces the ...
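The effect of a single-nucleotide change can be demonstrated on the opening of the coding region as it appears in the sequence (ATGGTGC ACCTGACTCC TGAGGAGAAG). The Python sketch below makes two assumptions: the codon lookup is written with DNA letters (T in place of U) and covers only the codons in this excerpt, and the altered position is the middle A of the first GAG codon, which is the commonly described sickle-cell substitution. The slide's callout does not survive well enough to confirm the exact position.

```python
# Codons present in the excerpt, written as DNA (T where the mRNA would have U).
EXCERPT_CODONS = {
    "ATG": "met", "GTG": "val", "CAC": "his", "CTG": "leu",
    "ACT": "thr", "CCT": "pro", "GAG": "glu", "AAG": "lys",
}


def translate(coding_dna: str) -> list[str]:
    """Translate a coding sequence three bases at a time."""
    return [EXCERPT_CODONS[coding_dna[i:i + 3]] for i in range(0, len(coding_dna), 3)]


def substitute(sequence: str, index: int, new_base: str) -> str:
    """Return a copy of the sequence with one base replaced (0-based index)."""
    return sequence[:index] + new_base + sequence[index + 1:]


normal = "ATGGTGCACCTGACTCCTGAGGAGAAG"
mutation_index = 19  # assumed position: middle base of the first GAG codon
mutant = substitute(normal, mutation_index, "T")

normal_peptide = translate(normal)
mutant_peptide = translate(mutant)
changed = [i for i, (a, b) in enumerate(zip(normal_peptide, mutant_peptide)) if a != b]
print(normal_peptide)
print(mutant_peptide)
```

The two sequences differ at one base out of 27, and the two peptides differ at exactly one residue, glutamic acid replaced by valine, which is the substitution the slide describes.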

Genomic Fallacies

Molecular Genetics:

    The ultimate ... map [will be] the complete DNA sequence of the human genome.

    Committee on Mapping and Sequencing the Human Genome, 1988. Mapping and Sequencing the Human Genome. National Academy Press, Washington, D.C., p. 6.

The Ultimate Feature Table:

    As the Genome Project progresses, mapping and sequencing will converge. With the full human sequence available, it will be possible unambiguously to define every gene by the base-pair address of its functional subunits.

Genome Project as Database

When the Human Genome Project is finished, many of the innovative laboratory methods involved in its successful conclusion will begin to fade from memory. What will remain, as the project's enduring contribution, is a vast amount of computerized knowledge. Seen in this light, the Human Genome Project is nothing but the effort to create the most important database ever attempted: the database containing instructions for creating life.
