Biopython Tutorial And Cookbook

Jeff Chang, Brad Chapman, Iddo Friedberg, Thomas Hamelryck, Michiel de Hoon, Peter Cock, Tiago Antao, Eric Talevich, Bartek Wilczyński

Last Update – June 2, 2021 (Biopython 1.79)

Contents

1 Introduction
    1.1 What is Biopython?
    1.2 What can I find in the Biopython package
    1.3 Installing Biopython
    1.4 Frequently Asked Questions (FAQ)

2 Quick Start – What can you do with Biopython?
    2.1 General overview of what Biopython provides
    2.2 Working with sequences
    2.3 A usage example
    2.4 Parsing sequence file formats
        2.4.1 Simple FASTA parsing example
        2.4.2 Simple GenBank parsing example
        2.4.3 I love parsing – please don’t stop talking about it!
    2.5 Connecting with biological databases
    2.6 What to do next

3 Sequence objects
    3.1 Sequences act like strings
    3.2 Slicing a sequence
    3.3 Turning Seq objects into strings
    3.4 Concatenating or adding sequences
    3.5 Changing case
    3.6 Nucleotide sequences and (reverse) complements
    3.7 Transcription
    3.8 Translation
    3.9 Translation Tables
    3.10 Comparing Seq objects
    3.11 Sequences with unknown sequence contents
    3.12 MutableSeq objects
    3.13 UnknownSeq objects
    3.14 Working with strings directly

4 Sequence annotation objects
    4.1 The SeqRecord object
    4.2 Creating a SeqRecord
        4.2.1 SeqRecord objects from scratch
        4.2.2 SeqRecord objects from FASTA files
        4.2.3 SeqRecord objects from GenBank files
    4.3 Feature, location and position objects
        4.3.1 SeqFeature objects
        4.3.2 Positions and locations
        4.3.3 Sequence described by a feature or location
    4.4 Comparison
    4.5 References
    4.6 The format method
    4.7 Slicing a SeqRecord
    4.8 Adding SeqRecord objects
    4.9 Reverse-complementing SeqRecord objects

5 Sequence Input/Output
    5.1 Parsing or Reading Sequences
        5.1.1 Reading Sequence Files
        5.1.2 Iterating over the records in a sequence file
        5.1.3 Getting a list of the records in a sequence file
        5.1.4 Extracting data
        5.1.5 Modifying data
    5.2 Parsing sequences from compressed files
    5.3 Parsing sequences from the net
        5.3.1 Parsing GenBank records from the net
        5.3.2 Parsing SwissProt sequences from the net
    5.4 Sequence files as Dictionaries
        5.4.1 Sequence files as Dictionaries – In memory
        5.4.2 Sequence files as Dictionaries – Indexed files
        5.4.3 Sequence files as Dictionaries – Database indexed files
        5.4.4 Indexing compressed files
        5.4.5 Discussion
    5.5 Writing Sequence Files
        5.5.1 Round trips
        5.5.2 Converting between sequence file formats
        5.5.3 Converting a file of sequences to their reverse complements
        5.5.4 Getting your SeqRecord objects as formatted strings
    5.6 Low level FASTA and FASTQ parsers

6 Multiple Sequence Alignment objects
    6.1 Parsing or Reading Sequence Alignments
        6.1.1 Single Alignments
        6.1.2 Multiple Alignments
        6.1.3 Ambiguous Alignments
    6.2 Writing Alignments
        6.2.1 Converting between sequence alignment file formats
        6.2.2 Getting your alignment objects as formatted strings
    6.3 Manipulating Alignments
        6.3.1 Slicing alignments
        6.3.2 Alignments as arrays
    6.4 Getting information on the alignment
        6.4.1 Substitutions
    6.5 Alignment Tools
        6.5.1 ClustalW
        6.5.2 MUSCLE
        6.5.3 MUSCLE using stdout
        6.5.4 MUSCLE using stdin and stdout
        6.5.5 EMBOSS needle and water
    6.6 Pairwise sequence alignment
        6.6.1 pairwise2
        6.6.2 PairwiseAligner
    6.7 Substitution matrices

7 BLAST
    7.1 Running BLAST over the Internet
    7.2 Running BLAST locally
        7.2.1 Introduction
        7.2.2 Standalone NCBI BLAST+
        7.2.3 Other versions of BLAST
    7.3 Parsing BLAST output
    7.4 The BLAST record class
    7.5 Dealing with PSI-BLAST
    7.6 Dealing with RPS-BLAST

8 BLAST and other sequence search tools
    8.1 The SearchIO object model
        8.1.1 QueryResult
        8.1.2 Hit
        8.1.3 HSP
        8.1.4 HSPFragment
    8.2 A note about standards and conventions
    8.3 Reading search output files
    8.4 Dealing with large search output files with indexing
    8.5 Writing and converting search output files

9 Accessing NCBI’s Entrez databases
    9.1 Entrez Guidelines
    9.2 EInfo: Obtaining information about the Entrez databases
    9.3 ESearch: Searching the Entrez databases
    9.4 EPost: Uploading a list of identifiers
    9.5 ESummary: Retrieving summaries from primary IDs
    9.6 EFetch: Downloading full records from Entrez
    9.7 ELink: Searching for related items in NCBI Entrez
    9.8 EGQuery: Global Query - counts for search terms
    9.9 ESpell: Obtaining spelling suggestions
    9.10 Parsing huge Entrez XML files
    9.11 HTML escape characters
    9.12 Handling errors
    9.13 Specialized parsers
        9.13.1 Parsing Medline records
        9.13.2 Parsing GEO records
        9.13.3 Parsing UniGene records
    9.14 Using a proxy
    9.15 Examples
        9.15.1 PubMed and Medline
        9.15.2 Searching, downloading, and parsing Entrez Nucleotide records
        9.15.3 Searching, downloading, and parsing GenBank records
        9.15.4 Finding the lineage of an organism
    9.16 Using the history and WebEnv
        9.16.1 Searching for and downloading sequences using the history
        9.16.2 Searching for and downloading abstracts using the history
        9.16.3 Searching for citations

10 Swiss-Prot and ExPASy
    10.1 Parsing Swiss-Prot files
        10.1.1 Parsing Swiss-Prot records
        10.1.2 Parsing the Swiss-Prot keyword and category list
    10.2 Parsing Prosite records
    10.3 Parsing Prosite documentation records
    10.4 Parsing Enzyme records
    10.5 Accessing the ExPASy server
        10.5.1 Retrieving a Swiss-Prot record
        10.5.2 Searching Swiss-Prot
        10.5.3 Retrieving Prosite and Prosite documentation records
    10.6 Scanning the Prosite database

11 Going 3D: The PDB module
    11.1 Reading and writing crystal structure files
        11.1.1 Reading an mmCIF file
        11.1.2 Reading files in the MMTF format
        11.1.3 Reading a PDB file
        11.1.4 Reading a PQR file
        11.1.5 Reading files in the PDB XML format
        11.1.6 Writing mmCIF files
        11.1.7 Writing PDB files
        11.1.8 Writing PQR files
        11.1.9 Writing MMTF files
    11.2 Structure representation
        11.2.1 Structure
        11.2.2 Model
        11.2.3 Chain
        11.2.4 Residue
        11.2.5 Atom
        11.2.6 Extracting a specific Atom/Residue/Chain/Model from a Structure
    11.3 Disorder
        11.3.1 General approach
        11.3.2 Disordered atoms
        11.3.3 Disordered residues
    11.4 Hetero residues
        11.4.1 Associated problems
        11.4.2 Water residues
        11.4.3 Other hetero residues
    11.5 Navigating through a Structure object
    11.6 Analyzing structures
        11.6.1 Measuring distances
        11.6.2 Measuring angles
        11.6.3 Measuring torsion angles
        11.6.4 Internal coordinates for standard residues
        11.6.5 Determining atom-atom contacts
        11.6.6 Superimposing two structures
        11.6.7 Mapping the residues of two related structures onto each other
        11.6.8 Calculating the Half Sphere Exposure
        11.6.9 Determining the secondary structure
        11.6.10 Calculating the residue depth
    11.7 Common problems in PDB files
        11.7.1 Examples
        11.7.2 Automatic correction
        11.7.3 Fatal errors
    11.8 Accessing the Protein Data Bank
        11.8.1 Downloading structures from the Protein Data Bank
        11.8.2 Downloading the entire PDB
        11.8.3 Keeping a local copy of the PDB up to date
    11.9 General questions
        11.9.1 How well tested is Bio.PDB?
        11.9.2 How fast is it?
        11.9.3 Is there support for molecular graphics?
        11.9.4 Who’s using Bio.PDB?

12 Bio.PopGen: Population genetics
    12.1 GenePop

13 Phylogenetics with Bio.Phylo
    13.1 Demo: What’s in a Tree?
        13.1.1 Coloring branches within a tree
    13.2 I/O functions
    13.3 View and export trees
    13.4 Using Tree and Clade objects
        13.4.1 Search and traversal methods
        13.4.2 Information methods
        13.4.3 Modification methods
        13.4.4 Features of PhyloXML trees
    13.5 Running external applications
    13.6 PAML integration
    13.7 Future plans

14 Sequence motif analysis using Bio.motifs
    14.1 Motif objects
        14.1.1 Creating a motif from instances
        14.1.2 Creating a sequence logo
    14.2 Reading motifs
        14.2.1 JASPAR
        14.2.2 MEME
        14.2.3 TRANSFAC
    14.3 Writing motifs
    14.4 Position-Weight Matrices
    14.5 Position-Specific Scoring Matrices
    14.6 Searching for instances
        14.6.1 Searching for exact matches
        14.6.2 Searching for matches using the PSSM score
        14.6.3 Selecting a score threshold
    14.7 Each motif object has an associated Position-Specific Scoring Matrix
    14.8 Comparing motifs
    14.9 De novo motif finding
        14.9.1 MEME
    14.10 Useful links

15 Cluster analysis
    15.1 Distance functions
    15.2 Calculating cluster properties
    15.3 Partitioning algorithms
    15.4 Hierarchical clustering
    15.5 Self-Organizing Maps
    15.6 Principal Component Analysis
    15.7 Handling Cluster/TreeView-type files
    15.8 Example calculation

16 Supervised learning methods
    16.1 The Logistic Regression Model
        16.1.1 Background and Purpose
        16.1.2 Training the logistic regression model
        16.1.3 Using the logistic regression model for classification
        16.1.4 Logistic Regression, Linear Discriminant Analysis, and Support Vector Machines
    16.2 k-Nearest Neighbors
        16.2.1 Background and purpose
        16.2.2 Initializing a k-nearest neighbors model
        16.2.3 Using a k-nearest neighbors model for classification
    16.3 Naïve Bayes
    16.4 Maximum Entropy
    16.5 Markov Models

17 Graphics including GenomeDiagram
    17.1 GenomeDiagram
        17.1.1 Introduction
        17.1.2 Diagrams, tracks, feature-sets and features
        17.1.3 A top down example
        17.1.4 A bottom up example
        17.1.5 Features without a SeqFeature
        17.1.6 Feature captions
        17.1.7 Feature sigils
        17.1.8 Arrow sigils
        17.1.9 A nice example
        17.1.10 Multiple tracks
        17.1.11 Cross-Links between tracks
        17.1.12 Further options
        17.1.13 Converting old code
    17.2 Chromosomes
        17.2.1 Simple Chromosomes
        17.2.2 Annotated Chromosomes

18 KEGG
    18.1 Parsing KEGG records
    18.2 Querying the KEGG API

19 Bio.phenotype: analyse phenotypic data
    19.1 Phenotype Microarrays
        19.1.1 Parsing Phenotype Microarray data
        19.1.2 Manipulating Phenotype Microarray data
        19.1.3 Writing Phenotype Microarray data

20 Cookbook – Cool things to do with it
    20.1 Working with sequence files
        20.1.1 Filtering a sequence file
        20.1.2 Producing randomised genomes
        20.1.3 Translating a FASTA file of CDS entries
        20.1.4 Making the sequences in a FASTA file upper case
        20.1.5 Sorting a sequence file
        20.1.6 Simple quality filtering for FASTQ files
        20.1.7 Trimming off primer sequences
        20.1.8 Trimming off adaptor sequences
        20.1.9 Converting FASTQ files
        20.1.10 Converting FASTA and QUAL files into FASTQ files
        20.1.11 Indexing a FASTQ file
        20.1.12 Converting SFF files
        20.1.13 Identifying open reading frames
    20.2 Sequence parsing plus simple plots
        20.2.1 Histogram of sequence lengths
        20.2.2 Plot of sequence GC%
        20.2.3 Nucleotide dot plots
        20.2.4 Plotting the quality scores of sequencing read data
    20.3 Dealing with alignments
        20.3.1 Calculating summary information
        20.3.2 Calculating a quick consensus sequence
        20.3.3 Position Specific Score Matrices
        20.3.4 Information Content
    20.4 Substitution Matrices
        20.4.1 Using common substitution matrices
        20.4.2 Calculating a substitution matrix from a multiple sequence alignment
    20.5 BioSQL – storing sequences in a relational database

21 The Biopython testing framework
    21.1 Running the tests
        21.1.1 Running the tests using Tox
    21.2 Writing tests
        21.2.1 Writing a test using unittest
    21.3 Writing doctests
    21.4 Writing doctests in the Tutorial

22 Advanced
    22.1 Parser Design
    22.2 Substitution Matrices
        22.2.1 SubsMat
        22.2.2 FreqTable

23 Where to go from here – contributing to Biopython
    23.1 Bug Reports + Feature Requests
    23.2 Mailing lists and helping newcomers
    23.3 Contributing Documentation
    23.4 Contributing cookbook examples
    23.5 Maintaining a distribution for a platform
    23.6 Contributing Unit Tests
    23.7 Contributing Code

24 Appendix: Useful stuff about Python
    24.1 What the heck is a handle?
        24.1.1 Creating a handle from a string

Chapter 1

Introduction

1.1 What is Biopython?

The Biopython Project is an international association of developers of freely available Python (https://www.python.org) tools for computational molecular biology. Python is an object-oriented, interpreted, flexible language that is becoming increasingly popular for scientific computing. Python is easy to learn, has a very clear syntax and can easily be extended with modules written in C, C++ or FORTRAN.

The Biopython web site (http://www.biopython.org) provides an online resource for modules, scripts, and web links for developers of Python-based software for bioinformatics use and research. The goal of Biopython is to make it as easy as possible to use Python for bioinformatics by creating high-quality, reusable modules and classes. Biopython features include parsers for various bioinformatics file formats (BLAST, Clustalw, FASTA, GenBank, ...), access to online services (NCBI, ExPASy, ...), interfaces to common and not-so-common programs (Clustalw, DSSP, MSMS, ...), a standard sequence class, various clustering modules, a KD tree data structure etc. and even documentation.

Basically, we just like to program in Python and want to make it as easy as possible to use Python for bioinformatics by creating high-quality, reusable modules and scripts.

1.2 What can I find in the Biopython package

The main Biopython releases have lots of functionality, including:

• The ability to parse bioinformatics files into data structures that Python can use, including support for the following formats:
  – Blast output – both from standalone and WWW Blast
  – Clustalw
  – FASTA
  – GenBank
  – PubMed and Medline
  – ExPASy files, like Enzyme and Prosite
  – SCOP, including ‘dom’ and ‘lin’ files
  – UniGene
  – SwissProt
• Files in the supported formats can be iterated over record by record or indexed and accessed via a dictionary interface.

• Code to deal with popular on-line bioinformatics destinations such as:
  – NCBI – Blast, Entrez and PubMed services
  – ExPASy – Swiss-Prot and Prosite entri
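
As an illustration of the record-by-record and dictionary-style access mentioned in the list above, here is a minimal sketch using Bio.SeqIO. It assumes Biopython is installed. The two FASTA records, their identifiers, and the helper names read_lengths and index_by_id are made up for this example, and the data is held in memory so that no file is needed.

```python
from io import StringIO

from Bio import SeqIO

# Two made-up FASTA records (hypothetical data, not from any real database).
FASTA_TEXT = """>seq1 hypothetical example record
ATGGCCATTGTAATGGGCCGC
>seq2 another hypothetical record
ATGAAATAG
"""


def read_lengths(fasta_text):
    """Iterate over the records one by one and collect (id, length) pairs."""
    handle = StringIO(fasta_text)
    return [(record.id, len(record.seq)) for record in SeqIO.parse(handle, "fasta")]


def index_by_id(fasta_text):
    """Load all records into an in-memory dict keyed by record id."""
    handle = StringIO(fasta_text)
    return SeqIO.to_dict(SeqIO.parse(handle, "fasta"))


lengths = read_lengths(FASTA_TEXT)
records = index_by_id(FASTA_TEXT)

print(lengths)
print(records["seq2"].seq)
```

SeqIO.to_dict keeps every record in memory, which is fine for small inputs like this one. For large files, the file-indexing approaches listed under Section 5.4 of the contents are likely a better fit.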
