Python Course In Bioinformatics


Python course in Bioinformatics
Xiaohui Xie
March 31, 2009

Outline
- General Introduction
- Basic Types in Python
- Programming
- Exercises

Why Python?
- Scripting language, suited to rapid application development
- Minimalistic syntax
- Powerful
- Flexible data structures
- Widely used in bioinformatics and many other domains

Where to get Python and learn more?
- Main source of information: http://docs.python.org/
- Tutorial: http://biopython.org/wiki/Main_Page

Invoking Python
- To start: type python on the command line. It will look like:
    Python 2.5.2 (r252:60911, Mar 25 2009, 00:12:33)
    [GCC 4.1.2 (Gentoo 4.1.2 p1.0.2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
- You can now type commands on the line denoted by >>>
- To leave: type the end-of-file character, Ctrl-D on Unix, Ctrl-Z on Windows
- This is called interactive mode

Appetizer Example
- Task: Print all numbers in a given file
- File: numbers.txt
    2.1
    3.2
    4.3
- Code: print.py
    # Note: the code lines begin in the first column of the file. In
    # Python code indentation *is* syntactically relevant. Thus, the
    # hash # (which is a comment symbol, everything past a hash is
    # ignored on current line) marks the first column of the code
    data = open("numbers.txt", "r")
    for d in data:
        print d
    data.close()

Appetizer Example cont'd
- Task: Print the sum of all the data in the file
- Code: sum.py
    data = open("numbers.txt", "r")
    s = 0
    for d in data:
        s = s + float(d)
    print s
    data.close()
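The slides target Python 2.5. As a rough sketch of the same task in Python 3, assuming a file laid out like numbers.txt with one number per line, the sum could be computed as below. The helper name sum_file and the temporary file are illustrative assumptions, not part of the course material.

```python
import os
import tempfile


def sum_file(path):
    """Return the sum of the numbers in a file, one number per line."""
    total = 0.0
    # The with-statement closes the file even if float() raises.
    with open(path, "r") as data:
        for line in data:
            line = line.strip()
            if line:  # skip blank lines
                total += float(line)
    return total


# Build a throwaway copy of numbers.txt so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("2.1\n3.2\n4.3\n")
    numbers_path = tmp.name

total = sum_file(numbers_path)
os.remove(numbers_path)
print(total)
```

Using with instead of an explicit data.close() is a design choice of this sketch: the file is closed even when a line cannot be converted to a number.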

Interactive Mode
- the >>> prompt allows you to enter a command
- a command is ended by a newline
- variables need not be declared; assigning a value creates them
- a colon ":" opens a block
- the ... prompt denotes that a block is expected
- no prompt means Python output
- a block is indented
- ending the indentation ends the block

Differences to Java or C
- can be used interactively. This makes it much easier to test programs and to debug
- no declaration of variables
- no braces denote a block, just indentation (Emacs supports the style)
- a comment begins with a "#". Everything after that is ignored.

Numbers
Example:
    >>> 2+2
    4
    >>> # This is a comment
    ... 2+2
    4
    >>> 2+2  # and a comment on the same line as code
    4
    >>> (50-5*6)/4
    5
    >>> # Integer division returns the floor:
    ... 7/3
    2
    >>> 7/-3
    -3

Numbers cont'd
Example:
    >>> width = 20
    >>> height = 5*9
    >>> width * height
    900
    >>> # Variables must be defined (assigned a value) before they can be
    ... # used, or an error will occur:
    ... # try to access an undefined variable
    ... n
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'n' is not defined

Strings
- Strings can be enclosed in single quotes or double quotes
Example:
    >>> 'spam eggs'
    'spam eggs'
    >>> 'doesn\'t'
    "doesn't"
    >>> "doesn't"
    "doesn't"
    >>> '"Yes," he said.'
    '"Yes," he said.'
    >>> "\"Yes,\" he said."
    '"Yes," he said.'
    >>> '"Isn\'t," she said.'
    '"Isn\'t," she said.'

Strings cont'd
- Strings can be surrounded by a pair of matching triple-quotes: """ or '''. End of lines do not need to be escaped when using triple-quotes, but they will be included in the string.
Example:
    print """
    Usage: thingy [OPTIONS]
         -h                        Display this usage message
         -H hostname               Hostname to connect to
    """

Strings cont'd
- Strings can be concatenated (glued together) with the + operator, and repeated with *:
Example:
    >>> word = 'Help' + 'A'
    >>> word
    'HelpA'
    >>> '<' + word*5 + '>'
    '<HelpAHelpAHelpAHelpAHelpA>'
    >>> 'str' 'ing'               #  <-  This is ok
    'string'
    >>> 'str'.strip() + 'ing'     #  <-  This is ok
    'string'

Strings cont'd
- Strings can be subscripted (indexed); like in C, the first character of a string has subscript (index) 0.
- There is no separate character type; a character is simply a string of size one.
- Substrings can be specified with the slice notation: two indices separated by a colon.
Example:
    >>> word = 'Help' + 'A'
    >>> word[4]
    'A'
    >>> word[0:2]
    'He'
    >>> word[:2]    # The first two characters
    'He'
    >>> word[2:]    # Everything except the first two characters
    'lpA'
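Slicing is also a convenient way to cut a sequence into fixed-size pieces. The following Python 3 sketch splits a DNA string into consecutive triplets; the helper name codons and the sample sequence are assumptions made for illustration and do not appear in the slides.

```python
def codons(dna):
    """Split a DNA string into consecutive 3-letter slices.

    A trailing piece shorter than 3 letters is dropped.
    """
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


word = 'Help' + 'A'
first_two = word[:2]   # the first two characters
rest = word[2:]        # everything except the first two characters
triplets = codons('atgaaatgacc')
print(first_two, rest, triplets)
```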

Strings cont'd
- Unlike a C string, Python strings cannot be changed. Assigning to an indexed position in the string results in an error:
Example:
    >>> word[0] = 'x'
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: object doesn't support item assignment
    >>> 'x' + word[1:]
    'xelpA'
    >>> 'Splat' + word[4]
    'SplatA'

Strings cont'd
Example:
    >>> from string import *
    >>> dna = 'gcatgacgttattacgactctg'
    >>> len(dna)
    22
    >>> 'n' in dna
    False
    >>> count(dna, 'a')
    5
    >>> replace(dna, 'a', 'A')
    'gcAtgAcgttAttAcgActctg'
- Exercise: Calculate GC percent of dna

Strings cont'd
- Solution: Calculate GC percent
    >>> gc = (count(dna, 'c') + count(dna, 'g')) / float(len(dna)) * 100
    >>> "%.2f" % gc
    '45.45'
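The solution above relies on the Python 2 string module. A minimal Python 3 sketch of the same calculation, using the count method that every string carries, might look like this; the function name gc_percent and the decision to raise an error on an empty sequence are assumptions of this sketch.

```python
def gc_percent(dna):
    """Return the percentage of G and C bases in a DNA string."""
    if not dna:
        raise ValueError("empty sequence")
    dna = dna.lower()  # accept upper- and lower-case input
    gc = dna.count('g') + dna.count('c')
    return 100.0 * gc / len(dna)


dna = 'gcatgacgttattacgactctg'
print("%.2f" % gc_percent(dna))
```

The 22-base sequence contains five c and five g, so the result is 10/22 of 100, which formats as 45.45.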

Strings cont'd
- Exercise: Calculate the complement of DNA
    A - T
    C - G
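One possible approach to this exercise, sketched in Python 3, is a translation table that maps each base to its partner. str.maketrans and str.translate are standard string methods; the names COMPLEMENT and complement are chosen here for illustration only.

```python
# Translation table mapping each base to its partner, in both cases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna):
    """Return the base-by-base complement of a DNA string."""
    return dna.translate(COMPLEMENT)


print(complement('gcatgacgttattacgactctg'))
```

Characters that are not in the table, such as n, pass through unchanged.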

Lists
- A list of comma-separated values (items) between square brackets.
- List items need not all have the same type (compound data types)
Example:
    >>> a = ['spam', 'eggs', 100, 1234]
    >>> a[0]
    'spam'
    >>> a[3]
    1234
    >>> a[-2]
    100
    >>> a[1:-1]
    ['eggs', 100]
    >>> a[:2] + ['bacon', 2*2]
    ['spam', 'eggs', 'bacon', 4]
    >>> 3*a[:3] + ['Boo!']
    ['spam', 'eggs', 100, 'spam', 'eggs', 100, 'spam', 'eggs', 100, 'Boo!']

Lists cont'd
- Unlike strings, which are immutable, it is possible to change individual elements of a list
- Assignment to slices is also possible, and this can even change the size of the list or clear it entirely
Example:
    >>> a
    ['spam', 'eggs', 100, 1234]
    >>> a[2] = a[2] + 23
    >>> a[0:2] = [1, 12]                 # Replace some items:
    >>> a[0:2] = []                      # Remove some:
    >>> a
    [123, 1234]
    >>> a[1:1] = ['bletch', 'xyzzy']     # Insert some:
    >>> a
    [123, 'bletch', 'xyzzy', 1234]

Lists cont'd
- Functions returning a list
    >>> range(3)
    [0, 1, 2]
    >>> range(10,20,2)
    [10, 12, 14, 16, 18]
    >>> range(5,2,-1)
    [5, 4, 3]
    >>> aas = "ALA TYR TRP SER GLY".split()
    >>> aas
    ['ALA', 'TYR', 'TRP', 'SER', 'GLY']
    >>> " ".join(aas)
    'ALA TYR TRP SER GLY'
    >>> l = list('atgatgcgcccacgtacga')
    >>> l
    ['a', 't', 'g', 'a', 't', 'g', 'c', 'g', 'c', 'c', 'c', 'a',
     'c', 'g', 't', 'a', 'c', 'g', 'a']

Dictionaries
- A dictionary is an unordered set of key: value pairs, with the requirement that the keys are unique
- A pair of braces creates an empty dictionary: {}.
- Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary
- The main operations on a dictionary are storing a value with some key and extracting the value given the key
Example:
    >>> tel = {'jack': 4098, 'sape': 4139}
    >>> tel['guido'] = 4127
    >>> tel
    {'sape': 4139, 'guido': 4127, 'jack': 4098}
    >>> tel['jack']
    4098

Dictionaries cont'd
Example:
    >>> tel = {'jack': 4098, 'sape': 4139, 'guido': 4127}
    >>> del tel['sape']
    >>> tel['irv'] = 4127
    >>> tel
    {'guido': 4127, 'irv': 4127, 'jack': 4098}
    >>> tel.keys()
    ['guido', 'irv', 'jack']
    >>> 'guido' in tel
    True
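Storing a value under a key and reading it back is enough to tally the bases of a sequence. The Python 3 sketch below illustrates this; the function name base_counts is an assumption, and the sequence is the one used in the string examples.

```python
def base_counts(dna):
    """Count how often each character occurs in a DNA string."""
    counts = {}
    for base in dna:
        # get() returns 0 for a key that has not been stored yet.
        counts[base] = counts.get(base, 0) + 1
    return counts


counts = base_counts('gcatgacgttattacgactctg')
for base in sorted(counts):
    print(base, counts[base])
```

Sorting the keys before printing gives a stable order regardless of how the dictionary stores them.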

Programming
Example:
    a, b = 3, 4
    if a > b:
        print a + b
    else:
        print a - b

Programming cont'd
Example:
    >>> # Fibonacci series:
    ... # the sum of two elements defines the next
    ... a, b = 0, 1
    >>> while b < 10:
    ...     print b
    ...     a, b = b, a+b
    ...
    1
    1
    2
    3
    5
    8

Programming features
- multiple assignment: the right-hand side is evaluated before anything on the left, and (within the right-hand side) from left to right
- a while loop executes as long as the condition is True (non-zero, not the empty string, not None)
- block indentation must be the same for each line of the block
- an empty line is needed in interactive mode to indicate the end of a block (not required in edited code)
- use of print
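Two of these features can be checked directly. The Python 3 sketch below swaps two variables with a multiple assignment and tests which values count as true in a condition; the lists falsy and truthy are illustrative samples chosen for this sketch, not an exhaustive catalogue.

```python
# The right-hand side is evaluated completely before any name is rebound,
# so two variables can be swapped without a temporary.
a, b = 3, 4
a, b = b, a

# Values that count as false in a condition: zero, empty containers, None.
falsy = [0, 0.0, '', [], {}, None]
truthy = [1, -1, 'a', [0], {'k': 0}]

all_false = all(not value for value in falsy)
all_true = all(bool(value) for value in truthy)
print(a, b, all_false, all_true)
```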

Printing
Example:
    >>> i = 256*256
    >>> print 'The value of i is', i
    The value of i is 65536

Flow control
Example:
    x = 35
    if x < 0:
        x = 0
        print 'Negative changed to zero'
    elif x == 0:
        print 'Zero'
    elif x == 1:
        print 'Single'
    else:
        print 'More'

Iteration
- Python for iterates over a sequence (string, list, generated sequence)
Example:
    a = ['cat', 'window', 'defenestrate']
    for x in a:
        print x, len(x)
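Because a string is a sequence too, the same loop can walk over a DNA string. The Python 3 sketch below collects the positions of one base using the built-in enumerate; the function name positions and the sample sequence are hypothetical.

```python
def positions(dna, base):
    """Return the 0-based indices at which base occurs in dna."""
    found = []
    # enumerate() pairs every character with its index.
    for i, letter in enumerate(dna):
        if letter == base:
            found.append(i)
    return found


for word in ['cat', 'window', 'defenestrate']:
    print(word, len(word))

g_positions = positions('gcatgacg', 'g')
print(g_positions)
```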


Defining functions
Example:
    def fib(n):    # write Fibonacci series up to n
        """Print a Fibonacci series up to n."""
        a, b = 0, 1
        while b < n:
            print b,
            a, b = b, a+b

    # Now call the function we just defined:
    fib(2000)
    # will print:
    # 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
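The function on the slide prints its results and returns nothing. A variant that returns the numbers as a list is often easier to reuse; the Python 3 sketch below is one way to write it and is not taken from the course material.

```python
def fib(n):
    """Return the Fibonacci numbers smaller than n as a list."""
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a + b
    return result


print(fib(2000))
```

Returning a list leaves the caller free to print, sum, or slice the series.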

Reverse Complement of DNA
- Exercise: Find the reverse complement of a DNA sequence
Example:
    5' - ACCGGTTAATT - 3' : forward strand
    3' - TGGCCAATTAA - 5' : reverse strand
So the reverse complement of ACCGGTTAATT is AATTAACCGGT

Reverse Complement of DNA
- Solution: Find the reverse complement of a DNA sequence
    from string import *

    def revcomp(dna):
        """ reverse complement of a DNA sequence """
        comp = dna.translate(maketrans("AGCTagct", "TCGAtcga"))
        lcomp = list(comp)
        lcomp.reverse()
        return join(lcomp, "")
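The solution above uses maketrans and join from the Python 2 string module, which are not available in that form in Python 3. A hedged Python 3 sketch of the same idea, with str.maketrans and a reversing slice in place of list.reverse, could read as follows; the module-level name _COMPLEMENT is an assumption of this sketch.

```python
_COMPLEMENT = str.maketrans("AGCTagct", "TCGAtcga")


def revcomp(dna):
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse with a slice of step -1.
    return dna.translate(_COMPLEMENT)[::-1]


print(revcomp("ACCGGTTAATT"))
```

Applying revcomp twice gives back the original sequence, which makes a quick sanity check.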

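Translate a DNA sequence
- Exercise: Translate a DNA sequence to an amino acid sequence
Genetic code:
    standard = {
        'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
        'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
        'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
        'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',
        'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
        'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
        'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
        'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',
        'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
        'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
        'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
        'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',
        'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
        'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
        'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
        'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G' }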
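One way to approach this exercise is sketched below in Python 3. The table is built from a 64-letter string that lists the standard genetic code with bases in the order t, c, a, g; the names BASES, AMINO_ACIDS, CODON_TABLE and translate, the choice to read from position 0, and the use of X for unknown codons are all assumptions of this sketch rather than the course's own solution.

```python
from itertools import product

# Standard genetic code, listed with bases in the order t, c, a, g
# for the first, second and third codon position.
BASES = "tcag"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, starting at position 0.

    Incomplete trailing codons are ignored; unknown codons
    (for example ones containing 'n') become 'X'.
    """
    dna = dna.lower()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("atggcctgctaa"))
```

Stop codons are reported as *, matching the convention of the dictionary on the slide.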
