Function Programming In Python - Holla.cz

Transcription

FunctionalProgrammingin PythonDavid Mertz

AdditionalResources4 Easy Ways to Learn More and Stay CurrentProgramming NewsletterGet programming r elated news and content delivered weekly to your inbox.oreilly.com/programming/newsletterFree Webcast SeriesLearn about popular programming topics from experts live, online.webcasts.oreilly.comO’Reilly RadarRead more insight and analysis about emerging technologies.radar.oreilly.comConferencesImmerse yourself in learning at an upcoming O’Reilly conference.conferences.oreilly.com 2015 O’Reilly Media, Inc. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. #15305

Functional Programmingin PythonDavid Mertz

Functional Programming in Pythonby David MertzCopyright 2015 O’Reilly Media, Inc. All rights reserved.Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).See: ted in the United States of America.Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA95472.O’Reilly books may be purchased for educational, business, or sales promotional use.Online editions are also available for most titles (http://safaribooksonline.com). Formore information, contact our corporate/institutional sales department:800-998-9938 or corporate@oreilly.com.Editor: Meghan BlanchetteProduction Editor: Shiny KalapurakkelProofreader: Charles RoumeliotisMay 2015:Interior Designer: David FutatoCover Designer: Karen MontgomeryFirst EditionRevision History for the First Edition2015-05-27:First ReleaseThe O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Functional Pro‐gramming in Python, the cover image, and related trade dress are trademarks ofO’Reilly Media, Inc.While the publisher and the author have used good faith efforts to ensure that theinformation and instructions contained in this work are accurate, the publisher andthe author disclaim all responsibility for errors or omissions, including without limi‐tation responsibility for damages resulting from the use of or reliance on this work.Use of the information and instructions contained in this work is at your own risk. Ifany code samples or other technology this work contains or describes is subject toopen source licenses or the intellectual property rights of others, it is your responsi‐bility to ensure that your use thereof complies with such licenses and/or rights.978-1-491-92856-1[LSI]

Table of ContentsPreface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v(Avoiding) Flow Control. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1EncapsulationComprehensionsRecursionEliminating Loops1257Callables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11Named Functions and LambdasClosures and Callable InstancesMethods of ClassesMultiple Dispatch12131519Lazy Evaluation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25The Iterator ProtocolModule: itertools2729Higher-Order Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33Utility Higher-Order FunctionsThe operator ModuleThe functools ModuleDecorators35363637iii

PrefaceWhat Is Functional Programming?We’d better start with the hardest question: “What is functional pro‐gramming (FP), anyway?”One answer would be to say that functional programming is whatyou do when you program in languages like Lisp, Scheme, Clojure,Scala, Haskell, ML, OCAML, Erlang, or a few others. That is a safeanswer, but not one that clarifies very much. Unfortunately, it ishard to get a consistent opinion on just what functional program‐ming is, even from functional programmers themselves. A storyabout elephants and blind men seems apropos here. It is also safe tocontrast functional programming with “imperative programming”(what you do in languages like C, Pascal, C , Java, Perl, Awk, TCL,and most others, at least for the most part). Functional program‐ming is also not object-oriented programming (OOP), althoughsome languages are both. And it is not Logic Programming (e.g.,Prolog), but again some languages are multiparadigm.Personally, I would roughly characterize functional programming ashaving at least several of the following characteristics. Languagesthat get called functional make these things easy, and make otherthings either hard or impossible: Functions are first class (objects). That is, everything you can dowith “data” can be done with functions themselves (such aspassing a function to another function). Recursion is used as a primary control structure. In some lan‐guages, no other “loop” construct exists.v

There is a focus on list processing (for example, it is the sourceof the name Lisp). Lists are often used with recursion on sublistsas a substitute for loops. “Pure” functional languages eschew side effects. This excludesthe almost ubiquitous pattern in imperative languages of assign‐ing first one, then another value to the same variable to trackthe program state. Functional programming either discourages or outright disal‐lows statements, and instead works with the evaluation ofexpressions (in other words, functions plus arguments). In thepure case, one program is one expression (plus supporting defi‐nitions). Functional programming worries about what is to be computedrather than how it is to be computed. Much functional programming utilizes “higher order” functions(in other words, functions that operate on functions that oper‐ate on functions).Advocates of functional programming argue that all these character‐istics make for more rapidly developed, shorter, and less bug-pronecode. Moreover, high theorists of computer science, logic, and mathfind it a lot easier to prove formal properties of functional languagesand programs than of imperative languages and programs. One cru‐cial concept in functional programming is that of a“pure function”—one that always returns the same result given thesame arguments—which is more closely akin to the meaning of“function” in mathematics than that in imperative programming.Python is most definitely not a “pure functional programming lan‐guage”; side effects are widespread in most Python programs. Thatis, variables are frequently rebound, mutable data collections oftenchange contents, and I/O is freely interleaved with computation. It isalso not even a “functional programming language” more generally.However, Python is a multiparadigm language that makes functionalprogramming easy to do when desired, and easy to mix with otherprogramming styles.Beyond the Standard LibraryWhile they will not be discussed withing the limited space of thisreport, a large number of useful third-party Python libraries forvi Preface

functional programming are available. The one exception here isthat I will discuss Matthew Rocklin’s multipledispatch as the bestcurrent implementation of the concept it implements.Most third-party libraries around functional programming are col‐lections of higher-order functions, and sometimes enhancements tothe tools for working lazily with iterators contained in itertools.Some notable examples include the following, but this list shouldnot be taken as exhaustive: pyrsistent contains a number of immutable collections. Allmethods on a data structure that would normally mutate itinstead return a new copy of the structure containing therequested updates. The original structure is left untouched. toolz provides a set of utility functions for iterators, functions,and dictionaries. These functions interoperate well and form thebuilding blocks of common data analytic operations. Theyextend the standard libraries itertools and functools andborrow heavily from the standard libraries of contemporaryfunctional languages. hypothesis is a library for creating unit tests for finding edgecases in your code you wouldn’t have thought to look for. Itworks by generating random data matching your specificationand checking that your guarantee still holds in that case. This isoften called property-based testing, and was popularized by theHaskell library QuickCheck. more itertools tries to collect useful compositions of iteratorsthat neither itertools nor the recipes included in its docsaddress. These compositions are deceptively tricky to get rightand this well-crafted library helps users avoid pitfalls of rollingthem themselves.ResourcesThere are a large number of other papers, articles, and books writtenabout functional programming, in Python and otherwise. ThePython standard documentation itself contains an excellent intro‐duction called “Functional Programming HOWTO,” by AndrewKuchling, that discusses some of the motivation for functional pro‐gramming styles, as well as particular capabilities in Python.Preface vii

Mentioned in Kuchling’s introduction are several very old publicdomain articles this author wrote in the 2000s, on which portions ofthis report are based. These include: The first chapter of my book Text Processing in Python, whichdiscusses functional programming for text processing, in thesection titled “Utilizing Higher-Order Functions in Text Pro‐cessing.”I also wrote several articles, mentioned by Kuchling, for IBM’s devel‐operWorks site that discussed using functional programming in anearly version of Python 2.x: Charming Python: Functional programming in Python, Part 1:Making more out of your favorite scripting language Charming Python: Functional programming in Python, Part 2:Wading into functional programming? Charming Python: Functional programming in Python, Part 3:Currying and other higher-order functionsNot mentioned by Kuchling, and also for an older version ofPython, I discussed multiple dispatch in another article for the samecolumn. The implementation I created there has no advantages overthe more recent multipledispatch library, but it provides a longerconceptual explanation than this report can: Charming Python: Multiple dispatch: Generalizing polymor‐phism with multimethodsA Stylistic NoteAs in most programming texts, a fixed font will be used both forinline and block samples of code, including simple command orfunction names. Within code blocks, a notional segment of pseudocode is indicated with a word surrounded by angle brackets (i.e., notvalid Python), such as code-block . In other cases, syntacticallyvalid but undefined functions are used with descriptive names, suchas get the data().viii Preface

(Avoiding) Flow ControlIn typical imperative Python programs—including those that makeuse of classes and methods to hold their imperative code—a block ofcode generally consists of some outside loops (for or while), assign‐ment of state variables within those loops, modification of datastructures like dicts, lists, and sets (or various other structures,either from the standard library or from third-party packages), andsome branch statements (if/elif/else or try/except/finally). Allof this is both natural and seems at first easy to reason about. Theproblems often arise, however, precisely with those side effects thatcome with state variables and mutable data structures; they oftenmodel our concepts from the physical world of containers fairlywell, but it is also difficult to reason accurately about what state datais in at a given point in a program.One solution is to focus not on constructing a data collection butrather on describing “what” that data collection consists of. Whenone simply thinks, “Here’s some data, what do I need to do with it?”rather than the mechanism of constructing the data, more directreasoning is often possible. The imperative flow control described inthe last paragraph is much more about the “how” than the “what”and we can often shift the question.EncapsulationOne obvious way of focusing more on “what” than “how” is simplyto refactor code, and to put the data construction in a more isolatedplace—i.e., in a function or method. For example, consider an exist‐ing snippet of imperative code that looks like this:1

# configure the data to start withcollection get initial state()state var Nonefor datum in data set:if condition(state var):state var calculate from(datum)new modify(datum, state var)collection.add to(new)else:new modify differently(datum)collection.add to(new)# Now actually work with the datafor thing in collection:process(thing)We might simply remove the “how” of the data construction fromthe current scope, and tuck it away in a function that we can thinkabout in isolation (or not think about at all once it is sufficientlyabstracted). For example:# tuck away construction of datadef make collection(data set):collection get initial state()state var Nonefor datum in data set:if condition(state var):state var calculate from(datum, state var)new modify(datum, state var)collection.add to(new)else:new modify differently(datum)collection.add to(new)return collection# Now actually work with the datafor thing in make collection(data set):process(thing)We haven’t changed the programming logic, nor even the lines ofcode, at all, but we have still shifted the focus from “How do we con‐struct collection?” to “What does make collection() create?”ComprehensionsUsing comprehensions is often a way both to make code more com‐pact and to shift our focus from the “how” to the “what.” A compre‐hension is an expression that uses the same keywords as loop andconditional blocks, but inverts their order to focus on the data2 (Avoiding) Flow Control

rather than on the procedure. Simply changing the form of expres‐sion can often make a surprisingly large difference in how we reasonabout code and how easy it is to understand. The ternary operatoralso performs a similar restructuring of our focus, using the samekeywords in a different order. For example, if our original code was:collection list()for datum in data set:if condition(datum):collection.append(datum)else:new modify(datum)collection.append(new)Somewhat more compactly we could write this as:collection [d if condition(d) else modify(d)for d in data set]Far more important than simply saving a few characters and lines isthe mental shift enacted by thinking of what collection is, and byavoiding needing to think about or debug “What is the state of collection at this point in the loop?”List comprehensions have been in Python the longest, and are insome ways the simplest. We now also have generator comprehen‐sions, set comprehensions, and dict comprehensions available inPython syntax. As a caveat though, while you can nest comprehen‐sions to arbitrary depth, past a fairly simple level they tend to stopclarifying and start obscuring. For genuinely complex constructionof a data collection, refactoring into functions remains more reada‐ble.GeneratorsGenerator comprehensions have the same syntax as list comprehen‐sions—other than that there are no square brackets around them(but parentheses are needed syntactically in some contexts, in placeof brackets)—but they are also lazy. That is to say that they aremerely a description of “how to get the data” that is not realizeduntil one explicitly asks for it, either by calling .next() on theobject, or by looping over it. This often saves memory for largesequences and defers computation until it is actually needed. Forexample:log lines (line for line in read line(huge log file)if complex condition(line))Comprehensions 3

For typical uses, the behavior is the same as if you had constructed alist, but runtime behavior is nicer. Obviously, this generator compre‐hension also has imperative versions, for example:def get log lines(log file):line read line(log file)while True:try:if complex condition(line):yield lineline read line(log file)except StopIteration:raiselog lines get log lines(huge log file)Yes, the imperative version could be simplified too, but the versionshown is meant to illustrate the behind-the-scenes “how” of a forloop over an iteratable—more details we also want to abstract fromin our thinking. In fact, even using yield is somewhat of an abstrac‐tion from the underlying “iterator protocol.” We could do this with aclass that had . next () and . iter () methods. For example:class GetLogLines(object):def init (self, log file):self.log file log fileself.line Nonedef iter (self):return selfdef next (self):if self.line is None:self.line read line(log file)while not complex condition(self.line):self.line read line(self.log file)return self.linelog lines GetLogLines(huge log file)Aside from the digression into the iterator protocol and lazinessmore generally, the reader should see that the comprehension focu‐ses attention much better on the “what,” whereas the imperative ver‐sion—although successful as refactorings perhaps—retains the focuson the “how.”Dicts and SetsIn the same fashion that lists can be created in comprehensionsrather than by creating an empty list, looping, and repeatedly call‐4 (Avoiding) Flow Control

ing .append(), dictionaries and sets can be created “all at once”rather than by repeatedly calling .update() or .add() in a loop. Forexample: {i:chr(65 i) for i in range(6)}{0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F'} {chr(65 i) for i in range(6)}{'A', 'B', 'C', 'D', 'E', 'F'}The imperative versions of these comprehensions would look verysimilar to the examples shown earlier for other built-in datatypes.RecursionFunctional programmers often put weight in expressing flow con‐trol through recursion rather than through loops. Done this way, wecan avoid altering the state of any variables or data structures withinan algorithm, and more importantly get more at the “what” than the“how” of a computation. However, in considering using recursivestyles we should distinguish between the cases where recursion isjust “iteration by another name” and those where a problem canreadily be partitioned into smaller problems, each approached in asimilar way.There are two reasons why we should make the distinction men‐tioned. On the one hand, using recursion effectively as a way ofmarching through a sequence of elements is, while possible, reallynot “Pythonic.” It matches the style of other languages like Lisp, def‐initely, but it often feels contrived in Python. On the other hand,Python is simply comparatively slow at recursion, and has a limitedstack depth limit. Yes, you can change this with sys.setrecursionlimit() to more than the default 1000; but if you find yourselfdoing so it is probably a mistake. Python lacks an internal featurecalled tail call elimination that makes deep recursion computation‐ally efficient in some languages. Let us find a trivial example whererecursion is really just a kind of iteration:def running sum(numbers, start 0):if len(numbers) 0:print()returntotal numbers[0] startprint(total, end " ")running sum(numbers[1:], total)Recursion 5

There is little to recommend this approach, however; an iterationthat simply repeatedly modified the total state variable would bemore readable, and moreover this function is perfectly reasonable towant to call against sequences of much larger length than 1000.However, in other cases, recursive style, even over sequential opera‐tions, still expresses algorithms more intuitively and in a way that iseasier to reason about. A slightly less trivial example, factorial inrecursive and iterative style:def factorialR(N):"Recursive factorial function"assert isinstance(N, int) and N 1return 1 if N 1 else N * factorialR(N-1)def factorialI(N):"Iterative factorial function"assert isinstance(N, int) and N 1product 1while N 1:product * NN - 1return productAlthough this algorithm can also be expressed easily enough with arunning product variable, the recursive expression still comes closerto the “what” than the “how” of the algorithm. The details of repeat‐edly changing the values of product and N in the iterative versionfeels like it’s just bookkeeping, not the nature of the computationitself (but the iterative version is probably faster, and it is easy toreach the recursion limit if it is not adjusted).As a footnote, the fastest version I know of for factorial() inPython is in a functional programming style, and also expresses the“what” of the algorithm well once some higher-order functions arefamiliar:from functools import reducefrom operator import muldef factorialHOF(n):return reduce(mul, range(1, n 1), 1)Where recursion is compelling, and sometimes even the only reallyobvious way to express a solution, is when a problem offers itself toa “divide and conquer” approach. That is, if we can do a similarcomputation on two halves (or anyway, several similarly sizedchunks) of a larger collection. In that case, the recursion depth isonly O(log N) of the size of the collection, which is unlikely to be6 (Avoiding) Flow Control

overly deep. For example, the quicksort algorithm is very elegantlyexpressed without any state variables or loops, but wholly throughrecursion:def quicksort(lst):"Quicksort over a list-like sequence"if len(lst) 0:return lstpivot lst[0]pivots [x for x in lst if x pivot]small quicksort([x for x in lst if x pivot])large quicksort([x for x in lst if x pivot])return small pivots largeSome names are used in the function body to hold convenient val‐ues, but they are never mutated. It would not be as readable, but thedefinition could be written as a single expression if we wanted to doso. In fact, it is somewhat difficult, and certainly less intuitive, totransform this into a stateful iterative version.As general advice, it is good practice to look for possibilities ofrecursive expression—and especially for versions that avoid theneed for state variables or mutable data collections—whenever aproblem looks partitionable into smaller problems. It is not a goodidea in Python—most of the time—to use recursion merely for “iter‐ation by other means.”Eliminating LoopsJust for fun, let us take a quick look at how we could take out allloops from any Python program. Most of the time this is a bad idea,both for readability and performance, but it is worth looking at howsimple it is to do in a systematic fashion as background to contem‐plate those cases where it is actually a good idea.If we simply call a function inside a for loop, the built-in higherorder function map() comes to our aid:for e in it:func(e)# statement-based loopThe following code is entirely equivalent to the functional version,except there is no repeated rebinding of the variable e involved, andhence no state:map(func, it)# map()-based "loop"Eliminating Loops 7

A similar technique is available for a functional approach to sequen‐tial program flow. Most imperative programming consists of state‐ments that amount to “do this, then do that, then do the otherthing.” If those individual actions are wrapped in functions, map()lets us do just this:# let f1, f2, f3 (etc) be functions that perform actions# an execution utility functiondo it lambda f, *args: f(*args)# map()-based action sequencemap(do it, [f1, f2, f3])We can combine the sequencing of function calls with passing argu‐ments from iterables: hello lambda first, last: print("Hello", first, last) bye lambda first, last: print("Bye", first, last) list(map(do it, [hello, bye], ['David','Jane'], ['Mertz','Doe']))Hello David MertzBye Jane DoeOf course, looking at the example, one suspects the result one reallywants is actually to pass all the arguments to each of the functionsrather than one argument from each list to each function. Express‐ing that is difficult without using a list comprehension, but easyenough using one: do all funcs lambda fns, *args: [list(map(fn, *args)) for fn in fns] do all funcs([hello, bye],['David','Jane'], ['Mertz','Doe'])Hello David MertzHello Jane DoeBye David MertzBye Jane DoeIn general, the whole of our main program could, in principle, be amap() expression with an iterable of functions to execute to com‐plete the program.Translating while is slightly more complicated, but is possible to dodirectly using recursion:# statement-based while loopwhile cond : pre-suite if break condition :breakelse:8 (Avoiding) Flow Control

suite # FP-style recursive while loopdef while block(): pre-suite if break condition :return 1else: suite return 0while FP lambda: ( cond and while block()) or while FP()while FP()Our translation of while still requires a while block() functionthat may itself contain statements rather than just expressions. Wecould go further in turning suites into function sequences, usingmap() as above. If we did this, we could, moreover, also return a sin‐gle ternary expression. The details of this further purely functionalrefactoring is left to readers as an exercise (hint: it will be ugly; funto play with, but not good production code).It is hard for cond to be useful with the usual tests, such as whilemyvar 7, since the loop body (by design) cannot change any vari‐able values. One way to add a more useful condition is to letwhile block() return a more interesting value, and compare thatreturn value for a termination condition. Here is a concrete exampleof eliminating statements:# imperative version of "echo()"def echo IMP():while 1:x input("IMP -- ")if x 'quit':breakelse:print(x)echo IMP()Now let’s remove the while loop for the functional version:# FP version of "echo()"def identity print(x):# "identity with side-effect"print(x)return xecho FP lambda: identity print(input("FP -- ")) 'quit' orecho FP()echo FP()Eliminating Loops 9

What we have accomplished is that we have managed to express alittle program that involves I/O, looping, and conditional statementsas a pure expression with recursion (in fact, as a function object thatcan be passed elsewhere if desired). We do still utilize the utilityfunction identity print(), but this function is completely general,and can be reused in every functional program expression we mightcreate later (it’s a one-time cost). Notice that any expression contain‐ing identity print(x) evaluates to the same thing as if it had sim‐ply contained x; it is only called for its I/O side effect.Eliminating RecursionAs with the simple factorial example given above, sometimes we canperform “recursion without recursion” by using functools.reduce() or other folding operations (other “folds” are not inthe Python standard library, but can easily be constructed and/oroccur in third-party libraries). A recursion is often simply a way ofcombining something simpler with an accumulated intermediateresult, and that is exactly what reduce() does at heart. A slightlylonger discussion of functools.reduce() occurs in the chapter onhigher-order functions.10 (Avoiding) Flow Control

CallablesThe emphasis in functional programming is, somewhat tautolo‐gously, on calling functions. Python actually gives us several differ‐ent ways to create functions, or at least something very function-like(i.e., that can be called). They are: Regular functions created with def and given a name at defini‐tion time Anonymous functions created with lambda Instances of classes that define a call() method Closures returned by function factories Static methods of instances, either via the @staticmethod deco‐rator or via the class dict Generator functionsThis list is probably not exhaustive, but it gives a sense of thenumerous slightly different ways one can create something callable.Of course, a plain method of a class instance is also a callable, butone generally uses those where the emphasis is on accessing andmodifying mutable state.Python is a multiple paradigm language, but it has an emphasis onobject-oriented styles. When one defines a class, it is generally togenerate instances meant as containers for data that change as onecalls methods of the class. This style is in some ways opposite to afunctional programming approach, which emphasizes immutabilityand pure functions.11

Any method that accesses the state of an instance (in any degree) todetermine what result to return is not a pure function. Of course, allthe other types of callables we discuss also allow reliance on state invarious ways. The author of this report has long pondered whetherhe could use some dark magic within Python explicitly to declare afunction as pure—say by decorating it with a hypothetical@purefunction decorator that would raise an exception if the func‐tion can have side effects—but consensus seems to be that it wouldbe impossible to guard against every edge case in Python’s internalmachinery.The advantage of a pure function and side-effect-free code is that it isgenerally easier to debug and test. Callables that freely interspersestatefulness with their returned results cannot be examined inde‐pendently of their running context to see how they behave, at leastnot entirely so. For example, a unit test (using doctest or unittest,or some third-party testing framework such as py.test or nose)might succeed in one context but fail when identical calls are madewithin a running, stateful program. Of course, at the very least, anyprogram that does anything must have some kind of output(whether to console, a file, a database, over the network, or what‐ever) in it to do anything useful, so side effects cannot be entirelyeliminated, only isolated to a degree when thinking in functionalprogramming terms.Named Functions and LambdasThe most obvious ways to create callables in Python are, in definiteorder of obviousness, named functions and lambdas. The only inprinciple difference between them is simply whether they havea . qualname attribute, since both can very well be bound to oneor more names. In most cases, lambda expressions are used withinPython only for callbacks and other uses where a simple action isinlined into a function call. But as we have shown in this report, flowcontrol in general can be incorporated into single-expression lamb‐

Free Webcast Series Learn about popular programming topics from experts live, online. webcasts.oreilly.com O’Reilly Radar . However, Python is a multiparadigm language that makes functional pr