Object-oriented Encapsulation For Dynamically Typed Languages


Nathanael Schärli
Software Composition Group, University of Bern, Bern, Switzerland

Andrew P. Black
OGI School of Science & Engineering, Oregon Health & Science University, Portland, Oregon, USA
black@cse.ogi.edu

Stéphane Ducasse
Software Composition Group, University of Bern, Bern, Switzerland

ABSTRACT

Encapsulation in object-oriented languages has traditionally been based on static type systems. As a consequence, dynamically-typed languages have only limited support for encapsulation. This is surprising, considering that encapsulation is one of the most fundamental and important concepts behind object-oriented programming and that it is essential for writing programs that are maintainable and reliable, and that remain robust as they evolve.

In this paper we describe the problems that are caused by insufficient encapsulation mechanisms and then present object-oriented encapsulation, a simple and uniform approach that solves these problems by bringing state-of-the-art encapsulation features to dynamically typed languages. We provide a detailed discussion of our design rationales and compare them and their consequences to the encapsulation approaches used for statically typed languages. We also describe an implementation of object-oriented encapsulation in Smalltalk. Benchmarks show that extensive use of object-oriented encapsulation results in a slowdown of less than 15 per cent.

Categories and Subject Descriptors
D.3.3 [Programming Languages]: Language Constructs and Features: Classes and objects; Inheritance

General Terms
Languages

Keywords
Dynamic typing, Encapsulation, Encapsulation Policies, Information hiding, Smalltalk

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
OOPSLA’04, pp. 130–139, Oct. 24-28, 2004, Vancouver, BC, Canada.
Copyright 2004 ACM 1-58113-831-8/04/0010 $5.00.

1. INTRODUCTION

Encapsulation is widely acknowledged as being one of the cornerstones of object-oriented programming [22]. But what does the term mean? In a classic paper, Alan Snyder defined encapsulation as follows [28, p. 39]:

“Encapsulation is a technique for minimizing interdependencies among separately-written modules by defining strict external interfaces. The external interface of a module serves as a contract between the module and its clients, and thus between the designer of the module and other designers.”

We have added the emphasis to point out that while this definition captures the essence of encapsulation for modules, it does not adequately define encapsulation in the context of objects. To see this, suppose that we are trying to encapsulate an object that maintains an internal data structure, such as a tree. We would like to protect the invariants of the tree, but clients need to traverse it. If we pass our users an unrestricted reference (an alias) to the tree, clients might modify the tree in a way that breaks the invariant.
What we would like to do is to allow the encapsulating object full access to the tree, including the right to modify it, but to restrict the access that is granted to clients.

Solving this kind of encapsulation problem requires some sort of protection based not on modules but on individual references to objects. In contrast to module encapsulation as defined by Snyder, the primary purpose of these object encapsulation [8] techniques is not to facilitate code evolution, but to increase code reliability. It does this by allowing the programmer to implement data structures whose instances are guaranteed to be protected from the invocation of inappropriate operations on their subobjects.

Today, most statically typed object-oriented languages such as Java, C++, and C# provide relatively good support for module encapsulation, and many proposals have been made for augmenting the static type systems of such languages so that they can also express object encapsulation [2, 3, 8, 9, 13, 17, 20, 21, 25].

However, things are quite different in dynamically typed languages. Popular dynamically typed languages such as Smalltalk, Self, Python, and Ruby still provide no encapsulation at all, or support it in a very limited way. Proposals for an encapsulation model for Smalltalk, such as MUST [30], have never been adopted, either in Smalltalk or in any other popular dynamically typed language. The encapsulation model [12] proposed by the developers of Self was rejected, and is not available in more recent versions of the Self language [1]. There have been proposals dating back at least as far as 1987 for extending Smalltalk with some (limited) features for object encapsulation [7], but 17 years later, such features are still not available in Smalltalk.

In this paper, we propose an object-oriented encapsulation model (OOE) for dynamically typed languages such as Smalltalk and Ruby. It provides a uniform and expressive mechanism to address most of the module and object encapsulation problems of which we are aware. OOE is compatible with dynamic languages because it is based entirely on message passing and has a simple semantics that makes it easy to understand and reason about.

OOE supports module encapsulation using Composable Encapsulation Policies [27]: the degree of encapsulation is not dictated by the implementor of a module, but is selected by the user subject to policies defined by the implementor. It also allows encapsulation policies to be associated with individual references to objects, which is sufficient to solve many, although not all, object encapsulation problems.

The rest of this paper is structured as follows. We first describe the problems that are caused by insufficient encapsulation, and develop a set of goals that an effective encapsulation mechanism should meet (Section 2). After giving a brief outline of our proposal for Object-oriented Encapsulation (OOE) in Section 3, we describe OOE in detail in Section 4, deferring to Section 5 the discussion of why we took particular design decisions, and the alternatives that we considered and rejected. In Section 6, we give a detailed description of our implementation of OOE in Smalltalk and use benchmarks to evaluate its impact on performance. In Section 7 we evaluate OOE against our goals; Section 8 describes related work and Section 9 concludes.

2. PROBLEMS AND GOALS

In this section, we set the context for our work, and motivate it by describing the problems that are caused by inadequate encapsulation mechanisms. Based on these problems, we then formulate a set of goals for an object-oriented encapsulation model for dynamically typed languages.

2.1 Encapsulation in Dynamic Languages

Why is it that encapsulation, which is such a well-established feature of statically typed languages, is so poorly supported in dynamically typed languages? We believe that there are three reasons.

First, most of the proposed encapsulation models address only a small subset of the various encapsulation problems, and so no one of them seemed to add enough value to justify the additional complexity. For example, the model proposed for Self [12] addresses the issue of how to prevent a method from being called from within another module¹, but does not address other module encapsulation problems such as preventing a method from being overridden, nor does it provide any help in the control of object aliasing. This model was in fact included in an experimental release of Self, but was soon removed because it was found to be too complex. An alternative proposal, from Noble, Clarke and Potter [24], suggests extending languages like Self with dynamic object encapsulation using techniques based on object ownership, but it does not address the fact that the base languages do not provide simple module encapsulation.

Secondly, many of the proposed encapsulation models are not well-suited to the lightweight, dynamic, and entirely message-based spirit of these languages. For example, the Smalltalk extension MUST [30] significantly affects the lightweight and dynamic character of the language by requiring the programmer to make quite static encapsulation decisions both when declaring methods and when sending messages. Other approaches, such as that of Noble, Clarke and Potter [24], are so restrictive that they would prohibit several common programming patterns such as external iterators [20] and are hard to implement efficiently (see Section 8).

Thirdly, language designers may have perceived that the kind of experimental programming for which dynamic languages were originally promoted (rapid prototyping and single-programmer experimental projects) did not need encapsulation. This perception may have been compounded by the mismatch between available encapsulation techniques and the experimental style.

Agile programming methodologies such as extreme programming [6] make a virtue out of change. They expand the reach of techniques previously thought suitable only for experimental programming to include customer-driven projects undertaken by a sizable team. The success of these methodologies has shown that the third reason was not well founded. In particular, communal code ownership demands that programmers be given a way of expressing their intent as to which methods are to be callable by which objects. Thus, we feel that there is a real need for encapsulation in dynamic languages, if only a mechanism can be found that is both adequate and appropriate. Indeed, the absence of static types makes the modularity improvements and reliability guarantees that come with powerful encapsulation mechanisms even more valuable in a dynamically-typed language than in one with a static type system.

2.2 The Problem of Interdependence

We agree with Snyder that one of the primary purposes of encapsulation is to minimize interdependencies among separately-written modules [28]. All object-oriented languages that are based on classes depend on the class as the fundamental unit of modularity. In this paper we focus on class-based languages: this means that our modules are classes. Hence, module encapsulation means class encapsulation, and we use the two terms interchangeably.

¹ Since the prototype-based language Self has no explicit modules, the developers instead suggested using implicit modules consisting of an object together with its shared ancestors.

Two fundamental operations are defined on classes: inheritance and instantiation. Consequently, there are three ways in which classes depend on each other: (1) when they are in an inheritance relationship, (2) when one class instantiates another to create an instance, and (3) when that instance is eventually used. Although in many languages instantiation is a built-in operation of the language (for example, Java’s new), in Smalltalk it is not: instantiation is accomplished by sending an ordinary message to the class itself. We will therefore not single out instantiation in the discussion that follows; instantiation can be controlled in Smalltalk using exactly the same techniques as message send.

2.2.1 Interdependence through Inheritance

Most dynamically typed languages such as Smalltalk and Ruby do not allow a programmer to hide the internal implementation features (i.e., methods and instance variables) of a superclass from its subclasses.

One aspect of this shortcoming is that the designer of a class cannot specify access restrictions that prevent some or all of its subclasses from accessing certain instance variables or calling certain methods. This has severe consequences for code evolution: whenever a feature of a superclass is modified, the programmer must check all its (direct and indirect) subclasses to ensure that the change does not break existing code. This is because any subclass might use the modified feature and may rely on its old meaning.

Another aspect of this shortcoming is that the programmer cannot specify that a certain feature of a class should be statically bound, that is, that all local references to the feature’s name should always be bound to the local feature rather than to an overriding implementation that appears in a subclass. This is perhaps unsurprising in a language based on message passing and dynamic binding, but it too has serious consequences for code evolution. In fact, the consequences of this aspect are even worse. Not only must all subclasses be checked when an existing feature of a class is changed, but also all the subclasses and superclasses must be checked when a new feature is added [29].

To see this, imagine that a maintenance programmer detects some duplicated code in an existing class C and wants to extract it into a new internal method called check. To do this safely, the programmer has to make sure that the check method does not accidentally override an internal method with the same name implemented in any of C’s superclasses. In addition, the programmer also needs to be sure that there is no method named check in any of C’s subclasses, because such a method would override the new implementation of check that was intended to be internal to C.

This dependence on subclasses is particularly problematic: it is often impossible to check all the possible subclasses of a certain class, for example, because the programmer of the class works for a framework vendor and the subclasses are implemented by (and a trade secret of) its customers.

Python is one of the few dynamically typed languages of which we are aware that provides even limited support for decreasing the interdependence between classes that are related by inheritance. In Python, features whose names start with two underscores are “private” in the sense that these names are valid only from within the class in which they are defined. Outside of that class, such features are available under a different name, which is derived from the original name by prefixing an underscore and the class name (for example, a method __check defined in class C is reachable from outside as _C__check).

Although this makes it unlikely that such an internal feature will be accidentally called or overridden outside of the class, it offers no real protection. Furthermore, the approach of protecting a method based on whether its name fits a convention is clumsy because it requires all the references to the method to be changed if the programmer decides to change the status of the method from “private” to “public”.

2.2.2 Interdependence when using Instances

Whereas a class will typically be subclassed only a handful of times, it will be instantiated many times and its instances will be used from many other classes. Thus, it is even more important to protect the internal features of an instance from inappropriate access by a client than it is to protect them from a subclass. Unfortunately, most dynamically typed languages provide only very limited support for this sort of encapsulation.

In Smalltalk and Ruby, all instance variables are protected from direct access from outside the object that contains them. In contrast, all methods are externally accessible. Python is even less restrictive: by default, instance variables are fully accessible from the outside and there is no effective way of protecting them. Even if the programmer declares them as “private” by using a name starting with two underscores, they can still be accessed from the outside as described in Section 2.2.1.

None of these languages provide any support for declaring internal methods that cannot be invoked from outside of the class in which they are defined. This has severe consequences for code evolution: for every change to an existing method the programmer must check all the classes in the whole application to ensure that the change does not break existing code. This is because even if the changed method was intended to be reserved for internal use, there may still be invocations of this method from any other class.

Some Smalltalk dialects attempt to solve this problem by using a special naming convention to specify internal methods. In the Squeak dialect [19], for example, methods whose names begin with pvt are effectively private: the compiler ensures that these messages can be sent only to self. However, this approach prevents access to such internal methods not only from outside their class but also from other objects of the same class. Thus, the pvt convention is a form of object encapsulation. In practice it is often too strict, because it prevents many commonly used data structures and patterns from being implemented². As with Python’s double underscore, this approach is clumsy because changing the access attributes of a method requires renaming it.

² The utility of the pvt feature is reflected in the Squeak image: although this feature has been available for years, only 9 out of about 40 000 methods in the latest Squeak image use it.
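The maintenance scenario of Section 2.2.1, and Python's double-underscore convention, can be illustrated with a short sketch. The class names C and D and the method check follow the text. The validate method, the SafeC/SafeD variants and the returned values are hypothetical and exist only for this illustration.

The sketch shows three things:

- A plain name is silently captured by an unrelated subclass method.
- A double-underscore name is mangled, so the subclass does not override it.
- The mangled helper stays reachable from outside, so the protection is only nominal.

```python
class C:
    """Base class after the refactoring: check is meant to be internal to C."""

    def validate(self, value):
        # Dynamically bound self-send: any subclass defining check captures it.
        return self.check(value)

    def check(self, value):
        return value >= 0


class D(C):
    """Pre-existing subclass that happens to define an unrelated check."""

    def check(self, value):
        return "D's own check"


class SafeC:
    def validate(self, value):
        # Inside the class body this is compiled as self._SafeC__check(value),
        # so it always reaches SafeC's own helper.
        return self.__check(value)

    def __check(self, value):
        return value >= 0


class SafeD(SafeC):
    def __check(self, value):  # stored as _SafeD__check: no override happens
        return "SafeD's own check"


print(D().validate(5))             # D's own check  (the refactoring broke D)
print(SafeD().validate(5))         # True           (SafeC's helper is still used)
print(SafeD()._SafeC__check(-1))   # False          (yet the "private" helper is reachable)
```

Renaming check to __check is exactly the kind of name change the text calls clumsy. If the method later has to become public, every call site must be edited.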

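The strictness of Squeak's pvt convention (Section 2.2.2) can be approximated at run time in Python. This sketch is hypothetical throughout:

- The decorator self_only, the class Node and its methods are invented for illustration.
- The check inspects the calling frame with the CPython facility sys._getframe. Squeak instead enforces the rule in the compiler.

It shows that a self-send succeeds. The same message sent to another instance of the same class is rejected.

```python
import functools
import sys


def self_only(method):
    """Allow the call only when the caller's `self` is the receiver (pvt-like)."""

    @functools.wraps(method)
    def wrapper(receiver, *args, **kwargs):
        caller_locals = sys._getframe(1).f_locals  # frame of the calling code
        if caller_locals.get("self") is not receiver:
            raise PermissionError(f"{method.__name__} may only be sent to self")
        return method(receiver, *args, **kwargs)

    return wrapper


class Node:
    def __init__(self, value):
        self.value = value

    @self_only
    def pvt_value(self):
        return self.value

    def own_value(self):
        return self.pvt_value()  # self-send: allowed

    def peek(self, other):
        return other.pvt_value()  # same class, different object: rejected


a, b = Node(1), Node(2)
print(a.own_value())  # 1
try:
    a.peek(b)
except PermissionError as err:
    print(err)  # pvt_value may only be sent to self
```

Node.peek stands for methods that compare with, or copy from, a peer of the same class, such as equality tests. A rule that only permits sends to self forbids them, which is why the convention is often too strict.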
2.3 The Problem of Fragile Data Structures

Module encapsulation, as described by Snyder and implemented in most modern statically typed programming languages, minimizes the interdependencies between separately written modules. However, it is not fine-grained enough to address the encapsulation problems related to object aliasing [18]. Reasoning about a class in an object-oriented program involves reasoning about the behavior of its instances, and those instances will depend for their correct operation on subobjects instantiated from other classes. If we do not have an object encapsulation mechanism that allows us to prohibit inappropriate manipulations of these subobjects (e.g., through aliases), checking the correctness of a class may require reasoning about the whole program [8].

As an illustration, consider the class Morph, which is the root of the GUI framework in Squeak. Morphs have a hierarchic structure: a Morph contains an instance variable named submorphs, which is a (possibly empty) collection of Morph objects. Morph also implements a few methods such as addMorph:, removeMorph:, and moveMorphToFront:, which add and remove submorphs and change how they are layered. Since a Morph is responsible for properly displaying all of its contents, it must take some additional actions whenever its set of submorphs is changed. It is therefore important that the Morph’s clients always use these methods to modify the submorph collection, rather than doing so directly. With a class-based encapsulation mechanism, the only way to ensure this is to make the reference to the submorphs secret and never pass it out of the parent morph.

Unfortunately, this conflicts with the need of some of Morph’s clients to use the protocol provided in Collection to enumerate the submorphs. As a consequence, the implementor of Morph has to choose between one of the following unpleasant alternatives [20].

Value semantics: implement the method submorphs to return a copy of the submorphs collection. By avoiding aliases, this approach also avoids the problem of inappropriate manipulations through aliases. However, this is not a general solution to our problem because value semantics is not always appropriate and its use can therefore lead to subtle bugs. For example, if no special care is taken, another thread may add or remove submorphs so that the copy returned by the method submorphs is out of date before the client actually uses it. Another problem is that, depending on the usage scenario, this approach can require a large amount of unnecessary copying, thus incurring significant time and space penalties.

Proxies: instead of returning a reference to the submorphs collection itself, return a reference to a proxy [15] that serves as a protecting container for the submorphs collection. The proxy understands only a safe subset of the collection’s methods (e.g., the enumeration protocol). Proxies are laborious to implement without language support, and they introduce a forwarding overhead on every invocation. In addition, they have several methodological drawbacks.

1. To protect the real submorphs object consistently, the programmer must ensure that no reference to it is ever allowed to escape, either from Morph or from the proxy class. A single inadvertently leaked reference, whether through a parameter, return value or exception, defeats the whole scheme.
2. Whenever relevant methods are added, removed, renamed or changed in the class OrderedCollection, the proxy class may also need to be updated to ensure that all the safe messages are correctly forwarded and that unrestricted references to the submorphs object are not passed outside.
3. Subtle semantic problems can arise because of the different identities of the submorphs object and its proxy object.

Fat interfaces: instead of implementing a separate proxy class, one could implement all the necessary enumeration methods directly in class Morph. However, this is really just a variation of the proxy approach and it suffers from similar problems. It also increases the complexity of the already overly complex Morph interface. Most importantly, the standard names of the methods in the enumeration protocol cannot, in general, be used: whereas the submorphs collection understands do:, the parent Morph must instead implement methods like submorphsDo: and boundingPathDo:. The need to use different message names destroys the uniformity of protocol that makes it possible to write polymorphic code, which is one of the major benefits of the object-oriented paradigm.

The implementation of Morph in Squeak currently uses value semantics: the method submorphs returns a copy of the submorph collection. However, to avoid excessive copying, Morph also provides some of the methods that would make up a fat interface. It implements the methods submorphsDo:, submorphsReverseDo:, submorphsIndexOf: and many others directly, although other enumeration methods such as submorphsCollect: are missing.

Morph is by no means unusual: there are many other well-documented examples that show the usefulness of object encapsulation for common data structures and patterns such as stacks and iterators [8, 23].

2.4 Goals

Our goal is to develop an encapsulation mechanism for dynamically typed languages that avoids the problems just described. We seek a mechanism that is expressive, simple, and appropriate for dynamic languages.

• Expressive. Our target mechanism must be expressive enough to solve the problems that we discussed in Sections 2.2 and 2.3. This means that it should facilitate code evolution by minimizing the interdependencies between different classes, and that it should allow one to implement reliable data structures by controlling access to individual objects through aliases.

• Simple. It should add minimal complexity to dynamically typed languages. This means that the semantics of the language should remain simple and that the encapsulation mechanism should make it no harder to understand and reason about programs.

• Appropriate. The absence of type declarations makes programming in dynamically typed languages more lightweight than in their static counterparts. It also makes programming more experimental and incremental; for example, it is possible to execute and test a code fragment before all the type declarations are consistent or even present. Furthermore, the absence of static types allows classes to be reused in diverse and sometimes unanticipated ways. Our target encapsulation mechanism should support this dynamic style of programming. It should not burden the programmer with heavyweight type annotations, it should support an incremental style of programming, and it should support flexible and unanticipated reuse.

3. OUR PROPOSAL IN A NUTSHELL

Object-oriented Encapsulation (OOE) combines the features of class and object encapsulation mechanisms. OOE defines all variables to be local, which means that they are completely hidden and inaccessible from outside of the structure in which they are defined.

The encapsulation mechanisms for methods are based on two cornerstones.

1. OOE uses encapsulation policies [27] to specify the encapsulation properties of classes and objects in a uniform way. Encapsulation policies can be shared among objects and classes. We allow different clients to access a given object or a class through different encapsulation policies; this is accomplished by associating encapsulation policies with object references.

2. OOE defines encapsulation semantics by distinguishing between three different kinds of message send. The distinction is purely syntactic and allows us to define a simple semantics that combines class and object encapsulation.
   • For self-sends and super-sends, distinguished by the keywords self and super, the receiver of the message is statically known to be the current object. Thus, object encapsulation is not relevant: only the encapsulation policies used in the inheritance chain of the receiver’s class decide whether a message send is valid and how it is bound.
   • For object-sends (that is, all messages sent to object references other than self or super), the encapsulation policy that is associated with the target object reference is used to decide whether a message send is valid. In this case the target is treated as a black box that is accessed through its external interface; internal details such as how the target’s class is built from other classes are irrelevant.

4. OBJECT-ORIENTED ENCAPSULATION

In this section we explain in some detail our model for object-oriented encapsulation in dynamically typed languages. For concreteness, we do this in the context of Smalltalk. However, since our proposal relies on only the most fundamental features of a dynamically typed object-oriented language based on message passing, we are convinced that it could also be applied to other languages that fall into this category (e.g., Ruby and Python).

For conciseness, this section presents our model without much discussion of our design decisions and without any attempt to justify them: this material is deferred to Section 5.

4.1 All Variables are Local

One purpose of our model is to control and reduce the dependencies between modules, which we assume to be class definitions. As a first step, we stipulate that instance variables are never visible outside of the class in which they are declared. This means that from within a class definition D, one cannot access, for reading or writing, any instance variables that are defined in another class C, even if C and D are related by inheritance. An immediate consequence of this rule is that the names of the instance variables in a class D can be chosen independently of the names of the instance variables in all the other classes. This provides stronger encapsulation than Smalltalk-80, which allows a method in a subclass to access the instance variables of a superclass.

As an additional restriction, we stipulate that each instance of a class can access only its own fields, and not those of other instances. This restriction is already present in the Smalltalk-80 language.

Besides instance variables, Smalltalk has the concept of class variables [16], which correspond to static fields in Java. As with instance variables, we would like to encapsulate class variables in the module in which they are declared. Therefore, we stipulate that a class variable that is defined in a class C can be accessed only from the class side of C. It cannot be accessed from the instance side of C, nor from the class side of subclasses of C. The Smalltalk jargon “the class side of C” corresponds to the Java terminology “the static members of C”. Thus, in Java terminology, our restriction says that static fields defined in a class C can be accessed only from the static methods of C, but not by non-static methods of C nor by static methods of subclasses of C.

Note that access to instance variables and class variables can be granted through accessor methods (i.e., getters and setters). These apparently severe encapsulation constraints for variables therefore do not affect the kind of abstractions that a programmer can write. However, they do allow us to keep our model uniform and simple: we can now focus exclusively on the encapsulation of methods.

4.2 Using Encapsulation Policies

Every encapsulation model needs a way to specify what access rights should be associated with which methods. This is usually done by defining a set of keywords such as public, private, and protected, which can be used to annotate methods where they are defined. This provides fine-grained control over what can be accessed (individual methods), but very coarse-grained control over who can perform the access (all code in one of a small number of pre-defined categories). One of the key benefits of our model is that we use encapsulation policies to specify access rights. This allows much more precise control over who can perform the access.
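The gain in control over who performs an access can be sketched by modelling an encapsulation policy as a plain mapping from selectors to access rights, which is how the text describes a policy. The sketch rests on several assumptions:

- The right names CALL and OVERRIDE, the two example policies and the permits helper are hypothetical.
- The selectors are borrowed from the Morph example of Section 2.3.
- Nothing here reproduces the policy language or composition operators of [27].

The point is that one collection can be offered to different clients under different policies. A single public/private annotation per method cannot do this.

```python
# Hypothetical access rights; the actual rights are defined in [27].
CALL = "call"          # the selector may be sent to the object
OVERRIDE = "override"  # a subclass may redefine the selector

# A policy maps selectors to granted rights; unlisted selectors get none.
Policy = dict[str, frozenset[str]]

# What the owning Morph may do with its own submorphs collection.
owner_policy: Policy = {
    "do:": frozenset({CALL}),
    "collect:": frozenset({CALL}),
    "add:": frozenset({CALL}),
    "remove:": frozenset({CALL}),
}

# What an external client may do: enumerate, but not add or remove.
client_policy: Policy = {
    "do:": frozenset({CALL}),
    "collect:": frozenset({CALL}),
}


def permits(policy: Policy, selector: str, right: str = CALL) -> bool:
    """Return True if `policy` grants `right` on `selector`."""
    return right in policy.get(selector, frozenset())


print(permits(owner_policy, "add:"))   # True
print(permits(client_policy, "add:"))  # False
print(permits(client_policy, "do:"))   # True
```

Neither policy grants OVERRIDE. That right concerns subclasses, not clients that merely hold a reference.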

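Section 3 states that for self-sends, only the policies in the receiver's inheritance chain decide whether a message send is valid and how it is bound. One crude way to approximate a policy that withholds the override right is to reject offending subclasses when they are created. This addresses the check scenario of Section 2.2.1.

This is a Python approximation under loudly stated assumptions:

- The base class Encapsulated and its sealed attribute are hypothetical names.
- Rejecting the subclass outright is a simplification. OOE's own semantics for such a policy may differ, for example by statically binding the self-send instead.

```python
class Encapsulated:
    """Hypothetical base: a class lists selectors its subclasses may not override."""

    sealed: frozenset = frozenset()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Gather the sealed selectors declared by every ancestor of cls.
        inherited = set()
        for base in cls.__mro__[1:]:
            inherited |= base.__dict__.get("sealed", frozenset())
        clashes = inherited & set(cls.__dict__)
        if clashes:
            raise TypeError(f"{cls.__name__} may not override {sorted(clashes)}")


class C(Encapsulated):
    sealed = frozenset({"check"})  # policy: check is callable but not overridable

    def validate(self, value):
        return self.check(value)  # self-send that must keep reaching C.check

    def check(self, value):
        return value >= 0


try:
    class D(C):
        def check(self, value):  # would silently capture C's self-send
            return "D's own check"
except TypeError as err:
    print(err)  # D may not override ['check']
```

Subclasses that do not redefine check are unaffected and can still call it. The error also surfaces when the subclass is defined, not later as a silent change in behavior.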
The concept of encapsulation policies has been described and formalized in a language-independent way in a previous paper presented at ECOOP 2004 [27]. We will not repeat this material here, but will focus on the aspects that are relevant to Object-oriented Encapsulation. A summary of the relationship between this paper and the ECOOP 2004 paper can be found in Section 8.

4.2.1 What is an Encapsulation Policy?

The basic idea behind encapsulation policies is to separate the encapsulation aspect of a class from the implementation aspect of a class. This separation allows these two aspects to be reused independently. The separation is accomplished by introducing a new entity, called an encapsulation policy, which is essentially a mapping from method selectors to access rights. Encapsulation policies are composable: this means that not only can n

• Message send. In Smalltalk, the only thing that can ultimately be done with an object reference is to send it a message. It is the encapsulation policy associated with the object reference that determines whether or not the message send is valid.

• Assignment. When an object is assigned to a variable, passed as an argument or returned from a method, a new object reference is created. The new reference is a copy of the original and has the same encapsulation policy.
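The two rules just listed, for assignment and message send, can be mimicked in Python with a wrapper that pairs a target object with a policy. This is a proxy-based approximation under stated assumptions, not OOE itself:

- The names Ref, send and Morph.submorphs are hypothetical.
- Python method names stand in for Smalltalk selectors.
- The policy is a plain set of allowed selectors.
- Because the wrapper is a separate object, it has a different identity from its target. This is the third proxy drawback listed in Section 2.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical reference: a target object plus the policy it is seen through."""

    target: object
    policy: frozenset  # selectors (method names) that may be sent via this reference

    def send(self, selector, *args):
        # Message send: the policy of *this* reference decides validity.
        if selector not in self.policy:
            raise PermissionError(f"{selector!r} not permitted by this reference")
        return getattr(self.target, selector)(*args)


class Morph:
    def __init__(self):
        self._submorphs = []  # internal: only Morph should mutate it

    def add_morph(self, morph):
        self._submorphs.append(morph)  # a real Morph would also redisplay here

    def submorphs(self):
        # Clients get an enumeration-only reference instead of a copy.
        return Ref(self._submorphs, frozenset({"__iter__", "__len__", "index"}))


parent = Morph()
parent.add_morph("child-1")
view = parent.submorphs()
alias = view  # assignment: Ref is immutable, so sharing it equals copying it with the same policy
print(alias.send("__len__"))         # 1
print(list(alias.send("__iter__")))  # ['child-1']
try:
    alias.send("append", "intruder")  # mutation that bypasses add_morph
except PermissionError as err:
    print(err)  # 'append' not permitted by this reference
```

Nothing stops a Python client from reading view.target directly. That is precisely the leak described in the first proxy drawback, and the kind of hole that built-in language support is meant to close.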
