
Distributed Computing: Principles, Algorithms, and Systems

Distributed computing deals with all forms of computing, information access, and information exchange across multiple processing platforms connected by computer networks. Design of distributed computing systems is a complex task. It requires a solid understanding of the design issues and an in-depth understanding of the theoretical and practical aspects of their solutions. This comprehensive textbook covers the fundamental principles and models underlying the theory, algorithms, and systems aspects of distributed computing.

Broad and detailed coverage of the theory is balanced with practical systems-related problems such as mutual exclusion, deadlock detection, authentication, and failure recovery. Algorithms are carefully selected, lucidly presented, and described without complex proofs. Simple explanations and illustrations are used to elucidate the algorithms. Emerging topics of significant impact, such as peer-to-peer networks and network security, are also covered.

With state-of-the-art algorithms, numerous illustrations, examples, and homework problems, this textbook is invaluable for advanced undergraduate and graduate students of electrical and computer engineering and computer science. Practitioners in data networking and sensor networks will also find this a valuable resource.

Ajay D. Kshemkalyani is an Associate Professor in the Department of Computer Science at the University of Illinois at Chicago. He was awarded his Ph.D. in Computer and Information Science in 1991 from The Ohio State University. Before moving to academia, he spent several years working on computer networks at IBM Research Triangle Park. In 1999, he received the National Science Foundation's CAREER Award. He is a Senior Member of the IEEE, and his principal areas of research include distributed computing, algorithms, computer networks, and concurrent systems. He currently serves on the editorial board of Computer Networks.

Mukesh Singhal is Full Professor and Gartner Group Endowed Chair in Network Engineering in the Department of Computer Science at the University of Kentucky. He was awarded his Ph.D. in Computer Science in 1986 from the University of Maryland, College Park. In 2003, he received the IEEE Technical Achievement Award, and currently serves on the editorial boards for the IEEE Transactions on Parallel and Distributed Systems and the IEEE Transactions on Computers. He is a Fellow of the IEEE, and his principal areas of research include distributed systems, computer networks, wireless and mobile computing systems, performance evaluation, and computer security.

Distributed Computing
Principles, Algorithms, and Systems

Ajay D. Kshemkalyani
University of Illinois at Chicago, Chicago

and

Mukesh Singhal
University of Kentucky, Lexington

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521876346

© Cambridge University Press 2008

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2008

ISBN-13 978-0-511-39341-9 eBook (EBL)
ISBN-13 978-0-521-87634-6 hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

To my father Shri Digambar and my mother Shrimati Vimala.
Ajay D. Kshemkalyani

To my mother Chandra Prabha Singhal, my father Brij Mohan Singhal, and my daughters Meenakshi, Malvika, and Priyanka.
Mukesh Singhal

Contents

Preface

1 Introduction: Definition; Relation to computer system components; Motivation; Relation to parallel multiprocessor/multicomputer systems; Message-passing systems versus shared memory systems; Primitives for distributed communication; Synchronous versus asynchronous executions; Design issues and challenges; Selection and coverage of topics; Chapter summary; Exercises; Notes on references; References

2 A model of distributed computations: A distributed program; A model of distributed executions; Models of communication networks; Global state of a distributed system; Cuts of a distributed computation; Past and future cones of an event; Models of process communications; Chapter summary; Exercises; Notes on references; References

3 Logical time: Introduction; A framework for a system of logical clocks; Scalar time; Vector time; Efficient implementations of vector clocks; Jard–Jourdan's adaptive technique; Matrix time; Virtual time; Physical clock synchronization: NTP; Chapter summary; Exercises; Notes on references; References

4 Global state and snapshot recording algorithms: Introduction; System model and definitions; Snapshot algorithms for FIFO channels; Variations of the Chandy–Lamport algorithm; Snapshot algorithms for non-FIFO channels; Snapshots in a causal delivery system; Monitoring global state; Necessary and sufficient conditions for consistent global snapshots; Finding consistent global snapshots in a distributed computation; Chapter summary; Exercises; Notes on references; References

5 Terminology and basic algorithms: Topology abstraction and overlays; Classifications and basic concepts; Complexity measures and metrics; Program structure; Elementary graph algorithms; Synchronizers; Maximal independent set (MIS); Connected dominating set; Compact routing tables; Leader election; Challenges in designing distributed graph algorithms; Object replication problems; Chapter summary; Exercises; Notes on references; References

6 Message ordering and group communication: Message ordering paradigms; Asynchronous execution with synchronous communication; Synchronous program order on an asynchronous system; Group communication; Causal order (CO); Total order; A nomenclature for multicast; Propagation trees for multicast; Classification of application-level multicast algorithms; Semantics of fault-tolerant group communication; Distributed multicast algorithms at the network layer; Chapter summary; Exercises; Notes on references; References

7 Termination detection: Introduction; System model of a distributed computation; Termination detection using distributed snapshots; Termination detection by weight throwing; A spanning-tree-based termination detection algorithm; Message-optimal termination detection; Termination detection in a very general distributed computing model; Termination detection in the atomic computation model; Termination detection in a faulty distributed system; Chapter summary; Exercises; Notes on references; References

8 Reasoning with knowledge: The muddy children puzzle; Logic of knowledge; Knowledge in synchronous systems; Knowledge in asynchronous systems; Knowledge transfer; Knowledge and clocks; Chapter summary; Exercises; Notes on references; References

9 Distributed mutual exclusion algorithms: Introduction; Preliminaries; Lamport's algorithm; Ricart–Agrawala algorithm; Singhal's dynamic information-structure algorithm; Lodha and Kshemkalyani's fair mutual exclusion algorithm; Quorum-based mutual exclusion algorithms; Maekawa's algorithm; Agarwal–El Abbadi quorum-based algorithm; Token-based algorithms; Suzuki–Kasami's broadcast algorithm; Raymond's tree-based algorithm; Chapter summary; Exercises; Notes on references; References

10 Deadlock detection in distributed systems: Introduction; System model; Preliminaries; Models of deadlocks; Knapp's classification of distributed deadlock detection algorithms; Mitchell and Merritt's algorithm for the single-resource model; Chandy–Misra–Haas algorithm for the AND model; Chandy–Misra–Haas algorithm for the OR model; Kshemkalyani–Singhal algorithm for the P-out-of-Q model; Chapter summary; Exercises; Notes on references; References

11 Global predicate detection: Stable and unstable predicates; Modalities on predicates; Centralized algorithm for relational predicates; Conjunctive predicates; Distributed algorithms for conjunctive predicates; Further classification of predicates; Chapter summary; Exercises; Notes on references; References

12 Distributed shared memory: Abstraction and advantages; Memory consistency models; Shared memory mutual exclusion; Wait-freedom; Register hierarchy and wait-free simulations; Wait-free atomic snapshots of shared objects; Chapter summary; Exercises; Notes on references; References

13 Checkpointing and rollback recovery: Introduction; Background and definitions; Issues in failure recovery; Checkpoint-based recovery; Log-based rollback recovery; Koo–Toueg coordinated checkpointing algorithm; Juang–Venkatesan algorithm for asynchronous checkpointing and recovery; Manivannan–Singhal quasi-synchronous checkpointing algorithm; Peterson–Kearns algorithm based on vector time; Helary–Mostefaoui–Netzer–Raynal communication-induced protocol; Chapter summary; Exercises; Notes on references; References

14 Consensus and agreement algorithms: Problem definition; Overview of results; Agreement in a failure-free system (synchronous or asynchronous); Agreement in (message-passing) synchronous systems with failures; Agreement in asynchronous message-passing systems with failures; Wait-free shared memory consensus in asynchronous systems; Chapter summary; Exercises; Notes on references; References

15 Failure detectors: Introduction; Unreliable failure detectors; The consensus problem; Atomic broadcast; A solution to atomic broadcast; The weakest failure detectors to solve fundamental agreement problems; An implementation of a failure detector; An adaptive failure detection protocol; Exercises; Notes on references; References

16 Authentication in distributed systems: Introduction; Background and definitions; Protocols based on symmetric cryptosystems; Protocols based on asymmetric cryptosystems; Password-based authentication; Authentication protocol failures; Chapter summary; Exercises; Notes on references; References

17 Self-stabilization: Introduction; System model; Definition of self-stabilization; Issues in the design of self-stabilization algorithms; Methodologies for designing self-stabilizing systems; Communication protocols; Self-stabilizing distributed spanning trees; Self-stabilizing algorithms for spanning-tree construction; An anonymous self-stabilizing algorithm for 1-maximal independent set in trees; A probabilistic self-stabilizing leader election algorithm; The role of compilers in self-stabilization; Self-stabilization as a solution to fault tolerance; Factors preventing self-stabilization; Limitations of self-stabilization; Chapter summary; Exercises; Notes on references; References

18 Peer-to-peer computing and overlay graphs: Introduction; Data indexing and overlays; Unstructured overlays; Chord distributed hash table; Content addressable networks (CAN); Tapestry; Some other challenges in P2P system design; Tradeoffs between table storage and route lengths; Graph structures of complex networks; Internet graphs; Generalized random graph networks; Small-world networks; Scale-free networks; Evolving networks; Chapter summary; Exercises; Notes on references; References

Index

Preface

Background

The field of distributed computing covers all aspects of computing and information access across multiple processing elements connected by any form of communication network, whether local or wide-area in coverage. Since the advent of the Internet in the 1970s, there has been a steady growth of new applications requiring distributed processing. This has been enabled by advances in networking and hardware technology, the falling cost of hardware, and greater end-user awareness. These factors have contributed to making distributed computing a cost-effective, high-performance, and fault-tolerant reality. Around the turn of the millennium, there was an explosive growth in the expansion and efficiency of the Internet, which was matched by increased access to networked resources through the World Wide Web, all across the world. Coupled with an equally dramatic growth in the wireless and mobile networking areas, and the plummeting prices of bandwidth and storage devices, we are witnessing a rapid spurt in distributed applications and an accompanying interest in the field of distributed computing in universities, government organizations, and private institutions.

Advances in hardware technology have suddenly made sensor networking a reality, and embedded and sensor networks are rapidly becoming an integral part of everyone's life – from the home network with its interconnected gadgets, to the automobile communicating by GPS (global positioning system), to the fully networked office with RFID monitoring. In the emerging global village, distributed computing will be the centerpiece of all computing and information access sub-disciplines within computer science. Clearly, this is a very important field. Moreover, this evolving field is characterized by a diverse range of challenges for which the solutions need to have foundations on solid principles.

The field of distributed computing is very important, and there is a huge demand for a good comprehensive book. This book comprehensively covers all important topics in great depth, combining this with a clarity of explanation and ease of understanding. The book will be particularly valuable to the academic community and the computer industry at large. Writing such a comprehensive book has been a Herculean task, and there is a deep sense of satisfaction in knowing that we were able to complete it and perform this service to the community.

Description, approach, and features

The book focuses on the fundamental principles and models underlying all aspects of distributed computing. It addresses the principles underlying the theory, algorithms, and systems aspects of distributed computing. The manner of presentation of the algorithms is very clear, explaining the main ideas and the intuition with figures and simple explanations rather than getting entangled in intimidating notations and lengthy and hard-to-follow rigorous proofs of the algorithms. The selection of chapter themes is broad and comprehensive, and the book covers all important topics in depth. The selection of algorithms within each chapter has been done carefully to elucidate new and important techniques of algorithm design. Although the book focuses on foundational aspects and algorithms for distributed computing, it thoroughly addresses all practical systems-like problems (e.g., mutual exclusion, deadlock detection, termination detection, failure recovery, authentication, global state and time, etc.) by presenting the theory behind and algorithms for such problems. The book is written keeping in mind the impact of emerging topics such as peer-to-peer computing and network security on the foundational aspects of distributed computing.

Each chapter contains figures, examples, exercises, a summary, and references.

Readership

This book is aimed as a textbook for the following:

- Graduate students and senior-level undergraduate students in computer science and computer engineering.
- Graduate students in electrical engineering and mathematics. As wireless networks, peer-to-peer networks, and mobile computing continue to grow in importance, an increasing number of students from electrical engineering departments will also find this book necessary.
- Practitioners, systems designers/programmers, and consultants in industry and research laboratories, who will find the book a very useful reference because it contains state-of-the-art algorithms and principles to address various design issues in distributed systems, as well as the latest references.

Hard and soft prerequisites for the use of this book include the following:

- An undergraduate course in algorithms is required.
- Undergraduate courses in operating systems and computer networks would be useful.
- A reasonable familiarity with programming.

We have aimed for a very comprehensive book that will act as a single source for distributed computing models and algorithms. The book has both depth and breadth of coverage of topics, and is characterized by clear and easy explanations. None of the existing textbooks on distributed computing provides all of these features.

Acknowledgements

This book grew from the notes used in the graduate courses on distributed computing at the Ohio State University, the University of Illinois at Chicago, and at the University of Kentucky. We would like to thank the graduate students at these schools for their contributions to the book in many ways. The book is based on the published research results of numerous researchers in the field. We have made all efforts to present the material in our own words and have given credit to the original sources of information. We would like to thank all the researchers whose work has been reported in this book. Finally, we would like to thank the staff of Cambridge University Press for providing us with excellent support in the publication of this book.

Access to resources

The following websites will be maintained for the book. Any errors and comments should be sent to ajayk@cs.uic.edu or singhal@cs.uky.edu. Further information about the book can be obtained from the authors' web pages:

- www.cs.uic.edu/~ajayk/DCS-Book
- www.cs.uky.edu/~singhal/DCS-Book

CHAPTER 1

Introduction

1.1 Definition

A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be individually solved. Distributed systems have been in existence since the start of the universe. From a school of fish to a flock of birds and entire ecosystems of microorganisms, there is communication among mobile intelligent agents in nature. With the widespread proliferation of the Internet and the emerging global village, the notion of distributed computing systems as a useful and widely deployed tool is becoming a reality. For computing systems, a distributed system has been characterized in one of several ways:

- You know you are using one when the crash of a computer you have never heard of prevents you from doing work [23].
- A collection of computers that do not share common memory or a common physical clock, that communicate by messages passing over a communication network, and where each computer has its own memory and runs its own operating system. Typically the computers are semi-autonomous and are loosely coupled while they cooperate to address a problem collectively [29].
- A collection of independent computers that appears to the users of the system as a single coherent computer [33].
- A term that describes a wide range of computers, from weakly coupled systems such as wide-area networks, to strongly coupled systems such as local area networks, to very strongly coupled systems such as multiprocessor systems [19].

A distributed system can be characterized as a collection of mostly autonomous processors communicating over a communication network and having the following features:

- No common physical clock This is an important assumption because it introduces the element of "distribution" in the system and gives rise to the inherent asynchrony amongst the processors.

- No shared memory This is a key feature that requires message-passing for communication. This feature implies the absence of the common physical clock. It may be noted that a distributed system may still provide the abstraction of a common address space via the distributed shared memory abstraction. Several aspects of shared memory multiprocessor systems have also been studied in the distributed computing literature.
- Geographical separation The geographically wider apart that the processors are, the more representative is the system of a distributed system. However, it is not necessary for the processors to be on a wide-area network (WAN). Recently, the network/cluster of workstations (NOW/COW) configuration connecting processors on a LAN is also being increasingly regarded as a small distributed system. This NOW configuration is becoming popular because of the low-cost high-speed off-the-shelf processors now available. The Google search engine is based on the NOW architecture.
- Autonomy and heterogeneity The processors are "loosely coupled" in that they have different speeds and each can be running a different operating system. They are usually not part of a dedicated system, but cooperate with one another by offering services or solving a problem jointly.

1.2 Relation to computer system components

A typical distributed system is shown in Figure 1.1. Each computer has a memory-processing unit and the computers are connected by a communication network. Figure 1.2 shows the relationships of the software components that run on each of the computers and use the local operating system and network protocol stack for functioning. The distributed software is also termed as middleware. A distributed execution is the execution of processes across the distributed system to collaboratively achieve a common goal. An execution is also sometimes termed a computation or a run.

[Figure 1.1: A distributed system connects processors by a communication network. Each node has processor(s) P and memory bank(s) M; the communication network may be a WAN or a LAN.]

The distributed system uses a layered architecture to break down the complexity of system design. The middleware is the distributed software that drives the distributed system, while providing transparency of heterogeneity at the platform level [24]. Figure 1.2 schematically shows the interaction of this software with these system components at each processor. Here we assume that the middleware layer does not contain the traditional application layer functions of the network protocol stack, such as http, mail, ftp, and telnet.

[Figure 1.2: Interaction of the software components at each processor. The distributed application runs on the distributed software (middleware libraries); below them sit the application layer, the operating system, and the network protocol stack (transport, network, and data link layers). The extent of the distributed protocols spans the application and middleware levels.]

Various primitives and calls to functions defined in various libraries of the middleware layer are embedded in the user program code. There exist several libraries to choose from to invoke primitives for the more common functions – such as reliable and ordered multicasting – of the middleware layer. There are several standards such as the Object Management Group's (OMG) common object request broker architecture (CORBA) [36], and the remote procedure call (RPC) mechanism [1, 11]. The RPC mechanism conceptually works like a local procedure call, with the difference that the procedure code may reside on a remote machine, and the RPC software sends a message across the network to invoke the remote procedure. It then awaits a reply, after which the procedure call completes from the perspective of the program that invoked it. Currently deployed commercial versions of middleware often use CORBA, DCOM (distributed component object model), Java, and RMI (remote method invocation) [7] technologies. The message-passing interface (MPI) [20, 30] developed in the research community is an example of an interface for various communication functions.

1.3 Motivation

The motivation for using a distributed system is some or all of the following requirements:

1.
Inherently distributed computations In many applications such as money transfer in banking, or reaching consensus among parties that are geographically distant, the computation is inherently distributed.

2. Resource sharing Resources such as peripherals, complete data sets in databases, special libraries, as well as data (variable/files) cannot be fully replicated at all the sites because it is often neither practical nor cost-effective. Further, they cannot be placed at a single site because access to that site might prove to be a bottleneck. Therefore, such resources are typically distributed across the system. For example, distributed databases such as DB2 partition the data sets across several servers, in addition to replicating them at a few sites for rapid access as well as reliability.

3. Access to geographically remote data and resources In many scenarios, the data cannot be replicated at every site participating in the distributed execution because it may be too large or too sensitive to be replicated. For example, payroll data within a multinational corporation is both too large and too sensitive to be replicated at every branch office/site. It is therefore stored at a central server which can be queried by branch offices. Similarly, special resources such as supercomputers exist only in certain locations, and to access such supercomputers, users need to log in remotely.

Advances in the design of resource-constrained mobile devices as well as in the wireless technology with which these devices communicate have given further impetus to the importance of distributed protocols and middleware.

4. Enhanced reliability A distributed system has the inherent potential to provide increased reliability because of the possibility of replicating resources and executions, as well as the reality that geographically distributed resources are not likely to crash/malfunction at the same time under normal circumstances.
Reliability entails several aspects:

- availability, i.e., the resource should be accessible at all times;
- integrity, i.e., the value/state of the resource should be correct, in the face of concurrent access from multiple processors, as per the semantics expected by the application;
- fault-tolerance, i.e., the ability to recover from system failures, where such failures may be defined to occur in one of many failure models, which we will study in Chapters 5 and 14.

5. Increased performance/cost ratio By resource sharing and accessing geographically remote data and resources, the performance/cost ratio is increased. Although higher throughput has not necessarily been the main objective behind using a distributed system, nevertheless, any task can be partitioned across the various computers in the distributed system. Such a configuration provides a better performance/cost ratio than using special parallel machines. This is particularly true of the NOW configuration.

In addition to meeting the above requirements, a distributed system also offers the following advantages:

6. Scalability As the processors are usually connected by a wide-area network, adding more processors does not pose a direct bottleneck for the communication network.

7. Modularity and incremental expandability Heterogeneous processors may be easily added into the system without affecting the performance, as long as those processors are running the same middleware algorithms. Similarly, existing processors may be easily replaced by other processors.

1.4 Relation to parallel multiprocessor/multicomputer systems

The characteristics of a distributed system were identified above. A typical distributed system would look as shown in Figure 1.1. However, how does one classify a system that meets some but not all of the characteristics? Is the system still a distributed system, or does it become a parallel multiprocessor system? To better answer these questions, we first examine the architecture of parallel systems, and then examine some well-known taxonomies for multiprocessor/multicomputer systems.

1.4.1 Characteristics of parallel systems

A parallel system may be broadly classified as belonging to one of three types:

1. A multiprocessor system is a parallel system in which the multiple processors have direct access to shared memory which forms a common address space. The architecture is shown in Figure 1.3(a). Such processors usually do not have a common clock.

A multiprocessor system usually corresponds to a uniform memory access (UMA) architecture in which the access latency, i.e., waiting time, to complete an access to any memory location from any processor is the same. The processors are in very close physical proximity and are connected by an interconnection network. Interprocess communication across processors is traditionally through read and write operations on the shared memory, although the use of message-passing primitives such as those provided by

[Figure 1.3: Two standard architectures for parallel systems. (a) A uniform memory access (UMA) multiprocessor system: processors (P) access shared memory banks (M) through an interconnection network. (b) A non-uniform memory access (NUMA) multiprocessor: each processor has a local memory, and the processor–memory pairs are connected by an interconnection network. In both architectures, the processors may locally cache data from memory.]

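Returning to the RPC mechanism described in Section 1.2: its request–reply behavior can be sketched with Python's standard-library XML-RPC modules. This is a minimal illustration, not the book's own example; the registered function `add` and the use of a loopback server in a background thread are assumptions made for the sketch.

```python
# Minimal RPC sketch: the caller invokes what looks like a local procedure,
# while the RPC layer sends a request message to the server and blocks
# until the reply arrives. The function `add` and the loopback server are
# illustrative choices, not from the text.
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    # Executes on the "remote" side (here, a server thread in-process).
    return a + b

# Bind to an ephemeral port to avoid collisions with other services.
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(add, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# From the caller's perspective this is an ordinary procedure call; under
# the hood a message crosses the (loopback) network, and the call completes
# only when the reply returns -- exactly the behavior described in the text.
proxy = ServerProxy("http://localhost:%d" % port)
result = proxy.add(2, 3)
print(result)  # 5
server.shutdown()
```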