Architecting Cloud Workflow: Theory And Practice


Yong Zhao, Youfu Li
School of Computer Science and Engineering
Univ. of Electronic and Science Technology of China
Chengdu, China
{yongzh04, youfuli.fly}@gmail.com

Ioan Raicu
Department of Computer Science
Illinois Institute of Technology
Chicago, USA
iraicu@iit.edu

Shiyong Lu
Department of Computer Science
Wayne State University
Detroit, USA
shiyong@wayne.edu

Xuan Zhang
School of Computer Science and Engineering
Univ. of Electronic and Science Technology of China
Chengdu, China
Jossie.bunny@gmail.com

Abstract: The data scale and the complexity of scientific analysis and processing in the scientific community are growing exponentially in the “big data” era. The Cloud computing paradigm has been widely adopted to provide unprecedented scalability and resources on demand, while scientific workflow management systems (SWFMSs) have proven essential to scientific computing and services computing. Uniting the advantages of both Cloud computing and SWFMSs can bring researchers a valuable solution to the scientific “big data” problem. Although a series of works has concentrated on integrating SWFMSs with Cloud platforms, providing much experience for future research and development, a study from an architectural perspective is still missing. The main contributions of this paper are: 1) based on a comprehensive survey of the available integration options, we propose a service framework for integrating SWFMSs with Cloud computing; 2) we implement the service framework on various Cloud platforms to validate the feasibility of the proposed framework; and 3) we conduct a set of experiments to demonstrate its capability and use a NASA MODIS image processing workflow as a showcase of the implementation.

Keywords: Cloud Workflow; Service Framework; Workflow-as-a-Service; Swift; OpenNebula; Eucalyptus

I. INTRODUCTION

Industrial and scientific communities are facing a “data deluge” [7] coming from products, sensors, satellites, experiments and simulations.
Scientists, manufacturers and developers are attempting a variety of methods to deal with the ever-increasing computing and storage problems arising in the “big data” era. As an emerging computing paradigm, Cloud computing [6] is gaining tremendous momentum in both academia and industry. Scientific workflow management systems (SWFMSs) have proven essential to scientific computing and services computing, as they provide functionalities such as workflow specification, process coordination, job scheduling and execution, provenance tracking, and fault tolerance.

Uniting the advantages of both Cloud computing and SWFMSs can bring researchers a valuable solution to the scientific “big data” problem. The Cloud offers unprecedented scalability to workflow systems, and could potentially change the way we perceive and conduct scientific experiments. The scale and complexity of the science problems that can be handled can be greatly increased on the Cloud, and on-demand resource allocation on the Cloud will also help improve resource utilization and user experience.

This paper is supported by the National Science Foundation of China No. 61034005 and No. 61272528.

In many cases, large simulations are organized as scientific workflows that run on Distributed Computing Infrastructures (DCIs), and workflow management systems are diverse in many aspects, such as workflow models, workflow languages, and workflow engines. Often one workflow engine depends on one specific DCI, so porting a workflow management system to run on another DCI may require considerable extra effort. In practice, researchers may therefore choose to integrate a specific SWFMS into a particular Cloud, whichever takes the minimum effort to migrate.
We expect that the availability of a service framework can break the limitation that a specific SWFMS is bound to a particular Cloud environment, and can offer guidance for the architectural design of integrating SWFMSs into Cloud platforms. To address this issue, we make the following contributions:

First, we propose a generic service framework to integrate SWFMSs with various Cloud-based DCIs, which covers a wide spectrum from workflow management and migration into Clouds to task scheduling, Cloud resource management, and virtual resource provisioning and recycling.

Second, following the reference service framework, we implement the framework based on a set of open-source and self-implemented systems to validate the feasibility of the proposed framework.

Third, we conduct a series of experiments to demonstrate its capability and use a NASA MODIS image processing workflow as a showcase of the implementation.

II. RELATED WORK

The deployment and management of workflows over the existing heterogeneous and not yet interoperable Cloud providers is still a challenging task for workflow developers. The series of works [5] [19] presented a broker-based framework to support the execution of workflow applications in a multi-Cloud environment. Bhaskar Prasad Rimal et al. [4] discussed a scientific workflow framework for a multi-tenant cloud orchestration environment that deals with semantic-based as well as policy-based workflows. To isolate each tenant, they designed three layers of metadata, including tenant-specific metadata, common metadata and data, and maintained them in metadata repositories that were shared between tenants.

The CODA framework [3] was designed and implemented to support big data analytics in cloud computing. Important functions, such as workflow scheduling, data locality, resource provisioning, and monitoring, have been integrated into the framework. Through the CODA framework, workflows can be easily composed and efficiently executed in Amazon EC2. Sunflower [13] was an adaptive P2P agent-based framework for configuring, enacting, managing and adapting autonomic workflows on hybrid Grid-Cloud infrastructures. To orchestrate Grid and Cloud services, Sunflower utilized a bio-inspired autonomic choreography model and integrated the scheduling algorithm with a provisioning component that can dynamically launch virtual machines in a Cloud infrastructure to provide on-demand services in peak-load situations.

To address the performance and cost issues of big data processing on clouds, Long Wang et al. [15] presented a novel design of an adaptive workflow management system, which included a data mining based prediction model, a workflow scheduler, and iteration controls to optimize data processing via iterative workflow tasks.

A workflow-oriented cloud computing framework, called WfOC [14], was introduced to support workflow-oriented applications on multiple data centers. This framework included a workflow-oriented cloud computing programming language, task extraction and composition, task and data source registration, task function mappers/reducers and other components, and enabled users to focus on workflow definition and the implementation of workflow task logic without needing to worry about the distribution of data and target execution systems.

Xiao Liu et al. [16] proposed a generic QoS framework for cloud workflow systems that covers the major stages of a workflow lifecycle. The framework consisted of four components: 1) QoS requirement specification, 2) QoS-aware service selection, 3) QoS consistency monitoring, and 4) QoS violation handling. They also illustrated a concrete performance framework as a case study and evaluated its effectiveness in their cloud workflow system.

The works mentioned above mainly focused on different aspects of deploying and managing workflows in Clouds, including underlying resource allocation, function implementation, service evaluation, and performance and cost issues. However, a normalized, service-oriented integration framework is still missing. As running scientific workflows as a service on Cloud platforms involves a variety of systems and techniques, researching and designing a service-oriented framework can help to standardize the integration procedure and the interaction between essential systems.

III. SERVICE FRAMEWORK

In this section, we first present the available options for running scientific workflows within a Cloud environment, based on the different layers of SWFMSs. Then we discuss the service framework and analyze its details from different aspects, including layers, subsystems and interfaces.

A. Available Options

The reference architecture for SWFMSs [22] was proposed as an endeavor to standardize SWFMS research and development efforts. As shown in Fig. 1, the reference architecture consists of 4 logical layers, 7 major functional subsystems, and 6 interfaces. The first layer is the Operational Layer, which consists of a wide range of heterogeneous and distributed data sources, software tools, services, and their operational environments, including high-end computing environments. The second layer is called the Task Management Layer. This layer consists of three subsystems: Data Product Management, Provenance Management, and Task Management. The third layer, called the Workflow Management Layer, consists of Workflow Engine and Workflow Monitoring. Finally, the fourth layer, the Presentation Layer, consists of the Workflow Design subsystem and the Presentation and Visualization subsystem. The reference architecture allows the scientific workflow community to focus on different layers and subsystems of SWFMSs, and also enables such systems to interact and interoperate with each other based on the interface definitions.

Fig. 1. A reference architecture for SWFMSs

We argue that the above reference architecture is still valid for a Cloud-enabled SWFMS. Here, we consider four possible solutions for deploying the proposed reference architecture in a Cloud computing environment:

1) Operational-Layer-in-the-Cloud. In this solution, only the Operational Layer lies in the Cloud, with the SWFMS running outside of the Cloud. An SWFMS can now leverage Cloud
applications as another type of task component. In contrast to other applications, Cloud-based applications can take advantage of the high scalability provided by the Cloud and the virtually unlimited resource capacity provisioned by large data centers. This solution also relieves a user of the concern of vendor lock-in, due to the relative ease of using alternative Cloud platforms for running Cloud applications. However, the SWFMS itself cannot benefit from the scalability offered by the Cloud.

2) Task-Management-Layer-in-the-Cloud. In this solution, both the Operational Layer and the Task Management Layer are deployed in the Cloud. In contrast to traditional deployment strategies, Data Product Management, Provenance Management, and Task Management can now leverage the high scalability provided by the Cloud. In particular, Data Product Management and Provenance Management can take advantage of the data models provided by the Cloud, such as the blobs, tables, and queues provided by Microsoft Azure. Meanwhile, rather than accommodating the user's requests through a batch-based scheduling system, Task Management can now deploy all ready tasks immediately over Cloud computing nodes for execution, instead of leaving them waiting in a job queue for resources to become available. One limitation of this solution is the economic cost associated with the storage of provenance and data products in the Cloud. Possible workflow tasks might also be restricted to the types of applications and environments (VM instances created from images) that are supported by a particular Cloud infrastructure, which is yet to be standardized. Moreover, although task scheduling and management can benefit from the scalability offered by the Cloud, workflow scheduling and management cannot, since the workflow engine runs outside of the Cloud.

3) Workflow-Management-Layer-in-the-Cloud. In this solution, the Operational Layer, the Task Management Layer, and the Workflow Management Layer are deployed in the Cloud, with the Presentation Layer deployed at a client machine. This solution provides a good balance between system performance and usability: the management of computation, data, storage and other resources is encapsulated in the Cloud, while the Presentation Layer remains at the client machine to support the key architectural requirement of user interface customizability and user interaction support [8]. Such a solution is also most suitable for a scientific workflow application system in which ad hoc domain-specific requirements are constantly evolving, demanding constant changes to the Presentation Layer for that domain. In this solution, both workflow and task management can benefit from the scalability offered by the Cloud, but the downside is that they become more dependent on the Cloud platform over which they run.

4) All-in-the-Cloud. In this solution, a whole SWFMS is deployed inside the Cloud and is accessible via a Web browser. A distinct feature of this solution is that no software installation is needed for a scientist to use an SWFMS, and an SWFMS can take full advantage of all the services provided in a Cloud infrastructure. Moreover, Cloud-based SWFMSs can provide highly scalable scientific workflow and task management as services, providing one kind of Software-as-a-Service (SaaS). Concerns the user might have are the economic cost associated with the necessity of using the Cloud on a daily basis, the dependency on the availability and reliability of the Cloud, and the risk associated with vendor lock-in. One way to address such concerns is to use an on-premise Cloud or a hybrid Cloud, in which public Clouds are used only for shifting out peak workloads.

As described, each of the above solutions has its pros and cons. In practice, a hybrid approach might be desirable, in which, for each layer, one subsystem or a piece of a subsystem is deployed in the Cloud while the rest is deployed outside of the Cloud. For each solution, a refined microarchitecture for each layer and subsystem is an important research problem. We envision that in the future, many solution instances of the proposed reference architecture will coexist, each optimized for a particular deployment strategy. Meanwhile, as each solution instance conforms to the same reference architecture, interoperability is ensured.

Fig. 2. The Service Framework

B. Service Framework

For easy integration with a Cloud platform, a “Task-Management-Layer-in-the-Cloud” approach can be chosen by implementing, for instance, an “Amazon EC2” provider for Swift, so that tasks in a Swift workflow can be submitted into
EC2 and executed on EC2 VM instances. However, this approach would leave most of the workflow management and dynamic resource scaling outside the Cloud. We would like to free application developers from complicated Cloud resource configuration and provisioning issues, and give them convenient and transparent access to scalable Cloud resources. We therefore choose to take the […] approach, which requires minimal configuration at the client side and supports easy deployment with virtualization techniques.

We propose a structured service framework that covers all the major aspects involved in the migration and integration of SWFMSs into the Cloud, from client-side workflow specification, service-based workflow submission and management, and task scheduling and execution, to Cloud resource management and provisioning. As illustrated in Fig. 2b, the service framework includes 4 layers, 8 components and 6 interfaces. Fig. 2a shows a typical service stack of Cloud computing: on top of the IaaS layer, the WaaS layer is designed to provide workflow as a service for researchers and application developers. We position the WaaS layer across both the SaaS and PaaS layers, because our proposed service framework can also be applied to provide a workflow platform as a service for scientists.

C. Layers

The first layer is the Infrastructure Layer, which consists of multiple Cloud platforms with the underlying server, storage and network resources. This layer provides IaaS-level support to the upper layers, such as the management of the fundamental physical equipment, virtual machines and storage systems. The separation of the Infrastructure Layer from other layers isolates the science-focused and technology-independent problem solving environment from the underlying fast-advancing high-end computing infrastructure.

The second layer is called the Middleware Layer. This layer is responsible for resource management and provisioning, responding to requests from the upper layer, and supporting various scheduling frameworks.
All the operations that need to access the underlying resources are encapsulated in this layer. In terms of the options described in the Available Options section, this layer fulfills the requirements of the Task-Management-Layer-in-the-Cloud option. Moreover, the separation of the Middleware Layer from the Infrastructure Layer promotes the extensibility of the Infrastructure Layer with new Cloud platforms and new high-end computing facilities, and localizes system evolution due to hardware or software advances to the interface between the Infrastructure Layer and the Middleware Layer.

The third layer is the Service Layer, which is responsible for providing scientific workflow management as a service to the upper clients and for realizing the execution and monitoring of scientific workflows. This layer also provides interfaces to support various workflow engines. In terms of the integration options, the Service Layer fulfills the requirements addressed in the Workflow-Management-Layer-in-the-Cloud option. The separation of the Service Layer from the Middleware Layer concerns two aspects: 1) it isolates the choice of a workflow model from the choice of a task model, so changes to the workflow structure do not need to affect the structures of tasks; and 2) it separates workflow scheduling from task execution, thus leaving room to improve the performance and scalability of the whole management system.

The fourth layer is the Client Layer, which provides the functionality of workflow design, specification and visualization, together with various user interfaces and tools for workflow submission, resource configuration, etc. The Client Layer may reside outside the Cloud to circumvent the disadvantages discussed in the All-in-the-Cloud option. The separation of the Client Layer from other layers provides the flexibility of customizing the user interfaces of the system and promotes the reusability of the rest of the system components for different scientific domains.

D. Subsystems

The eight major functional subsystems correspond to the key functionalities required for workflow management as a service in the Cloud. Although the reference framework may allow the introduction of additional subsystems and their features in each layer, this paper only focuses on the major subsystems and their essential functionalities.

The Workflow Specification & Submission subsystem is responsible for producing workflow specifications represented in a workflow specification language that supports a particular workflow model, and for submitting workflows to the Cloud Workflow Management Service subsystem. The Workflow Specification & Submission subsystem may provide users with a standalone or Web-based workflow designer, which may support both graphical and scripting-based design interfaces, and a workflow submission component to submit workflows. The interoperability of workflows should be addressed in this subsystem by the standardization and conversion of workflow languages.

The Workflow Presentation & Visualization subsystem is especially important for data-intensive and visualization-intensive scientific workflows, in which the presentation of workflows and the visualization of various data products and provenance metadata in multiple dimensions are key to gaining insights and knowledge from large amounts of data and metadata.

The Cloud Workflow Management Service subsystem acts as an intermediary between the workflow client and the backend Cloud Resource Manager, and is the key service that the framework provides to researchers interested in using Cloud-based scientific workflows. It supports the following functionalities: workflow language compilation, workflow scheduling, resource acquisition, and status monitoring. In addition, fault-tolerance mechanisms can also be defined in the service.

The Workflow Engines subsystem supports various workflow engines, and the engine can be specified by end-users from the Workflow Specification & Submission subsystem. A workflow
A workflowengine is the heart of a workflow system and responsible forcreating and executing workflow runs according to a workflowrun model, which defines the state transitions of each scientificworkflow and its constituent task runs. A workflow runconsists of a coordinated execution of tasks, each of which iscalled a task run. The interoperability of workflows should be

addressed by the standardization of interfaces, workflowmodels, and workflow run models, so that a scientificworkflow or its constituent sub-workflows can be scheduledand executed in multiple Workflow Engines that are providedby various vendors.The Cloud Resource Manager (CRM) subsystem is aresource management framework that bridges Cloud WorkflowManagement Service with various Cloud platforms. It providesscientific workflows with Cloud resource provisioning as aservice and the workflows can benefit from the scalabilityoffered by the Cloud. Meanwhile, the dependency on Cloudplatforms can be reduced as implementations for various Cloudplatforms can be provided, ranging from commercial to opensource ones, including Amazon EC2, OpenNebula, Eucalyptus,CloudStack, etc.The Scheduling Management Service subsystem is aframework that bridges Cloud Resource Manager with variousTask Scheduling Frameworks. It provides a set of operationsfor the deployment and management of various schedulingframeworks according to configurations specified by users.The Task Scheduling Frameworks subsystem consists ofmultiple scheduling frameworks, such as Falkon[20], Sparrow,Gearman, and so on, and the framework can be specified byend-users through configuration. It is devised to schedule tasksdelivered from the Workflow Engines subsystem.The Cloud Platforms Subsystem refers to various supportedCloud platforms in general and the functionalities can besummarized from the Infrastructure Layer.E. InterfacesIn the reference framework, six interfaces are explicitlydefined, which show how each subsystem interacts with othersubsystems. 
The interoperability between the subsystems should be addressed by standardizing the interfaces provided by each subsystem.

Interface I1 provides a set of interfaces for the communication between the Workflow Specification & Submission subsystem and the Cloud Workflow Management Service, so that workflow specifications created by workflow design tools can be submitted to a workflow execution environment for compiling, scheduling, and management.

Interface I2 provides a series of interfaces for the Cloud Workflow Management Service to interact with the Cloud Resource Manager: the Cloud Workflow Management Service sends a resource request to allocate the specified cluster resources, and the Cloud Resource Manager replies with the cluster information for task execution.

Interface I3 provides a series of interfaces for the Cloud Resource Manager to communicate with the Scheduling Management Service: once the specified resource requests from the Cloud Workflow Management Service are received, the Cloud Resource Manager provisions resources and deploys the user-specified Task Scheduling Framework into the cluster based on the services provided by the Scheduling Management Service, then sends the cluster information back to the Cloud Workflow Management Service.

Interface I4 provides a set of interfaces for the Cloud Resource Manager to interact with the underlying Cloud Platforms, mostly for resource provisioning, monitoring and recycling.

Interface I5 provides a series of interfaces for the Scheduling Management Service to interact with the Task Scheduling Frameworks subsystem: the supported operations on scheduling frameworks are defined here.

Interface I6 provides a set of interfaces to interoperate with the deployed Workflow Engines. Workflow specifications can be passed through to the default or a user-specified workflow engine for execution.

F. Discussion

The motivation of our work is to break workflows' dependence on the underlying resource environment, and to take advantage of the scalability and on-demand resource allocation of the Cloud. We present a layered service framework for integrating SWFMSs into a variety of Cloud platforms, which is also applicable when deploying a workflow system in Grid environments. The separation of the layers enables abstraction and independent implementations for each layer, and gives scientists the opportunity to develop a stable and familiar problem solving environment in which rapidly advancing technologies can be leveraged while their details are shielded from the scientists, who need to focus on the science itself. The interfaces defined in the framework are flexible and customizable, so scientists can extend or modify them according to their own requirements and environments.

IV. IMPLEMENTATION AND EXPERIMENT

In this section, we first describe our experience in integrating the Swift scientific workflow management system [10] with different Cloud platforms based on the service framework. Then we show the experiment results of our implementation for both the OpenNebula [1] and Eucalyptus [9] platforms to demonstrate the practicability and capability of the service framework.

A. Implementation Architecture & Interfaces

Fig. 3. Integration Architecture

We implement the service framework for both the OpenNebula and Eucalyptus platforms, and we show the integration architecture in Fig. 3. The implementation supports workflow specification and submission, on-demand virtual cluster provisioning, high-throughput task scheduling and execution, and scalable resource management in the Cloud. The layers, systems and interfaces displayed in the integration architecture can be easily mapped into the proposed service framework.
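
The interaction pattern across interfaces I1 to I6 described in Section III.E can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation described in this paper: every class and method name below (InMemoryPlatform, request_cluster, deploy, submit, and so on) is a hypothetical stand-in, the in-memory platform replaces OpenNebula or Eucalyptus without calling any real Cloud API, and the workflow engine call is reduced to returning a small record.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class CloudPlatform(Protocol):
    """Hypothetical shape of interface I4: VM provisioning and recycling."""
    name: str

    def provision(self, count: int) -> list[str]: ...
    def recycle(self, vm_ids: list[str]) -> None: ...


@dataclass
class InMemoryPlatform:
    """Stand-in for a Cloud platform; it only tracks VM ids in memory."""
    name: str
    _next_id: int = 0
    running: set[str] = field(default_factory=set)

    def provision(self, count: int) -> list[str]:
        ids = [f"{self.name}-vm{self._next_id + i}" for i in range(count)]
        self._next_id += count
        self.running.update(ids)
        return ids

    def recycle(self, vm_ids: list[str]) -> None:
        self.running.difference_update(vm_ids)


class SchedulingManagementService:
    """Hypothetical I5 side: deploys a task scheduling framework on a cluster."""

    def deploy(self, framework: str, vm_ids: list[str]) -> str:
        # Assumption of this sketch: the first VM hosts the scheduler endpoint.
        return f"{framework}://{vm_ids[0]}"


@dataclass
class ClusterInfo:
    platform: str
    vm_ids: list[str]
    scheduler_endpoint: str


class CloudResourceManager:
    """Bridges the workflow service with Cloud platforms (I2, I3, I4)."""

    def __init__(self, platforms: list[CloudPlatform],
                 sms: SchedulingManagementService) -> None:
        self.platforms = {p.name: p for p in platforms}
        self.sms = sms

    def request_cluster(self, platform: str, size: int,
                        framework: str) -> ClusterInfo:
        vm_ids = self.platforms[platform].provision(size)   # I4
        endpoint = self.sms.deploy(framework, vm_ids)        # I3 / I5
        return ClusterInfo(platform, vm_ids, endpoint)       # reply over I2

    def release_cluster(self, cluster: ClusterInfo) -> None:
        self.platforms[cluster.platform].recycle(cluster.vm_ids)


class CloudWorkflowManagementService:
    """Entry point for clients (I1); hands work to an engine (I6)."""

    def __init__(self, crm: CloudResourceManager) -> None:
        self.crm = crm

    def submit(self, workflow: str, platform: str, size: int,
               framework: str = "falkon") -> dict:
        cluster = self.crm.request_cluster(platform, size, framework)
        try:
            # I6 placeholder: a real engine would execute the workflow here.
            return {"workflow": workflow,
                    "ran_on": cluster.scheduler_endpoint,
                    "workers": len(cluster.vm_ids)}
        finally:
            self.crm.release_cluster(cluster)  # recycle VMs after the run


if __name__ == "__main__":
    platform = InMemoryPlatform("opennebula")
    service = CloudWorkflowManagementService(
        CloudResourceManager([platform], SchedulingManagementService()))
    print(service.submit("modis.swift", "opennebula", size=3))
```

Keeping the platform behind a small protocol mirrors the framework's claim that supporting a new Cloud platform only requires another implementation at the Infrastructure Layer boundary; the workflow service and the resource manager are unchanged when a second platform object is registered.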

As the implementation of the service framework includes a variety of systems and techniques, for clarity we list the subsystems, corresponding to Fig. 2, in Table 1. We also point out which subsystems are taken directly from the original systems and which are implemented for the integration.

We also define a series of interfaces to standardize the complicated interactions between the essential subsystems. We list the key interfaces in Table 2, and point out their implementation status and interaction relationships. Further details about these interfaces are available at our website¹.

TABLE I. SUBSYSTEMS IMPLEMENTATION DESCRIPTION

Subsystems                                            Components                    Description
Cloud Platforms (Abbr. CP)                            OpenNebula / Eucalyptus       reuse (a)
Task Scheduling Frameworks (Abbr. TSF)                Falkon Scheduling Framework   minor revision (b)
Scheduling Management Service (Abbr. SMS)             SMS                           implemented (c)
Cloud Resource Manager (Abbr. CRM)                    CRM                           implemented
Workflow Engines (Abbr. WE)                           Swift System                  minor revision
Cloud Workflow Management Service (Abbr. CWMS)        CWMS                          implemented
Workflow Specification & Submission (Abbr. WSS)       Client Submission Tool        implemented

a. “reuse”: we directly reuse the available components for integration.
b. “minor revision”: we reuse the available components after customization.
c. “implemented”: we implement the components from design to test.

TABLE II. INTERFACES IMPLEMENTATION DESCRIPTION

Interfaces      Description             Interaction Between
Interface I1    …                       WSS and CWMS
Interface I2    …                       CWMS and CRM
Interface I3    implemented (a)         CRM and SMS
Interface I4    implemented             CRM and CP
Interface I5    under evaluation (b)    SMS and TSF
Interface I6    under evaluation        CWMS and WE

a. “implemented”: we define and implement the interfaces.
b. “under evaluation”: the interfaces have already been defined but still need further adjustment and evaluation for detailed implementation.

B. The Swift Workflow Management System

Swift is a system that bridges scientific workflows with parallel computing. Swift takes a structured approach to workflow specification, scheduling, and execution. It consists
It consistsof a simple scripting language called SwiftScript for concisespecification of complex parallel computations based ondataset typing and iterations [17], and dynamic datasetmappings for accessing large-scale datasets represented indiverse data formats.The Swift system architecture consists of four majorcomponents: Program Specification, Scheduling, Execution,and Provisioning, as illustrated in Fig. 4. Computations are1specified in SwiftScript, which has been shown to be simpleyet powerful. SwiftScript programs are compiled into abstractcomputation plans, which are then scheduled for execution bythe workflow engine onto provisioned resources. Resourceprovisioning in Swift is very flexible, tasks can be scheduled toexecute on various resource providers, where the providerinterface can be implemented as a local host, a cluster, a multisite Grid, or the Amazon EC2 framework/index.html.Fig. 4. Swift System ArchitectureThe four major components of the Swift system can beeasily mapped into the four layers in the SWFMSs referencearchitecture: the specification falls into the Presentation Layer,although SwiftScript focuses more on the parallel scriptingaspect for user interaction than on Graphical representation; thescheduling components correspond to the WorkflowManagement Layer; the execution components maps to theTask Management layer; and the provisioning layer can bethought as mostly in the Operational Layer.C. Experiment ConfigurationOpenNebula: We use 6 machines in the experiment, eachconfigured with Intel Core i5 760 with 4 cores at 2.8GHZ, 4GBmemory, 500GB HDD, and connected with Gigabit EthernetLAN. The configuration for each VM is 1 core, 1.5GBmemory, 20GB HDD, and we use KVM as the hypervisor. Oneof the machines is used
