
A Nuxeo Whitepaper
Architecting Applications to Scale in the Cloud
Copyright 2017 Nuxeo. All rights reserved.

Table of Contents

Executive Summary
  Between IaaS and SaaS
  Nuxeo and AWS
AWS for Content Management
Architecting Applications for the Cloud
Introduction to the Nuxeo Platform
Tailoring Apps to Individual IaaS Capabilities
Scale Out and Distributed Architecture
  CPU Scale Out
  Dedicated Resources and Specific Processing Requirements
  Storage Scale Out
  Query Scale Out
  Scaling Out with Multiple Data Stores
  Scaling Out with NoSQL
Leveraging Cloud Opportunities
About Nuxeo

Executive Summary

Amazon Web Services (AWS) and similar cloud-service offerings have revolutionized the ways in which organizations approach application development and deployment. Through cloud services, organizations can obtain on-demand, inexpensive, modular infrastructures for deploying virtual instances of networking components, storage repositories, compute platforms, and management frameworks. As cloud services continue to mature, the Platform-as-a-Service (PaaS) model has been steadily gaining recognition and favor.

The abstraction layers provided when using a PaaS model offer a particularly effective means for developers to focus on coding and to implement a variety of business applications efficiently on public clouds. These abstractions also relieve administrators of many configuration and deployment challenges, as well as the burden of maintaining traditional hardware servers. They provide a ready means for self-provisioning IT resources and for easily configuring the PaaS environment through administrative tools and control panels to address changing requirements.

Between IaaS and SaaS

In a typical implementation, PaaS serves as a partial operating system and middleware, residing in the technology stack between an underlying Infrastructure-as-a-Service (IaaS) layer and the Software-as-a-Service (SaaS) layer that provides a user interface.
As discussed throughout this paper, the combination of AWS and PaaS opens up many advantages and provides numerous mechanisms for developers and architects to design, plan, and implement new technologies and prototype new solutions without the cost and commitment of traditional computing. Software developers following agile development processes have the flexibility to move from ideas and concepts through rapid development to a marketable product, swiftly and cost effectively, within a responsive PaaS environment.

Nuxeo and AWS

Using a PaaS model optimized for the capabilities and features of AWS, Nuxeo has designed an extensible modular framework for handling high-volume document management and complex digital asset management tasks with high scalability and reliability. This design framework makes it possible for developers to efficiently leverage AWS cloud services and customize the platform for a diverse variety of requirements. This paper provides an architectural overview of the fundamental mechanisms for building and deploying content management applications on AWS using the Nuxeo Platform.

AWS for Content Management

Since the introduction of Amazon Simple Storage Service (S3) in 2006, AWS has evolved into a powerful, responsive infrastructure that has helped shape the way in which applications are developed and how enterprises handle IT requirements, providing commodity-level access to a full range of cloud services. Several AWS characteristics make it well suited for content management tasks, including:

- Fast, low-latency content distribution: Delivering content over the web to end users with Amazon CloudFront is a fast, cost-effective way of handling everything from streaming video to entire dynamic websites. Through integration with other AWS offerings, such as S3 and EC2, content delivery can be optimized to take advantage of Amazon’s global network of edge locations, minimizing latency and boosting performance. As a usage-based service with no commitment, the only costs accrued derive from the actual volume of content delivered. Content collaboration can be handled efficiently, with full access for workgroups spread out across multiple locations. Even extremely large files, such as those encountered in engineering projects and digital asset management environments, can be delivered securely worldwide, at high speed, to support collaborative efforts.
- Inexpensive, reliable storage: S3, Amazon’s pioneering cloud storage service, continues to provide value to enterprises that require a reliable, scalable method for storing many kinds of digital assets in the cloud. Developers gain secure access to object containers, referred to as buckets, that are addressable by URL, located in a specified geographical region, and scalable to accommodate escalating storage demands.
- Elastic compute resources: Content-centric applications often place varying demands on compute resources for media processing, handling different levels of user traffic, data migration, sorting and indexing, and other tasks. AWS provides a number of elastic capabilities that can scale to meet demands, including the capability to scale out virtual servers (EC2), transcode media (Elastic Transcoder), perform Auto Scaling, and adjust load balancing (ELB).
- Proven work environment: AWS has refined and enhanced its cloud service offerings over many years to ensure a secure and reliable work environment for the mission-critical applications encountered in content management operations. Important features are available and accessible, including database snapshots, automated load balancing, key performance metrics, high-volume bandwidth availability, and monitoring.

Other AWS capabilities support content-centric applications very effectively. For example, disaster recovery can be accomplished rapidly using failover techniques, minimizing downtime and potential enterprise losses from business interruptions. In digital asset management scenarios, back-end processing nodes can be separated from front-end servers so that the scaling of image and video processing can be handled more efficiently.

AWS provides a capable IaaS framework for new-generation digital asset management projects, distributed content collaboration, and global content distribution.

Architecting Applications for the Cloud

Simply porting traditional applications over to a cloud platform is unlikely to succeed and typically results in numerous operational and performance issues. Architects and developers knowledgeable about cloud services recognize this and take measures to avoid the problems. To take full advantage of a cloud-services environment, whether IaaS, PaaS, or SaaS, the key to success is understanding the limitations of the environment and then effectively leveraging the built-in advantages. Designing an application from the ground up, the best way to build apps for the cloud, requires that developers adopt a different mindset and master a new paradigm.

For example, architecting an application that can rapidly adapt to high volumes of transactions and a varying number of users typically requires that developers write a large amount of code to handle the demands of scaling, which might include caching, database scaling, and asynchronous messaging. A well-designed PaaS platform will already be optimized with built-in capabilities for these functions. Instead of writing code, the developer can simply tap into the appropriate functions as needed.

General guidelines for architecting applications for availability in the cloud include:

- Anticipate failure: Be aware of the possibility that parts of the cloud may sometimes fail. Design and test applications for resiliency and the ability to respond to failures. Componentize applications so that multiple modules that communicate with each other through an API can recover independently, if necessary. Data sharding, application sharding, and multiple deployments of critical components are useful in this regard.

- Employ stateless computing techniques: Using stateless protocols, such as IP and HTTP, eliminates the possibility that a state stored in memory will be lost due to an outage or service interruption. Because the internal state of an application won’t be available as conditions change or failures take place, store these states in an object store, database, or message queue.
- When scaling, scale out: Scaling up in a cloud-based environment always runs the risk that an application will run out of resources (for example, scaling up a database beyond the performance constraints of the server on which the database is running). Scaling out horizontally is much more effective, offering essentially unlimited scalability that takes advantage of the elasticity of cloud resources.
- Keep data consistency in mind: Because there can be multiple instances of an application residing in different geographic regions, some changes in the database or the application may not be reflected for a number of milliseconds. To maintain a model where data is replicated and highly available within this environment, developers must devise an approach that handles potential inconsistencies when different application instances draw from the same database. Data sharding based on different geographic regions is one way to accomplish this.
- Use event-driven behaviors: Synchronous calls that require timed responses typically don’t work well in the cloud. Instead, code applications to be event driven, responding to actions from the system or individual users. To maintain high availability in the cloud, create a queue for requests that can be processed by different application instances. This can boost performance and limit losses to a single transaction if an application instance fails.

Useful references on this topic include:

- Open Data Center Alliance: Architecting Cloud-Aware Applications
- Architecting the Cloud: Design Decisions for Cloud Computing Service Models (SaaS, PaaS, and IaaS)
- Architecting Applications for the Cloud: Best Practices

Investigate the specifications of the PaaS that you select to fully understand its capabilities and feature set as you begin development. In the case of the Nuxeo Platform, even if your application requirements are modest, such as basic document management or asset management that includes workflow processes, the cloud-aware capabilities that have been architected into the product can save considerable time and effort, as well as improve the reliability of your application. Nuxeo also frequently updates the platform to include new features and capabilities, so that as cloud service technologies evolve, architects and developers can take advantage of the latest enhancements.
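The event-driven guideline above can be illustrated with a minimal sketch, assuming an in-process queue and threads as stand-ins for a real message queue and separate application instances. The names `run_workers` and `render_thumbnail` are hypothetical and are not part of any Nuxeo or AWS API. The handlers keep no state between requests, and a failure affects only the request being processed.

```python
import queue
import threading


def render_thumbnail(doc_id):
    """Hypothetical stateless handler: everything it needs arrives with the request."""
    if doc_id == "doc-3":
        raise RuntimeError("simulated instance failure")
    return "thumbnail:" + doc_id


def run_workers(requests, handler, worker_count=3):
    """Drain a shared request queue with several interchangeable workers."""
    pending = queue.Queue()
    for request in requests:
        pending.put(request)

    results = {}            # stands in for an external store (database, object store)
    lock = threading.Lock()

    def worker():
        while True:
            try:
                request = pending.get_nowait()
            except queue.Empty:
                return      # queue drained, this worker is done
            try:
                outcome = handler(request)
            except Exception:
                outcome = "failed"   # only this single request is lost
            with lock:
                results[request] = outcome

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


results = run_workers(["doc-1", "doc-2", "doc-3", "doc-4"], render_thumbnail)
```

Because any worker can take any request, adding workers increases throughput without changing the handler, which is the property that makes this pattern suitable for scaling out.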

Introducing the Nuxeo Platform

The Nuxeo Platform, when coupled with AWS and the available toolset, provides a comprehensive, extensible Platform-as-a-Service, readily adaptable to business application development. Through the efficiencies of operating tightly with AWS, the Nuxeo Platform enables architects and developers to easily build and run content-focused business applications that can handle extremely large document sets (even at volumes ranging into the billions). Several pre-built applications are included, allowing development teams to quickly deploy and launch a number of fully featured content-management tools or customize these applications to meet specific requirements. Its modern technologies, powerful plug-in model, integrated development environment, and flexible packaging capabilities make the Nuxeo Platform an ideal environment to rapidly design, develop, and deploy applications, on premises or within a cloud environment.

Nuxeo Platform supports the creation of end-to-end workflows, through a graphical interface, for performing content-management processes. Alternatively, applications can be built within the integrated development environment, accessing the exposed functionality in an API that supports the representational state transfer (REST) model.

Tailoring Apps to Individual IaaS Capabilities

Capabilities and functionality of individual IaaS offerings vary. For maximum efficiency, interoperability, and performance, developers building applications to run on top of an IaaS framework gain many advantages by exploiting the built-in features, components, and capabilities available.

The infrastructure offered by the cloud provider typically:

- Includes components that are fully integrated and tested for interoperability
- Features mechanisms for manual or automated scaling of virtual servers, storage, network resources, and other system resources
- Provides an easy way to monitor ongoing costs by usage, with billing limited to those compute resources that are used
- Costs substantially less to use than comparable on-premises systems

As a developer, when architecting a solution to deploy in an IaaS environment, you should:

- Rely as much as possible on open standards, as well as industry standards that are widely accepted. IaaS frameworks are more likely to provide a SQL database than an RDF graph database.
- Build solutions based on a pluggable architecture. A pluggable architecture lets you adapt your application to the new infrastructure more easily. Typically, to leverage the cloud infrastructure, you will need to modify some of the application’s low-level implementation.

The Nuxeo Platform supports both a standards-based architecture and a pluggable component model (Extension Point). Nuxeo is well suited for leveraging the AWS infrastructure, featuring:

- Metadata store: Oracle RDS or PostgreSQL. A native plug-in is provided (not specific to AWS).
- Binary store: S3 Binary Store. A pluggable Binary Manager is used to leverage the S3 storage capabilities.
- Cache: ElastiCache / Redis. The pluggable cache infrastructure supports Redis. Distributed job queuing has been based on Redis for some time.

These same basic concepts apply to a number of cloud-specific services, including provisioning and monitoring.

As much as possible, the solution should fit the model defined by the IaaS provider and exploit those capabilities that make it easy to perform operations without the need for extensive coding. Ideally, this includes the capability to configure automated processes that would otherwise need to be manually managed or configured.

Developers and system administrators can manage and provision resources for Nuxeo deployments using AWS CloudFormation, which provides an orderly mechanism to handle resource management, using templates to simplify the process. CloudFormation handles many of the low-level details, such as the order of provisioning, associated dependencies, and run-time parameters, to support reliable running of applications.

By default, Nuxeo is packaged as Debian packages, and installers are also available for a number of other environments. Nuxeo Platform can also be set up using an Amazon Machine Image (AMI).

Figure 1. Nuxeo Platform standard installation in an on-premises environment

For monitoring and managing resource use, Nuxeo exposes its metrics by means of Java Management Extensions (JMX). These metrics can be accessed by AWS CloudWatch and used to engage Auto Scaling to respond rapidly to changing application demands.

Figure 2. Nuxeo Platform on AWS with RDS, S3, ELB, CloudWatch, ElastiCache
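To make the CloudFormation discussion concrete, here is a minimal sketch that builds a template as a Python dictionary and serializes it to JSON. It is not the template Nuxeo ships. The logical names `BinaryStoreBucket` and `AppServer`, the default instance type, and the placeholder image ID are assumptions chosen for illustration. The template keys themselves (`AWSTemplateFormatVersion`, `Parameters`, `Resources`, `DependsOn`, `Ref`) are standard CloudFormation syntax.

```python
import json


def build_template(instance_type="m4.large", image_id="ami-00000000"):
    """Return a small CloudFormation template as a dict (illustrative only)."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: one application server plus an S3 bucket for binaries",
        "Parameters": {
            # Run-time parameter: can be overridden when the stack is created.
            "InstanceType": {"Type": "String", "Default": instance_type},
        },
        "Resources": {
            "BinaryStoreBucket": {"Type": "AWS::S3::Bucket"},
            "AppServer": {
                "Type": "AWS::EC2::Instance",
                # Explicit dependency: the bucket is provisioned before the server.
                "DependsOn": "BinaryStoreBucket",
                "Properties": {
                    "InstanceType": {"Ref": "InstanceType"},
                    "ImageId": image_id,  # placeholder, not a real image
                },
            },
        },
    }


template = build_template()
template_json = json.dumps(template, indent=2)
```

The `DependsOn` attribute and the `Parameters` section correspond to the provisioning order and run-time parameters mentioned above. A real deployment would add the database, load balancer, and cache resources.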

Scale Out and Distributed Architecture

Scaling fluidly to meet the demands of business is a key advantage of cloud services. The AWS IaaS includes a number of features that make it possible to anticipate demand and scale automatically, removing the need for administrative monitoring and manual actions to address resource issues.

The underlying architecture of cloud services makes it easy to quickly provision new servers (scaling out), but much more difficult to quickly change the basic resources of an existing virtual server. For example, scaling up to enable a more powerful processor, quadruple the amount of physical RAM available to a processor, or make similar changes to core hardware capabilities cannot be accomplished easily when using typical cloud services. However, you can increase available processing power by clustering virtual servers to scale out, effectively achieving the same processing gains as would be achieved by scaling up. Alternatively, to meet specialized application requirements, you can select virtual machines (VMs) that are pre-provisioned with certain characteristics, such as VMs optimized for handling heavy I/O or providing substantial amounts of memory.

The bottom line is: your application will be able to scale in the cloud if it supports scale out. If you are limited to scale up, you won’t be able to gain much benefit from cloud services.

Nuxeo Platform architecture can scale out along different axes, adapting to whatever type of demand is to be absorbed, as described in the following sections.

CPU Scale Out

Nuxeo processing demands can easily be scaled out using the built-in clustering model. In support, AWS includes specific features to accommodate high-performance computing in the cloud, using Cluster Compute. This lets you scale out applications across thousands of cores to more effectively handle massive throughput demands with tightly coupled I/O across a high-bandwidth network.

Using several mid-size VMs, you can build a high-performance Nuxeo Platform application and then use Auto Scaling to automatically add one or more VMs when the load increases.
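The idea of adding VMs as load increases can be sketched with a simple proportional rule. This is an illustration of the general technique, not the algorithm AWS Auto Scaling uses internally. The function name, the 60 percent CPU target, and the node limits are assumptions.

```python
import math


def desired_node_count(current_nodes, avg_cpu_percent,
                       target_cpu_percent=60.0, min_nodes=2, max_nodes=10):
    """Size the cluster so that average CPU would return to the target level."""
    if current_nodes < 1 or target_cpu_percent <= 0:
        raise ValueError("need at least one node and a positive CPU target")
    # Total load is roughly current_nodes * avg_cpu; spread it at the target level.
    wanted = math.ceil(current_nodes * avg_cpu_percent / target_cpu_percent)
    # Clamp to the configured bounds of the group.
    return max(min_nodes, min(max_nodes, wanted))


scale_out = desired_node_count(3, 90.0)    # overloaded cluster grows
scale_in = desired_node_count(4, 15.0)     # idle cluster shrinks to the minimum
capped = desired_node_count(8, 120.0)      # growth is capped at max_nodes
```

The rule only works because nodes are interchangeable: each added VM takes an equal share of the load, which is what the clustering model provides.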

Figure 3. Nuxeo Clustering with Scale Out and AutoScaling

Dedicated Resources and Specific Processing Requirements

Certain types of processing operations require specific resources and hardware. AWS offers several types of VM configurations to meet a range of requirements, including:

- I/O-provisioned VMs to handle I/O-intensive operations
- Graphics processing unit (GPU) or High CPU VMs to take care of processor-intensive tasks, such as video transcoding or ray-traced 3D rendering
- High-memory VMs to accommodate applications that require large blocks of contiguous memory space to operate efficiently

The Nuxeo Platform architecture provides the flexibility to dedicate nodes to specific types of processing operations so that you can:

- Use a general-purpose VM to accomplish basic tasks
- Ensure good response time for interactive users
- Ensure cost-effective processing
- Leverage specific VMs for particularly demanding tasks

Dedicated processing nodes can be useful for:

- Performing video transcoding
- Processing high-resolution digital images
- Scanning large files for viruses
- Performing optical character recognition on scanned images
- Running cryptographic algorithms to encrypt and decrypt files
- Indexing large collections of documents

For simplicity, Nuxeo recommends that you deploy exactly the same Nuxeo image on each node. The only exceptions are the hardware and the work-queue configurations. This ensures that if the dedicated nodes become unavailable, the standard nodes can continue with the processing.

Figure 4. Nuxeo cluster with dedicated nodes and Redis WorkManager
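The fallback behavior described above can be sketched as a small job router. This is a hypothetical illustration, not the Nuxeo WorkManager API: the class `WorkRouter`, its method names, and the category names are assumptions. Because every node runs the same image, work queued for dedicated nodes can be handed to the default queue when those nodes disappear.

```python
from collections import deque


class WorkRouter:
    """Route jobs to per-category queues, falling back to a default queue."""

    def __init__(self, dedicated_categories):
        self.dedicated_categories = set(dedicated_categories)
        self.queues = {"default": deque()}
        for category in self.dedicated_categories:
            self.queues[category] = deque()
        self.dedicated_available = True

    def submit(self, category, job):
        """Queue a job and return the name of the queue that received it."""
        if category in self.dedicated_categories and self.dedicated_available:
            self.queues[category].append(job)
            return category
        self.queues["default"].append(job)
        return "default"

    def mark_dedicated_unavailable(self):
        """Dedicated nodes are gone: move their pending work to standard nodes."""
        self.dedicated_available = False
        for category in sorted(self.dedicated_categories):
            pending = self.queues[category]
            while pending:
                self.queues["default"].append(pending.popleft())


router = WorkRouter(["video-transcoding"])
first = router.submit("video-transcoding", "job-a")   # goes to the dedicated queue
second = router.submit("thumbnail", "job-b")          # goes to the default queue
router.mark_dedicated_unavailable()
third = router.submit("video-transcoding", "job-c")   # now handled by standard nodes
```

In a real cluster the queues would live in a shared service such as Redis, so that any node can pick up the work.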

Storage Scale Out

Typically, solutions rely on scaling up the data tier. Even where this is possible, with solutions like AWS RDS, it is usually not the optimal approach:

- Scaling up is not cost effective.
- Scaling up cannot be progressive and transparent.
- Scaling up cannot be continued indefinitely.

To address this, the Nuxeo Platform includes a number of options that support scaling out to improve processing at the data level.

Query Scale Out

When using the default storage back end, Nuxeo makes extensive use of the database. Queries in particular create a great deal of database activity with a potential for bottlenecks. In certain configurations, the Nuxeo Visible Content Store (VCS) generates complex SQL queries that keep the database server very busy.

When the volume of data queries increases, the number of concurrent accesses increases as well. Queries typically present the primary bottleneck that diminishes database server performance, slowing query response times.

At the SQL level, you essentially have two options for reducing the bottleneck:

- Use a database server with greater capacity.
- Denormalize the data to make the query run faster.

The use of Elasticsearch as the primary query engine lets you transparently distribute queries across multiple nodes and maintain high-end performance, even on massive volumes, using mid-range hardware. By nature, Elasticsearch does not try to enforce the same kind of integrity that ACID properties do. Because of that, it can be easily distributed across several nodes, providing a very simple and efficient scale-out solution for queries.
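The way a query scales across nodes can be sketched as a scatter-gather operation: each node searches only its own slice of the index, and the partial results are merged. The sketch below is a toy model using in-memory dictionaries and a term-count score. It does not use or reproduce the Elasticsearch API, and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, term):
    """Score the documents of one shard by counting occurrences of the term."""
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().count(term.lower())
        if score > 0:
            hits.append((score, doc_id))
    return hits


def scatter_gather(shards, term, limit=3):
    """Query every shard in parallel, then merge and rank the partial results."""
    if not shards:
        return []
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term), shards))
    merged = [hit for partial in partials for hit in partial]
    # Highest score first; ties broken by document id for a stable order.
    ranked = sorted(merged, key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in ranked[:limit]]


shards = [
    {"a": "cloud cloud scale", "b": "storage"},
    {"c": "cloud"},
    {"d": "scale cloud cloud cloud"},
]
top_hits = scatter_gather(shards, "cloud")
```

Since no shard needs to coordinate with another while searching, adding nodes reduces the work per node, which is the trade-off against ACID-style integrity noted above.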

Figure 5. Nuxeo and Elasticsearch architecture

As the number of concurrent users increases, the number of requests handled per second scales very effectively. As demonstrated in results outlined in a Nuxeo blog entry, platform testing showed an impressive degree of scaling to handle concurrent requests even at volume levels of one billion documents.

Scaling Out with Multiple Data Stores

In terms of storage, if a database server reaches its upper limits for store-and-retrieve operations, the Nuxeo Platform supports data sharding across several repositories. With this capability, you can exceed the scale-up limitations of one database server, because the application can distribute data across several repositories, each of them associated with a single database. This effectively boosts both database performance and scalability.

Elasticsearch makes it possible to maintain a single index for performing federated search operations across multiple repositories. From a document and asset management perspective, this mechanism can be extremely useful, providing a single interface to a diverse range of information resources and returning query results in a single list. It also makes it possible to link directly to each resource to further expand the search.
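One common way to distribute data across several repositories is to derive the target repository from a stable hash of the document identifier. The source does not describe how Nuxeo assigns documents to repositories, so the sketch below is a generic illustration. The function name and the repository names are hypothetical.

```python
import hashlib


def repository_for(doc_id, repositories):
    """Pick a repository deterministically from a stable hash of the document id."""
    if not repositories:
        raise ValueError("at least one repository is required")
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(repositories)
    return repositories[index]


repositories = ["repo-a", "repo-b", "repo-c"]   # each backed by its own database
placement = {}
for number in range(1000):
    doc_id = "doc-%d" % number
    placement.setdefault(repository_for(doc_id, repositories), []).append(doc_id)
```

The same identifier always maps to the same repository, so reads need no lookup table. A limitation of plain modulo hashing is that changing the number of repositories remaps most documents. Schemes such as consistent hashing or explicit placement rules avoid that.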

Figure 6. Nuxeo and Elasticsearch and MultiRepo

Scaling Out with NoSQL

If your database requirements approach the level of billions of documents, traditional SQL databases may not be the right choice, even with the help of Elasticsearch. Nuxeo Platform support for MongoDB, a NoSQL database, extends high-volume database capabilities with powerful scaling. The NoSQL database movement originated in response to the well-recognized limitations of traditional relational databases and the difficulties in storing and analyzing the massive volumes of data encountered in many of today’s web applications.

MongoDB offers these benefits in a content management environment:

- Relies on a native distributed architecture
- Scales out very easily

With the pluggable Nuxeo architecture, switching from a SQL backend (VCS) to a MongoDB backend (DBS) becomes a basic configuration task that can be accomplished during deployment.

Figure 7. Nuxeo VCS/DBS with PGSQL

Figure 8. Nuxeo VCS/DBS with MongoDB

Leveraging Cloud Opportunities

Evaluating cloud infrastructure pricing can be a challenge. The complexity becomes obvious just from sorting through the AWS pricing rules.

Despite the complexities, there are ways to gain tremendous business value and leverage the available opportunities to your advantage. For example:

- Reserved instances are much less expensive if you make long-term commitments.
- Spot instances provide a great solution if their transient nature doesn’t create problems.

Leveraging these opportunities effectively requires distributing the application components across enough individual VMs that:

- You can use all the resources available
- You can have VMs going offline without breaking the system

As shown in Figure 9, the Nuxeo approach is to use Docker to multiplex several Nuxeo applications on top of a set of AWS VMs running CoreOS. CoreOS is a Linux distribution that is specifically tailored to run Docker containers and provide distributed communication between the containers. This approach suits large-scale document management tasks very well. Using this architecture provides high resilience, fast scale out, and a favorable ratio between performance and cost.

Figure 9. Nuxeo CoreOS Docker
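The trade-off between a long-term commitment and pay-per-use pricing comes down to a break-even calculation. The sketch below uses made-up prices (0.10 per hour on demand, 525.60 per year reserved). These figures and the function names are assumptions for illustration and are not actual AWS rates.

```python
HOURS_PER_YEAR = 8760


def break_even_utilization(on_demand_hourly, reserved_yearly_cost):
    """Fraction of the year a VM must run for the reservation to pay off."""
    return reserved_yearly_cost / (on_demand_hourly * HOURS_PER_YEAR)


def cheaper_option(on_demand_hourly, reserved_yearly_cost, expected_utilization):
    """Compare one year of on-demand usage with a one-year reservation."""
    on_demand_cost = on_demand_hourly * HOURS_PER_YEAR * expected_utilization
    return "reserved" if reserved_yearly_cost < on_demand_cost else "on-demand"


# Hypothetical prices, not real AWS rates.
threshold = break_even_utilization(0.10, 525.60)
always_on = cheaper_option(0.10, 525.60, expected_utilization=0.9)
occasional = cheaper_option(0.10, 525.60, expected_utilization=0.3)
```

With these example prices the reservation pays off once a VM runs about 60 percent of the time. Multiplexing several applications on a shared set of VMs, as described above, is one way to keep utilization above that kind of threshold.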

About Nuxeo

Nuxeo incorporates all the elements of modern-day architecture. It is a services-based platform exposing hundreds of content, data, and workflow APIs, all delivered on a highly scalable, component-based, cloud-native architecture. Let Nuxeo show you how to deliver tomorrow’s applications today, faster than you thought imaginable. At Nuxeo, we are revolutionizing the way organizations look at content and data together!

Visit nuxeo.com
