NIST Big Data Interoperability Framework: Volume 5, Architectures White Paper Survey


NIST Special Publication 1500-5
NIST Big Data Interoperability Framework: Volume 5, Architectures White Paper Survey
Final Version 1
NIST Big Data Public Working Group
Reference Architecture Subgroup
This publication is available free of charge from: http://dx.doi.org/10.6028/NIST.SP.1500-5

NIST Special Publication 1500-5
NIST Big Data Interoperability Framework: Volume 5, Architectures White Paper Survey
Final Version 1
NIST Big Data Public Working Group (NBD-PWG)
Reference Architecture Subgroup
Information Technology Laboratory
This publication is available free of charge from: http://dx.doi.org/10.6028/NIST.SP.1500-5
September 2015
U.S. Department of Commerce
Penny Pritzker, Secretary
National Institute of Standards and Technology
Willie May, Under Secretary of Commerce for Standards and Technology and Director

NIST BIG DATA INTEROPERABILITY FRAMEWORK: VOLUME 5, ARCHITECTURES WHITE PAPER SURVEY

National Institute of Standards and Technology (NIST) Special Publication 1500-5
53 pages (September 16, 2015)

NIST Special Publication series 1500 is intended to capture external perspectives related to NIST standards, measurement, and testing-related efforts. These external perspectives can come from industry, academia, government, and others. These reports are intended to document external perspectives and do not represent official NIST positions.

Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose.

There may be references in this publication to other publications currently under development by NIST in accordance with its assigned statutory responsibilities. The information in this publication, including concepts and methodologies, may be used by federal agencies even before the completion of such companion publications. Thus, until each publication is completed, current requirements, guidelines, and procedures, where they exist, remain operative. For planning and transition purposes, federal agencies may wish to closely follow the development of these new publications by NIST.

Organizations are encouraged to review all draft publications during public comment periods and provide feedback to NIST. All NIST publications are available at [...]

Comments on this publication may be submitted to Wo Chang
National Institute of Standards and Technology
Attn: Wo Chang, Information Technology Laboratory
100 Bureau Drive (Mail Stop 8900) Gaithersburg, MD 20899-8930
Email: SP1500comments@nist.gov

Reports on Computer Systems Technology

The Information Technology Laboratory (ITL) at NIST promotes the U.S. economy and public welfare by providing technical leadership for the Nation’s measurement and standards infrastructure. ITL develops tests, test methods, reference data, proof of concept implementations, and technical analyses to advance the development and productive use of information technology. ITL’s responsibilities include the development of management, administrative, technical, and physical standards and guidelines for the cost-effective security and privacy of other than national security-related information in federal information systems. This document reports on ITL’s research, guidance, and outreach efforts in Information Technology and its collaborative activities with industry, government, and academic organizations.

Abstract

Big Data is a term used to describe the large amount of data in the networked, digitized, sensor-laden, information-driven world. While opportunities exist with Big Data, the data can overwhelm traditional technical approaches and the growth of data is outpacing scientific and technological advances in data analytics. To advance progress in Big Data, the NIST Big Data Public Working Group (NBD-PWG) is working to develop consensus on important fundamental concepts related to Big Data. The results are reported in the NIST Big Data Interoperability Framework series of volumes. This volume, Volume 5, presents the results of the reference architecture survey. The reviewed reference architectures are described in detail, followed by a summary of the reference architecture comparison.

Keywords

application interfaces; architecture survey; Big Data; Big Data analytics; Big Data infrastructure; Big Data management; Big Data storage; reference architecture.

Acknowledgements

This document reflects the contributions and discussions by the membership of the NBD-PWG, co-chaired by Wo Chang of the NIST ITL, Robert Marcus of ET-Strategies, and Chaitanya Baru, University of California San Diego Supercomputer Center.

The document contains input from members of the NBD-PWG Reference Architecture Subgroup, led by Orit Levin (Microsoft), Don Krapohl (Augmented Intelligence), and James Ketner (AT&T).

NIST SP1500-5, Version 1 has been collaboratively authored by the NBD-PWG. As of the date of this publication, there are over six hundred NBD-PWG participants from industry, academia, and government. Federal agency participants include the National Archives and Records Administration (NARA), National Aeronautics and Space Administration (NASA), National Science Foundation (NSF), and the U.S. Departments of Agriculture, Commerce, Defense, Energy, Health and Human Services, Homeland Security, Transportation, Treasury, and Veterans Affairs.

NIST would like to acknowledge the specific contributions [a] to this volume by the following NBD-PWG members:

Milind Bhandarkar, EMC/Pivotal
Wo Chang, National Institute of Standards and Technology
Yuri Demchenko, University of Amsterdam
Barry Devlin, 9sight Consulting
Harry Foxwell, Oracle Press
James Kobielus, IBM
Orit Levin, Microsoft
Robert Marcus, ET-Strategies
Tony Middleton, LexisNexis
Sanjay Mishra, Verizon
Sanjay Patil, SAP

The editors for this document were Sanjay Mishra and Wo Chang.

[a] “Contributors” are members of the NIST Big Data Public Working Group who dedicated great effort to prepare and substantial time on a regular basis to research and development in support of this document.

Table of Contents

EXECUTIVE SUMMARY
1 INTRODUCTION
  1.1 BACKGROUND
  1.2 SCOPE AND OBJECTIVES OF THE REFERENCE ARCHITECTURE SUBGROUP
  1.3 REPORT PRODUCTION
  1.4 REPORT STRUCTURE
  1.5 FUTURE WORK ON THIS VOLUME
2 BIG DATA ARCHITECTURE PROPOSALS RECEIVED
  2.1 ET STRATEGIES
    2.1.1 General Architecture Description
    2.1.2 Architecture Model
    2.1.3 Key Components
  2.2 MICROSOFT
    2.2.1 General Architecture Description
    2.2.2 Architecture Model
    2.2.3 Key Components
  2.3 UNIVERSITY OF AMSTERDAM
    2.3.1 General Architecture Description
    2.3.2 Architecture Model
    2.3.3 Key Components
  2.4 IBM
    2.4.1 General Architecture Description
    2.4.2 Architecture Model
    2.4.3 Key Components
  2.5 ORACLE
    2.5.1 General Architecture Description
    2.5.2 Architecture Model
    2.5.3 Key Components
  2.6 PIVOTAL
    2.6.1 General Architecture Description
    2.6.2 Architecture Model
    2.6.3 Key Components
  2.7 SAP
    2.7.1 General Architecture Description
    2.7.2 Architecture Model
    2.7.3 Key Components
  2.8 9SIGHT
    2.8.1 General Architecture Description
    2.8.2 Architecture Model
    2.8.3 Key Components
  2.9 LEXISNEXIS
    2.9.1 General Architecture Description
    2.9.2 Architecture Model
    2.9.3 Key Components
3 SURVEY OF BIG DATA [...]
  BOB MARCUS
  MICROSOFT
  UNIVERSITY OF AMSTERDAM
  IBM
  ORACLE
  PIVOTAL
  SAP
  9SIGHT
  LEXISNEXIS
  COMPARATIVE VIEW OF SURVEYED ARCHITECTURES
CONCLUSIONS
APPENDIX A: ACRONYMS
APPENDIX B: REFERENCES

Figures

FIGURE 1: COMPONENTS OF THE HIGH LEVEL REFERENCE MODEL
FIGURE 2: DESCRIPTION OF THE COMPONENTS OF THE LOW LEVEL REFERENCE MODEL
FIGURE 3: BIG DATA ECOSYSTEM REFERENCE ARCHITECTURE
FIGURE 4: BIG DATA ARCHITECTURE FRAMEWORK
FIGURE 5: IBM BIG DATA PLATFORM
FIGURE 6: HIGH LEVEL, CONCEPTUAL VIEW OF THE INFORMATION MANAGEMENT ECOSYSTEM
FIGURE 7: ORACLE BIG DATA REFERENCE ARCHITECTURE
FIGURE 8: PIVOTAL ARCHITECTURE MODEL
FIGURE 9: PIVOTAL DATA FABRIC AND ANALYTICS
FIGURE 10: SAP BIG DATA REFERENCE ARCHITECTURE
FIGURE 11: 9SIGHT GENERAL ARCHITECTURE
FIGURE 12: 9SIGHT ARCHITECTURE MODEL
FIGURE 13: LEXIS NEXIS GENERAL ARCHITECTURE
FIGURE 14: LEXIS NEXIS HIGH PERFORMANCE COMPUTING CLUSTER
FIGURE 15: BIG DATA LAYERED ARCHITECTURE
FIGURE 16: DATA DISCOVERY AND EXPLORATION
FIGURE 17(A): STACKED VIEW OF SURVEYED ARCHITECTURE
FIGURE 17(B): STACKED VIEW OF SURVEYED ARCHITECTURE (CONTINUED)
FIGURE 17(C): STACKED VIEW OF SURVEYED ARCHITECTURE (CONTINUED)
FIGURE 18: BIG DATA REFERENCE ARCHITECTURE

Tables

TABLE 1: DATABASES AND INTERFACES IN THE LAYERED ARCHITECTURE FROM BOB MARCUS
TABLE 2: MICROSOFT DATA TRANSFORMATION STEPS

Executive Summary

This document, NIST Big Data Interoperability Framework: Volume 5, Architectures White Paper Survey, was prepared by the NIST Big Data Public Working Group (NBD-PWG) Reference Architecture Subgroup to facilitate understanding of the operational intricacies in Big Data and to serve as a tool for developing system-specific architectures using a common reference framework. The Subgroup surveyed currently published Big Data platforms by leading companies or individuals supporting the Big Data framework and analyzed the material. This effort revealed a remarkable consistency of Big Data architecture. The most common themes occurring across the architectures surveyed are outlined below.

Big Data Management
- Structured, semi-structured, and unstructured data
- Velocity, variety, volume, and variability
- SQL and NoSQL
- Distributed file system

Big Data Analytics
- Descriptive, predictive, and spatial
- Real-time
- Interactive
- Batch analytics
- Reporting
- Dashboard

Big Data Infrastructure
- In-memory data grids
- Operational database
- Analytic database
- Relational database
- Flat files
- Content management system
- Horizontal scalable architecture

The NIST Big Data Interoperability Framework consists of seven volumes, each of which addresses a specific key topic, resulting from the work of the NBD-PWG. The seven volumes are:
- Volume 1, Definitions
- Volume 2, Taxonomies
- Volume 3, Use Cases and General Requirements
- Volume 4, Security and Privacy
- Volume 5, Architectures White Paper Survey
- Volume 6, Reference Architecture
- Volume 7, Standards Roadmap

The NIST Big Data Interoperability Framework will be released in three versions, which correspond to the three development stages of the NBD-PWG work. The three stages aim to achieve the following with respect to the NIST Big Data Reference Architecture (NBDRA).

Stage 1: Identify the high-level Big Data reference architecture key components, which are technology-, infrastructure-, and vendor-agnostic.
Stage 2: Define general interfaces between the NBDRA components.
Stage 3: Validate the NBDRA by building Big Data general applications through the general interfaces.

Potential areas of future work for the Subgroup during stage 2 are highlighted in Section 1.5 of this volume. The current effort documented in this volume reflects concepts developed within the rapidly evolving field of Big Data.

1 INTRODUCTION

1.1 BACKGROUND

There is broad agreement among commercial, academic, and government leaders about the remarkable potential of Big Data to spark innovation, fuel commerce, and drive progress. Big Data is the common term used to describe the deluge of data in today’s networked, digitized, sensor-laden, and information-driven world. The availability of vast data resources carries the potential to answer questions previously out of reach, including the following:
- How can a potential pandemic reliably be detected early enough to intervene?
- Can new materials with advanced properties be predicted before these materials have ever been synthesized?
- How can the current advantage of the attacker over the defender in guarding against cybersecurity threats be reversed?

There is also broad agreement on the ability of Big Data to overwhelm traditional approaches. The growth rates for data volumes, speeds, and complexity are outpacing scientific and technological advances in data analytics, management, transport, and data user spheres.

Despite widespread agreement on the inherent opportunities and current limitations of Big Data, a lack of consensus on some important fundamental questions continues to confuse potential users and stymie progress. These questions include the following:
- What attributes define Big Data solutions?
- How is Big Data different from traditional data environments and related applications?
- What are the essential characteristics of Big Data environments?
- How do these environments integrate with currently deployed architectures?
- What are the central scientific, technological, and standardization challenges that need to be addressed to accelerate the deployment of robust Big Data solutions?

Within this context, on March 29, 2012, the White House announced the Big Data Research and Development Initiative. [1] The initiative’s goals include helping to accelerate the pace of discovery in science and engineering, strengthening national security, and transforming teaching and learning by improving the ability to extract knowledge and insights from large and complex collections of digital data.

Six federal departments and their agencies announced more than $200 million in commitments spread across more than 80 projects, which aim to significantly improve the tools and techniques needed to access, organize, and draw conclusions from huge volumes of digital data. The initiative also challenged industry, research universities, and nonprofits to join with the federal government to make the most of the opportunities created by Big Data.

Motivated by the White House initiative and public suggestions, the National Institute of Standards and Technology (NIST) has accepted the challenge to stimulate collaboration among industry professionals to further the secure and effective adoption of Big Data. As one result of NIST’s Cloud and Big Data Forum held on January 15–17, 2013, there was strong encouragement for NIST to create a public working group for the development of a Big Data Interoperability Framework. Forum participants noted that this framework should define and prioritize Big Data requirements, including interoperability, portability, reusability, extensibility, data usage, analytics, and technology infrastructure. In doing so, the framework would accelerate the adoption of the most secure and effective Big Data techniques and technology.

On June 19, 2013, the NIST Big Data Public Working Group (NBD-PWG) was launched with extensive participation by industry, academia, and government from across the nation. The scope of the NBD-PWG involves forming a community of interests from all sectors, including industry, academia, and government, with the goal of developing consensus on definitions, taxonomies, secure reference architectures, security and privacy, and, from these, a standards roadmap. Such a consensus would create a vendor-neutral, technology- and infrastructure-independent framework that would enable Big Data stakeholders to identify and use the best analytics tools for their processing and visualization requirements on the most suitable computing platform and cluster, while also allowing value-added from Big Data service providers.

The NIST Big Data Interoperability Framework consists of seven volumes, each of which addresses a specific key topic, resulting from the work of the NBD-PWG. The seven volumes are:
- Volume 1, Definitions
- Volume 2, Taxonomies
- Volume 3, Use Cases and General Requirements
- Volume 4, Security and Privacy
- Volume 5, Architectures White Paper Survey
- Volume 6, Reference Architecture
- Volume 7, Standards Roadmap

The NIST Big Data Interoperability Framework will be released in three versions, which correspond to the three stages of the NBD-PWG work. The three stages aim to achieve the following with respect to the NIST Big Data Reference Architecture (NBDRA).

Stage 1: Identify the high-level NBDRA key components, which are technology-, infrastructure-, and vendor-agnostic.
Stage 2: Define general interfaces between the NBDRA components.
Stage 3: Validate the NBDRA by building Big Data general applications through the general interfaces.

Potential areas of future work for the Subgroup during stage 2 are highlighted in Section 1.5 of this volume. The current effort documented in this volume reflects concepts developed within the rapidly evolving field of Big Data.

1.2 SCOPE AND OBJECTIVES OF THE REFERENCE ARCHITECTURE SUBGROUP

Reference architectures provide “an authoritative source of information about a specific subject area that guides and constrains the instantiations of multiple architectures and solutions.” [2] Reference architectures generally serve as a reference foundation for solution architectures, and may also be used for comparison and alignment purposes. This volume was prepared by the NBD-PWG Reference Architecture Subgroup. The effort focused on developing an open reference Big Data architecture that achieves the following objectives:
- Provides a common language for the various stakeholders;
- Encourages adherence to common standards, specifications, and patterns;
- Provides consistent methods for implementation of technology to solve similar problem sets;
- Illustrates and improves understanding of the various Big Data components, processes, and systems, in the context of a vendor- and technology-agnostic Big Data conceptual model;
- Provides a technical reference for U.S. government departments, agencies, and other consumers to understand, discuss, categorize, and compare Big Data solutions; and
- Facilitates the analysis of candidate standards for interoperability, portability, reusability, and extendibility.

The NBDRA is intended to facilitate the understanding of the operational intricacies in Big Data. It does not represent the system architecture of a specific Big Data system, but rather is a tool for describing, discussing, and developing system-specific architectures using a common framework. The reference architecture achieves this by providing a generic, high-level conceptual model, which serves as an effective tool for discussing the requirements, structures, and operations inherent to Big Data. The model is not tied to any specific vendor products, services, or reference implementation, nor does it define prescriptive solutions for advancing innovation.

The NBDRA does not address the following:
- Detailed specifications for any organization’s operational systems;
- Detailed specifications of information exchanges or services; and
- Recommendations or standards for integration of infrastructure products.

As a precursor to the development of the NBDRA, the NBD-PWG Reference Architecture Subgroup surveyed the currently published Big Data platforms by leading companies supporting the Big Data framework. All the reference architectures provided to the NBD-PWG are listed and the capabilities of each surveyed platform are discussed in this document.

1.3 REPORT PRODUCTION

A wide spectrum of Big Data architectures was explored and developed from various industry, academic, and government initiatives. The NBD-PWG Reference Architecture Subgroup produced this report through the four steps outlined below.

1. Announced that the NBD-PWG Reference Architecture Subgroup is open to the public to attract and solicit a wide array of subject matter experts and stakeholders in government, industry, and academia;
2. Gathered publicly available Big Data architectures and materials representing various stakeholders, different data types, and different use cases;
3. Examined and analyzed the Big Data material to better understand existing concepts, usage, goals, objectives, characteristics, and key elem
