Encyclopedia Of Big Data Technologies - Springer

Sherif Sakr · Albert Y. Zomaya
Editors

Encyclopedia of Big Data Technologies

With 423 Figures and 54 Tables

Editors
Sherif Sakr
Institute of Computer Science
University of Tartu
Tartu, Estonia

Albert Y. Zomaya
School of Information Technologies
Sydney University
Sydney, Australia

ISBN 978-3-319-77524-1
ISBN 978-3-319-77525-8 (eBook)
ISBN 978-3-319-77526-5 (print and electronic bundle)

Library of Congress Control Number: 2018960889

© Springer Nature Switzerland AG 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

In the field of computer science, data is considered the main raw material, produced by abstracting the world into categories, measures, and other representational forms (e.g., characters, numbers, relations, sounds, images, electronic waves) that constitute the building blocks from which information and knowledge are created. In practice, data generation and consumption have become part of people’s daily lives, especially with the pervasive availability of Internet technology. We are progressively moving toward a data-driven society in which data has become one of the most valuable assets. Big data has commonly been characterized by three defining properties, the 3V’s: huge Volume, consisting of terabytes or petabytes of data; high Velocity, being created in or near real time; and wide Variety of form, being structured and unstructured in nature. Recently, research communities, enterprises, and government sectors have all realized the enormous potential of big data analytics, and continuous advancements have been emerging in this domain. This Encyclopedia answers the need for a solid and comprehensive research source in the domain of Big Data Technologies. Its aim is to provide a full picture of the various related aspects, topics, and technologies of big data, including big data enabling technologies, big data integration, big data storage and indexing, data compression, big data programming models, big SQL systems, big streaming systems, big semantic data processing, graph analytics, big spatial data management, big data analysis, business process analytics, big data processing on modern hardware systems, big data applications, big data security and privacy, and benchmarking of big data systems.

With contributions from many leaders in the field, this Encyclopedia provides comprehensive reading material for a broad range of audiences.
A main aim of the Encyclopedia is to encourage readers to think further and to investigate areas that are new to them. The Encyclopedia has been designed to serve as a solid and comprehensive reference not only for expert researchers and software engineers in the field but equally for students and junior researchers. The first edition of the Encyclopedia contains more than 250 entries covering a wide range of topics. The Encyclopedia’s entries will be updated regularly to follow the continuous advancement in the domain and to keep its coverage up to date for our readers. It will be available in both print and online versions. We hope that our readers will find this Encyclopedia a rich and valuable resource and a highly informative reference for Big Data Technologies.

Sherif Sakr
Institute of Computer Science, University of Tartu, Tartu, Estonia

Albert Y. Zomaya
School of Information Technologies, Sydney University, Sydney, Australia

July 2018

List of Topics

Big Data Integration
Section Editor: Maik Thiele
Data Cleaning
Data Fusion
Data Integration
Data Lake
Data Profiling
Data Wrangling
ETL
Holistic Schema Matching
Integration-Oriented Ontology
Large-Scale Entity Resolution
Large-Scale Schema Matching
Privacy-Preserving Record Linkage
Probabilistic Data Integration
Record Linkage
Schema Mapping
Truth Discovery
Uncertain Schema Matching

Big SQL
Section Editor: Yuanyuan Tian and Fatma Özcan
Big Data Indexing
Caching for SQL-on-Hadoop
Cloud-Based SQL Solutions for Big Data
Columnar Storage Formats
Hive
Hybrid Systems Based on Traditional Database Extensions
Impala
Query Optimization Challenges for SQL-on-Hadoop
SnappyData
Spark SQL
Virtual Distributed File System: Alluxio
Wildfire: HTAP for Big Data

Big Spatial Data Management
Section Editor: Timos Sellis and Aamir Cheema
Applications of Big Spatial Data: Health
Architectures
Indexing
Linked Geospatial Data
Query Processing – kNN
Query Processing: Computational Geometry
Query Processing: Joins
Spatial Data Integration
Spatial Data Mining
Spatial Graph Big Data
Spatio-social Data
Spatio-textual Data
Spatiotemporal Data: Trajectories
Streaming Big Spatial Data
Using Big Spatial Data for Planning
User Mobility
Visualization

Big Semantic Data Processing
Section Editor: Philippe Cudré-Mauroux and Olaf Hartig
Automated Reasoning
Big Semantic Data Processing in the Life Sciences Domain
Big Semantic Data Processing in the Materials Design Domain
Data Quality and Data Cleansing of Semantic Data
Distant Supervision from Knowledge Graphs
Federated RDF Query Processing
Framework-Based Scale-Out RDF Systems
Knowledge Graph Embeddings
Knowledge Graphs in the Libraries and Digital Humanities Domain
Native Distributed RDF Systems
Ontologies for Big Data
RDF Dataset Profiling
RDF Serialization and Archival
Reasoning at Scale
Security and Privacy Aspects of Semantic Data
Semantic Interlinking
Semantic Search
Semantic Stream Processing
Visualizing Semantic Data

Big Data Analysis
Section Editor: Domenico Talia and Paolo Trunfio
Apache Mahout
Apache SystemML
Big Data Analysis Techniques
Big Data Analysis and IoT
Big Data Analysis for Smart City Applications
Big Data Analysis for Social Good
Big Data Analysis in Bioinformatics
Cloud Computing for Big Data Analysis
Deep Learning on Big Data
Energy Efficiency in Big Data Analysis
Languages for Big Data Analysis
Performance Evaluation of Big Data Analysis
Scalable Architectures for Big Data Analysis
Tools and Libraries for Big Data Analysis
Workflow Systems for Big Data Analysis

Big Data Programming Models
Section Editor: Sherif Sakr
BSP Programming Model
Clojure
Julia
Python
Scala
SciDB
R Language: A Powerful Tool for Taming Big Data

Big Data on Modern Hardware Systems
Section Editor: Bingsheng He and Behrooz Parhami
Big Data and Exascale Computing
Computer Architecture for Big Data
Data Longevity and Compatibility
Data Replication and Encoding
Emerging Hardware Technologies
Energy Implications of Big Data
GPU-Based Hardware Platforms
Hardware Reliability Requirements
Hardware-Assisted Compression
Parallel Processing with Big Data
Search and Query Accelerators
Storage Hierarchies for Big Data
Storage Technologies for Big Data
Structures for Large Data Sets
Tabular Computation

Big Data Applications
Section Editor: Kamran Munir and Antonio Pescapè
Big Data and Recommendation
Big Data Application in Manufacturing Industry
Big Data Enables Labor Market Intelligence
Big Data Technologies for DNA Sequencing
Big Data Warehouses for Smart Industries
Big Data for Cybersecurity
Big Data for Health
Big Data in Automotive Industry
Big Data in Computer Network Monitoring
Big Data in Cultural Heritage
Big Data in Mobile Networks
Big Data in Network Anomaly Detection
Big Data in Smart Cities
Big Data in Social Networks
Flood Detection Using Social Media Big Data Streams

Enabling Big Data Technologies
Section Editor: Rodrigo Neves Calheiros and Marcos Dias de Assuncao
Apache Spark
Big Data Architectures
Big Data Deep Learning Tools
Big Data Visualization Tools
Big Data and Fog Computing
Big Data in the Cloud
Databases as a Service
Distributed File Systems
Graph Processing Frameworks
Hadoop
Mobile Big Data: Foundations, State of the Art, and Future Directions
Network-Level Support for Big Data Computing
NoSQL Database Systems
Orchestration Tools for Big Data
Visualization Techniques

Distributed Systems for Big Data
Section Editor: Asterios Katsifodimos and Pramod Bhatotia
Achieving Low Latency Transactions for Geo-replicated Storage with Blotter
Advancements in YARN Resource Manager
Approximate Computing for Stream Analytics
Cheap Data Analytics on Cold Storage
Distributed Incremental View Maintenance
HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
Incremental Approximate Computing
Incremental Sliding Window Analytics
Optimizing Geo-distributed Streaming Analytics
Parallel Join Algorithms in MapReduce
Privacy-Preserving Data Analytics
Robust Data Partitioning
Sliding-Window Aggregation Algorithms
Stream Window Aggregation Semantics and Optimization
StreamMine3G: Elastic and Fault Tolerant Large Scale Stream Processing
TARDiS: A Branch-and-Merge Approach to Weak Consistency

Big Data Transaction Processing
Section Editor: Mohammad Sadoghi
Active Storage
Blockchain Transaction Processing
Conflict-Free Replicated Data Types CRDTs
Coordination Avoidance
Database Consistency Models
Geo-replication Models
Geo-scale Transaction Processing
Hardware-Assisted Transaction Processing
Hardware-Assisted Transaction Processing: NVM
Hybrid OLTP and OLAP
In-Memory Transactions
Transactions in Massively Multiplayer Online Games
Weaker Consistency Models/Eventual Consistency

Big Data Security and Privacy
Section Editor: Jinjun Chen and Deepak Puthal
Big Data Stream Security Classification for IoT Applications
Big Data and Privacy Issues for Connected Vehicles in Intelligent Transportation Systems
Co-resident Attack in Cloud Computing: An Overview
Data Provenance for Big Data Security and Accountability
Exploring Scope of Computational Intelligence in IoT Security Paradigm
Keyword Attacks and Privacy Preserving in Public-Key-Based Searchable Encryption
Network Big Data Security Issues
Privacy Cube
Privacy-Aware Identity Management
Scalable Big Data Privacy with MapReduce
Secure Big Data Computing in Cloud: An Overview
Security and Privacy in Big Data Environment

Business Process Analytics
Section Editor: Marlon Dumas and Matthias Weidlich
Artifact-Centric Process Mining
Automated Process Discovery
Business Process Analytics
Business Process Deviance Mining
Business Process Event Logs and Visualization
Business Process Model Matching
Business Process Performance Measurement
Business Process Querying
Conformance Checking
Data-Driven Process Simulation
Decision Discovery in Business Processes
Declarative Process Mining
Decomposed Process Discovery and Conformance Checking
Event Log Cleaning for Business Process Analytics
Hierarchical Process Discovery
Multidimensional Process Analytics
Predictive Business Process Monitoring
Process Model Repair
Queue Mining
Streaming Process Discovery and Conformance Checking
Trace Clustering

Big Data Benchmarking
Section Editor: Meikel Poess and Tilmann Rabl
Analytics Benchmarks
Auditing
Benchmark Harness
CRUD Benchmarks
Component Benchmark
End-to-End Benchmark
Energy Benchmarking
Graph Benchmarking
Metrics for Big Data Benchmarks
Microbenchmark
SparkBench
Stream Benchmarks
System Under Test
TPC
TPC-DS
TPC-H
TPCx-HS
Virtualized Big Data Benchmarks
YCSB

Graph Data Management and Analytics
Section Editor: Hannes Voigt and George Fletcher
Feature Learning from Social Graphs
Graph Data Integration and Exchange
Graph Data Models
Graph Exploration and Search
Graph Generation and Benchmarks
Graph Invariants
Graph OLAP
Graph Partitioning: Formulations and Applications to Big Data
Graph Path Navigation
Graph Pattern Matching
Graph Query Languages
Graph Query Processing
Graph Representations and Storage
Graph Visualization
Graph Data Management Systems
Historical Graph Management
Indexing for Graph Query Evaluation
Influence Analytics in Graphs
Link Analytics in Graphs
Linked Data Management
Parallel Graph Processing
Visual Graph Querying

Data Compression
Section Editor: Paolo Ferragina
(Web/Social) Graph Compression
Compressed Indexes for Repetitive Textual Datasets
Computing the Cost of Compressed Data
Degrees of Separation and Diameter in Large Graphs
Delta Compression Techniques
Dimension Reduction
Genomic Data Compression
Grammar-Based Compression
Inverted Index Compression
RDF Compression
Similarity Sketching

Big Stream Processing
Section Editor: Alessandro Margara and Tilmann Rabl
Adaptive Windowing
Apache Apex
Apache Flink
Apache Kafka
Apache Samza
Continuous Queries
Definition of Data Streams
Elasticity
Introduction to Stream Processing Algorithms
Management of Time
Online Machine Learning Algorithms over Data Streams
Online Machine Learning in Big Data Streams: Overview
Pattern Recognition
Recommender Systems over Data Streams
Reinforcement Learning, Unsupervised Methods, and Concept Drift in Stream Learning
Rendezvous Architectures
Stream Processing Languages and Abstractions
Stream Query Optimization
Streaming Microservices
Types of Stream Processing Algorithms
Uncertainty in Streams

About the Editors

Professor Sherif Sakr is the Head of the Data Systems Group at the Institute of Computer Science, University of Tartu. He received his PhD degree in Computer and Information Science from Konstanz University, Germany, in 2007. He received his BSc and MSc degrees in Computer Science from the Information Systems Department at the Faculty of Computers and Information, Cairo University, Egypt, in 2000 and 2003, respectively. During his career, Prof. Sakr has held appointments at several reputable international organizations, including the University of New South Wales, Macquarie University, Data61/CSIRO, Microsoft Research, Nokia Bell Labs, and King Saud bin Abdulaziz University for Health Sciences.

Prof. Sakr’s research interests are data and information management in general, particularly big data processing systems, big data analytics, data science, and big data management in cloud computing platforms. Professor Sakr has published more than 100 refereed research publications in international journals and conferences such as the Proceedings of the VLDB Endowment (PVLDB), IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), IEEE Transactions on Service Computing (IEEE TSC), IEEE Transactions on Big Data (IEEE TBD), ACM Computing Surveys (ACM CSUR), Journal of Computer and System Sciences (JCSS), Information Systems Journal, Cluster Computing, Journal of Grid Computing, IEEE Communications Surveys and Tutorials (IEEE COMST), IEEE Software, Scientometrics, VLDB, SIGMOD, ICDE, EDBT, WWW, CIKM, ISWC, BPM, ER, ICWS, ICSOC, IEEE SCC, IEEE Cloud, TPCTC, DASFAA, ICPE, and JCDL. Professor Sakr has co-authored five books and co-edited three other books in the areas of data and information management and processing. He is an associate editor of the Cluster Computing journal and of Transactions on Large-Scale Data and Knowledge-Centered Systems (TLDKS). He is also an editorial board member of many reputable international journals. Prof. Sakr is an ACM Senior Member and an IEEE Senior Member. In 2017, he was appointed to serve as an ACM Distinguished Speaker and as an IEEE Distinguished Speaker. For more information, please visit his personal web page (http://kodu.ut.ee/~sakr/) and his research group page (http://bigdata.cs.ut.ee/).

Albert Y. Zomaya is currently the Chair Professor of High Performance Computing & Networking in the School of Information Technologies, University of Sydney. He is also the Director of the Centre for Distributed and High Performance Computing, which was established in late 2009. Dr. Zomaya was an Australian Research Council Professorial Fellow during 2010–2014, held the CISCO Systems Chair Professor of Internetworking during 2002–2007, and was also Head of School during 2006–2007 in the same school. Prior to his current appointment, he was a Full Professor in the Electrical and Electronic Engineering Department at the University of Western Australia, where he also led the Parallel Computing Research Laboratory during 1990–2002. He served as Associate, Deputy, and Acting Head in the same department, held numerous visiting positions, and has extensive industry involvement.

Dr. Zomaya has published more than 600 scientific papers and articles and is author, co-author, or editor of more than 20 books. He served as the Editor in Chief of the IEEE Transactions on Computers (2011–2014). Currently, he serves as Founding Editor in Chief of the IEEE Transactions on Sustainable Computing, Co-founding Editor in Chief of IET Cyber-Physical Systems: Theory and Applications, and Associate Editor in Chief (special issues) of the Journal of Parallel and Distributed Computing. Dr. Zomaya is an Associate Editor for several leading journals, such as the ACM Transactions on Internet Technology, ACM Computing Surveys, IEEE Transactions on Cloud Computing, IEEE Transactions on Computational Social Systems, and IEEE Transactions on Big Data.
He is also the Founding Editor of several book series, such as the Wiley Book Series on Parallel and Distributed Computing, the Springer Scalable Computing and Communications series, and the IET Book Series on Big Data.

Dr. Zomaya was the Chair of the IEEE Technical Committee on Parallel Processing (1999–2003) and currently serves on its executive committee. He is the Vice-Chair of the IEEE Task Force on Computational Intelligence for Cloud Computing and serves on the advisory board of the IEEE Technical Committee on Scalable Computing and the steering committee of the IEEE Technical Area in Green Computing.

Dr. Zomaya has delivered more than 180 keynote addresses, invited seminars, and media briefings and has been actively involved, in a variety of capacities, in the organization of more than 700 conferences.

Dr. Zomaya is a Fellow of the IEEE, the American Association for the Advancement of Science, and the Institution of Engineering and Technology (UK). He is a Chartered Engineer and an IEEE Computer Society Golden Core Member. He received the 1997 Edgeworth David Medal from the Royal Society of New South Wales for outstanding contributions to Australian science. Dr. Zomaya is the recipient of the IEEE Technical Committee on Parallel Processing Outstanding Service Award (2011), the IEEE Technical Committee on Scalable Computing Medal for Excellence in Scalable Computing (2011), the IEEE Computer Society Technical Achievement Award (2014), and the ACM MSWIM Reginald A. Fessenden Award (2017). His research interests span several areas in parallel and distributed computing and complex systems. More information can be found at http://www.it.usyd.edu.au/~azom0780/.

About the Section Editors

Pramod Bhatotia, The University of Edinburgh and Alan Turing Institute, Edinburgh, UK
Rodrigo N. Calheiros, School of Computing, Engineering and Mathematics, Western Sydney University, Penrith, NSW, Australia
Aamir Cheema, Monash University, Clayton, VIC, Australia
Jinjun Chen, School of Software and Electrical Engineering, Swinburne University of Technology, Hawthorn, VIC, Australia
Philippe Cudré-Mauroux, eXascale Infolab, University of Fribourg, Fribourg, Switzerland
Marcos Dias de Assuncao, Inria Avalon, LIP Laboratory, ENS Lyon, University of Lyon, Lyon, France
Marlon Dumas, Institute of Computer Science, University of Tartu, Tartu, Estonia
Paolo Ferragina, Department of Computer Science, University of Pisa, Pisa, Italy
George Fletcher, Technische Universiteit Eindhoven, Eindhoven, Netherlands
Olaf Hartig, Linköping University, Linköping, Sweden
Bingsheng He, National University of Singapore, Singapore, Singapore
Asterios Katsifodimos, TU Delft, Delft, Netherlands
Alessandro Margara, Politecnico di Milano, Milano, Italy
Kamran Munir, Computer Science and Creative Technologies, University of the West of England, Bristol, UK
Fatma Özcan, IBM Research – Almaden, San Jose, CA, USA
Behrooz Parhami, Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA
Antonio Pescapè, Department of Electrical Engineering and Information Technology, University of Napoli Federico II, Napoli, Italy
Meikel Poess, Server Technologies, Oracle Corporation, Redwood Shores, CA, USA
Deepak Puthal, Faculty of Engineering and Information Technologies, School of Electrical and Data Engineering, University of Technology Sydney, Ultimo, NSW, Australia
Tilmann Rabl, Database Systems and Information Management Group, Technische Universität Berlin, Berlin, Germany
Mohammad Sadoghi, University of California, Davis, CA, USA
Timos Sellis, Swinburne University of Technology, Data Science Research Institute, Hawthorn, VIC, Australia
Domenico Talia, University of Calabria, Rende, Italy
Maik Thiele, Database Systems Group, Technische Universität Dresden, Dresden, Saxony, Germany
Yuanyuan Tian, IBM Research – Almaden, San Jose, CA, USA
Paolo Trunfio, University of Calabria, DIMES, Rende, Italy
Hannes Voigt, Dresden Database Systems Group, Technische Universität Dresden, Dresden, Germany
Matthias Weidlich, Humboldt-Universität zu Berlin, Department of Computer Science, Berlin, Germany

List of Contributors

Ziawasch Abedjan, Technische Universität Berlin, Berlin, Germany
Alberto Abelló, Polytechnic University of Catalonia, Barcelona, Spain
Umut A. Acar, CMU, Pittsburgh, PA, USA
Biswaranjan Acharya, Kalinga Institute of Industrial Technology (KIIT), Deemed to be University, Bhubaneswar, India
Maribel Acosta, Institute AIFB, Karlsruhe Institute of Technology, Karlsruhe, Germany
Khandakar Ahmed, Institute for Sustainable Industries and Liveable Cities, VU Research, Victoria University, Melbourne, Australia
Marco Aldinucci, Computer Science Department, University of Turin, Turin, Italy
Alexander Alexandrov, Database Systems and Information Management Group (DIMA), Technische Universität Berlin, Berlin, Germany
Khaled Ammar, Thomson Reuters Labs, Thomson Reuters, Waterloo, ON, Canada
Carina Andrade, Department of Information Systems, ALGORITMI Research Centre, University of Minho, Guimarães, Portugal
Renzo Angles, Universidad de Talca, Talca, Chile
Raja Appuswamy, Data Science Department, EURECOM, Biot, France
Walid G. Aref, Purdue University, West Lafayette, IN, USA
Marcelo Arenas, Pontificia Universidad Católica de Chile, Santiago, Chile
Rickard Armiento, Linköping University, Swedish e-Science Research Centre, Linköping, Sweden
Julien Audiffren, Exascale Infolab, University of Fribourg, Fribourg, Switzerland
Booma Sowkarthiga Balasubramani, University of Illinois at Chicago, Chicago, IL, USA
S. Balkir, Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA
Soumya Banerjee, Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, India
Carlos Baquero, HASLab/INESC TEC and Universidade do Minho, Braga, Portugal
Ronald Barber, IBM Research – Almaden, San Jose, CA, USA
Pablo Barceló, Universidad de Chile, Santiago, Chile
Przemysław Błaśkiewicz, Faculty of Fundamental Problems of Technology, Wrocław University of Science and Technology; National Science Center, Wrocław, Poland
Guillaume Baudart, IBM Research, Yorktown Heights, NY, USA
Sibghat Ullah Bazai, Institute of Natural and Mathematical Sciences, Massey University, Auckland, New Zealand
Martin Beck, TU Dresden, Dresden, Germany
Alex Behm, Databricks, San Francisco, CA, USA
Loris Belcastro, DIMES, University of Calabria, Rende, Italy
András A. Benczúr, Institute for Computer Science and Control, Hungarian Academy of Sciences (MTA SZTAKI), Budapest, Hungary
Konstantina Bereta, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece
Antonio Berlanga, Applied Artificial Intelligence Group, Universidad Carlos III de Madrid, Colmenarejo, Spain; Computer Science and Engineering, Universidad Carlos III de Madrid, Colmenarejo, Spain
Laure Berti-Équille, Aix-Marseille University, CNRS, LIS, Marseille, France
Milind A. Bhandarkar, Ampool, Inc., Santa Clara, CA, USA
Pramod Bhatotia, The University of Edinburgh and Alan Turing Institute, Edinburgh, UK
Sourav S. Bhowmick, Nanyang Technological University, Singapore, Singapore
Nikos Bikakis, ATHENA Research Center, Athens, Greece
Spyros Blanas, Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
Eva Blomqvist, Linköping University, Linköping, Sweden
Matthias Boehm, IBM Research – Almaden, San Jose, CA, USA
Paolo Boldi, Dipartimento di Informatica, Università degli Studi di Milano, Milano, Italy
Angela Bonifati, Lyon 1 University, Villeurbanne, France
Samia Bouzefrane, Conservatoire National des Arts et Metiers, Paris, France
Andrey Brito, UFCG, Campina Grande, Brazil
Andrea Burattin, DTU Compute, Software Engineering, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
Rodrigo N. Calheiros, School of Computing, Engineering and Mathematics, Western Sydney University, Penrith, NSW, Australia
Mario Cannataro, Data Analytics Research Center, University “Magna Græcia” of Catanzaro, Catanzaro, Italy
Paris Carbone, KTH Royal Institute of Technology, Stockholm, Sweden
Josep Carmona, Universitat Politècnica de Catalunya, Barcelona, Spain
Abel Armas Cervantes, University of Melbourne, Melbourne, VIC, Australia
Eugenio Cesario, Institute of High Performance Computing and Networks of the National Research Council of Italy (ICAR-CNR), Rende, Italy
Abhishek Chandra, University of Minnesota, Minneapolis, MN, USA
Chii Chang, Mobile and Cloud Lab, Institute of Computer Science, University of Tartu, Tartu, Estonia
Jinjun Chen, School of Software and Electrical Engineering, Swinburne University of Technology, Hawthorn, VIC, Australia
Ruichuan Chen, Nokia Bell Labs, Stuttgart, Germany; Nokia Bell Labs, NJ, USA
Xuntao Cheng, Nanyang Technological University, Jurong West, Singapore
Shekha Chenthara, Institute for Sustainable Industries and Liveable Cities, VU Research, Victoria University, Melbourne, Australia
Byron Choi, Hong Kong Baptist University, Hong Kong, China
Xu Chu, School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
Carmela Comito, CNR-ICAR, Rende, Italy
Raffaele Conforti, University of Melbourne, Melbourne, VIC, Australia
Gao Cong, School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
Carlos Costa, CCG – Centro de Computação Gráfica and ALGORITMI Research Centre, University of Minho, Guimarães, Portugal
Pierluigi Crescenzi, Dipartimento di Matematica e Informatica, Università di Firenze, Florence, Italy
Alain Crolotte, Teradata Corporation, El Segundo, CA, USA
Natacha Crooks, The University of Texas at Austin, Austin, TX, USA
Isabel F. Cruz, University of Illinois at Chicago, Chicago, IL, USA
Philippe Cudre-Mauroux, eXascale Infolab, University of Fribourg, Fribourg, Switzerland
Renato Luiz de Freitas Cunha, IBM Research, São Paulo, Brazil
Emily May Curtin, IBM, New York, USA
Alessandro D’Alconzo, Austrian Institute of Technology, Vienna, Austria
Guilherme da Cunha Rodrigues, Federal Institute of Education Science and Technology Sul-Rio Grandense (IFSUL), Charqueadas, Brazil
Alexandre da Silva Veith, Inria Avalon, LIP Laboratory, ENS Lyon, University of Lyon, Lyon, France
Massimiliano de Leoni, Eindhoven University of Technology, Eindhoven, The Netherlands
Anna Delin, Royal Institute of Technology, Swedish e-Science Research Centre, Stockholm, Sweden
Adela del-Río-Ortega, University of Seville, Sevilla, Spain
Gianluca Demartini, The University of Queensland, St. Lucia, QLD, Australia
Elena Demidova, L3S Research Center, Leibniz Universität Hannover, Hanover, Germany
Benoît Depaire, Research Group Business Informatics, Hasselt University, Diepenbeek, Belgium
Rogério Abreu de Paula, IBM Research, Rua Tutóia 1157, São Paulo, Brazil
Amol Deshpande, Computer Science Department, University of Maryland, College Park, MD, USA
Renata Ghisloti Duarte de Souza Granha, Bosch, Chicago, USA
Helena F. Deus, Elsevier Labs, Cambridge, MA, USA
Ricardo J. Dias, SUSE Linux GmbH and NOVA LINCS, Lisbon, Portugal
Marcos Dias de Assuncao, Inria Avalon, LIP Laboratory, ENS Lyon, University of Lyon, Lyon, France
Arturo Diaz-Perez, Cinvestav Tamaulipas, Victoria, Mexico
Stefan Dietze, L3S Research Center, Leibniz Universität Hannover, Hanover, Germany
Chiara Di Francescomarino, Fondazione Bruno Kessler – FBK, Trento, Italy
Chris Douglas, Microsoft, Washington, DC, USA
Jim Dowling, KTH – Royal Institute of Technology, Stockholm, Sweden
Idilio Drago, Politecnico di Torino, Turin, Italy
Maurizio Drocco, Computer Science Department, University of Turin, Turin, Italy
Jianfeng Du, Guangdong University of Foreign Studies, Guangdong, China
Dirk Duellmann, Information Technology Department, CERN, Geneva, Switzerland
Marlon Dumas, Institute of Computer Science, University of Tartu, Tartu, Estonia
Ted Dunning, MapR Technologies and Apache Software Foundation, Santa Clara, CA, USA
Ahmed Eldawy, Department of Computer Science and Engineering, University of California, Riverside, CA, USA
Mohamed Y. Elta
