Big Data And Its Impact On Data Warehousing


The "big data" movement has taken the information technology world by storm. Fueled by open source projects emanating from the Apache Foundation, the big data movement offers a cost-effective way for organizations to process and store large volumes of any type of data: structured, semi-structured and unstructured.

By Wayne Eckerson

Chapter 1: Despite Problems, Big Data Makes It Huge
Chapter 2: Two Markets for Big Data: Comparing Value Propositions
Chapter 3: Categorizing Big Data Processing Systems
Chapter 4: The New Analytical Ecosystem: Making Way for Big Data

Chapter 1: Despite Problems, Big Data Makes It Huge

The hype and reality of the big data movement is reaching a crescendo. It's clear that Hadoop and NoSQL technologies are gaining a foothold in corporate computing environments. But big data software and computing paradigms are still in their infancy and must clear many hurdles before organizations trust them to handle serious data and application workloads.

Most leading big data vendors now count hundreds of customers. Big data is no longer the province of Internet and media companies with large Web properties; companies in nearly every industry are jumping on the big data bandwagon. These include energy, pharmaceuticals, utilities, telecommunications, insurance, retail, financial services and government.

For example, Vestas Wind Systems, a leading wind turbine maker, uses BigInsights to model larger volumes of weather data so it can pinpoint the optimal placement of wind turbines. And a financial services customer uses Hadoop to improve the accuracy of its fraud models by addressing much larger volumes of transaction data.

Big Data Drivers

Hadoop clearly fills an unmet need in many organizations. Given its open source roots, Hadoop provides a more cost-effective way to analyze large volumes of data compared with traditional relational database management systems (RDBMSes). It's also better suited to processing unstructured data, such as audio, video or images, and semi-structured data, such as Web server log data for tracking customer behavior on social media sites. For years, leading-edge companies have struggled in vain to figure out an optimal way to analyze this type of data in

traditional data warehousing environments, but without much luck.

Finally, Hadoop is a load-and-go environment: Administrators can dump data into Hadoop without having to convert it into a particular structure. Then users (or data scientists) can analyze the data using whatever tools they want, which today are typically languages such as Java, Python or Ruby. This type of data management paradigm appeals to application developers and analysts, who often feel straitjacketed by top-down, IT-driven architectures and SQL-based tool sets.

Speed Bumps

But Hadoop is not a data management panacea. It's clearly at or near the apogee of its hype cycle right now, and its many warts will disillusion all but bleeding- and leading-edge adopters.

For starters, Hadoop is still wet behind the ears. The Apache Software Foundation just released the equivalent of version 1.0. So there are plenty of basic things missing from the environment, like security, a metadata catalog, data quality, backups, and monitoring and control. Moreover, it's a batch processing environment that is not terribly efficient in the way it exploits a clustered environment. Hadoop knockoffs like MapR, which embed proprietary technology underneath Hadoop application programming interfaces, claim up to five-fold faster performance on half as many nodes.

To run a Hadoop environment, you need to get software from a mishmash of Apache projects with razzle-dazzle names like Flume, Sqoop, Oozie, Pig, Hive and ZooKeeper. These independent projects often contain competing functionality, have separate release schedules and aren't always tightly integrated.
And each project evolves rapidly. That's why there is a healthy market for Hadoop distributions that package these components into a reasonable set of implementable software.

But the biggest complaint among big data advocates is the current lack of data scientists to build Hadoop applications. These wunderkinds combine a rare set of skills: statistics and math; data, process and domain knowledge; and computer programming. Unfortunately, developers have little data and domain experience, and data experts don't know how to program. So there is a severe shortage of talent. Some companies are hiring several people with related skills to cobble together one complete "data scientist."

Evolution

One good thing about the big data movement is that it evolves fast. There are Apache projects to address most of the shortcomings of Hadoop. One promising project is Hive, which provides SQL-like access to Hadoop, although it's stuck in a batch processing paradigm. Another is HBase, which overcomes Hadoop's latency issues and is designed for fast row-based reads and writes to support high-performance transactional applications. Both create table-like structures on top of Hadoop files.

Many commercial vendors have jumped into the fray, marrying proprietary technology with open source software to turn Hadoop into a more corporate-friendly compute environment. Vendors such as Zettaset, EMC Greenplum and Oracle have launched appliances that embed Hadoop with commercial software to offer customers the best of both worlds. Many BI and data integration vendors, such as Talend, now connect to Hadoop and can move data back and forth seamlessly. Some even create and run MapReduce jobs in Hadoop using their standard visual development environments. Even Microsoft has jumped into the fray, contributing its Hadoop port for Windows Server, an ODBC-to-Hive driver, and a new JavaScript framework for MapReduce to the Apache Foundation.

Cooperation or Competition?

Although vendors are quick to rally behind big data, there is some measure of desperation in the move.
Established software vendors stand to lose significant revenue if Hadoop evolves without them and gains robust data management and analytical functionality that cannibalizes their existing products. They either need to generate sufficient revenue from new big data products or circumscribe Hadoop so that it plays a subservient role to their existing products. Most vendors are hedging their bets and playing both options, especially database vendors, who perhaps have the most to lose.

Both sides are playing nice and are eager to partner and work together. Hadoop vendors benefit as more applications run on Hadoop, including traditional products centering on business intelligence, extract, transform and load (ETL) and DBMSes. And commercial vendors benefit if their existing tools have a new source of data to connect to and plumb. It's a big new market whose sweet-tasting honey attracts a hive full of bees.

Why Invest in Proprietary Tools?

But customers are already asking whether data warehouses and BI tools will eventually be folded into Hadoop environments or the reverse. Why spend millions of dollars on a new analytical RDBMS if you can do that processing without paying a dime in license costs using Hadoop? Why spend hundreds of thousands of dollars on data integration tools if your data scientists can turn Hadoop into a huge data staging and transformation layer? Why invest in traditional BI and reporting tools if your power users can exploit Hadoop using freely available tools such as Java, Python, Pig, Hive or HBase?

The Future Is Cloudy

Right now, it's too early to divine the future of the big data movement and predict winners and losers. It's possible that in the future all data management and analysis will run entirely on open source platforms and tools. But it's just as likely that commercial vendors will co-opt (or outright buy) open source products and functionality and use them as pipelines to magnify sales of their commercial products.

More than likely, we'll get a mélange of open source and commercial capabilities. After all, 30 years after the mainframe revolution, mainframes are still a mainstay at many corporations. In IT, nothing ever dies; it just finds its niche in an evolutionary ecosystem.
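Chapter 1's "load-and-go" point can be made concrete with a small sketch. This is a hypothetical illustration rather than production Hadoop code: raw Web server log lines sit in storage exactly as they were dumped, and structure is imposed only at analysis time by a Hadoop Streaming-style mapper and reducer written as plain Python functions (the sample log lines and function names are invented for the example).

```python
# Sketch of schema-on-read: raw log lines are stored unmodeled, and a
# Streaming-style mapper/reducer pair imposes structure at query time.
import re
from itertools import groupby

# Raw lines as they might sit in HDFS -- no up-front modeling required.
RAW_LOGS = [
    '10.0.0.1 - - [12/Mar/2012:10:01:02] "GET /products HTTP/1.1" 200 512',
    '10.0.0.2 - - [12/Mar/2012:10:01:07] "GET /cart HTTP/1.1" 200 128',
    '10.0.0.1 - - [12/Mar/2012:10:02:15] "GET /products HTTP/1.1" 200 512',
]

def mapper(line):
    """Impose structure at read time: emit (page, 1) for each request."""
    match = re.search(r'"GET (\S+) HTTP', line)
    if match:
        yield match.group(1), 1

def reducer(page, counts):
    """Sum the hit counts for one page, as a Streaming reducer would."""
    return page, sum(counts)

# Simulate the shuffle phase: sort mapper output by key, then group.
pairs = sorted(kv for line in RAW_LOGS for kv in mapper(line))
hits = dict(reducer(k, (c for _, c in g))
            for k, g in groupby(pairs, key=lambda kv: kv[0]))
print(hits)  # {'/cart': 1, '/products': 2}
```

In a real cluster the mapper and reducer would read stdin and write stdout under Hadoop Streaming; the point here is only that no table had to be defined before the data was loaded.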

Chapter 2: Two Markets for Big Data: Comparing Value Propositions

There are two types of big data in the market today. There is open source software, centered largely on Hadoop, which eliminates up-front licensing costs for managing and processing large volumes of data. And then there are new analytical engines, including appliances and column stores, which provide significantly higher price-performance than general-purpose relational databases. Both sets of big data software deliver higher return on investment than previous generations of data management technology, but in vastly different ways.

Hadoop

Free software. Hadoop is an open source distributed file system available through the Apache Software Foundation that is capable of storing and processing large volumes of data in parallel across a grid of commodity servers. Hadoop emanated from large Internet providers, such as Google and Yahoo, which needed a cost-effective way to build search indexes. They knew that traditional relational databases would be prohibitively expensive and technically unwieldy, so they came up with a low-cost alternative they built themselves and eventually gave to the Apache Software Foundation so others could benefit from their innovations.

Today many companies are implementing Hadoop software from Apache as well as third-party providers, such as IBM, Cloudera, Hortonworks and EMC. Developers see Hadoop as a cost-effective way to get their arms around large volumes of data that they've never been able to do much with before.

Many companies use Hadoop to store, process and analyze large volumes of Web server log data so they can get a better feel for the browsing

and shopping behavior of their online customers. Before, companies outsourced the analysis of their clickstream data or simply let it fall on the floor since they didn't have a way to process it in a timely and cost-effective way. Companies are also turning to Hadoop to process more structured data to improve analytical models.

Data agnostic. Besides being free to implement, the other major advantage of big data software is that it is data agnostic: It can handle any type of data. Unlike a data warehouse or traditional relational database, Hadoop doesn't require administrators to model or transform data before they load it. With Hadoop, you don't define a structure for the data; you simply load and go. This significantly reduces the cost of preparing data for analysis compared with what happens in a data warehouse. Most experts assert that 60% to 80% of the cost of building a data warehouse, which can run into the tens of millions of dollars, involves extracting, transforming and loading data. Hadoop virtually eliminates this cost.

As a result, many companies are using Hadoop as a general-purpose staging area and archive for all their data. So a telecommunications company can store 12 months of call detail records instead of aggregating that data in the data warehouse and rolling the details to offline storage. With Hadoop, they can keep all their data online and eliminate the cost of data archival systems. They can also let power users query Hadoop data directly if they want to access the raw data or can't wait for the aggregates to be loaded into the data warehouse.

Hidden costs. Of course, nothing in technology is ever free.
When it comes to processing data, you either pay the piper up front, as in the data warehousing world, or at query time, as in the Hadoop world. Before querying Hadoop data, a developer needs to understand the structure of the data and all of its anomalies. With a clean, well-understood, homogeneous data set, this is not difficult. But most corporate data doesn't fit that description. So a Hadoop developer ends up playing the role of a data warehousing developer at query time, interrogating the data and making sure its format and content match expectations. Querying Hadoop today is a "buyer beware" environment.

Moreover, to run big data software, you still need to purchase, install and manage commodity servers (unless you run your big data environment in the cloud, say through Amazon Web Services). While each server may not cost a lot, the price adds up.

But what's more costly is the expertise and software required to administer Hadoop and manage grids of commodity servers. Hadoop is still bleeding-edge technology, and few people have the skills or experience to run it efficiently in a production environment. These folks are hard to find, and they don't come cheap. The Apache Software Foundation admits that Hadoop's latest release is equivalent to version 1.0 software. So even the experts have a lot to learn, since the technology is evolving at a rapid pace. Nonetheless, Hadoop and its NoSQL brethren have opened up a vast new frontier for organizations to profit from their data.

Analytical Platforms

The other type of big data predates Hadoop and NoSQL variants by several years. This version of big data is less a "movement" than an extension of existing relational database technology optimized for query processing. These analytical platforms span a range of technology, from appliances and columnar databases to shared-nothing, massively parallel processing databases. The common thread among them is that most are read-only environments that deliver exceptional price-performance compared with general-purpose relational databases originally designed to run transaction processing applications.

Teradata laid the groundwork for the analytical platform market when it launched the first analytical appliance in the early 1980s.
Sybase was also an early forerunner, shipping the first columnar database in the mid-1990s. IBM Netezza kicked the current market into high gear in 2003 when it unveiled a popular analytical appliance, and was soon followed by dozens of startups. Recognizing the opportunity, all the big names in software and hardware, including Oracle, IBM, Hewlett-Packard and SAP, subsequently jumped into the market, either by building or buying technology, to provide purpose-built analytical systems to new and existing customers.

Although the price tag of these systems often exceeds $1 million, customers find that the exceptional price-performance delivers significant business value, in both tangible and intangible form. For example, Virginia-based XO Communications recovered $3 million in lost revenue from a new revenue assurance application it built on an analytical appliance, even before it had paid for the system. It subsequently built or migrated a dozen applications to run on the new purpose-built system, testifying to its value.

Kelley Blue Book in Irvine, Calif., purchased an analytical appliance to run its data warehouse, which was experiencing performance issues, giving the provider of online automobile valuations a competitive edge. For instance, the new system reduces the time needed to process hundreds of millions of automobile valuations from one week to one day. Kelley Blue Book now uses the system to analyze its Web advertising business and deliver dynamic pricing for its Web ads.

Challenges. Given the up-front costs of analytical platforms, organizations usually undertake a thorough evaluation of these systems before jumping aboard.

First, a company must determine whether an analytical platform outperforms its existing data warehouse database to a degree that warrants migration and retraining costs. This requires a proof of concept in which the customer tests the systems in its own data center using its own data across a range of queries.

The good news is that the new analytical platforms usually deliver jaw-dropping performance for most queries tested. In fact, many customers don't believe the initial results and rerun the queries to make sure that the results are valid.

Second, companies must choose from more than two dozen analytical platforms on the market today. For instance, they must decide whether to purchase an appliance or a software-only system, a columnar database or an MPP database, or an on-premises system or a Web service. Evaluating these options takes time, and many companies create a short list that doesn't always contain comparable products.

Finally, companies must decide what role an analytical platform will play in their data warehousing architectures. Should it serve as the data warehousing platform? If so, does it handle multiple workloads easily or is it a one-trick pony? If the latter, what applications and data sets make sense to offload to the new system? How do you rationalize having two data warehousing environments instead of one?

Today we find that companies which have tapped out their SQL Server or MySQL data warehouses often replace them with analytical platforms to get better performance.
But companies that have implemented an enterprise data warehouse on Oracle, Teradata or IBM often find that analytical platforms are best used when they sit alongside the data warehouse so they can handle new applications or existing analytical workloads offloaded to them. This architecture helps organizations avoid a costly upgrade to a data warehousing platform, which might easily exceed the cost of purchasing an analytical platform.

The big data movement consists of two separate but interrelated markets: one for Hadoop and open source data management software and the other for purpose-built SQL databases optimized for query processing. Hadoop avoids most of the up-front licensing and loading costs endemic to traditional relational database systems. But since the technology is still immature, there are hidden costs that have thus far kept many Hadoop implementations experimental in nature. On the other hand, analytical platforms are a more proven technology but impose significant up-front licensing fees and potential migration costs. Companies wading into the waters of the big data stream need to carefully evaluate their options.
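The price-performance argument behind the column stores discussed in this chapter can be sketched in a few lines. This is a toy illustration, not any vendor's implementation: the same table is held once as rows (as an OLTP RDBMS would) and once as columns, and an analytical aggregate over a single attribute touches far fewer fields in the column layout.

```python
# Toy contrast of row storage vs. column storage for an analytical scan.

rows = [  # row store: each record kept together, as in an OLTP RDBMS
    {"order_id": i, "region": "east" if i % 2 else "west", "amount": 100 + i}
    for i in range(1000)
]

columns = {  # column store: each attribute kept contiguously
    "order_id": [r["order_id"] for r in rows],
    "region":   [r["region"] for r in rows],
    "amount":   [r["amount"] for r in rows],
}

# SELECT SUM(amount): the row store must walk every three-field record...
row_scan_fields = sum(len(r) for r in rows)  # 3000 fields touched
total = sum(r["amount"] for r in rows)

# ...while the column store reads one contiguous array of 1000 values,
# which is also far easier to compress (long runs of similar values).
col_scan_fields = len(columns["amount"])     # 1000 fields touched
assert total == sum(columns["amount"])

print(row_scan_fields, col_scan_fields)  # 3000 1000
```

Real columnar engines add compression, vectorized execution and parallelism on top of this layout, but the reduced I/O shown here is the core of their advantage for read-mostly analytical workloads.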

Chapter 3: Categorizing Big Data Processing Systems

Faced with an expanding analytical ecosystem, BI managers need to make many technology choices. Perhaps the most difficult involves selecting a data processing system to power a variety of analytical applications (see Chapter 4, "The New Analytical Ecosystem: Making Way for Big Data").

In the past, these types of decisions revolved around selecting one of a handful of leading relational database management systems to power a data warehouse or data mart. Often, the choice boiled down to internal politics as much as technical functionality.

Today the options aren't as straightforward, although politics may still play a role. Instead of selecting a single data management product, BI managers may need to select multiple platforms to outfit an expanding analytical ecosystem. And rather than evaluating four or five alternatives for each platform, the BI manager is faced with dozens of viable options in each category. The once lazy database market is now a beehive of activity.

Staying abreast of all the new products, partnerships and technological advances is now a full-time job. Industry analysts who make a living sifting through products in emerging markets are needed now more than ever. Most analysts will tell you that the first step in selecting an analytical platform is to understand the broad categories of products in the marketplace, and then make finer distinctions from there (see Figure 1).

At a high level, there are four categories of analytical processing systems available today: transactional RDBMSes, analytical platforms, Hadoop distributions and NoSQL databases. The following describes those categories and can be used as a starting point when creating a short list of products during a product evaluation process:

Figure 1: Database/platform categories and typical roles

OLTP databases (Oracle, DB2, SQL Server):
- Transaction systems
- Enterprise data warehouse hub

Analytical platforms (Netezza, Vertica, Exadata, Teradata appliances):
- Enterprise data warehouse to replace MySQL or SQL Server in fast-growing companies
- Analytical data marts to offload the data warehouse
- Free-standing analytical sandboxes (big data, extreme performance)

Hadoop (Cloudera, EMC, IBM, Hortonworks):
- Online data archive for all data (but mostly unstructured)
- Staging area to feed the data warehouse
- Analytical system when you want to query all the raw data (HBase, Hive)
- Analytical system when you can't wait until data is modeled and put in the data warehouse (HBase, Hive)

NoSQL (Cassandra, MongoDB, MarkLogic, Aster Data):
- Key value pair databases for rapid data capture and analysis
- Document databases for high-performance application transactions
- Graph systems that capture relationships among entities
- Search databases for querying structured and unstructured data
- Hybrid SQL-MapReduce databases

1. Transactional RDBMSes

Transactional RDBMSes were originally designed to support transaction processing applications, although most have been retrofitted with various types of indexes, join paths and custom SQL bolt-ons to make them more palatable to analytical processing. There are two types of transactional RDBMSes: enterprise and departmental.

Enterprise hubs. The traditional enterprise RDBMSes, such as those from IBM, Oracle and Sybase, are best suited as data warehousing hubs that feed a variety of downstream, end-user-facing systems, but don't handle query traffic directly. Although retrofitted with analytical capabilities, these systems often hit performance and scalability walls when used for query processing along with other workloads and are expensive to upgrade and replace.
Thus, many customers now use these "gray-bearded" data warehousing systems as hubs to feed operational data stores, data marts, enterprise reporting systems, analytical sandboxes and various analytical and transactional applications.

Departmental marts. A number of companies use Microsoft SQL Server or MySQL as data marts fed by an enterprise data warehouse or as standalone data warehouses for a business unit or small and medium-sized business (SMB). Like their enterprise brethren, these systems also often hit the wall when usage, data volumes or query complexity increases rapidly. A fast-growing business unit or SMB often replaces these transactional RDBMSes with analytical appliances (see below), which provide the same or greater level of simplicity and ease of management as SQL Server or MySQL.

2. Analytical Platforms

Analytical platforms represent the first wave of big data systems (see Chapter 2, "Two Markets for Big Data: Comparing Value Propositions"). These are purpose-built SQL-based systems designed to provide superior price-performance for analytical workloads compared with transactional RDBMSes. There are many types of analytical platforms. Most are being used as data warehousing replacements or standalone analytical systems.

MPP database. Massively parallel processing (MPP) databases with strong mixed workload utilities make good enterprise data warehouses for analytically minded organizations. Teradata was the first on the block with such a system, but it now has many competitors, including EMC Greenplum and Microsoft's Parallel Data Warehousing option, which are relative upstarts compared with the 30-year-old Teradata.

Analytical appliance. These purpose-built analytical systems come as an integrated hardware-software combination tuned for analytical workloads. Analytical appliances come in many shapes, sizes and configurations.
Some, like IBM Netezza, EMC Greenplum and Oracle Exadata, are more general-purpose analytical machines that can serve as replacements for most data warehouses. Others, such as those from Teradata, are geared to specific analytical workloads and can deliver extremely fast performance or manage super-large data volumes.

In-memory systems. If you are looking for raw performance, there is nothing better than a system that lets you put all your data into memory. These systems will soon become more commonplace, thanks

to SAP, which is betting its business on HANA, an in-memory database for transactional and analytical processing, and is evangelizing the need for in-memory systems. Another contender in this space is Kognitio. Many RDBMSes are better exploiting memory for caching results and processing queries.

Columnar. Columnar databases, such as SAP's Sybase IQ, Hewlett-Packard's Vertica, ParAccel, Infobright, Exasol, Calpont and Sand, offer fast performance for many types of queries because of the way these systems store and compress data: by columns instead of rows. Column storage and processing is fast becoming an RDBMS feature rather than a distinct subcategory of products.

3. Hadoop Distributions

Hadoop is an open source software project run within The Apache Software Foundation for processing data-intensive applications in a distributed environment with built-in parallelism and failover. The most important parts of Hadoop are the Hadoop Distributed File System (HDFS), which stores data in files on a cluster of servers, and MapReduce, a programming framework for building parallel applications that run on HDFS. The open source community is building numerous additional components to turn Hadoop into an enterprise-caliber data processing environment. The collection of these components is called a Hadoop distribution. Leading providers of Hadoop distributions include Cloudera, IBM, EMC, Amazon, Hortonworks and MapR.

Today, in most customer installations, Hadoop serves as a staging area and online archive for unstructured and semi-structured data, as well as an analytical sandbox for data scientists who query Hadoop files directly before the data is aggregated or loaded into the data warehouse. But this could change. Hadoop will play an increasingly important role in the analytical ecosystem at most companies, either working in concert with an enterprise data warehouse or assuming most duties of one.

4. NoSQL Databases

NoSQL, shorthand for "not only SQL," is the name given to a broad set of databases whose only common thread is that they don't require SQL to process data, although some support both SQL and non-SQL forms of data processing. There are many types of NoSQL databases, and the list grows longer every month. These specialized systems are built using either proprietary or open source components, or a mix of both. In most cases, they are designed to overcome the limitations of traditional RDBMSes

to handle unstructured and semi-structured data. Here's a partial listing of NoSQL systems:

Key value pair databases. These systems store data as a simple record structure consisting of a key and content. These are used for operational applications that involve large volumes of data, flexible data structures and
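The key value pair model just described can be sketched with an in-memory stand-in. This is purely illustrative and assumes no particular product's API; real key value stores distribute, replicate and persist the data, but the record structure, a key pointing at content of any shape, is the same idea.

```python
# Minimal in-memory sketch of a key value pair store: records are just
# (key, content), with no fixed schema shared across records.
store = {}

def put(key, content):
    """Store content under key; content can be any shape at all."""
    store[key] = content

def get(key, default=None):
    """Fetch the content for a key, or a default if it is absent."""
    return store.get(key, default)

put("user:1001", {"name": "Ada", "carts": [42, 57]})
put("event:9", b"raw clickstream payload")  # structure varies per record
put("user:1002", {"name": "Lin"})           # even fields can differ

print(get("user:1001")["name"])  # Ada
```

Because nothing constrains the shape of the content, such systems can capture high-volume, rapidly changing data without up-front modeling, which is exactly the trade-off this chapter describes.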
