Leveraging Clouds For Small And Big Data - 6 December 2011


DB2 & Big Data Chat with the Lab
Leveraging Clouds for Small and Big Data - 6 December 2011
IM Cloud Computing Center of Competence
IMcloud@ca.ibm.com

Featured Speakers
- Leon Katsnelson, Program Director, IM Cloud Computing and Emerging Technologies, IBM
- Uri Budnik, Director, ISV Partner Program, RightScale

2011 IBM Corporation

Agenda
- DB2 and Cloud
- The Big Data Challenge
- IBM’s Approach to Big Data
- Big Data on Cloud
- RightScale Cloud Management Platform

DB2 ON CLOUD

Many Reasons Clients Consider Cloud Computing
- "We need to reduce our IT costs"
- "Need to support remote teams"
- "We need to improve business agility"
- "We need to integrate web-born data"
- "We are constrained on space, energy and cooling in our data center"
- "We are always constrained on resources for proper QA of our solutions"
- "We need to reduce our CAPEX"
- "We need to deliver better resiliency"

DB2 Strategy for Cloud Computing

DB2: Ready for Any Cloud

Free DB2-based Website for a Year
- All-in-one instance (server):
  - DB2 Express-C
  - Web application server (Nginx, Ruby on Rails)
  - Content management system (RadiantCMS)
- Runs on an Amazon EC2 Micro instance:
  - 613 MB of memory
  - Up to 2 EC2 Compute Units (for short periodic bursts)
  - EBS storage
  - 32-bit or 64-bit platform
- Free for a year for new AWS customers
- ng-radiant-cmson-db2-in-the-cloud/

Dogfooding
- @your place, @your pace DB2 skills acquisition
- Community-operated property
- 10,500 registered students
- Runs on the cloud, managed by RightScale
- Leverages DB2 technologies, e.g. HADR
- Continuous availability since launch in January 2011

DB2 Galileo Early Experience Program
Clients get hands-on experience with the upcoming version of DB2 in 30 minutes:
- No need to purchase, rack and cable servers
- No operating system patching
- No need to install DB2
- Always use the latest drop, i.e. no wasted time dealing with resolved issues
How:
- Apply for DB2 Galileo Early Experience: on-the-cloud/
- Email IMcloud@ca.ibm.com to request a $500 credit
- Enjoy your own private virtual DB2 server in one of the data centers worldwide
From zero to DB2 Galileo in about 30 minutes.

DB2 on Cloud Resources
- IM Cloud Computing Center of Competence: IMcloud@ca.ibm.com
- DB2 on Cloud homepage: www.ibm.com/db2/cloud

THE “BIG DATA” CHALLENGE

What We Hear from Customers...
- Lots of potentially valuable data is dormant or discarded due to size/performance considerations
- Large volumes of unstructured or semi-structured data are not worth integrating fully (e.g. Tweets, logs, ...)
- It is not clear what should be analyzed (exploratory, iterative)
- Information is distributed across multiple systems and/or the Internet
- Some information has a short useful lifespan
- Volumes can be extremely high
- Analysis is needed in the context of existing information (not stand-alone)

Big Data Presents Big Opportunities
Extract insight from a high volume, variety and velocity of data in a timely and cost-effective manner:
- Variety: manage and benefit from diverse data types and data structures
- Velocity: analyze streaming data and large volumes of persistent data
- Volume: scale from terabytes to zettabytes

Big Data Scenarios Span Many Industries
- Multi-channel customer sentiment and experience analysis
- Detect life-threatening conditions at hospitals in time to intervene
- Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
- Make risk decisions based on real-time transactional data
- Identify criminals and threats from disparate video, audio, and data feeds

Customer Engagements
Use patterns:
- Customer sentiment analysis (cross-sell, up-sell, campaign management)
- Integrated retail and web customer behavior modeling
- Predictive modeling (credit card fraud)
- System log analytics (reduce operational risk)
Common requirements:
- Extract business insight from large volumes of raw data (often outside operational systems)
- Integrate with other existing software
- Ready for enterprise use
(Diagram labels: Consumer Insight; Text, Blog, Weblog; Click Streams; Log & Transactions; Biological Sequences; Operational System & Streams Data Sources; Multi-channel Sales; Next Gen Fraud Models; New Business Development; Text Analytics; Statistical Model Building)

BIG DATA: IBM’S APPROACH

Big Data: An Integral Part of an Enterprise Data Platform
- Manage Big Data from the instant it enters the enterprise
- High fidelity: no changes to original format
- Available for new uses, analyses, and integrations
(Diagram labels: traditional data sources (ERP, CRM, databases, etc.); source data (Web, sensors, logs, media, etc.); Big Data Platform; Warehouse; Operational Data Store; Cognos; Warehouse Applications; Big Data Applications; IBM Big Data Solutions; Client and Partner Solutions; Big Data User Environment for developers, end users and administrators)

IBM’s Platform Addresses Key Requirements
1. Platform for V3 (Variety, Velocity, Volume)
   - Variety: manage data and content "as is"
   - Velocity: handle any velocity, from low-latency streams to large-volume batch
   - Volume: huge volumes of at-rest or streaming data
2. Analytics for V3
   - Analyze sources in their native format: text, data, rich content
   - Analyze all of the data, not just a subset
   - Dynamic analytics: automatic adjustments and actions
3. Ease of use for developers and users
   - Developer UIs, common languages and automatic optimization
   - End-user UIs and visualization
4. Enterprise class
   - Failure tolerance, security and privacy
   - Scale economically
5. Extensive integration capabilities
   - Integrate a wide variety of sources
   - Leverage enterprise integration technologies

Platform Vision
(Diagram labels: IBM Big Data Solutions; Client and Partner Solutions; Big Data Enterprise Engines; Data Warehouse; Geospatial; Time Series; Blueprints; Rules / BPM: iLog & Lombardi; Master Data Management: InfoSphere MDM; Database: DB2 & ...; Information Server; Cognos & SPSS; Marketing: Unica; Data Growth Management: InfoSphere Optim; Mathematical Applications (IBM & non-IBM); Workload Management & Optimization; Identity & Access Management; Management: provisioning, admin tools, job tracking, activity monitor)

BigInsights Summary
- BigInsights: analytical platform for persistent "Big Data"
  - Based on open source and IBM technologies
  - Managed like a start-up: emphasis on deep customer engagements and product plan flexibility
- Distinguishing characteristics
  - Built-in analytics: enhances business knowledge
  - Enterprise software integration: complements and extends existing capabilities
  - Production-ready platform with tooling for analysts, developers, and administrators: speeds time-to-value; simplifies development and maintenance
- IBM advantage
  - Combination of software, hardware, services and advanced research

InfoSphere BigInsights
- Platform for volume, variety, velocity (V3): enhanced Hadoop foundation
- Analytics for V3: text analytics and tooling
- Usability: integrated install, spreadsheet-style tool, ready-made "apps"
- Enterprise class: storage, security, cluster management
- Integration: connectivity to DB2, Netezza, JDBC databases
Editions, by breadth of capabilities:
- Basic Edition (free download): Apache Hadoop, integrated install, online InfoCenter, Big Data University
- Enterprise Edition (licensed): enterprise-class features, integrated Web-based console, business process accelerators ("apps"), text analytics, spreadsheet-style analysis tool, RDBMS and warehouse connectivity, flexible job scheduler, performance enhancements, Eclipse-based tooling, LDAP authentication

BIG DATA ON CLOUD

Hadoop is great at getting the value out of large volumes of data...
"94% of Hadoop users perform analytics on large volumes of data not possible before; 88% analyze data in greater detail; while 82% can now retain more of their data"
(Ventana Research Benchmark Study, "Hadoop and Information Management", 2011)

...but it requires large-scale compute clusters and lots of storage
- Yahoo! has over 40,000 nodes running Hadoop, managing 180-200 petabytes of data
- Facebook has 2,000 compute nodes with 20 petabytes of data
- Average cluster size is 200 servers, but most clusters are around 30 servers
- Most companies can't make the up-front capital investment in compute, storage and networking resources:
  - Makes it hard to evaluate and pilot
  - Difficult to develop real hands-on skills
  - Impractical for non-continuous needs

VALUE: The Fourth V of the Big Data Story
- What insight could you gain if you had full use of a 100-node Hadoop cluster for an hour?
- What if one hour of this 100-node cluster cost $34?
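The $34 figure is simple pay-as-you-go arithmetic: 100 nodes for 1 hour at the starting rate of $0.34/node/hour quoted elsewhere in this deck. The minimal sketch below assumes a flat per-node hourly rate and ignores billing granularity, storage and data-transfer charges; the function name is illustrative only.

```python
from decimal import Decimal


def cluster_cost(nodes: int, hours: Decimal, rate_per_node_hour: Decimal) -> Decimal:
    """Cost of running `nodes` for `hours` at a flat per-node hourly rate.

    Assumption: flat rate only; no per-hour rounding, storage or transfer fees.
    """
    if nodes < 0 or hours < 0:
        raise ValueError("nodes and hours must be non-negative")
    return nodes * hours * rate_per_node_hour


# Decimal avoids binary floating-point rounding on currency amounts.
cost = cluster_cost(100, Decimal("1"), Decimal("0.34"))
print(f"100 nodes x 1 hour at $0.34/node/hour = ${cost}")
```

The same function shows why the cloud suits non-continuous needs: an occasional one-hour job is billed for that hour only, with no hardware sitting idle in between.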

BigInsights Strategy for Cloud Computing

IBM BigInsights: Ready for Any Cloud

IBM BigInsights on Cloud: Hadoop for Everyone, No Up-front Investment
- Your own Hadoop cluster on the cloud in less than 30 minutes
- No need to buy, install, patch, or maintain hardware
- Deploy on Amazon, IBM, Rackspace or your private cloud infrastructure
- Pay as you go for your infrastructure, starting at $0.34/node/hour, and pay only for what you actually use
- Use BigInsights Basic Edition at no charge, with low-cost support available
- Seamlessly transition to BigInsights Enterprise Edition when ready

IBM BigInsights and RightScale.com
Hadoop on Amazon, Rackspace, Private & Hybrid Cloud Infrastructure
- Your own Hadoop cluster on the cloud in less than 30 minutes
- Deploy to Amazon or Rackspace cloud data centers, or on your private cloud
- Pay as you go for your infrastructure, starting at $0.34/node/hour
- Sophisticated cloud resource management by RightScale
- Use BigInsights Basic Edition and seamlessly transition to BigInsights Enterprise Edition when ready
- Take a free course on BigDataUniversity.com

Available Now! Deploy BigInsights on IBM SmartCloud Enterprise
- Your own Hadoop cluster on the IBM cloud in less than 30 minutes
- No need to acquire, install, patch or maintain hardware
- Locate your Hadoop cluster in one of the IBM Cloud data centers worldwide
- Low hourly charges, starting at $0.30/cluster/hour
- Explore using BigInsights Basic and seamlessly transition to BigInsights Enterprise when ready
- Take a free course on BigDataUniversity.com

Evaluating Big Data Technology: IBM IM Demo Cloud (in limited beta)
- An IBM representative initiates the project and invites the customer to participate
- Types of projects: demos, proof of technology, proof of concept
  - Also great for betas, product introductions, technology previews
- Skip construction and go straight to evaluation: no need to get hardware, install/patch the OS, install BigInsights software, write MapReduce jobs, or prepare data sets
- BigInsights sandbox:
  - Dedicated cluster of virtual systems, not a shared sandbox
  - Customer operates the systems as if local
  - Ability to bring large data sets
  - Data centers in NA, EU and AP
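For readers new to Hadoop, the MapReduce jobs mentioned above follow a map, shuffle, reduce pattern. The sketch below is a single-process simulation of that pattern in plain Python (a word count); it is not Hadoop's Java API, and the function names are hypothetical. In a real cluster, Hadoop runs the map and reduce functions in parallel across nodes and performs the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


lines = ["big data on cloud", "small and big data"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```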

BigInsights on the Cloud: Making Learning Hadoop Easy and Fun
- Flexible online delivery allows learning @your place and @your pace
- Free courses, free study materials
- Cloud-based sandbox for exercises: zero setup
- 10,500 registered students
- Hadoop Programming Challenge: 3 students sent to IOD 2011 in Las Vegas, all expenses paid!

In Summary
- Big Data analytics using technologies such as Hadoop can deliver great value, but its appeal is somewhat diminished by the extensive capital requirements
- Cloud is a great way to sidestep the CAPEX challenges
- IBM InfoSphere BigInsights, an enterprise-ready distribution of Hadoop, is available for cloud deployment on Amazon, IBM SmartCloud Enterprise, Rackspace and private clouds
- IBM helps clients leverage the power of Hadoop and the ease of deployment of the cloud. Contact the IM Cloud Computing Center of Competence: IMCloud@ca.ibm.com
- BigDataUniversity.com is a great place to start skills acquisition when budget, time or travel is an issue

Resources
- Use IBM BigInsights on IBM SmartCloud Enterprise: hp?id 310
- Use IBM BigInsights on Amazon, Rackspace or your private cloud: hp?id 309
- Learn Hadoop and other Big Data technologies: http://BigDataUniversity.com
- Get free help from the IM Cloud Computing Center of Competence: IMcloud@ca.ibm.com
- Download the free IBM BigInsights Basic Edition: insights/basic.html
- Visit the IBM BigInsights home page on ibm.com: insights/

RIGHTSCALE CLOUD MANAGEMENT PLATFORM

RightScale: Real Customers, Real Deployments, Real Benefits
- Managed cloud deployments for 4 years, globally
- More than 45,000 users; launched more than 3 million servers!
- Powering the largest production deployments on the cloud

Complete Systems Management

What Do We Mean by Cloud Computing?

RightScale Manages IaaS Clouds

Take advantage of many resource ...
(Map labels: locations Seattle, Seoul, NYC area, SF area, Tokyo, DC area, Fukuoka, Dallas, Singapore, Houston, Hong Kong, Hyderabad; private clouds; public & managed clouds: Amazon Web Services, Rackspace, Datapipe, SoftLayer, Yahoo! Japan / IDCF, Tata, Korea Telecom, Logicworks)

ServerTemplates: Reproducible Servers on Demand
- Dynamic configuration
- Abstract role and behavior from cloud infrastructure
- Predictable deployment
- Cloud agnostic / portable
- Object-oriented programming for sysadmins

ServerTemplates: Reproducible Servers on Demand
Configuring servers through bundling images (one image per combination of software, version, OS and patch level), for example:
- DB2 Express-C 9.7.4 (CentOS 5.2)
- DB2 Express-C 9.7.4 (CentOS 5.4)
- DB2 Express-C 9.7.4 (Ubuntu 8.10)
- Frontend Apache 1.3 (Ubuntu 8.10)
- Frontend Apache 2.0 (Ubuntu 9.10), patched
- CMS v1.0 (CentOS 5.4)
- CMS v1.1 (CentOS 5.4)
- BigInsights (CentOS 5.4)
- BigInsights (Ubuntu 8.10)
- My ASP app server (Windows 2008)
- My ASP.net (Windows 2008), security update 1
- My ASP.net (Windows 2008), security update 8
- SharePoint v4 (Windows 2003), 32-bit
- SharePoint v4 (Windows 2003), 64-bit
- SharePoint v4.5 (Windows 2003), 64-bit
Configuring servers with ServerTemplates: a set of configuration directives that install and configure software on top of the base image, run as a boot sequence (set up DNS and IPs, install DB2, configure DB2, restore last backup, install monitoring).
- Base images: very few and basic (CentOS 5.2, CentOS 5.4, Ubuntu 8.10, Ubuntu 9.10, Win 2003, Win 2007)
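The contrast above can be sketched as a data structure: instead of bundling one image per software/OS combination, a template stores the role's ordered boot directives once and applies them to whichever base image is chosen. This is a hypothetical model for illustration only, not RightScale's API; the class, method names, and the exact directive order are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerTemplate:
    """Hypothetical model: a server role expressed as ordered boot-time
    directives, kept separate from the base image it runs on."""

    role: str
    boot_scripts: tuple[str, ...]

    def plan(self, base_image: str) -> list[str]:
        """Return the ordered steps to build this role on a given base image."""
        return [f"launch {base_image}", *self.boot_scripts]

    def extend(self, role: str, *extra_scripts: str) -> "ServerTemplate":
        """Derive a new template that reuses this one's directives and adds more."""
        return ServerTemplate(role, self.boot_scripts + extra_scripts)


db2 = ServerTemplate(
    role="DB2 Express-C 9.7.4",
    boot_scripts=(
        "set up DNS and IPs",
        "install DB2",
        "configure DB2",
        "restore last backup",
        "install monitoring",
    ),
)

# One template, many base images: no separate bundled image per OS.
for base in ("CentOS 5.4", "Ubuntu 8.10"):
    print(db2.role, "on", base, "->", db2.plan(base))

# Reuse by derivation, loosely the "object-oriented programming for sysadmins" idea.
db2_dev = db2.extend("DB2 Express-C 9.7.4 (dev)", "load sample database")
print(db2_dev.plan("CentOS 5.4")[-1])
```

Keeping the template immutable (`frozen=True`) means a derived template never alters its parent, so every server launched from a given template gets the same steps.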

ServerTemplates: Reproducible Servers on Demand
ServerTemplates vs. image bundling: an integrated approach that puts together all the parts needed to architect single- and multi-server deployments

Not Just Single Servers: Complete Environments

RightScale Runs in Your Web Browser

Organize Servers in Deployments


Manage Systems, Not Servers



Free Training at Your Own Pace
- RightScale is a sponsor of Big Data University
- Free course on Hadoop in the Cloud, which includes:
  - RightScale
  - IBM BigInsights
  - Amazon EC2: 25 credits (limited quantity)
- Go to: BigDataUniversity.com
- Course code: BD005EN
- Direct link: ew.php?id 309

Resources: Learn More
Web resources:
- RightScale.com/webinars
- RightScale.com/whitepapers
- Vimeo.com/RightScale
- IBM.com/smartercomputing
- BigDataUniversity.com
Contact us:
- RightScale: sales@rightscale.com, 1-866-720-0208
- IBM IM Cloud Team: IMcloud@ca.ibm.com
Follow the presenters on Twitter: @uribudnik, @katsnelson

Questions

Additional / Backup Slides

Two Really “Hot” Areas: Cloud and Data
- “Big Data will earn its place as the next ‘must have’ competency in 2012” (IDC)
- “Through 2015, more than 85 percent of Fortune 500 organizations will fail to effectively exploit big data for competitive advantage” (Gartner)
- “2012 is likely to be a busy year for Big Data-driven mergers and acquisitions” (IDC)
- “[In 2012,] 80% of new commercial enterprise apps will be deployed on cloud platforms” (IDC)
- IP traffic over data center networks will reach 4.8 zettabytes a year by 2015, and cloud computing will account for one-third of it, or 1.6 zettabytes (Cisco)
- “Amazon Web Services [will] exceed $1 billion in cloud services business in 2012 with Google’s Enterprise business to follow within 18 months” (IDC)

Merging the Traditional and Big Data Approaches
Traditional approach: structured and repeatable analysis
- Business users determine what question to ask
- IT structures the data to answer that question
- Examples: monthly sales reports, profitability analysis, customer surveys
Big Data approach: iterative and exploratory analysis
- IT delivers a platform to enable creative discovery
- Business explores what questions could be asked
- Examples: brand sentiment, product strategy, maximum asset utilization

BigInsights Content
Function | Version | Basic Edition | Enterprise Edition
Integrated install* | - | Included | Included
Hadoop (including common utilities, HDFS, MapReduce framework) | 0.20.2 | Included | Included
Jaql (programming / query language) | 0.5.2 | Included | Included
Pig (programming / query language)* | 0.8.1 | Included | Included
Flume (data collection/aggregation) | 0.9.1 | Included | Included
Hive (data summarization/querying)* | 0.7.1 | Included | Included
Lucene (text search)* | 3.3.0 | Included | Included
ZooKeeper (process coordination) | 3.3.3 | Included | Included
Avro (data serialization) | 1.5.1 | Included | Included
HBase (real-time read/write)* | 0.90.4 | Included | Included
Oozie (workflow/job orchestration)* | 2.3.1 | Included | Included
Online documentation* | - | Included | Included
Capability to integrate with JDBC sources through a general-purpose Jaql module | - | Included | Included
Capability to integrate with DB2, InfoSphere Warehouse (DB2 UDF samples to submit jobs and read results from BigInsights) | - | Included | Included
*New or upgraded

BigInsights Content (continued)
Function | Basic Edition | Enterprise Edition
Capability to integrate with R (Jaql module to invoke R statistical capabilities from BigInsights) | n/a | Included
Capability to integrate with Netezza, DB2 LUW with DPF from Jaql | n/a | Included
LDAP authentication and additional security features* | n/a | Included
Integrated Web Console* | n/a | Included
Integrated workflow capabilities and flexible job scheduler | n/a | Included
Platform performance enhancements (Adaptive MapReduce, efficient processing of compressed text files, large-scale text indexing, etc.)* | n/a | Included
Text analytics | n/a | Included
Eclipse plugins for text analytic development, Jaql, Hive, Java* | n/a | Included
Ready-made “apps” for data import/export, Web crawl, Boardreader, etc.* | n/a | Included
Web-based application catalog* | n/a | Included
Spreadsheet-like analytical tool* | n/a | Included
IBM support | Optional | Included
Unlimited storage | n/a | Included
*New or upgraded

BigInsights: Value Beyond Open Source
Technical differentiators:
- Built-in analytics
  - Text processing engine, annotators, Eclipse tooling
  - Interface to the R Project (statistical platform)
- Enterprise software integration (DBMS, warehouse)
- Spreadsheet-style analytical tool for analysts
- Ready-made business process accelerators
- Integrated installation of supported open source and IBM components
- Web Console for administration and application access
- Platform enrichment: additional security, performance features, ...
- Standard IBM licensing agreement and world-class support
- More to come in future releases!
Business benefits:
- Quicker time-to-value due to IBM technology and support
- Reduced operational risk
- Enhanced business knowledge with a flexible analytical platform
- Leverages and complements existing software assets

BigInsights and the Data Warehouse
(Diagram: Big Data flows into BigInsights, which filters, summarizes and aggregates it for the data warehouse.)
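The filter, summarize, aggregate flow can be sketched on a tiny scale: raw records are reduced to compact summary rows, and only those rows are loaded into the warehouse. In this minimal sketch an in-memory SQLite database stands in for the warehouse and a Python list stands in for raw log data in BigInsights; the log format, table name and functions are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical raw log lines: "<date> <level> <service>"
RAW_LOGS = [
    "2011-12-06 ERROR billing",
    "2011-12-06 INFO billing",
    "2011-12-06 ERROR search",
    "2011-12-06 ERROR billing",
    "2011-12-07 ERROR search",
]


def filter_and_aggregate(lines):
    """Filter to ERROR records, then aggregate a count per (date, service)."""
    counts = Counter()
    for line in lines:
        date, level, service = line.split()
        if level == "ERROR":
            counts[(date, service)] += 1
    return counts


def load_summary(conn, counts):
    """Load only the compact summary rows into the stand-in warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS error_summary (day TEXT, service TEXT, errors INTEGER)"
    )
    conn.executemany(
        "INSERT INTO error_summary VALUES (?, ?, ?)",
        [(day, service, n) for (day, service), n in sorted(counts.items())],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_summary(conn, filter_and_aggregate(RAW_LOGS))
rows = conn.execute(
    "SELECT day, service, errors FROM error_summary ORDER BY day, service"
).fetchall()
print(rows)
```

The design point is the size ratio: the raw input can be arbitrarily large, while the warehouse receives one row per (day, service) pair.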

BigInsights and the Data Warehouse
(Diagram: traditional analytic tools work against the data warehouse; Big Data analytic applications work against BigInsights.)
- Query-ready archive for “cold” warehouse data
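A "query-ready archive" means cold rows leave the warehouse but stay reachable with ordinary queries. The sketch below models that with SQLite only as a stand-in: an attached in-memory database plays the archive role that BigInsights would play. The table, cutoff date and sample rows are hypothetical.

```python
import sqlite3


def archive_cold_rows(conn, cutoff):
    """Move rows older than `cutoff` from the warehouse to the archive atomically."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO archive.sales SELECT * FROM main.sales WHERE day < ?", (cutoff,)
        )
        conn.execute("DELETE FROM main.sales WHERE day < ?", (cutoff,))


conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS archive")  # stand-in for the archive tier
for schema in ("main", "archive"):
    conn.execute(f"CREATE TABLE {schema}.sales (day TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO main.sales VALUES (?, ?, ?)",
    [("2009-03-01", "EU", 120), ("2010-07-15", "NA", 80), ("2011-11-30", "AP", 200)],
)

archive_cold_rows(conn, "2011-01-01")  # ISO dates compare correctly as text

hot = conn.execute("SELECT COUNT(*) FROM main.sales").fetchone()[0]
cold = conn.execute("SELECT COUNT(*) FROM archive.sales").fetchone()[0]
# The archived rows remain query-ready: one SQL statement spans both tiers.
total = conn.execute(
    "SELECT SUM(amount) FROM "
    "(SELECT amount FROM main.sales UNION ALL SELECT amount FROM archive.sales)"
).fetchone()[0]
print(hot, cold, total)
```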
