Business Intelligence And Analytics From Big Data To Big Impact


SPECIAL ISSUE: BUSINESS INTELLIGENCE RESEARCH

BUSINESS INTELLIGENCE AND ANALYTICS: FROM BIG DATA TO BIG IMPACT

Hsinchun Chen
Eller College of Management, University of Arizona, Tucson, AZ 85721 U.S.A. {hchen@eller.arizona.edu}

Roger H. L. Chiang
Carl H. Lindner College of Business, University of Cincinnati, Cincinnati, OH 45221-0211 U.S.A. {chianghl@ucmail.uc.edu}

Veda C. Storey
J. Mack Robinson College of Business, Georgia State University, Atlanta, GA 30302-4015 U.S.A. {vstorey@gsu.edu}

Business intelligence and analytics (BI&A) has emerged as an important area of study for both practitioners and researchers, reflecting the magnitude and impact of data-related problems to be solved in contemporary business organizations. This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A. BI&A 1.0, BI&A 2.0, and BI&A 3.0 are defined and described in terms of their key characteristics and capabilities. Current research in BI&A is analyzed and challenges and opportunities associated with BI&A research and education are identified. We also report a bibliometric study of critical BI&A publications, researchers, and research topics based on more than a decade of related academic and industry publications. Finally, the six articles that comprise this special issue are introduced and characterized in terms of the proposed BI&A research framework.

Keywords: Business intelligence and analytics, big data analytics, Web 2.0

Introduction

Business intelligence and analytics (BI&A) and the related field of big data analytics have become increasingly important in both the academic and the business communities over the past two decades. Industry studies have highlighted this significant development.
For example, based on a survey of over 4,000 information technology (IT) professionals from 93 countries and 25 industries, the IBM Tech Trends Report (2011) identified business analytics as one of the four major technology trends in the 2010s. In a survey of the state of business analytics by Bloomberg Businessweek (2011), 97 percent of companies with revenues exceeding $100 million were found to use some form of business analytics. A report by the McKinsey Global Institute (Manyika et al. 2011) predicted that by 2018, the United States alone will face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as a shortfall of 1.5 million data-savvy managers with the know-how to analyze big data to make effective decisions. Hal Varian, Chief Economist at Google and emeritus professor at the University of California, Berkeley, commented on the emerging opportunities for IT professionals and students in data analysis as follows:

MIS Quarterly Vol. 36 No. 4, pp. 1165-1188/December 2012

Chen et al./Introduction: Business Intelligence Research

    So what’s getting ubiquitous and cheap? Data. And what is complementary to data? Analysis. So my recommendation is to take lots of courses about how to manipulate and analyze data: databases, machine learning, econometrics, statistics, visualization, and so on.¹

The opportunities associated with data and analysis in different organizations have helped generate significant interest in BI&A, which is often referred to as the techniques, technologies, systems, practices, methodologies, and applications that analyze critical business data to help an enterprise better understand its business and market and make timely business decisions. In addition to the underlying data processing and analytical technologies, BI&A includes business-centric practices and methodologies that can be applied to various high-impact applications such as e-commerce, market intelligence, e-government, healthcare, and security.

This introduction to the MIS Quarterly Special Issue on Business Intelligence Research provides an overview of this exciting and high-impact field, highlighting its many challenges and opportunities. Figure 1 shows the key sections of this paper, including BI&A evolution, applications, and emerging analytics research opportunities. We then report on a bibliometric study of critical BI&A publications, researchers, and research topics based on more than a decade of related BI&A academic and industry publications. Education and program development opportunities in BI&A are presented, followed by a summary of the six articles that appear in this special issue using our research framework. The final section concludes the paper.

BI&A Evolution: Key Characteristics and Capabilities

The term intelligence has been used by researchers in artificial intelligence since the 1950s. Business intelligence became a popular term in the business and IT communities only in the 1990s.
In the late 2000s, business analytics was introduced to represent the key analytical component in BI (Davenport 2006). More recently, big data and big data analytics have been used to describe the data sets and analytical techniques in applications that are so large (from terabytes to exabytes) and complex (from sensor to social media data) that they require advanced and unique data storage, management, analysis, and visualization technologies. In this article we use business intelligence and analytics (BI&A) as a unified term and treat big data analytics as a related field that offers new directions for BI&A research.

¹ “Hal Varian Answers Your Questions,” February 25, 2008 -answers-your-questions/).

BI&A 1.0

As a data-centric approach, BI&A has its roots in the longstanding database management field. It relies heavily on various data collection, extraction, and analysis technologies (Chaudhuri et al. 2011; Turban et al. 2008; Watson and Wixom 2007). The BI&A technologies and applications currently adopted in industry can be considered as BI&A 1.0, where data are mostly structured, collected by companies through various legacy systems, and often stored in commercial relational database management systems (RDBMS). The analytical techniques commonly used in these systems, popularized in the 1990s, are grounded mainly in statistical methods developed in the 1970s and data mining techniques developed in the 1980s.

Data management and warehousing is considered the foundation of BI&A 1.0. Design of data marts and tools for extraction, transformation, and load (ETL) are essential for converting and integrating enterprise-specific data. Database query, online analytical processing (OLAP), and reporting tools based on intuitive, but simple, graphics are used to explore important data characteristics. Business performance management (BPM) using scorecards and dashboards helps analyze and visualize a variety of performance metrics.
In addition to these well-established business reporting functions, statistical analysis and data mining techniques are adopted for association analysis, data segmentation and clustering, classification and regression analysis, anomaly detection, and predictive modeling in various business applications. Most of these data processing and analytical technologies have already been incorporated into the leading commercial BI platforms offered by major IT vendors including Microsoft, IBM, Oracle, and SAP (Sallam et al. 2011).

Among the 13 capabilities considered essential for BI platforms, according to the Gartner report by Sallam et al. (2011), the following eight are considered BI&A 1.0: reporting, dashboards, ad hoc query, search-based BI, OLAP, interactive visualization, scorecards, predictive modeling, and data mining. A few BI&A 1.0 areas are still under active development based on the Gartner BI Hype Cycle analysis for emerging BI technologies, which include data mining workbenches, column-based DBMS, in-memory DBMS, and real-time decision tools (Bitterer 2011). Academic curricula in Information Systems (IS) and Computer Science (CS) often include well-structured courses such as database management systems, data mining, and multivariate statistics.

Figure 1. BI&A Overview: Evolution, Applications, and Emerging Research

BI&A 2.0

In the early 2000s, the Internet and the Web began to offer unique data collection and analytical research and development opportunities. The HTTP-based Web 1.0 systems, characterized by Web search engines such as Google and Yahoo and e-commerce businesses such as Amazon and eBay, allow organizations to present their businesses online and interact with their customers directly. In addition to the traditional RDBMS-based product information and business contents that organizations have ported online, detailed and IP-specific user search and interaction logs that are collected seamlessly through cookies and server logs have become a new gold mine for understanding customers’ needs and identifying new business opportunities. Web intelligence, web analytics, and the user-generated content collected through Web 2.0-based social and crowd-sourcing systems (Doan et al. 2011; O’Reilly 2005) have ushered in a new and exciting era of BI&A 2.0 research in the 2000s, centered on text and web analytics for unstructured web contents.

An immense amount of company, industry, product, and customer information can be gathered from the web and organized and visualized through various text and web mining techniques. By analyzing customer clickstream data logs, web analytics tools such as Google Analytics can provide a trail of the user’s online activities and reveal the user’s browsing and purchasing patterns. Web site design, product placement optimization, customer transaction analysis, market structure analysis, and product recommendations can be accomplished through web analytics.
The many Web 2.0 applications developed after 2004 have also created an abundance of user-generated content from various online social media such as forums, online groups, web blogs, social networking sites, social multimedia sites (for photos and videos), and even virtual worlds and social games (O’Reilly 2005). In addition to capturing celebrity chatter, references to everyday events, and socio-political sentiments expressed in these media, Web 2.0 applications can efficiently gather a large volume of timely feedback and opinions from a diverse customer population for different types of businesses.

Many marketing researchers believe that social media analytics presents a unique opportunity for businesses to treat the market as a “conversation” between businesses and customers instead of the traditional business-to-customer, one-way “marketing” (Lusch et al. 2010). Unlike BI&A 1.0 technologies that are already integrated into commercial enterprise IT systems, future BI&A 2.0 systems will require the integration of mature and scalable techniques in text mining (e.g., information extraction, topic identification, opinion mining, question-answering), web mining, social network analysis, and spatial-temporal analysis with existing DBMS-based BI&A 1.0 systems.
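To make the clickstream analysis described in this section more concrete, the following is a minimal Python sketch of how a "trail" of online activity might be derived from raw page-view logs. It is an illustration only and not the method of any particular web analytics product: the log records, the 30-minute inactivity cutoff, and the function names `sessionize` and `transition_counts` are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical page-view records: (visitor_id, ISO timestamp, page).
# All values are invented for illustration.
LOG = [
    ("u1", "2012-07-09T10:00:00", "/home"),
    ("u1", "2012-07-09T10:01:30", "/product/42"),
    ("u1", "2012-07-09T10:03:00", "/cart"),
    ("u2", "2012-07-09T10:02:00", "/home"),
    ("u2", "2012-07-09T10:04:00", "/product/42"),
    ("u1", "2012-07-09T15:00:00", "/home"),
]

# Assumed inactivity cutoff that separates two visits by the same visitor.
SESSION_GAP = timedelta(minutes=30)


def sessionize(log, gap=SESSION_GAP):
    """Group each visitor's page views into sessions split by inactivity gaps."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in log:
        by_visitor[visitor].append((datetime.fromisoformat(ts), page))

    sessions = []
    for visitor, views in by_visitor.items():
        views.sort()  # chronological order per visitor
        current = [views[0][1]]
        for (prev_ts, _), (ts, page) in zip(views, views[1:]):
            if ts - prev_ts > gap:
                # Long pause: close the current session and start a new one.
                sessions.append((visitor, current))
                current = []
            current.append(page)
        sessions.append((visitor, current))
    return sessions


def transition_counts(sessions):
    """Count page-to-page moves within sessions (a simple browsing pattern)."""
    counts = Counter()
    for _, pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts


sessions = sessionize(LOG)
transitions = transition_counts(sessions)

for (src, dst), n in transitions.most_common():
    print(f"{src} -> {dst}: {n}")
```

Each session is one visitor's ordered trail of pages, and the transition counts summarize which page-to-page moves are most common. Real deployments would typically read server logs or cookie-based identifiers rather than an in-memory list, and would work at a far larger scale.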

Except for basic query and search capabilities, no advanced text analytics for unstructured content are currently considered in the 13 capabilities of the Gartner BI platforms. Several, however, are listed in the Gartner BI Hype Cycle, including information semantic services, natural language question answering, and content/text analytics (Bitterer 2011). New IS and CS courses in text mining and web mining have emerged to address needed technical training.

BI&A 3.0

Whereas web-based BI&A 2.0 has attracted active research from academia and industry, a new research opportunity in BI&A 3.0 is emerging. As reported prominently in an October 2011 article in The Economist (2011), the number of mobile phones and tablets (about 480 million units) surpassed the number of laptops and PCs (about 380 million units) for the first time in 2011. Although the number of PCs in use surpassed 1 billion in 2008, the same article projected that the number of mobile connected devices would reach 10 billion in 2020. Mobile devices such as the iPad, iPhone, and other smart phones and their complete ecosystems of downloadable applications, from travel advisories to multi-player games, are transforming different facets of society, from education to healthcare and from entertainment to governments. Other sensor-based Internet-enabled devices equipped with RFID, barcodes, and radio tags (the “Internet of Things”) are opening up exciting new streams of innovative applications. The ability of such mobile and Internet-enabled devices to support highly mobile, location-aware, person-centered, and context-relevant operations and transactions will continue to offer unique research challenges and opportunities throughout the 2010s. Mobile interface, visualization, and HCI (human–computer interaction) design are also promising research areas. Although the coming of the Web 3.0 (mobile and sensor-based) era seems certain, the underlying mobile analytics and location and context-aware techniques for collecting, processing, analyzing, and visualizing such large-scale and fluid mobile and sensor data are still unknown.

No integrated, commercial BI&A 3.0 systems are foreseen for the near future. Most of the academic research on mobile BI is still in an embryonic stage. Although not included in the current BI platform core capabilities, mobile BI has been included in the Gartner BI Hype Cycle analysis as one of the new technologies that has the potential to disrupt the BI market significantly (Bitterer 2011). The uncertainty associated with BI&A 3.0 presents another unique research direction for the IS community.

Table 1 summarizes the key characteristics of BI&A 1.0, 2.0, and 3.0 in relation to the Gartner BI platforms core capabilities and hype cycle.

Table 1. BI&A Evolution: Key Characteristics and Capabilities

BI&A 1.0
  Key characteristics: DBMS-based, structured content (RDBMS & data warehousing; ETL & OLAP; dashboards & scorecards; data mining & statistical analysis)
  Gartner BI platforms core capabilities: ad hoc query & search-based BI; reporting, dashboards & scorecards; OLAP; interactive visualization; predictive modeling & data mining
  Gartner Hype Cycle: column-based DBMS; in-memory DBMS; real-time decision; data mining workbenches

BI&A 2.0
  Key characteristics: Web-based, unstructured content (information retrieval and extraction; opinion mining; question answering; web analytics and web intelligence; social media analytics; social network analysis; spatial-temporal analysis)
  Gartner BI platforms core capabilities: none listed
  Gartner Hype Cycle: information semantic services; natural language question answering; content & text analytics

BI&A 3.0
  Key characteristics: Mobile and sensor-based content (location-aware analysis; person-centered analysis; context-relevant analysis; mobile visualization & HCI)
  Gartner BI platforms core capabilities: none listed
  Gartner Hype Cycle: mobile BI

The decade of the 2010s promises to be an exciting one for high-impact BI&A research and development for both industry and academia. The business community and industry have already taken important steps to adopt BI&A for their needs. The IS community faces unique challenges and opportunities in making scientific and societal impacts that are relevant and long-lasting (Chen 2011a). IS research and education programs need to carefully evaluate future directions, curricula, and action plans, from BI&A 1.0 to 3.0.

BI&A Applications: From Big Data to Big Impact

Several global business and IT trends have helped shape past and present BI&A research directions. International travel, high-speed network connections, global supply-chain, and outsourcing have created a tremendous opportunity for IT advancement, as predicted by Thomas Friedman in his seminal book, The World Is Flat (2005). In addition to ultra-fast global IT connections, the development and deployment of business-related data standards, electronic data interchange (EDI) formats, and business databases and information systems have greatly facilitated business data creation and utilization. The development of the Internet in the 1970s and the subsequent large-scale adoption of the World Wide Web since the 1990s have increased business data generation and collection speeds exponentially. Recently, the Big Data era has quietly descended on many communities, from governments and e-commerce to health organizations. With an overwhelming amount of web-based, mobile, and sensor-generated data arriving at a terabyte and even exabyte scale (The Economist 2010a, 2010b), new science, discovery, and insights can be obtained from the highly detailed, contextualized, and rich contents of relevance to any business or organization.

In addition to being data driven, BI&A is highly applied and can leverage opportunities presented by the abundant data and domain-specific analytics needed in many critical and high-impact application areas. Several of these promising and high-impact BI&A applications are presented below, with a discussion of the data and analytics characteristics, potential impacts, and selected illustrative examples or studies: (1) e-commerce and market intelligence, (2) e-government and politics 2.0, (3) science and technology, (4) smart health and well-being, and (5) security and public safety. By carefully analyzing the application and data characteristics, researchers and practitioners can then adopt or develop the appropriate analytical techniques to derive the intended impact. In addition to technical system implementation, significant business or domain knowledge as well as effective communication skills are needed for the successful completion of such BI&A projects. IS departments thus face unique opportunities and challenges in developing integrated BI&A research and education programs for the new generation of data/analytics-savvy and business-relevant students and professionals (Chen 2011a).

E-Commerce and Market Intelligence

The excitement surrounding BI&A and Big Data has arguably been generated primarily from the web and e-commerce communities. Significant market transformation has been accomplished by leading e-commerce vendors such as Amazon and eBay through their innovative and highly scalable e-commerce platforms and product recommender systems. Major Internet firms such as Google, Amazon, and Facebook continue to lead the development of web analytics, cloud computing, and social media platforms. The emergence of customer-generated Web 2.0 content on various forums, newsgroups, social media platforms, and crowd-sourcing systems offers another opportunity for researchers and practitioners to “listen” to the voice of the market from a vast number of business constituents that includes customers, employees, investors, and the media (Doan et al. 2011; O’Reilly 2005). Unlike traditional transaction records collected from various legacy systems of the 1980s, the data that e-commerce systems collect from the web are less structured and often contain rich customer opinion and behavioral information.

For social media analytics of customer opinions, text analysis and sentiment analysis techniques are frequently adopted (Pang and Lee 2008). Various analytical techniques have also been developed for product recommender systems, such as association rule mining, database segmentation and clustering, anomaly detection, and graph mining (Adomavicius and Tuzhilin 2005). Long-tail marketing accomplished by reaching the millions of niche markets at the shallow end of the product bitstream has become possible via highly targeted searches and personalized recommendations (Anderson 2004).

The Netflix Prize competition² for the best collaborative filtering algorithm to predict user movie ratings helped generate significant academic and industry interest in recommender systems development and resulted in awarding the grand prize of $1 million to the Bellkor’s Pragmatic Chaos team, which surpassed Netflix’s own algorithm for predicting ratings by 10.06 percent. However, the publicity associated with the competition also raised major unintended customer privacy concerns.

Much BI&A-related e-commerce research and development information is appearing in academic IS and CS papers as well as in popular IT magazines.

² Netflix Prize php?id 1537; accessed July 9, 2012).

E-Government and Politics 2.0

The advent of Web 2.0 has generated much excitement for reinventing governments. The 2008 U.S. House, Senate, and presidential elections provided the first signs of success for online campaigning and political participation. In what has been dubbed “politics 2.0,” politicians use the highly participatory and multimedia web platforms for successful policy discussions, campaign advertising, voter mobilization, event announcements, and online donations. As government and political processes become more transparent, participatory, online, and multimedia-rich, there is a great opportunity for adopting BI&A research in e-government and politics 2.0 applications. Selected opinion mining, social network analysis, and social media analytics techniques can be used to support online political participation, e-democracy, political blogs and forums analysis, e-government service delivery, and process transparency and accountability (Chen 2009; Chen et al. 2007). For e-government applications, semantic information directories and ontologies (as exemplified below) can also be developed to better serve their target citizens.

Despite the significant transformational potential for BI&A in e-government research, there has been less academic research than, for example, e-commerce-related BI&A research. E-government research often involves researchers from political science and public policy. For example, Karpf (2009) analyzed the growth of the political blogosphere in the United States and found significant innovation by existing political institutions in adopting blogging platforms into their web offerings. In his research, 2D blogspace mapping with composite rankings helped reveal the partisan makeup of the American political blogosphere. Yang and Callan (2009) demonstrated the value of ontology development for government services through their development of the OntoCop system, which works interactively with a user to organize and summarize online public comments from citizens.

Science and Technology

Many areas of science and technology (S&T) are reaping the benefits of high-throughput sensors and instruments, from astrophysics and oceanography, to genomics and environmental research. To facilitate information sharing and data analytics, the National Science Foundation (NSF) recently mandated that every project provide a data management plan. Cyber-infrastructure, in particular, has become critical for supporting such data-sharing initiatives.

The 2012 NSF BIGDATA³ program solicitation is an obvious example of the U.S. government funding agency’s concerted efforts to promote big data analytics. The program

    aims to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large, diverse, distributed and heterogeneous data sets so as to accelerate the progress of scientific discovery and innovation; lead to new fields of inquiry that would not otherwise be possible; encourage the development of new data analytic tools and algorithms; facilitate scalable, accessible, and sustainable data infrastructure; increase understanding of human and social processes and interactions; and promote economic growth and improved health and quality of life.

Several S&T disciplines have already begun their journey toward big data analytics.
For example, in biology, the NSF-funded iPlant Collaborative⁴ is using cyberinfrastructure to support a community of researchers, educators, and students working in plant sciences. iPlant is intended to foster a new generation of biologists equipped to harness rapidly expanding computational techniques and growing data sets to address the grand challenges of plant biology. The iPlant data set is diverse and includes canonical or reference data, experimental data, simulation and model data, observational data, and other derived data. It also offers various open-source data processing and analytics tools.

In astronomy, the Sloan Digital Sky Survey (SDSS)⁵ shows how computational methods and big data can support and facilitate sense making and decision making at both the macroscopic and the microscopic level in a rapidly growing and globalized research field. The SDSS is one of the most ambitious and influential surveys in the history of astronomy.

³ “Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA),” Program Solicitation NSF 12-499 m; accessed August 2, 2012).
⁴ iPlant Collaborative (http://www.iplantcollaborative.org/about; accessed August 2, 2012).
⁵ “Sloan Digital Sky Survey: Mapping the Universe” (http://www.sdss.org/; accessed August 2, 2012).

Over its eight years of operation, it has obtained deep, multi-color images covering more than a quarter of the sky and created three-dimensional maps containing more than 930,000 galaxies and over 120,000 quasars. Continuing to gather data at a rate of 200 gigabytes per night, SDSS has amassed more than 140 terabytes of data. The international Large Hadron Collider (LHC) effort for high-energy physics is another example of big data, producing about 13 petabytes of data in a year (Brumfiel 2011).

Smart Health and Wellbeing

Much like the big data opportunities facing the e-commerce and S&T communities, the health community is facing a tsunami of health- and healthcare-related content generated from numerous patient care points of contact, sophisticated medical instruments, and web-based health communities. Two main sources of health big data are genomics-driven big data (genotyping, gene expression, sequencing data) and payer–provider big data (electronic health records, insurance records, pharmacy prescription, patient feedback and responses) (Miller 2012a). The expected raw sequencing data from each person is approximately four terabytes. From the payer–provider side, a data matrix might have hundreds of thousands of patients with many records and parameters (demographics, medications, outcomes) collected over a long period of time. Extracting knowledge from health big data poses significant research and practical challenges, especially considering the HIPAA (Health Insurance Portability and Accountability Act) and IRB (Institutional Review Board) requirements for building a privacy-preserving and trustworthy health infrastructure and conducting ethical health-related research (Gelfand 2011/2012).
Health big data analytics, in general, lags behind e-commerce BI&A applications because it has rarely taken advantage of scalable analytical methods or computational platforms (Miller 2012a).

Over the past decade, electronic health records (EHR) have been widely adopted in hospitals and clinics worldwide. Significant clinical knowledge and a deeper understanding of patient disease patterns can be gleaned from such collections (Hanauer et al. 2009; Hanauer et al. 2011; Lin et al. 2011). Hanauer et al. (2011), for example, used large-scale, longitudinal EHR to research associations in medical diagnoses and consider temporal relations between events to better elucidate patterns of disease progression. Lin et al. (2011) used symptom–disease–treatment (SDT) association rule mining on a comprehensive EHR of approximately 2.1 million records from a major hospital. Based on selected International Classification of Diseases (ICD-9) codes, they were able to identify clinically relevant and accurate SDT associations from patient records in seven distinct diseases, ranging from cancers to chronic and infectious diseases.

In addition to EHR, health social media sites such as Daily Strength and PatientsLikeMe provide unique research opportunities in healthcare decision support and patient empowerment (Miller 2012b), especially for chronic diseases such as diabetes, Parkinson’s, Alzheimer’s, and cancer. Association rule mining and clustering, health social media monitoring and analysis, health text analytics, health ontologies, patient network analysis, and adverse drug side-effect analysis are promising areas of research in health-related BI&A. Due to the importance of HIPAA regulations, privacy-preserving health data mining is also gaining attention (Gelfand 2011/2012).

Partially funded by the National Institutes of Health (NIH), the NSF BIGDATA program solicitation includes common interests in big data across NSF and NIH.
Clinical decision making, patient-centered therapy, and knowledge bases for health, disease, genome, and environment are some of the areas in which BI&A techniques can contribute (Chen 2011b; Wactlar et al. 2011). Another recent, major NSF initiative related to health big data analytics is the NSF Smart Health and Wellbeing (SHB)⁶ program, which seeks to address fundamental technical and scientific issues that would support a much-needed transformation of healthcare from reactive and hospital-centered to preventive, proactive, evidence-based, person-centered, and focused on wellbeing rather than disease control. The SHB research topics include sensor technology, networking, information and machine learning technology, modeling cognitive processes, system and process modeling, and social and economic issues (Wactlar et al. 2011), most of which are relevant to healthcare BI&A.

Security and Public Safety

Since the tragic events of September 11, 2001, security research has gained much attention, especially given the increasing dependency of business and our global society on digital enablement. Researchers in computational science, information systems, social sciences, engineering, medicine, and many other fields have been called upon to help enhance our ability to fight violence, terrorism, cyber crimes, and other cyb
