Business Intelligence and Data Mining


Big Data and Business Analytics
Mark Ferguson, Editor

Business Intelligence and Data Mining
Anil K. Maheshwari, Ph.D.


Business Intelligence and Data Mining
Copyright Anil K. Maheshwari, PhD, 2015.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other), except for brief quotations, not to exceed 400 words, without the prior permission of the publisher.

First published by
Business Expert Press, LLC
222 East 46th Street, New York, NY 10017
www.businessexpertpress.com

ISBN-13: 978-1-63157-120-6 (print)
ISBN-13: 978-1-63157-121-3 (e-book)
eISSN: 2333-6757
ISSN: 2333-6749

Business Expert Press Big Data and Business Analytics Collection.
Cover and interior design by S4Carlisle Publishing Services Private Ltd., Chennai, India

Dedicated to my parents, Mr. Ratan Lal and Mrs. Meena Maheshwari.

Abstract

Business is the act of doing something productive to serve someone’s needs, and thus earn a living, and make the world a better place. Business activities are recorded on paper or using electronic media, and then these records become data. There is more data from customers’ responses and on the industry as a whole. All this data can be analyzed and mined using special tools and techniques to generate patterns and intelligence, which reflect how the business is functioning. These ideas can then be fed back into the business so that it can evolve to become more effective and efficient in serving customer needs. And the cycle continues on.

Business intelligence includes tools and techniques for data gathering, analysis, and visualization for helping with executive decision making in any industry. Data mining includes statistical and machine-learning techniques to build decision-making models from raw data. Data mining techniques covered in this book include decision trees, regression, artificial neural networks, cluster analysis, and many more. Text mining, web mining, and big data are also covered in an easy way. A primer on data modeling is included for those uninitiated in this topic.

Keywords

Data Analytics, Data Mining, Business Intelligence, Decision Trees, Regression, Neural Networks, Cluster Analysis, Association Rules.

Contents

Abstract
Preface

Chapter 1  Wholeness of Business Intelligence and Data Mining
    Business Intelligence
    Pattern Recognition
    Data Processing Chain
    Organization of the Book
    Review Questions

Section 1

Chapter 2  Business Intelligence Concepts and Applications
    BI for Better Decisions
    Decision Types
    BI Tools
    BI Skills
    BI Applications
    Conclusion
    Review Questions
    Liberty Stores Case Exercise: Step 1

Chapter 3  Data Warehousing
    Design Considerations for DW
    DW Development Approaches
    DW Architecture
    Data Sources
    Data Loading Processes
    DW Design
    DW Access
    DW Best Practices
    Conclusion
    Review Questions
    Liberty Stores Case Exercise: Step 2

Chapter 4  Data Mining
    Gathering and Selecting Data
    Data Cleansing and Preparation
    Outputs of Data Mining
    Evaluating Data Mining Results
    Data Mining Techniques
    Tools and Platforms for Data Mining
    Data Mining Best Practices
    Myths about Data Mining
    Data Mining Mistakes
    Conclusion
    Review Questions
    Liberty Stores Case Exercise: Step 3

Section 2

Chapter 5  Decision Trees
    Decision Tree Problem
    Decision Tree Construction
    Lessons from Constructing Trees
    Decision Tree Algorithms
    Conclusion
    Review Questions
    Liberty Stores Case Exercise: Step 4

Chapter 6  Regression
    Correlations and Relationships
    Visual Look at Relationships
    Regression Exercise
    Nonlinear Regression Exercise
    Logistic Regression
    Advantages and Disadvantages of Regression Models
    Conclusion
    Review Exercises
    Liberty Stores Case Exercise: Step 5

Chapter 7  Artificial Neural Networks
    Business Applications of ANN
    Design Principles of an ANN
    Representation of a Neural Network
    Architecting a Neural Network
    Developing an ANN
    Advantages and Disadvantages of Using ANNs
    Conclusion
    Review Exercises

Chapter 8  Cluster Analysis
    Applications of Cluster Analysis
    Definition of a Cluster
    Representing Clusters
    Clustering Techniques
    Clustering Exercise
    K-Means Algorithm for Clustering
    Selecting the Number of Clusters
    Advantages and Disadvantages of K-Means Algorithm
    Conclusion
    Review Exercises
    Liberty Stores Case Exercise: Step 6

Chapter 9  Association Rule Mining
    Business Applications of Association Rules
    Representing Association Rules
    Algorithms for Association Rule
    Apriori Algorithm
    Association Rules Exercise
    Creating Association Rules
    Conclusion
    Review Exercises
    Liberty Stores Case Exercise: Step 7

Section 3

Chapter 10  Text Mining
    Text Mining Applications
    Text Mining Process
    Mining the TDM
    Comparing Text Mining and Data Mining
    Text Mining Best Practices
    Conclusion
    Review Questions
    Liberty Stores Case Exercise: Step 8

Chapter 11  Web Mining
    Web Content Mining
    Web Structure Mining
    Web Usage Mining
    Web Mining Algorithms
    Conclusion
    Review Questions

Chapter 12  Big Data
    Defining Big Data
    Big Data Landscape
    Business Implications of Big Data
    Technology Implications of Big Data
    Big Data Technologies
    Management of Big Data
    Conclusion
    Review Questions

Chapter 13  Data Modeling Primer
    Evolution of Data Management Systems
    Relational Data Model
    Implementing the Relational Data Model
    Database Management Systems
    Conclusion
    Review Questions

Additional Resources
Index

Preface

There are many good textbooks in the market on Business Intelligence and Data Mining. So, why should anyone write another book on this topic? I have been teaching courses in business intelligence and data mining for a few years. More recently, I have been teaching this course to combined classes of MBA and Computer Science students. Existing textbooks seem too long, too technical, and too complex for use by students. This book fills a need for an accessible book on the topic of business intelligence and data mining. My goal was to write a conversational book that feels easy and informative. This is an easy book that covers everything important, with concrete examples, and invites the reader to join this field.

This book has developed from my own class notes. It reflects many years of IT industry experience, as well as many years of academic teaching experience. The chapters are organized for a typical one-semester graduate course. The book contains caselets from real-world stories at the beginning of every chapter. There is a running case study across the chapters as exercises.

Many thanks are in order. My father Mr. Ratan Lal Maheshwari encouraged me to put my thoughts in writing and make a book out of them. My wife Neerja helped me find the time and motivation to write this book. My brother, Dr. Sunil Maheshwari, and I have had many years of encouraging conversations about it. My colleague Dr. Edi Shivaji provided help and advice during my teaching the BIDM courses. Another colleague Dr. Scott Herriott served as a role mod
