Implementing Business Intelligence System - Case Study


Yasser AL-HADAD 1, Răzvan Daniel ZOTA 2
Faculty of Cybernetics, Statistics and Economic Informatics, Department of Economic Informatics and Cybernetics, Bucharest University of Economic Studies, Romania
1 dukeyasser10@yahoo.com, 2 zota@ase.ro

Abstract: Understanding and analyzing data is essential for making decisions within a system. Analytical tasks can be implemented directly by the transactional system, but this becomes more difficult as the transactional system grows. Analytical systems and their extensions appear as a solution for complex and large datasets. We think it is time for medium companies to benefit from such systems, as analytical systems become more varied and at hand for every possible user. In this paper, we propose an architecture of an analytical system that can adapt to and integrate with the existing transactional system of a timber export company. The proposed analytical system should be able to implement the tasks required by the decision makers of the system. We also explore the ability of SQL Server to implement our proposed architecture.

Keywords: BI (Business Intelligence), DSS (Decision Support Systems), DW (Data Warehouse), OLAP (Online Analytical Processing), ETL (Extract, Transform and Load), SAS (SQL Server services)

1 Introduction

Business Intelligence (BI) is a class of high-level business application tools [1] used for collecting, cleansing, processing and analyzing data. Evaluating and understanding the results is essential for the efficiency of the decisional system. The created knowledge is collected from both internal and external sources [2], and its accumulation can lead to improved corporate profitability [3]. BI tools become more varied and easier to use [4], and their solutions are ranked among the most important technological priorities by chief information officers [5]. Investments in BI focus on achieving business targets and increasing return on investment [6]. BI is thus an umbrella concept used for implementing a decision support system (DSS). DSS can be described as the next generation of systems that follow transactional and operational systems [7]. The collected data, and possibly the resulting information, are stored in a data warehouse (DW), one of the basic components of DSS and BI systems. It gathers all the data spread over the organization into a single pool [8]. The existence of the DW became necessary as transactional databases were unable to store the accumulated historical data [9]. A DW can be accessed by proper data-analyzing tools to analyze and store the required data [10, 11]. The DW helps achieve strategic business objectives by offering clean and homogeneous data in real time to support the analytical processes [12, 13]. There are many architectures suited to the design and implementation of a DW [10, 14, 15]. The common idea is to load the propagated data from the operational DBMS (database management system) and other sources into the DW using special tools called ETL (extract, transform and load). ETL is the most widely used tool for integrating data into a data warehouse, and many architectures have been proposed for implementing it. In [16], ETL was used to integrate the entire data source, even a very large one. In [17], the data are processed in local repositories before being integrated into the global data warehouse. The difference between ETL and ELT (extract, load and transform) is presented in [18]. In [19], an architecture is proposed for integrating and validating data from multiple data sources. Another approach, which improves performance and interacts more with users, is presented in [20].

In [21], the metadata of ETL is discussed in order to enable interoperability between systems. Incomplete data are treated as grey data, and special procedures are proposed to deal with them [22]. Data integration covers data from a single source or from multiple sources. The problems of data integration from multiple sources are overlapping data and matching records that belong to the same entity. The challenges of data integration from a single source using backup data files are similar to those of integration from multiple sources. Combining the integration step with the cleansing step can increase performance; on the other hand, such transformations lose the legacy form of the data. Legacy data are important for re-evaluating the data cleansing process or for a backflow of cleaned data.

Timely decision making becomes difficult due to the inefficiency of transactional databases in handling the volume of information access, retrieval, update and maintenance. This shortcoming impacts every industry [12]. OLAP (Online Analytical Processing) appeared as one of the solutions for the problems of operational databases. Within OLAP, much research builds on the data cube operator [23] (illustrated in the sketch at the end of this section), as well as on various algorithms supporting dimension hierarchies [24]. OLAP is a technology that enables users to perform analysis on detailed data and serves as an approach for implementing the analytical database. An analytical database analyzes complex relations between millions of records to identify trends and patterns. It stores static data types, such as derived and calculated data, to provide them in real time whenever they are requested by managers. The perspective of the data cube and the multidimensional database was adopted by many corporations, such as Microsoft [25]. Today, SSAS (SQL Server Analysis Services) is a leader for business logic analysis in services and software. Its integrated platform is one of the largest frameworks implementing the concept of BI [8]. The SAS solution provides a full end-to-end technology; for data quality and data access, it ensures that large volumes of data are processed and transformed into accurate analysis information and reporting. One of the advantages of SAS for many users is its reporting and graphical representations. It offers dynamic views of information that can be displayed in many forms, such as a table, a report or a chart. The main requirement of potential users of a business intelligence solution is to perform data analysis and test the results. SAS Enterprise BI Server provides many tools for analysis and forecasting, which are essential for solving problems and making the company more competitive. One of the advantages of SAS Enterprise BI Server is its web portal, which provides functionalities that enable users to manipulate the contents and layout of the interface. It also offers web-based distribution and reporting features. Users can build and manage custom reports and export them to EXCEL or PDF files. Other functionalities allow users to observe results and patterns that cannot be seen in a traditional chart, using more interactive methods such as 3D videos and tables with bubbles containing data presentations. The SAS product offers an integrated platform that stores data and reports, and its predefined functions allow the developers of IT departments to focus on other tasks such as BI security, implementation and maintenance.
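As a concrete illustration of the data cube operator mentioned above, the sketch below aggregates a hypothetical timber sales fact table over every combination of year, product and destination country; this lattice of subtotals is exactly what an OLAP cube pre-computes and serves interactively. The table and column names are our own illustration, not the schema of the case study.

    -- Hypothetical fact table; all names are illustrative only.
    -- GROUP BY CUBE returns one row per grouping combination
    -- (year x product x country, plus all subtotals and the grand
    -- total), i.e. the set of aggregates a data cube materializes.
    SELECT
        YEAR(f.DeliveryDate)  AS DeliveryYear,
        f.ProductCode,
        f.CustomerCountry,
        SUM(f.QuantityM3)     AS TotalQuantityM3,
        SUM(f.AmountEUR)      AS TotalAmountEUR
    FROM dbo.FactTimberSales AS f
    GROUP BY CUBE (YEAR(f.DeliveryDate), f.ProductCode, f.CustomerCountry);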
The objective of the application is to build a decision support system (DSS) for inventory management. The system will be implemented for a medium-level company that exports timber. The objective of the project is to find a solution appropriate for the firm's size and adaptable to the business field and to the decision makers' requirements. It is expected that the proposed system will provide timely, relevant information to meet the needs of decision makers, so that they can respond quickly to market fluctuations by readapting policies and strategies.

2 Development methodologies

Project development is a process consisting of several different stages. Each stage has its own requirements and targets. Depending on the project type, certain stages receive additional attention in the overall effort. There are many types of methodologies that can be used for project development. Software development methodologies can be defined as sets of guidelines and rules applied in the process of each stage. Each of these methodologies has its characteristics, strengths and weaknesses [26]. For our case, we have used the prototyping methodology, see figure 1. Prototyping is a methodology that entails building a prototype or demo version of the software product. The demo version includes sufficient information to build a prototype and should be built fast. It is used to refine specifications and acts as a baseline for the entire project; it improves the communication between the project owner and the project team [27]. The main characteristic of the prototyping methodology is that the project owner and the users are actively involved by evaluating prototypes, which are valued over written specifications and are meant to be discarded later.

Fig. 1. Prototyping methodology

Demo versions can contain one or more prototypes of the initial models of the software product. The strengths of this methodology are crucial for a good implementation of a software project. It involves the potential users of the system in the implementation process and improves their experience. Early feedback from the users and the project owner also helps with the early identification of anything redundant or missing, so an accurate identification of the software requirements is guaranteed. The main weakness of the prototyping methodology is the cost generated by the increased programming effort of building the prototype, but this weakness can be ignored in our case: the SAS platform used for implementing our project reduces the programming effort through its predefined functions and layouts, so building any prototype becomes easier.

3 Staging method

The staging method removes the performance overhead involved in cross-database joins by creating staging tables in the warehouse database. The staging method extracts data from the sources and loads them into the data warehouse without any transformation: it simply makes another copy of the legacy data content and loads it into the analytical system, making it available for any transformation processes that can be performed later.
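A minimal T-SQL sketch of this first hop, assuming an illustrative source table already reachable from SQL Server; the point is that rows are copied unchanged, so the legacy form stays available for later cleansing or re-evaluation. All names are our own illustration.

    -- Staging method, first hop: copy the legacy rows verbatim into
    -- a staging table of the warehouse database; no transformation.
    IF OBJECT_ID('stg.SalesOrders', 'U') IS NOT NULL
        DROP TABLE stg.SalesOrders;

    SELECT src.*
    INTO stg.SalesOrders                      -- exact copy of the legacy form
    FROM LegacyImport.dbo.SalesOrders AS src;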

The obvious method combines both the extract and the transform steps in the same flow. It is more efficient, and there is no need to store intermediate results. For a mid-level enterprise, the entire data volume is not large enough to generate performance difficulties, so processing efficiency is not the most important factor of the system; in our case, the cost of development and maintenance matters much more. As seen in figure 2, the flow of the staging method is divided into many small steps in order to ease and simplify the process of analysis and implementation. Using the obvious method, data are processed directly and loaded into the analytical database or OLAP (a sketch follows figure 2).

Fig. 2. DSS framework components
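For contrast with the staging sketch above, here is the obvious method under the same illustrative schema: extraction, transformation and loading happen in one flow, straight into the analytical destination, and no intermediate copy of the legacy form is kept. Table and column names are assumptions for the example.

    -- Obvious method: extract, transform and load in a single flow.
    INSERT INTO dw.FactSales
           (SourceOrderID, CustomerKey, ProductCode,
            QuantityM3, AmountEUR, OrderDate)
    SELECT src.id,
           c.CustomerKey,                              -- dimension lookup inline
           UPPER(LTRIM(RTRIM(src.product_code))),      -- cleansing applied inline
           src.quantity,
           src.quantity * src.unit_price,
           CAST(src.order_date AS date)
    FROM LegacyImport.dbo.SalesOrders AS src
    JOIN dw.DimCustomer AS c
      ON c.SourceCustomerID = src.customer_id;         -- legacy form not preserved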

4 Case study

In the following section, we propose a DSS for our case study. The requirements of the system will be analyzed from the standpoint of an intermediate dealer (quality, production capacity, price, payment conditions, etc.). There is thus no need to analyze the process of timber production itself, as the modelled system can validate itself using the data in the transactional database, where all the specifics and phenomena of this business field are hidden in the data. A database of a timber export company with an annual turnover of over two million euro will be used for validating the proposed DSS.

4.1 Existing transactional system

As the volume of data and activities increases, more challenges appear with respect to processing and manipulating the collected data. The existing informational system is designed to meet the requirement of transferring a big part of the data processing burden to a centralized system that operates the firm's activities. It replaces the older system, which was composed of many uncorrelated systems and procedures; that limitation was preventing the organization from expanding its activity. The existing system was proposed to meet the requirements of the company: a standardized solution was adopted for this modern system, together with other special solutions that address some specific requirements.

4.2 Technical equipment

The transactional system is implemented using the PHP scripting language, the web interface runs on an Apache server, and a MySQL server hosts the database. A XAMPP server is used to run the application and the database. The version XAMPP 1.6.3a is kept up to now to guarantee that all facilities of the system keep working. The XAMPP server, database and application are installed at every unit to enable operating the system in offline mode, so the server is installed at every unit. It should be configured properly at every unit to assure that it uses a free socket and an existing hard disk partition, and that the platform has full access to use the partition.
In order to operate the system in offline mode, users have to synchronize their local database using a defined procedure for updating the database. Synchronizing the database is performed via archived files that include the latest data. For supporting the partners of the corporation and the agents who do not have access to the informational system, the system offers the possibility of exporting its data using the portable document format (PDF).

4.3 Instruments of the proposed system

SSAS (SQL Server Analysis Services) is adopted for implementing the proposed analysis application. It offers the main methods and algorithms for analysis purposes. The friendly user interface facilitates the use of its services, and the wide range of graphical representations of data and information available in these services makes viewing and understanding the resulting data easier. The step of building the model should be followed by a data injection step. All the source data are available in the MySQL server, which is not directly accessible to an analysis model; any input data should be located in the SQL Server database engine. The MyODBC driver connector is used to make the data in the MySQL server accessible for integration into SQL Server. ODBC is a standardized API (application programming interface) that enables a client-side application to connect to one or multiple databases. The MyODBC driver is a member of the MySQL ODBC family. It provides access to a MySQL database through standard ODBC, offering a standard driver-manager-based interface as well as a native interface. For Unix and Mac operating systems, the native MySQL network library can be used to communicate with the MySQL database; installing MySQL Connector/ODBC is necessary for Windows and for applications that use the ODBC interface. Figure 3 shows the instruments used for implementing the application:

- MySQL Server: represents the platform that hosts the current transactional database. It is also used to reload the archived data into the database in order to send them to the SQL Server platform;
- MyODBC connector: offers the possibility of accessing MySQL databases directly from SSIS (SQL Server Integration Services). The data and their structure can be viewed and imported in SQL Server, so there is no need to export any standard format to be imported at the destination (see the sketch after figure 3);
- SSIS (SQL Server Integration Services): offers a wide range of components that can be used for interrogating and integrating the data. The graphical user interface makes it easy to build and configure an integration services project, and this type of project can deal with many external data sources;
- SQL Server database engine: the data destination of any integration services project is the SQL database engine. It is important to load all the data needed by the analysis process, so that they are accessible to the OLAP process and the analysis services;
- SSAS (SQL Server Analysis Services): designs prediction or classification models for the entities available in the database, supporting the decision system or any simulation system. Furthermore, the OLAP process can contribute to implementing the analytical database for generating complex reports and preparing data for SSAS data initialization.

Fig. 3. DSS framework tools
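As an illustration of how the MyODBC route can look from the SQL Server side, the sketch below registers the MySQL database as a linked server through the MSDASQL (OLE DB over ODBC) provider and pulls rows with a pass-through query. The DSN, credentials and table names are assumptions for the example, not the project's actual configuration.

    -- Register the MySQL transactional database as a linked server,
    -- via a system DSN ('MySqlDsn') created with the MySQL ODBC driver.
    EXEC sp_addlinkedserver
         @server     = N'MYSQL_SRC',
         @srvproduct = N'MySQL',
         @provider   = N'MSDASQL',
         @datasrc    = N'MySqlDsn';

    EXEC sp_addlinkedsrvlogin
         @rmtsrvname  = N'MYSQL_SRC',
         @useself     = 'FALSE',
         @rmtuser     = N'etl_user',      -- illustrative credentials
         @rmtpassword = N'***';

    -- Pass-through query: the inner SELECT is executed by MySQL itself.
    SELECT *
    FROM OPENQUERY(MYSQL_SRC, 'SELECT id, name, country FROM customers');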

4.4 Application components

Any analysis model can be configured directly from the structures interrogated in the integration services projects. The obvious method combines both extract and transform steps in the same flow and loads the resulting data into the final destination model; it can create some data redundancies and replicate some common functionalities. The top-down methodology involves breaking a system down to gain insight into its compositional sub-systems. It assures that any additional functionality can be added with less effort. The staging method avoids any possibility of replicating common functionalities by using some intermediary steps. Figure 2 shows the difference between the staging and the obvious method. The components of the staging method, as seen in figure 2, can be summarized as follows:

- Extract1 & load1: copies the data into the intermediary database without changing their structure or values;
- Extract2 & transform2 & load2: extracts any new data instances in the staging database and loads them into the data warehouse. This stage can include data transformations that adapt the data to the requirements of the data warehouse;
- Extract3 & transform3 & load3: the data extracted from the data warehouse are filtered and transformed to meet the requirements of the analysis system.

4.5 Data source integration

A backup data file represents a copy of the database. It is generated at a certain time and holds the same data as the operational database (ODB) at that moment, including any erroneous data that exist in the ODB. These files can be used when a data archiving system does not exist, and they enable recovering data in case of any system failure. The OLTP (online transaction processing) database, or ODB, is a database designed and normalized to avoid redundant data and to facilitate the operations (insertion, deletion and updating) captured from the transactional system; it stores data using relational database technology to ensure integrity. It is dedicated to serving the transactional systems, so it includes only the current activity, without historical data.

Fig. 4. Recovering historical data using backup data files

Figure 4 shows the components used for building the data warehouse of our case study: the ODB and three backup data files. It shows the data content of every source and its period. The grey portion indicates that the data are partial and incomplete; an older backup file is used for recovering the data of the grey zone. The example presented in figure 4 shows that the entire history cannot be recovered, so there remain some periods where the data are not complete, and some procedures should be adopted for dealing with this situation. ETL is used to update the data warehouse by capturing the changed data from the transactional system and loading them into the data warehouse. In SSIS, this is handled by the Slowly Changing Dimension (SCD) component shown in figure 3 (a T-SQL sketch of this behaviour follows the classification below). It deals with insert and update commands properly, but it does not process the delete command, so all the deleted data captured from the backup data files are migrated to the data warehouse. Most of the deleted records represent historical data and should be kept in the data warehouse. The deleted records can be classified as follows:

- Missed historical data: valid data that should be kept in the data warehouse to serve analytical purposes. Missed historical data are the instances of historical data that cannot be recovered or are not complete;
- Cancelled data: data removed from the transactional system that should be identified and isolated. These records were removed from the transactional system to correct user mistakes, remove duplicate inserts, or eliminate a transaction, such as a cancelled selling transaction. We refer to these records as cancelled data.
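The SSIS SCD component is configured graphically, so the following is only a rough T-SQL approximation of the insert/update behaviour described above for the "extract2 & transform2 & load2" stage, under an assumed illustrative schema. Note the deliberate absence of a delete branch, which is what lets records deleted in the source survive in the warehouse.

    -- SCD-like load from staging into the warehouse (illustrative names).
    MERGE dw.DimProduct AS tgt
    USING stg.Products  AS src
       ON tgt.SourceProductID = src.id
    WHEN MATCHED AND (tgt.ProductName <> src.name OR tgt.Grade <> src.grade)
        THEN UPDATE SET tgt.ProductName = src.name,
                        tgt.Grade       = src.grade
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (SourceProductID, ProductName, Grade)
             VALUES (src.id, src.name, src.grade);
    -- No WHEN NOT MATCHED BY SOURCE clause: rows deleted from the source
    -- (missed historical data, cancelled data) remain in the warehouse
    -- and are classified afterwards.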

4.6 Data quality problem

The last step in designing the data warehouse is to define the rules used for data validation. Much erroneous data can be revealed at the data integration stage, and determining the reason why erroneous data appear can ease the task of cleansing them. From an analytical point of view, we propose the following classification of erroneous data:

- Platform error: any erroneous data generated by the system without the intervention of users; the reason such erroneous data exist is platform reliability;
- System error: errors generated by the transactional system due to problems in modelling or programming the system;
- User mistakes: mistakes generated by users that do not affect the transactional application. Such mistakes should be identified because they can affect the analytical database. An example is reversing the values of the length and width dimensions: this does not affect the total quantity of material, but it can affect any clustering or classification method;
- Cancelled data: mistaken records that were registered in the system and removed later; the time gap between the insert and delete operations enables some of these records to be captured and inserted into the data warehouse at the data integration step.

Some data validation rules have been defined to capture and isolate these data. The validation rules are classified into two categories: unconditional and conditional. A rule applies where there are two related tables and the validation of the first table depends on the second table, which we call the validation table. An unconditional validation rule means that no pre-validation of the validation table is needed. In the case of a conditional validation rule, the data of the validation table should be verified and approved before being used to validate another entity; this type of rule requires some pre-validation. Different rules require different pre-validation steps, so a flow that shows the dependencies between rules is important to indicate the order in which these rules should be executed.
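A minimal sketch of the two rule categories on an illustrative staging schema. The first rule is unconditional: order lines are checked against the product table with no pre-validation of that table. The second is conditional: it may only rely on product rows that have already passed their own validation, which is what imposes an execution order on the rules.

    -- Unconditional rule: an order line must reference an existing product.
    UPDATE ol
    SET    ol.IsValid = 0,
           ol.RejectReason = 'unknown product'
    FROM   stg.OrderLines AS ol
    LEFT JOIN stg.Products AS p
           ON p.id = ol.product_id
    WHERE  p.id IS NULL;

    -- Conditional rule: runs only after the product table has itself been
    -- validated, and rejects lines referencing a product that failed.
    UPDATE ol
    SET    ol.IsValid = 0,
           ol.RejectReason = 'product failed validation'
    FROM   stg.OrderLines AS ol
    JOIN   stg.Products AS p
           ON p.id = ol.product_id
    WHERE  p.IsValid = 0;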
5 Conclusions

Business Intelligence is a concept that includes a set of techniques and methods aiming to configure high-level tools that serve analytical purposes in order to support the decision maker. Business intelligence tools become more varied and at hand for every possible user, so it is time for medium companies to benefit from such solutions. Medium companies have a large amount of data that needs to be analyzed, but they cannot invest a lot. In this paper, we set out to build a decision support system tailored for a timber export company. The prototyping methodology is used, as it is suitable for innovative projects for which no previous examples of this type exist. There are many varied architectures for developing a DSS. Our proposed system is divided into small, isolated tasks or components; this makes every task smaller and more specific, while the system becomes more flexible to maintain and evolve. SQL Server services (SAS) come from one of the largest companies that also addresses BI solutions to medium companies. SAS has a set of predefined functions and layouts that decrease the programming effort, so implementing the prototyping methodology becomes easier. Building a data warehouse requires integrating and validating data from different sources. This task includes several steps or processes, such as extract, transform, filtering, cleansing and load. Data cleansing is one of the most difficult tasks.

Different approaches and techniques have been proposed for validating and cleansing data from single or multiple sources. In this paper, we solved the problem of integration from a single source, but using backup data files rather than an archiving system. The accuracy of the integrated data cannot be 100% guaranteed, especially when the legacy data are not complete. The purpose of the data warehouse is to serve applications dedicated to solving analytical problems; it is therefore important to adopt a solution that balances quality and cost. An automated solution is preferred over manual work in data cleansing. In our case study, we applied some techniques for filtering and cleansing the historical data to address the backup data files problem.

Acknowledgment

Part of this work was done under the auspices of the doctoral studies within the Doctoral School of Economic Informatics, Bucharest University of Economic Studies.

References

[1] A. Sohollo, "Using Business Intelligence in IT Governance Decision Making," in Governance and Sustainability in IS, M. Nüttgens, A. Gadatsch, K. Kautz, I. Schirmer and N. Blinn, Eds. Germany: Springer, IFIP AICT 366, IFIP International Federation for Information Processing 2011, Part 1, pp. 3-15.
[2] J.A. O'Brien and G.M. Marakas, Management Information Systems, 10th ed. New York, U.S.A.: The McGraw-Hill Companies, 2011.
[3] R.L. Sallam, J. Richardson, J. Hagerty and B. Hostmann. (January 2011). Magic Quadrant for Business Intelligence [Online]. Available: om/download/press/EN/Gartner BI Magic Quadrant 2011.pdf
[4] Gartner. (September 2012). Business Intelligence, Mobile and Cloud Top the Technology Priority List for CIOs in Asia: Gartner Executive [Online]. Available: it/page.jsp?id=2159315
[5] Gartner. (April 2012). Performance Management Software Market Surpassed the $12 Billion Mark [Online]. Available: p?id=1971516
[6] H.J. Watson, C. Fuller and T. Ariyachandra, "Data warehouse governance: best practices at Blue Cross and Blue Shield of North Carolina," Decision Support Systems, Vol. 38(3), pp. 435-450, 2004.
[7] T. Ariyachandra and H.J. Watson, "Key organizational factors in data warehouse architecture selection," Decision Support Systems, Vol. 49, pp. 200-212, 2010.
[8] T.R. Sahama and P.R. Croll, "A Data Warehouse Architecture for Clinical Data Warehousing," in J.F. Roddick and J.R. Warren, Eds., Proceedings of the Australasian Workshop on Health Knowledge Management and Discovery (HKMD 2007), CRPIT, Vol. 68, pp. 227-232, Ballarat, Victoria.
[9] B.A. Devlin and P.T. Murphy, "An architecture for a business and information system," IBM Systems Journal, Vol. 27(1), pp. 60-80, 1988.
[10] W.H. Inmon, "DW 2.0 Architecture for the Next Generation of Data Warehousing," DM Review, Vol. 16(4), pp. 8-25, Apr. 2006.
[11] W.H. Inmon, Building the Data Warehouse, 3rd ed., New York: John Wiley & Sons, 2002.
[12] H.-G. Hwang, C.-Y. Ku, D. Yen and C.-C. Cheng, "Critical factors influencing the adoption of data warehouse technology: a study of the banking industry in Taiwan," Decision Support Systems, Vol. 37(1), pp. 1-21, 2004.
[13] S. Nilakanta, K. Scheibe and A. Rai, "Dimensional issues in agricultural data warehouse designs," Computers and Electronics in Agriculture, Vol. 60(2), pp. 263-278, 2008.

[14] W.H. Inmon, Building the Data Warehouse, Wiley, New York, 1996.
[15] R. Kimball, The Data Warehouse Toolkit, Wiley, New York, 1996.
[16] C. Boja, A. Pocovnicu and L. Bătăgan, "Distributed Parallel Architecture for Big Data," Informatica Economică, Vol. 16, 2012.
[17] Q. Chen, U. Dayal and M. Hsu, A Distributed OLAP Infrastructure for E-Commerce, HP Laboratories, Palo Alto, California, USA.
[18] R. Davenport, ETL vs ELT, Insource IT Consultancy, Insource Data Academy, June 2008.
[19] E. Rahm and H.H. Do, "Data Cleaning: Problems and Current Approaches," IEEE Data Engineering Bulletin, Vol. 23, 2000.
[20] V. Raman and J.M. Hellerstein, "Potter's Wheel: An Interactive Data Cleaning System," 2001.
[21] H.H. Do and E. Rahm, On Metadata Interoperability in Data Warehouses, Techn. Report, Dept. of Computer Science, University of Leipzig. Available: iles/Rahm2000DataCleaningProblemsand.pdf [May 3, 2016].
[22] S.F. Liu and Y. Lin, Grey Information: Theory and Practical Applications, Springer-Verlag, London, 2010.
[23] L. Cabibbo and R. Torlone, "Querying multidimensional databases," in Proceedings of the 6th DBLP Workshop, pp. 253-269, 1997.
[24] F. Dehne, T. Eavis, S. Hambrusch and A. Rau-Chaplin, "Parallelizing the data cube," in International Conference on Database Theory, 2001.
[25] C. Diaconu, C. Freedman, E. Ismert, P.-A. Larson, P. Mittal, R. Stonecipher, N. Verma and M. Zwilling, "Hekaton: SQL Server's memory-optimized OLTP engine," in Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD '13), pp. 1243-1254, New York, NY, USA: ACM, 2013.
[26] M.L. Despa, "Comparative study on software development methodologies," Database Systems Journal, Vol. V, no. 3/2014.
[27] J.E. Cooling and T.S. Hughes, "The emergence of rapid prototyping as a real-time software development tool," in Proceedings of the Second International Conference on Software Engineering for Real Time Systems, 18-20 Sep. 1989, Cirencester, UK, IET, 1989, pp. 60-64.

Yasser AL-HADAD graduated from the Faculty of Information Management at the Romanian American University in 2008. He received a master's degree in Economic Informatics from the Romanian American University in 2010. Since then he has been a PhD candidate, studying to obtain his PhD in the field of economic informatics.

Răzvan Daniel ZOTA graduated from the Faculty of Mathematics, Computer Science Section, at the University of Bucharest in 1992. He also holds a Bachelor degree in Economics and a postgraduate degree in Management from SNSPA Bucharest, Romania. In 2000 he received the PhD title from the Academy of Economic Studies in the field of Cybernetics and Economic Informatics. Since 2010 he has been supervising PhD theses in the field of Economic Informatics.
