An Overview Of Data Mining Algorithms And Business Intelligence Tools

Journal of Information and Computational Science, ISSN: 1548-7741
Volume 9 Issue 10 - 2019, www.joics.org

An Overview of Data Mining Algorithms and Business Intelligence Tools

Dr. R. Bulli Babu 1, Mrs. P. Anitha Rani 2, Mr. A. Siva Sankar 3
1 Professor, Department of CSE, St. Marys Group of Institutions Guntur.
2 Assistant Professor, Department of CSE, St. Marys Group of Institutions Guntur.
3 Assistant Professor, Department of CSE, St. Marys Group of Institutions Guntur.
drbullibabur@gmail.com, anitha.palakayala@gmail.com, shankars4all@gmail.com

ABSTRACT: The primary aim of this paper is to show how different data mining tools and algorithms can be used in an industrial enterprise environment to enhance overall business performance. The paper focuses on the tools used in industry-oriented business intelligence development. The proposed algorithms are suited to improving a commercial organization's knowledge for strategic marketing. Data mining algorithms can predict sales, and web mining algorithms are useful for social trend analysis. Logistic algorithms are useful for managing information related to products' future sales. The Weka, RapidMiner and KNIME tools are useful for predictive mining. Finally, this paper presents a new model for e-trade sales forecasting with a neural network based on multi-attribute processing. This model can produce data that complements other data mining outputs in support of logistics actions. The model explains how different data mining algorithms can be included in a single prototype information system connected to big data, and how it may work for real business intelligence.

Keywords: Business intelligence, machine learning, regression, pre-processing

1. INTRODUCTION
Over the last decade, advances in computing power and speed have enabled us to move beyond manual, tedious and time-consuming practices to fast, easy and automated data analysis. Business intelligence tools [1-4] play a vital role in the decision-making process of business organizations. The more complex the data sets collected, the more potential there is to uncover relevant insights. Retailers, banks, manufacturers, telecommunications providers, and insurers, among others, are using data mining to discover relationships among everything from price optimization, promotions, and demographics to how the economy, risk, competition, and social media are affecting their business models, revenues, operations, and customer relationships.

Data mining is the search for hidden, valid, and potentially useful patterns in large data sets. It helps discover previously unsuspected relationships in the data for business gain, and it plays a key role in identifying anomalies, similarities, and relationships between data sets in order to assess output results. By using a wide range of tools and techniques, we can use this data to increase revenue, reduce costs, improve customer satisfaction and reduce problems.

2. IMPORTANCE OF DATA MINING
The process of digging through data to discover hidden connections and predict future trends has a long history. Sometimes referred to as "knowledge discovery in databases," the term "data mining" wasn't coined until the 1990s [5]. But its foundation comprises three intertwined scientific disciplines: statistics (the numeric study of data relationships), artificial intelligence (human-like intelligence displayed by software and/or machines) and machine learning (algorithms that can learn from data to make predictions). What was old is new again, as data mining technology keeps evolving to keep pace with the limitless potential of big data and affordable computing power [6].

Fig 1 shows the phases involved in data mining. Initially, data is collected from the different sources of the organization, which may be located in different places. Since these sources use different data formats, pre-processing techniques are first applied to the data to remove inconsistencies such as noise and outliers. After pre-processing, the information is stored in a data warehouse, which can hold several years of organizational data in multidimensional form. Data mining techniques can then be applied to the data warehouse to perform analysis that helps the decision-making system make the right decisions at the right time, so that the organization can grow in a competitive business environment.

Fig 1: Phases in Data Mining Analysis
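The pre-processing phase described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout (a dictionary with an "amount" field), the function names, and the use of Tukey's interquartile-range rule to flag outliers are all assumptions made for illustration. The sketch drops missing and unparseable values (noise), normalises mixed formats to numbers, and then removes outliers.

```python
from statistics import quantiles


def remove_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule, assumed here)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def preprocess(raw_records):
    """Normalise mixed-format records (hypothetical layout) and drop unusable ones."""
    cleaned = []
    for rec in raw_records:
        amount = rec.get("amount")
        if amount is None:  # missing value: skip the record
            continue
        try:
            # Sources may deliver numbers or strings such as "10,000".
            cleaned.append(float(str(amount).replace(",", "")))
        except ValueError:  # noise such as "n/a"
            continue
    return remove_outliers_iqr(cleaned)


# Records as they might arrive from different sources (made-up values).
raw = [
    {"amount": "100"}, {"amount": 110}, {"amount": "120.0"},
    {"amount": None}, {"amount": "n/a"}, {"amount": 130},
    {"amount": 140}, {"amount": 150}, {"amount": "10,000"},
]
clean = preprocess(raw)
```

In this sketch the cleaned list is what would be loaded into the data warehouse; a real pipeline would also reconcile field names, units and duplicates across sources.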

3. DATA MINING ALGORITHMS
Data mining is an interdisciplinary subfield of computer science: the computational process of discovering patterns in large data sets. It is regarded as an essential process in which intelligent methods are applied to extract data patterns [7]. The most widely used data mining algorithms are listed below.

Fig 2: Algorithms in Data Mining

a) C4.5:
C4.5 is an algorithm, developed by Ross Quinlan, that generates a classifier in the form of a decision tree. To do so, C4.5 is given a set of data representing items that have already been classified. C4.5, often referred to as a statistical classifier, is an extension of Quinlan's ID3 algorithm. The decision trees it generates can then be used for classification. The authors of the Weka machine learning software have described C4.5 as "a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date" [8].

b) K-means:
K-means clustering [9] is a method of vector quantization that is widely used for cluster analysis in data mining. It should not be confused with the nearest centroid classifier, also known as the Rocchio algorithm, which is a supervised classification method. K-means is mainly used to create groups of objects such that objects in the same group have similar properties and objects in different groups have dissimilar properties. It is one of the important data mining techniques in cluster analysis for identifying similar data.
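The grouping behaviour described for K-means can be sketched with the standard iterative procedure (often called Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its group, and repeat until nothing changes. This is a minimal illustration, not an implementation taken from the paper; the function name, the caller-supplied initial centroids, and the sample points are assumptions.

```python
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Minimal K-means sketch: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


# Two visibly separate groups of 2-D points (made-up data).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
```

Passing the initial centroids explicitly keeps the sketch deterministic; practical implementations usually choose them randomly or with a seeding heuristic and run the algorithm several times.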

c) Support vector machines:
In machine learning, support vector machines (SVMs), also known as support vector networks, are supervised learning models with associated learning algorithms that analyze data for classification and regression. An SVM model represents the examples as points in space, mapped so that the examples of the separate categories are divided by a gap that is as wide as possible.

d) Apriori:
Apriori is an algorithm for frequent itemset mining and association rule learning over transactional databases. It proceeds by identifying the individual items that are frequent in the database and then extending them to larger and larger item sets, as long as those item sets appear sufficiently often in the database. The main aim of Apriori is to identify the general trends and association rules that exist in the data.

e) EM (Expectation-Maximization):
In statistics, an expectation-maximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved latent variables.

f) PageRank (PR):
PageRank (PR), named after Larry Page, one of the founders of Google, is an algorithm used by Google Search to rank web pages in its search engine results. PageRank was the first algorithm used by the company, but it is not the only algorithm Google uses to order search results.

4. DATA MINING BUSINESS INTELLIGENCE TOOLS
Business intelligence tools are needed to handle and maintain transactional data, whose volume is growing rapidly day by day. Most of this data is unstructured, so a defined process and method are required to extract useful information from it and transform it into an understandable and usable form. Newer technologies such as artificial intelligence and machine learning rely on data mining tools to extract the required data from large volumes of data for the decision-making process.

a) RapidMiner:
RapidMiner is a data science software platform that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analysis. The program is written in the Java programming language. It lets users experiment with a large number of arbitrarily nestable operators, which are described in XML files and assembled through RapidMiner's graphical user interface.

Fig 3: Design view main screen of RapidMiner

b) Oracle Data Mining
Oracle Data Mining is part of Oracle's Advanced Analytics database option. Market-leading companies use it to maximize the potential of their data and make accurate predictions. The system uses powerful data mining algorithms to target the best customers. It also identifies both anomalies and cross-selling opportunities, and lets users apply different predictive models according to their needs. In addition, customer profiles can be customized as required.

Fig 4: GUI tool Oracle Data Miner

c) KNIME
KNIME lets users deploy, scale and become familiar with data in very little time. In the business intelligence world, KNIME is known as the platform that makes predictive intelligence accessible to inexperienced users. Its data-driven approach also helps uncover the potential hidden in data. It includes thousands of modules and ready-to-use examples, along with an array of integrated tools and algorithms.

Fig 5: KNIME Analytics Platform

d) Orange
Orange is an open-source, component-based visual programming toolkit for data visualization, machine learning, data mining, and data analysis. It features a visual programming front-end for exploratory data analysis and interactive data visualization. Orange components are called widgets, and they range from simple data visualization, subset selection and pre-processing to the evaluation of learning algorithms and predictive modeling. Visual programming in Orange is performed through an interface in which workflows are created by linking predefined or user-designed widgets, while advanced users can use Orange as a Python library for data manipulation and widget alteration.

Fig 6: Design model of Orange Tool

e) Rattle
Rattle GUI is a free and open-source software package, provided by Togaware, that offers a graphical user interface for data mining with the R statistical programming language. Rattle provides considerable data mining functionality by exposing the power of R through a graphical user interface. Rattle is also used as a teaching aid for learning R. Its Log tab reproduces the R code for any activity undertaken in the GUI, and this code can be copied and pasted. Rattle allows the dataset to be partitioned into training, validation, and testing subsets. The dataset can also be viewed and edited.
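The three-way partitioning mentioned for Rattle can be illustrated outside the tool. Rattle itself generates R code; the Python sketch below only shows the general idea of a shuffled training/validation/testing split. The 70/15/15 proportions, the fixed seed, and the function name are assumptions chosen for illustration.

```python
import random


def partition(rows, train=0.70, validate=0.15, seed=42):
    """Shuffle rows and split them into training, validation and testing sets.

    The proportions and the seed are illustrative assumptions; whatever is
    left after the training and validation shares becomes the testing set.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed for a repeatable split
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * validate)
    training = rows[:n_train]
    validation = rows[n_train:n_train + n_val]
    testing = rows[n_train + n_val:]
    return training, validation, testing


# 100 hypothetical row identifiers.
training, validation, testing = partition(range(100))
```

A model would be fitted on the training rows, tuned against the validation rows, and scored once on the testing rows, so that the final estimate comes from data the model has not seen.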

Fig 7: Interface design view of Rattle Tool

f) Weka
The Waikato Environment for Knowledge Analysis (Weka) is a suite of machine learning software developed at the University of Waikato, New Zealand. The program is written in Java. It contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with a graphical user interface. It supports several data mining tasks, including pre-processing, clustering, classification, visualization, and regression.

Fig 8: WEKA Explorer

5. CONCLUSION
In this paper, we discussed the importance of data mining and the phases involved in it. Before data mining analysis, the data must be pre-processed to remove all kinds of inconsistencies, and then stored in a data warehouse in a multidimensional format capable of holding several years of organizational data. Data mining analysis can then be performed, enabling the organization's decision-makers to make timely decisions so that the organization can grow in a competitive market. We also discussed the most popular data mining algorithms and their importance, and described the tools commonly used in research, with figures illustrating their use. From the above, we conclude that data mining is a fast-growing field that can provide many advantages in areas such as business, education, medicine, and research organizations.

REFERENCES:
[1] Meta Group Inc. Data Mining: Trends, Technology, and Implementation Imperatives. Stamford, CT, February 1997.
[2] Goebel, M. and Grunewald, L. A Survey of Knowledge Discovery and Data Mining Tools. Technical report, University of Oklahoma, School of Computer Science, Norman, OK, February 1998.
[3] Waikato ML Group. User Manual Weka: The Waikato Environment for Knowledge Analysis. Department of Computer Science, University of Waikato (New Zealand), June 1997.
[4] Thearling, K. Data Mining and Database Marketing WWW Pages. http://www.santafe.edu/ kurt/dmvendors.shtml, 1998.
[5] Online available at https://medium.com/@z atkins/daily-grind-7be9fb23b2e
[6] Online available at https://www.sas.com/en sg/insights/analytics/data-mining.html
[7] Online available at http://tagteam.harvard.edu/hub feeds/2087/feed items/2358304
[8] Online available at http://www.handsonsystem.com/White-Papers
[9] Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda. "Top 10 Algorithms in Data Mining". Springer-Verlag London Limited, 2007.
[10] Haiyan Zhou, Xiaolin Bai, Jinsong Shan. A Rough-Set-based Clustering Algorithm for Multi-stream. Procedia Engineering, 2011; 15: 1854-58.
