Data Warehouse & Mining Lab Manual

Roll No:
Name:
Sem:
Section:

Data Warehouse & Mining Lab Manual, 2018

CERTIFICATE

Certified that this file is submitted by Shri/Ku. ____________________, Roll No. ________, a student of the VII Semester, final year of the course Computer Science & Engineering, as a part of the PRACTICAL prescribed by the Rashtrasant Tukadoji Maharaj Nagpur University for the subject Data Warehouse & Mining in the laboratory of ____________________ during the academic year ________, and that I have instructed him/her in the said work from time to time and found his/her progress to be satisfactory.

I have assessed the said work and I am satisfied that it is up to the standard envisaged for the course.

Date:

Prof. Almas Ansari
Signature & Name of Subject Teacher

Signature & Name of HOD

Anjuman College of Engineering and Technology

Vision
To be a centre of excellence for developing quality technocrats with moral and social ethics, to face the global challenges for the sustainable development of society.

Mission
- To create a conducive academic culture for learning and identifying career goals.
- To provide quality technical education and research opportunities, and to imbibe entrepreneurship skills contributing to the socio-economic growth of the Nation.
- To inculcate values and skills that will empower our students towards development through technology.

Vision and Mission of the Department

Vision: To achieve excellent standards of quality education in the field of computer science and engineering, aiming towards the development of ethically strong technical experts contributing to the profession in the global society.

Mission:
- To create an outcome-based education environment for learning and identifying career goals.
- To provide the latest tools in a learning ambience to enhance innovation, problem-solving skills, leadership qualities, team spirit and ethical responsibilities.
- To inculcate awareness through innovative activities in the emerging areas of technology.

Program Educational Objectives (PEOs)
- Graduates will have a strong foundation in the mathematical, scientific and engineering fundamentals necessary to formulate, solve and analyze engineering problems in their careers.
- Graduates will be able to create and design computer support systems and impart knowledge and skills to analyze, design, test and implement various software applications.
- Graduates will work productively as computer science engineers towards the betterment of society, exhibiting ethical qualities.

Program Specific Outcomes (PSOs)
- Foundation of mathematical concepts: To use mathematical methodologies and techniques for computing and solving problems using suitable mathematical analysis, data structures, databases and algorithms as per the requirement.
- Foundation of computer systems: The capability to interpret and understand the fundamental concepts and methodology of computer systems and programming. Students can understand the functionality of the hardware and software aspects of computer systems, networks and security.
- Foundations of software development: The ability to grasp the software development life cycle and the methodologies of software systems and project development.

PROGRAM: CSE
DEGREE: B.E.
COURSE: DATA WAREHOUSING AND MINING
SEMESTER: VII
CREDITS: 2
COURSE CODE: BECSE401T
COURSE TYPE: REGULAR
COURSE AREA/DOMAIN: DATA MINING
CONTACT HOURS: 2 hours/week
CORRESPONDING LAB COURSE CODE: BECSE401P
LAB COURSE NAME: DATA WAREHOUSING AND MINING LAB

COURSE PRE-REQUISITES:
C. CODE | COURSE NAME | DESCRIPTION | SEM
- | DATABASE MANAGEMENT SYSTEMS | - | V

LAB COURSE OBJECTIVES:
- To familiarize students with the basic concepts of data mining and warehousing.
- To explain and demonstrate various mining algorithms on real-world data.
- To brief students about future trends in the field of data mining.

COURSE OUTCOMES: Data Warehousing and Mining Lab
After completion of this course, students will be able to:
SNO | DESCRIPTION | BLOOM'S TAXONOMY LEVEL
CO.1 | Create a dataset for any application in the .arff format. | Level 6
CO.2 | Describe various preprocessing and statistical techniques and apply them to the given dataset. | Level 1, 3
CO.3 | Apply various association rule mining algorithms to the given dataset. | Level 3
CO.4 | Apply various classification algorithms to the given dataset. | Level 3
CO.5 | Apply various clustering algorithms to the given dataset. | Level 3
CO.6 | Create an application using outlier analysis. | Level 6

Lab Instructions:
- Make an entry in the Log Book as soon as you enter the laboratory.
- All students should sit according to their roll numbers.
- All students must enter their terminal number in the Log Book.
- Do not change the terminal on which you are working.
- Strictly observe the instructions given by the Faculty / Lab Instructor.
- Take permission before entering the lab and keep your belongings in the racks.
- NO FOOD OR DRINK, IN ANY FORM, is allowed in the lab.
- TURN OFF CELL PHONES! If you need to carry one, keep it in your bag.
- Avoid all horseplay in the laboratory. Do not misbehave in the computer laboratory. Work quietly.
- Save often and keep your files organized.
- Don't change settings, and surf safely.
- Do not reboot, turn off, or move any workstation or PC.
- Do not load any software on any lab computer without prior permission of the Faculty and Technical Support Personnel. Only Lab Operators and Technical Support Personnel are authorized to carry out these tasks.
- Do not reconfigure the cabling/equipment without prior permission.
- Do not play games on the systems.
- Turn off the machine once you are done using it.
- Violation of the above rules and etiquette guidelines will result in disciplinary action.

Continuous Assessment Practical
Exp. No. | Name of Experiment | Date | Sign | Remark
1 | Demonstration of preprocessing on an .arff file using student data (student.arff) | | |
2 | To perform the statistical analysis of data | | |
3 | Demonstration of association rule mining using the Apriori algorithm on supermarket data | | |
4 | Demonstration of the FP-Growth algorithm on supermarket data | | |
5 | To perform classification by decision tree induction using the Weka tool | | |
6 | To perform classification using a Bayesian classification algorithm in R | | |
7 | To perform cluster analysis by the k-means method using R | | |
8 | To perform hierarchical clustering using R programming | | |
9 | Study of regression analysis using R programming | | |
10 | Outlier detection using R programming | | |
Content Beyond Syllabus
11 | Introduction to the leading open-source RapidMiner tool for data mining solutions | | |

CONTENTS
Exp. No. | Name of Experiment
1 | Demonstration of preprocessing on an .arff file using student data (student.arff)
2 | To perform the statistical analysis of data
3 | Demonstration of association rule mining using the Apriori algorithm on supermarket data
4 | Demonstration of the FP-Growth algorithm on supermarket data
5 | To perform classification by decision tree induction using the Weka tool
6 | To perform classification using a Bayesian classification algorithm in R
7 | To perform cluster analysis by the k-means method using R
8 | To perform hierarchical clustering using R programming
9 | Study of regression analysis using R programming
10 | Outlier detection using R programming
Content Beyond Syllabus
11 | Introduction to the leading open-source RapidMiner tool for data mining solutions

EXPERIMENT NO – 1

Aim: Demonstration of preprocessing on an .arff file using student data.

Creating an ARFF (Attribute-Relation File Format) file for Weka is straightforward.
Note: These steps are for an XLSX file/dataset containing alphanumeric values.
1) If you have an XLSX file, first convert it into a CSV (Comma-Separated Values) file.
2) Open the CSV file with a text editor, e.g. Notepad.
3) Add the relation header at the top of the file, e.g. @relation compile-weka.filters.unsupervised.attribute
4) Add one @attribute declaration for each column (attribute) of your XLSX file, e.g. @attribute max numeric, @attribute min numeric, @attribute mean numeric, @attribute median numeric. This declares four attributes, excluding the class label.
5) Add the class label attribute, e.g. @attribute CLASS {0,1}, which declares two classes, 0 and 1. Then add the @data line above the data rows (remove the CSV's row of column names, since the names are now declared in the @attribute lines) and save the file with the .arff extension.

A complete example ARFF file is as follows.
Dataset: student.arff

@relation student
@attribute age {30, 30-40, 40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit-rating {fair, excellent}
@attribute buyspc {yes, no}
@data
30, high, no, fair, no
30, high, no, excellent, no
30-40, high, no, fair, yes
40, medium, no, fair, yes
40, low, yes, fair, yes
40, low, yes, excellent, no
30-40, low, yes, excellent, yes
30, medium, no, fair, no
30, low, yes, fair, no
40, medium, yes, fair, yes
30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
40, medium, no, excellent, no

OUTPUT:
Paste Output Screenshot here
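The conversion steps above can also be scripted. The following is a minimal Python sketch, separate from the manual Notepad workflow, that turns CSV text with a header row into ARFF text. The function names `is_number` and `csv_to_arff` are hypothetical. It assumes values contain no commas, spaces or quotes (such nominal values would need quoting in a real ARFF file), treats empty cells and '?' as missing, and declares a column numeric only when every known value parses as a number; otherwise the column becomes nominal.

```python
import csv
import io

MISSING = {"", "?"}  # assumption: empty cells and '?' both mean "missing"


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text (first row = column names) into ARFF text.

    A column is declared numeric when every non-missing value parses as a
    number; otherwise it becomes a nominal attribute whose values are
    listed in order of first appearance.
    """
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for col, name in enumerate(header):
        known = [row[col] for row in data if row[col] not in MISSING]
        if known and all(is_number(v) for v in known):
            lines.append(f"@attribute {name} numeric")
        else:
            distinct = list(dict.fromkeys(known))  # de-duplicate, keep order
            lines.append(f"@attribute {name} {{" + ",".join(distinct) + "}")
    lines += ["", "@data"]
    # The CSV column-name row is not copied: names now live in @attribute lines.
    for row in data:
        lines.append(",".join("?" if v in MISSING else v for v in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "age,income,buyspc\n30,high,no\n30-40,low,yes\n40,high,yes\n"
    print(csv_to_arff(sample, "student"))
```

In the sample, age is declared nominal as {30,30-40,40} because the value 30-40 is not a number, which matches the student.arff header.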

Viva Voce Questions
1. What is preprocessing? Why is it necessary?
2. How do you create .arff and .csv files? What are the full forms of ARFF and CSV?

Signature of Subject Teacher

EXPERIMENT NO – 2

Aim: To perform statistical analysis of data (discretization, missing values, numeric transform).

Theory: This experiment illustrates some of the basic data preprocessing operations that can be performed using the WEKA Explorer. The sample dataset used for this example is the student data, available in ARFF format.

Step 1: Loading the data. Load the dataset into Weka by clicking the Open file... button in the Preprocess tab and selecting the appropriate file.

Step 2: Once the data is loaded, Weka recognizes the attributes and, while scanning the data, computes some basic statistics on each attribute. The left panel in the above figure shows the list of recognized attributes, while the top panel shows the name of the base relation (table) and the current working relation (which are the same initially).

Step 3: Clicking an attribute in the left panel shows basic statistics for that attribute. For categorical attributes the frequency of each attribute value is shown, while for continuous attributes we obtain the minimum, maximum, mean and standard deviation.

Step 4: The bottom-right panel visualizes the selected attribute as a cross-tabulation against a second attribute (the class attribute by default).
Note: We can select another attribute using the drop-down list.

Step 5: Selecting or filtering attributes.
Removing an attribute: When we need to remove an attribute, we can do so using the attribute filters in Weka. In the Filter panel, click the Choose button; this shows a pop-up window with a list of available filters. Scroll down the list and select the "weka.filters.unsupervised.attribute.Remove" filter.

Step 6:
a) Next, click the text box immediately to the right of the Choose button. In the resulting dialog box, enter the index of the attribute to be filtered out.
b) Make sure that the invertSelection option is set to False, then click OK. The filter box now shows the filter with its options, e.g. "Remove -R 7".
c) Click the Apply button to apply the filter to this data. This removes the attribute and creates a new working relation.
d) Save the new working relation as an ARFF file by clicking the Save button on the top button panel (student.arff).
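The statistics in Step 3 and the Remove filter in Steps 5 and 6 can be imitated in plain Python, which may help in interpreting what the Explorer displays. This is a minimal sketch, not Weka's code: the function names `summarize` and `remove_attributes` are hypothetical, '?' is assumed to mark missing values as in ARFF, and the standard deviation is the sample (n-1) form, which may not match every figure a given Weka version reports.

```python
import statistics
from collections import Counter


def summarize(values):
    """Summarize one attribute column given as strings.

    Numeric columns get min, max, mean and sample standard deviation;
    anything else is treated as nominal and gets per-value counts.
    '?' entries are counted as missing and otherwise ignored.
    """
    known = [v for v in values if v != "?"]
    missing = len(values) - len(known)
    try:
        nums = [float(v) for v in known]
    except ValueError:
        return {"type": "nominal", "missing": missing,
                "counts": dict(Counter(known))}
    if not nums:
        return {"type": "empty", "missing": missing}
    return {"type": "numeric", "missing": missing,
            "min": min(nums), "max": max(nums),
            "mean": statistics.mean(nums),
            "stdev": statistics.stdev(nums) if len(nums) > 1 else 0.0}


def remove_attributes(header, rows, indices):
    """Drop columns by 1-based index, mirroring the idea of 'Remove -R'."""
    drop = {i - 1 for i in indices}
    keep = [c for c in range(len(header)) if c not in drop]
    return [header[c] for c in keep], [[row[c] for c in keep] for row in rows]


if __name__ == "__main__":
    header = ["age", "income", "student", "credit-rating", "buyspc"]
    rows = [["30", "high", "no", "fair", "no"],
            ["30-40", "high", "no", "fair", "yes"],
            ["40", "medium", "no", "fair", "yes"]]
    for i, name in enumerate(header):
        print(name, summarize([r[i] for r in rows]))
    print(remove_attributes(header, rows, [4]))  # drops credit-rating
```

Indices are 1-based here, as in the Weka dialog, so index 4 removes the fourth attribute.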

Discretization
Some techniques, such as association rule mining, can only be performed on categorical data. This requires discretizing numeric or continuous attributes. In the following example, let us discretize the age attribute by dividing its values into three bins (intervals).
First load the dataset into Weka (student.arff) and select the age attribute. Activate the filter dialog box and select weka.filters.unsupervised.attribute.Discretize from the list. To change the filter's defaults, click the box immediately to the right of the Choose button. Enter the index of the attribute to be discretized; in this case the attribute is age, so enter '1', the index of the age attribute. Enter '3' as the number of bins. Leave the remaining fields as they are and click OK.
Click Apply in the filter panel. This results in a new working relation with the selected attribute partitioned into 3 bins. Save the new working relation in a file called student-data-discretized.arff.
The following screenshot shows the effect of discretization.
Paste Screenshot of Discretization
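The binning step can be illustrated with equal-width intervals, the usual default for unsupervised discretization (Weka's Discretize filter also offers equal-frequency binning as an option). Because age in student.arff is already nominal, this sketch uses hypothetical numeric ages; the function name `equal_width_bins` and the `binN` labels are illustrative and differ from the interval labels Weka prints.

```python
def equal_width_bins(values, bins=3):
    """Split the range [min, max] into `bins` intervals of equal width.

    Returns (cut_points, labels). Intervals are closed on the right, so a
    value equal to a cut point goes into the lower bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    cuts = [lo + width * i for i in range(1, bins)]
    labels = []
    for v in values:
        index = 0
        # Advance past every cut point the value exceeds.
        while index < len(cuts) and v > cuts[index]:
            index += 1
        labels.append(f"bin{index + 1}")
    return cuts, labels


if __name__ == "__main__":
    ages = [22, 25, 31, 38, 45, 52]  # hypothetical numeric ages
    cuts, labels = equal_width_bins(ages, bins=3)
    print(cuts)    # [32.0, 42.0]
    print(labels)  # ['bin1', 'bin1', 'bin1', 'bin2', 'bin3', 'bin3']
```

With a range of 22 to 52 and three bins, each interval is 10 wide, so the cut points fall at 32 and 42.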

Paste Screenshot of Missing Values

Paste Screenshot of Numeric Transform
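The two remaining screenshots correspond to replacing missing values and transforming numeric attributes. Weka's ReplaceMissingValues filter fills gaps with the mean (numeric attributes) or the mode (nominal attributes), and its NumericTransform filter applies a chosen mathematical method to numeric attributes. The sketch below imitates both ideas in plain Python. The function names are hypothetical, '?' is assumed to mark missing values, mode ties are broken by first appearance (Weka may break ties differently), and abs as the default transform is an illustrative choice.

```python
from collections import Counter

MISSING = "?"


def replace_missing(column):
    """Fill '?' with the column mean (numeric) or mode (nominal).

    Assumes the column has at least one non-missing value.
    """
    known = [v for v in column if v != MISSING]
    try:
        nums = [float(v) for v in known]
    except ValueError:
        # Nominal column: most_common keeps first-seen order among ties.
        mode = Counter(known).most_common(1)[0][0]
        return [mode if v == MISSING else v for v in column]
    mean = sum(nums) / len(nums)
    return [mean if v == MISSING else float(v) for v in column]


def numeric_transform(column, func=abs):
    """Apply `func` to every non-missing numeric value; '?' is left as-is."""
    return [v if v == MISSING else func(float(v)) for v in column]


if __name__ == "__main__":
    print(replace_missing(["1", "?", "3"]))                     # [1.0, 2.0, 3.0]
    print(replace_missing(["fair", "?", "fair", "excellent"]))  # ['fair', 'fair', 'fair', 'excellent']
    print(numeric_transform(["-2", "?", "3.5"]))                # [2.0, '?', 3.5]
```

Any one-argument function, such as math.sqrt or math.log, can be passed as `func` in place of abs.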

Viva Voce Questions
1. List the preprocessing techniques.
2. Briefly describe the function of each preprocessing technique.

Signature of Subject Teacher

EXPERIMENT NO – 3

Aim: Demonstration of association rule mining using the Apriori algorithm on supermarket data.

NAME
weka.associations.Apriori

SYNOPSIS
Class implementing an Apriori-type algorithm. Iteratively reduces the minimum support until it finds the required number of rules with the given minimum confidence.
The algorithm has an option to mine class association rules. It is adapted as explained in the second reference.
For more information see:
R. Agrawal, R. Srikant: Fast Algorithms for Mining Association Rules in Large Da
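To see what the Apriori class does internally, here is a compact plain-Python sketch of the classic level-wise Apriori idea: frequent itemsets are grown one item at a time, and a candidate is pruned if any of its subsets is infrequent. It is not Weka's implementation: it uses a single fixed minimum support instead of iteratively lowering support until enough rules are found, it does not mine class association rules, and the function names and the small basket data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    Support is the fraction of transactions that contain the itemset.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def frequent_only(candidates):
        result = {}
        for c in candidates:
            sup = sum(1 for b in baskets if c <= b) / n
            if sup >= min_support:
                result[c] = sup
        return result

    # Level 1: single items that meet the minimum support.
    level = frequent_only({frozenset([i]) for b in baskets for i in b})
    frequent = dict(level)
    k = 1
    while level:
        k += 1
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        pruned = {c for c in joined
                  if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = frequent_only(pruned)
        frequent.update(level)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]  # subsets are always frequent
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, sup, confidence))
    return found


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter", "milk"},
               {"bread", "butter"}, {"milk", "eggs"}]
    freq = apriori(baskets, min_support=0.5)
    for lhs, rhs, sup, conf in association_rules(freq, min_confidence=0.7):
        print(sorted(lhs), "=>", sorted(rhs),
              f"support={sup:.2f} confidence={conf:.2f}")
```

With these baskets the only rule meeting 70% confidence is butter => bread (support 0.50, confidence 1.00); bread => milk reaches only about 0.67, and {bread, butter, milk} is pruned because {butter, milk} is infrequent.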
