
Jiawei Han [DATA MINING: CONCEPTS AND TECHNIQUES 3RD EDITION]

Data Mining: Concepts and Techniques
Third Edition

Jiawei Han
University of Illinois at Urbana–Champaign

Micheline Kamber

Jian Pei
Simon Fraser University

Morgan Kaufmann is an imprint of Elsevier

Table of Contents

1. Introduction
  1.1. Why Data Mining?
    1.1.1. Moving toward the Information Age
    1.1.2. Data Mining as the Evolution of Information Technology
  1.2. What Is Data Mining?
  1.3. What Kinds of Data Can Be Mined?
    1.3.1. Database Data
    1.3.2. Data Warehouses
    1.3.3. Transactional Data
    1.3.4. Other Kinds of Data
  1.4. What Kinds of Patterns Can Be Mined?
    1.4.1. Class/Concept Description: Characterization and Discrimination
    1.4.2. Mining Frequent Patterns, Associations, and Correlations
    1.4.3. Classification and Regression for Predictive Analysis
    1.4.4. Cluster Analysis
    1.4.5. Outlier Analysis
    1.4.6. Are All Patterns Interesting?
  1.5. Which Technologies Are Used?
    1.5.1. Statistics
    1.5.2. Machine Learning
    1.5.3. Database Systems and Data Warehouses
    1.5.4. Information Retrieval
  1.6. Which Kinds of Applications Are Targeted?
    1.6.1. Business Intelligence
    1.6.2. Web Search Engines
  1.7. Major Issues in Data Mining
    1.7.1. Mining Methodology
    1.7.2. User Interaction
    1.7.3. Efficiency and Scalability
    1.7.4. Diversity of Database Types
    1.7.5. Data Mining and Society
  1.8. Summary
2. Getting to Know Your Data
  2.1. Data Objects and Attribute Types
    2.1.1. What Is an Attribute?
    2.1.2. Nominal Attributes
    2.1.3. Binary Attributes
    2.1.4. Ordinal Attributes
    2.1.5. Numeric Attributes
    2.1.6. Discrete versus Continuous Attributes
  2.2. Basic Statistical Descriptions of Data
    2.2.1. Measuring the Central Tendency: Mean, Median, and Mode
    2.2.2. Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range
    2.2.3. Graphic Displays of Basic Statistical Descriptions of Data
  2.3. Data Visualization
    2.3.1. Pixel-Oriented Visualization Techniques
    2.3.2. Geometric Projection Visualization Techniques
    2.3.3. Icon-Based Visualization Techniques
    2.3.4. Hierarchical Visualization Techniques
    2.3.5. Visualizing Complex Data and Relations
  2.4. Measuring Data Similarity and Dissimilarity
    2.4.1. Data Matrix versus Dissimilarity Matrix
    2.4.2. Proximity Measures for Nominal Attributes
    2.4.3. Proximity Measures for Binary Attributes
      Table 2.4 Relational Table Where Patients Are Described by Binary Attributes
    2.4.4. Dissimilarity of Numeric Data: Minkowski Distance
    2.4.5. Proximity Measures for Ordinal Attributes
    2.4.6. Dissimilarity for Attributes of Mixed Types
    2.4.7. Cosine Similarity
  2.5. Summary
3. Data Preprocessing
  3.1. Data Preprocessing: An Overview
    3.1.1. Data Quality: Why Preprocess the Data?
    3.1.2. Major Tasks in Data Preprocessing
  3.2. Data Cleaning
    3.2.1. Missing Values
    3.2.2. Noisy Data
    3.2.3. Data Cleaning as a Process
  3.3. Data Integration
    3.3.1. Entity Identification Problem
    3.3.2. Redundancy and Correlation Analysis
    3.3.3. Tuple Duplication
    3.3.4. Data Value Conflict Detection and Resolution
  3.4. Data Reduction
    3.4.1. Overview of Data Reduction Strategies
    3.4.2. Wavelet Transforms
    3.4.3. Principal Components Analysis
    3.4.4. Attribute Subset Selection
    3.4.5. Regression and Log-Linear Models: Parametric Data Reduction
    3.4.6. Histograms
    3.4.7. Clustering
  3.5. Data Transformation and Data Discretization
    3.5.1. Data Transformation Strategies Overview
    3.5.2. Data Transformation by Normalization
    3.5.3. Discretization by Binning
    3.5.4. Discretization by Histogram Analysis
    3.5.5. Discretization by Cluster, Decision Tree, and Correlation Analyses
    3.5.6. Concept Hierarchy Generation for Nominal Data
  3.6. Summary
4. Data Warehousing and Online Analytical Processing
  4.1. Data Warehouse: Basic Concepts
    4.1.1. What Is a Data Warehouse?
    4.1.2. Differences between Operational Database Systems and Data Warehouses
    4.1.3. But, Why Have a Separate Data Warehouse?
    4.1.4. Data Warehousing: A Multitiered Architecture
    4.1.5. Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse
    4.1.7. Metadata Repository
  4.2. Data Warehouse Modeling: Data Cube and OL