GIS Data Management Lesson 15 - UW Courses Web Server

Transcription

GEOG 482 / 582: GIS Data Management
Lesson 15: Web GIS Applications – Data Issues
GEOG482/582 / My Course / University of Washington

Overview
Learning Objective Questions:
1. What is spatial data mining?
2. What types of data mining techniques exist?
3. What are the steps for data mining implementation?
4. What are different types of decision problems?
5. What is a general approach to decision workflow?
6. What database systems are needed for decision support?

Lesson Preview: Learning objective questions act as the lesson outline. Questions beg answers.

Spatial Data Mining
1. What is spatial data mining?
- Early implementations of web systems were used for data management and display.
- Data management of large databases is now being used for data mining as a precursor to decision support.
Key term: spatial data mining
- Spatial data mining – the process of extracting interesting and previously unknown information from complex data stored in databases or warehouses
- Commonly called knowledge discovery in databases (KDD); now also goes by the name of machine learning within artificial intelligence (AI)
- Data mining techniques are now embedded within various commercial DBMS software

Data mining is driven by:
- Availability of database technology and software tools to search and filter through large databases to detect patterns
- Conventional processing approaches like SQL, statistical analysis, and OLAP techniques are not designed to detect and extract knowledge
- Surge in data processing power, e.g. parallel computers and cloud computing
- Advances in principles coming together from many fields: DBMS, machine learning, information theory, decision science

Data mining differs from SQL and OLAP in several ways, including:
- Designed specifically for very large databases (millions of records), i.e. a big data concern
- Designed more like analysis than simple retrieval
- Discovers patterns, relations, and trends not previously seen, e.g., high resolution land cover and stream networks within imagery
- Uses machine learning to apply patterns
- Detects characteristics of and correlations among a large number of attributes in the dataset, e.g. what feature is related to what feature within what context

2. What types of data mining techniques exist?
- Framework for data mining: see Y&H Fig 11-4, p. 419, for concepts and techniques of data mining
Two general types of data mining: supervised and unsupervised
- Supervised (predictive) data mining
  - Classification
  - Prediction
- Unsupervised (descriptive) data mining
  - Time series analysis
  - Class concept hierarchies
  - Association
  - Clustering
  - Cluster analysis

Data Mining Using Machine Learning
- A learning algorithm discovers the relationships among data in a training set, e.g., streams within unmanned aerial system (drone) imagery
- Those relationships are then used to examine the larger data set
- Outcomes: classification, patterns, predictions, and trends can be computed
- Supervised learning – requires the data analyst to identify the target field, or attributes to be mined; uses algorithms
- Unsupervised learning – finds associations, clusters, and trends in data without the aid of pre-stated hypotheses or tests
- See Y&H Fig 11-6, p. 422, for a framework of machine learning for spatial data

Classification – as supervised learning for feature category formation
- Decision tree – branching from one discovery to another: what attributes belong to what categories, as a way to form categories
- Neural networks – build a network of associations: what attributes are associated with what other attributes to form categories
- Bayesian classifiers – discovery using context based on conditional probabilities: what is the likelihood (probability) of a given attribute being associated with another attribute, given the collection of attributes
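
The conditional-probability idea behind a Bayesian classifier can be sketched in a few lines. This is a minimal naive Bayes illustration, not a method taken from the lesson or from Y&H: the parcel attributes (slope, soil), the land cover labels, and the function names are all hypothetical, and add-one smoothing is an assumption made so that unseen attribute values do not produce zero probabilities.

```python
from collections import Counter, defaultdict

def train_naive_bayes(records, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    distinct = defaultdict(set)          # attribute index -> distinct values seen
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
            distinct[i].add(value)
    return class_counts, value_counts, distinct

def classify(record, model):
    """Return the class with the highest prior times product of conditionals."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(record):
            # P(value | class) with add-one smoothing (an assumption)
            count = value_counts[(i, label)][value]
            score *= (count + 1) / (n + len(distinct[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training parcels: (slope, soil) -> land cover class
records = [("steep", "rocky"), ("steep", "loam"), ("flat", "loam"),
           ("flat", "clay"), ("flat", "loam"), ("steep", "rocky")]
labels = ["forest", "forest", "crop", "crop", "crop", "forest"]

model = train_naive_bayes(records, labels)
print(classify(("flat", "loam"), model))
print(classify(("steep", "rocky"), model))
```

The training records play the role of the training set; `classify` then applies the learned probabilities to records that were not labeled by the analyst.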

Prediction – missing data or forecasts
- Ordinary least squares simple linear regression – a dependent variable is predicted by one independent variable. The trend line is fitted by minimizing the sum of squared differences between the line and the data values.
- Ordinary least squares multiple regression – multiple independent variables are used.
- Nonlinear least squares regression – a higher power equation than the simple linear equation is used for the independent variables.
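
As a rough illustration of simple linear regression by ordinary least squares, the sketch below fits a slope and intercept using the standard closed-form estimates. The variable names and the sample values (percent impervious surface against a runoff index) are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Use the fitted line to estimate a missing or future value."""
    return intercept + slope * x

# Hypothetical observations: percent impervious surface vs. a runoff index
impervious = [1.0, 2.0, 3.0, 4.0]
runoff = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_ols(impervious, runoff)
print(slope, intercept)
print(predict(5.0, slope, intercept))
```

Multiple regression follows the same least squares principle with more than one independent variable, which is usually solved with matrix methods rather than these scalar formulas.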

Unsupervised (descriptive) data mining
- Time series analysis – relationships examined over time; sequence of observations that repeat as a pattern
- Class concept hierarchies – low-level and high-level concept formation based on detection of data values
- Association – what is associated with what in regard to attributes
- Clustering – what is in close relationship to what across space, over time, or as detected through processing of attribute data values
- Cluster analysis – use a statistical method to determine closeness, e.g., correlation analysis, analysis of variance, discriminant analysis
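
One way to see clustering "across space" in action is a small k-means sketch. K-means is a common clustering technique but is not named in the lesson, so treat it as a stand-in; the point coordinates, the choice of k, and the naive initialization (first k points) are assumptions for illustration.

```python
import math

def k_means(points, k, iterations=10):
    """Group 2D points into k clusters by repeated assign-and-average steps."""
    centroids = list(points[:k])  # naive deterministic start: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, clusters

# Hypothetical site coordinates forming two visibly separate groups
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = k_means(points, 2)
print(clusters)
```

No labels are supplied here, which is what makes the approach unsupervised: the groups emerge from distances among the data values alone.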

Visualization
- Many of the techniques use visualization to support investigation
- Seeking to interpret complex relationships requires high dimensional visual bandwidth
- Use graphic design to unpack the relationships
- All principles of graphic design apply, looking for relationships

3. What are the steps for data mining implementation?
Cross-Industry Standard Process for Data Mining (CRISP-DM)
- Business understanding – define business objectives for the mining
- Data understanding – identify data sources to be mined
- Data preparation – characterize the data to be mined
- Modeling – apply the techniques
- Evaluation – how well did the techniques work
- Deployment – provide solution to user
(See CRISP-DM flow, Y&H Figure 11-7, p. 441, for an overview.)
Case study: land cover data development in the Chesapeake Bay as an example of machine learning for land cover classification using Esri’s ArcGIS tools and Microsoft’s Cognitive r-data-project/

Spatial Decision Support
4. What are different types of decision problems?
[Table: type of decision problem (rows: Simple, Difficult, ... Complex (wicked)) against decision problem components (columns: Content, Structure, Process, Context). Content, Structure, and Process are the 3 decision problem components in a closed system; with Context they form the 4 decision problem components in an open system.]
- A closed system has a finite number of phenomena (parts) and relationships that need to be addressed.
- An open system has many as yet unknown phenomena (parts) and relationships that need to be estimated, as they are uncertain in the computation.

Complex problem situation assessment – data problem
System Content
1.1 Existence/identity as awareness of potential observables for dimensions.
1.2 Observations sampled in terms of units of measurement within dimensions.
1.3 Similarities among observations form a class of fields for observations.
1.4 Object classes specified in a database, with domain delimited for elements within reference systems.
System Structure
1.5 Composite of two or more space-time elements provided by relationships as core of sustainable systems.
System Process (dynamic)
1.6 Functional sustainability relationships within the context of a social-ecological setting.
System Context
1.7 Purpose of functional activity being performed, including expected outcome of activity.
Nyerges et al. 2014, Foundations of sustainability information representation theory: spatial–temporal dynamics of sustainable 13658816.2013.853304#.Un0G5CcVGSo

Green stormwater infrastructure as a complex systems problem
- What content, structure, process, and context to consider?
- How is i related to j?
1st law of geography: “Everything (i) is related to everything else (j), but near things are more related than distant things.” (Waldo Tobler 1970)
- Near in terms of space, time, and function
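
Tobler's first law is often put into practice with a distance-decay weight. The sketch below uses inverse distance weighting (IDW), which is a common interpolation technique and not something the lesson prescribes; the sample locations, values, and the power of 2 are assumptions. It covers nearness in space only, not in time or function.

```python
import math

def idw_estimate(target, samples, power=2):
    """Estimate a value at target, weighting each sample by 1 / distance**power."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        d = math.dist(target, (x, y))
        if d == 0:
            return value  # target coincides with a sample location
        w = 1.0 / d ** power  # nearer samples (i) count more toward target (j)
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical infiltration measurements at two sites: (x, y, value)
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]

near_first = idw_estimate((1.0, 0.0), samples)
midpoint = idw_estimate((5.0, 0.0), samples)
print(near_first, midpoint)
```

A location close to the first site receives an estimate close to that site's value, while the midpoint receives an even blend of the two.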

Representation in Geodesign Workflow
Data representation challenge in a geodesign workflow about urban watershed management
- Modeling Step 1: The representation model is fundamentally content and structure.
- Modeling Step 2: The system process model is a spatial-temporal process.
- Modeling Step 3: The evaluation model characterizes conditions of the world that motivate decision making: do nothing, or do something to change the world?

5. What is a general approach to decision workflow?
Key term: geospatial workflow
Workflow can involve a considerable number of sequenced tasks.
- Macro-micro workflow process
  - Macro stages (steps) for overall flow: Intelligence, Design, Choice, e.g. according to Simon (1977)
  - Micro activities (substeps): Gather, Organize, Select, Review
Workflow involves geospatial information technology use, possibly for all micro activity steps.

Macro-micro workflow strategy: general approach
Macro stages in a decision strategy are listed as numbered stages; micro activities in a decision strategy are listed as rows A through D under each stage.

Stage 1. Intelligence about values, objectives and criteria
  A. Gather: issues to develop & refine value trees as a basis for objectives
  B. Organize: objectives as a basis for criteria and constraints
  C. Select: criteria to be used in analysis as a basis for generating options
  D. Review: criteria, resources, constraints, and standards

Stage 2. Design of a set of feasible options
  A. Gather: primary criteria as a basis for option generation
  B. Organize: and apply approach(es) for option generation
  C. Select: the feasible option list
  D. Review: option set(s) in line with resources, constraints and standards

Stage 3. Choice about recommendations
  A. Gather: values, criteria, and option list scenarios for an evaluation
  B. Organize: approaches to priority and sensitivity analyses
  C. Select: recommendation as a prioritized list of options
  D. Review: recommendation(s) in line with original value(s), goal(s) and objectives

Start at Stage 1 and work down the rows A through D. Then, move to Stage 2 and work down the rows A through D.
Jankowski and Nyerges, 2001, Geographic Information Systems for Group Decision Making, Taylor & Francis: London. Table 2.1, p. 17.
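
The traversal order described for the macro-micro strategy (finish rows A through D within one stage before moving to the next stage) can be expressed as a nested iteration. The list contents come from the table; the function name and the tuple representation are illustrative assumptions.

```python
MACRO_STAGES = ["1. Intelligence", "2. Design", "3. Choice"]
MICRO_ACTIVITIES = ["A. Gather", "B. Organize", "C. Select", "D. Review"]

def workflow_steps(stages=MACRO_STAGES, activities=MICRO_ACTIVITIES):
    """Return (stage, activity) pairs in the order the strategy is worked."""
    steps = []
    for stage in stages:            # outer loop: macro stages
        for activity in activities: # inner loop: micro activities A through D
            steps.append((stage, activity))
    return steps

steps = workflow_steps()
for stage, activity in steps:
    print(f"{stage} / {activity}")
```

The result is twelve ordered steps, each of which may involve geospatial information technology.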

Contexts and tasks influence the nature of workflow
Three passes (scope, design constrain, implement) undergird the challenge in workflow.
- Planning tasks: commonly use longer-term and macro-scale perspectives in a community
- Improvement programming tasks: commonly use medium-term and meso-scale perspectives
- Implementation tasks: commonly use shorter-term and micro-scale perspectives
Scale is relative to conditions within organization and community.
[Figure labels: Scope, Design constrain, Implement; 1. Intelligence, 2. Design, 3. Choice; Review]

Transportation Improvement Programming workflow
Example workflow using a web-based participatory GIS tool called LIT
Macro stages (steps) with their micro activities (substeps):
1. Discuss concerns
   1a: Brainstorm concerns
   1b: Review summaries
2. Assess improvement factors
   2a: Discuss factors
   2b: Weigh factors
3. Create transportation packages
   3a: Discuss projects
   3b: Discuss funding options
   3c: Create your own package
4. Select a package for recommendation
   4a: Discuss candidate packages
   4b: Vote on package recommendation
5. Prepare group report
   5a: Discuss report
   5b: Vote on report endorsement
Let’s Improve Transportation (LIT) was a web site for a large-scale experiment about transportation improvement decision making. The site is no longer available to the public. The LIT experiment results are described in Nyerges & Aguirre 2011.

6. What database systems are needed for decision support?
Planning tasks need what kind of database system support?
- A considerable number of data elements, managed over long periods of time; granularity of descriptions is rather coarse.
Improvement programming decision tasks need what kind of database system support?
- A small number of features managed over time, but with highly interactive use and with costs and benefits enumerated.
Implementation project tasks need what kind of database system support?
- A considerable number of data elements tracked over a fine-grained timeframe.

Summary
In this lesson, you learned about:
1. Spatial data mining
2. Types of data mining techniques that exist
3. Steps for implementing data mining
4. Different types of decision problems
5. General approach to decision workflow
6. Database systems needed for decision support

Contact me at nyerges@uw.edu if you have questions or comments about this lesson.
GEOG 482/582: GIS Data Management
END Lesson 15: Web GIS Applications – Data Issues
