SPSS Tutorial - Multivariate Solutions

Transcription

SPSS TutorialAEB 37 / AE 802Marketing Research MethodsWeek 7

Cluster analysisLecture / Tutorial outline Cluster analysis Example of cluster analysis Work on the assignment

Cluster Analysis It is a class of techniques used toclassify cases into groups that arerelatively homogeneous withinthemselves and heterogeneousbetween each other, on the basis ofa defined set of variables. Thesegroups are called clusters.

Cluster Analysis andmarketing research Market segmentation. E.g. clustering ofconsumers according to their attributepreferences Understanding buyers behaviours.Consumers with similarbehaviours/characteristics are clustered Identifying new product opportunities.Clusters of similar brands/products can helpidentifying competitors / market opportunities Reducing data. E.g. in preference mapping

Steps toCluster1.2.3.4.conduct aAnalysisSelect a distance measureSelect a clustering algorithmDetermine the number of clustersValidate the analysis

3REGR factor score 1 for analysis1210-1-2-3-4-3-2-10REGR factor score 2 for analysis11234

Defining distance: theEuclidean distancenDij xki xkj 2k 1Dij distance between cases i and jxki value of variable Xk for case jProblems: Different measures different weights Correlation between variables (doublecounting)Solution: Principal component analysis

Clustering procedures Hierarchical procedures– Agglomerative (start from n clusters,to get to 1 cluster)– Divisive (start from 1 cluster, to get ton cluster) Non hierarchical procedures– K-means clustering

Agglomerative clustering

AgglomerativeclusteringLinkage methods –––Single linkage (minimum distance)Complete linkage (maximum distance)Average linkageWard’s method 1.2.Compute sum of squared distances within clustersAggregate clusters with the minimum increase in theoverall sum of squaresCentroid method –The distance between two clusters is defined as thedifference between the centroids (cluster averages)

K-means clustering1.2.The number k of cluster is fixedAn initial set of k “seeds” (aggregation centres) isprovided 3.First k elementsOther seedsGiven a certain treshold, all units are assigned tothe nearest cluster seed4. New seeds are computed5. Go back to step 3 until no reclassification isnecessaryUnits can be reassigned in successive steps(optimising partioning)

Hierarchical vs Nonhierarchical methodsHierarchicalclustering No decision about thenumber of clusters Problems when datacontain a high level oferror Can be very slow Initial decision aremore influential (onestep only)Non hierarchicalclustering Faster, more reliable Need to specify thenumber of clusters(arbitrary) Need to set the initialseeds (arbitrary)

Suggested approach1. First perform a hierarchicalmethod to define the number ofclusters2. Then use the k-means procedureto actually form the clusters

Definingclusters:nStage Number of clusters01211121039485766758493102111the numberelbow ruleof(1)Agglomeration ScheduleStage Cluster FirstCluster CombinedAppearsStageCluster 1 Cluster 2CoefficientsCluster 1 Cluster 2Next 8010101128.7879011111311.4031000

Elbow rule (2): thescree diagram1210Distance86420111098765Number of clusters4321

Validating theanalysis Impact of initial seeds / order ofcases Impact of the selected method Consider the relevance of thechosen set of variables

SPSS Example

Component1-.50.0.51.01.52.0

Agglomeration ScheduleStage Cluster FirstCluster CombinedAppearsStage Cluster 1 Cluster 2 Coefficients Cluster 1 Cluster 2 Next 0Number of clusters: 10 – 6 4

nent2-.5 PAMELATHOMASCluster Number of .0.51.01.52.0

Open the datasetsupermarkets.savFrom your N: directory (if you saved itthere last timeOr download it from:http://www.rdg.ac.uk/ aes02mm/supermarket.sav Open it in SPSS

The supermarkets.savdataset

Run PrincipalComponents Analysisand save scores Select the variables to perform theanalysis Set the rule to extract principalcomponents Give instruction to save theprincipal components as newvariables

Cluster analysis:basic steps Apply Ward’s methods on theprincipal components score Check the agglomeration schedule Decide the number of clusters Apply the k-means method

Analyse / Classify

Select the componentscoresSelect from hereUntick this

Select Ward’s algorithmSelectmethod hereClick herefirst

Output: Agglomerationschedule

Number of clustersIdentify the step where the “distance coefficients” makes a biggerjump

The scree diagram(Excel 441421401381361341321301281261241221201180

Number of clustersNumber of cases150Step of ‘elbow’144Number of clusters6

Now repeat theanalysis Choose the k-means techniqueSet 6 as the number of clustersSave cluster number for each caseRun the analysis

K-means

K-means dialog boxSpecifynumber ofclusters

Save cluster membershipClick herefirstThick here

Final output

Cluster membership

Component(tutorial1. “Old Rich BigSpender”Monthly amount spentMeat expenditureFish expenditureVegetables expenditure% spent in own-brandproductOwn a car% spent in organic foodVegetarianHousehold SizeNumber of kidsWeekly TV watching(hours)Weekly Radio listening(hours)Surf the webYearly household incomeAge of respondentmeaningweek 5)4. Organic radiolistenerComponent MatrixaComponent3. VegetarianTV1234lover.810-.294 -4.26E-02.1832. Familyshopper.480-.152.347.3345.173-5.95E-02.1405. 6E-026.942E-04Extraction Method: Principal Component Analysis.a. 5 components extracted.TV and-.207web hater

Final Cluster CentersCluster1REGR factor score1 for analysis 1REGR factor score2 for analysis 1REGR factor score3 for analysis 1REGR factor score4 for analysis 1REGR factor score5 for analysis .87815

Cluster interpretationthrough mean component values Cluster 1 is very far from profile 1 (-1.34) andmore similar to profile 2 (0.38) Cluster 2 is very far from profile 5 (-0.93) andnot particularly similar to any profile Cluster 3 is extremely similar to profiles 3 and 5and very far from profile 2 Cluster 4 is similar to profiles 2 and 4 Cluster 5 is very similar to profile 3 and very farfrom profile 4 Cluster 6 is very similar to profile 5 and very farfrom profile 3

Which cluster totarget? Objective: target the organicconsumer Which is the cluster that looks more“organic”? Compute the descriptive statisticson the original variables for thatcluster

Representation of factors 1and 4(and cluster membership)3REGR factor score 4 for analysis121Cluster Number of Ca065-143-22-31-3-2-1REGR factor score011 for analysis21

SPSS Tutorial AEB 37 / AE 802 Marketing Research Methods Week 7. Cluster analysis Lecture / Tutorial outline Cluster analysis Example of cluster analysis Work on the assignment. Cluster Analysis It is a class