Weighted-Object Ensemble Clustering

Transcription

2013 IEEE 13th International Conference on Data Mining

Weighted-Object Ensemble Clustering

Yazhou Ren, School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China. Email: yazhou.ren@mail.scut.edu.cn
Carlotta Domeniconi, Department of Computer Science, George Mason University, Fairfax, 22030, USA. Email: carlotta@cs.gmu.edu
Guoji Zhang, School of Sciences, South China University of Technology, Guangzhou, 510640, China. Email: magjzh@scut.edu.cn
Guoxian Yu, College of Computer and Information Science, Southwest University, Chongqing, 400715, China. Email: gxyu@swu.edu.cn

Abstract—Ensemble clustering, also known as consensus clustering, aims to generate a stable and robust clustering through the consolidation of multiple base clusterings. In recent years many ensemble clustering methods have been proposed, most of which treat each clustering and each object as equally important. Some approaches make use of weights associated with clusters, or with clusterings, when assembling the different base clusterings. Boosting algorithms developed for classification have also led to the idea of considering weighted objects during the clustering process. However, not much effort has been put towards incorporating weighted objects into the consensus process. To fill this gap, in this paper we propose an approach called Weighted-Object Ensemble Clustering (WOEC). We first estimate how difficult it is to cluster an object by constructing the co-association matrix that summarizes the base clustering results, and we then embed the corresponding information as weights associated to objects. We propose three different consensus techniques to leverage the weighted objects. All three reduce the ensemble clustering problem to a graph partitioning one. We present extensive experimental results which demonstrate that our WOEC approach outperforms state-of-the-art consensus clustering methods and is robust to parameter settings.

Keywords—Ensemble clustering, consensus clustering, graph partition, weighted objects.

I. INTRODUCTION

Clustering is a key step for many tasks in data mining. It is well known that off-the-shelf clustering methods may discover different patterns in a given set of data. This is because each algorithm has its own bias resulting from the optimization of different criteria. Furthermore, there is no ground truth against which to validate the result. Recently, the use of clustering ensembles has emerged as a technique for overcoming problems with clustering algorithms [1], [2]. A clustering ensemble technique is characterized by two components: the mechanism to generate diverse partitions, and the consensus function to combine the input partitions into a final clustering. A clustering ensemble consists of different clusterings, obtained from multiple applications of any single algorithm with different initializations, or on various bootstrap samples of the available data, or from the application of different algorithms to the same data set. Clustering ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature: they can provide more robust and stable solutions by making use of the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned, or to the variance induced by different data samples.

Researchers have investigated how to combine clustering ensembles with subspace clusterings in an effort to address both the ill-posed nature of clustering and the curse of dimensionality in high dimensional spaces [3], [4]. A subspace clustering is a collection of weighted clusters, where each cluster has a weight vector representing the relevance of features for that cluster. In a subspace clustering ensemble, the consensus function makes use of both the clusters and the weight vectors provided by the base subspace clusterings. Work has also been done to evaluate the relevance of each base clustering (and assign weights accordingly), in an effort to improve the final consensus clustering [5]. To the best of our knowledge, no previous work has investigated how to use weights associated to objects within the clustering ensemble framework.

Several approaches have studied the benefits of weighting the objects in iterative clustering methods combined with boosting techniques [6], [7], [8]. Empirical observations suggest that large weights should be assigned to the objects that are hard to cluster. As a result, centroid-based clustering methods move the centers towards the regions containing the objects whose cluster membership is hard to determine. In boosting, the weights bias the data distribution, thus making the region around the difficult points denser. With this interpretation of the weights, shifting the centers of the clusters corresponds to moving them towards the modes of the distribution modified by the weights. Nock and Nielsen [9] formulated clustering as a constrained minimization using a Bregman divergence, and recommended that hard objects should be given large weights. They analyzed the benefits of using boosting techniques in clustering and introduced several weighted versions of classical clustering algorithms.

Inspired by this work, we propose an approach called Weighted-Object Ensemble Clustering (WOEC). We first estimate how difficult it is to cluster an object by constructing the co-association matrix that summarizes the base clustering results, and we then embed the corresponding information as weights associated to objects. Different from boosting [10], the weights are not subject to iterative changes. We propose three different consensus techniques to leverage the weighted objects. All three reduce the ensemble clustering problem to a graph partitioning one. We present extensive experimental results which demonstrate that our WOEC approach outperforms state-of-the-art consensus clustering methods and is robust to parameter settings. The main contributions of this paper are as follows:

i) To the best of our knowledge, this is the first time weights associated to objects are used within clustering ensembles. In WOEC, a one-shot method is proposed to assign weights to objects, and then three weighted-object techniques are presented to compute the consolidated clustering.
ii) A Weighted-Object k-means algorithm (WOKmeans) is proposed. Its superiority to k-means demonstrates the effectiveness of the one-shot weight assignment.
iii) Extensive experiments are conducted to analyze the performance of WOEC against existing consensus clustering algorithms, on several data sets and using three popular clustering evaluation measures.

The rest of this paper is organized as follows. In Section II, we review related work on ensemble clustering. In Section III, we introduce the WOEC methodology. Section IV gives the experimental settings and Section V analyzes the experimental results. Conclusions and future work are provided in Section VI.

II. RELATED WORK

Ensemble techniques were first developed for supervised settings. Empirical results have shown that they are capable of improving the generalization performance of the base classifiers [11]. This result has inspired researchers to further investigate the development of ensemble techniques for unsupervised problems, namely clustering.

Fred and Jain [12], [13] captured the information provided by the base clusterings in a co-association matrix, where each entry is the frequency with which two objects are clustered together. The co-association matrix was used as a similarity matrix to compute the final clustering.

Strehl and Ghosh [2] proposed three graph-based methods, namely CSPA, HGPA, and MCLA. CSPA constructs a co-association matrix, which is again viewed as a pairwise similarity matrix. A graph is then constructed, where each object corresponds to a node, and each entry of the co-association matrix gives the weight of the edge between two objects. The METIS [14] package is used to partition the objects into k clusters. HGPA combines multiple clusterings by solving a hypergraph partitioning problem. In the hypergraph, each hyperedge represents a cluster, and it connects all the objects that belong to the corresponding cluster. The package HMETIS [15] is utilized to generate the final clustering. MCLA considers each cluster as a node in a meta-graph, and sets the pairwise similarity between two clusters as the ratio between the number of shared objects and the total number of objects in the two clusters. In the meta-graph, each cluster is represented by a hyperedge. MCLA groups and collapses related hyperedges, and it assigns an object to the collapsed hyperedge in which it participates most strongly. MCLA also uses METIS [14] to partition the meta-graph.

Fern and Brodley [16] introduced HBGF to integrate base clusterings. HBGF constructs a bipartite graph that models both objects and clusters simultaneously as vertices. In the bipartite graph, an object is connected to several clusters, whereas there is no edge between pairs of objects or pairs of clusters. HBGF applies two graph partitioning algorithms, spectral graph partitioning [17], [18] and METIS, to obtain the final partitions. To handle missing values, Wang, Shan and Banerjee proposed Bayesian cluster ensembles (BCE) [19], which considers all base clustering results as feature vectors, and then learns a Bayesian mixed-membership model from this feature representation.

Domeniconi and Al-Razgan [3], [4] combined the clustering ensemble framework with subspace clustering. A subspace clustering is a collection of weighted clusters, where each cluster has a weight vector representing the relevance of features for that cluster. The input to the consensus function is a collection of subspace clusterings. The subspace clustering method used is Locally Adaptive Clustering (LAC) [20], and the resulting clustering techniques are WSPA and WBPA. Li and Ding [21] specified different weights for different clusterings and proposed an approach called Weighted Consensus Clustering (WCC). The optimal weights are sought iteratively through optimization and then used to compute the consensus result within the framework of nonnegative matrix factorization (NMF) [5].

More recently, a method called ensemble clustering by matrix completion (ECMC) was proposed [22]. ECMC uses reliable pairs of objects to construct a partially observed co-association matrix, and exploits a matrix completion algorithm to fill in the missing entries of the co-association matrix. Two objects form a reliable pair if they are often clustered together or seldom clustered together. ECMC then applies spectral clustering to the completed matrix to obtain the final clustering. However, ECMC has two disadvantages: (i) the notion of reliability between objects is hard to define, and (ii) the matrix completion process may result in information loss.

None of the aforementioned methods assigns weights to objects. In contrast, our proposed WOEC algorithms use the information provided by the base clusterings to define objects' weights, which reflect how difficult it is to cluster them. The weights are then embedded within the consensus function. Specifically, we propose three different WOEC approaches: (i) Weighted-Object Meta Clustering (WOMC), (ii) Weighted-Object Similarity Partition (WOSP) Clustering, and (iii) Weighted-Object Hybrid Bipartite (WOHB) Graph Partition Clustering.

III. WEIGHTED-OBJECT ENSEMBLE CLUSTERING

This section introduces the ensemble clustering problem and presents the details of our weighted-object ensemble clustering.

A. Ensemble Clustering Problem Formulation

An ensemble clustering approach consists of two steps. First, multiple base clusterings are generated; second, the base clusterings are consolidated into the final clustering result. We assume here that the base clusterings have already been generated, and we only discuss hard clustering; however, the methods proposed can be extended to solve soft clustering problems as well.

Let 𝒳 = {x_1, x_2, ..., x_n} denote the data set and X = [x_1^T; x_2^T; ...; x_n^T] ∈ R^{n×d} be its matrix representation, where n is the number of objects and d is the dimensionality of each object x_i = (x_{i1}, x_{i2}, ..., x_{id})^T. Let us consider a set of R clustering solutions C = {C^1, C^2, ..., C^R}, where each clustering component C^r = {C_1^r, C_2^r, ..., C_{k_r}^r}, r = 1, 2, ..., R, partitions the data set 𝒳 into k_r disjoint clusters, i.e., C_i^r ∩ C_j^r = ∅ (i ≠ j; i, j = 1, 2, ..., k_r), and ∪_{k=1}^{k_r} C_k^r = 𝒳. The ensemble clustering problem consists in defining a consensus function Γ that maps the set of base clusterings C into a single consolidated clustering C*.

Fig. 1. Quadratic function: y(x) = x(1 − x).

B. One-shot Weight Assignment

Boosting [10], [11] is a supervised technique which seeks to create a strong learner from a set of weak learners. AdaBoost (short for Adaptive Boosting) [23] is the most commonly used boosting algorithm. It iteratively generates a distribution over the data, which is then used to train the next weak classifier. In each run, objects that are hard to classify gain weight, while easy-to-classify objects lose weight. The newly trained classifier focuses on the points with larger weights. At termination, AdaBoost combines the weak classifiers into a strong one.

The effectiveness of boosting has been proven both theoretically and experimentally. Boosting algorithms iteratively assign weights to objects using label information, which is typically unavailable in clustering applications. This makes it difficult to apply boosting techniques to clustering, and explains why little research has been performed in the context of weighted-object ensemble clustering methods. This paper aims at reducing this gap. Our goal is to enable difficult-to-cluster points to play a bigger role when producing the final consolidated clustering result. Following this objective, we propose a one-shot weight assignment to objects using the results of all base clusterings, and then embed the weights into the subsequent consensus clustering process. Similar to boosting, points that are hard to cluster receive larger weights, while easy-to-cluster points are given smaller weights. The difference is that boosting is an iterative process, while our weight assignment scheme is performed in one shot. The details are given below.

Let A be the n × n co-association matrix built from the clustering solutions C:

    A_{ij} = \frac{V_{ij}}{R}    (1)

where V_{ij} is the number of times objects x_i and x_j co-occur in the same cluster, and R = |C| is the ensemble size. Obviously, A_{ij} ∈ [0, 1]. We set A_{ii} = 1, i = 1, 2, ..., n. A_{ij} = 1 means that x_i and x_j are often placed in the same cluster. A_{ij} = 0 means that x_i and x_j are often placed in different clusters. In both cases, the base clusterings show a high level of agreement. When A_{ij} = 0.5, roughly half of the clusterings group x_i and x_j together, and the other half place them in different clusters. This scenario is the most uncertain one, since the clustering components show no agreement on how to cluster points x_i and x_j. We can capture this trend by mapping the A_{ij} values through a quadratic function, y(x) = x(1 − x), x ∈ [0, 1]. This function achieves its peak at x = 0.5 and its minima at x = 0 and x = 1, as illustrated in Fig. 1 (other functions satisfying these properties can be used as well). Hence, we can measure the level of uncertainty in clustering two points x_i and x_j as follows:

    \mathrm{confusion}(x_i, x_j) = A_{ij}(1 - A_{ij})    (2)

The confusion index reaches its maximum of 0.25 when A_{ij} = 0.5, and its minimum of 0 when A_{ij} = 0 or A_{ij} = 1. We use this confusion measure to define the weight associated with each object as follows:

    w_i' = \frac{4}{n} \sum_{j=1}^{n} \mathrm{confusion}(x_i, x_j)    (3)

The normalization factor 4/n is used to guarantee that w_i' ∈ [0, 1]. To avoid a value of 0 for a weight, which can lead to instability, we add a smoothing term:

    w_i = \frac{w_i' + e}{1 + e}    (4)

where e is a small positive number (e = 0.01 in our experiments). As a result, w_i ∈ (0, 1]. A large w_i value means that confusion(x_i, x_j) is large for many x_j values; as such, it measures how hard it is to cluster x_i. The assignment of large weights to points that are hard to cluster is consistent with the way boosting techniques operate [10], [11].
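To make the one-shot assignment concrete, the following NumPy sketch computes Eqs. (1)-(4) from an array of base clustering labels. The function name, the (R × n) label-array layout, and the use of NumPy are assumptions of this sketch rather than details from the paper.

```python
import numpy as np

def one_shot_weights(base_labels, e=0.01):
    """One-shot object weights from base clusterings (Eqs. (1)-(4)).

    base_labels: (R, n) array; base_labels[r, i] is the cluster of object i in clustering r.
    Returns w with w_i in (0, 1]; a large w_i marks a hard-to-cluster object.
    """
    R, n = base_labels.shape
    # Eq. (1): co-association matrix, A_ij = (# clusterings grouping i and j) / R
    A = np.zeros((n, n))
    for r in range(R):
        A += (base_labels[r][:, None] == base_labels[r][None, :])
    A /= R
    np.fill_diagonal(A, 1.0)
    # Eq. (2): confusion(x_i, x_j) = A_ij (1 - A_ij), maximal (0.25) when A_ij = 0.5
    confusion = A * (1.0 - A)
    # Eq. (3): w'_i = (4/n) * sum_j confusion(x_i, x_j), so w'_i lies in [0, 1]
    w_prime = (4.0 / n) * confusion.sum(axis=1)
    # Eq. (4): smoothing to avoid zero weights
    return (w_prime + e) / (1.0 + e)
```

With this sketch, an object on which all base clusterings agree gets a weight close to the smoothing floor e/(1 + e), while an object whose co-association values hover around 0.5 gets a weight close to 1.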
C. Algorithms

In this section we introduce three different Weighted-Object Ensemble Clustering (WOEC) algorithms. They all make use of the weights associated with objects (computed as explained above) to determine the consensus clustering. The overall process is shown in Fig. 2. As illustrated in the figure, the consensus functions of WOSP and WOHB make use of the feature vectors, while WOMC does not.

1) Weighted-Object Meta Clustering Algorithm (WOMC): The first approach we introduce is a weighted version of the Meta Clustering Algorithm (MCLA) introduced in [2]. We call the resulting algorithm Weighted-Object Meta Clustering (WOMC). The key step of meta clustering algorithms is the clustering of clusters. When measuring the similarity between two clusters, MCLA treats all the points present in both clusters equally. In contrast, the proposed WOMC technique distinguishes the contributions of hard-to-cluster and easy-to-cluster points. Specifically, a point in the intersection of two clusters contributes to the clusters' similarity in a way that is proportional to its weight. The details of the approach follow.

Fig. 2. The process map of Weighted-Object Ensemble Clustering (WOEC).

We make explicit the components of each C^r in C. This gives C = {C_1^1, ..., C_{k_1}^1, C_1^2, ..., C_{k_2}^2, ..., C_1^R, ..., C_{k_R}^R}, where each component now corresponds to a cluster. The WOMC algorithm groups the clusters in C. To this end, it proceeds as follows. It constructs an undirected meta-graph G = (V, E) with |V| = n_c = \sum_{r=1}^{R} k_r vertices, each representing a cluster in C. The edge E_{ij} connecting vertices V_i and V_j is assigned the weight value S_{ij} defined as follows:

    S_{ij} = \frac{\sum_{x_k \in C_i \cap C_j} w_k}{\sum_{x_k \in C_i \cup C_j} w_k}    (5)

Clearly, S_{ij} ∈ [0, 1]. If S_{ij} = 0, there is no edge between C_i and C_j in G. S_{ij} = 1 indicates that C_i and C_j contain the same points. WOMC partitions the meta-graph G into k* (the number of clusters in the final clustering result) meta-clusters, each representing a group of clusters, by using the similarity measure S_{ij} and by applying the well-known graph partitioning algorithm METIS [14]. Similarly to MCLA, WOMC then assigns each point to the meta-cluster in which it participates most strongly. Specifically, a data point may occur in different meta-clusters, and it is associated to a given meta-cluster according to the ratio of that meta-cluster's clusters it belongs to. A point is eventually assigned to the meta-cluster with the highest ratio for that point. Ties are broken randomly. There might be a situation in which no object is assigned to a meta-cluster; thus, the final combined clustering C* may contain fewer than k* clusters.

The similarity measure defined in Eq. (5) computes the similarity of two clusters based not only on the fraction of shared points, but also on the values of the weights associated to the points. In contrast, the binary Jaccard measure |C_i ∩ C_j| / |C_i ∪ C_j| computes the similarity as the ratio between the size of the intersection and the size of the union of C_i and C_j. Suppose C_i and C_j have fixed points assigned to them; their Jaccard similarity is then also fixed. Let us see how changing the weights assigned to the points affects the value of S_{ij}. Consider D = (C_i ∪ C_j) \ (C_i ∩ C_j), i.e., the set of points not in the intersection. Let the weights assigned to the points in D be fixed. Then, increasing (decreasing) the weights assigned to the points in (C_i ∩ C_j) will cause an increase (decrease) of S_{ij}. As such, the more hard-to-cluster points C_i and C_j share, the more similar they are. Let us now fix the weights assigned to the points in (C_i ∩ C_j). Increasing (decreasing) the weights assigned to the points in D will cause a decrease (increase) of S_{ij}. As a consequence, the more hard-to-cluster points C_i and C_j do not share, the smaller their similarity is. Algorithm 1 describes the WOMC algorithm.

Algorithm 1 Weighted-Object Meta Clustering Algorithm (WOMC)
Input:
  Set of base clusterings C;
  Graph partitioning algorithm METIS;
  Number of clusters k*.
Output:
  The final consensus clustering C*.
1: Assign weights to objects using the one-shot weight assignment.
2: Construct the meta-graph G = (V, E), where the weights associated to the edges are computed using Eq. (5).
3: MetaClusters = METIS(G, k*). // Partition the meta-graph into k* meta-clusters.
4: C* = Assign(MetaClusters). // Assign each object to the meta-cluster in which it participates most strongly.
5: return C*.
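As a complement to Algorithm 1, the sketch below builds the meta-graph's weighted adjacency matrix of Eq. (5) from clusters given as arrays of object indices. The helper name and data layout are illustrative; the METIS partitioning and the final object-to-meta-cluster assignment are not shown here.

```python
import numpy as np

def meta_graph_similarity(clusters, w):
    """Weighted cluster similarity of Eq. (5): shared weight mass over union weight mass.

    clusters: list of 1-D integer arrays of object indices, one per base cluster
              (all base clusterings pooled, n_c clusters in total).
    w: (n,) array of one-shot object weights.
    """
    nc = len(clusters)
    S = np.zeros((nc, nc))
    for i in range(nc):
        for j in range(i + 1, nc):
            inter = np.intersect1d(clusters[i], clusters[j])
            union = np.union1d(clusters[i], clusters[j])
            if union.size > 0:
                S[i, j] = S[j, i] = w[inter].sum() / w[union].sum()
    # S is the weighted adjacency of the meta-graph, to be partitioned into k* meta-clusters
    return S
```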
2) Weighted-Object Similarity Partitioning Algorithm (WOSP): Previous work on clustering [6], [7], [8] suggested assigning large weights to objects that are hard to cluster, and iteratively moving cluster centers towards areas that contain objects with large weights. Inspired by this work, we proceed similarly while leveraging the information provided by the ensemble to estimate the weights. The motivation stems from boosting. The difference is that clustering methods combined with a boosting technique assign weights to objects and move cluster centers iteratively, while we move the centers only once for each base clustering, by making use of the original data and the objects' weight information. Clustering with a boosting technique aims to generate a better single clustering result, while we seek a better consolidated consensus solution. To this end, for each cluster in the base clusterings C^r (r = 1, 2, ..., R), we compute the corresponding weighted center as follows:

    m_l^r = \frac{\sum_{x_i \in C_l^r} w_i x_i}{\sum_{x_i \in C_l^r} w_i}    (6)

where l = 1, ..., k_r. The basic idea behind this step is to move the center of each cluster towards the region consisting of the objects that are hard to cluster. We call this the shifting-center technique. The experiments presented in Section V demonstrate the rationale and the improvement achieved by this transformation.

The similarity between a point x_i and a weighted center m_l^r is calculated using the following exponential function:

    d(x_i, m_l^r) = \exp\{-\|x_i - m_l^r\|^2 / t\}    (7)

The probability of cluster C_l^r, given x_i, can then be defined as:

    P(C_l^r \mid x_i) = \frac{d(x_i, m_l^r)}{\sum_{l'=1}^{k_r} d(x_i, m_{l'}^r)}    (8)

The smaller \|x_i - m_l^r\|^2 is, the larger P(C_l^r | x_i) will be. We can now define the vector P_i^r of posterior probabilities associated with x_i:

    P_i^r = (P(C_1^r \mid x_i), P(C_2^r \mid x_i), ..., P(C_{k_r}^r \mid x_i))    (9)

where \sum_{l=1}^{k_r} P(C_l^r \mid x_i) = 1. P_i^r provides a new representation of x_i in a space of relative coordinates with respect to the cluster centroids, where each dimension corresponds to one cluster. This new representation embeds information from both the original input data and the clustering ensemble. We use the cosine similarity to measure the similarity S_{ij}^r between x_i and x_j with respect to the base clustering C^r:

    S_{ij}^r = \frac{P_i^r (P_j^r)^T}{\|P_i^r\| \, \|P_j^r\|}    (10)

where T denotes the transpose of a vector. There are R similarity matrices S^r (r = 1, ..., R), each one corresponding to a base clustering. We combine these matrices into one final similarity matrix S:

    S = \frac{1}{R} \sum_{r=1}^{R} S^r    (11)

S represents the average similarity between x_i and x_j (through the vectors P_i and P_j), across the R contributing clusterings. Hence, we can construct an undirected graph G = (V, E), where |V| = n and each vertex represents an object in 𝒳. The edge E_{ij} connecting vertices V_i and V_j is assigned the weight value S_{ij}. A graph partitioning algorithm (METIS in our experiments) can be applied to the graph G to compute the partitioning of the n vertices that minimizes the edge weight-cut. This gives the consensus clustering we seek.
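A compact sketch of the WOSP similarity construction (Eqs. (6)-(11)) follows. It assumes the (R × n) label layout and the weight vector from the earlier sketch, uses the kernel parameter t of Eq. (7), and leaves the final graph partitioning (METIS in the paper) to any partitioner that accepts the returned affinity matrix.

```python
import numpy as np

def wosp_similarity(X, base_labels, w, t=1.0):
    """Average cosine similarity of cluster-posterior vectors across base clusterings.

    X: (n, d) data matrix; base_labels: (R, n) base clustering labels; w: (n,) object weights.
    """
    R, n = base_labels.shape
    S = np.zeros((n, n))
    for r in range(R):
        labels = base_labels[r]
        centers = []
        for c in np.unique(labels):
            mask = labels == c
            # Eq. (6): weighted (shifted) center of each cluster
            centers.append((w[mask][:, None] * X[mask]).sum(axis=0) / w[mask].sum())
        M = np.array(centers)
        # Eqs. (7)-(8): exponential similarity to each center, normalized to probabilities
        d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=-1)
        P = np.exp(-d2 / t)
        P /= P.sum(axis=1, keepdims=True)          # rows of P are the vectors P_i^r of Eq. (9)
        # Eq. (10): pairwise cosine similarity of the rows of P
        Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
        S += Pn @ Pn.T
    return S / R                                   # Eq. (11): average over the R clusterings
```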
The WOSP algorithm is illustrated in Algorithm 2.

Algorithm 2 Weighted-Object Similarity Partitioning Algorithm (WOSP)
Input:
  Data set 𝒳;
  Set of base clusterings C;
  Graph partitioning algorithm METIS;
  Number of clusters k*.
Output:
  The final consensus clustering C*.
1: Assign weights to objects using the one-shot weight assignment.
2: Construct the corresponding graph G = (V, E), with weights computed through Eqs. (6)-(11).
3: C* = METIS(G, k*). // Partition the graph into k* parts, each representing a final cluster.
4: return C*.

3) Weighted-Object Hybrid Bipartite Graph Partitioning Algorithm (WOHB): WOSP measures pairwise similarities which are solely instance-based, thus ignoring cluster similarities. We now introduce our third technique, WOHB, which attempts to capture both kinds of similarities. This approach reduces the clustering consensus problem to a bipartite graph partitioning problem, which partitions both cluster vertices and instance vertices simultaneously. Thus, it also accounts for similarities between clusters. Like WOSP, WOHB makes use of the shifting-center technique and computes probabilities as in Eq. (8).

WOHB constructs an undirected hybrid bipartite graph G = (V, E), where V contains n_c + n vertices (n_c = \sum_{r=1}^{R} k_r). The first n_c vertices represent all the clusters in C and the last n vertices represent the objects. The edge E_{ij} connecting the vertices V_i and V_j is assigned the weight value S_{ij} defined as follows:

    S_{ij} = S_{ji} = \begin{cases} 0, & \text{if vertices } i \text{ and } j \text{ are both clusters or both objects} \\ P(C_j \mid x_i), & \text{otherwise} \end{cases}    (12)

where P(C_j | x_i) is the probability of cluster C_j given x_i, and can be computed from Eq. (8). The matrix S of elements S_{ij} can be written as

    S = \begin{bmatrix} 0 & B^T \\ B & 0 \end{bmatrix}, \qquad
    B = \begin{bmatrix} P_1^1 & P_1^2 & \cdots & P_1^R \\ P_2^1 & P_2^2 & \cdots & P_2^R \\ \vdots & \vdots & & \vdots \\ P_n^1 & P_n^2 & \cdots & P_n^R \end{bmatrix}    (13)

where B is an n × n_c matrix and P_i^r is the row vector defined in Eq. (9).

A graph partitioning algorithm (METIS in our experiments) is then used to partition this graph into k* hybrid parts (each containing objects and clusters), so that the edge weight-cut is minimized. The partition of the objects provides the final clustering result. The WOHB algorithm is given in Algorithm 3.

Algorithm 3 Weighted-Object Hybrid Bipartite Graph Partitioning Algorithm (WOHB)
Input:
  Data set 𝒳;
  Set of base clusterings C;
  Graph partitioning algorithm METIS;
  Number of clusters k*.
Output:
  The final consensus clustering C*.
1: Assign weights to objects using the one-shot weight assignment.
2: Construct the hybrid bipartite graph G = (V, E) as described in this subsection.
3: C_hybrid = METIS(G, k*). // Partition the graph into k* parts.
4: return the partition of the objects in C_hybrid as C*.
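The following sketch assembles the hybrid bipartite adjacency matrix of Eqs. (12)-(13) from per-clustering posterior matrices. The names and input layout are illustrative, and the posteriors are assumed to have been computed with the shifting-center probabilities of Eqs. (6)-(8).

```python
import numpy as np

def wohb_bipartite_matrix(posteriors):
    """Build S = [[0, B^T], [B, 0]] of Eqs. (12)-(13).

    posteriors: list of R arrays, the r-th of shape (n, k_r), holding P(C_l^r | x_i).
    """
    B = np.hstack(posteriors)        # (n, n_c) with n_c = sum_r k_r; row i is (P_i^1, ..., P_i^R)
    n, nc = B.shape
    S = np.zeros((nc + n, nc + n))
    S[nc:, :nc] = B                  # object-cluster edges weighted by P(C_j | x_i)
    S[:nc, nc:] = B.T                # symmetric counterpart; cluster-cluster and object-object blocks stay zero
    # first n_c vertices are clusters, last n are objects; partition S into k* hybrid parts
    return S
```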

D. Weighted-Object k-means

To verify the effect of moving the center of each cluster according to Eq. (6), we introduce here a modified version of k-means that makes use of such weighted means. The resulting technique is called Weighted-Object k-means (WOKmeans), and it is illustrated in Algorithm 4. WOKmeans computes the weighted centers after assigning objects to clusters in each iteration, as shown in Eq. (14). The main idea is to iteratively move cluster centers towards regions with larger weights. Compared to the weighted version of k-means in [9], WOKmeans computes all the objects' weights in advance using the co-association matrix provided by the base clusterings, while [9] assigns weights to objects iteratively.

Algorithm 4 Weighted-Object k-means (WOKmeans)
Input:
  Data set 𝒳;
  Set of base clusterings C;
  Number of clusters k*.
Output:
  The final clustering C*.
1: Assign weights to objects using the one-shot weight assignment.
2: Randomly generate k* weighted cluster centers c_1, ..., c_{k*}, each corresponding to a cluster C_k (k = 1, ..., k*).
3: repeat
4:   // Assign each object to its closest weighted center
     ∀i, C_k = C_k ∪ {x_i}, where k = arg min_k \|x_i - c_k\|.
5:   // Generate new weighted centers
     c_k = \frac{\sum_{x_i \in C_k} w_i x_i}{\sum_{x_i \in C_k} w_i}, k = 1, ..., k*    (14)
6: until all weighted centers do not change, or the maximum number of iterations is reached
7: return C* = {C_1, C_2, ..., C_{k*}}.

IV. EMPIRICAL EVALUATION

A. Data Sets

We conducted experiments on one simulated

TABLE I. Data sets used in the experiments.

B. Evaluation Measures

There exist various metrics to evaluate clustering results [24], [25]. Since the labels of the data are known, we use the Rand Index (RI) [26], the Adjusted Rand Index (ARI) [26], and the Normalized Mutual Information (NMI) [2] as the validity indices. Let C* = {C*_1, C*_2, ..., C*_{k*}} be the consensus clustering result (k* is the number of clusters in the final consensus clustering), and let C = {C^1, C^2, ..., C^k} be the ground-truth clustering (k is the number of classes).

1) RI measures the similarity between C* and C as follows:

    RI(C^*, C) = \frac{a + b}{a + b + c + d}    (15)

where
  a is the number of pairs of points that are in the same cluster in both C* and C;
  b is the number of pairs of points that are in different clusters in both C* and C;
  c is the number of pairs of points that are in the same cluster in C* and in different clusters in C;
  d is the number of pairs of points that are in different clusters in C* and in the same cluster in C.
RI ranges from 0 to 1.

2) ARI defines the similarity between C* and C as follows:

    ARI(C^*, C) = \frac{\sum_{i,j} \binom{n_{ij}}{2} - (S^* \cdot S) / \binom{n}{2}}{\frac{1}{2}(S^* + S) - (S^* \cdot S) / \binom{n}{2}}    (16)

where S^* = \sum_i \binom{n^*_i}{2}, S = \sum_j \binom{n_j}{2}, n^*_i and n_j denote the number of objects in clusters C*_i and C^j (i = 1, ..., k*; j = 1, ..., k), respectively, and n_{ij} = |C*_i ∩ C^j|. The ARI values range from -1 to 1.

3) NMI [2] treats the clustering labels of C* and C as two random variables. The mutual information of these two random variables is evaluated and then normalized within the [0, 1] range.

A larger value of RI (ARI or NMI) means that C* and C are more similar.

C. Base Clusterings

To generate diverse clusterings we used k-means. Random sampling of the data and feature subset selection techniques were applied. Specifically, for the first half of the base clusterings, in each run we randomly selected 70% of the objects and performed k-means on this subset of the data. The remaining objects were assigned to the closest cluster based on the Euclidean distances between the object and the cluster centers. For the second half of the base clusterings, in each run we selected 70% of the features and conducted k-means in the corresponding subspace. When using k-means, we randomly generated initial centers in each independent run.
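As a rough illustration of this base-clustering protocol, the sketch below uses scikit-learn's KMeans as a stand-in for the authors' k-means. The ensemble size R, the random seeding, and the function name are illustrative, while the 70% object sampling with nearest-center assignment of the remaining objects and the 70% feature sub-sampling follow the description above.

```python
import numpy as np
from sklearn.cluster import KMeans

def generate_base_clusterings(X, k, R=20, seed=0):
    """First half of the runs: k-means on a 70% object sample, remaining objects assigned
    to the nearest center; second half: k-means in a random 70% feature subspace."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    base_labels = np.empty((R, n), dtype=int)
    for r in range(R):
        km = KMeans(n_clusters=k, n_init=1, random_state=int(rng.integers(1 << 31)))
        if r < R // 2:
            idx = rng.choice(n, size=int(0.7 * n), replace=False)
            km.fit(X[idx])
            base_labels[r] = km.predict(X)          # nearest-center assignment for all objects
        else:
            feats = rng.choice(d, size=int(0.7 * d), replace=False)
            base_labels[r] = km.fit_predict(X[:, feats])
    return base_labels
```

The returned (R × n) label array is the input assumed by the earlier weight-assignment and similarity sketches.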
