Collaborative Filtering - University Of Pittsburgh

Transcription

Collaborative FilteringPeter Brusilovskywith slides by Danielle Lee and Sue Yeon Syn and

Agendan n n n n n n n ContextConceptsUsesCF vs. CBAlgorithmsPractical IssuesEvaluation MetricsFuture Issues

Types of Recommender Systemsn Collaborative Filtering Recommender Systemq n Content-based Recommender Systemq n Recommendation generated from the content features associated with products and the ratings from a user.Case-based Recommender Systemq n “Word-of-Mouth” phenomenon.A kind of content-based recommendation. Information are represented as case and the system recommends the casesthat are most similar to a user’s preference.Hybrid Recommender Systemq Combination of two or more recommendation techniques togain better performance with fewer of the drawbacks of any individual one (Burke, 2002).Danielle Lee3

Recommendation Procedure1.2.3.4.Understand and model usersCollect candidate items to recommend.Based on your recommendation method,predict target users’ preferences for each candidate item.Sort the candidate items according to theprediction probability and recommend them.Danielle Lee4

Example: Amazon.com

Amazon’s Source of Wisdom

What is Collaborative Filtering?} Traced back to the Information Tapestry project at Xerox PARCIt allowed its users to annotate the documents that they read andsystem recommends} } } } } Expanded to “automatic” CF in the works of Resnick, Riedl, MaesMore general definition as ‘the process of filtering or evaluatingitems using the opinions of other people.’CF recommends items which are likely interesting to a target userbased on the evaluation averaging the opinions of people withsimilar tastesKey idea: people who agreed with me in the past, will also agreein the future.} On the other hand, the assumption of Content-based recommendation isthat Items with similar objective features will be rated similarly.Danielle Lee7

AlgorithmsUser-based ed FilteringBayesian-network modelsProbabilisticAlgorithmsEM algorithm

Conceptsn Collaborative Filteringq n Userq n The goal of collaborative filtering is to predict how wella user will like an item that he has not rated given a set of historical preference judgments for acommunity of users.Any individual who provides ratings to a systemItemsq Anything for which a human can provide a rating

User-based CFThe input for the CF prediction algorithms is a matrix of users’ ratings on items,referred as the ratings matrix.Target UserItem 1Item 2Item 3Item 5412/4User41552113/4Danielle LeeItem 5 Average10

User-based CF borative Filtering Recommender System, Danielle LeeItem411

User-Based NN Recommendation1.Selectlike- oreandpredictthera hrough1 4ona melybasis.12

User-based NN: User Similarityn Pearson’s Correlation Coefficient for User aand User b for all rated Products, P.sim(a, b) p product ( P )p product ( P )(ra , p ra )(rb , p rb )(ra , p ra )2 p product ( P )(rb, p rb )2Average rating of user bn Pearson correlation takes values from 1 (Perfectly positive correlation) to -1(Perfectly negative correlation) .Danielle Lee13

User-based NN: Rating Predictionpred (a, p ) rasim(a,b) (rb , p rb )b neighbors ( n ) sim(a,b)b neighbors ( n ) Danielle Lee14

One Typical CF recommendation15

One Typical CF recommendation16

Benefits of Collaborative Filteringn n n Collaborative filtering systems work by people insystem, and it is expected that people to bebetter at evaluating information than a computedfunctionCF doesn’t require content analysis & extractionIndependent of any machine-readable representation of the objects being recommended.q n Works well for complex objects (or multimedia) suchas music, pictures and moviesMore diverse and serendipitous recommendationDanielle Lee17

CF vs. CBCFCBCompareUsers interestItem infoSimilaritySet of usersUser profileItem infoText documentNeeds other users’feedback - cold startShortcoming CoverageUnusual interestFeature mattersOver-specializeEliciting userfeedback

Uses for CF : Domainsn n n n n n n n n Many itemsMany ratingsMany more users than items recommendedUsers rate multiple itemsFor each user of the community, there are otherusers with common needs or tastesItem evaluation requires personal tasteItems persistsTaste persistsItems are homogenous

More on User-Based NNn n n n AdjustedCosinesimilarity,Spearman’srankcorrela- Necessitytoreducetherela l.,1999).Calcula ntensivecalcula onsCollaborative Filtering Recommender System, Danielle Lee20

Item-based NN 331543.2User4155212.8Danielle Lee21

Item-based Nearest NeighborItem 1Item 2Item 3Item 4Item 8Danielle Lee22

Item-Based NN Recommendationn Generate predictions based on similaritiesbetween itemsq n Usually a cosine similarity usedPrediction for a user u and item i is composedof a weighted sum of the user u’s ratings foritems most similar to i. pred (u , i ) j ratedItems ( u )sim(i, j ) ruij ratedItems ( u )sim(i, j )

Item-based Nearest Neighbor} } } More computationally efficient than user-based nearest neighbors.Compared with user-based approach that is affectedby the small change of users’ ratings, item-basedapproach is more stable.Recommendation algorithm used by Amazon.com(Linden et al., 2003).Danielle Lee24

Uses for CF : User Tasksn What tasks users may wish to accomplishq q q q q q Help me find new items I might likeAdvise me on a particular itemHelp me find a user (or some users) I might likeHelp our group find something new that we mightlikeDomain-specific tasksHelp me find an item, new or not

Uses for CF : System Tasksn What CF systems supportq Recommend itemsn q q Eg. Amazon.comPredict rating for a given itemConstrained recommendationsn Recommend best items from a set of items

Other Non-Probabilistic CF Algorithmsn Associa onRuleMiningq q q doncommonlyoccurringpaWernsinthera llmostprobablyalsolikeitem5.”Support(X Y) NumberofTransac onscontainingXUYNumberofTransac onsConfident(X Y) NumberofTransac onscontainingXUYNumberofTransac onscontainingXDanielle Lee27

Simple Probabilistic Algorithmsn n Represent probability distributionsGiven a user u and a rated item i, the user assigned the item a rating of r : p(r u, i).E (r u, i ) r p(r u, i )rn Bayesian-network models, Expectationmaximization (EM) algorithm

Dimensionality Reduction Algorithmsn n Map item space to a smaller number ofunderlying “dimensions”Matrix Factorization/Latent Factor models:q q q n n Singular Value Decomposition,Principal Component Analysis,Latent Semantic Analysis, etc.Expensive offline computation and mathematical complexityWill be presented in a separate lectureDanielle Lee29

Dimensionality Reduction Algorithmsn Matrix Factorization got an attention sinceNetflix Prize competition.Danielle Lee30

Practical Issues : Ratingsn Explicit vs. Implicit ratingsq Explicit ratingsn n n q Users rate themselves for an itemMost accurate descriptions of a user’s preferenceChallenging in collecting dataImplicit ratingsn n n Observations of user behaviorCan be collected with little or no cost to userRatings inference may be imprecise.

Practical Issues : Ratingsn Rating Scalesq Scalar ratingsn n q Binary ratingsn q Numerical scales1-5, 1-7, etc.Agree/Disagree, Good/Bad, etc.Unary ratingsn n Good, Purchase, etc.Absence of rating indicates no information

Practical Issues : Cold Startn New userq q q q n New Itemq q n Rate some initial itemsNon-personalized recommendationsDescribe tastesDemographic info.Non-CF : content analysis, metadataRandomly selecting itemsNew Communityq q q Provide rating incentives to subset of communityInitially generate non-CF recommendationStart with other set of ratings from another source outsidecommunity

Evaluation Metricsn Accuracyq Predict accuracyn n n q The ability of a CF system to predict a user’s rating for anitemMean absolute error (MAE)Classic, but now often criticisedRank accuracyn n Precision – percentage of items in a recommendation listthat the user would rate as usefulHalf-life utility – percentage of the maximum utility achieved by the ranked list in question

Evaluation Metricsn Noveltyq n Serendipityq n The ability of a CF system to recommend items that the userwas not already aware of.Users are given recommendations for items that they wouldnot have seen given their existing channels of discovery.Coverageq The percentage of the items known to the CF system forwhich the CF system can generate predictions.

Evaluation Metricsn Learning Rateq n Confidenceq n How quickly the CF system becomes an effectivepredictor of taste as data begins to arrive.Ability to evaluate the likely quality of its predictions.User Satisfactionq By surveying the users or measuring retention anduse statistics

Additional Issues : Interfacesn Social Navigationq q q q Make the behavior of community visibleLeaving “footprints” : read-wear / edit-wearAttempt to mimic more accurately the social process of word-of-mouth recommendationsEpinions.com

Additional Issues : InterfacesEpinions.com (http://www.epinions.com)

Additional Issues : Interfacesn Explanationq q Where, how, from whom the recommendationsare generated.Do not make it too much!n n n Not showing reasoning processGraphs, key itemsReviews

Additional Issues : Privacy & Trustn User profilesq Personalized informationn Distributed architecturen Recommender system may break trust whenmalicious users give ratings that are not representative of their true preferences.

Choose your Peersn PeerChooser (CHI 2008) John O’Donovan and Barry Smyth (UCD)

Benefits of Collaborative Filtering ! Collaborative filtering systems work by people in system, and it is expected that people to be better at evaluating information than a computed function ! CF doesn't require content analysis & extraction ! Independent of any machine-readable represent ation of the objects being recommended.