Data Mining Lecture 1: Recommender Systems

Transcription

Recommender Systems - Introduction

Jo Grundy, ECS Southampton, February 25, 2022

Making recommendations: Big Money
- 35% of Amazon's income comes from recommendations.
- Netflix's recommendation engine is worth $1 billion per year.

When you know very little about a person, suggesting
- things to buy,
- films to watch,
- books to read,
- people to date,
is hard. And yet, Amazon seems to be able to recommend stuff I like.

Recommender Systems - Problem Formulation

You have a set of films and a set of users. The users have rated some of the films.

Film             Alice  Bob  Carol  Dave
Love Really        4     1          4
Deadly Weapon      1          4     5
Fast and Cross           5    5     4
Star Battles       1                5

How can you predict what should go in the blanks?

Recommender Systems - Algorithms

Recommender systems use one of three different types of algorithm:

Content based filtering (e.g. IMDb, Rotten Tomatoes):
Creates a profile of features for each item, and a profile for each user, with a weighted vector of features for the item. Can use an average value of the rated item vectors, or Bayesian classifiers, ANNs, etc. to decide.

Collaborative filtering (e.g. Facebook, Amazon):
Does not rely on understanding the film, or book, or the user. With a large amount of information on user preferences, it predicts what users will like based on their similarity to other users. E.g. Amazon: "people who buy x also buy y".

Hybrid recommender systems (e.g. Netflix):
Use combinations of different approaches.

We will examine Collaborative Filtering (CF) in this lecture.
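The "people who buy x also buy y" idea can be sketched with a simple co-occurrence count. This is a toy illustration with made-up purchase data, not Amazon's actual algorithm:

```python
from collections import Counter

# Hypothetical purchase histories (illustrative data only).
purchases = {
    "Alice": {"x", "y", "z"},
    "Bob":   {"x", "y"},
    "Carol": {"x", "z"},
    "Dave":  {"y"},
}

def also_bought(item):
    """Count how often other items appear in baskets that contain `item`."""
    counts = Counter()
    for basket in purchases.values():
        if item in basket:
            counts.update(basket - {item})
    return counts.most_common()

print(also_bought("x"))  # y and z each co-occur with x twice
```

Real systems normalise these counts (otherwise universally popular items dominate every recommendation), but the core signal is this co-occurrence.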

Recommender Systems - Content based approach

Can use a vector of features for each film, e.g. romance, action:

Film             Alice  Bob  Carol  Dave   x1 (romance)  x2 (action)
Love Really        4     1          4          1.0           0.1
Deadly Weapon      1          4     5          0.1           1.0
Fast and Cross           5    5     4          0.2           0.9
Star Battles       1                5          0.1           1.0

Each film can be described by the vector X = [1 x1 x2] (the 1 is for the bias term).

Learn a parameter vector θ, where θ^T X gives the number of stars for each user; θ = [0 5 0] for someone who really likes romance films.

Use linear regression to find the user parameter vector θ, where m is the number of films rated by that user:

    min_θ (1/m) Σ_{i=1}^{m} (θ^T X_i − y_i)^2        (1)

Can also use Bayesian classifiers, MLPs, etc.

Problems?
- requires hand-coded knowledge of the film
- not easy to scale up
- a user may not have rated many films
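The per-user regression in equation (1) can be sketched with numpy's least-squares solver. The feature values loosely follow the lecture's romance/action example; the ratings y here are synthesized from the "romance lover" vector θ = [0 5 0], so they are illustrative rather than real data:

```python
import numpy as np

# Films one user has rated: rows are X = [1 (bias), x1 (romance), x2 (action)].
X = np.array([
    [1.0, 1.0, 0.1],   # Love Really
    [1.0, 0.1, 1.0],   # Deadly Weapon
    [1.0, 0.2, 0.9],   # Fast and Cross
    [1.0, 0.1, 1.0],   # Star Battles
])

# A user who really likes romance, theta = [0, 5, 0], would award these stars:
theta_romance = np.array([0.0, 5.0, 0.0])
y = X @ theta_romance          # array([5. , 0.5, 1. , 0.5])

# Least squares minimises (1/m) * sum_i (theta^T X_i - y_i)^2, as in eq. (1).
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(X @ theta_hat)           # reproduces the user's ratings
```

Note that lstsq returns the minimum-norm solution when features are nearly collinear (as here, where x1 + x2 ≈ 1.1 for every film), so theta_hat may differ from theta_romance while still fitting the ratings exactly.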

Recommender Systems - Collaborative Filtering

Collaborative Filtering example:
- Alice likes Dr Who, Star Wars and Star Trek.
- Bob likes Dr Who and Star Trek.
A recommender system would correlate the likes, and suggest that Bob might like Star Wars too.

Personal preferences can be correlated.

Task: Discover patterns in observed behaviour across a community of users, e.g.
- Purchase history
- Item ratings
- Click counts
Then predict new preferences based on those patterns.

Collaborative filtering uses a range of approaches to accomplish this task:
- Neighbourhood based approach
- Model based approach
- Hybrid (neighbourhood and model) based approach
This lecture will cover the neighbourhood based approach.

Collaborative Filtering - Sparsity

Measure user preferences, e.g. film recommendation: users rate films between 0 and 5.

[Table: a user-film ratings matrix for films including The Pianist and City of God, with ratings such as 3.0, 4.5 and 2.5, and many empty cells.]

The data is sparse; there are missing values. Sparsity can be taken advantage of to speed up computations. Most libraries that do matrix algebra are based on LAPACK, written in Fortran90. Computation is done by calls to the Basic Linear Algebra Subprograms (BLAS). This is how the Python numpy library does its linear algebra.
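The core step of the neighbourhood based approach is measuring how similar two users' preferences are. A common choice is cosine similarity computed over the films both users have rated; this is a minimal sketch with hypothetical ratings, using NaN for missing values:

```python
import numpy as np

# Hypothetical user-film ratings; np.nan marks films a user has not rated.
R = np.array([
    [4.0, 1.0, np.nan, 4.0],   # Alice
    [np.nan, 1.0, 4.0, 5.0],   # Bob
    [5.0, 2.0, 4.0, np.nan],   # Carol
])

def similarity(u, v):
    """Cosine similarity between two users over co-rated films only."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    a, b = u[mask], v[mask]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(similarity(R[0], R[2]))  # Alice vs Carol: close to 1, similar tastes
```

These similarity scores are then used to weight the ratings of a user's nearest neighbours when predicting a missing entry. Pearson correlation, which subtracts each user's mean rating first, is another standard choice.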

Collaborative Filtering - Sparsity

Compressed Row Storage (CRS)

The matrix is specified by three arrays: val, col_ind and row_ptr.
- val stores the non-zero values
- col_ind stores the column index of each element in val
- row_ptr stores the index of the elements in val which start a row

E.g.
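The three CRS arrays can be built with a short conversion routine. This sketch uses the common convention that row_ptr has one entry per row plus a final entry holding the total number of non-zeros (scipy's csr_matrix exposes the same three arrays as data, indices and indptr):

```python
def to_crs(matrix):
    """Convert a dense 2-D list into CRS arrays (val, col_ind, row_ptr)."""
    val, col_ind, row_ptr = [], [], [0]
    for row in matrix:
        for j, x in enumerate(row):
            if x != 0:
                val.append(x)        # non-zero value
                col_ind.append(j)    # its column index
        row_ptr.append(len(val))     # where the next row starts in val
    return val, col_ind, row_ptr

A = [
    [5, 0, 0],
    [0, 0, 3],
    [0, 4, 0],
]
print(to_crs(A))  # ([5, 3, 4], [0, 2, 1], [0, 1, 2, 3])
```

Storage drops from one entry per cell to roughly two entries per non-zero, which is the saving that sparse BLAS routines exploit.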