Survey On Collaborative Filtering, Content-based Filtering And Hybrid .

Transcription

International Journal of Computer Applications (0975 – 8887)Volume 110 – No. 4, January 2015Survey on Collaborative Filtering, Content-basedFiltering and Hybrid Recommendation SystemPoonam B. ThoratR. M. GoudarSunita BarveComputer EngineeringMIT Academy of EngineeringPune IndiaComputer EngineeringMIT Academy of EngineeringPune IndiaComputer EngineeringMIT Academy of EngineeringPune Indiafiltering algorithms try to recommend items based onsimilarity count [27].ABSTRACTRecommender systems or recommendation systems are asubset of information filtering system that used to anticipatethe 'evaluation' or 'preference' that user would feed to an item.In recent years E-commerce applications are widely usingRecommender system. Generally the most popular Ecommerce sites are probably music, news, books, researcharticles, and products. Recommender systems are also availablefor business experts, jokes, restaurants, financial services, lifeinsurance and twitter followers. Recommender systems haveformulated in parallel with the web. Initially Recommendersystems were based on demographic, content-based filteringand collaborative filtering. Currently, these systems areincorporating social information for enhancing a quality ofrecommendation process. For betterment of recommendationprocess in the future, Recommender systems will use personal,implicit and local information from the Internet. This paperprovides an overview of recommender systems that includecollaborative filtering, content-based filtering and hybridapproach of recommender system.1. INTRODUCTIONRecommender systems have become very popular in recentyears and are used in various web applications. RecommenderSystems (RSs) are software tools that are used to providesuggestions to user according to their requirement. Thesuggestions associate with various decision-making processes,such as which items to buy, what music to listen to. “Item” isthe general term used to denote what the system recommendsto users. A RS normally focuses on a specific type of item, itsdesign, its graphical user interface and the endations are all customized to provide useful andeffective suggestions for that specific type of item. Due to theincreasing importance of recommendation, it has become anautonomous research field since the mid 1990s [1]. Broadlyspeaking, a RS suggests to a user those items that might be ofusers interest. Former work [10] distinguishes recommendationtechniques into following four classes. collaborative-filteringA significant role is play by a Collaborative Filtering(CF) methods in the recommendation process andbecause of that Collaborative filtering is mostextensively used approach to design recommendersystem [1, 2]. In this approach recommendation foreach active user is received by comparing with thepreferences of other users who have rated the productin similar way to the active user [27].Content-Based filteringIn content-Based filtering recommendations dependson users former choices. Item description and aprofile of the user‟s orientation play an importantrole in Content-based filtering. Content-based Demographic FilteringIn demographic filtering recommendations isestablished on a demographic profile of the user.Here recommendation is based on the informationprovided by the user is considered to be similaraccording to demographic parameter such asnationality, age, gender etc [27]. Hybrid filteringThe hybrid filtering is a combination of more thanone filtering approach [27]. The hybrid filteringapproach is introduced to overcome some commonproblem that are associated with above filteringapproaches such as cold start problem,overspecialization problem and sparsity problem.Another motive behind the implementation of hybridfiltering is to improve the accuracy and efficiency ofrecommendation process.Table 1 shows the some popular sites which are currently usingrecommendation system for different purpose [26].Table 1: Popular sites using recommender systemsSiteWhat is recommendedAmazonBooks/other eerBuilderJobs1.1 Major challenges in recommendersystem Data sparsityAs we know that usage of recommender systemincreases very rapidly. So that many commercialrecommender system uses large datasets. Therefore ,the user-item matrix used for filtering could be verylarge and sparse and because of that performance ofrecommendation process may get degrade. The coldstart problem is caused by the data sparsity . Incollaborative filtering method recommendation ofitem is based on past preferences of users, so thatnew users will need to rate enough count of items to31

International Journal of Computer Applications (0975 – 8887)Volume 110 – No. 4, January 2015allow the system to catch their preferences accuratelyand thus allows for authentic recommendations [26]. ScalabilityTraditional CF algorithms will suffers fromscalability problems as the numbers of users anditems increases. For example, consider a ten millionsof customers O(M) and millions of items O(N), withthat the complexity of algorithm is „n‟ which isalready too large. As recommender system play animportant role in E-commerce application wheresystem must respond to the user requirementimmediately and irrespective of users ratings historyand purchases system must make recommendations,which requires a higher scalability. Twitter is largeweb company to scale the recommendations of theirmillions of users it uses clusters of machines [26].DiversityRecommender system are anticipated to increasediversity because they help us to discover newproducts. Some algorithms, may accidentally do theopposite. Here recommender system recommendpopular and highly rated items which are appreciatedby particular user. This lead to lower accuracy inrecommendation process. To overcome this problemthere is need to develop new hybrid ation process [26].Vulnerability to attacksSecurity is one of major issue in any system whichare deployed on web. Recommender system play animportant role in e-commerce applications andbecause of that recommender systems are probablytargets of harmful attacks trying to promote or inhibitsome items. This is one of major challenge faced bythe developer of recommender system [26].2. COLLABORATIVE FILTERINGCollaborative filtering is most extensively used approach todesign recommender system. Collaborative Filtering (CF)methods play an significant role in the recommendationprocess, although Collaborative filtering is often used alongwith other filtering techniques like content-based, knowledgebased [19]. Basically Collaborative filtering methods areestablished on gathering and examining a large amount ofinformation which based on users demeanor, activities orpreferences and anticipating taste of that particular user byusing their similarity with other users [5,8]. It does not dependon machine decomposable message and thus it is correctlyrecommending composite items and because of that it is a keybenefit of the collaborative filtering approach. In collaborativefiltering recommendation system recommended objects areselected on the basis of past evaluations of a large group ofusers.Table 2 shows the recommendation process in nutshell wherefirst we have to estimate the potential favorable opinion ofCarol about Harry potter, one can use the similarity of her withthose of Joe. Alternatively, one can note that ratings of Titanicand Harry potter follow a same pattern, which show that peoplewho liked the former might also like the latter [26]. Anexample given in Table 2 will give brief idea aboutcollaborative filtering.2.1 Techniques related to memorybased collaborative filteringTo generate prediction Memory-based CF algorithms uses thetotal or a some part of database of the user-item. Here everyuser with similar interests is part of a similar group of people.By identifying the neighbors of a new user or currently activeuser, it can produces a anticipation of preferences on new itemsfor him or her.2.1.1 k nearest neighborsThe most extensively used algorithm for collaborativefiltering is the k Nearest Neighbors (kNN) [1,5,4]. In theGroupLens Usenet article recommender it was first introduced.There are two types of k Nearest Neighbor algorithms:1. User based Nearest Neighbor2. Item based Nearest Neighbor.1. User based Nearest NeighborIn the user to user version, kNN executes the following threetasks to generate recommendations for an active user:(a) Using the selected similarity measure, we produce the set ofk neighbors for the active user a. The k neighbors for a are thenearest k (similar) users to u.(b) Once the set of k users (neighbors) similar to active user ahas been computed, in order to receive the prediction of item ion user a, one of the following aggregation approaches is oftenused: the average, the weighted sum and the adjusted weightedaggregation (deviation-from-mean).(c) To obtain the top-n recommendations, we choose the nitems, which provide most satisfaction to the active useraccording to our predictions.User to user based kNN suffers from scalability problem.JoeBobCarolTitanic5152. Item based Nearest NeighborAs the number of users increases User to user based kNNsuffers from scalability problem. To overcome this drawbacknew method called item to item kNN is introduced by Sarwaret al. [22] and Karypis. The item-based approach investigatesthe set of items rated by target user and calculates theirsimilarity with the target item i and then chooses k most similaritems 𝑖1 , 𝑖2 , 𝑖𝑘 .Theirrepresentingsimilarities 𝑠𝑖1 , 𝑠𝑖2 , 𝑠𝑖𝑘 are also computed at the same time.Formerly the most similar items are discovered, after that bytaking a weighted mean of the target user's ratings on thesesimilar items the prediction is calculated. Similaritycomputation and the prediction generation are two importantfactors which make item-based recommendation morepowerful. For similarity computation basically different typesof similarity measures are used and weighted sum andregression used for prediction computation.The reader1522.1.2 Dimensionality reduction techniquesHarry potter42?Table2: Recommendation process in a nutshellPersonsMoviesTo reduce the problems from high levels of sparsity in RSdatabases, certain studies have used dimensionality reductiontechniques [6]. The reduction methods are based on MatrixFactorization [7,9,8]. Matrix factorization is especially32

International Journal of Computer Applications (0975 – 8887)Volume 110 – No. 4, January 2015adequate for processing large RS databases and providingscalable approaches [10]. The model-based technique LatentSemantic Index (LSI) and the reduction method Singular ValueDecomposition (SVD) are typically combined to achieve highperformance [11,15,17]. SVD methods provide goodanticipation results but are computationally it is veryexpensive. Its distribution relies on static off-line settingswhere it does not alter with time the known preference ofinformation.2.2 Techniques related to model-based collaborative filteringThe basic idea behind model-based recommendation systems isto build a “model” with the help of dataset ratings. In otherwords we can say that it is process of extraction of someinformation from the dataset and use that information as a"model" to make recommendations without having the use ofcomplete dataset every time. This approach is beneficial interms of both speed and scalability. Model based approach alsoimproves prediction accuracy of algorithm.2.2.1 CF algorithms based on MDPInstead of viewing the recommendation process as aanticipation problem views it as a consecutive optimizationproblem and use a Markov decision processes (MDPs) modelfor recommender systems. An MDP is a model for sequentialstochastic determination problems, which is often used inapplications where an agent is influencing its surroundingenvironment through actions. More appropriate model isprovided by Markov decision processes (MDPs) forimplementation of recommender systems. The key advantageof MDP is they consider the long-term effects of eachrecommendation and the arithmetic mean of eachrecommendation. The MDP-based recommender system getsucceed in practice because it employ a strong initial modelwhich is solvable quickly .The MDP has memory less propertyand due to that it does not consume too much memory.active users. Hence very few ratings are given to themost popular items[3].3. CONTENT BASED-FILTERINGContent-based filtering (CBF) tries to recommend items to theactive user based on similarity count which is rated by that userpositively in the past [16,14,5]. For example, if a user likes aweb page with the words „„mobile‟‟, „„pen drive‟‟ and„„RAM‟‟, the CBF will recommend pages related to theelectronics world. Item description and a profile of the user‟sorientation play an important role in Content-based filtering.Content-based filtering algorithms try to recommend itemsbased on similarity count. The best-matching items arerecommended by comparing various candidate items withitems previously rated by the user.The tf–idf representation is most extensively used algorithm(also called vector space representation). For creation of userprofile mostly system concentrates on two types ofinformation: 1. A user's preference model. 2. User's interactionlog with the recommender system. Basically, item profile isused by these methods for (i.e. a set of distinct dimensions andcharacteristics) qualifying the item within the system. Creationof a content-based profile of users is done with help ofweighted vector of item features. Importance of each feature tothe user is denoted by the weights. It can be calculated fromindividually rated content vectors using a various proficiencies.Figure 1 shows the CBF mechanism, which includes thefollowing steps:1. Educe the attributes of items forrecommendation.2. Compare the attributes of items with the preferences of theactive user.3. Recommend items according to features that fulfill the user‟sinterests.An MDP can be defined as a four-tuple: 𝑆, 𝐴, 𝑅, 𝑃𝑟 ,where 𝑆 is a set of states, 𝐴 is a set of actions, 𝑅 is a rewardfunction for each state/action pair, and 𝑃𝑟 is the transitionprobability between every pair of states given each action [20].2.3 Advantages of collaborative filtering1.Memory-Based Collaborative filtering techniquesmakes implementation of recommendation systemeasier.2.Using Memory-Based Collaborative filteringtechniques one can add new data easily and inincremental manner.3.Model-Based Collaborative filteringimproves prediction performance.techniques2.4 Limitation of collaborative filteringFigure:1 Content –based filteringWhen the attributes of the items and the user profiles areknown, the key role for CBF is to determine whether a userwill like a specific item. This task is traditionally answered byusing heuristic methods [18] or classification algorithms, suchus: rule induction, nearest neighbors methods, Rocchio‟salgorithm, linear classifiers and probabilistic methods [13] .3.2 Advantages of content-based filtering1.Cold Start: CF systems often require a huge amountof existing data on which user can make exactrecommendations [21].1.Content-based recommender system provide userindependence through exclusive ratings which are used bythe active user to build their own profile.2.Scalability: CF makes recommendations for variousenvironments where billions of users and productsexist. Therefore, a huge amount of sparency to their active user by giving explanationhow recommender system works.3.Content-based recommenders system are adequate torecommend items not yet placed by any user. This will beadvantageous for new user.3.Sparsity: On major e-commerce site the number ofitems sold are enormously large. Because of that onlya small subset of the entire database is rated by most33

International Journal of Computer Applications (0975 – 8887)Volume 110 – No. 4, January 20153.3 Limitation of content-based filtering1.It is a difficult task to generate the attributes for items incertain areas.2.CBF advocate the same types of items because of that itsuffers from an overspecialization problem.3.It is harder to acquire feedback from users in CBF becauseusers do not typically rank the items (as in CF) andtherefore, it is not possible to determine whether therecommendation is correct.increasingly demanding. Finally, despite the company‟s effort,namelessness of its users was not sufficiently assured [25].CBF and CF can be aggregated in different ways [1].followingfigures shows the different choices for aggregating CB andCBF. Figure 2 shows the methods that estimate CBF and CFrecommendations individually and subsequently combine themto yield better recommendations across the board.4. HYBRID RECOMMENDER SYSTEMRecent research has proved that a hybrid approach could bemore effective in some cases. Basically Collaborative filteringand Content-based filtering approaches most extensively usedin information filtering application. As we know that everycoin has two side similarly each approach has its own rewardand weaknesses. Basically the main motive of hybrid approachis to aggregate collaborative filtering and content-basedfiltering to improve recommendation accuracy. Hybridapproaches can be implemented in various ways:1.Implement collaborative and content-based methodsindividually and aggregate their predictions.2.Integrate some content-based characteristics into acollaborative approach,3.Comprise some collaborative characteristics into acontent-based approach, and4.Construct a general consolidative model that integrateboth content-based and collaborative characteristics.Cold start and the sparsity are common problems inrecommender systems which are resolved by using thesemethods. Good example of hybrid recommender systems isNetflix. They make recommendations by comparing thelooking out and exploring habits of similar exploiters(collaborative filtering) as well as by providing movies thatshare features with films that a exploiters has rated highly(content-based filtering).The online DVD rental company Netflix released a data setcontaining approximately 100 million anonymous movieratings in October 2006 and challenged investigators andpractitioners to beat the accuracy of the company‟srecommendation system, Cinematch [23]. Although thereleased data set represented only a small fraction of thecompany‟s rating data, thanks to its size and quality it fastbecame a standard in the data mining and machine learningcommunity. The data set contained ratings in the integer scalefrom 1 to 5 which were accompanied by dates. The year ofrelease were provided for every movie and title. Noinformation about users was given. Submitted predictions wereevaluated by their root mean squared error (RMSE) on aqualifying data set containing over 2817,131 unknown ratings.Total 20,000 are registered teams out of that 2000 teamssubmitted at least one answer set. The grand prize of 1000,000 was awarded to a team on 21 September 2009 thatperformed better over the Cinematch‟s and also increasesaccuracy by 10%. In this competition we learned severallessons [24]. Firstly, the company acquired a superiorrecommendation system that improve users satisfaction andalso company gained lot of publicity. Secondly, ensemblemethods play an important role for improving the accuracy ofpredictions. Thirdly, we discovered that when RMSE dropsbelow a certain level that time accuracy improvements areFigure: 2Figure 3 shows the methods that integrate CBF characteristicsinto the CF approach. So that it will overcome the cold startproblem in collaborative filtering and overspecializationproblem of content-based filtering.Figure: 3Figure 4 illustrates the methods for construction of a unifiedutility system with both CBF and CF characteristics. In thismethod by combining some features of CBF and CF oneunified model is constructed that can improve effectiveness ofrecommendation process.Figure: 4Figure 5 shows the methods that incorporate CF characteristicsinto a CBF approach.Figure: 5Content-based filtering systems can allow recommendationsfor "cold-start" items for which no training information isavailable, but it suffers by lower accuracy than collaborativefiltering systems. Conversely, collaborative filtering approachfrequently provide accurate recommendations, but go wrongfor cold start items. Hybrid schemes try to aggregate thesedifferent kinds of information to get efficient recommendationresult.34

International Journal of Computer Applications (0975 – 8887)Volume 110 – No. 4, January 20154.1 Classification of HybridRecommendation Systems:Burke [12] presented taxonomy for the hybrid recommendationsystems he classifies Hybrid Recommendation Systems intofollowing seven classes: Weighted:Different recommendation components scores arecombined statistically. This class aggregates scoresfrom each factor using additive formula. Switching:From available recommendation components systemchooses particular component and applies the pickedout one. ion that will be introduced together.This class is based on merging and presentation ofmultiple rated list into single rated list.Feature Combination:Contributing and actual recommender are twodifferent recommendation components are exist forthis class. The working of actual recommender isdepends on the data modified by the contributingone. The contributing one throws features of onesource on to the other components source.Feature Augmentation:This class is similar to the feature combinationhybrids but only difference is that the contributorgives novel characteristic. It is more elastic thanfeature combination method.Cascade:This class play an role of tie breaker. Here for everyrecommender assign some priority and according tothat assign priority, lower priority recommendersplay an tie breakers role over higher priority.Meta-level:Their exist contributing and actual recommenders butthe early one completely substitutes the data for thelatter one.5. CONCLUSIONRecommender systems are turning out to be a useful tool thatwill provide suggestion to user according to their requirement.Filtering is used to improved recommendation accuracy in thefirst recommender systems. To achieve this accuracy mostmemory-based methods and algorithms were formulated andoptimized under some circumstance (e.g., kNN metrics,singular value decomposition, etc.). At this stage, to improvedthe quality of the recommendations some hybrid approachesare used (primarily collaborative filtering and content filtering).In the second stage, algorithms that admitted social informationwith former hybrid approaches were accommodated anddeveloped (e.g., trust-aware algorithms, social adaptiveapproaches, social networks analysis, etc.). Currently, thehybrid algorithms are used to integrate location informationinto existing recommendation algorithms. To improve thequality of recommender systems anticipations future researchwill concentrate on progressing the existing methods andalgorithms. Novel lines of research will be formulated forfollowing fields, such as on: (1) The existing recommendationmethods that uses different types of available information willbe combine in good order, (2) For recommender systemsprocesses enable security and privacy, (3) Flexible frameworksare design for machine-controlled analysis of heterogeneousdata.6. REFERENCES[1] Adomavicius, G.; Tuzhilin, A., "Toward the nextgeneration of recommender systems: a survey of the stateof-the-art and possible extensions," Knowledge and DataEngineering, IEEE Transactions on , vol.17, no.6,pp.734,749, June 2005.[2] F. Ricci, L. Rokach, B. Shapira, P.B. (Eds.), “KantorRecommender Systems Handbook”, first ed., 2011, XXX,842 p. 20 illus[3] J. Bobadilla, F. Serradilla, “The effect of sparsity oncollaborative filtering metrics”, in: Australian DatabaseConference, 2009, pp. 9–17.[4] J. Bobadilla, A. Hernando, F. Ortega, J. Bernal, “Aframework for collaborative filtering recommendersystems”, Expert Systems with Applications 38 (12)(2011) 14609–14623.[5] J. B. Schafer, D. Frankowski, J. Herlocker, S.Sen,“Collaborative filtering recommender systems”, in: P.Brusilovsky, A. Kobsa, W. Nejdl (Eds.), The AdaptiveWeb, 2007, pp. 291–324 .[6] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, “Applicationof dimensionality reduction in recommender system – acase study”, in: ACM WebKDD Workshop, 2000b, pp.264–272.[7] Y. Koren, R. Bell, CH. Volinsky, “Matrix factorizationtechniques dor recommender systems”, IEEE Computer42 (8) (2009) 42–49.[8] X. Luo, Y. Xia, Q. Zhu, “Applying the learning rateadaptation to the matrix factorization based collaborativefiltering”, Knowledge Based Systems 37 (2013) 154–164.[9] X. Luo, Y. Xia, Q. Zhu, “Incremental collaborativefiltering recommender based on regularized matrixfactorization”, Knowledge-Based Systems 27 (2012) 271–280.[10] G. Takacs, I. Pilaszy, B. Nemeth, D. Tikk, “Scalablecollaborative filtering approaches for large recommendersystems”, Journal of Machine Learning Research 10(2009) 623–656.[11] M.G. Vozalis, K.G. Margaritis, “Using SVD anddemographic data for the enhancement of generalizedcollaborative filtering”, Information Sciences 177 (2007)3017–3037.[12] R. Burke, “Hybrid recommender systems: survey andexperiments”, User Modeling and User-AdaptedInteraction 12 (4) (2002) 331–370.[13] M. Gemmis, P. Lops, G. Semeraro, P. Basile, “Integratingtags in a semantic content-based recommender”, in:Proceedings of the 2008 ACM conference onRecommender Systems, 2008, pp. 163–170.[14] M. Pazzani, “A framework for collaborative, contentbased, and demographic filtering”, Artificial IntelligenceReview-Special Issue on Data Mining on the Internet 13(5-6) (1999) 393–408.[15] S. Zhang, W. Wang, J. Ford, F. Makedon, “Using singularvalue decomposition approximation for collaborative35

International Journal of Computer Applications (0975 – 8887)Volume 110 – No. 4, January 2015filtering”, in: IEEE International Conference on ECommerce Technology, 2005, pp. 1–8.[16] ive recommendation”, Communications of theACM 40 (3) (1997) 66–72.[17] F. Cacheda, V. Carneiro, D. Fernandez, V. Formoso,“Comparison of collaborative filtering algorithms:limitations of current techniques and proposals forscalable, high-performance recommender Systems”, ACMTransactions on the Web 5 (1) (2011). Article 2.[18] C. Basu, H. Hirsh, W. Cohen, “Recommendation asclassification: using social and content-based informationin recommendation”, in: Proceedings of the FifteenthNational Conference on Artificial Intelligence, 1998, pp.714–720.[19] Lops, Pasquale, Marco De Gemmis, and GiovanniSemeraro. "Content-based recommender systems: State ofthe art and trends." Recommender systems handbook.Springer US, 2011. 73-105.[20] Sutton, R.S., Barto, A.G., 1998. “ReinforcementLearning: An Introduction”. MIT Press, Cambridge, MA.IJCATM : www.ijcaonline.org[21] K. Heung-Nam, E.S. Abdulmotaleb, J. Geun-Sik,“Collaborative error-reflected models for cold-startrecommender systems”, Decision Support Systems 51 (3)(2011) 519–531.[22] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Reidl,“Item-based collaborative filtering recommendationalgorithms,” in ACM www ‟01, pp. 285–295, ACM,2001.[23] J. Bennett, S. Lanning, “The Netflix prize, in: Proceedingsof KDD Cup and Workshop”, 2007, pp. 3–6.[24] R.M. Bell, Y. Koren, “Lessons from the Netflix prizechallenge”, ACM SIGKDD Explorations Newsletter 9(2007) 75–79.[25] A. Narayanan, V. Shmatikov, “Robust de-anonymizationof large sparse datasets”, in: IEEE Symposium on Securityand Privacy, 2008, pp. 111–125.[26] Linyuan Lua, Matus Medo, Chi Ho Yeung, Yi-ChengZhang, Zi-Ke Zhang,Tao Zhou, et al. "Recommendersystems" Physics Reports 519.1 (2012): 1-49.[27] Sánchez Sánchez, José Luis. “Improving CollaborativeFiltering Based Recommender Systems Using ParetoDominance”. Diss. E Informatica, 2013.36

Content-Based filtering In content-Based filtering recommendations depends on users former choices. Item description and a profile of the user‟s orientation play an important role in Content-based filtering. Content-based filtering algorithms try to recommend items based on similarity count [27]. Demographic Filtering