A Review on Deep Learning for Recommender Systems


Artificial Intelligence Review
https://doi.org/10.1007/s10462-018-9654-y

A review on deep learning for recommender systems: challenges and remedies

Zeynep Batmaz · Ali Yurekli · Alper Bilge · Cihan Kaleli
Computer Engineering Department, Anadolu University, 26470 Eskisehir, Turkey
© Springer Nature B.V. 2018

Abstract
Recommender systems are effective tools of information filtering that are prevalent due to increasing access to the Internet, personalization trends, and changing habits of computer users. Although existing recommender systems are successful in producing decent recommendations, they still suffer from challenges such as accuracy, scalability, and cold-start. In the last few years, deep learning, the state-of-the-art machine learning technique utilized in many complex tasks, has been employed in recommender systems to improve the quality of recommendations. In this study, we provide a comprehensive review of deep learning-based recommendation approaches to enlighten and guide researchers new to the subject. We analyze the compiled studies within four dimensions: deep learning models utilized in recommender systems, remedies for the challenges of recommender systems, awareness and prevalence over recommendation domains, and purposive properties. We also provide a comprehensive quantitative assessment of publications in the field and conclude by discussing gained insights and possible future work on the subject.

Keywords Recommender systems · Deep learning · Survey · Accuracy · Scalability · Sparsity

Corresponding author: Alper Bilge, abilge@anadolu.edu.tr
Zeynep Batmaz, zozdemir@anadolu.edu.tr · Ali Yurekli, aliyurekli@anadolu.edu.tr · Cihan Kaleli, ckaleli@anadolu.edu.tr

1 Introduction

Recent advances in information technologies and the prevalence of online services have provided people with the ability to access a massive amount of information quickly. Today, an ordinary user can instantly access descriptions, advertisements, comments, and reviews about almost

all kinds of products and services. Although accessing information is a valuable ability, people confront a colossal amount of data sources, which makes it difficult for them to find useful and appropriate content and results in the information overload problem.

Recommender systems are information filtering tools that deal with this problem by providing users with conceivably interesting content in a personalized manner (Schafer et al. 2001). Currently, many online vendors equip their systems with recommendation engines, and most Internet users take advantage of such services in daily activities such as reading books, listening to music, and shopping. In a typical recommender system, the term item refers to the product or service which the system recommends to its users. Producing a list of recommended items for the user or predicting how much the user will like a particular item requires a recommender system to either analyze the past preferences of like-minded users or benefit from descriptive information about the items. These two options form the two major approaches in recommender systems, i.e., collaborative filtering (CF) and content-based recommendation (Bobadilla et al. 2013), respectively. There are also hybrid approaches that combine the benefits of these two approaches (Burke 2002).

In recent years, artificial neural networks have begun to attract significant attention due to increasing computational power and big data storage facilities. Researchers have successfully built and trained deep architectural models (Hinton et al. 2006; Hinton and Salakhutdinov 2006; Bengio 2009), which has promoted deep learning as an emerging field of computer science. Currently, many state-of-the-art techniques in image processing, object recognition, natural language processing, and speech recognition utilize deep neural networks as a primary tool. The promising capabilities of deep learning techniques also encourage researchers to employ deep architectures in recommendation tasks (Salakhutdinov et al. 2007; Gunawardana and Meek 2008; Truyen et al. 2009).

In this study, we intensely review applications of deep learning techniques in the recommender systems field to enlighten and guide researchers interested in the subject. We present the current literature of the research field and reveal a perspectival synopsis of the subject in four distinct strategic directions. The contributions of the review can be listed as follows:

(i) We present a systematic classification and detailed analysis of deep learning-based recommender systems.
(ii) We focus on the challenges of recommender systems and categorize the existing literature based on proposed remedies.
(iii) We survey the domain awareness of existing deep learning-based recommender systems.
(iv) We discuss the state of the art and provide insights by identifying thought-provoking yet under-researched study directions.

The remainder of the article is structured as follows: Sect. 2 briefly reviews the literature in the field, and Sect. 3 provides the necessary background information about recommender systems and major deep learning techniques. Section 4 reveals a perspectival synopsis of applied deep learning methodologies within the context of recommender systems. Section 5 presents a quantitative assessment of the comprehensive literature, and Sect. 6 presents our insights and discussions on the subject and proposes future research directions. Finally, we conclude the study in Sect. 7.

2 Related work

The success of deep learning practices has significantly affected research directions in recommender systems, as in many other computer science fields. Initially, Salakhutdinov et al. (2007) presented a way to use a deep hierarchical model for CF on a movie recommendation task. Since this cornerstone study, there have been several attempts to apply deep models to recommender systems research. By utilizing the effectiveness of deep learning at extracting hidden features and relationships, researchers have proposed alternative solutions to recommendation challenges including accuracy, sparsity, and the cold-start problem. Sedhain et al. (2015) achieve higher accuracy by predicting missing ratings of a user-item matrix with the help of autoencoders, and Devooght and Bersini (2017) utilize neural networks to improve short-term prediction accuracy by converting CF into a sequence prediction problem. Wang et al. (2015b) propose a deep model using CF in order to deal with the sparsity issue by learning effective representations. Furthermore, deep models have been utilized to address the scalability concern, as these models are quite useful in dimensionality reduction and feature extraction. Elkahky et al. (2015) propose a solution for scalability by using deep neural networks to obtain low-dimensional features from high-dimensional ones, and Louppe (2010) utilizes deep learning for dimensionality reduction to deal with large datasets.

The current popularity of deep architectures brings the need to review and analyze existing studies on deep learning in recommender systems research. A comprehensive analysis may help and guide researchers who are willing to work in the field. Despite this urgent need, to the extent of our knowledge, only four studies survey the subject. Zheng (2016) surveys and critiques the state-of-the-art deep recommendation systems.
However, this survey contains an insufficient number of publications, which results in a very limited perspective on the whole concept. Betru et al. (2017) explain traditional recommender systems and deep learning approaches. This survey is also inadequate regarding its scope since it only analyzes three publications. Liu and Wu (2017) analyze deep learning-based recommendation approaches and propose a classification framework which categorizes the procedures by input and output aspects. However, the authors explain the research in limited directions, whereas our work provides guidelines to understand more precisely the usage of deep learning-based techniques in recommender systems.

Recently, Zhang et al. (2017a) have published a comprehensive survey on deep learning-based recommender systems. Although the numbers of reviewed papers in Zhang et al. (2017a) and this study are very close, the classification approaches demonstrate specific differences. While Zhang et al. (2017a) focus only on a structural classification of publications and propose a two-aspect scheme (neural network models and integration models), we provide a four-dimensional categorization (neural network models, offered remedies, applied domains, and purposive properties). Furthermore, instead of diving into implementation details when examining the publications, we prefer to constitute a general understanding of the subject and lead the way for researchers willing to work on deep learning for recommender systems. Our work allows scholars interested in this topic to understand the main effects of utilizing deep learning techniques in recommender systems. This review focuses on understanding the motivation for using each deep learning-based method in recommender systems. Moreover, it aims to provide insights on deep learning-based solutions to the current challenges of recommender systems.

3 Background

While a recommender system can be defined as a particular type of information filtering system, deep learning is a growing trend in machine learning. Before examining how these two fields come together, it is necessary to go over the basics of both subjects. In this background section, we briefly describe the fundamentals, major types, and primary challenges of recommender systems. Then, we introduce the deep learning concept by explaining the factors that promote it as an emerging field of computer science. Finally, we illustrate the deep learning models that have been widely applied in machine learning.

3.1 Recommender systems

While the widespread use of the Internet and increasing data storage capabilities make it easy to access large volumes of data, it becomes harder for daily computer users to find relevant, engaging, and useful content due to information overload.

Over the last few decades, there has been a significant amount of research on computer applications that can discover tailored, appropriate content. Recommender systems are one of those applications that can filter information in a personalized manner (Schafer et al. 2001).

Recommender systems produce suggestions and recommendations to assist their users in many decision-making processes. With the help of recommender systems, users are more likely to access appropriate products and services such as movies, books, music, food, hotels, and restaurants.

In a typical recommender system, the recommendation problem is twofold, i.e., (i) estimating a prediction for an individual item or (ii) ranking items by prediction (Sarwar et al. 2001).
While the former process is triggered by the user and focuses on precisely predicting how much the user will like the item in question, the latter process is provided by the recommendation engine itself and offers an ordered top-N list of items that the user might like. Based on the recommendation approach, recommender systems are classified into three major categories (Adomavicius and Tuzhilin 2005):

- CF recommender systems produce recommendations for their users based on the inclinations of other users with similar tastes.
- Content-based recommender systems generate recommendations based on the similarities of new items to those that the user liked in the past by exploiting the descriptive characteristics of items.
- Hybrid recommender systems utilize multiple approaches together, and they overcome the disadvantages of certain approaches by exploiting the benefits of the others.

Besides these common recommender systems, there are some specific recommendation techniques, as well. Specifically, context-aware recommender systems incorporate contextual information of users into the recommendation process (Verbert et al. 2010), tag-aware recommender systems integrate product tags into standard CF algorithms (Tso-Sutter et al. 2008), trust-based recommender systems take the trust relationship among users into account (Bedi et al. 2007), and group-based recommender systems focus on personalizing recommendations at the group-of-users level (McCarthy et al. 2006).

3.1.1 Collaborative filtering recommender systems

CF is the most prominent approach in recommender systems, which makes the assumption that people who agreed on their tastes in the past will agree in the future, as well (Sarwar et al. 2001). In such systems, the preferences of like-minded neighbor users form the basis of all produced recommendations, rather than individual features of items.

The primary actor of a CF system is the active user (a), who seeks a rating prediction or a ranking of items. By utilizing past preferences as an indicator for determining correlations among users, a CF recommender yields referrals to a by relying on the tastes of compatible users.

Typically, a CF system contains a list of m users U = {u_1, u_2, ..., u_m} and n items P = {p_1, p_2, ..., p_n}. The system constructs an m × n user-item matrix that contains the user ratings for items, where each entry r_{i,j} denotes the rating given by user u_i for item p_j. In need of a referral for a on the target item q, the CF algorithm either predicts a rating for q or recommends a list of the most likable top-N items for a. An overview of the general CF process is illustrated in Fig. 1.

Fig. 1 An overview of the CF process

CF algorithms follow two main methodologies in approaching the recommendation generation problem:

- Memory-based algorithms utilize the entire user-item matrix to identify similar entities. After locating the nearest neighbors, the past ratings of these entities are employed for recommendation purposes (Breese et al. 1998). Memory-based algorithms can be user-based, item-based, or hybridized. While the past preferences of the nearest neighbors to a are employed in user-based CF, the ratings of items similar to q are used in the item-based approach (Aggarwal 2016).
- Model-based algorithms aim to build an offline model by applying machine learning and data mining techniques. Building and training such a model allows estimating predictions for online CF tasks.
Model-based CF algorithms include Bayesian models, clustering models, decision trees, and singular value decomposition models (Su and Khoshgoftaar 2009).

3.1.2 Content-based recommender systems

Content-based recommender systems produce recommendations based on the descriptive attributes of items and the profiles of users (Van Meteren and Van Someren 2000). In content-based filtering, the main purpose is to recommend items that are similar to those that a user liked in the past. For instance, if a user likes a website that contains keywords such as "stack", "queue", and "sorting", a content-based recommender system would suggest pages related to data structures and algorithms.
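This keyword-matching idea can be sketched in a few lines: items and the user profile are represented as keyword-count vectors, and candidates are ranked by cosine similarity to the profile. The page names, keyword sets, and count-based profile below are illustrative assumptions, not details from any cited system:

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two keyword-count vectors (dicts)."""
    common = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical profile built from keywords of pages the user liked.
user_profile = Counter(["stack", "queue", "sorting", "stack"])
candidates = {
    "data-structures-101": Counter(["stack", "queue", "tree"]),
    "cooking-basics": Counter(["recipe", "oven"]),
}

# Rank candidate items by similarity to the user's content profile.
ranked = sorted(candidates.items(),
                key=lambda kv: cosine_similarity(user_profile, kv[1]),
                reverse=True)
print([name for name, _ in ranked])  # the data-structures page ranks first
```

Real content-based systems typically weight keywords with TF-IDF rather than raw counts, but the ranking mechanics are the same.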

Content-based filtering is very efficient when recommending a freshly inserted item into the system. Although there exists no history of ratings for the new item, the algorithm can benefit from the descriptive information and recommend it to the relevant users. For instance, a new science fiction movie might be suggested to a user who has previously seen and liked the movies "The Terminator" and "The Matrix".

Although content-based recommender systems are effective at recommending new items, they cannot produce personalized predictions when there is not enough information about the profile of the user. Furthermore, the recommendations are limited in terms of diversity and novelty since the algorithms do not leverage the community knowledge of like-minded users (Lops et al. 2011).

3.1.3 Hybrid recommender systems

Both CF systems and content-based recommenders have idiosyncratic strengths and weaknesses. Hybrid recommender systems, on the other hand, combine CF and content-based methods to avoid the limitations of each approach by exploiting the benefits of the other.
A typical hybridization scenario would be employing the content-based descriptive information of a new item without any user ratings in a CF recommender system (Tran and Cohen 2000). Various hybridization techniques have been proposed, which can be summarized as follows (Burke 2002):

- Weighted: A single recommendation output is produced by combining the scores of different recommendation approaches.
- Switching: Recommendation outputs are selectively produced by either algorithm depending on the current situation.
- Mixed: The recommendation outputs of both approaches are shown at the same time.
- Cascade: Recommendation outputs produced by one approach are refined by the other approach.
- Feature combination: Features from both approaches are combined and utilized in a single algorithm.
- Feature augmentation: The recommendation output of one approach is utilized as the input of the other approach.

3.1.4 Challenges of recommender systems

Even though recommender systems provide efficient ways to deal with the information overload problem, they also come with many different challenges (Su and Khoshgoftaar 2009; Bobadilla et al. 2013). In this section, we briefly describe the major issues in recommender systems, including accuracy, sparsity, cold-start, and scalability.

One of the critical requirements of a recommender system is to bring thrilling and relevant items to its users. The trust level of the users in the system is directly related to the quality of recommendations. If the users are not provided with favorable products and services, the recommendation engine might be considered inadequate regarding customer satisfaction, which makes it likely for users to look for alternative systems. Therefore, a recommender system must satisfy an appropriate level of prediction accuracy to improve preferability and effectiveness.
Accuracy, the most discussed challenge of recommender systems, is commonly investigated in three forms: the accuracy of rating predictions, usage predictions, and ranking of items (Shani and Gunawardana 2011).
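For rating predictions, accuracy is typically quantified with the mean absolute error (MAE) and the root mean squared error (RMSE) between predicted and held-out actual ratings. A minimal sketch with made-up ratings on a 1-5 scale:

```python
import math

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical held-out ratings and a recommender's predictions for them.
actual = [4, 3, 5, 1]
predicted = [3.5, 3, 4, 2]

print(round(mae(actual, predicted), 3))   # 0.625
print(round(rmse(actual, predicted), 3))  # 0.75
```

Lower values indicate better rating-prediction accuracy; ranking accuracy instead uses list-oriented measures such as precision at N.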

CF systems rely on the rating history of items given by the users of the system. Sparsity appears as a major problem, especially for CF, since users only rate a small fraction of the available items, which makes it challenging to generate predictions (Su and Khoshgoftaar 2009; Bobadilla et al. 2013). When working on a sparse dataset, a CF algorithm may fail to take advantage of beneficial relationships among users and items. Data sparsity leads to another severe challenge referred to as the cold-start problem. Producing predictions for a new user having very few ratings is not possible due to insufficient data to profile them. Likewise, presenting recently added items as recommendations to users is also not achievable due to the lack of ratings for those items. However, unlike in CF techniques, newly added users and items can be managed in content-based recommender systems by utilizing their content information.

Most recommender systems are deployed in a responsive environment. In a typical recommendation scenario, a user is provided with a set of recommended items according to her preferences while she is navigating through a web page. In order to carry out such a scenario efficiently, the recommendations should be provided in a reasonable amount of time, which requires a highly scalable system (Linden et al. 2003). With the growth of the number of users and/or items in the system, many algorithms tend to slow down or require more computational resources (Shani and Gunawardana 2011). Thus, scalability turns into a significant challenge that should be managed efficiently.

3.2 Deep learning

Deep learning is a field of machine learning that is based on learning several layers of representations, typically by using artificial neural networks. Through the layer hierarchy of a deep learning model, higher-level concepts are defined from lower-level concepts (Deng and Yu 2014).

Since Hinton et al. (2006) introduced an efficient way of training deep models and Bengio (2009) showed the capabilities of deep architectures in complicated artificial intelligence tasks, deep learning has become an emerging topic in computer science. Currently, deep learning approaches produce the state-of-the-art solutions to many problems in computer vision, natural language processing, and speech recognition (Deng and Yu 2014).

Although neural networks and the science behind deep models have been around for more than 50 years, the power of deep learning techniques has only started to reveal itself during the last decade. The main factors that promote deep learning as the state-of-the-art machine learning technique can be listed as follows:

- Big data: A deep learning model learns better representations as it is provided with larger amounts of data.
- Computational power: Graphical processing units (GPUs) meet the processing power required for the complex computations in deep learning models.

Throughout this section, we briefly introduce the deep learning models that have been widely utilized in recommendation tasks.

3.2.1 Restricted Boltzmann machines

A restricted Boltzmann machine (RBM) is a particular type of Boltzmann machine which has two layers of units. As illustrated in Fig. 2, the first layer consists of visible units, and the second layer includes hidden units. In this restricted architecture, there are no connections between the units within a layer (Salakhutdinov and Hinton 2009).
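The absence of intra-layer connections is what makes inference in an RBM tractable: given one layer, the units of the other layer are conditionally independent, so each layer can be computed in a single vectorized step. A minimal sketch with binary units and randomly initialized weights; the layer sizes and the input vector are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 6, 3          # e.g., 6 observed ratings, 3 latent factors
W = rng.normal(0, 0.1, (n_visible, n_hidden))  # symmetric inter-layer weights
b_v = np.zeros(n_visible)           # visible biases
b_h = np.zeros(n_hidden)            # hidden biases

v0 = np.array([1, 0, 1, 1, 0, 0], dtype=float)  # one binary observation

# No intra-layer connections: each conditional factorizes over units,
# so a whole layer is sampled in one vectorized step given the other.
p_h = sigmoid(v0 @ W + b_h)                      # P(h_j = 1 | v)
h0 = (rng.random(n_hidden) < p_h).astype(float)  # sample hidden units
p_v = sigmoid(h0 @ W.T + b_v)                    # P(v_i = 1 | h): reconstruction
print(p_v.round(2))
```

Training (e.g., by contrastive divergence) repeats such up-down passes and nudges W toward making reconstructions match the data.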

Fig. 2 A restricted Boltzmann machine

The visible units in the model correspond to the components of an observation, and the hidden units represent the dependencies between the components of the observations. For instance, in the case of the famous handwritten digit recognition problem (Cireşan et al. 2010), a visible unit becomes a pixel of a digital image, and a hidden unit represents a dependency between pixels in the image.

3.2.2 Deep belief networks

A deep belief network (DBN) is a multi-layer learning architecture that uses a stack of RBMs to extract a deep hierarchical representation of the training data. In such a design, the hidden layer of each sub-network serves as the visible layer for the upcoming sub-network (Hinton 2009).

When learning through a DBN, first the RBM in the bottom layer is trained by inputting the original data into the visible units. Then, the parameters are fixed, and the hidden units of the RBM are used as the input to the RBM in the second layer. The learning process continues until reaching the top of the stacked sub-networks, and finally, a suitable model is obtained to extract features from the input. Since the learning process is unsupervised, it is common to add a supervised learning network to the end of the DBN to use it in a supervised task such as classification or regression.

3.2.3 Autoencoders

An autoencoder is a type of feedforward neural network which is trained to encode the input into some representation such that the input can be reconstructed from that representation (Hinton and Salakhutdinov 2006). Typically, an autoencoder consists of three layers, namely, the input layer, the hidden layer, and the output layer. The number of neurons in the input layer is equal to the number of neurons in the output layer.

An autoencoder reconstructs the input layer at the output layer by using the representation obtained in the hidden layer.
During the learning process, the network uses two mappings, which are referred to as the encoder and the decoder. While the encoder maps the data from the input layer to the hidden layer, the decoder maps the encoded data from the hidden layer to the output layer. An illustration of an autoencoder is given in Fig. 3.

The reconstruction strategy in autoencoders may fail to extract useful features. The resulting model may produce uninteresting solutions, or it may provide a direct copy of the original input. In order to avoid such problems, a denoising factor is applied to the original data. A denoising autoencoder (DAE) is a variant of an autoencoder that is trained to reconstruct the original input from a corrupted form of it. The denoising factor makes autoencoders more stable and robust since they can deal with data corruptions (Vincent et al. 2010).
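A toy denoising autoencoder can be sketched in a few lines: a single hidden layer is trained by backpropagation to reconstruct clean inputs from randomly masked (corrupted) copies. All sizes, the masking rate, and the training schedule below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: each row is an input vector the autoencoder must reconstruct.
X = rng.random((20, 8))
n_in, n_hidden = 8, 3

W1 = rng.normal(0, 0.1, (n_in, n_hidden))   # encoder weights
W2 = rng.normal(0, 0.1, (n_hidden, n_in))   # decoder weights
lr = 0.5

for _ in range(2000):
    # Denoising: corrupt the input by randomly zeroing entries,
    # but train the network to reconstruct the *clean* input.
    mask = rng.random(X.shape) > 0.2
    X_noisy = X * mask

    H = sigmoid(X_noisy @ W1)      # encoder: input -> hidden code
    X_hat = sigmoid(H @ W2)        # decoder: hidden code -> reconstruction

    # Squared-error loss gradients via backpropagation.
    d_out = (X_hat - X) * X_hat * (1 - X_hat)
    d_hid = (d_out @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ d_out / len(X)
    W1 -= lr * X_noisy.T @ d_hid / len(X)

# Reconstruction error on the clean inputs after training.
loss = np.mean((sigmoid(sigmoid(X @ W1) @ W2) - X) ** 2)
print(round(float(loss), 4))
```

Because the hidden layer is narrower than the input, the network is forced to learn a compressed code rather than an identity copy, which is the point of the bottleneck and the noise.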

Fig. 3 An autoencoder

Similar to the way RBMs are combined to build deep belief networks, autoencoders can be stacked to create deep architectures. A stacked denoising autoencoder (SDAE) is composed of multiple DAEs placed one on top of each other.

3.2.4 Recurrent neural networks

A recurrent neural network (RNN) is a class of artificial neural networks that makes use of sequential information (Donahue et al. 2015). An RNN is specialized to process a sequence of values x^(0), x^(1), ..., x^(t). The same task is performed on every element of the sequence, while the output depends on the previous computations. In other words, RNNs have an internal memory that captures information about previous calculations.

Despite the fact that RNNs are designed to deal with long-term dependencies, vanilla RNNs tend to suffer from vanishing or exploding gradients (Hochreiter and Schmidhuber 1997). When backpropagation through time trains the network, the gradient is passed back through many time steps, and it tends to vanish or explode. The popular solutions to this problem are the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures.

3.2.5 Convolutional neural networks

A convolutional neural network (CNN) is a type of feed-forward neural network which applies a convolution operation in place of general matrix multiplication in at least one of its layers. CNNs have been successfully applied to many difficult tasks like image and object recognition, audio processing, and self-driving cars.

A typical CNN consists of three components that transform the input volume into an output volume, namely, convolutional layers, pooling layers, and fully connected layers. These layers are stacked to form convolutional network architectures as illustrated in Fig. 4.

In a typical image classification task using a CNN, the layers of the network carry out the following operations:

1. Convolution: As the core operation, convolutions aim to extract features from the input. Feature maps are obtained by applying convolution filters with a set of mathematical operations.
2. Nonlinearity: In order to introduce nonlinearities into the model, an additional operation, usually ReLU (Rectified Linear Unit), is used after every convolution operation.

Fig. 4 A convolutional neural network

3. Pooling (Subsampling): Pooling reduces the dimensionality of the feature maps to decrease processing time.
4. Classification: The output from the convolutional and pooling layers represents high-level features of the input. These features can be used within the fully connected layers for classification (Zhang and Wallace 2015).

4 Perspectival synopsis of deep learning within recommender systems

The application of deep learning techniques to the recommendation domain is a favorite and trending topic. Deep learning is beneficial for analyzing data from multiple sources and discovering hidden features. Since the data processing ability of deep learning techniques is on the rise due to advances in big data facilities and supercomputers, researchers have already started to benefit from deep learning techniques in recommender systems. They have utilized deep learning techniques to produce practical solutions to the challenges of recommender systems such as scalability and sparsity. Moreover, they have used deep learning for producing recommendations, dimensionality reduction, and feature extraction from different data sources, and for integrating these features into recommender systems. Deep learning techniques are utilized in recommender systems to model either the user-item preference matrix or content/side information, and sometimes both of them. Table 1 lists the publications which use deep learning techniques in these data modeling practices.

4.1 Deep learning techniques for recommendation

In this section, we analyze how and for what purposes deep learning methods are utilized in recommender systems. The techniques described throughout this section are RBMs, DBNs, autoencoders, RNNs, and CNNs. Furthermore, some less conventional methods are also analyzed under the other techniques subsection.

4.1.1 Restricted Boltzmann machines for recommendation

RBMs are particular types of Boltzmann machines, and they have two types of layers, which are the visible softmax layer and the hidden layer. In an RBM, there is no intra-layer communication. RBMs are used to extract latent features of user preferences or item ratings in the recommendation domain (Salakhutdinov et al. 2007; Deng et al. 2017). RBMs have also been utilized to jointly model both the correlations between a user's voted items and the correlations between the users who voted on a particular item, to improve the accuracy of a recommender system (Georgiev and Nakov 2013). RBMs are also used in group-based recommender systems to

Table 1 Purpose of using deep learning in publications regarding data modeling

| User-item preference matrix | Content/side information | Both |
|---|---|---|
| Salakhutdinov et al. (2007) | Oord et al. (2013) | Wang et al. (2015b) |
| Truyen et al. (2009) | Wang and Wang (2014) | Li et al. (2015) |
| Georgiev and Nakov (2013) | Elkahky et al. (2015) | Jia et al. (2015) |
| Sedhain et al. (2015) | Shin et al. (2015) | Lei et al. (2016) |
| Strub and Mary (2015) | Zhou et al. (2016) | Covington et al. (2016) |
| Du et al. (2016) | Shen et al. (2016) | Strub et al. (2016) |
| Zheng et al. (2016b) | Zhang et al. (2016b) | Gunawardana and Meek (2008) |
| Wu et al. (2016b) | Vuurens et al. (2016) | Devooght and Bersini (2017) |
| Baalen (2016) | Zanotti et al. (2016) | Dai et al. (2017) |
| Deng et al. (2017) | Unger et al. (2016) | Ying et al. (2016) |
| Zuo et al. (2016) | | |
