DSpace at VNU: Dealing with the new user cold-start problem in recommender systems: A comparative review

Information Systems
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/infosys

Dealing with the new user cold-start problem in recommender systems: A comparative review

Le Hoang Son
VNU University of Science, Vietnam National University, Vietnam

Article info
Article history: Received 15 September 2014; Received in revised form October 2014; Accepted October 2014. Recommended by D. Shasha.
Keywords: Collaborative filtering; NHSM; New user cold start; Recommender systems

Abstract
The Recommender System (RS) is an efficient tool for decision makers that assists in the selection of appropriate items according to their preferences and interests. This system has been applied to various domains to personalize applications by recommending items such as books, movies, songs, restaurants, news articles and jokes, among others. An important issue for the RS that has greatly captured the attention of researchers is the new user cold-start problem, which occurs when a new user has been registered to the system and no prior rating of this user is found in the rating table. In this paper, we first present a classification that divides the relevant studies addressing the new user cold-start problem into three major groups and summarize their advantages and disadvantages in a tabular format. Next, some typical algorithms of these groups, such as MIPFGWC-CS, NHSM, FARAMS and HU–FCF, are described. Finally, these algorithms are implemented and validated on some benchmark RS datasets under various settings of the new user cold start. The experimental results indicate that NHSM achieves better accuracy and computational time than the relevant methods.
© 2014 Elsevier Ltd. All rights reserved.

Contents
1. Introduction
2. Literature review
3. The analysis of existing methods
3.1 MIPFGWC-CS
3.2 NHSM
3.3 FARAMS
3.4 HU–FCF
4. Experiments
4.1 Environment setup
4.2 Results and discussion
5. Conclusions
Acknowledgments
Appendix A. Supporting information
References

Tel.: +84 904171284; fax: +84 438623938.
E-mail addresses: sonlh@vnu.edu.vn, chinhson2002@gmail.com
Official address: 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam
http://dx.doi.org/10.1016/j.is.2014.10.001
0306-4379/© 2014 Elsevier Ltd. All rights reserved.
Please cite this article as: L.H. Son, Dealing with the new user cold-start problem in recommender systems: A comparative review, Information Systems (2014), http://dx.doi.org/10.1016/j.is.2014.10.001

1. Introduction

The growing development of content-based systems that provide a large amount of data, such as videos, images, blogs, multimedia, and wikis, brings great challenges for analysts attempting to extract useful knowledge and capture meaningful events from the massive data. Machine learning tools should indeed be oriented to what users intend to do and how they want the results to be returned in a given format. An efficient and currently widely used tool that assists decision makers in choosing appropriate items according to their preferences and interests is the Recommender System (RS). Ricci et al. [31] defined the RS as a special type of information system that (i) helps to make choices without sufficient personal experience of the alternatives, (ii) suggests products to customers, and (iii) provides consumers with information to help them decide which products to purchase. The RS is based on a number of technologies, such as information filtering, classification learning, user modeling and adaptive hypermedia, and it is applied to various domains to personalize applications by recommending items such as books, movies, songs, restaurants, news articles and jokes, among others. It has been applied to e-commerce to learn from a customer and recommend products that he or she will find most valuable
from among the available products, thus helping the customer find suitable products to purchase. Some e-commerce RSs are described as follows [37,22]. Amazon.com, the most famous e-commerce RS, is structured with an information page for each book that provides details of the text and purchase information. Two recommendations are found there: books frequently purchased by customers who purchased the selected book, and authors whose books are frequently purchased. eBay.com is another example; it provides the Feedback Profile feature, which allows both buyers and sellers to contribute to the feedback profiles of other customers with whom they have done business. The feedback consists of a satisfaction rating and a specific comment about the other customer. On Moviefinder.com, customers can locate movies with a similar “mood, theme, genre or cast” through Match Maker or by their previously indicated interests through WePredict. It is clear that RSs are becoming increasingly important and influential in various practical applications.

An important issue for RSs that has greatly captured the attention of researchers is the cold-start problem. This problem has two variants: the new user cold-start problem and the new item cold-start problem. The new item cold-start problem occurs when a new item has been transferred to the system. Because it is a new product, it has no user ratings (or the number of ratings is less than a threshold, as defined in some related papers) and is therefore ranked at the bottom of the recommended items list. Moreover, this problem can be partially handled by staff members of the system providing prior ratings to the new item. Thus, research on the cold-start problem concentrates on the new user cold-start problem, in which no prior rating can be supplied owing to the privacy and security of the system. It is difficult to give
a prediction for a specific item under the new user cold-start problem because the basic filtering methods in RSs, such as collaborative filtering and content-based filtering, require the historic ratings of this user to calculate the similarities used to determine the neighborhood. For this reason, the new user cold-start problem can negatively affect the recommender performance because the system is unable to produce meaningful recommendations [33].

Addressing this problem has been the primary focus of various studies in recent years. The aim of this paper is to provide a comparative review of those studies that could answer our research question: “which (group of) algorithm is the most effective among all?” For this purpose, we first provide a classification that divides the relevant studies into three groups: (i) makes use of additional data sources; (ii) selects the most prominent groups of analogous users; and (iii) enhances the prediction using hybrid methods. A table that summarizes the advantages and disadvantages of all groups of methods is presented. Second, some typical algorithms of the groups of methods, such as MIPFGWC-CS [46] (the first group), NHSM [20] (the second group), FARAMS [17] and HU–FCF [42] (the third group), are described in detail. Finally, these algorithms are implemented and validated on some benchmark RS datasets, such as MovieLens [23] and Jester [12], under various settings of the new user cold start. The experimental results are intended to answer the research question stated above.

The remainder of the paper is organized as follows. In Section 2, we present a literature review of the relevant studies according to the three aforementioned groups. Section 3 elaborates on the four typical methods, namely, MIPFGWC-CS [46], NHSM [20], FARAMS [17] and HU–FCF [42]. Section 4 presents the comparative experiments of these algorithms involving benchmark RS datasets. Finally, Section 5 draws conclusions and delineates the future research
directions.

2. Literature review

This section begins with an example that clearly demonstrates the new user cold-start problem.

Example. We have an RS that includes three tables: the users' demographic data (Table 1), the movies' information (Table 2) and the rating data (Table 3). This type of system is able to predict the user rating of a movie, which is expressed in Table 3. Nonetheless, the new user cold-start problem occurs with a new user, e.g., Kim (User ID: 6) in Table 1, who has no prior rating, so that it is difficult to provide a prediction even for the first movie, Titanic (ID: 1).

Table 1. Users' demographic data
ID  Name   Age  Gender  Occupation
1   John   23   Male    Student
2   David  30   Male    Doctor
3   Jenny  29   Male    Student
4   Marry  20   Female  Engineer
5   Tom    30   Male    Engineer
6   Kim    25   Female  Doctor

Table 2. Movies' information
ID  Name     Genre     Date     Sales
1   Titanic  Romantic  9/2004   150
2   Hulk     Horror    10/2005  300
3   Scallet  Romantic  6/2009   200

Table 3. Rating data
User ID  Movie ID  Rating
1 2 3 5 3 2 4 3 ?
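The difficulty in the example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rating values below are hypothetical (they are not the contents of Table 3), and the helper names `pearson` and `neighbours` are introduced here for the sketch. It shows that a plain user-based filter with the Pearson coefficient finds no neighbor for a user without ratings, because there are no co-rated items to compare.

```python
from math import sqrt

# Hypothetical ratings (user ID -> {movie ID: rating}). These values are
# illustrative assumptions, not the contents of Table 3.
ratings = {
    1: {1: 5, 2: 3, 3: 4},
    2: {1: 4, 2: 2, 3: 5},
    3: {1: 2, 2: 5},
    6: {},  # Kim: newly registered, no ratings yet
}

def pearson(u, v, ratings):
    """Pearson correlation over co-rated items, or None when undefined."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return None  # too few co-rated items to compute a correlation
    ru = [ratings[u][i] for i in common]
    rv = [ratings[v][i] for i in common]
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    num = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    den = sqrt(sum((a - mu) ** 2 for a in ru)) * sqrt(sum((b - mv) ** 2 for b in rv))
    return num / den if den else None

def neighbours(target, ratings):
    """Users whose similarity to the target user is defined."""
    sims = {v: pearson(target, v, ratings) for v in ratings if v != target}
    return {v: s for v, s in sims.items() if s is not None}

print(neighbours(1, ratings))  # users 2 and 3 are usable neighbors of user 1
print(neighbours(6, ratings))  # {} : no neighbor can be found for the new user
```

With an empty neighborhood, no weighted average of neighbor ratings can be formed for Kim, which is why the methods reviewed below either bring in other data (for example, the demographic attributes of Table 1) or change how similarity is computed.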
In the following, we briefly summarize the relevant works on the new user cold-start problem. As of 2014, various works have aimed to handle this problem. These studies can be divided into three categories: (i) makes use of additional data sources, (ii) chooses the most prominent groups of analogous users, and (iii) enhances the prediction using hybrid methods.

a) The principal idea of the first group is the use of additional sources, such as demographic data (a.k.a. the users' profile), the users' opinions, and social tags, for a better selection of the neighbors of the new user. Vozalis and Margaritis [49] demonstrated a modified version of k-nearest neighborhood by adding a user demographic vector to the user profile and embedding it in the collaborative filtering algorithm for the calculation of similarity. Poirier et al. [27] proposed a method that exploits blog textual data to reduce the cold-start problem by labeling subjective texts according to their expressed opinions to construct a user–item-rating matrix and establishing recommendations through collaborative filtering. Zhang et al. [53] presented a recommendation algorithm that makes use of social tags, particularly user-tag-object tripartite graphs, to provide more personalized recommendations when the assigned tags belong to diverse topics. Almazro et al. [3] introduced a hybrid demographic-based and collaborative filtering approach in the movie domain using demographic data to enhance the recommendation process. Their method classified the genres of movies based on demographic attributes, e.g., user age (child, teenager or adult), student (yes or no), have children (yes or no) and gender (female or male). Preisach et al. [28] argued that many user profiles contain untagged resources that could provide valuable information, especially for the cold-start problem, and proposed a purely graph-based semi-supervised relational approach that uses untagged posts. Said et
al. [34,36] modified the user similarity calculation method to employ the hybridization of demographic and collaborative approaches. A modification to the k-nearest neighborhood that calculates the similarity scores between the target user and other users was introduced. Wang et al. [50] introduced Credible and co-clustering filterBot for cold-stArt recommendations (COBA), which uses the rating confidence level to reduce the dimensionality of the item–user matrix. The items and users were co-clustered, and the ratings within every user cluster were smoothed to overcome data sparsity. The recommendations were fused from item and user clusters to predict user preference. Zhang et al. [52] proposed the Cold-start Recommendations Using Collaborative Filtering (CRUC) scheme, which involves formulation, filtering and prediction steps. They assumed that users are tracked by sensors such that each user has their own location, which is regarded as the item. The item–user matrix was normalized and clustered to identify users who have a significant influence on the recommendation. The prediction step was performed by taking a hybrid of the item-based and user-based filtering methods. Chen et al. [8] employed additional information, such as the social sub-community and an ontology decision model, to assist the recommendation in the cold-start problem. The social sub-community was divided according to the existing users' history data and by mining the relationships between users. An ontology decision model was then constructed on the basis of the sub-community and the users' static information, which makes recommendations for the new user based on his or her static ontology information. Guo [11] proposed three different approaches from the perspective of preference modeling. First, the ratings of trusted neighbors were merged to form a new rating profile for the active users, based on which better recommendations can be generated. Second, a novel Bayesian similarity measure was introduced by
taking both the direction and length of rating vectors into account. Third, a new information source called prior ratings, based on virtual product experience in virtual reality environments, was proposed to inherently resolve the concerned problems. Chen et al. [7] proposed a cold-start recommendation method for the new user that integrates a user model with trust and distrust networks to identify trustworthy users, whose ratings are then aggregated to provide useful recommendations for new users. Demographic data or users' profiles are the most common additional source for solving the cold-start problem. Safoury and Salah [33] presented a framework for evaluating the influence of demographic attributes on the user ratings. This framework was examined using a movie dataset to evaluate the accuracy and precision of the generated recommendations. Nazir and Yadav [24] introduced a profile-based approach that consists of three main phases: Fetch, Process and Truncate. The Fetch phase is concerned with obtaining the required parameters, such as the Global Student Profile or Shared Profile (Profile Token), for the Process phase. Process is the main engine where the actual recommendations are generated based on inputs from the Fetch phase. Truncate discards the profile tokens used during the Fetch phase. Formoso et al. [9] proposed a novel profile-expansion approach that includes three types of techniques, namely, item-global, item-local and user-local, based on the query expansion techniques in information retrieval. The experimental
evaluation showed that both item-global and user-local offer outstanding improvements in precision. Son et al. [46] presented a novel filtering method based on fuzzy geographically clustering [45,38–44], the so-called MIPFGWC-CS, that can handle the issues of selected demographic attributes, the similarities between items and the missing ratings that existed in relevant demographic-based algorithms. Rosli et al. [32] designed a new measure by combining similarity values obtained from a movie “Facebook Page”. First, the users' similarities were computed according to the ratings cast on the Movie Rating System. Then, the similarity values obtained from a user's genre interest in the “Like” information extracted from “Facebook Pages” were combined. Finally, all of the similarity values were integrated to produce a new user's similarity value. Lika et al. [18] proposed a model that incorporates classification methods with demographic data for the identification of other users with similar behaviors. Limitations of the first group: although the additional data sources are necessary, we sometimes do not have these types of data for the selection, e.g., in some e-shopping systems where users do not record their profiles and associated Facebook/Twitter accounts.

b) The idea of the second group is to improve the methods that determine the analogous users without the aid of additional data sources. Ahn [2] addressed the limitations of the existing methods for the new user cold-start problem by primarily focusing on the similarity measures, such as the Pearson coefficient and the cosine measure, and proposed a heuristic similarity measure, the so-called PIP (Proximity–Impact–Popularity) measure. The Proximity factor is based on the arithmetic difference between two ratings, the Impact factor considers how strongly an item is preferred or disliked by buyers, and the Popularity factor gives greater value to a similarity for ratings that are further from the average rating of a co-rated item. Lam et
al. [15] discussed a hybrid model based on the analysis of two probabilistic aspect models using pure collaborative filtering combined with users' information. Sun et al. [47] clustered users based on the user–item rating matrix and then utilized the clustering results and users' demographic information to construct a decision tree that captures the associations between the existing users and the new users. The predictions for new users were made by combining the decision tree with the collaborative filtering algorithm. Zhou et al. [54] presented functional matrix factorization (fMF), a novel cold-start recommendation method that constructs a decision tree with each node being a question. fMF enables the recommender to query a user adaptively according to her prior responses and associates latent profiles with each node of the tree to gradually refine the profiles. It also includes an iterative optimization scheme that alternates between decision tree construction and latent profile extraction. Qiu et al. [29] introduced an item-oriented function and incorporated it into a hybrid algorithm between heat conduction and the probability spreading process so that the proposed algorithm does not require any additional information, such as tags. Liu et al. [21] noted that the existing recommendation methods lacked a principled model for guiding how to select the most useful ratings and that ratings on the selected representatives are considerably more useful for making recommendations; thus, they proposed a principled approach to identify representative users and items using representative-based matrix factorization. Bobadilla et al. [5] presented a new similarity measure using optimization based on neural learning, which exceeds the best results obtained with current metrics, and described the mathematical formalization that shows how to obtain the main quality measures of a recommender system using leave-one-out cross validation. Said et al. [35] performed a set of tests to identify
whether weighting schemes applied to three common similarity measures, using two different movie datasets, can be beneficial for overcoming problems related to cold-start, as well as for profiling users to generate more accurate profiles that are not based on the most popular items. They claimed that the weighting schemes appear to have little effect on datasets with a wide rating scale and a high concentration of ratings on popular items. Moreover, the cosine measure is essentially unaffected by any weighting scheme and produces identical results regardless of whether weighting is applied. Sun et al. [48] proposed a novel algorithm that learns to conduct the interview process guided by a decision tree with multiple questions at each split. The splits, represented as sparse weight vectors, are learned through an L_1-constrained optimization framework. The users are directed to child nodes according to the inner product of their responses and the corresponding weight vector. A linear regressor is learned within each node, using all previously obtained answers as inputs to predict item ratings. Liu et al. [20] presented a new user similarity model, NHSM, that takes into account the global preference of user behaviors in addition to the local context information of user ratings to improve the recommendation performance in the cold-start situation. Limitations of the second group: how to choose the optimal number of groups and the splitting criteria is worth considering.

c) After determining the most analogous users to the new one, some authors used hybrid methods for the calculation of
similarity and/or the prediction of ratings. This is the basic idea of the third group. Leung et al. [16,17] introduced a collaborative filtering framework based on Fuzzy Association Rules and Multiple-level Similarity (FARAMS), which extends existing techniques by using fuzzy association rule mining and takes advantage of product similarities in taxonomies to address data sparseness and non-transitive associations. Basiri et al. [4] proposed a hybrid recommender system using the optimistic exponential type of an ordered weighted averaging operator to fuse the output of recommender system strategies for the new user cold-start problem. Kim et al. [13] presented a method for the cold-start problem that includes the prediction of actual ratings, the identification of prediction errors for each user, and the construction of an error-reflected model for the prediction of new users or items. Ge and Ge [10] claimed that a lower-rank approximation could remove data noise resulting from unstable user behaviors and thus lead to better recommendation quality; based upon this idea, they proposed Singular Value Decomposition-based Collaborative Filtering. Kim et al. [14] proposed three hybrid recommenders based on user similarity and two content-boosted recommenders used in conjunction with interaction-based collaborative filtering; they experimentally showed that the best hybrid and content-boosted recommenders improve on the interaction-based collaborative filtering method. Quijano-Sánchez et al. [30] extended a group recommender system with a case base of previous group recommendation events. Carrer-Neto et al. [6] presented a hybrid recommender system based on knowledge and social networks. Negre et al. [25] introduced a process for solving the cold-start problem in data warehouses that is composed of four steps: patternizing OLAP queries, predicting candidate operations, computing candidate recommendations and ranking these recommendations. Xie et al. [51] proposed Elver, which employs an
iterative matrix completion technology and a nonnegative factorization procedure to work with meager content inklings to recommend and optimize page-interest targeting on Facebook. Aharon et al. [1] introduced a recommendation algorithm based on Latent Factor analysis called One-pass Factorization of Feature Sets (OFF-Set), which is able to model non-linear interactions between pairs of features and updates its model with each recommendation-reward observation in a purely online fashion. Lin et al. [19] described a method that considers the nascent information culled from Twitter to provide relevant recommendations in cold-start situations. Nilashi et al. [26] proposed new recommendation methods using ANFIS and SOM clustering. A hybrid user-based fuzzy collaborative filtering method has been proposed by Son [42]. Limitations of the third group: irrelevant users are still included in the computation of similarities.

d) Table 4 summarizes the relevant works by groups along with their advantages and disadvantages.

Table 4. Groups of methods for the new user cold-start problem

Group 1: Makes use of additional data sources
  Ideas: Use some data (users' profile, opinions, social tags) to support the selection of the neighbors of the new user
  Typical algorithms: MIPFGWC-CS [46]
  Advantages: Determination of analogous users is more accurate
  Disadvantages: Additional data are sometimes not available

Group 2: Chooses the most prominent groups of analogous users
  Ideas: Improve the methods determining the analogous users by clustering algorithms, decision trees
  Typical algorithms: NHSM [20]
  Advantages: The similarity degrees between users are enhanced; additional data are not required
  Disadvantages: How to choose the optimal number of groups and the splitting criteria is worth considering

Group 3: Enhances the prediction using hybrid methods
  Ideas: Use hybrid methods for the calculation of similarity and/or the prediction of ratings
  Typical algorithms: FARAMS [17]; HU–FCF [42]
  Advantages: Utilizes the results of existing methods for prediction; controls the final results by parameters
  Disadvantages: Specification of values of parameters is hard; irrelevant users are still included in the computation of similarities

3. The analysis of existing methods

In this section, we provide more details about the typical algorithms of the groups of methods in Table 4 that address the new user cold-start problem. These algorithms are MIPFGWC-CS [46], NHSM [20], FARAMS [17] and HU–FCF [42], and they are presented in the subsequent sections.

3.1 MIPFGWC-CS

The basic concept of the MIPFGWC-CS algorithm [46] is to use fuzzy geographically clustering [45,38–44], particularly MIPFGWC, for the determination of similar users with respect to all attributes in the demographic data. Because the new user has no prior rating, the demographic data are the only medium for calculating the similarities between users. After finding users similar to the new one, MIPFGWC-CS checks whether they rated the considered item. If ratings are found, then they are considered to be the representative ratings of those users. Otherwise, an item similar to the considered one is found by the Pearson coefficient, and the rating on the similar item is assumed to be the representative rating. Finally, the rating of the new user on the considered item is approximated by the weighted average of the representative ratings. Fig. 1 and Table 5 illustrate the idea in detail. As described above, the MIPFGWC-CS algorithm has several disadvantages, as follows:

a) Determining the optimal number of clusters for MIPFGWC is required before running the clustering algorithm. Although
other parameters of MIPFGWC were suggested by Son et al. [39], how to determine the optimal number of clusters is still an ongoing topic of research. Knowing the exact number of clusters would lead to more accurate results when finding the users similar to a new user and thus enhance the prediction accuracy.

b) In Fig. 1, finding an item similar to the considered one by the Pearson coefficient may not achieve good results because the Pearson metric has limitations when there is a poor signal-to-noise ratio or negative spikes. In other words, if the relationship between two variables is non-linear, the Pearson coefficient cannot accurately measure the correlation.

c) MIPFGWC-CS relies solely on the demographic data (Fig. 1). If this type of data is not available, the algorithm cannot be performed.

Fig. 1. The MIPFGWC-CS algorithm.

Table 5. The pseudo-code of the MIPFGWC procedure

Input:
  Geo-demographic data $X$
  The number of elements (clusters): $N$ ($C$)
  The dimension of the dataset: $r$
  Threshold $\varepsilon$ and other parameters $m, \eta, \tau$, $a_i$ $(i = 1, \dots, 3)$, $\gamma_j$ $(j = 1, \dots, C)$
  Geographic parameters $\alpha, \beta, \gamma, a, b, c, d$
Output:
  Final membership values $u'$ and centers $V^{(t+1)}$

MIPFGWC:
1: Set the number of clusters $C$, threshold $\varepsilon$ and other parameters such as $m, \eta, \tau > 1$, $a_i$ $(i = 1, \dots, 3)$, $\gamma_j$ $(j = 1, \dots, C)$ as in [39].
2: Initialize the centers of clusters $V_j$, $j = 1, \dots, C$, at $t = 0$.
3: Set the geographic parameters $\alpha, \beta, \gamma, a, b, c, d$ satisfying condition (1):
\[ \alpha + \beta + \gamma = 1. \tag{1} \]
4: Use the following formulas to calculate the membership values, the hesitation level and the typicality values, respectively, for $k = 1, \dots, N$ and $j = 1, \dots, C$:
\[ u_{kj} = \frac{1}{\sum_{i=1}^{C} \left( \dfrac{\|X_k - V_j\|}{\|X_k - V_i\|} \right)^{\frac{2}{m-1}}}, \tag{2} \]
\[ h_{kj} = \frac{1}{\sum_{i=1}^{C} \left( \dfrac{\|X_k - V_j\|}{\|X_k - V_i\|} \right)^{\frac{2}{\tau-1}}}, \tag{3} \]
\[ t_{kj} = \frac{1}{1 + \left( \dfrac{a_2 \|X_k - V_j\|^2}{\gamma_j} \right)^{\frac{1}{\eta-1}}}. \tag{4} \]
5: Perform geographic modifications through Eqs. (5) and (6):
\[ u'_k = \alpha \times u_k + \beta \times \frac{1}{A} \sum_{j=1}^{k-1} w_{kj} \times u'_j + \gamma \times \frac{1}{A} \sum_{j=k}^{C} w_{kj} \times u_j, \tag{5} \]
\[ w_{kj} = \begin{cases} \dfrac{(pop_k \times pop_j)^b \times pc_{kj}^c \times IM_{kj}^d}{d_{kj}^a}, & k \neq j, \\ 0, & \text{else}. \end{cases} \tag{6} \]
6: If $\{u'_k\}$ is a completely monotone increasing sequence or $u_k \geq u'_k$ for most $k = 1, \dots, C$, then conclude that there is no suitable solution for the given geographic parameters. Otherwise, go to Step 7.
7: Calculate the centers of clusters at $t + 1$ by Eq. (7):
\[ V_j = \frac{\sum_{k=1}^{N} \left( a_1 u_{kj}^m + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau} \right) X_k}{\sum_{k=1}^{N} \left( a_1 u_{kj}^m + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau} \right)}, \quad j = 1, \dots, C. \tag{7} \]
8: If the difference $\|V^{(t+1)} - V^{(t)}\| \leq \varepsilon$, then stop the algorithm. Otherwise, assign $V^{(t)} = V^{(t+1)}$ and return to Step 4.

3.2 NHSM

Liu et al. [20] introduced a new similarity metric called NHSM to replace the traditional Pearson coefficient or the cosine similarity measure. This heuristic similarity measure is composed of three factors of similarity: Proximity, Significance and Singularity. Proximity considers the distance between two ratings. Significance reflects that ratings are more significant if the two ratings are more distant from the median rating. Singularity represents how the two ratings differ from other ratings. Furthermore, NHSM integrates the modified Jaccard measure and the user rating preference in its design. The definition of NHSM is stated below:
\[ sim(u,v)^{NHSM} = sim(u,v)^{JPSS} \cdot sim(u,v)^{URP}, \tag{8} \]
\[ sim(u,v)^{URP} = 1 - \frac{1}{1 + \exp\left( -|\mu_u - \mu_v| \cdot |\sigma_u - \sigma_v| \right)}, \tag{9} \]
\[ sim(u,v)^{JPSS} = sim(u,v)^{PSS} \cdot sim(u,v)^{Jaccard'}, \tag{10} \]
\[ sim(u,v)^{Jaccard'} = \frac{|I_u \cap I_v|}{|I_u| \times |I_v|}, \tag{11} \]
\[ sim(u,v)^{PSS} = \sum_{p \in I} Proximity(r_{u,p}, r_{v,p}) \times Significance(r_{u,p}, r_{v,p}) \times Singularity(r_{u,p}, r_{v,p}), \tag{12} \]
\[ Proximity(r_{u,p}, r_{v,p}) = 1 - \frac{1}{1 + \exp\left( -|r_{u,p} - r_{v,p}| \right)}, \tag{13} \]
\[ Significance(r_{u,p}, r_{v,p}) = \frac{1}{1 + \exp\left( -|r_{u,p} - r_{med}| \cdot |r_{v,p} - r_{med}| \right)}, \tag{14} \]
\[ Singularity(r_{u,p}, r_{v,p}) = 1 - \frac{1}{1 + \exp\left( -\left| \dfrac{r_{u,p} + r_{v,p}}{2} - \mu_p \right| \right)}, \tag{15} \]
where $\mu_p$ is the mean rating of item $p$, and $\mu_u$ and $\sigma_u$ are the mean rating and the standard deviation of the ratings of user $u$, respectively. $I_u$
represents the set of items rated by user $u$. The operator $\cap$ denotes the ratings common to two users. $r_{u,p}$ is the rating of user $u$ on item $p$, and $r_{med}$ is the median value of the rating scale. Fig. 2 describes the filtering method using the NHSM metric. The limitations of the NHSM-based filtering algorithm are as follows:

a) The algorithm is based solely on the rating data and makes no use of additional data, such as demographic data; this can lead to inaccurate calculations of the similarity.

b) The algorithm must assume that the new user has already provided some ratings in the rating data.

Fig. 2. The NHSM-based filtering algorithm.

3.3 FARAMS

Leung et al. [17] integrated fuzzy set theory into association rule mining techniques and applied the proposed work to the collaborative filtering of recommender systems. First, the rating data are converted to the transactional database of association rule mining, fuzzified by the fuzzy memberships of linguistic variables and transformed into a transaction ID (TID) and items format, where each TID is in the form of {Item, linguistic variable} and each item is a list of users, with their fuzzy memberships, that opted for the {Item, linguistic variable}. Then, an Apriori-like algorithm is used to define candidate item sets and possible rules with the support of the MinSupp and MinConf thresholds. The difference between this algorithm and the original Apriori algorithm is the use of the Fuzzy Support $FS_{\langle A,X \rangle}$ and the Fuzzy Confidence $FC_{\langle\langle A,X \rangle, \langle B,Y \rangle\rangle}$ between two items $A$ and $B$ equipped with their memberships $X$ and $Y$, respectively (Eqs. (16)–(21)). After defining the fuzzy rules, the predicting score of a recommendable item is calculated and used to provide the final rating of the new user. Fig. 3 highlights the concept in detail.
\[ FS_{\langle A,X \rangle} = \frac{\sum_{t_i \in T} \prod_{a_j \in A} \mu(t_i[a_j])}{|T|}, \tag{16} \]
\[ FC_{\langle\langle A,X \rangle, \langle B,Y \rangle\rangle} = \frac{FS_{\langle A \cup B, X \cup Y \rangle}}{FS_{\langle A,X \rangle}}, \tag{17} \]
\[ CORR_{\langle\langle A,X \rangle, \langle B,Y \rangle\rangle} = \frac{Cov_{\langle\langle A,X \rangle, \langle B,Y \rangle\rangle}}{\sqrt{Var_{\langle A,X \rangle} \times Var_{\langle B,Y \rangle}}}, \tag{18} \]
\[ Cov_{\langle\langle A,X \rangle, \langle B,Y \rangle\rangle} = FS_{\langle A \cup B, X \cup Y \rangle} - FS_{\langle A,X \rangle} \times FS_{\langle B,Y \rangle}, \tag{19} \]
\[ Var_{\langle A,X \rangle} = FS_{\langle A,X \rangle^2} - \left( FS_{\langle A,X \rangle} \right)^2, \tag{20} \]
\[ FS_{\langle A,X \rangle^2} = \frac{\sum_{t_i \in T} \left\{ \prod_{a_j \in A} \mu(t_i[a_j]) \right\}^2}{|T|}. \tag{21} \]
In Eq. (16), $\langle A, X \rangle$ represents an ⟨Itemset, FuzzySet⟩ pair, $t_i[a_j]$ is the value of $a_j$ in the $i$th record of the transactional database $T$, and $\mu(t_i[a_j])$ is the membership value of $t_i[a_j]$. Eqs. (16)–(18) provide the formulas of Fuzzy Support, Fuzzy Confidence and Correlation, respectively. The limitations of the FARAMS algorithm are as follows:

a) The fuzzification in FARAMS could lead to inaccurate prediction results. The FARAMS algorithm was designed for movie applications, e.g., MovieLens, Jester and EachMovie, where the linguistic variables are “Like”, “Dislike” and “Neutral” with pre-defined membership functions. When it is applied to other applications, knowing how to set up the membership functions is a matter of concern. Wrong membership values would affect the activities of the algorithm. In fact, not all recommender system applications require fuzzy parameters; thus, for the sake of stability and processing time, the fuzzification step should be reduced.

b) The limitation regarding rating data noted for the NHSM-based filtering algorithm also applies here.

c) The FARAMS algorithm could be regarded as an efficient method for calculating the similarity between items.

Fig. 3. The FARAMS algorithm.

3.4 HU–FCF

The basic concept of the HU–FCF method [42] is to integrate the fuzzy similarity
degrees between users, which are based on the demographic data, with the hard user-based degrees calculated from the rating histories, yielding the final similarity degrees. As such, those degrees reflect more exactly the correlation between users in terms of both internal information (attributes of users) and external information (interactions between users). Each similarity degree (fuzzy/hard) is accompanied by a weight that is automatically calculated according to the number of analogous users. After the final similarity degrees are calculated, the final rating is constructed from the rating values of the neighbors of the considered user. Depending on the domain of a specific problem, the final rating is approximated to its nearest value in that domain, accompanied by an error threshold that is normally less than 5%. A list of nearest values with equivalent error thresholds is also given as the predicted ratings of a user for an item. The figure captioned "The HU–FCF algorithm" illustrates the concept in detail.

The limitations of the HU–FCF algorithm are as follows:

a) If the demographic data are not provided in the data list, the HU–FCF algorithm does not work because the rating data contain no prior ratings of the new user. Thus, the similarities between the new user and others cannot be calculated, and the final rating cannot be found.

b) Similar to the deficiencies of the MIPFGWC-CS algorithm, the Pearson coefficient cannot accurately measure the correlation. Thus, a better similarity metric should be used instead of the Pearson coefficient.

c) In the branch of demographic data, the GFD matrix is calculated from all users in the system. Consequently, irrelevant users may be included in the computation of similarities, thus degrading the performance of the prediction.

4 Experiments

4.1 Environment setup

Experimental tools: we have implemented MIPFGWC-CS [46], NHSM [20], FARAMS [17] and HU–FCF [42] in the C programming language and executed them on a PC with an Intel Pentium CPU 2.66 GHz, GB RAM, and 80 GB HDD.

Experimental datasets: we use the following benchmark RS datasets.

MovieLens 1M [23]: contains 1,000,209 anonymous ratings of approximately 3900 movies provided by 6040 MovieLens users. Ratings are discrete values from 1 to 5. Demographic data are provided in the following form: "Gender: Age: Occupation: Zipcode".

Jester [12]: contains ratings of 100 jokes from 73,421 users. Ratings are real values ranging from −10 to 10. The value "99" corresponds to "null" = "not rated". Demographic data are no longer supported for this dataset.

Generating cold-start users: we adopt the Hold-out and the k-fold cross validation methods, where the users in the testing set are the cold-start users. For each cold-start user, we use the algorithms to predict the ratings of the items that the user has rated, except for three rated items that are selected as the basis for the calculation of similarity. Each trial is measured by the evaluation indices. The final results are computed as the average over users and trials.

Evaluation indices: we use the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) for the validation of accuracy:

$$MAE = \frac{1}{N} \sum_{u,i} \left| p_{u,i} - r_{u,i} \right|, \tag{22}$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{u,i} \left( p_{u,i} - r_{u,i} \right)^2}, \tag{23}$$

where $p_{u,i}$ ($r_{u,i}$) is the predicted (real) rating of user u for item i, and N is the number of predicted ratings.

Experimental objectives: we compare the accuracy and the computational time of the algorithms to determine the most effective algorithm.

4.2 Results and discussion

First, we present the comparative results of the algorithms using the Hold-out cross validation method in Table 6.

Table 6. The experimental results using the Hold-out cross validation method

Measure                  Dataset     MIPFGWC-CS   NHSM     FARAMS   HU–FCF
MAE                      MovieLens   0.701        0.641    0.636a   0.697
MAE                      Jester      –            0.821a   0.899    0.895
RMSE                     MovieLens   0.866        0.818a   0.878    0.903
RMSE                     Jester      –            1.040a   1.091    1.098
Computational time (s)   MovieLens   1335.49      5.36a    31.88    436.58
Computational time (s)   Jester      –            12.56a   44.94    362.47

a Smallest value for a given dataset.

The experimental results are evaluated by the evaluation indices. In this table, the results of MIPFGWC-CS on the Jester dataset are null because this dataset does not support the demographic data, which are essential for the calculation in MIPFGWC-CS. The experimental results show that the MAE and RMSE values and the computational time of NHSM are mostly smaller than those of the other algorithms. To visualize the experimental results, the MAE and RMSE values of the algorithms on the MovieLens and Jester datasets are presented in Figs. 5 and 6, respectively.

Fig. 5. The MAE and RMSE values of algorithms on the MovieLens dataset.

Fig. 6. The MAE and RMSE values of algorithms on the Jester dataset.

In most cases, the MAE and RMSE values of NHSM are the smallest among all of the algorithms. For instance, the MAE value of NHSM on the Jester dataset is 0.821, which is approximately 91.3% and 91.7% of those of FARAMS and HU–FCF, respectively. Similarly, the RMSE value of NHSM on the Jester dataset is 1.04, which is approximately 95.3% and 94.7% of those of FARAMS and HU–FCF, respectively. The only case in which the MAE value of NHSM is larger than those of other algorithms occurs
on the MovieLens dataset, with the MAE values of NHSM, MIPFGWC-CS, FARAMS and HU–FCF being 0.641, 0.701, 0.636 and 0.697, respectively. Even there, the MAE value of NHSM is larger only than that of FARAMS and is smaller than those of the other algorithms. Thus, the experimental results show that NHSM obtains better accuracy than the other algorithms in terms of the MAE and RMSE evaluation indices.

Nonetheless, the MAE and RMSE values of the algorithms vary according to the datasets. Specifically, for a dataset having both the demographic and rating sets, such as MovieLens, the accuracies of the algorithms are considerably better than those for a dataset having the rating set only, such as Jester. The mean accuracies of all algorithms in terms of MAE and RMSE on the MovieLens dataset are 0.67 and 0.87, respectively. Those values on the Jester dataset are 0.87 and 1.08, respectively. This result suggests that all algorithms work more efficiently on data having both the demographic and rating sets than on data having the rating set only. Figs. 5 and 6 illustrate this fact: the maximal bar value in the case of the MovieLens dataset is smaller than that in the case of Jester.

The computational time is another advantage of NHSM. This algorithm takes only 5.36 and 12.56 s to produce the results on the MovieLens and Jester datasets, respectively, while the other algorithms spend more time. The computational time of FARAMS on the MovieLens and Jester datasets is 31.88 and 44.94 s, respectively. Similarly, the computational time of HU–FCF is approximately 436 and 362 s, respectively. The computational time of MIPFGWC-CS on the MovieLens dataset is 1335 s. The computational time reflects the mechanism of an algorithm and its efficiency in terms of processing. To this extent, NHSM is the most effective algorithm because it takes little processing time while keeping the best accuracy.

From the results in Table 6 and Figs. 5 and 6, we have extracted the following remarks on the efficiencies of the algorithms with the Hold-out cross validation method:

a) The accuracy of NHSM is mostly better than those of the relevant algorithms, such as MIPFGWC-CS, FARAMS and HU–FCF, especially on RS data that have both the demographic and rating data, e.g., MovieLens.

b) The computational time of NHSM is also better than that of the other algorithms.

Second, we performed another test of the algorithms using the k-fold cross validation method to generate the new user cold-start. The experimental results on the MovieLens and the Jester datasets are presented in Tables 7 and 8. Each table presents the MAE, RMSE and the computational time of all algorithms according to the number of folds. For instance, in the 5-fold setting, the dataset is randomly divided into 5 parts, of which 4 parts are used for the training set and the remaining part is reserved for the testing set. All users in the testing set are new cold-start users, and all their rated items in the testing set are cleared. Applying the experimental algorithms to the dataset, we calculate the MAE, RMSE and the computational time of the algorithms. We continue to randomly divide the original dataset into 5 parts and perform similar tasks until the prescribed number of repetitions of the division is exceeded. The final MAE, RMSE and computational time of the algorithms are calculated as the average results over the individual divisions. Through this cross validation method, we can assume that the generated new user cold-start does not depend on one particular division of the original dataset, as it does in the Hold-out method. In addition, this produces various settings of the new user cold-start for the validation of all algorithms.

Table 7. The results of k-fold cross validation on the MovieLens dataset

MAE
Fold   MIPFGWC-CS   NHSM     FARAMS   HU–FCF
2      0.804        0.758a   0.790    0.806
3      0.782        0.679a   0.744    0.793
4      0.798        0.659a   0.765    0.731
5      0.705        0.687    0.658a   0.698
6      0.692        0.684    0.707    0.672a
7      0.708        0.665    0.641a   0.652
8      0.723        0.632a   0.648    0.641
9      0.693        0.615    0.605a   0.625
10     0.672        0.591a   0.603    0.608

RMSE
Fold   MIPFGWC-CS   NHSM     FARAMS   HU–FCF
2      1.203        1.072a   1.138    1.163
3      1.045        0.945a   1.025    1.032
4      0.998        0.909a   0.986    0.969
5      0.990        0.858a   0.972    0.912
6      0.972        0.963    0.840a   0.893
7      0.806a       0.982    0.880    0.886
8      0.882        0.822a   0.941    0.842
9      0.856        0.885    0.859    0.843a
10     0.824        0.801a   0.845    0.840

Computational time (s)
Fold   MIPFGWC-CS   NHSM     FARAMS   HU–FCF
2      963.26       4.3a     48.3     345.32
3      1123.2       5.6a     49.6     356.70
4      1254.2       6.3a     51.6     489.63
5      1321.2       6.2a     54.3     478.34
6      1345.4       6.8a     58.5     498.43
7      1235.2       7.7a     60.8     552.18
8      1543.6       8.0a     65.2     603.22
9      1537.7       10.3a    76.8     643.43
10     1843.4       10.9a    80.3     668.98

a Smallest value for a given dataset.

Table 8. The results of k-fold cross validation on the Jester dataset

MAE
Fold   MIPFGWC-CS   NHSM     FARAMS   HU–FCF
2      –            0.898a   0.903    1.123
3      –            0.844a   0.878    1.102
4      –            0.825a   0.853    1.097
5      –            0.887    0.835a   1.002
6      –            0.814a   0.819    0.992
7      –            0.795    0.784a   0.947
8      –            0.747a   0.793    0.909
9      –            0.745    0.742a   0.832
10     –            0.703a   0.723    0.808

RMSE
Fold   MIPFGWC-CS   NHSM     FARAMS   HU–FCF
2      –            1.203a   1.304    1.534
3      –            1.102a   1.134    1.432
4      –            1.003a   1.145    1.269
5      –            0.993a   1.091    1.101
6      –            0.987a   0.989    0.994
7      –            1.002    0.963a   0.982
8      –            0.982    0.942a   0.985
9      –            0.972    0.923a   0.934
10     –            0.897a   0.912    0.915

Computational time (s)
Fold   MIPFGWC-CS   NHSM     FARAMS   HU–FCF
2      –            12.4a    75.2     304.5
3      –            13.4a    76.0     334.2
4      –            15.2a    79.4     365.6
5      –            15.6a    85.6     398.0
6      –            17.2a    88.2     415.6
7      –            18.4a    91.8     405.9
8      –            18.9a    93.8     425.3
9      –            18.5a    96.9     420.6
10     –            18.4a    100.3    489.5

a Smallest value for a given dataset.

From Table 7, we illustrate the MAE and RMSE values of the algorithms by the number of folds on the MovieLens dataset in Figs. 7 and 8, respectively. By taking the average MAE (resp. RMSE) values of the algorithms over the number of folds, we create the bar charts in Fig. 9 (resp. Fig. 10). These figures demonstrate the average accuracy of the algorithms regardless of the cross validation method used to generate the new user cold-start.

Fig. 7. The MAE values of algorithms by the number of folds on the MovieLens dataset.

Fig. 8. The RMSE values of algorithms by the number of folds on the MovieLens dataset.

Fig. 9. The average MAE values of algorithms by the number of folds on MovieLens.

Fig. 10. The average RMSE values of algorithms by the number of folds on MovieLens.

These results show that the average MAE value of NHSM is better than those of the other algorithms, being approximately 90.7%, 96.9% and 95.8% of those of MIPFGWC-CS, FARAMS and HU–FCF, respectively. The corresponding figures in terms of RMSE in Fig. 10 are 96%, 97.1% and 98.3%, respectively. The descending order of the algorithms in terms of accuracy is NHSM, FARAMS, HU–FCF and MIPFGWC-CS.

Figs. 11–14 illustrate the experimental results on the Jester dataset. In Fig. 11 (resp. Fig. 12), the MAE (resp. RMSE) values of the algorithms by the number of folds on the Jester dataset are depicted. From these figures, we calculate the average MAE (resp. RMSE) values of the algorithms over the number of folds on Jester in Fig. 13 (resp. Fig. 14).

Fig. 11. The MAE values of algorithms by the number of folds on the Jester dataset.

Fig. 12. The RMSE values of algorithms by the number of folds on the Jester dataset.

Fig. 13. The average MAE values of algorithms by the number of folds on Jester.

Fig. 14. The average RMSE values of algorithms by the number of folds on Jester.

The average MAE and RMSE values of the NHSM algorithm are the smallest among all of the algorithms. According to Fig. 13, the average MAE value of NHSM is equal to 99% and 82.4% of those of FARAMS and HU–FCF, respectively. Analogously, the average RMSE value of NHSM in Fig. 14 is equal to 97.2% and 90.1% of those of FARAMS and HU–FCF, respectively. These results show that NHSM achieves better accuracy than other relevant algorithms, such as FARAMS and HU–FCF, even on a dataset that has only the rating set, e.g., Jester.

Finally, we present the average
computational time of the algorithms on both datasets in Fig. 15. The results demonstrate that the processing time of NHSM is still the smallest among all of the algorithms.

Fig. 15. The average computational time of algorithms on both datasets (s).

From the results in Tables 7 and 8 and Figs. 7–15, we have extracted the following remarks regarding the efficiencies of the algorithms with the k-fold cross validation method:

a) The descending order of the algorithms in terms of accuracy is NHSM, FARAMS, HU–FCF and MIPFGWC-CS, with the average MAE values being 0.663 ± 0.143, 0.684 ± 0.130, 0.691 ± 0.288 and 0.731, respectively. The average RMSE values of these algorithms are 0.915 ± 0.1, 0.943 ± 0.101, 0.931 ± 0.196 and 0.953, respectively.

b) The average computational time of NHSM is approximately 11 s, which is smaller than that of FARAMS (74 s), HU–FCF (455 s) and MIPFGWC-CS (1351 s).

5 Conclusions

In this paper, we concentrated on the new user cold-start problem, which negatively affects recommender performance because the recommender system is unable to produce meaningful recommendations. A comparative review of the relevant studies addressing the new user cold-start problem was performed according to three groups, namely, methods that (i) make use of additional data sources, (ii) choose the most prominent groups of analogous users, and (iii) enhance the prediction using hybrid methods. We also discussed the advantages and disadvantages of these groups and noted the typical algorithm(s) of each group. Details of the typical algorithms, namely MIPFGWC-CS [46], NHSM [20], FARAMS [17] and HU–FCF [42], were examined along with their theoretical analyses. An experimental validation on the benchmark recommender systems datasets, namely, MovieLens and Jester, under various settings of the new user cold-start was performed. The experimental results, which were presented in tables and bar charts, revealed the efficiencies of the algorithms in terms of accuracy and computational time.

From the above results and discussion, our findings in this article can be summarized as follows. First, NHSM is the most effective algorithm among all of the investigated methods in terms of both accuracy and computational time. The MAE and RMSE values of the NHSM algorithm are approximately 0.663 ± 0.143 and 0.915 ± 0.1, respectively. The computational time of NHSM is approximately 11 s. Second, the descending order of the algorithms in terms of accuracy is NHSM, FARAMS, HU–FCF and MIPFGWC-CS. Third, all algorithms, especially NHSM, are stable across the cross-validation methods, such as the Hold-out and k-fold methods, used to generate the new user cold-start. These concluding remarks answer the research question stated earlier in this paper.

The superiority of the NHSM metric over the other algorithms, such as FARAMS, HU–FCF and MIPFGWC-CS, can be explained by the mechanism of NHSM. For RS data having the rating set only, such as Jester, HU–FCF made use of the Pearson coefficient, which has some limitations, such as a poor signal-to-noise ratio and negative spikes, to calculate the similarity. FARAMS utilized fuzzy association rules, supported by a fuzzification method, to work with the rating dataset. As stated in limitation a) of Section 3.3, the fuzzification could lead to inaccurate prediction results if an unsuitable method, e.g., the center of gravity or a max-type method, is applied. The NHSM metric, on the other hand, utilized a better similarity metric than the Pearson coefficient and did not employ the fuzzification, thus avoiding the ambiguity of dealing with the fuzzy parameters and enhancing the prediction accuracy. Therefore, the accuracy of NHSM is better than those of FARAMS and HU–FCF in this case, as illustrated in Tables 6 and 8.

For RS data having both the demographic and rating sets, such as MovieLens, some algorithms relied on only the demographic or only the rating dataset, such as MIPFGWC-CS and FARAMS, respectively, so their accuracies were not high. A hybrid algorithm using both the demographic and the rating datasets, such as HU–FCF, could be a good choice. However, the problem of the fuzzification of the demographic dataset that existed in MIPFGWC-CS, FARAMS and HU–FCF impedes the accuracy of the algorithms in the new user cold-start situation. An interesting observation is that if a specific cross validation method, such as the Hold-out method, is chosen to generate the cold-start users, the accuracy of NHSM may not be the best among all algorithms, as shown in Table 6. Nevertheless, with large samples and diverse cross validation settings, such as the k-fold in Table 7, the advantages of NHSM are more obvious: the number of times that the accuracy of NHSM is better than those of the other algorithms is large. Additionally, NHSM has a low computational cost, as it requires less computational time than the other algorithms. From these results, we can clearly recognize the efficiency of NHSM.

Referring back to the Introduction section, we recognize the practical implication and insightfulness of the new user cold-start problem. Therefore, our further research directions could be (i) proposing a
hybrid method to enhance NHSM in terms of accuracy, (ii) improving the association rules to large orders in the FARAMS method, (iii) proposing a general similarity measure that is better than the NHSM metric, and (iv) investigating applications of NHSM and its variants to forecasting problems. Those directions will enrich the knowledge of developing techniques in the fields of recommender systems and applied intelligence in the future.

Acknowledgments

The authors are greatly indebted to the anonymous reviewers for their comments and their valuable suggestions, which improved the quality and clarity of this paper. Other thanks are sent to Ms. Hoang Thi Thu Huong, FPT and Mr. Donald B. Samuel, WHO for the language editing. This work is sponsored by the NAFOSTED under Contract no. 102.05-2014.01.

Appendix A Supporting information

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.is.2014.10.001

References

[1] M. Aharon, et al., OFF-set: one-pass factorization of feature sets for online recommendation in persistent cold start settings, in: Proceedings of the 7th ACM Conference on Recommender Systems, 2013, pp. 375–378.
[2] H.J. Ahn, A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem, Inf. Sci. 178 (1) (2008) 37–51.
[3] D. Almazro, G. Shahatah, L. Albdulkarim, M. Kherees, R. Martinez, W. Nzoukou, A Survey Paper on Recommender Systems, 2010, arXiv:1006.5278.
[4] J. Basiri, A. Shakery, B. Moshiri, M.Z. Hayat, Alleviating the cold-start problem of recommender systems using a new hybrid approach, in: Proceedings of the 5th IEEE International Symposium on Telecommunications (IST 2010), 2010, pp. 962–967.
[5] J. Bobadilla, F. Ortega, A. Hernando, J. Bernal, A collaborative filtering approach to mitigate the new user cold start problem, Knowl.-Based Syst. 26 (2012) 225–238.
[6] W. Carrer-Neto, M.L. Hernández-Alcaraz, R. Valencia-García, F. García-Sánchez, Social knowledge-based recommender system. Application to the movies domain, Expert Syst. Appl. 39 (12) (2012) 10990–11000.
[7] C.C. Chen, Y.H. Wan, M.C. Chung, Y.C. Sun, An effective recommendation method for cold start new users using trust and distrust networks, Inf. Sci. 224 (2013) 19–36.
[8] M. Chen, C. Yang, J. Chen, P. Yi, A method to solve cold-start problem in recommendation system based on social network sub-community and ontology decision model, in: Proceedings of the 3rd International Conference on Multimedia Technology (ICMT 2013), 2013, pp. 159–166.
[9] V. Formoso, D. Fernández, F. Cacheda, V. Carneiro, Using profile expansion techniques to alleviate the new user problem, Inf. Process. Manag. 49 (3) (2013) 659–672.
[10] S. Ge, X. Ge, An SVD-based collaborative filtering approach to alleviate cold-start problems, in: Proceedings of the 9th IEEE International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2012), 2012, pp. 1474–1477.
[11] G. Guo, Integrating trust and similarity to ameliorate the data sparsity and cold start for recommender systems, in: Proceedings of the 7th ACM Conference on Recommender Systems, 2013, pp. 451–454.
[12] Jester, Jester Online Joke Recommender Dataset, 〈http://www.ieor.berkeley.edu/~goldberg/jester-data/〉 (accessed September 2013).
[13] H.N. Kim, A. El-Saddik, G.S. Jo, Collaborative error-reflected models for cold-start recommender systems, Decis. Support Syst. 51 (3) (2011) 519–531.
[14] Y.S. Kim, et al., Hybrid techniques to address cold start problems for people to people recommendation in social networks, PRICAI 2012: Trends in Artificial Intelligence, Springer, Berlin, Heidelberg, 2012, pp. 206–217.
[15] X.N. Lam, T. Vu, T.D. Le, A.D. Duong, Addressing cold-start problem in recommendation systems, in: Proceedings of the 2nd ACM International Conference on Ubiquitous Information Management and Communication, 2008, pp. 208–211.
[16] C.W.K. Leung, S.C.F. Chan, F.L. Chung, A collaborative filtering framework based on fuzzy association rules and multiple-level similarity, Knowl. Inf. Syst. 10 (3) (2006) 357–381.
[17] C.W.K. Leung, S.C.F. Chan, F.L. Chung, An empirical study of a cross-level association rule mining approach to cold-start recommendations, Knowl.-Based Syst. 21 (7) (2008) 515–529.
[18] B. Lika, K. Kolomvatsos, S. Hadjiefthymiades, Facing the cold start problem in recommender systems, Expert Syst. Appl. 41 (4) (2014) 2065–2073.
[19] J. Lin, K. Sugiyama, M.Y. Kan, T.S. Chua, Addressing cold-start in app recommendation: latent user models constructed from twitter followers, in: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2013, pp. 283–292.
[20] H. Liu, Z. Hu, A. Mian, H. Tian, X. Zhu, A new user similarity model to improve the accuracy of collaborative filtering, Knowl.-Based Syst. 56 (2014) 156–166.
[21] N.N. Liu, X. Meng, C. Liu, Q. Yang, Wisdom of the better few: cold start recommendation via representative based rating elicitation, in: Proceedings of the 5th ACM Conference on Recommender Systems, 2011, pp. 37–44.
[22] N. Manouselis, et al., Recommender systems challenge 2012, in: Proceedings of the 6th ACM Conference on Recommender Systems, 2012, pp. 353–354.
[23] MovieLens, MovieLens dataset, 〈http://grouplens.org/datasets/movielens/〉 (accessed September 2013).
[24] U.B.M.M. Nazir, A. Yadav, A mechanism for handling cold start in book recommender system by sharing student profile, Int. J. Appl. Innov. Eng. Manag. (12) (2013) 318–322.
[25] E. Negre, F. Ravat, O. Teste, R. Tournier, Cold-start recommender system problem within a multidimensional data warehouse, in: Proceedings of the IEEE 7th International Conference on Research Challenges in Information Science (RCIS 2013), 2013, pp. 1–8.
[26] M. Nilashi, O.B. Ibrahim, N. Ithnin, Hybrid recommendation approaches for multi-criteria collaborative filtering, Expert Syst. Appl. 41 (8) (2014) 3879–3900.
[27] D. Poirier, F. Fessant, I. Tellier, Reducing the cold-start problem in content recommendation through opinion classification, in: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2010), 1, 2010, pp. 204–207.
[28] C. Preisach, L.B. Marinho, L. Schmidt-Thieme, Semi-supervised tag recommendation-using untagged resources to mitigate cold-start problems, Advances in Knowledge Discovery and Data Mining, Springer, Berlin, Heidelberg, 2010, pp. 348–357.
[29] T. Qiu, G. Chen, Z.K. Zhang, T. Zhou, An item-oriented recommendation algorithm on cold-start problem, Europhys. Lett. 95 (5) (2011) 58003.
[30] L. Quijano-Sánchez, D. Bridge, B. Díaz-Agudo, J.A. Recio-García, A case-based solution to the cold-start problem in group recommenders, Case-Based Reasoning Research and Development, Springer, Berlin, Heidelberg, 2012, pp. 342–356.
[31] F. Ricci, L. Rokach, B. Shapira, Introduction to Recommender Systems Handbook, Springer, US, 2011.
[32] A.N. Rosli, T. You, I. Ha, K.Y. Chung, G.S. Jo, Alleviating the cold-start problem by incorporating movies facebook pages, Clust. Comput. (2014) 1–11.
[33] L. Safoury, A. Salah, Exploiting user demographic attributes for solving cold-start problem in recommender system, Lect. Notes Softw. Eng. (3) (2013) 303–307.
[34] A. Said, E.W. De Luca, B. Kille, B. Jain, I. Micus, S. Albayrak, KMulE: a framework for user-based comparison of recommender algorithms, in: Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces, 2012, pp. 323–324.
[35] A. Said, B.J. Jain, S. Albayrak, Analyzing weighting schemes in collaborative filtering: cold start, post cold start and power users, in: Proceedings of the 27th Annual ACM Symposium on Applied Computing, 2012, pp. 2035–2040.
[36] A. Said, T. Plumbaum, E.W. De Luca, S. Albayrak, A comparison of how demographic data affects recommendation, in: Adjoint Proceedings of the 19th International Conference on User Modeling, Adaption, and Personalization, 2011.
[37] B. Shapira, Recommender Systems Handbook, Springer, US, 2011.
[38] L.H. Son, B.C. Cuong, P.L. Lanzi, N.T. Thong, A novel intuitionistic fuzzy clustering method for geo-demographic analysis, Expert Syst. Appl. 39 (10) (2012) 9848–9859.
[39] L.H. Son, B.C. Cuong, H.V. Long, Spatial interaction–modification model and applications to geo-demographic analysis, Knowl.-Based Syst. 49 (2013) 152–170.
[40] L.H. Son, N.D. Linh, H.V. Long, A lossless DEM compression for fast retrieval method using fuzzy clustering and MANFIS neural network, Eng. Appl. Artif. Intell. 29 (2014) 33–42.
[41] L.H. Son, Enhancing clustering quality of geo-demographic analysis using context fuzzy clustering Type-2 and particle swarm optimization, Appl. Soft Comput. 22 (2014) 566–584.
[42] L.H. Son, HU–FCF: a hybrid user-based fuzzy collaborative filtering method in recommender systems, Expert Syst. Appl. 41 (15) (2014) 6861–6870.
[43] L.H. Son, Optimizing municipal solid waste collection using chaotic particle swarm optimization in GIS based environments: a case study at Danang City, Vietnam, Expert Syst. Appl. 41 (18) (2014) 8062–8074.
[44] L.H. Son, DPFCM: a novel distributed picture fuzzy clustering method on picture fuzzy sets, Expert Syst. Appl. 42 (1) (2014) 51–66.
[45] L.H. Son, P.L. Lanzi, B.C. Cuong, H.A. Hung, Data mining in GIS: a novel context-based fuzzy geographically weighted clustering algorithm, Int. J. Mach. Learn. Comput. (3) (2012) 235–238.
[46] L.H. Son, N.T.H. Minh, K.M. Cuong, N.V. Canh, An application of fuzzy geographically clustering for solving the cold-start problem in recommender systems, in: Proceeding of 5th IEEE International Conference of Soft Computing and Pattern Recognition (SoCPaR 2013), 2013, pp. 44–49.
[47] D. Sun, C. Li, Z. Luo, A content-enhanced approach for cold-start problem in collaborative filtering, in: Proceedings of the 2nd IEEE International Conference on Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC 2011), 2011, pp. 4501–4504.
[48] M. Sun, F. Li, J. Lee, K. Zhou, G. Lebanon, H. Zha, Learning multiple-question decision trees for cold-start recommendation, in: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, 2013, pp. 445–454.
[49] M. Vozalis, K.G. Margaritis, Collaborative filtering enhanced by demographic correlation, in: Proceedings of the AIAI Symposium on Professional Practice in AI of the 18th World Computer Congress, 2004.
[50] W. Wang, D. Zhang, J. Zhou, COBA: a credible and co-clustering filterbot for cold-start recommendations, Practical Applications of Intelligent Systems, Springer, Berlin, Heidelberg, 2012, pp. 467–476.
[51] Y. Xie, Z. Chen, K. Zhang, C. Jin, Y. Cheng, A. Agrawal, A. Choudhary, Elver: recommending facebook pages in cold start situation without content features, in: Proceedings of the IEEE International Conference on Big Data 2013, 2013, pp. 475–479.
[52] D. Zhang, Q. Zou, H. Xiong, CRUC: Cold-start Recommendations Using Collaborative Filtering in Internet of Things, 2013, arXiv:1306.0165.
[53] Z.K. Zhang, C. Liu, Y.C. Zhang, T. Zhou, Solving the cold-start problem in recommender systems with social tags, Europhys. Lett. 92 (2) (2010) 28002.
[54] K. Zhou, S.H. Yang, H. Zha, Functional matrix factorizations for cold-start recommendation, in: Proceedings of the 34th ACM International Conference on Research and Development in Information Retrieval, 2011, pp. 315–324.
... assume that the new user has rated some prior rating in the rating data ...

3.3 FARAMS

Leung et al. [17] integrated fuzzy sets theory into association rule mining techniques and applied the ...

... on the demographic data, with the hard user-based degrees calculated from the rating histories integrated into the final ...

... evaluating the influence of demographic attributes on the user ratings. This framework was examined using a movie dataset to evaluate the accuracy and precision of the generated recommendations. Nazir and ...

Table of contents

Dealing with the new user cold-start problem in recommender systems: A comparative review

Introduction

Literature review

The analysis of existing methods

MIPFGWC-CS

NHSM

FARAMS

HU–FCF

Experiments

Environment setup

Results and discussion

Conclusions

Acknowledgments

Supporting information

References
