Data Analysis Machine Learning and Applications Episode 2 Part 9 pdf

Are Critical Incidents Really Critical
Marcel Paulssen and Angela Sommerfeld

[...] impact. Breaking a promise and experiencing poor quality of repair work influence solely satisfaction ratings ($\gamma_{3,6} = -.23$, $p < .01$ and $\gamma_{3,9} = -.32$, $p < .01$), whereas CIs classified as showing no goodwill and restriction to basic service lowered customers' trust in the service provider ($\gamma_{2,8} = -.14$, $p < .01$ and $\gamma_{2,10} = -.14$, $p < .01$). The incident category that should be avoided first and foremost is negative behavior toward the customer, since it clearly has the most damaging impact on the customer-firm relationship, due to its dual influence on trust ($\gamma_{2,11} = -.27$, $p < .01$) and satisfaction ($\gamma_{3,11} = -.26$, $p < .01$). Interestingly, only one of the positive CI categories (offering additional service) affects satisfaction with the repair department ($\gamma_{3,3} = .23$, $p < .01$) and none affects trust.

Fig. 1. MIMIC model: CI categories and their impact on relationship measures. Significant path coefficients are depicted.

5 Discussion

Even though several papers in the marketing literature have raised the question whether and which incidents are really critical for a customer-firm relationship (Edvardsson & Strandvik, 2000), ours is the first study to explicitly address this question. In the present study, we conducted CI interviews without restricting valence and number of incidents reported, and assessed their impact on measures of relationship quality. Our results confirm that positive and negative incidents possess a partially asymmetric impact on satisfaction and trust. Negative incidents have particularly damaging effects on a relationship through their strong impact on trust (total causal effect: 0.58). These results are in stark contrast to Odekerken-Schröder et al.'s (2000) conclusion that CIs do not play a significant role for developing trust.
Further, the damage inflicted by negative incidents can hardly be "healed" with very positive experiences, since the total causal effect of the number of positive incidents on trust is substantially smaller (0.12). Thus, management should clearly put emphasis on avoiding negative interaction experiences. The employed MIMIC approach followed Gremler's call (2004, p. 79) to "determine which events are truly critical to the long-term health of the customer-firm relationship" and revealed which specific incident categories have a particularly strong impact on relationship health and should be avoided with priority, such as negative behavior toward the customer. The collected vivid verbatim stories from the customer's perspective provide very concrete information for managers and can be easily communicated to train customer-contact personnel (Zeithaml & Bitner, 2003; Stauss & Hentschel, 1992). For further studies, as pointed out by one of the reviewers, an alternative would be to measure the experienced severity of the CI categories instead of their mere occurrence.

References

BARON, R. and KENNY, D. (1986): The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic and Statistical Considerations. Journal of Personality and Social Psychology, 51 (6), 1173-1182.
BAUMEISTER, R. F., BRATSLAVSKY, E., FINKENAUER, C., and VOHS, K. D. (2001): Bad Is Stronger than Good. Review of General Psychology, 5 (4), 323-370.
BITNER, M. J., BOOMS, B. H., and TETREAULT, M. S. (1990): The Service Encounter - Diagnosing Favorable and Unfavorable Incidents. Journal of Marketing, 54 (1), 71-84.
BOLLEN, K. A. (1989): Structural Equations with Latent Variables. New York: Wiley.
EDVARDSSON, B. (1992): Service Breakdowns: A Study of Critical Incidents in an Airline. International Journal of Service Industry Management, 3 (4), 17-29.
EDVARDSSON, B. and STRANDVIK, T. (2000): Is a Critical Incident Critical for a Customer Relationship? Managing Service Quality, 10 (2), 82-91.
EICH, E., MACAULAY, D., and RYAN, L. (1994): Mood Dependent Memory for Events of the Personal Past. Journal of Experimental Psychology - General, 123 (2), 201-215.
FISKE, S. (1980): Attention and Weight in Person Perception - the Impact of Negative and Extreme Behaviour. Journal of Personality and Social Psychology, 38 (6), 889-906.
FORGAS, J. P. (1995): Mood and Judgment: The Affect Infusion Model (AIM). Psychological Bulletin, 117 (1), 39-66.
FORNELL, C., JOHNSON, M. D., ANDERSON, E. W., CHA, J., and BRYANT, B. E. (1996): The American Customer Satisfaction Index: Nature, Purposes, and Findings. Journal of Marketing, 60 (October), 7-18.
GEYSKENS, I., STEENKAMP, J-B. E. M., and KUMAR, N. (1999): A Meta-Analysis of Satisfaction in Marketing Channel Relationships. Journal of Marketing Research, 36 (May), 223-238.
GREMLER, D. (2004): The Critical Incident Technique in Service Research. Journal of Service Research, 7 (1), 65-89.
JÖRESKOG, K. and SÖRBOM, D. (2001): LISREL 8: User's Reference Guide. Chicago: Scientific Software International.
KAHNEMAN, D. and TVERSKY, A. (1979): Prospect Theory - Analysis of Decision under Risk. Econometrica, 47 (2), 263-291.
MORGAN, R. M. and HUNT, S. D. (1994): The Commitment-Trust Theory of Relationship Marketing. Journal of Marketing, 58 (3), 20-38.
ODEKERKEN-SCHRÖDER, G., van BIRGELEN, M., LEMMINK, J., de RUYTER, K., and WETZELS, M. (2000): Moments of Sorrow and Joy: An Empirical Assessment of the Complementary Value of Critical Incidents in Understanding Customer Service Evaluations. European Journal of Marketing, 34 (1/2), 107-125.
ROOS, I. (2002): Methods of Investigating Critical Incidents - A Comparative Review. Journal of Service Research, 4 (3), 193-204.
SINGH, J. and SIRDESHMUKH, D. (2000): Agency and Trust Mechanisms in Consumer Satisfaction and Loyalty Judgments. Journal of the Academy of Marketing Science, 28 (1), 150-167.
STAUSS, B. and HENTSCHEL, B. (1992): Attribute-based versus Incident-based Measurement of Service Quality: Results of an Empirical Study in the German Car Service Industry. In: P. Kunst and J. Lemmink (Eds.), Quality Management in Services (59-78).
STAUSS, B. and WEINLICH, B. (1997): Process-oriented Measurement of Service Quality - Applying the Sequential Incident Technique. European Journal of Marketing, 31 (1), 33-55.
SZYMANSKI, D. M. and HENARD, D. H. (2001): Customer Satisfaction: A Meta-Analysis of the Empirical Evidence. Journal of the Academy of Marketing Science, 29 (1), 16-35.
TAYLOR, S. (1991): Asymmetrical Effects of Positive and Negative Events: The Mobilization-Minimization Hypothesis. Psychological Bulletin, 110 (1), 67-85.
YBARRA, O. and STEPHAN, W. G. (1999): Attributional Orientations and the Prediction of Behavior: The Attribution-Prediction Bias. Journal of Personality and Social Psychology, 76 (5), 718-728.
ZEITHAML, V. A. and BITNER, M. J. (2003): Services Marketing: Integrating Customer Focus across the Firm (3rd ed.). New York: McGraw-Hill.

Building an Association Rules Framework for Target Marketing

Nicolas March and Thomas Reutterer

Institute for Retailing and Marketing, Vienna University of Economics and Business Administration, Augasse 2–6, 1090 Vienna, Austria
march@troostwijk.de
thomas.reutterer@wu-wien.ac.at

Abstract. The discovery of association rules is a popular approach to detect cross-category purchase correlations hidden in large amounts of transaction data and extensive retail assortments. Traditionally, such item or category associations are studied on an 'average' view of the market and do not reflect heterogeneity across customers.
With the advent of loyalty programs, however, tracking each program member's transactions has become easier, enabling retailers to customize their direct marketing efforts more effectively by utilizing cross-category purchase dependencies at a more disaggregate level. In this paper, we present the building blocks of an analytical framework that allows retailers to derive customer segment-specific associations among categories for subsequent target marketing. The proposed procedure starts with a segmentation of customers based on their transaction histories using a constrained version of K-centroids clustering. In a second step, associations are generated separately for each segment. Finally, methods for grouping and sorting the identified associations are provided. The approach is demonstrated with data from a grocery retailing loyalty program.

1 Introduction

One central goal of customer relationship management (CRM) is to target customers with offers that best match their individual consumption needs. Thus, the question of who to target with which range of products or items emerges. Most previous research in CRM or direct marketing concentrates on the issue of who to target (for an extensive literature review see, e.g., Prinzie and Van den Poel (2005)). We address both parts of this question and introduce the cornerstones of an analytical framework for customizing direct marketing campaigns at the customer segment level. In order to identify and to make use of possible cross-selling potentials, the proposed approach builds on techniques for exploratory analysis of market basket data.

Retail managers have been interested in better understanding the purchase interdependency structure among categories for quite a while. One obvious reason is that knowledge about correlated demand patterns across several product categories can be exploited to foster cross-buying effects using suitable marketing actions.
For example, if customers often buy a particular product A together with article B, it could be useful to promote A in order to boost sales volumes of B, and vice versa. The objective of exploratory market basket analysis is to discover such unknown cross-item correlations from a typically huge collection of purchase transaction data (so-called market baskets) accruing at the retailer's point-of-sale scanning devices (Berry and Linoff (2006)). Among others, algorithms for mining association rules are popular techniques to accomplish this task (cf., e.g., Hahsler et al. (2006)). However, such association rules are typically derived for the entire data set of available retail transactions and thus reflect an 'average' or aggregate view of the market only.

In recent years, many retailers have tried to improve their CRM activities by launching loyalty programs, which provide their members with bar-coded plastic or registered credit cards. If customers use these cards during their payment process, they get a bonus, credits or other rewards. As a side effect, these transactions become personally identifiable by linking them back to the corresponding customers. Thus, retailers are nowadays collecting series of market baskets that represent (more or less) complete buying histories of their primary clientele over time.

2 A segment-specific view of cross-category associations

To exploit the potential benefits offered by such rich information on customers' purchasing behavior within advanced CRM programs, cross-category correlations need to be detected on a more disaggregate (or customer segment) level instead of an aggregate level. Attempts in this direction are made by Boztug and Reutterer (2007) or Reutterer et al. (2006). The authors employ vector quantization techniques to arrive at a set of 'generic' (i.e., customer-unspecific) market basket classes with internally more distinctive cross-category interdependencies.
In a second step they generate a segmentation of households based on a majority voting of each household's basket class assignments throughout the individual purchase history. These segments are proposed as a basis for designing customized target marketing actions.

In contrast to these approaches, the procedure presented below adopts a novel centroids-based clustering algorithm proposed by Leisch and Grün (2006), which bypasses the majority voting step for segment formation. This is achieved by a partitioning of the set of (non-anonymous) market basket data that is sensitive to cross-category effects and imposes group constraints determined by the household labels associated with each of the market baskets. Hence, during the iterative clustering process the single transactions are "forced" to stay linked with all the other transactions of a specific household's buying history. This results in segments whose members can be characterized by distinctive patterns of cross-category purchase interrelationships.

To get a better feeling for the inter-category purchase correlations within the previously identified segments, association rules are derived separately for each segment and evaluated by calculating various measures of significance and interestingness; these rules can assist marketing managers in further decision making on targeted marketing actions. Although the within-segment cross-category associations are expected to differ significantly from those generated for the unsegmented data set (because of the data compression step employed prior to the analysis), low minimum thresholds of such measures typically still result in a huge number of potentially interesting associations. To arrive at a clearer and managerially more traceable overview of the various segment-specific cross-category purchase correlations, we arrange them based on a distance concept suggested by Gupta et al. (1999).
The next section characterizes the building blocks of the employed methodology in more detail. Section 4 empirically illustrates the proposed approach using a transaction data set from a grocery retailing loyalty program and presents selected results. Section 5 closes the article with a summary and an outlook on future research.

3 Methodology

The conceptual framework of the proposed approach is depicted in Figure 1 and consists of three basic steps: First, a modified K-centroids cluster algorithm partitions the entire transaction data set and defines K segments of households with an interest in similar category combinations. Secondly, the well-known APRIORI algorithm (Agrawal et al. (1993)) searches within each segment for specific frequent itemsets, which are filtered by a suitable measure of interestingness. Finally, the associations are grouped via hierarchical clustering using a distance measure for associations.

[Figure 1: flow chart. The transaction data $X_N$ enter a K-centroid cluster algorithm holding the linkage to $I_p$; association mining is then carried out within each segment k = 1, k = 2, ..., k = K, followed by filtering, grouping and sorting of mined associations within each segment.]

Fig. 1. Conceptual framework of the proposed procedure

Step 1: Each transaction or market basket can be interpreted as a J-dimensional binary vector $x_n \in \{0,1\}^J$ with $j = 1, 2, \ldots, J$ categories. A value of one refers to the presence and a zero to the absence of an item in the market basket. Integrated into a binary matrix $X_N$, the rows correspond to transactions while each column represents an item. Let the set $I_p$ describe a group constraint indicating the buying history of customer $p = 1, 2, \ldots, P$ with $\{x_i \in X_N \mid i \in I_p\}$.
The objective function for a modified K-centroids clustering respecting group constraints is (Leisch and Grün (2006)):

$$D(X_N, C_K) = \sum_{p=1}^{P} \sum_{i \in I_p} d(x_i, c(I_p)) \rightarrow \min_{C_K} \qquad (1)$$

An iterative algorithm for solving Equation 1 requires calculation of the closest centroid $c(\cdot)$ for each transaction $x_i$ according to the distance measure $d(\cdot)$ at each iteration. To cope with the usually sparse binary transaction data and to make the partition sensitive to cross-category effects, the Jaccard coefficient, which gives more weight to co-occurrences of ones than to common zeros, is used as an appropriate distance measure (cf. Decker (2005)). Notice that in contrast to methods like the K-means algorithm, groups of market baskets as given by $I_p$ (i.e., customer p's complete buying history), rather than single transactions, need to be assigned to the closest centroid. This is ensured by a function $f(x_i)$ that determines the centroid closest to the majority of the grouped transactions (cf. Leisch and Grün (2006)).

In order to achieve directly accessible and more intuitively interpretable results, we can calculate cluster-wise means for updating the prototype system instead of optimized canonical binary centroids. This results in an 'expectation-based' clustering solution (cf. Leisch (2006)), whose centroids are equivalent to segment-specific choice probabilities of the corresponding categories. Notice that the segmentation of households is determined such that each customer's complete purchase history points exclusively to one segment. Thus, in the present application context the set of K centroids can be interpreted as prototypical market baskets that summarize the most pronounced item combinations demanded by the respective segment members throughout their purchase history. An illustrative example is provided in Table 1 of the subsequent empirical study.
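The assignment and update logic of Step 1 can be illustrated with a minimal Python sketch. It is not the implementation of Leisch and Grün (2006): the function names `ejaccard_distance` and `constrained_kcentroids`, the optional `init` argument, the use of the extended Jaccard form for mean-valued centroids (it coincides with the Jaccard distance on binary vectors), and the tie-breaking rule (lowest cluster index wins) are all assumptions made for illustration.

```python
import numpy as np


def ejaccard_distance(X, C):
    """Extended Jaccard distance between rows of X (n x J) and rows of C (K x J).

    Assumption: the extended form is used so that real-valued (mean) centroids
    are allowed; for binary vectors it equals the usual Jaccard distance.
    Both inputs are expected to be float arrays.
    """
    dot = X @ C.T
    x2 = (X ** 2).sum(axis=1)[:, None]
    c2 = (C ** 2).sum(axis=1)[None, :]
    denom = x2 + c2 - dot
    # Two all-zero vectors are treated as identical (similarity 1).
    sim = np.divide(dot, denom, out=np.ones_like(dot), where=denom > 0)
    return 1.0 - sim


def constrained_kcentroids(X, groups, K, init=None, n_iter=100, seed=0):
    """Group-constrained K-centroids: all baskets of a household share one segment."""
    X = np.asarray(X, dtype=float)
    _, group_index = np.unique(np.asarray(groups), return_inverse=True)
    n_groups = group_index.max() + 1
    if init is None:
        # Hypothetical default: start from K randomly drawn baskets.
        rng = np.random.default_rng(seed)
        init = X[rng.choice(len(X), size=K, replace=False)]
    centroids = np.array(init, dtype=float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Closest centroid for every single transaction.
        nearest = ejaccard_distance(X, centroids).argmin(axis=1)
        # Majority vote within each household (ties go to the lowest index).
        votes = np.zeros((n_groups, K), dtype=int)
        np.add.at(votes, (group_index, nearest), 1)
        new_labels = votes.argmax(axis=1)[group_index]
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # 'Expectation-based' update: cluster-wise means as centroids.
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return labels, centroids
```

In such a sketch, each row of the returned `centroids` can be read as the vector of category purchase frequencies within one segment, which is the interpretation used for Table 1.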
Step 2: The centroids derived in the segmentation step already provide some indications of the general structure of the cross-item interdependencies within the household segments. To get a more thorough understanding, interesting category combinations (so-called itemsets) can be further explored by the APRIORI algorithm using a user-defined support value. For the entire data set, the support of an arbitrary itemset A is denoted by $\mathrm{supp}(A) = |\{x_n \in X_N \mid A \subseteq x_n\}| / |N|$ and defines the fraction of transactions containing itemset A. Notice that in the present context, however, itemsets are generated at the level of previously constructed segments. The itemsets are called frequent if their support is above a user-defined threshold value, which implies their sufficient statistical importance for the analyst. To generate a wide range of associations, rather low minimum support values are usually preferred. Because not all associations are equally meaningful, an additional measure of interestingness is required to filter the itemsets for evaluation purposes. Since our focus is on itemsets, asymmetric measures like confidence or lift are less useful (cf. Hahsler (2006)). We advocate here the so-called all-confidence measure introduced by Omiecinski (2003), which is the minimum confidence value for all rules that can be generated from the underlying itemset. Formally it is denoted by $\mathrm{allconf}(A) = \mathrm{supp}(A) / \max_{B \subset A} \{\mathrm{supp}(B)\}$ for all frequent subsets B with $B \subset A$.

Step 3: Although the all-confidence measure can assist in reducing the number of itemsets considerably, in practice it can still be difficult to handle several hundreds of remaining associations. For an easier recognition of characteristic inter-item correlations within each segment, the associations can be grouped based on the following Jaccard-like distance measure for itemsets (Gupta et al. (1999)):

$$D(A,B) = 1 - \frac{|m(A \cup B)|}{|m(A)| + |m(B)| - |m(A \cup B)|} \qquad (2)$$

Expression $m(\cdot)$ denotes the set of transactions containing the itemset. From Equation 2 it should be evident that the distance between two itemsets tends to be lower if the involved itemsets occur in many common transactions. This property qualifies the measure to determine specific groups of itemsets that share some common aspects of consumption behavior (cf. Gupta et al. (1999)).

4 Empirical application

The following empirical study illustrates some of the results obtained from the procedure described above. We analyzed two samples of real-world transaction data, each generated by 3,000 members of a retailer's loyalty program. The customers made on average 26 shopping trips over an observational period of one year. Each transaction contains 268 binary variables, which represent the category range of the assortment. To achieve managerially meaningful results, preliminary screening of the data suggested the following adjustments of the raw data:

1. The purchase frequencies are clearly dominated by a small range of categories, such as fresh milk, vegetables or water (see Figure 2). Since these categories are bought several times by almost every customer during the year under investigation, they provide relatively little information on the differentiated buying habits of the customers. The opposite is supposed to be true for categories with intermediate or lower purchase frequencies. Therefore, we decided to eliminate the upper 52 categories (left side of the vertical line in Figure 2), which occur in more than 10% of all transactions. The resulting empty baskets are excluded from the analysis as well.

[Figure 2: plot with axis "purchase frequency" ranging from 0.0 to 0.5.]

Fig. 2. Distribution of relative category purchase frequencies in decreasing order

2. To include households with sufficiently large buying histories, households with fewer than six store visits per year were eliminated. In addition, the upper five percent quantile of households, which use their customer cards extremely often, were deleted.

To find a sufficiently stable cluster solution with a minimum within-sum of distances, the transactions made by the households from the first sample are split into three equal sub samples and clustered up to fifteen times each. In each case, the best solution is kept for the following sub sample to achieve stable results. The converged set of centroids of the third sub sample is used for initialization of the second sample. Commonly used techniques for determination of the number of clusters recommended K = 11 clusters as a decent and well-manageable number of household segments.

Given these specifications, the partitioning of the second sample using the proposed cluster algorithm detects some segments which are dominated by category combinations typically bought for specific consumption or usage purposes and other types of categorical similarities. For example, Table 1 shows an extract of a centroid vector including the top six categories in terms of highest conditional purchase probabilities in a segment of households denoted as the "wine segment". A typical market basket arising from this segment is expected to contain red/rosé wines with a probability of 32.3 %, white wines with a probability of 22.5 %, etc. Hence the label "wine segment". Equally, other segments may be characterized by categories like baby food/care or organic products. On the other hand, there is also a small number of segments with category interrelationships that cannot be easily explained. However, such segments might provide some interesting insights into interests of households which are so far unknown.

Table 1. Six categories with highest purchase frequencies in the wine segment

No.  Category           Purchase frequency
1.   red / rosé wines   0.3229143
2.   white wines        0.2252356
3.   sparkling wine     0.1225006
4.   condensed milk     0.1206619
5.   appetizers         0.1080211
6.   cooking oil        0.1066422

According to the second step of the proposed framework, frequent itemsets are generated from the transactions within the segments. Since we want to mine a wide range of associations, a quite low minimum support threshold is chosen (e.g., supp = 1%). In addition, all frequent itemsets are required to include at least two categories. Taking this into account, the APRIORI algorithm finds 704 frequent itemsets for the transactions of the wine segment. To reduce the number of associations and to focus on the most interesting frequent itemsets, only the 150 itemsets with highest all-confidence values are considered for grouping according to step 3 of the procedure.

Grouping the frequent itemsets is intended to rearrange the order of the generated (segment-specific) associations and to focus the view of the decision maker on characteristic item correlations. The distance matrix derived by Equation 2 is used as input for hierarchical clustering according to the Ward algorithm. Figure 3 shows the dendrogram for the 150 frequent itemsets within the wine cluster. Again, it is not straightforward to determine the correct number of groups $g_h$. Frequently proposed heuristics based on plotted heterogeneity measures do not help here. Therefore, we pass the distance matrix to the partitioning around medoids (PAM) algorithm of Kaufman and Rousseeuw (2005) for several $g_h$ values. Using the maximum value of the average silhouette width for a sequence of partitions, thirty groups of itemsets are proposed. In Figure 3 the grey rectangles mark two exemplarily chosen clusters of associations.
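The measures used in steps 2 and 3 can be made concrete with a small Python sketch. It rests on several assumptions: the function names are hypothetical, baskets are represented as Python sets of category names, and `frequent_itemsets` enumerates candidates by brute force as a simple stand-in for the APRIORI algorithm, which would only be practical for toy data. The all-confidence denominator uses single items only, because support cannot increase when an itemset grows, so the largest support among the subsets of an itemset is always attained by one of its items.

```python
from itertools import combinations


def support(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def frequent_itemsets(baskets, min_support, min_size=2, max_size=3):
    """Brute-force enumeration of frequent itemsets (stand-in for APRIORI)."""
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(min_size, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, baskets)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


def all_confidence(itemset, baskets):
    """supp(A) divided by the largest support among the items of A."""
    largest = max(support([item], baskets) for item in itemset)
    return support(itemset, baskets) / largest


def cover(itemset, baskets):
    """m(A): indices of the baskets that contain the itemset."""
    itemset = frozenset(itemset)
    return {n for n, basket in enumerate(baskets) if itemset <= basket}


def itemset_distance(a, b, baskets):
    """Jaccard-like distance of Equation 2; |m(A u B)| = |m(A) & m(B)|."""
    m_a, m_b = cover(a, baskets), cover(b, baskets)
    both = len(m_a & m_b)
    total = len(m_a) + len(m_b) - both
    return 1.0 - both / total if total else 0.0


def distance_matrix(itemsets, baskets):
    """Symmetric matrix of pairwise itemset distances as nested lists."""
    return [[itemset_distance(a, b, baskets) for b in itemsets] for a in itemsets]
```

A matrix produced by `distance_matrix` for the itemsets retained after all-confidence filtering could then be handed to a hierarchical clustering or PAM routine of the analyst's choice.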
The associations of the right-hand group marked in Figure 3 are summarized in Table 2 and clearly indicate an interest of some of the wine households in hard alcoholic beverages.

Fig. 3. Dendrogram of 150 frequent itemsets mined from transactions of the wine segment

Table 2. Associations of hard alcoholic beverages within the wine segment

No.  Association                  Support  All-confidence
1.   {brandy, whisky}             0.011    0.23
2.   {brandy, fruit brandy}       0.015    0.18
3.   {fruit brandy, appetizers}   0.018    0.17
4.   {brandy, appetizers}         0.016    0.15
5.   {whisky, fruit brandy}       0.011    0.14

To examine whether the segment-specific associations differ from those generated within the whole data set, we have drawn and analyzed random samples with the same number of transactions as each of the segments. The comparison of the frequent itemsets mined in the random sample and those from the segment-specific transactions shows that some segment-specific association groups clearly represent a unique characteristic of their underlying household segment. Of course, this is not true in every case. For example, the association group marked by the grey rectangle on the left-hand side in Figure 3 can be found in almost every random sample or segment. It denotes correlations between categories of hygiene products. [...]

[...] substituted in future applications by a data driven weighting scheme.

References

AGRAWAL, R., IMIELINSKI, T. and SWAMI, A. (1993): Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. Washington D.C., 207–216.
BERRY, M. and LINOFF, G. (2004): Data mining techniques. Wiley, Indianapolis.
BOZTUG, Y. and REUTTERER, T. (2007): A combined [...]
[...] Wiley, New York.
LEISCH, F. and GRÜN, B. (2006): Extending standard cluster algorithms to allow for group constraints. In: A. Rizzi, M. Vichi (Eds.): Compstat 2006, Proceedings in Computational Statistics. Physica-Verlag, Heidelberg, 885–892.
LEISCH, F. (2006): A toolbox for k-centroids cluster analysis. In: Computational Statistics and Data Analysis, 51(2), 526–544.
OMIECINSKI, E. (2003): Alternative Interest [...] Associations in Databases. In: IEEE Transactions on Knowledge and Data Engineering, 15(1), 57–69.
PRINZIE, A. and VAN DEN POEL, D. (2005): Constrained optimization of data-mining problems to improve model performance. A direct-marketing application. In: Expert Systems with Applications, 29, 630–640.
REUTTERER, T., MILD, A., NATTER, M. and TAUDES, A. (2006): A dynamic segmentation approach for targeting and customizing [...]

[...] well as revealing substitutive relations.

4.2 Results

Table 1 depicts the results of fitting the model for 1 to 5 segments.

Table 1. Model's fit with different numbers of segments

S   log L     AIC      MAIC     Class. Err.   pseudo R2
1   -131.49   280.97   289.97   .00           .23
2   -117.04   276.09   297.09   .09           .81
3   -100.96   267.92   300.92   .08           .92
4   -89.76    269.52   314.52   .11           .92
5   -82.62    279.24   336.24   .11           .95

Ralf Wagner

It is obvious from the table [...]

[Table fragment: segment-wise coefficients for Segments 1 to 3 with Wald statistics (rows include TM, DM, EM, IM and NM scores, intercept and covariates); the cell values cannot be assigned reliably.]

[Table fragment: relative importances (Rel. Imp.) and part worths (PW) for the total sample, Cluster 1 (n=80) and Cluster 2 (n=82); the attribute labels are missing.]

Table 2. Validity values for the total sample and for the clusters for traditional ACA estimation (using standardized part worths from step 2 at the individual level)

                                               Total sample (n=161)*   Cluster 1 (n=79)*   Cluster 2 (n=82)
First-choice-hit-rate (using individual data)  62.11 %                 73.42 %             51.22 %
Mean Spearman (using individual data)          0.735                   0.782               0.689

* one respondent had missing holdout data and could [...]

[...] nowadays a method for which a huge number of applications are known as well as many specialized tools for data collection and analysis have been developed. For part worth estimation, especially clusterwise estimation procedures (see, e.g., Baier and Gaul (1999, 2003)) and Hierarchical Bayes (HB) estimation (see, e.g., Allenby and Ginter (1995), Lenk et al. (1996)) seem to be attractive newer developments [...]

[...] from a mean part worth distribution is derived from the collected individual data (for methodological details and new developments see, e.g., Allenby et al. (1995), Lenk et al. (1996), Andrews et al. (2002), Liechty et al. (2005)). The main advantages and therefore the reasons for the attention of HB can be summarized as follows (Orme 2000): HB estimation seems (at least) to outperform traditional models [...]

[...] system (Sawtooth Software (2002)), to be precise ACA/Web within SSI/Web (Windows Version 2.0.1b). For our investigation a five-step analysis is used to answer our focused questions.

Michael Brusch and Daniel Baier

Step 1: Analyzing the quality. In our study we had 239 started and 213 finished questionnaires. Standard ACA methodology was used for individual part worth estimation. Standard selection criteria [...]
