Data Analysis Machine Learning and Applications Episode 1 Part 9 doc

Factorial Analysis of a Set of Contingency Tables
Amaya Zárraga and Beatriz Goitisolo

As a result, SA proceeds by performing a principal component analysis (PCA) of the matrix $X$,

$$ X = \left[ \sqrt{\alpha_1}\, X_1 \;\cdots\; \sqrt{\alpha_t}\, X_t \;\cdots\; \sqrt{\alpha_T}\, X_T \right] $$

The PCA results are also obtained using the SVD of $X$, giving singular values $\sqrt{\lambda_s}$ on the $s$-th dimension and corresponding left and right singular vectors $u_s$ and $v_s$. We calculate the projections of the columns on the $s$-th axis as principal coordinates $g_s$,

$$ g_s = \sqrt{\lambda_s}\, D_c^{-1/2} v_s $$

where $D_c$ ($J \times J$) is the diagonal matrix of all the column masses, that is, of all the $D_c^t$.

One of the aims of the joint analysis of several data tables is to compare them through the points corresponding to the same row in the different tables. These points will be called partial rows and denoted by $i^t$. The projection on the $s$-th axis of each partial row is denoted by $f^t_{is}$, and the vector of projections of all the partial rows of table $t$ is denoted by $f^t_s$,

$$ f^t_s = (D_r^t)^{-1/2} \left[ 0 \;\cdots\; \sqrt{\alpha_t}\, X_t \;\cdots\; 0 \right] v_s $$

Comparison of partial rows is complicated, especially when the number of tables is large. Therefore each partial row will be compared with the (overall) row, projected as

$$ f_s = D_w^{-1} \left[ \sqrt{\alpha_1}\, X_1 \;\cdots\; \sqrt{\alpha_t}\, X_t \;\cdots\; \sqrt{\alpha_T}\, X_T \right] v_s = D_w^{-1} X v_s $$

where $D_w$ is the diagonal matrix whose general term is $\sum_{t \in T} \sqrt{p^t_{i.}}$. The choice of this matrix $D_w$ allows us to expand the projections of the (overall) rows to keep them inside the corresponding set of projections of partial rows, and is appropriate when the partial rows have different weights in the tables. With this weighting the projections of the overall and partial rows are related as follows:

$$ f_{is} = \sum_{t \in T} \frac{\sqrt{p^t_{i.}}}{\sum_{t' \in T} \sqrt{p^{t'}_{i.}}}\, f^t_{is} $$

So the projection of a row is a weighted average of the projections of its partial rows. It is closer to those partial rows that are more similar to the overall row in terms of the relation expressed by the axis and that have a greater weight than the rest of the partial rows.
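The stage-two computations above can be sketched numerically. This is a minimal illustration, not the authors' software. It assumes the stage-one definitions that precede this passage: $X_t$ is taken to be the matrix of standardized residuals of the usual CA of table $t$, and $\alpha_t$ the inverse of the first eigenvalue of that separate CA. The function names and the two small count tables are hypothetical.

```python
import numpy as np

def ca_matrix(counts):
    """Standardized residuals of one table (usual CA definition, assumed here for X_t)."""
    P = counts / counts.sum()            # relative frequencies p^t_ij
    r = P.sum(axis=1)                    # row masses p^t_i.
    c = P.sum(axis=0)                    # column masses p^t_.j
    X = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return X, r, c

def simultaneous_analysis(tables):
    parts = [ca_matrix(np.asarray(N, dtype=float)) for N in tables]
    # Assumed table weights: alpha_t = 1 / first eigenvalue of the separate CA.
    alphas = [1.0 / np.linalg.svd(X, compute_uv=False)[0] ** 2 for X, _, _ in parts]
    blocks = [np.sqrt(a) * X for a, (X, _, _) in zip(alphas, parts)]
    X_all = np.hstack(blocks)            # X = [sqrt(alpha_1) X_1 ... sqrt(alpha_T) X_T]
    _, sing, Vt = np.linalg.svd(X_all, full_matrices=False)
    V = Vt.T                             # right singular vectors v_s as columns
    col_masses = np.concatenate([c for _, _, c in parts])
    # Columns: g_s = sqrt(lambda_s) D_c^{-1/2} v_s
    G = V * sing / np.sqrt(col_masses)[:, None]
    # Partial rows: f^t_s = (D_r^t)^{-1/2} [0 ... sqrt(alpha_t) X_t ... 0] v_s
    partial, start = [], 0
    for blk, (_, r, _) in zip(blocks, parts):
        stop = start + blk.shape[1]
        partial.append(blk @ V[start:stop] / np.sqrt(r)[:, None])
        start = stop
    # Overall rows: f_s = D_w^{-1} X v_s, with D_w = diag(sum_t sqrt(p^t_i.))
    sqrt_masses = [np.sqrt(r) for _, r, _ in parts]
    w = np.sum(sqrt_masses, axis=0)
    F = X_all @ V / w[:, None]
    return sing ** 2, G, partial, F, sqrt_masses

# Hypothetical 4 x 3 count tables sharing the same rows and columns.
males = [[20, 10, 5], [8, 15, 7], [6, 9, 30], [12, 4, 9]]
females = [[25, 14, 3], [5, 20, 10], [9, 6, 22], [10, 8, 15]]
eig, G, partial, F, sqrt_masses = simultaneous_analysis([males, females])
```

With these definitions each overall row of `F` coincides with the average of its partial rows weighted by $\sqrt{p^t_{i.}}$, which is the relation stated above and a convenient numerical check.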
The dispersal of the projections of the partial rows with regard to the projection of their (overall) row indicates discrepancies between the same row in the different tables. Notice that if $p^t_{i.}$ is equal in all the tables then $f_s = (1/T) \sum_{t \in T} f^t_s$, that is, the overall row is projected as the average of the projections of the partial rows.

Interpretation rules for simultaneous analysis

In SA the transition relations between projections of different points create a simultaneous representation that provides more detailed knowledge of the matter being studied.

Relation between $f^t_{is}$ and $g_{js}$: The projection of a partial row on axis $s$ depends on the projections of the columns:

$$ f^t_{is} = \frac{\sqrt{\alpha_t}}{\sqrt{\lambda_s}} \sum_{j \in J_t} \frac{p^t_{ij}}{p^t_{i.}}\, g_{js} $$

Except for the factor $\sqrt{\alpha_t/\lambda_s}$, the projection of a partial row on axis $s$ is, as in CA, the centroid of the projections of the columns of table $t$.

Relation between $f_{is}$ and $g_{js}$: The projection of an overall row on axis $s$ may be expressed in terms of the projections of the columns as follows:

$$ f_{is} = \sum_{t \in T} \frac{\sqrt{\alpha_t}\, \sqrt{p^t_{i.}}}{\sum_{t' \in T} \sqrt{p^{t'}_{i.}}} \left[ \frac{1}{\sqrt{\lambda_s}} \sum_{j \in J_t} \frac{p^t_{ij}}{p^t_{i.}}\, g_{js} \right] $$

The projection of the row is therefore, except for the coefficients $\sqrt{\alpha_t/\lambda_s}$, the weighted average of the centroids of the projections of the columns for each table.

Relation between $g_{js}$ and $f_{is}$ or $f^t_{is}$: The projection on axis $s$ of column $j$ of table $t$ can be expressed in the following way:

$$ g_{js} = \sqrt{\frac{\alpha_t}{\lambda_s}} \left[ \sum_{i \in I} \left( \sum_{t' \in T} \sqrt{p^{t'}_{i.}} \right) \sqrt{p^t_{i.}}\; \frac{p^t_{ij} - p^t_{i.}\, p^t_{.j}}{p^t_{i.}\, p^t_{.j}}\; f_{is} \right] $$

This expression shows that the projection of a column is placed on the side of the projections of the rows with which it is associated, compared to the hypothesis of independence, and on the opposite side of the projections of those with which it is less associated. In terms of the partial rows, this projection is:

$$ g_{js} = \sqrt{\frac{\alpha_t}{\lambda_s}} \left[ \sum_{i \in I} \sqrt{p^t_{i.}}\; \frac{p^t_{ij} - p^t_{i.}\, p^t_{.j}}{p^t_{i.}\, p^t_{.j}} \left( \sum_{t' \in T} \sqrt{p^{t'}_{i.}}\, f^{t'}_{is} \right) \right] $$

The same aids to interpretation are available in SA as in standard factorial analysis as regards the contribution of points to principal axes and the quality of display of a point on axis $s$.

2.3 Stage three: comparison of the tables: interstructure

In order to compare the different tables, SA allows us to represent each of them by means of a point and to project them on the axes. The coordinate of table $t$ on axis $s$, $f_{ts}$, represents the projected inertia of the table on the axis and, therefore, indicates the importance of the table in the determination of the axis. Thus,

$$ f_{ts} = \sum_{j \in J_t} p^t_{.j}\, g_{js}^2 = \mathrm{Inertia}_s(t) $$

where $\mathrm{Inertia}_s(t)$ represents the projected inertia of the sum of columns of table $t$ on axis $s$.

Due to the weighting of the tables chosen by SA, the maximum value of this inertia on the first axis is 1. A value of $f_{ts}$ close to 0 would indicate orthogonality between the first axes of the separate analyses with regard to the Simultaneous Analysis. A value of $f_{ts}$ close to 1 would indicate that the axis of the joint analysis is approximately the same as in the separate analysis of each table. So, if all the tables present a coordinate close to the maximum value, 1, on the first factorial axis of the SA, the projected inertia onto it is approximately $T$, the number of tables, and this confirms that this first direction accurately depicts the relevant associations of each table.

2.4 Relations between factors of the analyses

In SA it is also possible to calculate the following measurements of the relation between the factors of the different analyses.

Relation between factors of the individual analyses: The correlation coefficient can be used to measure the degree of similarity between the factors of the separate CA of different tables. This is possible when the marginals $p^t_{i.}$ are equal. When $p^t_{i.}$
are not equal, Cazes (1982) proposes calculating the correlation coefficient between factors by weighting the rows with the margins of one of the tables. Therefore, these weights, and the correlation coefficient as well, depend on the choice of this reference table. In consequence, we propose to solve this problem of the weight by extending the concept of generalized covariance (Méot and Leclerc (1997)) to that of generalized correlation (Zárraga and Goitisolo (2003)). The relation between the factors $s$ and $s'$ of the tables $t$ and $t'$ respectively would be calculated as:

$$ r(f_{st}, f_{s't'}) = \sum_{i \in I} \frac{f_{ist}}{\sqrt{\lambda^t_s}}\, \sqrt{p^t_{i.}}\, \sqrt{p^{t'}_{i.}}\; \frac{f_{is't'}}{\sqrt{\lambda^{t'}_{s'}}} $$

where $f_{ist}$ and $f_{is't'}$ are the projections on the axes $s$ and $s'$ of the separate CA of the tables $t$ and $t'$ respectively, and where $\lambda^t_s$ and $\lambda^{t'}_{s'}$ are the inertias associated with these axes. This measurement allows us to verify whether the factors of the separate analyses are similar and to check the possible rotations that occur.

Relation between factors of the SA and factors of the separate analyses: Likewise, it is possible to calculate, for each factor $s$ of the SA, the relation with each of the factors $s'$ of the separate analyses of the different tables:

$$ r(f_{s't}, f_s) = \sum_{i \in I} \frac{f_{is't}}{\sqrt{\lambda^t_{s'}}}\, \sqrt{p^t_{i.}} \left( \sum_{t \in T} \sqrt{p^t_{i.}} \right) \frac{f_{is}}{\sqrt{\lambda_s}} $$

If all the tables of frequencies analysed have the same row weights, this measurement reduces to:

$$ r(f_{s't}, f_s) = \frac{\sum_{i \in I} p^t_{i.}\, f_{is't}\, f_{is}}{\sqrt{\sum_{i \in I} p^t_{i.}\, (f_{is't})^2}\; \sqrt{\sum_{i \in I} p^t_{i.}\, (f_{is})^2}} $$

that is, the classical correlation coefficient between the factors of the separate analyses and the factors of SA.

3 Application

In this section we apply SA to data taken from an on-line survey conducted by the Spanish Ministry of Education and Science, from January to March 2006, among Spanish students who participate in the Erasmus program in European universities.
This application presents a comparative study, according to gender, of the relationships between the countries that Spanish students choose as the destination for their Erasmus exchange and the scientific fields in which they are studying.

The 15 destination countries are Austria, Belgium, Czech Republic, Denmark, Finland, France, Germany, Ireland, Italy, Netherlands, Norway, Poland, Portugal, Sweden and United Kingdom. The scientific fields are: Social and Legal Sciences, Engineering and Technology, Humanities, Health Science and Experimental Science. Therefore, we have two data tables whose rows (countries) and columns (scientific fields) correspond to the same modalities but refer to two different sets of individuals, depending on their gender. In these tables both the marginals and the grand totals are different. This suggests analyzing the tables by SA, since the results of applying other methods can be affected by these differences (Zárraga and Goitisolo (2002)).

The first factorial plane of SA (figure 1) explains nearly 60% of total inertia. In the plane we observe that male and female students of Humanities, Health Science and especially Engineering and Technology behave similarly in their choice of destination country, whereas students of Social and Legal Sciences and of Experimental Science choose different destination countries depending on their gender.

The plane shows that students of Humanities, both male and female, choose the United Kingdom as destination country, followed by Ireland. The countries chosen by students of both genders in Engineering and Technology are mainly Austria, Sweden and Denmark. Finally, male and female students of Health Science prefer Portugal and Finland.
The students of Experimental Science select different countries for their exchange depending on their gender. While male students go mainly to Portugal and the Netherlands, females go to Norway.

Students of Social and Legal Sciences also behave differently. The Netherlands and Ireland are selected as destination countries by both males and females, but males also go to Belgium, the United Kingdom and Italy, while females go to Norway and Sweden.

The projection of the partial rows of each table, joined by segments, allows us to appreciate the differences between males and females in each destination country. We will remark on only some of them.

For example, the United Kingdom is a country to which both male and female students of Humanities go in a greater proportion. Nevertheless, males also choose the United Kingdom for Social and Legal studies whereas females do not.

Male and female students who go to Portugal both select this country more than average for Health degrees. However, males also go to Portugal to study Experimental Science, while females prefer this country for studies of Engineering and Technology.

Spanish students who go to Finland select this country over the rest of the countries to study in the areas of Health and Engineering, but there are more females in the former area and more males in the latter.

Fig. 1. Projection of columns, overall rows and partial rows

On the other hand, no large differences between males and females are found in Germany, France, Belgium and Norway, as indicated by the close projections of overall and partial rows.

In conclusion, this application shows that Simultaneous Analysis reveals the common structure inside each table as well as the differences in the structure of both tables.
A more extensive application to the joint study of the inter- and intra-structure of a larger number of contingency tables can be found in Zárraga and Goitisolo (2006).

4 Discussion

The joint study of several data tables has given rise to an extensive list of factorial methods, some of which have been gathered by Cazes (2004), for both quantitative and categorical data tables. In the correspondence analysis (CA) approach, Cazes shows the similarity between some methods in the case of proportional row margins and shows the problem that arises in a joint analysis when the row margins are different or not proportional.

Comments on the appropriateness of SA and a comparison with different methods, especially with Multiple Factor Analysis for Contingency Tables (Pagès and Bécue-Bertaut (2006)), in the cases where row margins are equal, proportional and not proportional between the tables, can be found in Zárraga and Goitisolo (2006).

5 Software notes

Software for performing Simultaneous Analysis, written in S-Plus 2000, can be found in Goitisolo (2002). The AnSimult package for R can be obtained from the authors.

References

CAZES, P. (1980): L'analyse de certains tableaux rectangulaires décomposés en blocs: généralisation des propriétés rencontrées dans l'étude des correspondances multiples. I. Définitions et applications à l'analyse canonique des variables qualitatives. Les Cahiers de l'Analyse des Données, V, 2, 145–161.
CAZES, P. (1981): L'analyse de certains tableaux rectangulaires décomposés en blocs: généralisation des propriétés rencontrées dans l'étude des correspondances multiples. IV. Cas modèles. Les Cahiers de l'Analyse des Données, VI, 2, 135–143.
CAZES, P. (1982): Note sur les éléments supplémentaires en analyse des correspondances II. Tableaux multiples. Les Cahiers de l'Analyse des Données, VII, 133–154.
CAZES, P. (2004): Quelques methodes d'analyse factorielle d'une serie de tableaux de données. La Revue de Modulad, 31, 1–31.
D'AMBRA, L. and LAURO, N. (1989): Non symetrical analysis of three-way contingency tables. Multiway Data Analysis, 301–315.
ESCOFIER, B. (1983): Généralisation de l'analyse des correspondances à la comparaison de tableaux de fréquence. INRIA, Mai, 207, 1–33.
ESCOFIER, B. and PAGÈS, J. (1988 (1998, 3e édition)): Analyses Factorielles Simples et Multiples. Objectifs, méthodes et interprétation. Dunod, Paris.
GOITISOLO, B. (2002): El análisis simultáneo. Propuesta y aplicación de un nuevo método de análisis factorial de tablas de contingencia. PhD Thesis. Basque Country University Press. Bilbao. Spain.
LAURO, N. and D'AMBRA, L. (1984): L'Analyse non symétrique des correspondances. Data Analysis and Informatics, III, 433–446.
MÉOT, A. and LECLERC, B. (1997): Voisinages a priori et analyses factorielles: Illustration dans le cas de proximités géographiques. Revue de Statistique Appliquée, XLV, 25–44.
PAGÈS, J. and BÉCUE-BERTAUT, M. (2006): Multiple Factor Analysis for Contingency Tables. In: M. Greenacre and J. Blasius (Eds.): Multiple Correspondence Analysis and Related Methods. Chapman & Hall/CRC, 299–326.
ZÁRRAGA, A. and GOITISOLO, B. (2002): Méthode factorielle pour l'analyse simultanée de tableaux de contingence. Revue de Statistique Appliquée, L(2), 47–70.
ZÁRRAGA, A. and GOITISOLO, B. (2003): Étude de la structure Inter-tableaux à travers l'Analyse Simultanée. Revue de Statistique Appliquée, LI(3), 39–60.
ZÁRRAGA, A. and GOITISOLO, B. (2006): Simultaneous Analysis: A Joint Study of Several Contingency Tables with Different Margins. In: M. Greenacre and J. Blasius (Eds.): Multiple Correspondence Analysis and Related Methods. Chapman & Hall/CRC, 327–350.
Non Parametric Control Chart by Multivariate Additive Partial Least Squares via Spline

Rosaria Lombardo (1), Amalia Vanacore (2) and Jean-François Durand (3)

(1) Faculty of Economics, Second University of Naples, Italy, rosaria.lombardo@unina2.it
(2) Faculty of Engineering, University of Naples "Federico II", Italy, amalia.vanacore@unina.it
(3) Faculty of Maths, University of Montpellier II, France, jfd@ensam.inra.fr

Abstract. A statistical process control (SPC) chart is aimed at monitoring a process over time in order to detect any special event that may occur and to find assignable causes for it. Controlling both product quality variables and process variables is a complex problem. Multivariate methods make it possible to treat all the data simultaneously, extracting information on the "directionality" of the process variation. Highlighting the dependence relationships between process variables and product quality variables, we propose the construction of a non-parametric chart based on Multivariate Additive Partial Least Squares Splines; proper control limits are built by applying the bootstrap approach.

1 Introduction

The multivariate nature of product quality (response or output variables) and process characteristics (predictors or input variables) highlights the limits of any analysis based exclusively on descriptive and univariate statistics. On the other hand, the possibility for process managers of extracting knowledge from large databases opens the way to analyzing the multivariate dependence relationships between product quality and process variables via predictive and regression techniques like PLS (Tenenhaus, 1998; Wold, 1966) and its generalizations (Durand, 2001; Lombardo et al., 2007). In this paper, the application of a multivariate control chart based on a generalization of the PLS-$T^2$ chart (Kourti and MacGregor, 1996) is proposed in order to analyze the in-control process and monitor it over time.
Furthermore, in order to address the problem of the unknown distribution of the statistic to be charted, a non-parametric approach is applied for the selection of the control limits. Distribution-free or non-parametric control charts have been proposed in the literature to overcome the problems related to the lack of normality in process data. An overview of the literature on univariate non-parametric control charts is given by Chakraborti et al. (2001). The principles on which non-parametric control charts rest can be generalized to multivariate settings. In particular, the bootstrap approach to estimating control limits (Wu and Wang, 1997; Jones and Woodall, 1998; Liu and Tang, 1996) has been followed.

2 Multivariate control charts based on projection methods

A standard multivariate quality control problem occurs when an observed vector of measurements on quality characteristics exhibits a significant shift from a set of target (or standard) values. The first attempt to address the problem of multivariate process control is due to Hotelling (1947), who introduced the well-known $T^2$ chart based on the variance-covariance matrix. Subsequently, different approaches that take into account the multivariate nature of the problem were proposed (Woodall and Ncube, 1985; Lowry et al., 1992; Jackson, 1991; Liu, 1995; Kourti and MacGregor, 1996; MacGregor, 1997).

In particular, we focus on the approach based on PLS components proposed by Kourti and MacGregor (1996), in order to monitor over time the dependence structure between a set of process variables and one or more product quality variables (Hawkins, 1991). The PLS approach proves to be effective when the ratio of observations to variables is low and when there is multicollinearity among the predictors, but a major limit of this approach is that it assumes a linear dependence structure.
Generally, a linearity assumption is reasonable as a first research step, but in practice the relationships between the process variables and the product quality variables are often non-linear, and non-linear models (PLS via Spline, i.e. PLSS; Durand, 2001) can be much more appropriate for studying the dependence structure, as proposed by Vanacore and Lombardo (2005). The PLSS-$T^2$ chart can handle non-linear dependence relationships in the data structure, missing values and outliers, but it has two major drawbacks: 1) it does not take into account the possible effect of interactions between process variables; 2) it requires testing the normality assumption on the component scores, even when the original data are multinormal (in fact, with splines, i.e. non-linear transformations of the original process variables, multinormality can no longer be guaranteed).

To overcome these drawbacks we present the non-parametric Multivariate Additive PLS Spline-$T^2$ chart, based on Multivariate Additive PLSS (MAPLSS, Lombardo et al., 2007), briefly described in sub-section 2.2.

2.1 Review of MAPLSS

MAPLSS is just the application of linear PLS regression of the response (matrix $Y$ of dimension $n \times q$) on linear combinations of the transformed predictors (matrix $X$ of dimension $n \times p$) and their interactions. The predictors and bivariate interactions are transformed via a set of $K = d + 1 + m$ basis functions ($d$ is the spline degree and $m$ is the number of knots), called B-splines $B_l(\cdot)$, so as to represent any spline as a linear combination

$$ s(x, \beta) = \sum_{l=1}^{K} \beta_l B_l(x), $$

where $\beta = (\beta_1, \ldots, \beta_K)$ is the vector of spline coefficients computed via regression of $y \in \mathbb{R}$ on the $B_l(\cdot)$. The centered coding matrix, or design matrix including interactions, becomes

$$ B = \Big[\, \underbrace{\cdots\; B^{i}\; \cdots}_{i \in K_1} \;\Big|\; \underbrace{\cdots\; B^{k,l}\; \cdots}_{(k,l) \in K_2} \,\Big], \qquad (1) $$

where $K_1$ and $K_2$ are index sets for single variables and bivariate interactions, respectively.
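The B-spline coding used in this review can be illustrated with a small numerical sketch. It is not taken from the MAPLSS software: it assumes a clamped knot vector on a hypothetical interval [lo, hi] (boundary knots repeated $d + 1$ times), evaluates the basis with the standard Cox-de Boor recursion, and uses hypothetical function names. Under that assumption, $m$ interior knots and degree $d$ give exactly $K = d + 1 + m$ basis functions.

```python
def bspline_basis(x, degree, interior_knots, lo=0.0, hi=1.0):
    """Values at x of the K = degree + 1 + m B-spline basis functions (Cox-de Boor)."""
    knots = [lo] * (degree + 1) + sorted(interior_knots) + [hi] * (degree + 1)
    n_intervals = len(knots) - 1
    # Degree 0: indicator of each knot interval; the last non-empty one is closed at hi.
    basis = []
    for i in range(n_intervals):
        left, right = knots[i], knots[i + 1]
        inside = left <= x < right or (x == hi and left < right == hi)
        basis.append(1.0 if inside else 0.0)
    # Raise the degree step by step; terms with a zero-width support are skipped.
    for d in range(1, degree + 1):
        new = []
        for i in range(n_intervals - d):
            term = 0.0
            den1 = knots[i + d] - knots[i]
            if den1 > 0:
                term += (x - knots[i]) / den1 * basis[i]
            den2 = knots[i + d + 1] - knots[i + 1]
            if den2 > 0:
                term += (knots[i + d + 1] - x) / den2 * basis[i + 1]
            new.append(term)
        basis = new
    return basis

def spline_value(x, coef, degree, interior_knots, lo=0.0, hi=1.0):
    """s(x, beta) = sum_l beta_l B_l(x) for a coefficient vector of length K."""
    values = bspline_basis(x, degree, interior_knots, lo, hi)
    if len(coef) != len(values):
        raise ValueError("coef must have degree + 1 + m entries")
    return sum(b * v for b, v in zip(coef, values))

# Cubic splines (d = 3) with m = 3 interior knots give K = 7 basis functions.
row = bspline_basis(0.37, 3, [0.25, 0.5, 0.75])
```

Evaluating `bspline_basis` at each observed value of a predictor gives one row of that predictor's coding block; the column centering mentioned in the text would be applied afterwards.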
In a generic form, the MAPLSS model for the response $j$ can be written as

$$ \hat{y}^j(A) = \sum_{l \in L} \hat{\beta}^j_l(A)\, B^l, \qquad (2) $$

where $A$ is the space dimension parameter and $L$ is the index set pointing out the predictors as well as the bivariate interactions retained by MAPLSS. It is thus a purely additive model that depends on $A$, which in turn depends on the spline parameters (i.e. degree, number and location of knots). Increasing the order of interaction in MAPLSS implies expanding the dimension of the design matrix $B$. MAPLSS constructs a sequence of centered and uncorrelated predictors, i.e. the MAPLSS (latent) components $(t_1, \ldots, t_A)$.

We now briefly describe the MAPLSS model-building stage. In the first phase we do not consider interactions in the design matrix. This phase consists of the following steps.

step 1: Denote by $B_0 = B$ and $Y_0 = Y$ the design and response data matrices, respectively. Define $t_1 = B_0 w_1$ and $u_1 = Y_0 c_1$ as the first MAPLSS components, where the unit weighting vectors $w_1$ and $c_1$ are computed by maximizing the covariance between linear compromises of the transformed predictors and response variables, $\mathrm{cov}(t_1, u_1)$.

step k: Compute the generic MAPLSS components

$$ t_k = B_{k-1} w_k, \qquad u_k = Y_{k-1} c_k. \qquad (3) $$

Update the new matrices $B_k$ and $Y_k$ as the residuals of the least-squares regressions on the components previously computed. Using the orthogonal projection operator $P_{t_k}$ on $t_k$, that is $P_{t_k} = t_k t_k' / \|t_k\|^2$, we write

$$ B_k = B_{k-1} - P_{t_k} B_{k-1} \qquad (4) $$

$$ Y_k = Y_{k-1} - P_{t_k} Y_{k-1}. \qquad (5) $$

Final step: The algorithm stops on the basis of the number $A$ of components defined by the PRESS criterion.

In the second phase of the MAPLSS model-building stage, we individually evaluate all possible interactions. The rule for accepting a candidate bivariate interaction is based on the gain in fit ($R^2$) and prediction (GCV criterion) compared to that of the model with main effects only.
Then, the selected interactions are ordered by decreasing value so that they can be added step by step to the main effects model. At the end, in the final phase, we include the selected interactions in the design matrix $B$ and repeat the algorithm from step 1 to the final step.

A simple way to illustrate the contribution of predictors to response variables consists of ordering the predictors by decreasing influence on the response $\hat{y}^j(A)$, using as a criterion the range of the $s_i(x_i, \hat{\beta}^j_i(A))$ values of the transformed sample $x_i$ (see figure 3). One can also use the same criterion to prune the model, by eliminating the predictors and/or the interactions of low influence so as to obtain a more parsimonious model.

2.2 MAPLSS-$T^2$ chart

In this paper we discuss the applicability of a new chart, called the MAPLSS-$T^2$ chart, based on a generalization of the PLS chart that takes into account not only the original process variables but also their bivariate interactions. Following the procedure used for the construction of multivariate control charts based on projection methods like the PCA-$T^2$ chart (Jackson, 1991), the PLS-$T^2$ chart (Kourti and MacGregor, 1996) and the PLSS-$T^2$ chart (Vanacore and Lombardo, 2005), the MAPLSS-$T^2$ chart is based on the first $A$ components. The MAPLSS-$T^2$ chart is an effective monitoring tool: it incorporates the variability structure underlying process data and product quality data, extracting information on the directionality of the process variation. The scores of each new observation are monitored by the MAPLSS-$T^2$ control chart based on the following statistic

$$ T^2_A = \sum_{a=1}^{A} \frac{t_a^2}{\lambda_a} \qquad (6) $$

where $\lambda_a$ and $t_a$, for $a = 1, \ldots, A$, are the eigenvalues and the component scores, respectively, of the previously defined covariance matrix.
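As a worked illustration of statistic (6), the following minimal sketch computes $T^2_A$ for one observation from its $A$ component scores. The function names and the numerical values are hypothetical, and the eigenvalues $\lambda_a$ are simply passed in; how they are obtained from the in-control data is left to the model-fitting step and is not reproduced here.

```python
def t2_statistic(scores, eigenvalues):
    """T^2_A = sum over a of t_a^2 / lambda_a for one observation."""
    if len(scores) != len(eigenvalues):
        raise ValueError("scores and eigenvalues must have the same length A")
    if any(lam <= 0 for lam in eigenvalues):
        raise ValueError("eigenvalues must be positive")
    return sum(t * t / lam for t, lam in zip(scores, eigenvalues))

def standardized_scores(scores, eigenvalues):
    """t_a / sqrt(lambda_a): the per-dimension terms whose squares add up to T^2_A."""
    return [t / lam ** 0.5 for t, lam in zip(scores, eigenvalues)]

# Hypothetical observation with A = 2 components.
scores = [2.0, -1.0]
eigenvalues = [4.0, 0.5]
t2 = t2_statistic(scores, eigenvalues)   # 2^2 / 4 + (-1)^2 / 0.5 = 3.0
```

The standardized scores show which dimensions drive a large value: here the second component contributes 2.0 of the total 3.0 although its raw score is the smaller of the two.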
The control limits of the MAPLSS-$T^2$ chart are based on the percentiles $q_\alpha$ (for $\alpha \le 10\%$) of the empirical distributions, $F_N$, of MAPLSS component scores, computed on a large number $N$ of bootstrap samples

$$ \alpha = P(T^2_A \le q_\alpha \mid F_N). \qquad (7) $$

Multivariate control charts can detect an unusual event but do not provide a reason for it. Following the diagnostic approach proposed by Kourti and MacGregor (1996) and using some new tools, we can investigate observations falling outside the limits through
(1) bar plots of standardized out-of-control scores ($t_a / \sqrt{\lambda_a}$ for $a = 1, \ldots, A$), to focus on the most important dimensions;
(2) bar plots of the contributions of the process variables to the dimensions identified as the most important ones, to evaluate how each process variable involved in the calculation of that score contributes to it;
(3) bar plots of the contributions of the process variables to the product variables (measured by the spline range), to evaluate the importance of the process variables.

[...]

... Technology, 28, pp. 409-428.
LIU, Y.R. (1995): Control Charts for Multivariate Processes. Journal of the American Statistical Association, 90, pp. 1380-1387.
LIU, Y.R. and TANG, J. (1996): Control Charts for Dependent and Independent Measurement based on Bootstrap Methods. Journal of the American Statistical Association, 91, pp. 1694-1700.
LOMBARDO, R., DURAND, J.F., DE VEAUX, R. (2007): Multivariate Additive Partial Least...
... CHAMP, C.W., RIGDON, S.E. (1992): A multivariate EWMA control chart. Technometrics, 34, pp. 46-53.
MACGREGOR, J.F. (1997): Using On-line Process Data to Improve Quality: Challenges for Statisticians. International Statistical Review, 65, pp. 309-323.
NOMIKOS, P. and MACGREGOR, J.F. (1995): Multivariate SPC Charts for Monitoring Batch Processes. Technometrics, 37, pp. 41-59.
TENENHAUS, M. (1998): La Régression PLS,...
... Europeennes
D'AMBRA, L. and LAURO, N. (1982): Analisi in componenti principali in rapporto ad un sottospazio di riferimento. Italian Journal of Applied Statistics, 1.
DAUDIN, J.J., DUBY, C. and TRECOURT, P. (1988): Stability of principal component analysis studied by bootstrap method. Statistics, 19, 2.
DE BOOR, C. (1978): A practical guide to splines. Springer, N.Y.
DURAND, J.F. (1993): Generalized...
... Control, in Techniques of Statistical Analysis. Eds. Eisenhart, Hastay and Wallis, McGraw-Hill, New York.
JACKSON, J.E. (1991): A User Guide to Principal Component. Wiley, New York.
JONES, A.L. and WOODALL, W.H. (1998): The performance of Bootstrap Control Charts. Journal of Quality Technology, 30, pp. 362-375.
KOURTI, T. and MACGREGOR, J.F. (1996): Multivariate SPC Methods for Process and Product Monitoring. Journal of...
... Quality Technology, 33, pp. 304-315.
DURAND, J.F. (2001): Local Polynomial additive regression through PLS and splines: PLSS. Chemometrics and Intelligent Laboratory Systems, 58, pp. 235-246.
HAWKINS, D.M. (1991): Multivariate Quality Control based on regression-adjusted variables. Technometrics, 33, pp. 61-75.
HOTELLING, H. (1947): Multivariate Quality...
... (1993): Generalized principal component analysis with respect to instrumental variables via univariate spline transformations. Computational Statistics Data Analysis, 16, 423-440.
EUBANK, R.L. (1988): Smoothing splines and non parametric regression. Markel Dekker and Bosel, N.Y.
GIFI, A. (1990): Nonlinear Multivariate Analysis. Wiley, Chichester, England.
GROVE, D.M., WOODS, D.C. and LEWIS, S.M. (2004): Multifactor...
... 4, 380-391.
IZENMAN, A.J. (1975): Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5.
HUBERT, M., ROUSSEEUW, P.J. and BRANDEN, K.V. (2005): ROBPCA: A New Approach to Robust Principal Component Analysis. Technometrics, 47, 1.
MACGREGOR, J.F. and KOURTI, T. (1995): Statistical Process Control of Multivariate Processes. Control Engineering Practice, 3, 3, 403-414.

Simple...

Constrained Principal Component Analysis
Michele Gallo and Luigi D'Ambra

... X and Y standardised data matrices, hence Q = 1. The aim of CPCA (D'Ambra and Lauro, 1982) is to analyse the structure of the explained variability of the Y data set given the process variables X. Let

$$ P_X = X (X' D_X X)^{-1} X' \qquad (1) $$

be the D-orthogonal projector onto the space spanned by the columns of X. CPCA consists in carrying out a PCA on the matrix

$$ \hat{Y} = P_X Y \qquad (2) $$

... not both). To resolve the horseshoe problem and give more interpretable results, a nonlinear transformation of the data can be used (Gifi, 1990).

Fig. 1. Plot of the first and second Constrained Principal Component

3 Nonlinear Constrained Principal Component Analysis

The B-spline approach (Durand, 1993) allows greater flexibility in the adjustment of dependence between the X and Y sets of variables. Let $S_j(x_j) B$... be the transformation of the $x_j$-column, $j = 1, \ldots, p$, where $S_j$ $(n, k)$ is the B-spline basis with a priori fixed order and knots (De Boor, 1978; Eubank, 1988) and $B_j$ $(k, q)$ is the matrix of coefficients. Similarly we can write $S$ as

$$ S = [\, S_1(x_1) \mid \cdots \mid S_p(x_p) \,] \qquad (3) $$

and $B$ as

$$ B = \begin{bmatrix} B_1 \\ \vdots \\ B_p \end{bmatrix} \qquad (4) $$

Consider the following multivariate model

$$ X = SB + E \qquad (5) $$

In order ...
