A more efficient and consistent way of fitting PLS structural models: A better alternative to SEM than traditional PLS.
By Jonathan Starkweather, Ph.D.
Feb. 15, 2017 – Partial least squares (PLS) modeling (Wold, 1965, 1966a, 1973), or ‘soft modeling’ (Wold, 1982; Falk & Miller, 1992), was discussed in this column several years back (Starkweather, 2011). PLS is an important alternative to traditional path modeling and structural equation modeling (SEM) when the data in hand do not conform to the assumptions of those techniques. However, PLS modeling does have its drawbacks. Early on, Dijkstra (1983) showed that PLS estimates of structural model parameters are not consistent. Other researchers (Wold, 1982; Fornell & Bookstein, 1982) have noted that “PLS does not solve a global optimization problem for parameter estimation, indicating that there exists no single criterion consistently minimized or maximized to determine model parameter estimates” (Hwang & Takane, 2004, p. 1). Hwang and Takane also point out that PLS offers no global goodness-of-fit statistic that would allow model comparisons. In response to these criticisms, several researchers have proposed alternatives to, or modifications of, the traditional PLS approach. As stated in the ‘matrixpls’ package vignette (Rönkkö, 2016c):
“Hwang and Takane (2014; 2004) proposed generalized structured component analysis (GSCA) arguing that it is superior over PLS because it has an explicit optimization criterion, which the PLS algorithm lacks. Dijkstra (2011; Dijkstra and Henseler 2015b; Dijkstra and Henseler 2015a) proposed that PLS can be made consistent by applying disattenuation, referring to this estimator as PLSc. Huang (2013; Bentler and Huang 2014) proposed two additional estimators that parameterize LISREL estimators based on Dijkstra's PLSc estimator. These estimators, referred to as PLSe1 and PLSe2 are argued to be more efficient than the consistent PLSc estimator” (p. 2).
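The disattenuation idea behind PLSc can be illustrated with the classical correction for attenuation: the correlation between two composites is divided by the square root of the product of their reliabilities. The following is a minimal Python sketch of that single step under that assumption, not the ‘matrixpls’ implementation; the function name and the input values are hypothetical.

```python
import math


def disattenuate(r_composites: float, rel_i: float, rel_j: float) -> float:
    """Correct a composite-composite correlation for measurement error.

    Classical correction for attenuation: r / sqrt(rel_i * rel_j).
    """
    if not (0.0 < rel_i <= 1.0 and 0.0 < rel_j <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_composites / math.sqrt(rel_i * rel_j)


# Hypothetical values: composites correlate 0.48; reliabilities 0.80 and 0.90
print(round(disattenuate(0.48, 0.80, 0.90), 4))  # 0.5657
```

Because reliabilities below 1 shrink the observed composite correlation, the corrected value is larger in magnitude; this is the sense in which uncorrected composites understate the relationships among the latent variables.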
Naturally, the focus of the current article is the ‘matrixpls’ package (Rönkkö, 2016a) and its ability to apply the newer methods mentioned above. Below, a simulated dataset is used to demonstrate the ‘matrixpls’ function of the ‘matrixpls’ package by fitting the following model.
First, import the simulated data from the Data Science and Analytics server and load the ‘matrixpls’ package (Rönkkö, 2016a).
Next, we specify the structural model using the three-matrix style, i.e., an inner matrix, an outer (reflective) matrix, and a formative matrix. We begin by creating the inner matrix, which specifies the relationships among the unobserved (latent) variables. Keep in mind that the ‘matrixpls’ function recognizes several ways of specifying a model (e.g., ‘lavaan’ package syntax [Rosseel, 2012a, 2012b] and ‘semPLS’ package syntax [Monecke & Leisch, 2012]); here the inner matrix is specified as it would be for the ‘plspm’ package and function (Sanchez, Trinchera, & Russolillo, 2016), a format the ‘matrixpls’ function also accepts.
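In the ‘plspm’ convention the inner matrix is square, with one row and one column per latent variable; a 1 in row i, column j means that variable j predicts variable i, so a recursive model ordered from exogenous to endogenous variables gives a lower triangular matrix. The Python sketch below builds such a matrix from a list of paths. The construct names LV1 to LV3 and the paths are hypothetical, not the article's model.

```python
# Hypothetical model with three latent variables:
#   LV1 -> LV2, LV1 -> LV3, LV2 -> LV3
constructs = ["LV1", "LV2", "LV3"]
paths = [("LV1", "LV2"), ("LV1", "LV3"), ("LV2", "LV3")]  # (predictor, outcome)


def build_inner(constructs, paths):
    """Binary inner matrix: row = outcome, column = predictor (plspm-style)."""
    index = {name: k for k, name in enumerate(constructs)}
    inner = [[0] * len(constructs) for _ in constructs]
    for predictor, outcome in paths:
        inner[index[outcome]][index[predictor]] = 1
    return inner


inner = build_inner(constructs, paths)
for name, row in zip(constructs, inner):
    print(name, row)
# LV1 [0, 0, 0]
# LV2 [1, 0, 0]
# LV3 [1, 1, 0]
```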
Next, we create the outer matrix (i.e., specify the relationships between the observed variables and the unobserved variables). Later, this matrix will be referred to as the ‘reflective’ matrix because it specifies reflective relationships.
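The outer (reflective) matrix has one row per observed variable and one column per latent variable, with a 1 marking the latent variable each indicator reflects. A minimal Python sketch, assuming nine hypothetical indicators with three per construct:

```python
constructs = ["LV1", "LV2", "LV3"]
# Hypothetical indicators, three per latent variable (indicator -> construct)
indicator_of = {
    "x1": "LV1", "x2": "LV1", "x3": "LV1",
    "y1": "LV2", "y2": "LV2", "y3": "LV2",
    "z1": "LV3", "z2": "LV3", "z3": "LV3",
}


def build_reflective(indicator_of, constructs):
    """Binary indicators-by-constructs matrix; 1 marks the construct an indicator reflects."""
    return [[1 if indicator_of[ind] == c else 0 for c in constructs]
            for ind in indicator_of]


reflective = build_reflective(indicator_of, constructs)
print(reflective[0], reflective[4], reflective[8])  # [1, 0, 0] [0, 1, 0] [0, 0, 1]
```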
Next, we create the formative matrix, which in this example contains only zeros because all of our observed variables are reflective rather than formative. Recall that reflective means the unobserved variables are theorized to cause the observed variables (scores), whereas formative means the unobserved variables are theorized to be caused (formed) by the observed variables.
Now we can combine all three matrices into a list object which specifies the structural model.
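The combined specification is simply a named collection of the three matrices (in R, a list). In the Python sketch below a dict plays that role. The formative matrix is all zeros, and we assume, from our reading of the ‘matrixpls’ documentation, that it has one row per latent variable and one column per indicator, i.e., the transposed shape of the reflective matrix. Names and structure are hypothetical.

```python
constructs = ["LV1", "LV2", "LV3"]
indicators = ["x1", "x2", "x3", "y1", "y2", "y3", "z1", "z2", "z3"]

# Hypothetical structure: LV1 -> LV2, LV1 -> LV3, LV2 -> LV3
inner = [[0, 0, 0],
         [1, 0, 0],
         [1, 1, 0]]
# Three consecutive indicators per construct
reflective = [[1 if k // 3 == j else 0 for j in range(len(constructs))]
              for k in range(len(indicators))]
# All-zero formative matrix (assumed shape: constructs x indicators)
formative = [[0] * len(indicators) for _ in constructs]

model = {"inner": inner, "reflective": reflective, "formative": formative}


def shapes(model):
    """(rows, columns) of each matrix in the model specification."""
    return {name: (len(m), len(m[0])) for name, m in model.items()}


print(shapes(model))
# {'inner': (3, 3), 'reflective': (9, 3), 'formative': (3, 9)}
```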
The only other thing we need is the variance-covariance matrix of the observed variables.
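In R this matrix is what base R's cov() returns for a data frame of numeric indicators. For readers who want to see the arithmetic, here is a small Python sketch of the sample covariance matrix (n − 1 denominator), applied to a made-up dataset.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            s = sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
            cov[i][j] = cov[j][i] = s / (n - 1)  # fill both halves (symmetric)
    return cov


# Tiny hypothetical dataset: 3 cases, 2 observed variables
data = [[1, 2], [2, 4], [3, 6]]
S = covariance_matrix(data)
print(S)  # [[1.0, 2.0], [2.0, 4.0]]
```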
Now we can apply the ‘matrixpls’ function using ordinary least squares (OLS) regression estimation, as would be done with traditional PLS (e.g., the ‘plspm’ package).
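In traditional PLS the structural paths are estimated by OLS regression of each endogenous composite on its predictor composites. Assuming the composite correlations have already been obtained (the values below are hypothetical, and the iterative weighting algorithm that produces them is not shown), that final step reduces to solving R_xx b = r_xy for the standardized coefficients b, as in this Python sketch.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_paths(R, outcome, predictors):
    """Standardized OLS path coefficients for one outcome from a correlation matrix."""
    rxx = [[R[i][j] for j in predictors] for i in predictors]
    rxy = [R[i][outcome] for i in predictors]
    return solve(rxx, rxy)


# Hypothetical composite correlations among LV1, LV2, LV3
R = [[1.0, 0.5, 0.4],
     [0.5, 1.0, 0.6],
     [0.4, 0.6, 1.0]]
b = ols_paths(R, outcome=2, predictors=[0, 1])
r2 = sum(bi * R[i][2] for bi, i in zip(b, [0, 1]))
print([round(v, 4) for v in b], round(r2, 4))  # [0.1333, 0.5333] 0.3733
```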
A much more thorough presentation of the output can be obtained with either the ‘summary’ function or the ‘attributes’ function; however, because of their size, the output of each is only partially presented here. The image below shows what is returned when the ‘names’ function is applied to the ‘summary’, as well as to the ‘attributes’, of the ‘matrixpls’ object. Using the returned names, one can then extract relevant elements of the ‘matrixpls’ object (e.g., “summary(mat.pls.1)$gof” extracts the absolute goodness of fit).
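A commonly used absolute goodness-of-fit index for PLS models is the geometric mean of the average communality (squared standardized loadings) and the average R² of the endogenous latent variables. We assume, without having checked the package source, that the gof element reports an index of this kind; the Python sketch below is a simplified version that averages communality over all indicators, using hypothetical inputs.

```python
import math


def gof(loadings, r_squared):
    """Absolute GoF: sqrt(mean communality * mean R^2)."""
    mean_communality = sum(l * l for l in loadings) / len(loadings)
    mean_r2 = sum(r_squared) / len(r_squared)
    return math.sqrt(mean_communality * mean_r2)


# Hypothetical standardized loadings and R^2 values
print(round(gof([0.8, 0.8, 0.8, 0.8], [0.36]), 4))  # 0.48
```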
Next, we re-specify the same model using the ‘lavaan’ package style of model specification syntax. This syntax is much more intuitive and requires fewer lines of code, as well as fewer objects in the workspace (a single model specification rather than three matrices).
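To see why one model string can stand in for the three matrices, the toy Python translator below converts ‘lavaan’-style lines into the inner and reflective matrices. It is only an illustration under our own assumptions: it understands just the =~ (“is measured by”) and ~ (“is regressed on”) operators, rejects everything else lavaan supports, and the model is hypothetical rather than the article's.

```python
MODEL = """
LV1 =~ x1 + x2 + x3
LV2 =~ y1 + y2 + y3
LV3 =~ z1 + z2 + z3
LV2 ~ LV1
LV3 ~ LV1 + LV2
"""


def lavaan_to_matrices(model):
    """Toy translator: handles only '=~' (measurement) and '~' (regression) lines."""
    measurement, regressions = {}, []
    for line in model.strip().splitlines():
        line = line.split("#")[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if "=~" in line:
            lhs, rhs = line.split("=~")
            measurement[lhs.strip()] = [t.strip() for t in rhs.split("+")]
        elif "~~" in line:
            raise ValueError("covariance ('~~') lines are not handled in this sketch")
        elif "~" in line:
            lhs, rhs = line.split("~")
            regressions += [(p.strip(), lhs.strip()) for p in rhs.split("+")]
        else:
            raise ValueError(f"unrecognized line: {line}")
    constructs = list(measurement)
    indicators = [i for inds in measurement.values() for i in inds]
    pos = {c: k for k, c in enumerate(constructs)}
    inner = [[0] * len(constructs) for _ in constructs]
    for predictor, outcome in regressions:
        inner[pos[outcome]][pos[predictor]] = 1  # row = outcome, column = predictor
    reflective = [[1 if ind in measurement[c] else 0 for c in constructs]
                  for ind in indicators]
    return constructs, indicators, inner, reflective


constructs, indicators, inner, reflective = lavaan_to_matrices(MODEL)
print(inner)  # [[0, 0, 0], [1, 0, 0], [1, 1, 0]]
```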
Next, we fit the same model, as re-specified using ‘lavaan’ syntax, but this time with GSCAc estimation and two-stage least squares. Note that only partial output from the ‘summary’ function is displayed.
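Two-stage least squares replaces OLS for the inner model with an instrumental-variable estimator. How ‘matrixpls’ chooses its instruments is documented in the package manual and is not reproduced here; the Python sketch below only shows the generic 2SLS idea with one endogenous predictor x and one hypothetical instrument z, on made-up data.

```python
def mean(v):
    return sum(v) / len(v)


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_stage_least_squares(y, x, z):
    """2SLS slope of y on x, using z as the instrument for x."""
    # Stage 1: regress the endogenous predictor x on the instrument z.
    b1 = ols_slope(z, x)
    a1 = mean(x) - b1 * mean(z)
    x_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress the outcome on the stage-1 fitted values.
    return ols_slope(x_hat, y)


# Hypothetical data
z = [1, 2, 3, 4, 5]
x = [2, 3, 5, 4, 6]
y = [3, 5, 6, 8, 9]
print(round(two_stage_least_squares(y, x, z), 4))  # 1.6667
```

With a single instrument the result equals cov(z, y) / cov(z, x), here 15/9.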
We can now compare the coefficients from the two estimation techniques applied to the same data and the same model; keep in mind that with this simulated data the differences between the statistics produced are very small.
We can also compare the measurement model’s composite reliability estimates (i.e., the “Q” statistic) from each method of estimation. Note that the “Q” statistic is analogous to coefficient alpha in traditional item evaluation.
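For a block of standardized reflective loadings λ, composite reliability is commonly computed as (Σλ)² / ((Σλ)² + Σ(1 − λ²)), while coefficient alpha is computed from the indicator covariance matrix. The Python sketch below uses hypothetical loadings, and we do not claim it is the exact estimator behind the “Q” values. It shows that the two coincide when all loadings are equal, which is one way to see the analogy.

```python
def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    error = sum(1 - l * l for l in loadings)
    return s * s / (s * s + error)


def cronbach_alpha(cov):
    """k / (k - 1) * (1 - sum of item variances / variance of the sum score)."""
    k = len(cov)
    total = sum(sum(row) for row in cov)
    diag = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1 - diag / total)


# Hypothetical equal loadings and the correlation matrix they imply
loadings = [0.7, 0.7, 0.7]
implied = [[1.0 if i == j else loadings[i] * loadings[j] for j in range(3)]
           for i in range(3)]
print(round(composite_reliability(loadings), 4), round(cronbach_alpha(implied), 4))
# 0.7424 0.7424
```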
If you are interested in obtaining the Q² predictive relevance statistic(s), you first need to run the cross-validation function, which mimics the blindfolding procedure used in the ‘semPLS’ package. Then apply the ‘q2’ function to the object returned by the cross-validation.
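Q² compares prediction errors for held-out data points with a trivial benchmark: Q² = 1 − SSE/SSO, and values above zero indicate predictive relevance. In blindfolding, every D-th data point (D is the omission distance) is omitted in turn and predicted from a model fitted to the rest. The Python sketch below illustrates only the omission pattern and the final Q² arithmetic on hypothetical numbers, using the mean of the observed values as the trivial prediction (equivalent to summing squared observations when indicators are standardized). It does not fit a PLS model and is not how ‘matrixpls’ implements its cross-validation.

```python
def omission_groups(n_cells, distance):
    """Blindfolding-style groups: cell k is omitted in round k % distance."""
    return [[k for k in range(n_cells) if k % distance == g] for g in range(distance)]


def q_squared(observed, predicted):
    """Q^2 = 1 - SSE / SSO, with the mean of `observed` as the trivial prediction."""
    m = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sso = sum((o - m) ** 2 for o in observed)
    return 1 - sse / sso


print(omission_groups(7, 3))  # [[0, 3, 6], [1, 4], [2, 5]]
# Hypothetical observed values and their out-of-sample predictions
print(round(q_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]), 4))  # 0.98
```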
There are many benefits to using the ‘matrixpls’ package (Rönkkö, 2016a) rather than the ‘plspm’ package (Sanchez, Trinchera, & Russolillo, 2016) or the ‘semPLS’ package (Monecke & Leisch, 2012) for fitting structural models when SEM cannot be used. The most obvious benefit of the ‘matrixpls’ package is the ability to use several newer estimation methods (e.g., GSCA, PLSc, PLSe1, and PLSe2) that address the drawbacks of traditional PLS noted above; only one combination of estimation techniques was used here. The ‘matrixpls’ function is also more computationally efficient (Rönkkö, 2016c, p. 2) and offers more flexibility in the types of models that can be fit. Consider the limitations on the specification of model matrices in traditional PLS packages, which impose the following restrictions: “the inner must be a lower triangular matrix, reflective must have exactly one non-zero value on each row and must have at least one non-zero value on each column, and formative must only contain zeros” (Rönkkö, 2016c, p. 4). The ‘matrixpls’ function has only two restrictions: all matrices must be binary, and the inner matrix must have zeros on the diagonal (Rönkkö, 2016c). Another benefit of the ‘matrixpls’ function is the ability to specify a model using the highly intuitive model specification syntax of the ‘lavaan’ package (Rosseel, 2012a); interested readers should review the ‘lavaan’ package documentation available at CRAN (see Rosseel, 2012b). Lastly, the ‘matrixpls’ package includes a function for running Monte Carlo simulations based on a ‘matrixpls’ model.
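The two sets of restrictions quoted above are easy to express as checks. The Python sketch below encodes them literally, treating “lower triangular” as strictly lower triangular because a variable cannot predict itself, and applies them to a hypothetical model with a reciprocal path between LV1 and LV2. That model fails the traditional restrictions but satisfies the two ‘matrixpls’ restrictions. This illustrates the quoted rules only; it is not either package's own input validation.

```python
def is_binary(m):
    return all(v in (0, 1) for row in m for v in row)


def traditional_pls_ok(inner, reflective, formative):
    """Restrictions quoted for traditional PLS packages."""
    n = len(inner)
    lower_triangular = all(inner[i][j] == 0 for i in range(n) for j in range(i, n))
    one_per_row = all(sum(v != 0 for v in row) == 1 for row in reflective)
    every_column = all(any(row[j] != 0 for row in reflective)
                       for j in range(len(reflective[0])))
    formative_zero = all(v == 0 for row in formative for v in row)
    return lower_triangular and one_per_row and every_column and formative_zero


def matrixpls_ok(inner, reflective, formative):
    """The two restrictions quoted for the 'matrixpls' function."""
    zero_diagonal = all(inner[i][i] == 0 for i in range(len(inner)))
    return zero_diagonal and all(is_binary(m) for m in (inner, reflective, formative))


# Hypothetical non-recursive model: LV1 and LV2 predict each other
inner = [[0, 1],
         [1, 0]]
reflective = [[1, 0], [1, 0], [0, 1], [0, 1]]
formative = [[0] * 4 for _ in range(2)]
print(traditional_pls_ok(inner, reflective, formative))  # False
print(matrixpls_ok(inner, reflective, formative))        # True
```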
A version of the R script used in this article can be found on the R&SS Do-It-Yourself Introduction to R website at the bottom of the Module 9 section.
Until next time, everybody look what’s going down…
References and Resources
Bentler, P. M., & Huang, W. (2014). On components, latent variables, PLS and simple methods: Reactions to Rigdon’s rethinking of PLS. Long Range Planning, 47(3), 138–145.
Dijkstra, T. K. (1983). Some comments on maximum likelihood and partial least squares methods. Journal of Econometrics, 22, 67–90.
Dijkstra, T. K., & Henseler, J. (2015a). Consistent partial least squares path modeling. MIS Quarterly, 39(2), 297–316.
Dijkstra, T. K., & Henseler, J. (2015b). Consistent and asymptotically normal PLS estimators for linear structural equations. Computational Statistics and Data Analysis, 81, 10–23.
Falk, R. F., & Miller, N. B. (1992). A primer for soft modeling. Akron, OH: The University of Akron.
Fornell, C., & Bookstein, F. L. (1982). Two structural equation models: LISREL and PLS applied to consumer exit-voice theory. Journal of Marketing Research, 19, 440–452.
Huang, W. (2013). PLSe: Efficient estimators and tests for partial least squares (Doctoral dissertation). University of California, Los Angeles.
Hwang, H., & Takane, Y. (2004). Generalized structured component analysis. Psychometrika, 69(1), 81–99.
Hwang, H., & Takane, Y. (2014). Generalized structured component analysis: A component-based approach to structural equation modeling. New York: CRC Press / Taylor & Francis Group.
Monecke, A., & Leisch, F. (2012). semPLS: Structural Equation Modeling Using Partial Least Squares. Journal of Statistical Software, 48(3), 1–32. URL http://www.jstatsoft.org/v48/i03/. Documentation available at CRAN: https://cran.r-project.org/web/packages/semPLS/index.html
Rigdon, E. E. (2012). Rethinking partial least squares path modeling: In praise of simple methods. Long Range Planning, 45, 341–358.
Rönkkö, M. (2016a). matrixpls: Matrix-based Partial Least Squares Estimation. R package version 1.0.4. Documentation available at CRAN: https://cran.r-project.org/web/packages/matrixpls/index.html
Rönkkö, M. (2016b). Package matrixpls manual. Available at CRAN: https://cran.r-project.org/web/packages/matrixpls/matrixpls.pdf
Rönkkö, M. (2016c). Package matrixpls vignette: Introduction to matrixpls. Available at CRAN: https://cran.r-project.org/web/packages/matrixpls/vignettes/matrixpls-intro.pdf
Rosseel, Y. (2012a). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. URL http://www.jstatsoft.org/v48/i02/.
Rosseel, Y. (2012b). Package ‘lavaan’ manual. Available at CRAN: https://cran.r-project.org/web/packages/lavaan/index.html
Sanchez, G. (2013). PLS path modeling with R. Berkeley: Trowchez Editions. Available at: http://gastonsanchez.com/PLS_Path_Modeling_with_R.pdf
Sanchez, G., Trinchera, L., & Russolillo, G. (2016). Package ‘plspm’ with documentation, including package manual, available at CRAN: https://cran.r-project.org/web/packages/plspm/index.html
Starkweather, J. (2011). An alternative modeling strategy: Partial least squares. As published in the RSS Matters (now called Research Matters) column of University Information Technology’s Benchmarks online magazine. Available at: http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/Benchmarks/PLS_JDS_July2011.pdf
Vinzi, V. E., Chin, W. W., Henseler, J., & Wang, H. (Eds.). (2010). Handbook of partial least squares: Concepts, methods, and applications. New York: Springer.
Wold, H. O. A. (1965). A fixed-point theorem with econometric background, I-II. Arkiv för Matematik, 6, 209–240.
Wold, H. O. A. (1966a). Estimation of principal components and related methods by iterative least squares. In P. R. Krishnaiah (Ed.), Multivariate Analysis (pp. 391–420). New York: Academic Press.
Wold, H. O. A. (1966b). Nonlinear estimation by iterative least squares procedures. In F. N. David (Ed.), Research papers in statistics: Festschrift for J. Neyman (pp. 411–444). New York: Wiley.
Wold, H. O. A. (1973). Nonlinear iterative partial least squares (NIPALS) modeling: Some current developments. In P. R. Krishnaiah (Ed.), Multivariate Analysis (pp. 383–487). New York: Academic Press.
Wold, H. O. A. (1982). Soft modeling: The basic design and some extensions. In K. G. Jöreskog & H. Wold (Eds.), Systems under indirect observation, Part II (pp. 1–54). Amsterdam: North-Holland.
Benchmarks Online, news journal, is a publication of University Information Technology, University of North Texas, Denton, Texas.