A regression model is a formal means of expressing the general tendency of a dependent variable (Y) to vary systematically with an independent variable (X). The independent variable (X) is also referred to as the predictor variable. George Box (2007) states, “all models are approximations. Essentially, all models are wrong, but some are useful. However, the approximate nature of the model must always be borne in mind” (p. 414), implying that random errors are present, hence the imperfect fit.^{1,2} Expressing the general tendency of a dependent variable (Y) with respect to an independent variable (X) is powerful in many different fields.

Regression analysis serves the researcher three essential purposes: 1) explanation, 2) control, and 3) prediction. Explanation describes the statistical relationship between the dependent variable (Y) and the independent variable (X). Control allows the researcher to set bounds on the independent variable (X) to achieve a controlled output range of the dependent variable (Y). Prediction enables the researcher to answer “what if” questions; e.g., what output of the dependent variable (Y) will likely result from setting the independent variable (X) to an uncommon value outside the control range? In practice, these three purposes often overlap, making regression analysis a uniquely useful statistical tool for researchers, process engineers, and others in various fields.

Regression models come in many types, including, but not limited to, simple linear, multiple linear, nonlinear, orthogonal, calibration, logistic, and Poisson.^{1,3} Simple linear regression is a basic regression model with only one predictor variable (X) and a linear regression function. Simple linear regression models are considered first-order models: neither parameter (the intercept β₀ nor the slope β₁) is expressed as an exponent, and neither is multiplied or divided by another parameter. The predictor variable (X) appears only in the first power. Simple linear regression is a good starting point for exploring statistical relationships. Nonlinear regression is a different animal.
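The first-order model described above can be fit in closed form by ordinary least squares. A minimal sketch in Python, using hypothetical data (the data, like the helper name, are illustrative and not from the article):

```python
# Fit the first-order model y = b0 + b1*x by ordinary least squares.
# b1 = Sxy / Sxx (slope), b0 = ybar - b1*xbar (intercept).
def fit_simple_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)                       # sum of squares of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # cross-products
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data roughly following y = 2x with noise.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_linear(xs, ys)
```

Because the model is linear in β₀ and β₁, no iterative search or starting values are needed; this is precisely the convenience that nonlinear regression gives up.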

Nonlinear regression models are not first-order models. The regression equation has parameters (θ₁, θ₂, etc.) that may be expressed as exponents, multiplied or divided by another parameter, and so on. The predictor variable (X) may or may not appear in the first power. When constructing a nonlinear regression model, the model itself must be chosen, along with starting points for the parameters θ₁, θ₂, etc.^{3,4} Selecting the model and the parameter starting points requires effort on the researcher’s part.
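To make the distinction concrete, here is one common nonlinear regression function, an exponential-rise-to-plateau model. This is a hypothetical choice for illustration, not the article's Eq. 1: θ₁ multiplies the whole expression and θ₂ sits inside an exponent, so the model is nonlinear in its parameters even though X enters simply.

```python
import math

# A nonlinear model: y = theta1 * (1 - exp(-theta2 * x)).
# theta1 is the plateau level; theta2 controls how fast y rises toward it.
def predict(theta1, theta2, x):
    return theta1 * (1.0 - math.exp(-theta2 * x))

# Starting points must be chosen by the researcher; here we guess
# theta1 = 10 (the apparent plateau) and theta2 = 0.5 (initial rise rate).
y_at_4 = predict(10.0, 0.5, 4.0)  # prediction at x = 4 from the starting guess
```

Unlike the linear case, there is no closed-form solution for θ₁ and θ₂; an iterative search from the chosen starting points is required.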

Researchers may not know the actual model tendency relating the dependent variable (Y) to the independent variable (X). Without this knowledge, it is challenging to choose a nonlinear model and its starting parameters. In these cases, combining a mechanistic model with an empirical model is a practical approach.^{3} Mechanistic models are constructed purely from physical considerations, whereas empirical models are built from data. If no data exist, small pilot runs can be used for data gathering. Combining a mechanistic model with an empirical model provides the best of both worlds. A scatterplot of the data can be constructed and a plausible nonlinear model determined by juxtaposing the scatterplot with graphs of candidate nonlinear models.^{4} The most challenging part of nonlinear regression is determining the starting points for the parameters θ₁, θ₂, etc. Microsoft Excel provides a solution.^{5}

The Solver is an add-in embedded in Microsoft Excel. It is a sophisticated optimization program that enables one to find solutions to complex linear and nonlinear problems. Solver minimizes the sum of the squared differences between the data points and the function describing the data.^{6} Solver uses an iterative process that first requires the researcher to supply initial starting values for the parameters θ₁, θ₂, etc. Solver’s first iteration computes the sum of squares difference. The next iteration changes the parameter values by a small amount and recalculates the sum of squares difference. This iterative process is repeated many times to achieve the smallest possible sum of squares. A computer algorithm drives this iterative process.
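The iterative idea behind Solver can be sketched in a few lines of Python: start from researcher-chosen parameter values, nudge each parameter by a small amount, keep any change that reduces the sum of squared errors (SSE), and shrink the step when no change helps. This is a simple pattern search, not Solver's actual GRG algorithm, and the model and data below are hypothetical:

```python
import math

# Hypothetical nonlinear model: y = t1 * (1 - exp(-t2 * x)).
def model(t1, t2, x):
    return t1 * (1.0 - math.exp(-t2 * x))

# Sum of squared errors between the data and the model's predictions.
def sse(t1, t2, data):
    return sum((y - model(t1, t2, x)) ** 2 for x, y in data)

def solve(data, t1, t2, step=1.0, tol=1e-10):
    best = sse(t1, t2, data)
    while step > tol:
        improved = False
        # Try changing each parameter by +/- step; keep any improvement.
        for dt1, dt2 in ((step, 0), (-step, 0), (0, step), (0, -step)):
            trial = sse(t1 + dt1, t2 + dt2, data)
            if trial < best:
                t1, t2, best = t1 + dt1, t2 + dt2, trial
                improved = True
        if not improved:
            step /= 2.0  # no change helped: refine with smaller steps
    return t1, t2, best

# Synthetic, noise-free data generated from t1 = 8, t2 = 0.4,
# sampled at the article's time points (0, 2, 4, 8, 16 minutes).
data = [(x, model(8.0, 0.4, x)) for x in (0, 2, 4, 8, 16)]
t1, t2, err = solve(data, t1=1.0, t2=1.0)  # deliberately poor starting values
```

From the starting guess (1, 1), the loop walks the parameters back to roughly (8, 0.4) with a near-zero SSE, illustrating why good starting values matter: they determine where the search begins and how quickly, and whether, it converges.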

The researcher can choose from one of three Solver algorithms: 1) the Simplex LP Solving method, 2) the generalized reduced gradient (GRG) Solving method, and 3) the Evolutionary Solving method. The Simplex LP method can be applied only to linear problems, whereas the GRG and Evolutionary methods can be used for nonlinear problems. The GRG method runs faster than the Evolutionary method, but it can stop at a local optimum rather than the global optimum. The Evolutionary method is based on the theory of natural selection, which makes it run slower, but it is more robust than the GRG algorithm. Using the GRG Multistart option is a good compromise between speed and robustness.
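The Multistart idea itself is easy to demonstrate: run the same local search from several random starting points drawn from the constrained range and keep the best result, reducing the risk of stopping at a local rather than the global optimum. The sketch below uses a simple pattern search as the local method (a stand-in for GRG) and a hypothetical model and data:

```python
import math
import random

# Hypothetical model y = t1 * (1 - exp(-t2 * x)); SSE against the data.
def sse(t, data):
    t1, t2 = t
    return sum((y - t1 * (1.0 - math.exp(-t2 * x))) ** 2 for x, y in data)

# A crude local search (stand-in for GRG): accept axis moves that
# lower the SSE, and halve the step when none do.
def local_search(t, data, step=1.0, tol=1e-8):
    best = sse(t, data)
    while step > tol:
        improved = False
        for axis in (0, 1):
            for sign in (1.0, -1.0):
                trial = list(t)
                trial[axis] += sign * step
                e = sse(trial, data)
                if e < best:
                    t, best = trial, e
                    improved = True
        if not improved:
            step /= 2.0
    return t, best

# Noise-free data from t1 = 8, t2 = 0.4 at the article's time points.
data = [(x, 8.0 * (1.0 - math.exp(-0.4 * x))) for x in (0, 2, 4, 8, 16)]

random.seed(1)  # reproducible starting points
# Multistart: several starts drawn at random from a constrained range.
starts = [(random.uniform(-10, 10), random.uniform(0.01, 5)) for _ in range(5)]
results = [local_search(list(t), data) for t in starts]
best_t, best_err = min(results, key=lambda r: r[1])  # keep the best run
```

Each start costs one extra local search, which is why Multistart is slower than a single GRG run but far less likely to return a merely local optimum.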

The process engineer examines the scatterplot in Figure 1. The scatterplot reveals a nonlinear pattern. There appear to be different slopes from zero to two minutes, two to four minutes, four to eight minutes, and eight to 16 minutes. The scatterplot results corroborate the mechanistic model developed from Eq. 1.

Seven crucial inputs are needed for Solver: 1) set objective cell, 2) to value, 3) by changing variable cells, 4) subject to the constraints, 5) make unconstrained variables non-negative, 6) select a solving method, and 7) the Multistart option. The process engineer inputs the values shown in Figures 5 and 6. The “objective cell” is assigned to the total SSE; the “to value” is set to minimize; the “changing variable cells” are assigned to θ₁ and θ₂; constraints are added under “subject to the constraints” (note: the Multistart option requires constraints); “make unconstrained variables non-negative” is unchecked; the “solving method” is set to GRG; and the “Multistart” option is checked. The process engineer sets the constraints from -100 to +100 to permit wide latitude for the Multistart option. Unchecking “make unconstrained variables non-negative” gives Solver more freedom in finding a global solution. The process engineer runs Solver, and the results are shown in Figure 7.

**References**

1. J. Neter, M. Kutner, C. Nachtsheim, and W. Wasserman, *Applied Linear Statistical Models,* fourth edition, McGraw-Hill, 1996.
2. G. Box and N. Draper, *Response Surfaces, Mixtures, and Ridge Analyses,* second edition, John Wiley & Sons, 2007.
3. T. Ryan, *Modern Regression Methods,* second edition, John Wiley & Sons, 2009.
4. D. Bates and D. Watts, *Nonlinear Regression Analysis and Its Applications,* John Wiley & Sons, 2007.
5. W.P. Bowen and J.C. Jerman, “Nonlinear Regression Using Spreadsheets,” *Trends in Pharmacological Sciences,* 16 (12), 413-417, 1995.
6. A. Brown, “A Step-By-Step Guide to Nonlinear Regression Analysis of Experimental Data Using a Microsoft Excel Spreadsheet,” *Computer Methods and Programs in Biomedicine,* 65, 191-200, 2001.

**Patrick Valentine** is technical and Lean Six Sigma manager at Uyemura USA (uyemura.com); pvalentine@uyemura.com. As part of his responsibilities, he teaches Six Sigma green belt and black belt courses. He holds a doctorate in Quality Systems Management from New England College of Business and ASQ certifications as a Six Sigma Black Belt and Reliability Engineer.