Abstract

As is known, estimating the parameters of multiple linear regression is an important problem. Of course, these parameters can easily be found with the help of a computer. In this study, in addition to the formula for the parameters of simple linear regression, general formulas for the parameters of multiple linear regression with five or fewer independent variables are given in a definite order, and the derivations of the formulas are presented step by step. In addition to the classical matrix form, these formulas for estimating the parameters of multiple linear regression may be especially useful to researchers who do not use a computer program for the required calculations. With these formulas, a researcher can easily estimate the parameters of multiple linear regression without a computer and can therefore construct the analysis of variance table needed to interpret the fitted regression. Since the tables of the parameters of multiple linear regression for six or more independent variables are too long and take up too much space, general formulas for six or more independent variables are not given in this study.

1. Introduction

The process of determining the relationship between one dependent variable and one or more independent variables is called regression. Regression is one of the important topics in mathematics, statistics, and many other sciences. It is a means of predicting a dependent variable from one or more independent variables, achieved by fitting a line or surface to the data points so as to minimize the total error; this line or surface is called the regression model or equation. In this study we first work with simple linear regression and then with multiple linear regression.

Regression analysis is an important statistical tool for analyzing the relationships between dependent and independent variables. Its main goal is to determine and estimate the parameters of a function that best fits a given data set. There are many types of linear regression models, such as the simple and multiple regression models. Regression analysis is the most widely used statistical tool for understanding relationships among variables and is used when a continuous dependent variable is to be predicted from independent variables [1].

Seber defined linear regression analysis, LRA, as a common technique for estimating the relationship between any two random variables, the explanatory variable X and the dependent variable Y, such as height and weight, income and intelligence quotient, and ages of husband and wife [2]. Bates and Watts mentioned LRA as a powerful methodology for analyzing data and used it for describing the relation between the predictor variables [3]. Chatterjee and Hadi defined regression analysis, RA, as a conceptually simple method for investigating functional relationships among variables [4]. The simple relationship among the dependent and explanatory variables can be written as follows:

$$Y = f(X_1, X_2) + \varepsilon,$$

where the random error $\varepsilon$ represents the discrepancy in the approximation; it accounts for the failure of the model to fit the data exactly. The function $f(X_1, X_2)$ describes the relationship between the dependent variable $Y$ and the explanatory variables $X_1$ and $X_2$.

Hutcheson and Moutinho defined the simple linear regression (SLR) model as a relationship between a continuous response variable Y and a continuous explanatory variable X that may be represented by a line of best fit, where Y is predicted, at least to some extent, by X. When the relationship is linear, it may be represented mathematically by a straight-line equation. The regression coefficient describes the change in Y associated with a unit change in X. This line is usually computed using the least squares procedure [5].
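
For concreteness, the straight line Hutcheson and Moutinho describe can be written as follows (the intercept and slope symbols $b_0$ and $b_1$ are our notational choice, matching the notation used later in this study):

$$Y = b_0 + b_1 X + \varepsilon,$$

where $b_1$ is the regression coefficient describing the change in $Y$ associated with a unit change in $X$.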

Linear regression is one of the fundamental techniques in the statistical analysis of data. We assume a straight-line model for a response variable Y as a function of one or more predictor (or explanatory) variables X [6]. In this study, we first consider exactly one predictor variable and then two or more predictor variables.

When the error is omitted, multiple linear regression is as follows:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k,$$

where $\hat{Y}$ is the equation of the regression, $b_0, b_1, \ldots, b_k$ are the unknown parameters of the regression, and $X_1, X_2, \ldots, X_k$ are the independent variables of the regression.

The unknown parameters $b_0, b_1, \ldots, b_k$ are estimated by the method of least squares in multiple linear regression. The estimates of the coefficients are the values that minimize the sum of squared errors for the sample. The formula obtained in this way is given in this study in matrix notation.
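
In matrix notation, this least squares estimate takes the standard closed form (stated here with our own symbols: $\mathbf{X}$ is the design matrix with a leading column of ones and $\mathbf{Y}$ is the response vector):

$$\hat{\mathbf{b}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{Y},$$

provided that $\mathbf{X}^{T}\mathbf{X}$ is invertible.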

Equations like this can easily be handled by any computer program that performs ordinary multiple regression. In this study, however, general formulas are given for obtaining the parameters of multiple linear regression without using a computer program.

Multiple regression analysis is one of the most widely used statistical procedures. Its popularity is fostered by its applicability to varied types of data and problems, ease of interpretation, robustness to violations of the underlying assumptions, and widespread availability [7].

Hosmer and Lemeshow mentioned that, in any regression problem, the key quantity is the mean value of the outcome variable given the values of the independent variables. This quantity is called the conditional mean; it is also known as the conditional expected value, or conditional expectation, and is the expected value of a real random variable with respect to a conditional probability distribution [8].

If we start with a simple linear regression model with one predictor variable, X1, and then add a second predictor variable, X2, the error sum of squares (SSE) will decrease (or stay the same) while the total sum of squares (SST) remains constant, so R-squared (R2) will increase (or stay the same). In other words, R2 always increases (or stays the same) as more predictors are added to a multiple linear regression model, even if the added predictors are unrelated to the response variable. Thus, by itself, R2 cannot be used to identify which predictors should be included in or excluded from a model. An alternative measure, the adjusted R2, does not necessarily increase as more predictors are added and can be used to help identify which predictors should be included in or excluded from a model. Because of this shortcoming of R2, researchers prefer to use the adjusted R2 [9].
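
A common form of the adjusted $R^2$ statistic, for $N$ observations and $k$ predictors (notation ours), is

$$R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(N - 1)}{N - k - 1},$$

which penalizes the addition of predictors that do not sufficiently reduce the error sum of squares.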

In fact, the adjusted R2 statistic does not necessarily increase when variables are added to the model; it will often decrease when excessive parameters are added. This guards against including unnecessary variables that do not change R2 significantly [10].

The regression sum of squares always increases, and the error sum of squares always decreases, as more variables are added to the regression model. Variables keep being added until one decides that the resulting model is sufficiently accurate. In fact, the effectiveness of the model decreases as further insignificant variables are added, because this increases the mean square error [11, 12].

2. Material and Methods

An important objective of regression analysis is to estimate the unknown parameters in the regression model. This process is also called fitting the model to the data. There are several parameter estimation techniques. One of these techniques is the method of least squares [13].

The most common method to estimate the regression parameters is the minimization of the SSE. Multiple linear regression is one of the most widely used methods in many research fields, such as forecasting, biology, medicine, psychology, economics, and environmental science [14].

In this study, the method of least squares is used for both simple and multiple linear regression.

3. Results

3.1. One Independent Variable and One Dependent Variable

If we use one independent variable and one dependent variable, then we use the following as the linear regression:

$$Y = b_0 + b_1 X + \varepsilon.$$

To minimize the SSE, we use the method of least squares, setting the partial derivatives of the SSE with respect to the parameters equal to zero. This yields

$$\sum Y = N b_0 + b_1 \sum X, \tag{5}$$

$$\sum XY = b_0 \sum X + b_1 \sum X^2. \tag{6}$$

Equations (5) and (6) are called the normal equations. The matrix form of this system with two parameters can be written as follows:

$$\begin{bmatrix} N & \sum X \\ \sum X & \sum X^2 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} \sum Y \\ \sum XY \end{bmatrix}.$$

If both sides of (5) are divided by $N$, we get

$$b_0 = \bar{Y} - b_1 \bar{X}, \tag{8}$$

where $\bar{Y} = \sum Y / N$ and $\bar{X} = \sum X / N$.

If the value of $b_0$ in (8) is substituted into equation (6), we get

$$\sum XY = (\bar{Y} - b_1 \bar{X}) \sum X + b_1 \sum X^2,$$

and then we get

$$b_1 = \frac{\sum XY - N \bar{X} \bar{Y}}{\sum X^2 - N \bar{X}^2} = \frac{S_{XY}}{S_{XX}},$$

where $S_{XY} = \sum (X - \bar{X})(Y - \bar{Y})$ and $S_{XX} = \sum (X - \bar{X})^2$.
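
As a quick numerical check of formula (8) and the expression for $b_1$, the following Python sketch (our illustration, not part of the original derivation; the sample data are arbitrary) computes the estimates from the corrected sums and compares them with numpy's least squares routine:

```python
import numpy as np

# Example data (arbitrary, for illustration only)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
N = len(X)

# Corrected sums S_XX and S_XY from the derivation
x_bar, y_bar = X.mean(), Y.mean()
S_XX = np.sum((X - x_bar) ** 2)
S_XY = np.sum((X - x_bar) * (Y - y_bar))

# Closed-form estimates: b1 = S_XY / S_XX, b0 = Y_bar - b1 * X_bar
b1 = S_XY / S_XX
b0 = y_bar - b1 * x_bar

# Compare with numpy's least squares fit (design matrix [1, X])
A = np.column_stack([np.ones(N), X])
b_ls, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(b0, b1)   # closed-form estimates
print(b_ls)     # should match [b0, b1]
```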

3.2. Two Independent Variables and One Dependent Variable

If we use two independent variables and one dependent variable, then we use the following as the multiple regression:

$$Y = b_0 + b_1 X_1 + b_2 X_2 + \varepsilon.$$

To minimize the SSE, we use the method of least squares, setting the partial derivatives of the SSE with respect to the parameters equal to zero. This yields

$$\sum Y = N b_0 + b_1 \sum X_1 + b_2 \sum X_2, \tag{13}$$

$$\sum X_1 Y = b_0 \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2, \tag{14}$$

$$\sum X_2 Y = b_0 \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2. \tag{15}$$

Equations (13), (14), and (15) are called the normal equations. The matrix form of this system with three parameters can be written as follows:

$$\begin{bmatrix} N & \sum X_1 & \sum X_2 \\ \sum X_1 & \sum X_1^2 & \sum X_1 X_2 \\ \sum X_2 & \sum X_1 X_2 & \sum X_2^2 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} \sum Y \\ \sum X_1 Y \\ \sum X_2 Y \end{bmatrix}.$$

If both sides of (13) are divided by $N$,

$$b_0 = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 \tag{17}$$

is found. If this value is substituted into (14),

$$\sum X_1 Y = (\bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2) \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$$

can be obtained, and then

$$S_{X_1Y} = b_1 S_{X_1X_1} + b_2 S_{X_1X_2} \tag{19}$$

is obtained. Similarly, if this value is substituted into (15),

$$\sum X_2 Y = (\bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2) \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2$$

can be obtained, and then

$$S_{X_2Y} = b_1 S_{X_1X_2} + b_2 S_{X_2X_2} \tag{21}$$

is obtained. Equations (19) and (21) can be solved by the elimination or substitution method. The matrix form of equations (19) and (21) is the following:

$$\mathbf{S} \mathbf{b} = \mathbf{s}, \qquad \begin{bmatrix} S_{X_1X_1} & S_{X_1X_2} \\ S_{X_1X_2} & S_{X_2X_2} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} S_{X_1Y} \\ S_{X_2Y} \end{bmatrix}.$$

Since $\mathbf{S}$ is a square matrix with $\det(\mathbf{S}) \neq 0$, the system can also be solved by using

$$\mathbf{b} = \mathbf{S}^{-1} \mathbf{s},$$

where $S_{X_iX_j} = \sum (X_i - \bar{X}_i)(X_j - \bar{X}_j)$ and $S_{X_iY} = \sum (X_i - \bar{X}_i)(Y - \bar{Y})$.

By solving this system of two equations with two unknown parameters, we get the parameters of multiple linear regression with two independent variables as follows:

$$b_1 = \frac{S_{X_2X_2} S_{X_1Y} - S_{X_1X_2} S_{X_2Y}}{S_{X_1X_1} S_{X_2X_2} - S_{X_1X_2}^{2}}$$

and

$$b_2 = \frac{S_{X_1X_1} S_{X_2Y} - S_{X_1X_2} S_{X_1Y}}{S_{X_1X_1} S_{X_2X_2} - S_{X_1X_2}^{2}},$$

in addition to the constant parameter (17). The formulas for the parameters of two independent variables are summarized in Tables 1 and 2. The denominator and numerator of the parameters for two independent variables are given under each table.
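
As a numerical check of these closed-form expressions, the following Python sketch (our illustration; the variable names and sample data are arbitrary) computes $b_0$, $b_1$, and $b_2$ from the corrected sums and compares them with an ordinary least squares solve:

```python
import numpy as np

# Example data (arbitrary, for illustration only)
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.1, 3.9, 7.0, 7.8, 11.2, 11.9])

x1b, x2b, yb = X1.mean(), X2.mean(), Y.mean()

# Corrected sums appearing in equations (19) and (21)
S11 = np.sum((X1 - x1b) ** 2)
S22 = np.sum((X2 - x2b) ** 2)
S12 = np.sum((X1 - x1b) * (X2 - x2b))
S1Y = np.sum((X1 - x1b) * (Y - yb))
S2Y = np.sum((X2 - x2b) * (Y - yb))

# Closed-form parameters for two independent variables
den = S11 * S22 - S12 ** 2
b1 = (S22 * S1Y - S12 * S2Y) / den
b2 = (S11 * S2Y - S12 * S1Y) / den
b0 = yb - b1 * x1b - b2 * x2b

# Compare with an ordinary least squares solve
A = np.column_stack([np.ones(len(Y)), X1, X2])
print(b0, b1, b2)
print(np.linalg.lstsq(A, Y, rcond=None)[0])
```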

3.3. Three Independent Variables and One Dependent Variable

For three independent variables, after operations similar to those for two independent variables are done, the normal equations reduce to the system

$$\begin{bmatrix} S_{X_1X_1} & S_{X_1X_2} & S_{X_1X_3} \\ S_{X_1X_2} & S_{X_2X_2} & S_{X_2X_3} \\ S_{X_1X_3} & S_{X_2X_3} & S_{X_3X_3} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} S_{X_1Y} \\ S_{X_2Y} \\ S_{X_3Y} \end{bmatrix}.$$

We get the parameters $b_1$, $b_2$, and $b_3$ by solving this system, for example by Cramer's rule, in addition to the constant parameter

$$b_0 = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 - b_3 \bar{X}_3.$$

The formulas for the parameters of three independent variables are summarized in Tables 3, 4, and 5. The denominator and numerator of the parameters for three independent variables are given under each table.
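
The tabled formulas are, in effect, expansions of determinant ratios. Here is a minimal Python sketch of the same computation via Cramer's rule on the reduced system (our illustration; the helper names `centered_sums` and `cramer` and the sample data are hypothetical):

```python
import numpy as np

def centered_sums(Xs, Y):
    """Build the reduced system S b = s from corrected sums of squares."""
    Xc = [x - x.mean() for x in Xs]
    Yc = Y - Y.mean()
    S = np.array([[np.sum(a * b) for b in Xc] for a in Xc])
    s = np.array([np.sum(a * Yc) for a in Xc])
    return S, s

def cramer(S, s):
    """Solve S b = s by Cramer's rule (determinant ratios, as in the tables)."""
    det_S = np.linalg.det(S)
    b = np.empty(len(s))
    for j in range(len(s)):
        Sj = S.copy()
        Sj[:, j] = s          # replace column j with the right-hand side
        b[j] = np.linalg.det(Sj) / det_S
    return b

# Example with three independent variables (arbitrary data)
rng = np.random.default_rng(0)
X = [rng.normal(size=20) for _ in range(3)]
Y = 1.0 + 2.0 * X[0] - 0.5 * X[1] + 0.25 * X[2] + rng.normal(scale=0.1, size=20)

S, s = centered_sums(X, Y)
b = cramer(S, s)                                         # b1, b2, b3
b0 = Y.mean() - sum(bi * x.mean() for bi, x in zip(b, X))  # constant parameter
print(b0, b)
```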

3.4. Four Independent Variables and One Dependent Variable

For four independent variables, after operations similar to those before are done, the normal equations reduce to a 4 × 4 system of the same form, whose solution gives $b_1$, $b_2$, $b_3$, and $b_4$, in addition to the constant parameter

$$b_0 = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 - b_3 \bar{X}_3 - b_4 \bar{X}_4.$$

The formulas for the parameters of four independent variables are summarized in Tables 6, 7, 8, and 9. The denominator and numerator of the parameters for four independent variables are given under each table.
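
The same pattern extends directly to four predictors. The following self-contained sketch (our illustration; it uses a direct matrix solve of the reduced normal equations rather than the expanded table formulas) shows the computation; with five predictors, only the number of columns changes:

```python
import numpy as np

# Sketch: reduced normal equations for four predictors, solved directly.
# (Arbitrary synthetic data; Tables 6-9 expand the same determinant ratios.)
rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(size=30) for _ in range(4)])
Y = 1.0 + X @ np.array([2.0, -1.0, 0.5, 0.25]) + rng.normal(scale=0.1, size=30)

Xc = X - X.mean(axis=0)             # centered predictors
Yc = Y - Y.mean()
S = Xc.T @ Xc                       # matrix of corrected sums of squares
s = Xc.T @ Yc                       # corrected cross-products with Y
b = np.linalg.solve(S, s)           # b1..b4
b0 = Y.mean() - X.mean(axis=0) @ b  # constant parameter
print(b0, b)
```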

3.5. Five Independent Variables and One Dependent Variable

For five independent variables, after operations similar to those before are done, the normal equations reduce to a 5 × 5 system of the same form, in addition to the constant parameter

$$b_0 = \bar{Y} - \sum_{i=1}^{5} b_i \bar{X}_i.$$

The formulas for the parameters of five independent variables are summarized in Tables 10, 11, 12, 13, and 14. Since the denominator and numerator of the parameters for five independent variables are very long, and their derivations are similar to those of the parameters for four or fewer independent variables, they are not given under each table. These are left for the researchers.

4. Discussion

In [14], Pires et al. gave criteria for selecting statistically valid regression parameters in multiple linear regression models. In light of this reference, researchers should carefully select statistically valid regression parameters; consequently, the number of parameters in a multiple linear regression model cannot be as large as a researcher might want.

5. Conclusion

The aim of this work was to obtain general formulas for the parameters of multiple linear regression without using a computer program. With these formulas, the parameters of multiple linear regression can be found easily, so researchers can estimate the regression equation. Since the tables of the parameters for six or more independent variables are too long and take up too much space, only the tables for five or fewer independent variables were given in this study.

In this study, the general formulas for the parameters of multiple linear regression with five or fewer independent variables are given in a definite order. Although these general formulas take up considerable space, they are given here so that they can be shown explicitly.

As is known, the number of terms in the denominator or numerator of the parameter formulas equals the factorial of the number of independent variables. Since the denominator and numerator of the parameters for six or more independent variables are too long, we could not summarize the formulas for those parameters in tables. They can, however, be found by derivations similar to those used to create the tables for five or fewer independent variables. This is left to researchers who deal with six or more independent variables.

Data Availability

The author did not use any data set.

Conflicts of Interest

The author declares that they have no conflicts of interest.