In this research we consider the multilevel linear regression model, one of the most widely used models for analyzing data in which the observations take a hierarchical form. Two different methods were applied to estimate the model parameters: the generalized least squares method and Bayesian analysis. The two were compared through the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), and the Bayesian analysis method was found to be the more efficient. Bayesian analysis was the best method for estimating the parameters of the multilevel model (with two levels) for PMRM-2 panel data, across the types of panel-data regression models considered and across different sample sizes, and it maintained this preference under both the AIC and BIC measures. Bayesian analysis can therefore be adopted in applied work when estimating the parameters of a multilevel model (with two levels) for PMRM-2 panel data, such as the wheat data from some governorates of Iraq used here, over the available time series extending from 2000 to 2021.
Keywords: Multilevel Regression, Bayesian Analysis, Generalized Least Squares, AIC, BIC
The basic material of statistics is the data available about the phenomenon under study; the type, form, and nature of the data determine the appropriate mathematical model to represent it, which is then analyzed to reach the best results, especially in regression analysis. Each type of data thus has characteristics that determine the appropriate statistical method. For example, time series data represented by a linear regression model, in which the random errors may be correlated and follow one of the time series models, indicate an autocorrelation problem, which requires the Generalized Least Squares (GLS) method to estimate the parameters. Likewise, cross-sectional data for a phenomenon may give rise to heteroscedasticity of the random errors in the regression model, because their variances are unequal, which requires the Weighted Least Squares (WLS) method for estimation. Both types of data above are single-level data, whose model parameters are estimated, after the model's assumptions are verified, using such estimation methods. These two types of data are widely available for various economic, social, psychological, and other phenomena.
Research objective
Based on the above, the research aims to:
1. Estimating the parameters of the multilevel regression model (MRM) for panel data using the Bayesian analysis method and the generalized least squares method.
2. Estimating the parameters of the multilevel regression model for panel data on wheat production in the agricultural sector of Iraq, within a group of governorates, with the monetary value of the wheat crop as the dependent variable and some influential explanatory (independent) variables (quantity of water resources, modern mechanization, quantity of fertilizer).
Multilevel regression model
Multilevel data and its types
Multilevel data comes in types whose names differ according to how their observations are recorded, which requires the researcher to use the appropriate multilevel model for each. They are an extension of (double) panel data, as follows:
First: Hierarchical data, which is nested data that represents the general picture of multilevel data. It is classified into levels: the lowest, Level 1, is represented by the regression model of the response variable of the phenomenon, while the higher (last) levels represent the main sample of the phenomenon. For example, if there is a group of educational classes, we place them at the second level, while the first level contains the students of one of the selected classes. The two-level multilevel regression model is the most common, and it is the model on which this research focuses for estimation and testing in the presence of panel data. In a two-level regression model, the first (lowest) level concerns the dependent variable, while the second level concerns the upper level and contains the equations for the parameters included in the first level, or part of them. Multilevel data can thus be described as two balanced levels, with Level 1 nested within Level 2.
Concept of Multilevel Regression Model
To reach the concept of Multilevel Regression Model MRM, the following must be clarified:
The traditional regression model consists of a set of variables, one of which is the response (dependent) variable while the rest are explanatory (independent) variables; the response variable is a mathematical function of the explanatory variables. If one explanatory variable is available, we obtain the Simple Linear Regression Model (SLRM), defined as follows [1]:
Yt = β0 + β1 Xt + ut,  t = 1, 2, …, T (1)
Where:
Yt: represents the value of the response variable (dependent) at observation t.
Xt: represents the value of the explanatory variable (independent) at observation t.
β0,β1: Represents the unknown regression model parameters to be estimated.
ut: Represents the value of the random error at observation t.
When there are two or more explanatory (independent) variables, we obtain the Multiple Linear Regression Model (MLRM), which is defined as follows:
Yt = β0 + β1 X1t + β2 X2t + ⋯ + βj Xjt + ⋯ + βk Xkt + ut (2)
t = 1, 2, …, T;  j = 1, 2, …, k
Where:
Xjt: Represents the value of the explanatory (independent) variable j at observation t. The rest of the symbols are the same as those defined in model (1), noting that there are more than two unknown parameters in this model.
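As a purely illustrative numerical sketch, the parameters of a multiple linear regression model such as (2) can be estimated by ordinary least squares. The data and true parameter values below are hypothetical:

```python
import numpy as np

# Synthetic data for a model of form (2) with two explanatory variables
rng = np.random.default_rng(0)
T = 50
X1 = rng.uniform(0, 10, T)
X2 = rng.uniform(0, 5, T)
u = rng.normal(0, 1, T)
Y = 2.0 + 1.5 * X1 + 3.0 * X2 + u      # true parameters (2.0, 1.5, 3.0)

# Design matrix with an intercept column, as in model (2)
X = np.column_stack([np.ones(T), X1, X2])

# Ordinary least squares: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # estimates close to (2.0, 1.5, 3.0)
```

These fitted parameters are fixed numbers, which is exactly the single-cross-section situation described next.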
It is noted that the traditional linear regression model, whether simple or multiple, deals with only one cross-section, and that the estimated values of its parameters are fixed values that do not change as the observed values of the explanatory variables included in the model change. This situation arises when a phenomenon is studied at the level of a single region, factory, sector, and so on. But when the same phenomenon is studied over a series of length T at the level of more than one sector, city, etc., of size n, we obtain a linear regression model for panel data, which is of the following types [7]:
1- Pooled Regression Model: obtained by pooling the cross-sectional data into a single sample of size N = nT, so that it can be treated as a traditional regression model. The model can be written as follows:
Yt = β0 + β1 X1t + β2 X2t + ⋯ + βj Xjt + ⋯ + βk Xkt + ut
t = 1, 2, …, N;  j = 1, 2, …, k (3)
First: Fixed effect model: assumes that the intercepts differ across the cross-sections over the time units, so that the linear regression model contains as many intercepts as there are cross-sections. The model is written in matrix form as follows [2][7]:
Y(N×1) = D(N×n) β0(n×1) + X(N×K) β(K×1) + U(N×1) (4)
where Y = [Y1′, Y2′, …, Yn′]′, D = diag(J, J, …, J) is the block-diagonal matrix of dummy columns, β0 = [β01, β02, …, β0n]′ is the vector of intercepts, X = [X1′, X2′, …, Xn′]′, and U = [U1′, U2′, …, Un′]′.
Where:
J = [1, 1, 1, …, 1]′: a (T×1) vector of ones.
It follows from model (4) that dummy variables taking the values (0, 1) must appear alongside the intercepts. According to the strategies used to estimate the parameters of the fixed effect model, the model has acquired other names, including the dummy variables model and the analysis of covariance model [1][4][7].
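The dummy-variable structure of model (4) can be sketched numerically. The following is a minimal illustration with entirely hypothetical data: three cross-sections, ten periods each, one slope regressor, and the Least Squares Dummy Variables (LSDV) estimator recovering one intercept per cross-section.

```python
import numpy as np

# LSDV sketch for the fixed effect model (4): n cross-sections, T periods each,
# one 0/1 dummy (intercept) column per cross-section. Hypothetical data only.
rng = np.random.default_rng(1)
n, T = 3, 10
N = n * T
intercepts = np.array([1.0, 2.0, 3.0])      # beta_01, beta_02, beta_03
x = rng.uniform(0, 10, N)
y = np.repeat(intercepts, T) + 0.5 * x + rng.normal(0, 0.1, N)

# D = diag(J, J, J): an (N x n) matrix of dummy columns, J a (T x 1) vector of ones
D = np.kron(np.eye(n), np.ones((T, 1)))
Z = np.column_stack([D, x])                  # dummies plus the slope regressor

# Least squares on the augmented design recovers all intercepts and the slope
est = np.linalg.solve(Z.T @ Z, Z.T @ y)
print(est)  # approximately (1.0, 2.0, 3.0, 0.5)
```

The design matrix D contains exactly as many dummy columns as cross-sections, matching the count of intercepts in model (4).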
Second: Random effect model: assumes differences in the variances of the random error, which result from the sum of two or more error terms arising from the use of a regression model with a random intercept, i.e. the presence of a random intercept parameter, which can be defined in general as follows [4][6][7]:
β0i = β0 + μi,  i = 1, 2, …, n (5)
Accordingly, the model is as follows:
Y(N×1) =β0J(N×1) + X(N×K) β(K×1) + W(N×1) (6)
Model (6) can be rewritten as follows:
Y(N×1) = Z(N×(K+1)) h((K+1)×1) + W(N×1)
Where:
(wi)(T×1) = μi J(T×1) + (ui)(T×1),  i = 1, 2, …, n
Because of the compound errors it contains, this model has acquired other names: the error components model or the variance components model.
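The composite error wi = μi J + ui induces a characteristic covariance structure within each cross-section, which is what the "variance components" name refers to. A small sketch, with hypothetical variance values, makes the block explicit:

```python
import numpy as np

# Covariance block implied by the composite error w_i = mu_i * J + u_i:
# Var(w_i) = sigma_mu^2 * J J' + sigma_u^2 * I_T, a (T x T) block per section.
# The variance values below are hypothetical.
T = 4
sigma_mu2, sigma_u2 = 2.0, 1.0
J = np.ones((T, 1))
Omega_i = sigma_mu2 * (J @ J.T) + sigma_u2 * np.eye(T)
print(Omega_i)
# Diagonal entries equal sigma_mu2 + sigma_u2; off-diagonals equal sigma_mu2,
# so observations within a cross-section are equicorrelated.
```

The full error covariance of the stacked model is block diagonal, with one such block per cross-section.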
3- Random coefficients model: Randomness in the panel regression model takes several forms. The first is as explained in equation (6), i.e. the model parameters are all written in the same form, varying according to the type of parameter (intercepts or marginal slopes). The second form arises when the model parameters (intercepts and marginal slopes) are themselves written as linear regression models, i.e. they become response (dependent) variables in terms of explanatory variables other than those present in the original model.
General formula for a multilevel regression model (with two levels) for panel data
The general formula for a multilevel regression model with two levels (MRM-2) can be obtained as follows [2][8]. The first level (Level 1) is:
Yi = Xi βi + Ui,  i = 1, 2, …, n
Where:
Yi: a vector of order (T×1) of the observations of the response (dependent) variable in the first level for segment i.
Xi: a matrix of order (T×(K+1)) of the observations of the explanatory (independent) variables in the first level for segment i.
βi: a vector of order ((K+1)×1) of the unknown parameters in the first level for segment i.
Ui: a vector of order (T×1) of the random errors in the first level for segment i.
The second level (Level 2) is as follows:
βji = γj0i + γj1i g1i + ⋯ + γjQi gQi + eji,  j = 0, 1, …, K
Where:
gqi: the observation of explanatory (independent) variable q at the second level for the ith segment.
γ00i, γ01i, …, γK(Q−1)i, γKQi: the parameters of the regression models at the second level for the ith segment.
e0i, …, eji: the random errors of the linear regression models at the second level for the ith segment.
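The two levels combine by substituting the Level-2 equations into the Level-1 model. The following is a purely illustrative sketch for a single segment i, with hypothetical coefficient values and one Level-2 covariate (Q = 1):

```python
import numpy as np

# Two-level sketch for one segment i (all numbers hypothetical):
# Level 2: beta_ji = gamma_j0 + gamma_j1 * g_1i + e_ji, for j = 0 (intercept), 1 (slope)
# Level 1: y_i = X_i beta_i + u_i
rng = np.random.default_rng(2)
T = 8
g1i = 0.7                                   # Level-2 covariate for segment i
gamma = np.array([[1.0, 0.5],               # gamma_00, gamma_01 (intercept equation)
                  [2.0, -0.3]])             # gamma_10, gamma_11 (slope equation)
e_i = rng.normal(0, 0.1, 2)                 # Level-2 errors e_0i, e_1i
beta_i = gamma[:, 0] + gamma[:, 1] * g1i + e_i

Xi = np.column_stack([np.ones(T), rng.uniform(0, 5, T)])  # (T x (K+1)) design
u_i = rng.normal(0, 0.2, T)                 # Level-1 errors
y_i = Xi @ beta_i + u_i                     # (T x 1) Level-1 observations
print(beta_i, y_i.shape)
```

Each segment draws its own coefficient vector from the Level-2 relation, which is exactly what distinguishes the multilevel model from a single pooled regression.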
Bayes theory
Bayes' approach to estimation is based on employing prior information about the unknown parameters to be estimated, treating these unknown parameters as random variables with probability distributions. This prior information is expressed as a probability distribution, forming the prior probability density function (prior p.d.f.), which represents all previous information and experience about the unknown parameters gained through analysis, follow-up, and previous studies, in addition to the theories governing the phenomenon. In other words, suppose we have a vector of p unknown parameters θ = (θ1, θ2, …, θp) to be estimated, such as means, variances, covariances, etc. The prior probability density function of these parameters, f(θ), describes θ before sampling. After sampling, a sample of size n of observations of the study variable (Y1, Y2, …, Yn) is available, with likelihood function L(Y1, Y2, …, Yn | θ). By combining the prior probability density function with the likelihood function of the current sample observations, the posterior probability density function (posterior p.d.f.) f(θ | Y) is obtained, which represents the distribution of θ after sampling. The above can be expressed in the following form [1][8]:
It can be expressed mathematically as follows:
f(θ | Y) = f(Y, θ) / f(Y) = f(θ) L(Y | θ) / f(Y) ∝ f(θ) L(Y | θ)
Where:
f(Y, θ): the joint probability density function of the random variables Y, θ.
f(θ): the prior probability density function of the parameter θ before sampling.
L(Y | θ): the likelihood function of the sample observations.
f(Y): the marginal probability density function of the sample; f(θ | Y) is the posterior probability density function of the parameter θ after sampling. The symbol (∝) indicates proportionality, meaning the function must be multiplied by a normalizing constant to become an equality; this mechanism is standard in the Bayesian method.
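A standard conjugate example makes the prior-to-posterior update concrete. The sketch below uses a normal prior for a normal mean with known sampling variance; all numerical values are hypothetical and chosen only for illustration:

```python
import numpy as np

# Conjugate-normal illustration of f(theta|Y) ∝ f(theta) · L(Y|theta):
# prior theta ~ N(m0, s0_2); observations Y_t ~ N(theta, s_2) with s_2 known.
m0, s0_2 = 0.0, 4.0          # prior mean and prior variance (hypothetical)
s_2 = 1.0                    # known sampling variance
y = np.array([1.8, 2.2, 2.0, 1.9])
n = len(y)

# Posterior precision is the sum of prior and data precisions;
# the posterior mean is the precision-weighted average of m0 and the sample mean.
post_prec = 1.0 / s0_2 + n / s_2
m1 = (m0 / s0_2 + n * y.mean() / s_2) / post_prec
s1_2 = 1.0 / post_prec
print(m1, s1_2)  # posterior mean pulled from the prior mean toward the sample mean
```

The posterior here is again normal, N(m1, s1_2), which is the sense in which the normal prior is "conjugate" for this likelihood.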
The Bayes estimation method depends on the availability of two basic functions, which are [1]:
The first: the posterior probability density function f(θ | Y), which is a mixture of the sample information and the prior information described in the previous paragraph.
The second: the loss function, denoted L(θ̂, θ), which must satisfy the following two conditions:
L(θ̂, θ) ≥ 0,  L(θ, θ) = 0 (10)
By minimizing the mathematical expectation of the loss function, we find the point estimate of the unknown parameter vector using the Bayesian method:
It is worth noting that there are several types of loss functions, each with its own characteristics and capabilities, but the most common in estimation is the Squared Error Loss Function (the Quadratic Loss Function) in the case of one parameter, whose form is as follows:
L(θ̂, θ) = (θ̂ − θ)² (11)
As for the loss function in the case of more than one parameter to be estimated, as in the regression model for example, the Weighted Squared Error Loss Function is used, written as follows:
L(θ̂, θ) = (θ̂ − θ)′ Q (θ̂ − θ) (12)
where the matrix Q is positive definite, non-random, and symmetric.
Taking the expectation of function (12) gives:
E[L(θ̂, θ)] = (θ̂ − E(θ|Y))′ Q (θ̂ − E(θ|Y)) + E[(θ − E(θ|Y))′ Q (θ − E(θ|Y))] (13)
As noted, the second term of equation (13) is independent of θ̂, while the first term depends on it; thus the loss is minimized when:
θ̂ = E(θ|Y) (14)
where E(θ|Y) is the mean of the posterior distribution and represents the Bayes estimate for this type of loss function.
The above can be explained in a simplified way for one parameter θ with the loss function defined in equation (11), as follows [1][2]:
Taking the expectation of the function above gives:
E[L(θ̂, θ)] = E[(θ̂ − θ)²] = θ̂² − 2 θ̂ E(θ|Y) + E(θ²|Y) (15)
Taking the first derivative of equation (15) with respect to θ̂ and setting it equal to zero, in order to minimize the expected loss, gives [2]:
θ̂ = E(θ|Y) (16)
Equation (16) shows that the Bayes estimate of the parameter θ equals the expectation of the posterior probability density function of θ, where E(θ|Y) is the mean of the posterior distribution of θ.
Taking the second derivative of equation (15) with respect to θ̂ gives a positive value (equal to 2), which indicates that equation (16) attains the minimum, and also that the loss function satisfies the two conditions given in equation (10).
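The result that the posterior mean minimizes the expected squared error loss can be checked numerically. The sketch below draws a stand-in "posterior" sample (the gamma distribution used is an arbitrary, hypothetical choice) and evaluates the expected quadratic loss over a grid of candidate estimates:

```python
import numpy as np

# Numerical check that the posterior mean minimizes E[(theta_hat - theta)^2]:
# draw theta from an assumed posterior, then scan candidate point estimates.
rng = np.random.default_rng(3)
theta = rng.gamma(shape=2.0, scale=1.5, size=200_000)  # stand-in posterior draws

grid = np.linspace(0, 6, 601)                          # candidate estimates
exp_loss = [np.mean((t - theta) ** 2) for t in grid]   # Monte Carlo expected loss
best = grid[int(np.argmin(exp_loss))]
print(best, theta.mean())  # the minimizer sits at the posterior mean
```

The grid minimizer coincides (up to grid resolution) with the sample mean of the draws, matching equation (16).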
Generalized Least Squares Method
The generalized least squares method is one of the common traditional estimation methods, which are based on the idea that the parameters to be estimated are fixed quantities. Its formulas for estimating the parameters of the multilevel regression model (with two levels) for PMRM-2 panel data are given below; although the researcher agreed with the idea of the second approach for estimating the parameters γ by relying on the regression model, the following is presented [3][5]:
It is worth noting that the generalized least squares estimators defined by formula (20) are unbiased and are the Best Linear Unbiased Estimators (BLUE). As can be seen from formulas (22) and (23), they depend on the variances of the cross-sections, which are often unknown in applications, so the unbiased estimator of each section's variance is substituted for them [5][8].
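The substitution of estimated section variances described above is a feasible GLS step. The following is a minimal sketch with hypothetical data: three cross-sections with unequal error variances, OLS residuals used to estimate each section's variance, and the GLS weighting applied:

```python
import numpy as np

# Feasible GLS sketch: beta_GLS = (X' W^{-1} X)^{-1} X' W^{-1} y, with W
# diagonal, built from per-section variance estimates. Hypothetical data only.
rng = np.random.default_rng(4)
n, T = 3, 20
sigmas = np.array([0.5, 1.0, 2.0])          # true section std. deviations
x = rng.uniform(0, 10, n * T)
u = np.concatenate([rng.normal(0, s, T) for s in sigmas])
y = 1.0 + 0.8 * x + u
X = np.column_stack([np.ones(n * T), x])

# Step 1: OLS residuals give (approximately unbiased) per-section variance estimates
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b_ols
s2 = np.array([resid[i*T:(i+1)*T] @ resid[i*T:(i+1)*T] / (T - 2) for i in range(n)])

# Step 2: GLS with diagonal weights 1 / sigma_i^2 for each section's observations
w = np.repeat(1.0 / s2, T)
b_gls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
print(b_gls)  # approximately (1.0, 0.8)
```

Observations from the low-variance section receive the largest weight, which is what restores efficiency relative to plain OLS under heteroscedasticity.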
Simulation and Application
Introduction:
In this section, the experimental and applied sides of the research are carried out. The experimental side compares the estimation methods for the parameters of the multilevel (two-level) regression model with panel data, presented on the theoretical side, using simulation, which provides many hypothetical cases that mimic applied or practical reality. The final result, namely the optimal estimation method obtained from analyzing the simulation results, is then adopted on the applied (practical) side. Because of its high flexibility in generating random samples that mimic reality, many researchers resort to simulation in their work: the experiments are re-run many times, 1000 times in this research using the bootstrap, for the various models under study, in order to determine the optimal estimation methods, which are then applied to real data. Simulation is also the appropriate way to compare estimation methods for the parameters of a model for which real data are not available. In summary, simulation imitates a mathematical model that exists in applied reality, such as a regression model, together with a number of methods for estimating its parameters; by building software that imitates the real system, implementing it, and comparing the results, the optimal estimation method is determined.
Stages of building the simulation experiment:
This section covers the construction of the simulation experiment used to compare the estimation methods and determine the optimal ones based on the comparison measures AIC and BIC. The program was written in Matlab and a Monte Carlo simulation was implemented, with the simulation experiments relying on generating data for a multilevel regression model (with two levels) in the presence of PMRM-2 panel data.
To conduct the simulation under the assumptions of the multilevel (two-level) regression model of model (3-1), its data were generated using bootstrap sampling according to the following steps:
1- Generating random errors from the normal distribution with specified mean and variance, for the assumed sample sizes (25, 50, 75, 100), as follows:
u1: normal (0.04, 0.5)
u2: normal (0.3, 0.5)
u3: normal (0.6, 0.5)
2- The initial values of the parameters were assumed as in the following table (1):
3- Calculating the values of the variable Y (first iteration) for all three cross-sections, based on the random error values and the default parameter values defined in steps 1 and 2 respectively, together with the real data on the wheat crop as explanatory variables, to obtain the generated dependent variables as a first stage for all cross-sections.
4- The parameters of the multilevel regression model are estimated at Level 1 (β) along with the Level 2 estimators shown in formula (26), depending on its explanatory (independent) variable, for all cross-sections, thus obtaining the estimated equation as follows:
5- Finding the random errors u as a second stage through the following equation:
6- Selecting, with replacement, a random sample of the specified sample size from the random errors extracted in step 5.
7- We re-extract the values of the variable Y depending on the sample of random errors obtained in Step 6 and the real data for the explanatory variables for the wheat crop and the default values of the parameters to obtain the second iteration of the dependent variables.
8- Repeating steps 4-7 until reaching 1000 iterations, then finding the estimator for each parameter and each cross-section.
9- Having obtained the parameter estimators for each of the 3 cross-sections, the mean is taken for each parameter to obtain a single estimator value across the cross-sections, in accordance with the idea of panel data analysis, as follows:
It is worth noting that the support points adopted for each of the parameters and errors in the generalized maximum entropy method are:
(-2, -1, 0, 1, 2),  c = 2
10- To determine the optimal method among the estimation methods adopted in the research, which are:
a. The generalized least squares method, symbolized by GLS.
b. The generalized maximum entropy method, symbolized by GME.
two comparison measures (criteria) are used:
The first criterion: the Akaike Information Criterion (AIC), a single-valued score for determining the best model based on the estimators from which the criterion value was calculated. The optimal method is determined by comparing the criterion values across models: the model with the lowest AIC is the best, and hence the estimation method from which its value was calculated is the best. Its formula is:
AIC = -2 log L + 2K
Where:
K: the number of explanatory variables in the model.
L: the maximized value of the likelihood function.
The second criterion: the Bayesian Information Criterion (BIC)
The BIC is also a single-valued criterion for choosing the best model when the model is estimated on a specific data set. The lowest BIC value therefore gives the model, and hence the estimation method from which it was calculated, the performance advantage. Its formula is:
BIC = -2 log L + d ln(N)
Where:
N: sample size.
d: number of parameters in the model.
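Both criteria can be computed from the maximized Gaussian log-likelihood of a fitted regression. The sketch below uses hypothetical data and one common parameter-counting convention (slopes, intercept, and error variance together); conventions for K and d vary across texts:

```python
import numpy as np

# Computing AIC = -2 log L + 2K and BIC = -2 log L + d ln(N) for a Gaussian
# regression fit, using the ML residual variance. Hypothetical data only.
rng = np.random.default_rng(5)
N = 50
x = rng.uniform(0, 10, N)
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, N)
X = np.column_stack([np.ones(N), x])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / N                      # ML estimate of the variance
logL = -0.5 * N * (np.log(2 * np.pi * sigma2_hat) + 1)

d = X.shape[1] + 1                                  # intercept + slope + variance
aic = -2 * logL + 2 * d
bic = -2 * logL + d * np.log(N)
print(aic, bic)  # the method yielding the lower values would be preferred
```

Since ln(50) > 2, BIC penalizes the parameter count more heavily than AIC for this sample size, which is why the two criteria can disagree on larger models.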
Experimental aspect (discussion of simulation results):
In this section, the simulation results are analyzed and discussed for each assumed case and for each of the panel data models (pooled, fixed effects, and random effects), with their parameters estimated by the GLS, BC, and GME methods. No single panel data model was specified, in order to examine the effect of the estimation methods adopted in the research and the stability of their efficiency as the model changes, as well as across the sample sizes N = 3T of (30, 51, 75), generated from time series of lengths T = (10, 17, 25) multiplied by the three cross-sections. By running the simulation program, which takes a long time, the results were obtained and are displayed according to the following hypothetical cases:
The first hypothetical case: When the PMRM-2 model is a pooled model.
Table (2) shows the estimates of the first level of the PMRM-2 simulation model when it is a pooled model.

Table (2): Estimates of Level 1 parameters from the simulation model

N=3T | j | GLS | BC
30 | 0 | 103.044 | 103.067
30 | 1 | 1.48885 | 1.48883
30 | 2 | 2.99997 | 2.99992
30 | 3 | 2.99997 | 2.99992
30 | 4 | 0.04003 | 0.04003
30 | 5 | 0.62232 | 0.6223
51 | 0 | 56.88664 | 56.89465
51 | 1 | 1.48554 | 1.48551
51 | 2 | 2.99996 | 2.99991
51 | 3 | 2.99996 | 2.99991
51 | 4 | 0.00049 | 0.0005
51 | 5 | 0.50835 | 0.50836
75 | 0 | 115.88 | 115.888
75 | 1 | 1.48712 | 1.4871
75 | 2 | 2.99998 | 2.99993
75 | 3 | 2.99998 | 2.99993
75 | 4 | 0.14877 | 0.14878
75 | 5 | 0.74178 | 0.74177
It is noted from Table (2) that the estimators of the generalized least squares (GLS) method and the Bayesian method based on the natural conjugate prior (BC) are very close to one another for all Level 1 parameters of the multilevel (two-level) model for the panel data assumed in the simulation experiment, unlike the generalized maximum entropy (GME) estimator, which gave estimates that differ from them, though not by much. The same observation holds for the Level 2 parameter estimates of the same model under the same estimation methods, as shown in the following table:
It is noted that the values from the GLS and BC methods are close, and sometimes equal, unlike the GME method, which showed a difference from them, giving the scientific impression that the methods differ in estimation. To determine the optimal method, based on the results of Tables (2) and (3), they are compared using the AIC and BIC criteria for estimating the parameters of the multilevel (two-level) model for PMRM-2 panel data when the model is pooled. The results are as follows:
Table (4): AIC and BIC values for the estimation methods

N=3T | Criterion | GLS | BC
30 | AIC | 349.1872 | 349.1869
30 | BIC | 357.5944 | 357.5941
51 | AIC | 586.8427 | 586.8427
51 | BIC | 598.4336 | 598.4336
75 | AIC | 2544.097 | 2544.094
75 | BIC | 2564.593 | 2564.591
It is noted from Table (4) that the AIC and BIC values for the generalized least squares (GLS) method and the Bayesian (BC) method based on the normal conjugate prior are very close when N = 30 and equal for the sample sizes N = 51 and N = 75, which indicates that their efficiency is close or equal when estimating the parameters of the multilevel (two-level) model for PMRM-2 panel data under the pooled model assumption. It is also noted that the AIC and BIC values increase as the sample size increases.
1- The generalized maximum entropy method was found to give more accurate estimates than the generalized least squares method according to the Akaike information criterion and the Bayesian information criterion.
2- The values of the generalized maximum entropy (GME) estimates of the Level 1 parameters of the multilevel (two-level) model for the PMRM-2 wheat-crop panel data are clearly evident, and the same observation holds for the Level 2 parameter estimates.
3- The values of the AIC and BIC criteria for the generalized least squares (GLS) method across the different sample sizes (N = 30, 51, and 75) indicate that its efficiency is close or equal to that of the generalized maximum entropy (GME) method when the values of the same criteria are compared.
4- The estimators of the generalized least squares (GLS) method for the Level 1 parameters of the multilevel (two-level) model for the panel data assumed in the simulation experiment are close to one another, unlike the generalized maximum entropy (GME) estimator, which gave estimates that differ from them, though not by much. The same observation holds for the Level 2 parameter estimates of the same model under the same estimation methods.
multilevel models with latent variable interactions, Structural Equation Modeling: A Multidisciplinary Journal, DOI: 10.1080/10705511.2020.1761808