Computer Science
DOI: 10.21070/acopen.10.2025.10488

Statistical Study for the Estimators of Bayesian Analysis and Generalized Least Squares Method for the Parameters of the Multilevel Regression Model


Kajian Statistik untuk Estimator Analisis Bayesian dan Metode Kuadrat Terkecil Tergeneralisasi untuk Parameter Model Regresi Multilevel

College of Administration and Economics, Al Mustansiriyah University
Iraq
Ministry of Education
Iraq

(*) Corresponding Author


Abstract

In this research we dealt with the multilevel linear regression model, one of the most important and widely applied models for analyzing data whose observations take a hierarchical form. Two different methods were applied to estimate the model parameters: the generalized least squares method and Bayesian analysis. The two were compared through the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) to determine which is better in the estimation process, and the Bayesian analysis method was found to be the more efficient. The Bayesian analysis method is the best method for estimating the parameters of a multilevel model (with two levels) for PMRM-2 panel data, in general for any type of regression model for panel data and for different sample sizes, and it maintained its preference in estimation under both the AIC and BIC measures. This means that the Bayesian analysis method can be adopted in the applied aspect when estimating the parameters of a multilevel model (with two levels) for PMRM-2 panel data, such as wheat data in some governorates of Iraq, according to the available time series extending from 2000 to 2021.

Highlights:

  1. Multilevel regression applied for hierarchical data with two-level structure.
  2. Bayesian analysis outperformed generalized least squares using AIC and BIC measures.
  3. Study analyzed wheat data (2000–2021) in Iraq governorates effectively.

Keywords: Multilevel Regression, Bayesian Analysis, Generalized Least Squares, AIC, BIC

Introduction

The basic material of statistics is the data available about the phenomenon under study; the type, form, and nature of these data determine the appropriate mathematical model to represent them and then analyze them to reach the best results, especially in regression analysis. This means that each type of data has its own characteristics that determine the appropriate statistical method. For example, in time series data represented by a linear regression model, the random errors may be interconnected and follow one of the time series models, indicating an autocorrelation problem, which requires the Generalized Least Squares (GLS) method to estimate the parameters. Likewise, cross-sectional data for a phenomenon may exhibit heteroscedasticity of the random errors in the regression model, because their variances are unequal, which requires the Weighted Least Squares (WLS) method when estimating the parameters. Both types of data above are at one level, through which the parameters of the model are estimated, after verifying its hypotheses, using the appropriate estimation methods. These two types of data are widely available for various economic, social, psychological, and other phenomena.

Research objective

Based on the above, the research aims to:

1. Estimating the parameters of the multilevel regression model (MRM) for panel data using the Bayesian analysis method and the generalized least squares method.

2. Estimating the parameters of the multilevel regression model for the panel data representing the phenomenon of wheat production in the agricultural sector in Iraq within a group of governorates, with the monetary value of the wheat crop as the dependent variable and some influential explanatory (independent) variables (quantity of water resources, modern mechanization, quantity of fertilizer).

Methods

Multilevel regression model

Multilevel data and its types

Multilevel data has types that differ in name according to the mechanism by which their observations are recorded, which requires the researcher to use the appropriate multilevel model for them; they are an extension of (two-way) panel data, as follows:

First: Hierarchical data, which is data nested within itself and represents the general picture of multilevel data. It is classified into levels: the lowest, Level 1, is represented by the regression model of the response variable for the phenomenon, while the higher (last) levels represent the main sample of the phenomenon. For example, if there is a group of educational classes, we consider them to be the second level, while the first level contains the students of one of those selected classes. The two-level multilevel regression model is the most common, and it is the model on which this research focuses for estimation and testing in the presence of panel data. In a two-level regression model, the first (lowest) level concerns the dependent variable, while the second level concerns the upper level and contains the equations for the parameters appearing in the first level, or part of them. Multilevel data can thus be illustrated with two balanced levels, with Level 2 nested above Level 1.

Concept of Multilevel Regression Model

To reach the concept of Multilevel Regression Model MRM, the following must be clarified:

The traditional regression model consists of a set of variables, one of which is a response (dependent) variable and the rest are explanatory (independent) variables; the response variable is a mathematical function of the explanatory variables. If one explanatory variable is available, we obtain the Simple Linear Regression Model (SLRM), defined as follows [1]:

Yt = β0 + β1 Xt + ut ,  t = 1, 2, …, T (1)

Since:

Yt: represents the value of the response variable (dependent) at observation t.

Xt: represents the value of the explanatory variable (independent) at observation t.

β0, β1: Represent the unknown regression model parameters to be estimated.

ut: Represents the value of the random error at observation t.

When there are two or more explanatory (independent) variables, we obtain the Multiple Linear Regression Model (MLRM), which is defined as follows:

Yt = β0 + β1 X1t + β2 X2t + ⋯ + βj Xjt + ⋯ + βk Xkt + ut (2)

t = 1, 2, …, T ,  j = 1, 2, …, k

Since:

Xjt: Represents the value of the explanatory (independent) variable j at observation t. The rest of the symbols are the same as those defined in model (1), noting that there are more than two unknown parameters in this model.
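A minimal numerical sketch of fitting model (2) by least squares may help fix ideas; the data, coefficients, and noise level below are illustrative assumptions, not the paper's wheat data:

```python
import numpy as np

# Illustrative sketch (not the paper's code): fitting the multiple linear
# regression model (2), Yt = b0 + b1*X1t + ... + bk*Xkt + ut, by least squares.
rng = np.random.default_rng(0)
T, k = 100, 3
X = rng.normal(size=(T, k))                   # explanatory variables X1..Xk
beta_true = np.array([2.0, 1.5, -0.5, 3.0])   # [b0, b1, b2, b3], chosen for the demo
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.1, size=T)

# Design matrix with a column of ones for the intercept b0
Z = np.column_stack([np.ones(T), X])
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
```

With T = 100 observations and small noise, the estimated vector recovers the assumed parameters closely.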

It is noted that the traditional linear regression model, both simple and multiple, deals with only one cross-section, and that the estimated values of its parameters are fixed values that do not change when the values of the observed explanatory variables included in the model change. This picture occurs when a phenomenon is studied at the level of a single region, factory, sector, etc. But when the same phenomenon is studied over a series of length T at the level of more than one sector, city, etc., of size n, we obtain a linear regression model for panel data, which is of the following types [7]:

1- Pooled Regression Model: It is obtained by pooling the cross-sectional data into a single sample of size N = nT, so it can be treated as a traditional regression model, written as follows:

Yt = β0 + β1 X1t + β2 X2t + ⋯ + βj Xjt + ⋯ + βk Xkt + ut

t = 1, 2, …, N ,  j = 1, 2, …, k (3)

2- Fixed Effects Model: assumes that the intercepts differ across the cross-sections over the time units, so that there are as many intercepts in the linear regression model as there are cross-sections, and the model is written in matrix form as follows [2][7]:

[Y1; Y2; Y3; …; Yn](N×1) = diag(J, J, …, J)(N×n) [β01; β02; β03; …; β0n](n×1) + [X1; X2; X3; …; Xn](N×K) β(K×1) + [U1; U2; U3; …; Un](N×1) (4)

Since:

J=[1 1 1 … 1 ](T×1)

It is concluded from model (4) that dummy variables taking the values (0, 1) must appear together with the intercepts. According to the strategies for estimating the parameters of the fixed effects model, other names have appeared for it, including the dummy variables model and the analysis of covariance model [7][1][4].
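The dummy-variable construction in model (4) can be sketched directly: each of the n cross-sections receives its own intercept through a (0, 1) column. The sizes, intercepts, and slopes below are illustrative assumptions:

```python
import numpy as np

# Sketch of the dummy-variable form of the fixed effects model (4):
# each of the n cross-sections gets its own intercept b0_i via 0/1 dummies.
n, T, k = 3, 10, 2
N = n * T
rng = np.random.default_rng(1)

D = np.kron(np.eye(n), np.ones((T, 1)))   # (N x n) block of J vectors: the dummies
X = rng.normal(size=(N, k))
b0 = np.array([1.0, 5.0, -2.0])           # section-specific intercepts (assumed)
beta = np.array([2.0, 0.5])               # common slopes (assumed)
y = D @ b0 + X @ beta + rng.normal(scale=0.05, size=N)

# Estimate all intercepts and slopes jointly by least squares
Z = np.hstack([D, X])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
b0_hat, beta_hat = coef[:n], coef[n:]
```

The block-diagonal matrix of J vectors is exactly the dummy structure of equation (4).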

3- Random Effects Model: The random effects model assumes that the variances of the random errors differ as a result of the sum of two or more error terms, arising from the use of a regression model with a random intercept, i.e. the intercept parameter is random and can be defined in general as follows [6][4][7]:

β0i = β0 + μi ,  i = 1, 2, …, n (5)

Accordingly, the model is as follows:

Y(N×1)0J(N×1) + X(N×K) β(K×1) + W(N×1) (6)

Model (6) can be rewritten as follows:

Y(N×1) = Z(N×(K+1)) h((K+1)×1) + W(N×1)

Since:

(wi)(T×1) = μi J(T×1) + (ui)(T×1) ,  i = 1, 2, …, n

As a result of the compound errors it contains, the model has acquired other names: the error components model or the variance components model.
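The covariance structure implied by the composite error above can be written down directly; a minimal sketch, with illustrative variance values standing in for the unknown components:

```python
import numpy as np

# Sketch of the error-components structure: the composite error
# wi = mu_i * J + ui has covariance  sigma_mu^2 * J J' + sigma_u^2 * I_T,
# i.e. a common variance on the diagonal and equal covariances
# off-diagonal within one cross-section.
T = 5
sigma_mu2, sigma_u2 = 2.0, 0.5    # illustrative variance components
J = np.ones((T, 1))

Omega_i = sigma_mu2 * (J @ J.T) + sigma_u2 * np.eye(T)
```

Every pair of observations in the same cross-section shares the covariance sigma_mu^2, which is what gives the model its "variance components" name.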

4- Random Coefficients Model: Randomness in the panel regression model takes several forms. The first is as explained in equation (6), i.e. all the model parameters are written in the same form, varying according to the type of parameter (intercepts or marginal slopes). In the second form, the model parameters (intercepts and marginal slopes) are themselves written as linear regression models, i.e. they are response (dependent) variables in terms of explanatory variables other than those present in the original model.

General formula for a multilevel regression model (with two levels) for panel data

The general formula for a multilevel regression model with two levels MRM-2 can be obtained as follows [2][8]:

Figure 2.

Where:

Yi: is a vector of order (T×1) of the observations of the response variable (dependent) in the first level of segment i.

Xi: is a matrix of order (T×(K+1)) of the observations of the explanatory variables (independent) in the first level of segment i.

βi: A vector of order ((K+1)×1) of the unknown parameters in the first level of segment i.

Ui: A vector of order (T×1) of the random errors in the first level of segment i.

The second level (Level 2) is as follows:

Figure 3.

Where:

gqi: represents the observation of the explanatory (independent) variable q at the second level for the ith segment.

γ00i, γ01i, …, γk(Q-1)i, γkQi: the parameters of the regression model at the second level for the ith segment.

e0i, eji: represent the random errors of the linear regression models at the second level for the ith segment.

Bayes theory

Bayes' idea in the estimation process is based on employing prior (initial) information about the unknown parameters to be estimated, considering these unknown parameters to be random variables with probability distributions. This prior information can be placed in the form of a probability distribution, forming what is known as the prior probability density function (prior p.d.f.), which represents all previous information and experience about the unknown parameters reached through analysis, follow-up, and previous studies, in addition to the theories governing certain phenomena. In other words, suppose we have a vector of unknown parameters θ = (θ1, θ2, …, θp), representing p quantities to be estimated, such as means, variances, covariances, etc. The prior probability density function of these parameters, f(θ), describes them before sampling. After sampling, a sample of size n of observations of the study variable (Y1, Y2, …, Yn) is available, with likelihood function f(Y|θ). By combining the prior probability density function with the likelihood function of the current sample observations, the posterior probability density function (posterior p.d.f.) f(θ|Y) is obtained, which represents the distribution of θ after sampling. The above can be illustrated in the following form [8][1]:

Figure 4.Bayesian theory concept

It can be expressed mathematically as follows:

f(θ|Y) = f(Y, θ) / f(Y) = f(θ) f(Y|θ) / f(Y) ∝ f(θ) f(Y|θ)

Where:

f(Y, θ): represents the joint probability density function of the two random variables Y, θ.

f(θ): represents the prior probability density function of the parameter θ before sampling.

f(Y|θ): represents the likelihood function of the sample observations.

f(θ|Y): represents the posterior probability density function of the parameter θ after sampling. The symbol (∝) indicates proportionality, meaning that the function must be multiplied by a constant to become an equality; this mechanism is standard in the Bayesian method.
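The proportionality above can be checked numerically on a grid; the following is a minimal sketch for a normal mean with known variance, where the prior, the data, and the grid are illustrative assumptions:

```python
import numpy as np

# Grid-based sketch of f(theta|Y) ∝ f(theta) * f(Y|theta) for a normal mean
# with known variance. Prior: N(0, 4); data and grid are illustrative.
theta = np.linspace(-5, 10, 2001)                 # grid over the parameter
prior = np.exp(-0.5 * (theta - 0.0) ** 2 / 4.0)   # N(0, 4) prior, unnormalized

y = np.array([2.0, 2.5, 1.5])                     # assumed observed sample
sigma2 = 1.0
# Likelihood of the sample at each grid point (product over observations)
lik = np.exp(-0.5 * ((y[:, None] - theta[None, :]) ** 2 / sigma2).sum(axis=0))

post = prior * lik
post /= post.sum()                                # discrete normalization
posterior_mean = (theta * post).sum()             # E(theta|Y) on the grid
```

For this conjugate setup the closed-form posterior mean is (n*ybar/sigma2) / (n/sigma2 + 1/4) = 6/3.25 ≈ 1.846, and the grid computation reproduces it.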

The Bayes estimation method depends on the availability of two basic functions, which are [1]:

The first: the posterior probability density function f(θ|Y), which is a mixture of the sample information and the prior information explained in the previous paragraph.

The second: the loss function, symbolized by L(θ̂, θ), which must satisfy the following two conditions:

L(θ̂, θ) ≥ 0 ,  L(θ̂, θ) = 0 when θ̂ = θ (10)

By making the mathematical expectation of the loss function at its minimum, we can find the point estimate of the unknown parameters vector using the Bayesian method:

Figure 7.

It is worth noting that there are several types of loss functions, each with its own characteristics and capabilities, but the most common in the estimation process is the Squared Error Loss Function (the Quadratic Loss Function) in the case of one parameter, and its form is as follows:

L(θ̂, θ) = (θ̂ − θ)² (11)

As for the loss function in the case of more than one parameter to be estimated, as in the regression model for example, it is the Weighted Squared Error Loss Function, written as follows:

L(θ̂, θ) = (θ̂ − θ)′ Q (θ̂ − θ) (12)

where the matrix Q is positive definite, non-random, and symmetric.

Finding the expectation for function (12) is as follows:

E[L(θ̂, θ)] = E[(θ̂ − θ)′ Q (θ̂ − θ)]

Figure 10.

As noted, the second term of equation (13) is independent of θ̂, while the first term depends on it; thus the loss can be minimized when:

Figure 11.

Since E(θ|Y) refers to the mean of the posterior distribution, it represents the Bayes estimate for this type of loss function.

The above can be explained in a simplified way: a single parameter θ can be dealt with under the loss function defined in equation (11) as follows [1][2]:

Figure 12.

By taking the expectation of the function above, we get the following:

Figure 13.

By taking the first derivative of equation (15) with respect to the estimator θ̂ and setting it equal to zero in order to minimize the expected loss, the following is obtained [2]:

Figure 14.

Equation (16) shows that the Bayes estimate of the parameter θ equals the expectation of the posterior probability density function of θ, i.e. E(θ|Y), the mean of the posterior distribution of the parameter θ.

By taking the second derivative of equation (15) with respect to θ̂, it is noted that it has a positive value, which indicates that equation (16) is a minimum, and also that the loss function satisfies the two conditions stated in equation (10).
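The optimality of the posterior mean under squared-error loss can also be verified numerically; the discrete posterior below is an illustrative assumption:

```python
import numpy as np

# Numerical check of equation (16): under squared-error loss, the expected
# posterior loss E[(theta_hat - theta)^2 | Y] is minimized at the posterior
# mean E(theta|Y). The discrete posterior here is illustrative.
theta = np.array([0.0, 1.0, 2.0, 3.0])
post = np.array([0.1, 0.4, 0.3, 0.2])      # posterior probabilities, sum to 1

post_mean = (theta * post).sum()            # E(theta|Y)

def expected_loss(theta_hat):
    """Expected squared-error loss at a candidate estimate theta_hat."""
    return ((theta_hat - theta) ** 2 * post).sum()

# The loss at the posterior mean is below the loss at any other candidate
candidates = np.linspace(-1, 4, 501)
losses = [expected_loss(c) for c in candidates]
best = candidates[int(np.argmin(losses))]
```

Scanning a fine grid of candidate estimates, the minimizer of the expected loss coincides with the posterior mean, as equation (16) asserts.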

Generalized Least Squares Method

The generalized least squares method is one of the common traditional estimation methods, based on the idea that the parameters to be estimated are fixed quantities. Its formulas for estimating the parameters of the multilevel regression model (with two levels) for the PMRM-2 panel data are given below; although the researcher agreed with the idea of the second approach in estimating the parameters γ by relying on the regression model, the following is given [3][5]:

Figure 15.

It is worth noting that the estimators of the generalized least squares method defined by formula (20) are unbiased and are the best linear unbiased estimators (BLUE). As can be seen from formulas (22) and (23), they depend on the values of the variances of each of the cross-sections, which are often unknown in the applied aspect, so the unbiased estimator is substituted for them for each section [8][5].
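The GLS estimator described above can be sketched numerically with a known error covariance; when that covariance is unknown, an unbiased estimate replaces it (feasible GLS). The sizes, coefficients, and section variances below are illustrative assumptions:

```python
import numpy as np

# Sketch of the GLS estimator beta_hat = (X' W^-1 X)^-1 X' W^-1 y with a known
# error covariance W; values are illustrative, not the paper's data.
rng = np.random.default_rng(2)
N, k = 60, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
beta = np.array([1.0, 2.0])

# Heteroscedastic errors: three cross-sections with different variances
sigmas = np.repeat([0.1, 0.5, 1.0], N // 3)
y = X @ beta + rng.normal(scale=sigmas)

W_inv = np.diag(1.0 / sigmas ** 2)          # inverse covariance (known here)
beta_gls = np.linalg.solve(X.T @ W_inv @ X, X.T @ W_inv @ y)
```

Observations from the low-variance section receive the largest weight, which is exactly how GLS regains efficiency over ordinary least squares under unequal error variances.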

Simulation and Application

Introduction:

In this research, the experimental and applied sides are conducted through the experimental comparison of the estimation methods for the parameters of the multilevel (two-level) regression model, in the presence of the panel data presented in the theoretical side, using the simulation method, as it provides many hypothetical cases that simulate the applied or practical reality. The final result of analyzing the simulation results, which identifies the optimal estimation method among those referred to in the theoretical side, is what will be adopted in the applied (practical) side. Owing to the high flexibility of the simulation method in obtaining random samples that simulate the applied or practical reality, many researchers resort to it in their research; here the experiments are re-implemented with a specific number of repetitions, 1000 times in this research, using the bootstrap, for the various models under study. The purpose is to determine the optimal methods in the estimation process, which are then used in the practical aspect (applying the optimal method to real data); simulation is also the appropriate method for comparing estimation methods for the parameters of a model for which real data are not available. In summary, the simulation method imitates a mathematical model, such as a regression model that exists in the applied reality, with a number of methods for estimating its parameters; when it is implemented and its results are compared, we arrive at the optimal method for the estimation process. This is done by building models or software to imitate an existing real system, through which the results are discussed to determine the optimal estimation method.

Stages of building the simulation experiment:

This section includes building the simulation experiment to compare the estimation methods and determine the optimal one based on the comparison measures AIC and BIC. The program was created in Matlab and a Monte Carlo simulation was implemented, with the simulation experiments relying on generating data for a multilevel regression model (with two levels) in the presence of the PMRM-2 panel data:

Figure 16.

To conduct the simulation under the assumptions of a multilevel regression model (with two levels) for model (3-1), its data was generated using the bootstrap sampling method, as the data will be generated according to the following steps:

1- Generating random errors according to the normal distribution with specified mean and variance, for the assumed sample sizes (25, 50, 75, 100), as follows:

u1: normal (0.04, 0.5)

u2: normal (0.3, 0.5)

u3: normal (0.6, 0.5)

2- The initial values of the parameters were assumed as in the following table (1):

Figure 17.

3- Calculating the values of the variable Y (first iteration) for all three cross-sections based on the values of the random errors and the default values of the parameters defined in steps 1 and 2 respectively, together with the real data of the explanatory variables for the wheat crop, to obtain the generated dependent variables as a first stage for all cross-sections.

4- The parameters β of the multilevel regression model are estimated at Level 1, as well as the Level 2 estimators shown in formula (26), depending on its explanatory (independent) variables and all cross-sections, thus obtaining the estimated equation as follows:

Figure 18.

5- Finding the random errors u as a second stage through the following equation:

Figure 19.

6- Selecting, with replacement, a random sample of the specified sample size from the random errors extracted in step 5.

7- Re-extracting the values of the variable Y depending on the sample of random errors obtained in step 6, the real data of the explanatory variables for the wheat crop, and the default values of the parameters, to obtain the second iteration of the dependent variables.

8- Repeat steps 4-7 until reaching 1000 iterations, then find the estimator for each parameter and for each cross-section through:

Figure 20.

9- Having obtained the estimators of the parameters for each of the 3 cross-sections, the mean is extracted for each parameter to obtain a single estimator value across the cross-sections, in accordance with the idea of analyzing panel data, as follows:

Figure 21.

It is worth noting that the support points adopted for each of the parameters and errors in the generalized maximum entropy method are:

(-2, -1, 0, 1, 2), c = 2
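The resampling loop in steps 4-8 can be sketched as follows; this is an illustrative residual bootstrap on a single-level regression, with synthetic values standing in for the wheat series and the full two-level model:

```python
import numpy as np

# Condensed sketch of steps 4-8: a residual bootstrap for a regression model.
# Fit once, resample the residuals with replacement, rebuild y, refit, and
# average the B replicated estimates. All values are illustrative.
rng = np.random.default_rng(3)
T = 25
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta = np.array([3.0, 1.5])
y = X @ beta + rng.normal(scale=0.2, size=T)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_hat = ols(X, y)
resid = y - X @ b_hat                        # step 5: extract residuals

B = 1000
reps = np.empty((B, 2))
for b in range(B):
    u_star = rng.choice(resid, size=T, replace=True)   # step 6: resample
    y_star = X @ b_hat + u_star                        # step 7: rebuild y
    reps[b] = ols(X, y_star)                           # step 4 again: refit

beta_boot = reps.mean(axis=0)                # step 8: average over iterations
```

Because the fitted model includes an intercept, the residuals average to zero and the bootstrap mean reproduces the original fit, while the spread of the replicates gives the sampling variability.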

10- To determine the optimal method in the estimation process among the estimation methods adopted in the research, which are:

a. The generalized least squares method, symbolized by GLS.

b. The generalized maximum entropy method, symbolized by GME.

We use two comparison measures (criteria), which are:

The first criterion: the Akaike Information Criterion (AIC), a single-valued score for determining the best among candidate models based on the estimators from which the criterion value was calculated. The optimal method is determined by comparing the criterion values for all models: the model with the lowest AIC is the best, and hence so is the estimation method from which its value was calculated. Its formula is:

AIC = -2 log L + 2K

Where:

K: represents the number of explanatory variables in the model.

L: represents the estimate of the likelihood function.

The second criterion: Bayesian Information Criterion

The BIC criterion is also a single-valued criterion for choosing the best model when models are estimated on a specific data set. The lowest BIC value indicates the better-performing model, and thus the better estimation method from which the BIC value was calculated. Its formula is as follows:

BIC = -2 log L + d Ln(N)

Where:

N: sample size.

d: number of parameters in the model.
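Both criteria reduce to simple arithmetic on the maximized log-likelihood; a minimal sketch with an assumed Gaussian log-likelihood and illustrative residuals:

```python
import numpy as np

# Sketch of the two comparison criteria: AIC = -2 log L + 2K and
# BIC = -2 log L + d log(N), computed from a Gaussian log-likelihood.
def gaussian_loglik(resid, sigma2):
    """Log-likelihood of i.i.d. N(0, sigma2) residuals."""
    N = len(resid)
    return -0.5 * N * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(resid ** 2) / sigma2

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def bic(loglik, d, N):
    return -2.0 * loglik + d * np.log(N)

resid = np.array([0.1, -0.2, 0.05, 0.15, -0.1])   # illustrative residuals
ll = gaussian_loglik(resid, sigma2=0.02)
```

Both criteria penalize model size: each extra parameter adds 2 to AIC and log(N) to BIC, so for N > e^2 ≈ 7.4 the BIC penalty is the harsher of the two.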

Result and Discussion

Experimental aspect (discussion of simulation results):

In this section, the simulation results are analyzed and discussed according to the assumed cases for each of the panel data models (pooled, fixed, and random), whose parameters are estimated using the GLS, BC, and GME methods. A single panel data model was deliberately not fixed, in order to examine the effect of the estimation methods adopted in the research and the stability of their efficiency as the model changes, as well as with changing sample sizes N = 3T (30, 51, 75), generated from time series of lengths T = (10, 17, 25) multiplied by the three cross-sections. By running the simulation program, which takes a long time, the results below were obtained according to the following hypothetical cases:

The first hypothetical case: When the PMRM-2 model is a pooled model.

Table (2) shows the estimates of the first level parameters of the PMRM-2 simulation model when it is a pooled model:

N=3T   j   GLS        BC
30     0   103.044    103.067
30     1   1.48885    1.48883
30     2   2.99997    2.99992
30     3   2.99997    2.99992
30     4   0.04003    0.04003
30     5   0.62232    0.6223
51     0   56.88664   56.89465
51     1   1.48554    1.48551
51     2   2.99996    2.99991
51     3   2.99996    2.99991
51     4   0.00049    0.0005
51     5   0.50835    0.50836
75     0   115.88     115.888
75     1   1.48712    1.4871
75     2   2.99998    2.99993
75     3   2.99998    2.99993
75     4   0.14877    0.14878
75     5   0.74178    0.74177

Table 1. Estimates of Level 1 parameters of the simulation model when it is a pooled model

It is noted from Table (2) that the estimators of the generalized least squares (GLS) method and the Bayesian method based on the natural conjugate prior function (BC) are very close to each other for all the parameters of the first level (Level 1) of the multilevel (two-level) model for the panel data assumed in the simulation experiment, unlike the generalized maximum entropy (GME) estimator, which gave estimates that differ from them, though not by much. The same observation holds for the estimates of the Level 2 parameters of the same model and for the same estimation methods, as shown in the following table:

Figure 22. Estimates of the parameters of the (Level 2) of the simulation model when it is a pooled model

It is noted that the values of the GLS and BC methods are close and sometimes equal to each other, unlike the GME method, which showed a difference from them, giving the scientific impression that there is a difference in estimation. To determine the optimal method, and based on the results of Tables (2) and (3), the methods are compared using the AIC and BIC criteria for estimating the parameters of a multilevel (two-level) model for the PMRM-2 panel data when the model is pooled, with the following results:

N=3T   Criterion   GLS        BC
30     AIC         349.1872   349.1869
30     BIC         357.5944   357.5941
51     AIC         586.8427   586.8427
51     BIC         598.4336   598.4336
75     AIC         2544.097   2544.094
75     BIC         2564.593   2564.591

Table 2. Values of the AIC and BIC criteria for the PMRM-2 simulation model when it is a pooled model

It is noted from Table (4) that the values of the AIC and BIC criteria for the generalized least squares (GLS) method and the Bayesian (BC) method based on the normal conjugate prior are very close when N=30 and equal for the two sample sizes N=51 and N=75, which indicates the closeness or equality of their efficiency when estimating the parameters of a multilevel (two-level) model for the PMRM-2 panel data, assuming the model is pooled. It is also noted that the values of the AIC and BIC criteria increase with the sample size.

Conclusion

1- It was found that the generalized maximum entropy method gives more accurate estimates than the generalized least squares method according to the Akaike information criterion and the Bayesian information criterion.

2- The values of the generalized maximum entropy (GME) estimates for the Level 1 parameters of a multilevel (two-level) model for the PMRM-2 panel data for the wheat crop are clearly evident, and this observation also holds for the estimates of the Level 2 parameters.

3- The values of the AIC and BIC criteria for the generalized least squares (GLS) method for the different sample sizes (N=30, N=51, and N=75) indicate that its efficiency is close or equal to that of the generalized maximum entropy (GME) method when the values of the same criteria are compared.

4- The estimators of the generalized least squares (GLS) method for the Level 1 parameters of a multilevel (two-level) model for the panel data assumed in the simulation experiment were close to each other, unlike the estimator of the generalized maximum entropy (GME) method, which gave estimates that differ from them, though not by much. This observation also holds for the estimates of the Level 2 parameters for the same model and estimation methods.


References

  1. E. Aarts et al., "A Solution to Dependency: Using Multilevel Analysis to Accommodate Nested Data," Nature Neuroscience, vol. 17, no. 4, 2014.
  2. M. R. Abonazel, "Generalized Estimators of Stationary Random-Coefficients Panel Data Models: Asymptotic and Small Sample Properties," REVSTAT – Statistical Journal, vol. 17, no. 4, pp. 493–521, Oct. 2019.
  3. J. M. K. Aheto and G. A. Dagne, "Multilevel Modeling, Prevalence, and Predictors of Hypertension in Ghana: Evidence From Wave 2 of the World Health Organization’s Study on Global AGEing and Adult Health," Health Science Reports, vol. 4, p. e453, 2021.
  4. A. D. Al-Nasser, O. M. Eidous, and L. M. Mohaidat, "Multilevel Linear Models Analysis Using Generalized Maximum Entropy," Asian Journal of Mathematics and Statistics, vol. 3, no. 2, pp. 111–118, 2010.
  5. A. D. Al-Nasser, "Multilevel Linear Models Analysis Using Generalized Maximum Entropy," presented at MTISD, University of Salento, Lecce, 18–20 Jun. 2008.
  6. J. Antonakis, N. Bastardoz, and M. Rönkkö, "On Ignoring the Random Effects Assumption in Multilevel Models: Review, Criticism, and Recommendations," Organizational Research Methods, vol. 24, no. 2, pp. 443–483, 2021.
  7. M. M. Asadullah and M. M. Husian, "The Effect of Sample Size on Random Component in Multilevel Modeling," European Journal of Statistics, vol. 2, p. 7, 2022.
  8. T. Asparouhov and B. Muthén, "Bayesian Estimation of Single and Multilevel Models With Latent Variable Interactions," Structural Equation Modeling: A Multidisciplinary Journal, vol. 27, no. 3, pp. 1–22, 2020, doi: 10.1080/10705511.2020.1761808.