Sample Size Requirements for Simple and Complex Mediation Models

Mediation models have been widely used in many disciplines to better understand the underlying processes between independent and dependent variables. Despite their popularity and importance, the appropriate sample sizes for estimating these models are not well known. Although several approaches (such as Monte Carlo methods) exist, applied researchers tend to use insufficient sample sizes to estimate their models of interest, which might result in unstable and inaccurate estimation of the model parameters, including mediation effects. In the present study, sample size requirements were investigated for four frequently used mediation models: one simple mediation model and three complex mediation models. For each model, path and structural equation modeling approaches were examined, and partial and complete mediation conditions were considered. The percentile bootstrap method and the multivariate delta method were compared for testing mediation effects. A series of Monte Carlo simulations was conducted under various conditions, including the level of effect sizes, the number of indicators, the magnitude of factor loadings, and the proportion of missing data. The results not only provide practical and general guidelines for substantive researchers to determine minimum required sample sizes but also improve understanding of which factors are related to sample size requirements in mediation models.

Keywords: mediation analysis, mediation model, sample size, indirect effect, bootstrap method

Mediation analysis has been widely used for decades to better understand the relationship between independent and dependent variables and has recently become one of the most popular statistical models in both methodological studies (e.g., Fritz & MacKinnon, 2007; Lachowicz et al., 2018; Liu & Wang, 2019; B. Muthén & Asparouhov, 2015; Thoemmes et al., 2010) and substantive research (e.g., Duckworth et al., 2016; Lockman & Servaty-Seib, 2018; Malone et al., 2016). In a mediation model, the effect of an independent variable on a dependent variable is explained through a third variable called a mediator (Fritz & MacKinnon, 2007). Mediation models have been used to answer research questions in many disciplines. For example, in psychology, the associations of midlife Eriksonian development with both late-life global cognition and executive functioning were partially mediated by late-life depression (Malone et al., 2016). In business management, the effect of supplier-facing purchasing and supply management practices on operational performance was fully mediated by internal purchasing and supply management practices (Foerstl et al., 2016). In health science, the effect of past mental health on present physical health was mediated by lifestyle choices and social interactions (Ohrnberger et al., 2017).

Although mediation models provide useful tools for describing the relationship between a stimulus and a response (MacKinnon, 2008), a common issue for substantive researchers has been determining the sample size needed to estimate mediation effects properly. It is well known that the standard error of an estimate depends on sample size, which in turn affects power and the Type I error rate. Consequently, a nonignorable effect may be present even when a test is statistically nonsignificant because the sample is smaller than needed (in terms of adequate power, accurate estimation, etc.). Conversely, when the sample is much larger than needed, a statistically significant result may reflect an effect that is negligible or has no practical importance. An unnecessarily large sample thus wastes the cost, time, and effort of conducting research.

Determining proper sample sizes is one of the important unresolved issues in the use of mediation models. Although L. K. Muthén and Muthén (2002) showed how to perform Monte Carlo simulations to determine the required sample size for a given statistical model, Monte Carlo simulation is still not an accessible method for most applied researchers. Consequently, several studies have investigated sample size requirements, each with its strengths and limitations. For example, Fritz and MacKinnon (2007) conducted Monte Carlo simulations on the performance of six mediation testing methods commonly employed in the literature (e.g., Sobel’s first-order test [Sobel, 1982] and the percentile bootstrap test [Shrout & Bolger, 2002]) and provided the sample sizes necessary to achieve .8 power in a single-mediator path model under several combinations of path coefficient effect sizes. Similarly, Liu and Wang (2019) presented a method for planning sample sizes for a simple mediation path model. They conducted a simulation study to examine the impact of uncertainty in effect size estimates on the power of the joint significance test (Cohen & Cohen, 1983) and developed R functions and a web application to implement the proposed method. They chose the effect size levels following Fritz and MacKinnon (2007) so that the results could be easily compared. However, both studies were limited to a simple mediation path model. Wolf et al. (2013) used Monte Carlo simulations to evaluate sample size requirements for confirmatory factor analysis (CFA) models with one, two, and three factors and for simple mediation models with and without latent variables. The authors first examined the impact of key model properties on CFA models, such as the number of indicators and the magnitude of factor loadings, and then held those components constant (i.e., three indicators and factor loadings of 0.65) to examine the impact of changes in the effect sizes of structural path coefficients in mediation models. They also considered the effect of missing data on sample size requirements. Although their results provided some guidelines on sample size requirements for mediation models, only simple mediation models were considered, under limited conditions. For more complex types of mediation, Schoemann et al. (2017) proposed a new method, along with a convenient tool (a web application using R), for determining power and sample sizes in simple- and multiple-mediator (i.e., two-parallel or two-serial mediators) models. Although their application provides researchers with an easy-to-use tool for designing mediation studies, the method applies only to certain types of path models, not to structural equation models (SEMs).

Beyond the sample size studies listed above, a few other studies have focused on power analysis for designing mediation studies. For example, Thoemmes et al. (2010) described a general framework for estimating power, based on L. K. Muthén and Muthén’s (2002) Monte Carlo approach, for simple and complex mediation models, including models with multiple mediators, three-path mediation, mediation with latent variables, moderated mediation, and mediation in longitudinal designs. They provided tables of required sample sizes for some models, but only under limited simulation conditions. Also, only the mediating variables were modeled as latent variables, which seems less likely to occur in practice. Similarly, Zhang (2014) proposed a power analysis for detecting mediation effects based on the percentile bootstrap method via Monte Carlo simulations. The method can handle nonnormal data with excessive skewness and kurtosis, and it was illustrated with examples involving complex mediation models (e.g., a multigroup model and a longitudinal model). An R package was also developed for its application.

As discussed, several studies have investigated sample sizes in mediation analysis (e.g., Fritz & MacKinnon, 2007; Liu & Wang, 2019; Schoemann et al., 2017). However, their suggestions are limited to a narrow range of models, such as simple mediation models or path models. A few studies included SEMs, but even these did not consider the various analytic factors associated with mediation models (e.g., the number of indicators and the magnitude of factor loadings). In addition, most previous studies focused on statistical power as the criterion for determining sample sizes. As the sample size study by L. K. Muthén and Muthén (2002) shows, several additional criteria (e.g., parameter bias and 95% coverage) can be considered besides power.

Given the limitations of previous methodological studies on sample size requirements, we first conducted a systematic literature search to investigate the sample sizes actually used in substantive research employing mediation analysis. Our review covered articles published between 2016 and 2018 in several psychology journals, including Journal of Counseling Psychology, Developmental Psychology, Journal of Applied Psychology, and Journal of Educational Psychology. A total of 2,562 articles were published in these journals during that period. Using PsycINFO, we searched the abstracts for the key words “mediation,” “mediating,” “mediated,” or “indirect effect” and identified 355 articles. We then eliminated articles that did not include mediation tests and those that concerned multilevel and/or longitudinal mediation. Finally, 201 articles were examined, focusing on the mediation models used in them and their sample sizes. The four most frequently used models identified in the articles are displayed in Figure 1. To save space, only the path diagrams for the structural equation mediation models with three indicators are displayed. In this study, Model 1 is called the simple mediation model, and the other three models are called complex mediation models. Specifically, Models 2 and 4 are multiple mediation models, whereas Model 3 is a multiple-step multiple mediation model (Hayes, 2009). In other words, Models 2 and 4 each have two simple mediation effects, and Model 3 includes a multiple-step mediation effect ($a_1 d b_2$) in addition to two simple mediation effects ($a_1 b_1$ and $a_2 b_2$). Detailed explanations of the mediation models are provided in the next section.

[Figure 1. The four mediation models (Models 1 to 4), shown as structural equation models with three indicators per latent variable.]

Table 1 shows the frequency (percentage) of each mediation model (path models, SEMs, and the total) and the sample sizes used for testing mediation effects in them. Note that some articles employed more than one model; thus, the total in Table 1 is 206 rather than 201. The most frequently used model was Model 1 (33.98% of the total), followed by Model 3 (9.22%). Models 1 to 4 together accounted for 58.25% of the total, and the remaining studies used other types of models (e.g., complex models with multiple dependent variables and/or multiple mediators). In addition, 57.28% of the models were SEMs, and the rest were path models. The sample sizes used for the models are summarized as the minimum, maximum, mean, and median in Table 1. The smallest sample size was 73, and the largest was 13,645. The median sample sizes for path models ranged from 171 to 308, whereas those for SEMs ranged from 284 to 368. Although not shown in Table 1, 74.31% of the models involved partial mediation effects, and the rest involved complete mediation effects.

Table 1.

Review of the Literature on Mediation Models and Sample Sizes in Research Applications.

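| Approach | Mediation model | Frequency | % | Sample size (minimum) | Sample size (maximum) | Sample size (M) | Sample size (Mdn) |
|---|---|---|---|---|---|---|---|
| Path model | Model 1 | 43 | 48.86 | 86 | 2,451 | 340 | 242 |
| | Model 2 | 7 | 7.95 | 128 | 8,879 | 1,442 | 171 |
| | Model 3 | 3 | 3.41 | 270 | 347 | 297 | 274 |
| | Model 4 | 5 | 5.68 | 141 | 1,851 | 575 | 299 |
| | Other^a | 30 | 34.09 | 93 | 1,042 | 347 | 308 |
| | Total | 88 | 100.00 | 86 | 8,879 | 447 | 266 |
| SEM | Model 1 | 27 | 22.88 | 73 | 645 | 334 | 316 |
| | Model 2 | 8 | 6.78 | 208 | 2,088 | 585 | 356 |
| | Model 3 | 16 | 13.56 | 221 | 1,000 | 428 | 365 |
| | Model 4 | 11 | 9.32 | 148 | 545 | 333 | 284 |
| | Other | 56 | 47.46 | 111 | 13,645 | 700 | 368 |
| | Total | 118 | 100.00 | 73 | 13,645 | 538 | 352 |
| Total | Model 1 | 70 | 33.98 | 73 | 2,451 | 335 | 293 |
| | Model 2 | 15 | 7.28 | 128 | 8,879 | 950 | 317 |
| | Model 3 | 19 | 9.22 | 221 | 1,000 | 409 | 352 |
| | Model 4 | 16 | 7.77 | 141 | 1,851 | 409 | 292 |
| | Other | 86 | 41.75 | 93 | 13,645 | 582 | 335 |
| | Total | 206 | 100.00 | 73 | 13,645 | 500 | 311 |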

Note. SEM = structural equation model. M = mean. Mdn = median.

a Other types of models include complex models with multiple dependent variables and/or multiple mediators.

Considering the popularity of mediation analyses in substantive research, it is worthwhile to verify whether the sample sizes used in practice are appropriate and to investigate the minimum sample size requirements for frequently used mediation models. The main purpose of the present study is to provide substantive researchers with practical and general guidelines for determining sample sizes in the four commonly used mediation models shown in Figure 1: one simple mediation model and three complex mediation models. We compared SEMs with path models to investigate the effect of including latent variables on sample size requirements. For each modeling approach, both partial and complete mediation conditions were considered to reflect current practice as found in the literature search. Also, the percentile bootstrap method (Shrout & Bolger, 2002) and the multivariate delta method (MacKinnon, 2008; Sobel, 1982) were used to detect mediation effects, and their performance was compared. A series of Monte Carlo simulations was conducted under various simulation conditions. In particular, the number of indicators, the magnitude of factor loadings, and the proportion of missing data were included as simulation factors for the SEM approach. To our knowledge, these factors have not been studied extensively in the literature on mediation.

The remaining sections of the article proceed as follows. In the next section, a description of the mediation models used in this study is provided, along with mediation effects. In the following section, a Monte Carlo simulation study is presented in terms of simulation design and data analysis. Next, simulation results are separately provided for path models and SEMs. In the final section, a brief summary of findings is presented, and the implications of the results are discussed.

Mediation Models

Figure 1 displays the four mediation models with indicators (i.e., SEMs for mediation) used in this study. Model 1 is the simple mediation model, which involves three latent variables (i.e., $X$, $Y$, and $M$) and examines whether the effect of an independent variable ($X$) on a dependent variable ($Y$) is transmitted through a mediator ($M$). In Model 1, $a$ is the effect of $X$ on $M$, $b$ is the effect of $M$ on $Y$, and $c'$ is the direct effect of $X$ on $Y$ adjusted for $M$. The product of $a$ and $b$ ($ab$) is referred to as the mediation effect or indirect effect. $d_M$ and $d_Y$ are residual (disturbance) terms in the structural part of the model, which are usually assumed to be normally distributed with a mean of zero. $m_1$ to $m_3$, $x_1$ to $x_3$, and $y_1$ to $y_3$ are indicators of the latent constructs, and the corresponding $e$ terms are the errors in the measurement part of the model. If only $X$ and $Y$ were present in the model, without $M$, $c$ would denote the total effect of $X$ on $Y$. The total effect ($c$) equals the sum of the direct effect ($c'$) and the indirect effect ($ab$). Complete mediation occurs when $X$ no longer affects $Y$ once $M$ is held constant, so that $c'$ is not statistically different from zero, whereas partial mediation is present when $c'$ is smaller than $c$ but still different from zero (Baron & Kenny, 1986). Models 2 and 3 are complex mediation models with two mediators ($M_1$ and $M_2$). Model 2 has two simple mediation effects, $a_1 b_1$ and $a_2 b_2$, whereas Model 3 has three mediation effects: two single-step mediation effects ($a_1 b_1$ and $a_2 b_2$) and a multiple-step mediation effect ($a_1 d b_2$). Model 4 is a mediation model with two independent variables and has two single-step mediation effects, $a_1 b$ and $a_2 b$. For each model, a corresponding path model (without latent variables and indicators) is also considered in this study.
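For reference, the structural part of Model 1 described above can be written as two regression equations, with the total effect decomposing as stated. This is the standard linear formulation of the simple mediation model in the notation of this section (the measurement part, with its indicators and $e$ terms, is omitted); the last line lists the specific indirect effects of Models 2 to 4.

```latex
\begin{aligned}
M &= aX + d_M, \\
Y &= c'X + bM + d_Y, \\
c &= c' + ab, \\
&\text{Model 2: } a_1 b_1,\ a_2 b_2; \quad
 \text{Model 3: } a_1 b_1,\ a_2 b_2,\ a_1 d\, b_2; \quad
 \text{Model 4: } a_1 b,\ a_2 b.
\end{aligned}
```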

In the present investigation, all variables (i.e., indicators and latent variables for SEMs and measured variables for path models) are assumed to be continuous and standardized (i.e., mean of zero and variance of one), as in other simulation studies (e.g., Falk & Biesanz, 2015; Gagné & Hancock, 2006; Jackson et al., 2013). Because the variables are standardized, the path coefficients and the indirect effects are also standardized. Standardized indirect effects were used for the following reasons. They have desirable statistical properties: they are invariant to proper linear transformations, independent of sample size (Lachowicz et al., 2018), and scale-free. In addition, standardized indirect effects are frequently used in substantive mediation studies (Liu & Wang, 2019).

Method

Study Design and Data Generation

A Monte Carlo simulation study was conducted to examine sample size requirements for the simple and complex mediation models introduced above. Under each model, path coefficients were chosen to represent three levels of effect size. Table 2 displays the selected path coefficients and residual variance parameters under the three indirect effect size conditions for each model. For the simple mediation model (Model 1 in Figure 1), three magnitudes, 0.14, 0.36, and 0.51, were considered for $a$ and $b$, corresponding to Cohen’s (1988) $R^2$ criteria for small (2% of the variance in the dependent variable explained), medium (13%), and large (26%) effect sizes, respectively (i.e., $0.14^2 \approx .02$, $0.36^2 \approx .13$, and $0.51^2 \approx .26$). These values were selected to resemble those used in previous studies (e.g., Fritz & MacKinnon, 2007; Liu & Wang, 2019; Thoemmes et al., 2010). For $c'$, only two magnitudes, 0 and 0.14, were used to represent the complete mediation condition and the partial mediation (small direct effect) condition, respectively (Liu & Wang, 2019). Once the path coefficients were chosen, the residual variances of $d_M$ and $d_Y$ could be calculated. Calculation examples are shown in Appendix A.
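As an illustration of the kind of calculation referred to above (Appendix A itself is not reproduced here), the Model 1 residual variances follow from requiring $M$ and $Y$ to have unit variance, with $\operatorname{Cov}(X, M) = a$ for standardized variables. This is our reconstruction under that standardization, not the authors' appendix. For the medium, partial mediation condition ($a = b = 0.36$, $c' = 0.14$), it gives the Model 1 values shown in Table 2 to two decimals:

```latex
\begin{aligned}
\operatorname{Var}(d_M) &= 1 - a^2 = 1 - 0.36^2 = 0.8704 \approx 0.87, \\
\operatorname{Var}(d_Y) &= 1 - \left(c'^2 + b^2 + 2\,a\,b\,c'\right) \\
&= 1 - \left(0.14^2 + 0.36^2 + 2(0.36)(0.36)(0.14)\right) \\
&= 1 - 0.185488 = 0.814512 \approx 0.81 .
\end{aligned}
```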

Table 2.

Path Coefficients and Residual Variance Parameters Under Three Indirect Effect Size Conditions.

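| Model | Parameter | Partial: Small | Partial: Medium | Partial: Large | Complete: Small | Complete: Medium | Complete: Large |
|---|---|---|---|---|---|---|---|
| Model 1 | $a$ | 0.14 | 0.36 | 0.51 | 0.14 | 0.36 | 0.51 |
| | $b$ | 0.14 | 0.36 | 0.51 | 0.14 | 0.36 | 0.51 |
| | $c'$ | 0.14 | 0.14 | 0.14 | 0.00 | 0.00 | 0.00 |
| | $\mathrm{Var}(d_M)$ | 0.98 | 0.87 | 0.74 | 0.98 | 0.87 | 0.74 |
| | $\mathrm{Var}(d_Y)$ | 0.95 | 0.81 | 0.65 | 0.98 | 0.87 | 0.74 |
| Model 2 | $a_1, a_2$ | 0.14 | 0.36 | 0.51 | 0.14 | 0.36 | 0.51 |
| | $b_1, b_2$ | 0.09 | 0.25 | 0.41 | 0.10 | 0.27 | 0.42 |
| | $c'$ | 0.04 | 0.14 | 0.14 | 0.00 | 0.00 | 0.00 |
| | $\mathrm{Var}(d_{M_1})$ | 0.98 | 0.87 | 0.74 | 0.98 | 0.87 | 0.74 |
| | $\mathrm{Var}(d_{M_2})$ | 0.98 | 0.87 | 0.74 | 0.98 | 0.87 | 0.74 |
| | $\mathrm{Var}(d_Y)$ | 0.98 | 0.80 | 0.53 | 0.98 | 0.85 | 0.65 |
| Model 3 | $a_1$ | 0.14 | 0.36 | 0.51 | 0.14 | 0.36 | 0.51 |
| | $a_2$ | 0.10 | 0.27 | 0.42 | 0.10 | 0.27 | 0.42 |
| | $d$ | 0.10 | 0.27 | 0.42 | 0.10 | 0.27 | 0.42 |
| | $b_1$ | 0.09 | 0.25 | 0.41 | 0.10 | 0.27 | 0.42 |
| | $b_2$ | 0.09 | 0.25 | 0.41 | 0.10 | 0.27 | 0.42 |
| | $c'$ | 0.04 | 0.14 | 0.14 | 0.00 | 0.00 | 0.00 |
| | $\mathrm{Var}(d_{M_1})$ | 0.98 | 0.87 | 0.74 | 0.98 | 0.87 | 0.74 |
| | $\mathrm{Var}(d_{M_2})$ | 0.98 | 0.80 | 0.47 | 0.98 | 0.80 | 0.47 |
| | $\mathrm{Var}(d_Y)$ | 0.98 | 0.80 | 0.54 | 0.98 | 0.85 | 0.65 |
| Model 4 | $a_1, a_2$ | 0.10 | 0.27 | 0.42 | 0.10 | 0.27 | 0.42 |
| | $b$ | 0.13 | 0.33 | 0.56 | 0.14 | 0.36 | 0.51 |
| | $c_1', c_2'$ | 0.04 | 0.14 | 0.14 | 0.00 | 0.00 | 0.00 |
| | $\mathrm{Var}(d_M)$ | 0.97 | 0.81 | 0.55 | 0.97 | 0.81 | 0.55 |
| | $\mathrm{Var}(d_Y)$ | 0.98 | 0.79 | 0.51 | 0.98 | 0.87 | 0.74 |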

For the three complex mediation models (Models 2 to 4), Cohen’s (1988) $R^2$ criteria were again used to define the three effect size conditions. In Models 2 and 3, the two mediators were included as endogenous variables, and the residual covariance between $d_{M_1}$ and $d_{M_2}$ was fixed to 0 for simplicity. For Model 4, the covariance between $X_1$ and $X_2$ was set to 0.3 regardless of effect size, following the factor correlation used in previous studies (Gagné & Hancock, 2006; Marsh et al., 1998; Wolf et al., 2013). The model-specific path coefficients were selected as follows. In Model 2, the explained variance in $M_1$ and $M_2$ was determined by $X$ only, so the corresponding path coefficients $a_1$ and $a_2$ were set to the same values as $a$ in the simple mediation model for each effect size condition. The explained variance in $Y$ was determined by $X$, $M_1$, and $M_2$, through the path coefficients $c'$, $b_1$, and $b_2$, respectively. For simplicity, $M_1$ and $M_2$ were assumed to contribute equally, and $X$ was assumed to have a small effect on $Y$ (i.e., 2% explained variance, and thus $c' = 0.14$ as in Model 1) under the medium and large effect size conditions. Under the small effect size (2%) condition, assigning the entire 2% to $X$ would have made the other two path coefficients zero ($b_1 = b_2 = 0$). Therefore, only 0.2% was arbitrarily assigned to $X$, yielding $c' = 0.04$ ($\sqrt{0.002} \approx 0.04$). The complete mediation condition ($c' = 0$) was also simulated. In Model 3, the explained variance in $M_2$ was determined by $X$ and $M_1$, which were assumed to contribute equally (i.e., $a_2 = d$); $c'$, $b_1$, and $b_2$ were set in the same way as in Model 2. Finally, Model 4 involves $X_1$ and $X_2$, which were assumed to have the same effect on $M$ (i.e., $a_1 = a_2$) and on $Y$ (i.e., $c_1' = c_2'$); $c_1'$ and $c_2'$ were determined in the same way as $c'$ in Model 2. In each complex mediation model, the residual variances were calculated from the selected path coefficients.
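The residual variances in Table 2 follow from the constraint that every standardized endogenous variable has unit variance. A minimal sketch, assuming the usual path-analytic identity $\operatorname{Var}(d) = 1 - \beta' R \beta$, where $\beta$ holds the standardized paths into the variable and $R$ is the correlation matrix of its predictors (for $d_Y$, $R$ would be the model-implied correlation matrix of $X$ and the mediators). The function name `residual_variance` is ours. For the medium condition it gives 0.80 for $\operatorname{Var}(d_{M_2})$ in Model 3 and 0.81 for $\operatorname{Var}(d_M)$ in Model 4, in line with Table 2; a few other entries may differ from this calculation in the second decimal.

```python
def residual_variance(beta, corr):
    """Disturbance variance of a standardized endogenous variable.

    Assumes the variable has unit variance, so Var(d) = 1 - beta' R beta,
    where beta holds the standardized paths into the variable and R is the
    correlation matrix of its predictors.
    """
    k = len(beta)
    explained = sum(beta[i] * corr[i][j] * beta[j] for i in range(k) for j in range(k))
    return 1.0 - explained


# Model 3, medium effect: M2 = a2*X + d*M1, where Corr(X, M1) = a1 = 0.36.
var_d_m2 = residual_variance([0.27, 0.27], [[1.0, 0.36], [0.36, 1.0]])

# Model 4, medium effect: M = a1*X1 + a2*X2, where Cov(X1, X2) = 0.3.
var_d_m_model4 = residual_variance([0.27, 0.27], [[1.0, 0.3], [0.3, 1.0]])

print(round(var_d_m2, 2), round(var_d_m_model4, 2))  # → 0.8 0.81
```

The same function covers Model 1: `residual_variance([0.36], [[1.0]])` gives $\operatorname{Var}(d_M) = 0.8704$.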

As mentioned earlier, two modeling approaches (SEMs and path models) were considered in this study, with the same effect size conditions applied to both. Unlike path models, SEMs measure the latent constructs with indicators, allowing for measurement error. Hence, two additional simulation factors related to indicators were included for SEMs: the number of indicators and the magnitude of factor loadings. Following the two-indicator rule (Bollen, 1989), the minimum number of indicators was set to 2, and the maximum was set to 4, yielding three conditions (two, three, and four indicators). For the factor loading ($\lambda$), 0.4 was chosen as the minimum value suggested in the literature (e.g., Ford et al., 1986; Wang & Wang, 2012), and 0.7 was selected to ensure sufficient convergent validity (Kline, 2016). The two intermediate values (0.5 and 0.6) were also included, yielding four factor loading conditions. Within each factor loading condition, all $\lambda$s were set equal. Because all variables, including indicators, were assumed to have a variance of 1, the measurement error variance of each indicator was $1 - \lambda^2$.
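The measurement part described above can be simulated in a few lines. The sketch below is a hypothetical data-generation step using only Python's standard library (not the authors' Mplus setup): each indicator is $\lambda\eta + e$ with $\operatorname{Var}(e) = 1 - \lambda^2$, so indicators have unit variance when $\eta$ does, and any two indicators of the same factor correlate at $\lambda^2$. The latent scores are drawn as independent standard normals purely for illustration; in the actual design they would come from the structural model. The names `simulate_indicators` and `sample_corr` are ours.

```python
import math
import random


def simulate_indicators(latent_scores, loading, n_indicators, rng):
    """Return one row per case: x_j = loading * eta + e_j, j = 1..n_indicators.

    Each error e_j is drawn from N(0, 1 - loading**2), so every indicator has
    unit variance when the latent variable eta has unit variance.
    """
    error_sd = math.sqrt(1.0 - loading ** 2)
    return [
        [loading * eta + rng.gauss(0.0, error_sd) for _ in range(n_indicators)]
        for eta in latent_scores
    ]


def sample_corr(u, v):
    """Pearson correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)


rng = random.Random(2021)
eta = [rng.gauss(0.0, 1.0) for _ in range(20_000)]  # illustrative latent scores
rows = simulate_indicators(eta, loading=0.7, n_indicators=3, rng=rng)
x1 = [row[0] for row in rows]
x2 = [row[1] for row in rows]
# Model-implied correlation between two indicators is 0.7**2 = 0.49.
print(round(sample_corr(x1, x2), 3))
```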

In sum, a total of 3 × 2 × 4 × 2 = 48 conditions (three levels of indirect effect size, two levels of direct effect, four mediation models, and two testing methods) were considered for path models, and a total of 3 × 2 × 4 × 3 × 4 × 2 = 576 conditions (three levels of indirect effect size, two levels of direct effect, four mediation models, three numbers of indicators, four magnitudes of factor loadings, and two testing methods) were manipulated for SEMs. In addition, the impact of missing data on sample size requirements was examined with five proportions of missing data: 0%, 5%, 10%, 15%, and 20%. This range was chosen to resemble the values used in Wolf et al. (2013). Missing data were generated to be missing completely at random (Little & Rubin, 1989). The missing data conditions were not fully crossed with the other conditions; they were examined only for the four SEMs with partial mediation, under a fixed condition of three indicators, factor loadings of 0.7, and a medium effect size, using both the delta and bootstrap methods.
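The factorial design above can be enumerated directly, which gives a quick check on the condition counts. This is an illustrative sketch: the labels are ours, and the missing-data list reflects our reading of the paragraph (partial-mediation SEMs only, three indicators, loadings of 0.7, medium effect size, both testing methods).

```python
from itertools import product

effect_sizes = ["small", "medium", "large"]  # indirect effect size
direct_effect = ["partial", "complete"]      # c' > 0 vs. c' = 0
models = ["Model 1", "Model 2", "Model 3", "Model 4"]
methods = ["delta", "bootstrap"]
n_indicators = [2, 3, 4]                     # SEMs only
loadings = [0.4, 0.5, 0.6, 0.7]              # SEMs only

path_conditions = list(product(effect_sizes, direct_effect, models, methods))
sem_conditions = list(
    product(effect_sizes, direct_effect, models, n_indicators, loadings, methods)
)

# Missing-data study (not fully crossed): partial-mediation SEMs with three
# indicators, loadings of 0.7, and a medium effect size, under both methods.
missing_props = [0.00, 0.05, 0.10, 0.15, 0.20]
missing_conditions = list(
    product(["medium"], ["partial"], models, [3], [0.7], methods, missing_props)
)

print(len(path_conditions), len(sem_conditions), len(missing_conditions))  # → 48 576 40
```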

One thousand replications were generated for each condition. For the bootstrap method, 1,000 bootstrap samples were drawn, following previous studies (Fritz et al., 2012; Kim, 2012; Shrout & Bolger, 2002; Tofighi & Kelley, 2020). To determine the sample size necessary for each condition, sample sizes were increased from 10 in increments of 10 until the minimum required sample size was reached. The search was capped at 10,000 in consideration of the potential difficulty of sampling in real applications.
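The sample size search described above amounts to a linear scan over $n = 10, 20, \ldots$ up to the cap. A minimal sketch, assuming a hypothetical callback `meets_criteria(n)` that would run the 1,000 replications at sample size $n$ and check the three criteria; here it is replaced by toy functions.

```python
def minimum_required_n(meets_criteria, start=10, step=10, cap=10_000):
    """Scan n = start, start + step, ... up to cap; return the first n that passes.

    `meets_criteria` is a hypothetical callback: given n, it would simulate the
    replications for one condition and report whether the bias, coverage, and
    power criteria all hold. Returns None if no n up to the cap qualifies
    (reported as "> 10,000" in the results tables).
    """
    for n in range(start, cap + 1, step):
        if meets_criteria(n):
            return n
    return None


# Toy stand-ins for the simulation-based check.
print(minimum_required_n(lambda n: n >= 253))  # → 260
print(minimum_required_n(lambda n: False))     # → None
```

A linear scan mirrors the procedure as described. A bisection search would need fewer evaluations, but it assumes the criteria are monotone in $n$, which simulated power and coverage need not be because of Monte Carlo error.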

Data Analysis

All data generation and Monte Carlo simulations were carried out using Mplus 8.3 (L. K. Muthén & Muthén, 1998-2020) and Python (Van Rossum & Drake, 1995). Two methods were used to test mediation effects: the percentile bootstrap method and the multivariate delta method. The delta method was chosen because it has been used frequently in recent substantive research (e.g., Heatly & Votruba-Drzal, 2017; Wade et al., 2018; Wentzel et al., 2018) and is implemented in Mplus by default. The percentile bootstrap method was also considered because it has been shown to be more powerful than the delta method (i.e., the Sobel test) and to have less inflated Type I error and better coverage than the bias-corrected bootstrap test (Hayes & Scharkow, 2013). The percentile bootstrap also tended to show lower Type I error rates and higher coverage rates than the bias-corrected bootstrap test and the bias-corrected and accelerated bootstrap test (Falk & Biesanz, 2015). Hence, both the percentile bootstrap method and the delta method were used to test the significance of the mediation effect in each generated data set.
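To make the two tests concrete, the sketch below applies both to a single-mediator path model fitted by ordinary least squares in plain Python. It is an illustration only, not the Mplus implementation used in the study, and the function names are hypothetical. For this model the estimates of $a$ and $b$ are asymptotically uncorrelated, so the multivariate delta-method standard error of $ab$ reduces to Sobel's first-order formula $\sqrt{a^2 s_b^2 + b^2 s_a^2}$. The percentile bootstrap resamples cases with replacement and takes the 2.5th and 97.5th percentiles of the resampled $ab$ values. The data-generating values are the large, partial mediation condition of Model 1 in Table 2.

```python
import math
import random
from statistics import NormalDist, quantiles


def _center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def mediation_paths(x, m, y):
    """OLS estimates for M = aX + e1 and Y = c'X + bM + e2 (intercepts included).

    Returns (a, se_a, b, se_b).
    """
    n = len(x)
    xc, mc, yc = _center(x), _center(m), _center(y)
    sxx = sum(v * v for v in xc)
    smm = sum(v * v for v in mc)
    sxm = sum(p * q for p, q in zip(xc, mc))
    sxy = sum(p * q for p, q in zip(xc, yc))
    smy = sum(p * q for p, q in zip(mc, yc))

    # a path: simple regression of M on X.
    a = sxm / sxx
    s2_m = sum((mi - a * xi) ** 2 for xi, mi in zip(xc, mc)) / (n - 2)
    se_a = math.sqrt(s2_m / sxx)

    # c' and b: regression of Y on X and M via the 2 x 2 normal equations.
    det = sxx * smm - sxm ** 2
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    s2_y = sum((yi - c_prime * xi - b * mi) ** 2 for xi, mi, yi in zip(xc, mc, yc)) / (n - 3)
    se_b = math.sqrt(s2_y * sxx / det)
    return a, se_a, b, se_b


def delta_test(x, m, y):
    """First-order delta-method (Sobel) z test of ab. Returns (ab, z, two-sided p)."""
    a, se_a, b, se_b = mediation_paths(x, m, y)
    se_ab = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    z = a * b / se_ab
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return a * b, z, p


def percentile_bootstrap_ci(x, m, y, n_boot=1000, seed=0):
    """95% percentile bootstrap CI for ab, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        a, _, b, _ = mediation_paths([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        estimates.append(a * b)
    cuts = quantiles(estimates, n=40)  # 39 cut points: 2.5th, 5th, ..., 97.5th percentiles
    return cuts[0], cuts[-1]


# Large, partial mediation condition of Model 1 (Table 2): a = b = 0.51, c' = 0.14.
rng = random.Random(2021)
n = 300
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
m = [0.51 * xi + rng.gauss(0.0, math.sqrt(0.74)) for xi in x]
y = [0.14 * xi + 0.51 * mi + rng.gauss(0.0, math.sqrt(0.65)) for xi, mi in zip(x, m)]

ab_hat, z, p_delta = delta_test(x, m, y)
ci_low, ci_high = percentile_bootstrap_ci(x, m, y)
print(f"ab = {ab_hat:.3f}, delta z = {z:.2f}, p = {p_delta:.4f}")
print(f"95% percentile bootstrap CI: ({ci_low:.3f}, {ci_high:.3f})")
```

A replication counts as significant when the delta-method p value is below .05 or, for the bootstrap, when the interval excludes zero.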

Following the sample size study of L. K. Muthén and Muthén (2002), three criteria were used to determine the minimum required sample sizes in the present study: parameter bias, 95% coverage, and power. Parameter bias is the difference between the average of the parameter estimates over the replications of a Monte Carlo study and the population parameter value, expressed relative to the population value. Ninety-five percent coverage is the proportion of replications for which the 95% confidence interval contains the true parameter value. Power is the proportion of replications in which the null hypothesis is correctly rejected for a nonzero parameter at an α level of .05. The first criterion is that the parameter bias does not exceed 10% for any parameter in the model whose population value is nonzero. The second criterion, applied to all parameters in the model, is that 95% coverage remains between .91 and .98. Other ranges have also been proposed, such as .925 to .975 (Algina et al., 2005), but the present study followed L. K. Muthén and Muthén (2002). The third criterion is that the sample size maintains power close to or above .8 (Cohen, 1988; Fritz & MacKinnon, 2007) at an α level of .05. Power is evaluated only for the parameters of specific interest, namely, the mediation effects in this study.
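The three criteria can be computed per parameter from the replication output. The sketch below is a hypothetical helper, not the authors' code: it returns relative bias (skipped when the true value is zero), coverage, and power, and then applies the thresholds above. A condition would pass only if every parameter passes. We operationalize "power close to .8 or greater" as power of at least .80, which is an assumption.

```python
from statistics import mean


def evaluate_parameter(true_value, estimates, ci_lower, ci_upper, p_values, alpha=0.05):
    """Summarize Monte Carlo replications for one parameter.

    Returns (relative_bias, coverage, power):
      relative_bias: (mean estimate - true value) / true value, or None if the true value is 0
      coverage:      share of 95% confidence intervals that contain the true value
      power:         share of replications whose p value is below alpha
    """
    rel_bias = None if true_value == 0 else (mean(estimates) - true_value) / true_value
    coverage = mean(1.0 if lo <= true_value <= hi else 0.0 for lo, hi in zip(ci_lower, ci_upper))
    power = mean(1.0 if p < alpha else 0.0 for p in p_values)
    return rel_bias, coverage, power


def passes(rel_bias, coverage, power, is_mediation_effect):
    """Apply the three criteria to one parameter (power only for mediation effects)."""
    bias_ok = rel_bias is None or abs(rel_bias) <= 0.10
    coverage_ok = 0.91 <= coverage <= 0.98
    # Assumption: "power close to .8 or greater" is operationalized as power >= .80.
    power_ok = (not is_mediation_effect) or power >= 0.80
    return bias_ok and coverage_ok and power_ok


# Toy example with four replications of an indirect effect whose true value is 0.13.
rb, cov, pw = evaluate_parameter(
    0.13,
    estimates=[0.12, 0.14, 0.13, 0.15],
    ci_lower=[0.05, 0.07, 0.06, 0.14],
    ci_upper=[0.20, 0.22, 0.21, 0.25],
    p_values=[0.01, 0.002, 0.03, 0.20],
)
print(round(rb, 3), cov, pw, passes(rb, cov, pw, is_mediation_effect=True))  # → 0.038 0.75 0.75 False
```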

Results

For Models 1 to 4, the minimum required sample sizes that met the three criteria (parameter bias, 95% coverage, and power) were obtained under each simulation condition. Before the simulation results are summarized, the overall patterns regarding the three criteria are briefly described. In path models, once the power criterion was met, the other two criteria were already satisfied in most conditions; in other words, achieving power of .8 for detecting mediation effects was the hardest criterion to meet. The same pattern was observed for SEMs. In addition, large parameter biases frequently occurred in the two-indicator conditions under Model 1 and in the large indirect effect size conditions under Models 1, 2, and 4, especially with the bootstrap method. Finally, nonconvergence rates were also checked: they ranged from 0% to 5% in most conditions, which does not appear serious in light of previous simulation research (e.g., Gagné & Hancock, 2006; Jia et al., 2014; Moineddin et al., 2007). Nonconvergence occurred in the two-indicator conditions, mostly under Models 1 and 4 and in some complete mediation conditions.

Path Models

The minimum required sample sizes for the path mediation models are provided in Table 3. They ranged from 50 to 1,610 for the delta method and from 40 to 1,210 for the bootstrap method. The largest sample sizes for both methods were obtained in the partial mediation condition of Model 3 with the small effect size. Overall, Model 3 required the largest sample size on average ($\bar{n} = 531.7$), larger than the average required sample sizes for Model 2 ($\bar{n} = 392.5$) and Model 4 ($\bar{n} = 385.8$). Model 1, the simplest of the four models, required the smallest sample size on average ($\bar{n} = 253.3$). This ordering is consistent with the well-known expectation that the sample size required for accurate and stable estimation increases with model complexity: Model 3 can be considered the most complex model because it has a multiple-step mediation effect in addition to two single-step mediation effects, whereas Models 2 and 4 each include only two single-step mediation effects. The partial mediation conditions required larger sample sizes on average than the complete mediation conditions. This was probably because one or two fewer parameters were estimated in the complete mediation conditions owing to the constraint $c' = 0$ (one parameter in Models 1 to 3, and two, $c_1'$ and $c_2'$, in Model 4). The differences in required sample size between the partial and complete mediation conditions ranged from −20 to 170 across conditions, with an average of 33. Larger differences tended to occur as the effect size became smaller.

Table 3.

Minimum Required Sample Sizes for Path Models.

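| Method | Effect size | Partial: Model 1 | Partial: Model 2 | Partial: Model 3 | Partial: Model 4 | Complete: Model 1 | Complete: Model 2 | Complete: Model 3 | Complete: Model 4 |
|---|---|---|---|---|---|---|---|---|---|
| Delta | Small | 680 | 1,170 | 1,610 | 1,090 | 670 | 1,000 | 1,440 | 1,040 |
| | Medium | 100 | 150 | 220 | 150 | 100 | 130 | 200 | 140 |
| | Large | 50 | 60 | 90 | 50 | 50 | 50 | 80 | 50 |
| | Average | 277 | 460 | 640 | 430 | 273 | 393 | 573 | 410 |
| Bootstrap | Small | 560 | 980 | 1,210 | 920 | 560 | 830 | 1,090 | 880 |
| | Medium | 80 | 130 | 160 | 120 | 90 | 110 | 150 | 110 |
| | Large | 40 | 50 | 70 | 40 | 60 | 50 | 60 | 40 |
| | Average | 227 | 387 | 480 | 360 | 237 | 330 | 433 | 343 |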

Note. Effect size: The effect size of the indirect effect(s).
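The path-model averages reported above can be recomputed from Table 3 as a consistency check. This small sketch simply re-enters the table; it reproduces the per-model means and the range and mean of the partial-minus-complete differences.

```python
# Table 3 (path models): (method, mediation) -> {model: [small, medium, large]}
table3 = {
    ("delta", "partial"): {1: [680, 100, 50], 2: [1170, 150, 60], 3: [1610, 220, 90], 4: [1090, 150, 50]},
    ("delta", "complete"): {1: [670, 100, 50], 2: [1000, 130, 50], 3: [1440, 200, 80], 4: [1040, 140, 50]},
    ("bootstrap", "partial"): {1: [560, 80, 40], 2: [980, 130, 50], 3: [1210, 160, 70], 4: [920, 120, 40]},
    ("bootstrap", "complete"): {1: [560, 90, 60], 2: [830, 110, 50], 3: [1090, 150, 60], 4: [880, 110, 40]},
}

# Average required n per model over 2 methods x 2 mediation types x 3 effect sizes.
model_means = {
    model: sum(sum(cell[model]) for cell in table3.values()) / 12
    for model in (1, 2, 3, 4)
}

# Partial-minus-complete differences, cell by cell.
diffs = [
    p - c
    for method in ("delta", "bootstrap")
    for model in (1, 2, 3, 4)
    for p, c in zip(table3[(method, "partial")][model], table3[(method, "complete")][model])
]

print({k: round(v, 1) for k, v in model_means.items()})  # → {1: 253.3, 2: 392.5, 3: 531.7, 4: 385.8}
print(min(diffs), max(diffs), round(sum(diffs) / len(diffs), 1))  # → -20 170 33.3
```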

Regardless of the model, the required sample size increased as the indirect effect size decreased. In particular, the increase from the medium to the small condition was far more dramatic than that from the large to the medium condition: required sample sizes grew by a factor of more than six (up to eight) from the medium to the small condition, but only by a factor of about two to three from the large to the medium condition.

Last, the bootstrap method required smaller sample sizes than the delta method in most conditions. In particular, as the indirect effect size decreased and the model complexity increased, the difference in the required sample sizes between the two methods became larger. For example, the largest difference was observed in Model 3 with the small effect size and the partial mediation condition. In this case, the bootstrap method required 1,210, while the delta method needed 1,610. Overall, the differences in the required sample sizes between the two methods ranged from 110 to 400 under the small effect size conditions, while they ranged from 10 to 60 under the medium effect size conditions. More comparisons between the two methods are provided later in Table 6 .
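The method comparison can likewise be recomputed from Table 3. The sketch below reproduces the delta-minus-bootstrap ranges quoted above and the path-model row of Table 6 (the ratio of the bootstrap to the delta sample size, times 100, averaged over the six cells of each model). The SEM row of Table 6 cannot be checked here because it depends on Tables 4 and 5.

```python
# Table 3 (path models), keyed by (mediation, model): [small, medium, large]
delta = {
    ("partial", 1): [680, 100, 50], ("partial", 2): [1170, 150, 60],
    ("partial", 3): [1610, 220, 90], ("partial", 4): [1090, 150, 50],
    ("complete", 1): [670, 100, 50], ("complete", 2): [1000, 130, 50],
    ("complete", 3): [1440, 200, 80], ("complete", 4): [1040, 140, 50],
}
bootstrap = {
    ("partial", 1): [560, 80, 40], ("partial", 2): [980, 130, 50],
    ("partial", 3): [1210, 160, 70], ("partial", 4): [920, 120, 40],
    ("complete", 1): [560, 90, 60], ("complete", 2): [830, 110, 50],
    ("complete", 3): [1090, 150, 60], ("complete", 4): [880, 110, 40],
}

# Delta-minus-bootstrap differences for the small (index 0) and medium (index 1) effects.
small_diffs = [delta[k][0] - bootstrap[k][0] for k in delta]
medium_diffs = [delta[k][1] - bootstrap[k][1] for k in delta]


def ratio_pct(model):
    """Mean of bootstrap/delta ratios (x 100) over 2 mediation types x 3 effect sizes."""
    ratios = [
        b / d
        for med in ("partial", "complete")
        for b, d in zip(bootstrap[(med, model)], delta[(med, model)])
    ]
    return 100 * sum(ratios) / len(ratios)


print(min(small_diffs), max(small_diffs), min(medium_diffs), max(medium_diffs))  # → 110 400 10 60
print([round(ratio_pct(m), 1) for m in (1, 2, 3, 4)])  # → [89.3, 86.9, 75.2, 81.3]
```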

Table 6.

Averaged Percentages Based on the Ratios of the Required Sample Sizes in the Bootstrap Method to Those in the Delta Method.

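| Approach | Model 1 | Model 2 | Model 3 | Model 4 | Average |
|---|---|---|---|---|---|
| Path model | 89.3% | 86.9% | 75.2% | 81.3% | 83.2% |
| SEM | 90.9% | 92.9% | 89.2% | 93.1% | 91.5% |
| Average | 90.1% | 89.9% | 82.2% | 87.2% | 87.4% |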

Note. The numbers were calculated as the ratios of the required sample sizes in the bootstrap method to those in the delta method multiplied by 100%, which were then averaged across conditions. SEM = structural equation model.

Structural Equation Models

The minimum required sample sizes of the structural equation mediation models with partial and complete mediations are provided in Table 4 and Table 5 , respectively. The smallest sample size ( n = 70 ) was required for the complete mediation condition of Model 1 with the large indirect effect size, four indicators, and 0.7 loadings in the bootstrap method. As indicated earlier, the required sample sizes were capped at 10,000, and some conditions required sample sizes over 10,000 ( n > 10 , 000 ) for accurate estimation. Those conditions were all associated with the small effect size and mostly with the conditions of two indicators and 0.4 loadings. Like the results of the path models, Model 3 required larger sample sizes on average ( n ¯ = 2 , 572 ) than Model 4 ( n ¯ = 1 , 996 ), which required larger sample sizes than Model 2 ( n ¯ = 1 , 877 ). Model 1 showed the smallest required sample sizes ( n ¯ = 1 , 273 ) as expected. This pattern is clearly shown in Figure 2 , which displays the required sample sizes across the models. These were separately averaged for each model with each testing method in the partial and complete mediation conditions. The partial mediation conditions needed larger average sample sizes than the complete mediation conditions. In particular, Models 2 to 4 with the partial mediation required much larger sample sizes (about 340 more) than those with the complete mediation. However, for Model 1, the partial mediation condition required just a little larger sample sizes (about 30 more) than the complete mediation condition on average. In fact, some complete mediation conditions in Model 1 based on Tables 4 and ​ and5 5 required larger sample sizes than the partial mediation conditions. All the small effect size conditions in the delta method and some two-indicator conditions in the bootstrap method showed such a pattern. 
This was attributed to the fact that parameter biases were more serious in the two-indicator and/or small effect size conditions with complete mediation than in those with partial mediation in Model 1. Therefore, caution is necessary when the complete mediation version of Model 1 is used with two indicators and/or small effect sizes. Figure 2 also shows that, on average, the delta method required larger sample sizes than the bootstrap method across all models. Some conditions (19 out of 288) required smaller sample sizes in the delta method than in the bootstrap method (see Tables 4 and 5); however, no systematic pattern was observed among those 19 conditions. As with the path models, the largest difference between the two methods was found in Model 3, where the delta method required, on average, about 330 more observations than the bootstrap method. The differences between the two methods were 143 for Model 4 and 181 for Model 2. The smallest difference, about 115, was observed in Model 1.

Table 4.

Minimum Required Sample Sizes for Structural Equation Models with Partial Mediation.

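Column labels give the model and the number of indicators (e.g., M1: 2 = Model 1 with two indicators); Loading = standardized factor loading.

| Method | Effect size | Loading | M1: 2 | M1: 3 | M1: 4 | M2: 2 | M2: 3 | M2: 4 | M3: 2 | M3: 3 | M3: 4 | M4: 2 | M4: 3 | M4: 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Delta | Small | 0.4 | 9,410 | 4,890 | 3,330 | >10,000 | 8,950 | 6,500 | >10,000 | >10,000 | 8,420 | >10,000 | 9,350 | 6,390 |
| | | 0.5 | 4,440 | 2,590 | 1,940 | 8,010 | 4,720 | 3,670 | >10,000 | 6,560 | 4,800 | 7,840 | 4,890 | 3,590 |
| | | 0.6 | 2,480 | 1,650 | 1,340 | 4,470 | 2,900 | 2,460 | 5,820 | 4,140 | 3,280 | 4,480 | 3,000 | 2,390 |
| | | 0.7 | 1,590 | 1,180 | 1,020 | 2,870 | 2,130 | 1,850 | 3,830 | 2,950 | 2,480 | 2,830 | 2,080 | 1,810 |
| | Medium | 0.4 | 1,940 | 930 | 650 | 2,830 | 1,420 | 1,030 | 4,340 | 2,300 | 1,520 | 3,150 | 1,600 | 1,100 |
| | | 0.5 | 850 | 480 | 360 | 1,260 | 740 | 570 | 1,950 | 1,110 | 820 | 1,370 | 770 | 570 |
| | | 0.6 | 470 | 290 | 240 | 680 | 450 | 380 | 1,000 | 650 | 510 | 730 | 470 | 360 |
| | | 0.7 | 290 | 190 | 170 | 420 | 300 | 280 | 610 | 450 | 380 | 430 | 320 | 260 |
| | Large | 0.4 | 1,950 | 880 | 590 | 2,790 | 1,330 | 700 | 5,440 | 2,450 | 1,520 | 4,040 | 1,820 | 1,200 |
| | | 0.5 | 710 | 360 | 240 | 1,240 | 510 | 330 | 2,110 | 1,010 | 660 | 1,350 | 730 | 460 |
| | | 0.6 | 330 | 180 | 140 | 480 | 260 | 180 | 920 | 510 | 350 | 550 | 370 | 200 |
| | | 0.7 | 170 | 120 | 100 | 210 | 160 | 120 | 440 | 270 | 220 | 250 | 190 | 120 |
| Bootstrap | Small | 0.4 | 8,870 | 4,420 | 2,920 | >10,000 | 8,100 | 5,860 | >10,000 | 9,670 | 6,590 | >10,000 | 8,270 | 5,360 |
| | | 0.5 | 4,220 | 2,290 | 1,660 | 8,000 | 4,160 | 3,190 | 9,240 | 5,150 | 3,770 | 7,190 | 4,170 | 3,050 |
| | | 0.6 | 2,410 | 1,410 | 1,140 | 4,470 | 2,550 | 2,230 | 5,190 | 3,160 | 2,510 | 4,060 | 2,560 | 2,040 |
| | | 0.7 | 1,590 | 960 | 870 | 3,010 | 1,840 | 1,670 | 3,430 | 2,240 | 1,910 | 2,630 | 1,800 | 1,520 |
| | Medium | 0.4 | 1,650 | 870 | 620 | 2,920 | 1,490 | 1,050 | 4,010 | 2,090 | 1,320 | 2,810 | 1,480 | 1,010 |
| | | 0.5 | 760 | 450 | 330 | 1,280 | 720 | 550 | 1,730 | 980 | 710 | 1,270 | 700 | 520 |
| | | 0.6 | 420 | 260 | 210 | 700 | 410 | 340 | 900 | 570 | 430 | 700 | 430 | 320 |
| | | 0.7 | 270 | 180 | 150 | 420 | 280 | 250 | 550 | 370 | 310 | 410 | 280 | 220 |
| | Large | 0.4 | 1,730 | 880 | 590 | 2,510 | 1,140 | 650 | 5,200 | 2,440 | 1,450 | 3,990 | 1,810 | 1,200 |
| | | 0.5 | 600 | 340 | 230 | 900 | 450 | 290 | 2,010 | 990 | 650 | 1,330 | 700 | 460 |
| | | 0.6 | 290 | 180 | 140 | 350 | 260 | 170 | 880 | 480 | 340 | 510 | 370 | 200 |
| | | 0.7 | 160 | 110 | 90 | 200 | 160 | 110 | 420 | 250 | 200 | 230 | 190 | 110 |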

Note. Effect size: The effect size of the indirect effect(s).

Table 5.

Minimum Required Sample Sizes for Structural Equation Models with Complete Mediation.

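Column labels give the model and the number of indicators (e.g., M1: 2 = Model 1 with two indicators); Loading = standardized factor loading.

| Method | Effect size | Loading | M1: 2 | M1: 3 | M1: 4 | M2: 2 | M2: 3 | M2: 4 | M3: 2 | M3: 3 | M3: 4 | M4: 2 | M4: 3 | M4: 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Delta | Small | 0.4 | >10,000 | 5,010 | 3,450 | >10,000 | 7,580 | 5,340 | >10,000 | >10,000 | 7,450 | >10,000 | 8,930 | 6,000 |
| | | 0.5 | 5,060 | 2,670 | 1,990 | 6,850 | 3,890 | 3,070 | 9,510 | 5,830 | 4,320 | 7,480 | 4,580 | 3,440 |
| | | 0.6 | 2,840 | 1,710 | 1,350 | 3,820 | 2,450 | 2,090 | 5,310 | 3,730 | 2,930 | 4,230 | 2,840 | 2,280 |
| | | 0.7 | 1,770 | 1,200 | 1,060 | 2,480 | 1,780 | 1,600 | 3,460 | 2,630 | 2,220 | 2,670 | 2,000 | 1,740 |
| | Medium | 0.4 | 1,650 | 780 | 530 | 2,100 | 1,090 | 790 | 3,950 | 2,050 | 1,400 | 2,390 | 1,230 | 870 |
| | | 0.5 | 780 | 410 | 320 | 940 | 570 | 440 | 1,730 | 990 | 720 | 1,090 | 630 | 480 |
| | | 0.6 | 430 | 250 | 210 | 550 | 350 | 290 | 910 | 580 | 440 | 600 | 400 | 310 |
| | | 0.7 | 270 | 180 | 160 | 340 | 250 | 230 | 540 | 400 | 330 | 370 | 270 | 230 |
| | Large | 0.4 | 930 | 430 | 290 | 1,240 | 600 | 400 | 3,930 | 1,800 | 1,130 | 1,190 | 580 | 390 |
| | | 0.5 | 420 | 220 | 160 | 530 | 290 | 210 | 1,530 | 760 | 520 | 510 | 280 | 200 |
| | | 0.6 | 220 | 130 | 110 | 270 | 170 | 130 | 700 | 380 | 290 | 290 | 170 | 130 |
| | | 0.7 | 160 | 90 | 80 | 160 | 110 | 100 | 370 | 230 | 180 | 160 | 110 | 90 |
| Bootstrap | Small | 0.4 | >10,000 | 4,140 | 2,830 | >10,000 | 6,430 | 4,450 | >10,000 | 8,580 | 5,840 | >10,000 | 8,020 | 5,320 |
| | | 0.5 | 4,960 | 2,240 | 1,650 | 6,300 | 3,360 | 2,640 | 8,150 | 4,480 | 3,280 | 6,730 | 4,000 | 2,940 |
| | | 0.6 | 3,070 | 1,390 | 1,120 | 3,630 | 2,120 | 1,770 | 4,660 | 2,810 | 2,230 | 3,800 | 2,470 | 2,000 |
| | | 0.7 | 1,880 | 960 | 850 | 2,400 | 1,520 | 1,350 | 3,040 | 2,010 | 1,680 | 2,370 | 1,730 | 1,480 |
| | Medium | 0.4 | 1,300 | 700 | 510 | 1,870 | 1,020 | 720 | 3,450 | 1,780 | 1,200 | 2,240 | 1,190 | 830 |
| | | 0.5 | 730 | 360 | 280 | 880 | 530 | 410 | 1,500 | 850 | 600 | 1,030 | 600 | 450 |
| | | 0.6 | 450 | 230 | 190 | 520 | 320 | 280 | 800 | 500 | 390 | 580 | 370 | 280 |
| | | 0.7 | 260 | 160 | 140 | 330 | 220 | 200 | 490 | 330 | 270 | 360 | 240 | 200 |
| | Large | 0.4 | 820 | 390 | 270 | 1,220 | 570 | 400 | 4,190 | 1,910 | 1,140 | 1,190 | 590 | 400 |
| | | 0.5 | 330 | 190 | 150 | 460 | 270 | 210 | 1,620 | 760 | 510 | 480 | 320 | 210 |
| | | 0.6 | 230 | 120 | 100 | 250 | 160 | 130 | 710 | 380 | 280 | 250 | 170 | 130 |
| | | 0.7 | 130 | 90 | 70 | 150 | 110 | 100 | 360 | 220 | 170 | 150 | 110 | 90 |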

Note. Effect size: The effect size of the indirect effect(s).

Figure 2. Average required sample sizes for structural equation models.

Based on the results in Tables 4 and 5, the influences of three simulation factors (i.e., the level of indirect effect sizes, the number of indicators, and the magnitude of factor loadings) are illustrated in Figures 3a through 3c. Figure 3a shows the average minimum required sample sizes for Models 1 to 4 across the three indirect effect sizes in the delta method (left) and the bootstrap method (right). The partial and complete mediation conditions were averaged in each figure. In general, the required sample sizes increased as the indirect effect size decreased. This tendency was dramatic between the small and medium conditions: the required sample sizes in the small effect size conditions were, on average, about 5.5 times larger than those in the medium effect size conditions. By contrast, the required sample sizes increased only about 1.3-fold on average when the effect size changed from large to medium. There were, however, some unexpected deviations between the large and medium conditions; that is, for some simulated conditions, larger sample sizes were required with the large effect size than with the medium effect size. This pattern was observed only in the partial mediation conditions. An examination of Table 4 reveals that this deviation (13 out of 288 conditions) occurred in Models 1, 3, and 4 when the number of indicators was small (i.e., two indicators) and/or the factor loadings were small (i.e., 0.4). The pattern disappeared as the number of indicators and/or the size of the factor loadings increased. Figure 3b displays the effect of the number of indicators on the minimum required sample sizes for each model. Because the number of indicators is relatively easy for researchers to control in practice, it is valuable to examine its effect in sample size studies.
The results show that the required sample sizes decreased as the number of indicators increased across all conditions. This indicates that more indicators (i.e., more parameters) did not necessarily make the model more demanding in terms of sample size. As shown in Figure 3b, the rate of decrease in sample size was slightly larger from two to three indicators than from three to four indicators. Last, Figure 3c illustrates the required sample sizes across the four factor loadings in each model. Overall, the required sample sizes decreased as the magnitude of the factor loadings increased across all models. As with the number of indicators, the rate of decrease diminished as the factor loadings increased. No reverse pattern was associated with the number of indicators or the factor loadings when the other factors were held constant (see also Tables 4 and 5). Overall, the two testing methods showed similar patterns regarding the three simulation factors, but, as mentioned earlier, the bootstrap method required smaller sample sizes than the delta method.
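The effect-size comparisons above can be expressed as averages of condition-wise fold changes. The sketch below uses only a small, hand-picked subset of Table 4 (delta method, Model 1, three indicators, partial mediation), so it will not reproduce the 5.5- and 1.3-fold averages, which summarize all models and conditions; averaging ratios (rather than taking a ratio of averages) is likewise an assumption, and `mean_fold_change` is a hypothetical helper.

```python
from statistics import mean

# Table 4, delta method, Model 1, three indicators; loadings 0.4, 0.5, 0.6, 0.7.
small = [4890, 2590, 1650, 1180]
medium = [930, 480, 290, 190]
large = [880, 360, 180, 120]


def mean_fold_change(numerator, denominator):
    """Average of condition-wise ratios N_numerator / N_denominator."""
    return mean(a / b for a, b in zip(numerator, denominator))


small_vs_medium = mean_fold_change(small, medium)
medium_vs_large = mean_fold_change(medium, large)
print(round(small_vs_medium, 2), round(medium_vs_large, 2))
```

For this subset the small-to-medium ratio (about 5.6) is far larger than the medium-to-large ratio (about 1.4), which matches the direction of the pattern in Figure 3a.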

Figure 3. The impact of (a) indirect effect sizes, (b) number of indicators, and (c) magnitude of factor loadings on required sample sizes for structural equation models.

The impact of missing data on sample size requirements was examined for the four SEMs with partial mediation by including five proportions of missing data: 0%, 5%, 10%, 15%, and 20%. As explained above, only the fixed condition of three indicators, 0.7 factor loadings, and a medium effect size was considered. Figure 4 shows that higher proportions of missing data generally required larger sample sizes in all four models, consistent with previous studies (e.g., Wolf et al., 2013). For example, Model 1 with the delta method required a minimum sample size of 190 with no missing data but 230 with 20% missing data. Overall, models with 20% missing data required, on average, about 20% and 24% larger samples for the delta method and the bootstrap method, respectively. Among the three criteria (parameter bias, 95% coverage, and power), a power of .8 for detecting indirect effects was the most difficult to achieve across all conditions. Although the increasing patterns were similar across the four models and the two testing methods in Figure 4, the rates of increase tended to be higher as model complexity increased (i.e., Model 3 > Model 4, Model 2 > Model 1).
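The percentage increases reported above are simple relative differences. A minimal sketch using the Model 1 delta-method example from the text; `percent_increase` is a hypothetical helper, and the equal-weight average of the two method-level figures is our simplification, not necessarily how the overall average was computed.

```python
def percent_increase(n_complete, n_missing):
    """Relative increase (%) in the required N when data are missing."""
    return 100.0 * (n_missing - n_complete) / n_complete


# Model 1, delta method: n = 190 with no missing data vs. n = 230 with 20% missing.
model1_delta = percent_increase(190, 230)
print(round(model1_delta, 1))

# Equal-weight average of the method-level increases (delta 20%, bootstrap 24%).
print((20 + 24) / 2)
```

The single Model 1 case gives an increase of about 21%, close to the 20% reported for the delta method overall.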

Figure 4. The impact of missing data proportions on required sample sizes for partially mediated structural equation models.

Comparisons of the Findings

Based on the results of the path models and the SEMs, the average required sample size for each path model in the delta method ranged from 253.3 to 531.7, whereas those for the SEMs ranged from 1,273 to 2,572. Including latent variables with indicators required much larger sample sizes than the corresponding path models for accurate estimation of the parameters. For example, for the simple model (Model 1) with partial mediation and the bootstrap method, the SEMs needed 1,230 observations on average, 5.4 times more than the path models, which required 227 on average. Across all four models, the SEMs required about five times the sample sizes of the corresponding path models. Therefore, if it is not feasible to obtain sufficient sample sizes for mediation SEMs, using path models would be a plausible option for obtaining accurate and stable estimates of the model parameters.⁸
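The Model 1 figure quoted above can be checked directly against Table 4. A minimal sketch, assuming the SEM average is an unweighted mean over all 36 bootstrap, partial-mediation cells for Model 1 (none of which is capped at >10,000); the path-model average of 227 is taken from the text because the path-model table is not reproduced in this section.

```python
from statistics import mean

# Table 4, bootstrap method, Model 1 (partial mediation). Each line holds the
# loadings 0.4, 0.5, 0.6, 0.7, each with 2, 3, 4 indicators.
model1_boot_partial = [
    8870, 4420, 2920, 4220, 2290, 1660, 2410, 1410, 1140, 1590, 960, 870,  # small
    1650, 870, 620, 760, 450, 330, 420, 260, 210, 270, 180, 150,  # medium
    1730, 880, 590, 600, 340, 230, 290, 180, 140, 160, 110, 90,  # large
]

sem_avg = mean(model1_boot_partial)
path_avg = 227  # path-model average for the same condition, as stated in the text
print(round(sem_avg), round(sem_avg / path_avg, 1))
```

This yields an average of 1,230 and a ratio of about 5.4, matching the values reported above.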

The required sample sizes for the path models and SEMs obtained in the Monte Carlo study can be compared with the search results in Table 1. Judging by the median values in Table 1, applied psychological studies actually used considerably smaller sample sizes than the minimum required sample sizes identified in the current study. This tendency was more severe for the SEMs, although the path models also showed quite large discrepancies between the survey results and the simulation results. This comparison implies that applied researchers tend to use insufficient sample sizes to estimate their models of interest, which might result in unstable and inaccurate estimation of the model parameters, including mediation effects, and possibly in invalid inferences concerning their research questions.

Finally, Table 6 displays average percentages based on the results from the delta and bootstrap methods across conditions. These numbers were calculated as the ratios of the required sample sizes in the bootstrap method to those in the delta method, multiplied by 100% and then averaged across conditions. The bootstrap method required smaller sample sizes (i.e., it was more powerful) than the delta method. The benefit of using the bootstrap instead of the delta method was larger in path models than in SEMs. In path models, the bootstrap method required, on average, 83.2% of the sample size needed in the delta method. Among the four models, Model 3 showed the largest reduction in sample size (about 25%) from using the bootstrap. In SEMs, the bootstrap method required, on average, 91.5% of the sample size needed in the delta method. Unlike the path models, all four SEMs showed similar results.
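The power difference between the two tests can be illustrated with a small simulation. This is not the authors' code: it assumes a single observed-variable X → M → Y model (an analog of path Model 1), normal data with unit error variances, and hypothetical path values a = b = 0.3, and it substitutes the first-order (Sobel) delta-method standard error for the multivariate delta method used in the study. It estimates only power, not bias or coverage, and the replication counts are far smaller than a real study would use.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients (intercept first) and their standard errors."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (len(y) - X1.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    return beta, se


def paths(x, m, y):
    """Estimates and SEs of a (X -> M) and b (M -> Y, controlling for X)."""
    beta_m, se_m = ols(x, m)                        # M = i1 + a*X
    beta_y, se_y = ols(np.column_stack([x, m]), y)  # Y = i2 + c'*X + b*M
    return beta_m[1], se_m[1], beta_y[2], se_y[2]


def power(n, a=0.3, b=0.3, c_prime=0.0, reps=200, boots=500, seed=1):
    """Rejection rates of H0: ab = 0 for the delta and percentile bootstrap tests."""
    rng = np.random.default_rng(seed)
    hits_delta = hits_boot = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        m = a * x + rng.standard_normal(n)
        y = c_prime * x + b * m + rng.standard_normal(n)
        a_hat, se_a, b_hat, se_b = paths(x, m, y)

        # First-order delta-method (Sobel) SE of the indirect effect a*b.
        se_ab = np.sqrt(b_hat**2 * se_a**2 + a_hat**2 * se_b**2)
        hits_delta += int(abs(a_hat * b_hat) > 1.96 * se_ab)

        # Percentile bootstrap: resample cases and re-estimate a*b each time.
        ab_star = np.empty(boots)
        for k in range(boots):
            idx = rng.integers(0, n, size=n)
            a_k, _, b_k, _ = paths(x[idx], m[idx], y[idx])
            ab_star[k] = a_k * b_k
        lower, upper = np.percentile(ab_star, [2.5, 97.5])
        hits_boot += int(lower > 0 or upper < 0)
    return hits_delta / reps, hits_boot / reps


print(power(100, reps=50, boots=200))
```

Repeating such a power estimate over a grid of n values, for each test, is one way to see why the bootstrap can reach a target power at a smaller n; with so few replications, however, individual runs are noisy.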

Discussion and Conclusion

The primary purpose of the present study was to provide practical and general guidelines on minimum sample size requirements for accurate estimation of four mediation models commonly used in the literature, that is, to give substantive researchers concrete and tangible guidance on the required sample sizes for frequently used mediation models. Through a Monte Carlo simulation study, we also compared SEMs with path models to investigate the effect of including latent variables with indicators on sample size requirements. The overall results support the following conclusions. First, the sample sizes actually used in the literature are much smaller than the minimum required sample sizes identified in the current Monte Carlo study. Second, sample size requirements are affected by several factors, such as model complexity, the level of indirect effect sizes, the number of indicators, and the magnitude of factor loadings. As model complexity increased, the required sample sizes increased; as the indirect effect size, the number of indicators, and the magnitude of factor loadings decreased, the required sample sizes also increased. The testing method also affected the results: the bootstrap method required smaller sample sizes than the delta method for a similar level of accuracy in parameter estimation, and switching from the delta method to the bootstrap method reduced sample sizes more in the path models than in the SEMs. In addition, the proportion of missing data affected the minimum required sample sizes: the sample sizes for the four SEMs increased as the missing proportion increased.

Some important findings and relevant suggestions are as follows. First, regarding the influence of the indirect effect size, the drop in required sample sizes was much larger from the small to the medium effect size than from the medium to the large effect size. A similar pattern was also observed in previous studies (e.g., Liu & Wang, 2019; Wolf et al., 2013). This implies that researchers should, if at all possible, choose predictors (and/or mediators) with at least medium effect sizes, which would require much smaller samples than predictors with small effect sizes. From a different point of view, when determining a sample size for their own research, researchers may assume a medium effect size rather than a small or large one, as Cohen (1988) suggested that a medium effect size should represent the average effect in a given research area. Another finding of the simulations was that the level of indirect effect sizes seemed to interact with the number of indicators and the magnitude of factor loadings in affecting sample size requirements. Some large effect size conditions required larger sample sizes than the corresponding medium effect size conditions, particularly when the number of indicators was small (i.e., two indicators) and/or the factor loadings were small (i.e., 0.4). These reverse patterns disappeared as the number of indicators and/or the magnitude of the factor loadings increased. Previous studies (e.g., Liu & Wang, 2019; Wolf et al., 2013) also reported similar results (i.e., sample sizes that increased as indirect effect sizes increased).

Second, the number of indicators also played an important role in estimating the mediation model of interest. The required sample sizes decreased as the number of indicators increased. As with the effect size, the rate of decrease was higher from two to three indicators than from three to four indicators. Also, as explained earlier, several unexpected results were found for the two-indicator conditions. It is known that at least two indicators per factor are needed when more than one factor is in the model (Bollen, 1989). However, with two indicators, some reverse patterns were observed, as previously mentioned. Marsh et al. (1998) also advised against using two indicators in the context of CFA, noting that three or more indicators per factor are typically recommended. Based on the findings of the present study, we also suggest that researchers use at least three indicators, if possible, to obtain both accurate estimation and affordable sample sizes. Researchers should also keep in mind that, according to the current simulation results, increasing the number of indicators (at least up to four) did not make the model more complex in terms of sample size requirements.

Third, regarding the effect of factor loadings (0.4 to 0.7), the largest sample sizes were needed when the factor loading was 0.4, and the required sample sizes decreased as the factor loadings increased. A similar pattern was also found by Wolf et al. (2013), who studied CFA with three magnitudes of factor loadings (0.5, 0.65, and 0.8). In the current study, the drop in sample size was largest when the factor loadings moved from 0.4 to 0.5. Wang and Wang (2012) suggested that standardized factor loadings should be at least 0.4. Still, as shown in the simulation results, having loadings of 0.4 for all indicators can hinder stable estimation and thus require large samples. Therefore, we suggest that researchers use more reliable indicators for each latent variable (e.g., with factor loadings above 0.4). Where applicable, researchers can form a reasonable conjecture about the range of factor loadings from previous findings in the literature.

Fourth, the findings of this study also emphasize that researchers should pay attention to missing data when determining required sample sizes. Based on the results, an increase of about 22% in the required sample size was necessary, on average, with 20% missing data compared with no missing data. Missing data also affected the statistical power for detecting indirect effects. These results imply that ignoring missing data when determining sample sizes may result in unstable and inaccurate estimation of the model parameters, including mediation effects.

When researchers decide on sample sizes for their mediation models, they typically consider the sample sizes used in previous studies or follow popular rules of thumb, such as 5 or 10 observations per estimated parameter (Bentler & Chou, 1987; Bollen, 1989) or a minimum sample size of 100 or 200 (Boomsma, 1982, 1985). The former approach is problematic because some previous studies, as shown in Table 1, arbitrarily chose less than optimal sample sizes without careful investigation. The latter approach is also inappropriate because the rules of thumb are not model specific and may lead to sample sizes that are far larger or smaller than needed (Wolf et al., 2013). We hope that applied researchers will find the results of the present study applicable to their work and use them to determine minimum sample size requirements for their models of interest. It should be noted that the sample sizes in the result tables should be treated as reference values, not as absolute values. In particular, mediation models with small samples need to be used with caution. For example, relying on required sample sizes as low as 40 for path models (see Table 3) is likely to violate model assumptions such as multivariate normality. In such cases, Bayesian estimation with informative priors may be used, as suggested in the literature (e.g., Koopman et al., 2015; Miočević et al., 2017).
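To see why a parameter-count rule ignores the factors that mattered in the simulations, the sketch below counts free parameters for a three-factor X → M → Y SEM (the structure assumed here for Model 1) and applies the 5- and 10-per-parameter rules. The counting convention is an assumption: marker-variable identification (first loading fixed to 1), uncorrelated residuals, and no mean structure; `free_parameters` is a hypothetical helper.

```python
def free_parameters(k, partial=True):
    """Free parameters in a three-factor X -> M -> Y mediation SEM.

    Assumes k indicators per factor, marker-variable identification
    (first loading of each factor fixed to 1), uncorrelated residuals,
    and no mean structure.
    """
    loadings = 3 * (k - 1)
    residual_variances = 3 * k
    factor_variance_x = 1
    disturbance_variances = 2                # for M and Y
    structural_paths = 3 if partial else 2   # a, b, plus c' under partial mediation
    return (loadings + residual_variances + factor_variance_x
            + disturbance_variances + structural_paths)


for k in (2, 3, 4):
    q = free_parameters(k)
    print(k, q, 5 * q, 10 * q)  # indicators, parameters, N at 5/param, N at 10/param
```

Under these assumptions the rule asks for more observations as indicators are added (150, 210, and 270 at 10 per parameter for two, three, and four indicators), whereas Table 4 shows the opposite trend; for example, Model 1 with a medium effect size and 0.7 loadings in the delta method required 290, 190, and 170. The rule is also insensitive to effect size and loading magnitude.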

Although SEMs are known to be superior to path models because they take measurement error into account, researchers can choose path mediation models instead of the corresponding SEMs if only insufficient sample sizes can be collected (e.g., in a study of a small population). Also, when choosing between the bootstrap method and the delta method, researchers should choose the bootstrap method, which reduced the required sample sizes by about 13% on average (17% for path models and 8.5% for SEMs) compared with the delta method. If researchers have already gathered a sample for fitting a mediation SEM, they should then focus on the number of indicators, which is relatively controllable. When the sample size is not large enough for accurate and stable estimation of an SEM, researchers should increase the number of indicators per factor. If possible, researchers should also take other factors, such as indirect effect sizes and factor loadings, into consideration, as explained above. Furthermore, the choice between partial and complete mediation should be based on theoretical background or on the actual (observed) relationships among the variables being considered.

This study has several limitations. First, although we examined four mediation models that are frequently used in the literature, we did not consider more complex mediation models, such as multilevel mediation, longitudinal mediation, and moderated mediation. These models have received recent attention (Chen et al., 2018; Hu et al., 2018; Stäbler et al., 2017; Wentzel et al., 2018), but including all of them was not possible in a single study. Second, we only evaluated SEMs with two, three, and four indicators. According to the results of the current study, the required sample sizes decreased from two to three, and then from three to four indicators. However, we did not investigate the effect of larger numbers of indicators (e.g., five, six, or seven). Hence, it would be worthwhile to examine the effect of more than four indicators on sample size requirements in mediation analysis, because the number of indicators can be manipulated by researchers relatively easily compared with other factors such as effect sizes or factor loadings. Third, the numbers reported in the current study are minimum required sample sizes rather than sample sizes sufficient for accurate estimation of the models. Consequently, even if the three criteria (parameter bias, 95% coverage, and power) were satisfied with a sample size of 500, for example, some of those criteria might still not be met with a sample of 510. Finally, the findings and suggestions are limited to the conditions included in this Monte Carlo study. For example, we did not consider nonnormal data or categorical (or ordinal) variables. The impact of other aspects of research that were not included in this study remains to be examined systematically.
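The distinction between minimum and sufficient sample sizes follows from how a minimum is located. The sketch below shows one way such a search could work; it is an illustration, not the authors' procedure. The `evaluate` callback, the grid step of 10, and the bias and coverage thresholds (absolute relative bias at most .10, coverage between .91 and .98) are assumptions; only the .8 power criterion and the 10,000 cap come from the text.

```python
def minimum_n(evaluate, start=10, step=10, cap=10_000,
              max_abs_bias=0.10, coverage=(0.91, 0.98), min_power=0.80):
    """Smallest n on a grid at which all three criteria hold.

    `evaluate(n)` is a hypothetical user-supplied function returning
    (relative bias, 95% CI coverage, power) from a Monte Carlo run at n.
    Returns None if no n up to `cap` qualifies (reported as '>10,000').
    """
    for n in range(start, cap + 1, step):
        bias, cover, pw = evaluate(n)
        if (abs(bias) <= max_abs_bias
                and coverage[0] <= cover <= coverage[1]
                and pw >= min_power):
            return n
    return None


def toy_evaluate(n):
    """Hypothetical, noise-free criteria curve: power reaches .8 at n = 500."""
    return 0.02, 0.95, min(1.0, n / 625)


print(minimum_n(toy_evaluate))
```

Because a real `evaluate` would be based on a finite number of Monte Carlo replications, its output is noisy and need not be monotone in n; this is why meeting all criteria at n = 500 does not guarantee that they are met at n = 510.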

The present study was carried out to provide applied researchers with practical guidelines for determining sample sizes when conducting analyses with some frequently used mediation models, including both path models and SEMs. Although the simulation approach limits the generalizability of the findings, they should be useful for understanding which factors affect the sample sizes of mediation models and what considerations are involved in choosing sample sizes for mediation analysis.

Appendix A

The coefficient a, for example, was calculated as follows. In the simple regression (without intercept),