A new augmented mixed regression model for proportions
Di Brisco, AM;
2018-01-01
Abstract
A new mixture regression model is proposed for bounded continuous outcomes lying in the closed interval [0,1]. An increasingly common approach to such outcomes is the beta regression model, which allows for heteroskedasticity and asymmetry. Nevertheless, this model has two main limitations: it cannot represent a wide range of phenomena (bimodality, heavy tails and outlying observations), and it fails to model values at the boundary of the support. To overcome these limitations, a new regression model is proposed, based on a special mixture of two betas (referred to as flexible beta) that share the same precision parameter but have two distinct component means subject to an inequality constraint. This distribution is strongly identifiable and its likelihood is almost surely bounded from above, two properties that facilitate its computational tractability. In addition, the distribution is augmented by adding positive probabilities of occurrence of zeros and/or ones. Thus, the final model (augmented flexible beta) is based on a mixed discrete-continuous density, the continuous part of which is itself a mixture. Intensive simulation studies show the good fit of the new regression model in comparison with other models. Inference is carried out with a (Bayesian) Hamiltonian Monte Carlo algorithm.
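The structure of the augmented flexible beta density described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact parametrization, so the following are assumptions made only for illustration: each beta component uses the mean-precision form with shape parameters `mu * phi` and `(1 - mu) * phi`; the inequality constraint is taken to be `lam1 > lam2`; and the names `lam1`, `lam2`, `phi`, `p`, `q0`, `q1` are hypothetical. The regression part (linking the parameters to covariates) is not shown.

```python
import math


def beta_logpdf(y, mu, phi):
    """Log-density of a beta with mean mu and precision phi.

    Assumed parametrization: shape parameters a = mu*phi, b = (1-mu)*phi.
    Valid for 0 < y < 1.
    """
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def flexible_beta_pdf(y, lam1, lam2, phi, p):
    """Mixture of two betas sharing precision phi, with ordered means.

    p is the mixing weight of the component with mean lam1.
    The ordering lam1 > lam2 is an assumed form of the inequality constraint.
    """
    if not (0.0 < lam2 < lam1 < 1.0):
        raise ValueError("component means must satisfy 0 < lam2 < lam1 < 1")
    return (p * math.exp(beta_logpdf(y, lam1, phi))
            + (1.0 - p) * math.exp(beta_logpdf(y, lam2, phi)))


def augmented_fb_density(y, q0, q1, lam1, lam2, phi, p):
    """Mixed discrete-continuous density on [0, 1].

    Point masses q0 at y = 0 and q1 at y = 1; the remaining mass
    1 - q0 - q1 is spread over (0, 1) by the flexible beta mixture.
    """
    if q0 < 0.0 or q1 < 0.0 or q0 + q1 >= 1.0:
        raise ValueError("need q0, q1 >= 0 and q0 + q1 < 1")
    if y == 0.0:
        return q0
    if y == 1.0:
        return q1
    return (1.0 - q0 - q1) * flexible_beta_pdf(y, lam1, lam2, phi, p)


def continuous_mass(q0, q1, lam1, lam2, phi, p, n=10000):
    """Midpoint-rule integral of the continuous part over (0, 1)."""
    h = 1.0 / n
    return h * sum(
        augmented_fb_density((i + 0.5) * h, q0, q1, lam1, lam2, phi, p)
        for i in range(n)
    )
```

With well-separated means, for example `lam1 = 0.7` and `lam2 = 0.3` at `phi = 20`, the continuous part is bimodal, which a single beta cannot reproduce. The point masses and the integral of the continuous part add up to one, which `continuous_mass` can be used to check numerically.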