zeroinfl {pscl} | R Documentation |
Fit zero-inflated regression models for count data via maximum likelihood.
zeroinfl(formula, data, subset, na.action, weights, offset, dist = c("poisson", "negbin", "geometric"), link = c("logit", "probit", "cloglog", "cauchit", "log"), control = zeroinfl.control(...), model = TRUE, y = TRUE, x = FALSE, ...)
formula |
symbolic description of the model, see details. |
data, subset, na.action |
arguments controlling formula processing via model.frame. |
weights |
optional numeric vector of weights. |
offset |
optional numeric vector with an a priori known component to be included in the linear predictor of the count model. |
dist |
character specification of count model family (a log link is always used). |
link |
character specification of link function in the binary zero-inflation model (a binomial family is always used). |
control |
a list of control arguments specified via zeroinfl.control. |
model, y, x |
logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned. |
... |
arguments passed to zeroinfl.control in the default setup. |
Zero-inflated count models are two-component mixture models combining a point mass at zero with a proper count distribution. Thus, there are two sources of zeros: zeros may come from both the point mass and from the count component. Usually the count model is a Poisson or negative binomial regression (with log link). The geometric distribution is a special case of the negative binomial with size parameter equal to 1. For modeling the unobserved state (zero vs. count), a binary model is used: in the simplest case only with an intercept but potentially containing regressors. For this zero-inflation model, a binomial model with different links can be used, typically logit or probit.
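As a sketch of the mixture just described, write \pi_i for the probability of the point mass at zero, f(y; \mu_i) for the count density with mean \mu_i, and g for the link of the zero-inflation model. This notation is introduced here for illustration only and is not used elsewhere on this page; an offset, if supplied, would be added to the count linear predictor.

```latex
\begin{aligned}
P(Y_i = 0) &= \pi_i + (1 - \pi_i)\, f(0; \mu_i), \\
P(Y_i = y) &= (1 - \pi_i)\, f(y; \mu_i), \qquad y = 1, 2, \ldots, \\
\log(\mu_i) &= x_i^\top \beta, \qquad \pi_i = g^{-1}(z_i^\top \gamma), \\
\mathrm{E}(Y_i) &= (1 - \pi_i)\, \mu_i .
\end{aligned}
```

The first line makes the two sources of zeros explicit: the point mass contributes \pi_i and the count component contributes (1 - \pi_i) f(0; \mu_i).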
The formula can be used to specify both components of the model: If a formula of type y ~ x1 + x2 is supplied, then the same regressors are employed in both components. This is equivalent to y ~ x1 + x2 | x1 + x2. Of course, a different set of regressors could be specified for the count and zero-inflation component, e.g., y ~ x1 + x2 | z1 + z2 + z3 giving the count data model y ~ x1 + x2 conditional on (|) the zero-inflation model y ~ z1 + z2 + z3. A simple inflation model where all zero counts have the same probability of belonging to the zero component can be specified by the formula y ~ x1 + x2 | 1.
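The formula variants above can be tried out as follows. This is a minimal sketch that assumes the pscl package is installed and uses its bioChemists data; the choice of the regressors fem and ment and the object names are purely illustrative.

```r
library(pscl)
data("bioChemists", package = "pscl")

## one-part formula: the same regressors enter both components
fm_same <- zeroinfl(art ~ fem + ment, data = bioChemists)

## the explicit two-part formula should give the same fit
fm_explicit <- zeroinfl(art ~ fem + ment | fem + ment, data = bioChemists)
all.equal(coef(fm_same), coef(fm_explicit))

## intercept-only zero-inflation component
fm_const <- zeroinfl(art ~ fem + ment | 1, data = bioChemists)
coef(fm_const, model = "zero")
```

With the intercept-only specification the zero component has a single coefficient, so every observation shares the same zero-inflation probability.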
All parameters are estimated by maximum likelihood using optim, with control options set in zeroinfl.control. Starting values can be supplied, estimated by the EM (expectation maximization) algorithm, or by glm.fit (the default). Standard errors are derived numerically using the Hessian matrix returned by optim. See zeroinfl.control for details.
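The following sketch contrasts the default starting values with starting values obtained by the EM algorithm. It assumes that zeroinfl.control accepts a logical EM argument, as described on its own help page; the object names are illustrative.

```r
library(pscl)
data("bioChemists", package = "pscl")

## default setup: starting values obtained via glm.fit
fm_default <- zeroinfl(art ~ . | 1, data = bioChemists)

## starting values estimated by the EM algorithm instead
fm_em <- zeroinfl(art ~ . | 1, data = bioChemists,
                  control = zeroinfl.control(EM = TRUE))

## check convergence of optim and compare the maximized log-likelihoods
c(default = fm_default$converged, em = fm_em$converged)
c(default = fm_default$loglik, em = fm_em$loglik)
```

Both fits maximize the same likelihood, so the two log-likelihoods should agree closely when both runs converge; a noticeable difference would suggest trying other starting values.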
The returned fitted model object is of class "zeroinfl" and is similar to fitted "glm" objects. For elements such as "coefficients" or "terms" a list is returned with elements for the zero and count component, respectively. For details see below.
A set of standard extractor functions for fitted model objects is available for objects of class "zeroinfl", including methods to the generic functions print, summary, coef, vcov, logLik, residuals, predict, fitted, terms, model.matrix. See predict.zeroinfl for more details on all methods.
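A short sketch of how these extractor methods might be used on a fitted model. It assumes pscl is installed; the type = "zero" value of predict is taken from predict.zeroinfl and should be checked against that help page.

```r
library(pscl)
data("bioChemists", package = "pscl")

fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

summary(fm_zinb)        # coefficient tables for both components
coef(fm_zinb)           # all coefficients, count and zero components
vcov(fm_zinb)           # covariance matrix of the coefficients
logLik(fm_zinb)         # maximized log-likelihood
AIC(fm_zinb)            # works through the logLik method
head(fitted(fm_zinb))   # fitted means

## predicted probability of belonging to the zero component
## (type = "zero" is assumed here, see predict.zeroinfl)
head(predict(fm_zinb, type = "zero"))
```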
An object of class "zeroinfl", i.e., a list with components including
coefficients |
a list with elements "count" and "zero" containing the coefficients from the respective models, |
residuals |
a vector of raw residuals (observed - fitted), |
fitted.values |
a vector of fitted means, |
optim |
a list with the output from the optim call for minimizing the negative log-likelihood, |
control |
the control arguments passed to the optim call, |
start |
the starting values for the parameters passed to the optim call, |
weights |
the case weights used, |
offset |
the offset vector used (if any), |
n |
number of observations, |
df.null |
residual degrees of freedom for the null model (= n - 2), |
df.residual |
residual degrees of freedom for fitted model, |
terms |
a list with elements "count", "zero" and "full" containing the terms objects for the respective models, |
theta |
estimate of the additional theta parameter of the negative binomial model (if a negative binomial regression is used), |
SE.logtheta |
standard error for log(theta), |
loglik |
log-likelihood of the fitted model, |
vcov |
covariance matrix of all coefficients in the model (derived from the Hessian of the optim output), |
dist |
character string describing the count distribution used, |
link |
character string describing the link of the zero-inflation model, |
linkinv |
the inverse link function corresponding to link, |
converged |
logical indicating successful convergence of optim, |
call |
the original function call, |
formula |
the original formula, |
levels |
levels of the categorical regressors, |
contrasts |
a list with elements "count" and "zero" containing the contrasts corresponding to levels from the respective models, |
model |
the full model frame (if model = TRUE), |
y |
the response count vector (if y = TRUE), |
x |
a list with elements "count" and "zero" containing the model matrices from the respective models (if x = TRUE), |
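The components listed above can also be accessed directly from the fitted object. The sketch below assumes pscl is installed and uses an illustrative negative binomial fit; extractor functions such as coef are usually preferable in user code.

```r
library(pscl)
data("bioChemists", package = "pscl")

fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

## coefficients are stored separately for the two components
fm_zinb$coefficients$count
fm_zinb$coefficients$zero

## extra parameter of the negative binomial and its standard error (log scale)
fm_zinb$theta
fm_zinb$SE.logtheta

## sample size and degrees of freedom
c(n = fm_zinb$n, df.null = fm_zinb$df.null, df.residual = fm_zinb$df.residual)
```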
Achim Zeileis <Achim.Zeileis@R-project.org>
Cameron, A. Colin and Pravin K. Trivedi. 1998. Regression Analysis of Count Data. New York: Cambridge University Press.
Cameron, A. Colin and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.
Lambert, Diane. 1992. “Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing.” Technometrics. 34(1):1-14.
Zeileis, Achim, Christian Kleiber and Simon Jackman 2008. “Regression Models for Count Data in R.” Journal of Statistical Software, 27(8). URL http://www.jstatsoft.org/v27/i08/.
zeroinfl.control, glm, glm.fit, glm.nb, hurdle
## packages (glm.nb is provided by MASS)
library("pscl")
library("MASS")

## data
data("bioChemists", package = "pscl")

## without inflation
## ("art ~ ." is "art ~ fem + mar + kid5 + phd + ment")
fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
fm_qpois <- glm(art ~ ., data = bioChemists, family = quasipoisson)
fm_nb <- glm.nb(art ~ ., data = bioChemists)

## with simple inflation (no regressors for zero component)
fm_zip <- zeroinfl(art ~ . | 1, data = bioChemists)
fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

## inflation with regressors
## ("art ~ . | ." is "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment")
fm_zip2 <- zeroinfl(art ~ . | ., data = bioChemists)
fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")