Fit instrumental-variable regression by two-stage least squares (2SLS). This is equivalent to direct instrumental-variables estimation when the number of instruments is equal to the number of regressors. Alternative robust-regression estimators are also provided, based on M-estimation (2SM) and MM-estimation (2SMM).
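The equivalence between 2SLS and direct instrumental-variables estimation in the just-identified case can be checked by hand. The following is a minimal sketch in base R on simulated data; the variable names (ex, iv, en, u) and the data-generating process are assumptions made for illustration and are not part of the package. It does not use ivreg itself.

```r
# Simulated data (hypothetical): one exogenous regressor, one endogenous
# regressor, and one instrument, so the model is just-identified.
set.seed(1)
n  <- 500
ex <- rnorm(n)                                   # exogenous regressor
iv <- rnorm(n)                                   # instrument
u  <- rnorm(n)                                   # structural error
en <- 0.8 * iv + 0.5 * ex + 0.6 * u + rnorm(n)   # endogenous: correlated with u
y  <- 1 + 2 * en - 1 * ex + u

# Stage 1: regress the endogenous regressor on all instruments
# (the exogenous regressor serves as an instrument for itself).
stage1 <- lm(en ~ iv + ex)
en_hat <- fitted(stage1)

# Stage 2: replace the endogenous regressor by its stage-1 fitted values.
# Only the coefficients are valid here; the standard errors reported by
# this lm() fit are not the correct 2SLS standard errors.
stage2 <- lm(y ~ en_hat + ex)
b_2sls <- unname(coef(stage2))

# Direct IV estimator: solve (Z'X) b = Z'y.
X <- cbind(1, en, ex)
Z <- cbind(1, iv, ex)
b_iv <- unname(drop(solve(crossprod(Z, X), crossprod(Z, y))))

# The two estimators agree up to numerical tolerance.
print(all.equal(b_2sls, b_iv))
```

With more instruments than regressors the matrix Z'X is no longer square, so the direct estimator is not defined in this form and only the two-stage computation applies.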
ivreg(formula, instruments, data, subset, na.action, weights, offset, contrasts = NULL, model = TRUE, y = TRUE, x = FALSE, ...)
formula, instruments  formula specification(s) of the regression
relationship and the instruments. Either 

data  an optional data frame containing the variables in the model.
By default the variables are taken from the environment of the

subset  an optional vector specifying a subset of observations to be used in fitting the model. 
na.action  a function that indicates what should happen when the data
contain 
weights  an optional vector of weights to be used in the fitting process. 
offset  an optional offset that can be used to specify an a priori known component to be included during fitting. 
contrasts  an optional list. See the 
model, x, y  logicals. If 
...  further arguments passed to 
ivreg returns an object of class "ivreg" that inherits from class "lm", with the following components:
parameter estimates, from the stage-2 regression.
vector of model residuals.
matrix of residuals from the stage-1 regression.
vector of residuals from the stage-2 regression.
vector of predicted means for the response.
either the vector of weights used (if any) or NULL (if none).
either the offset used (if any) or NULL (if none).
a matrix containing the empirical estimating functions.
number of observations.
number of observations with nonzero weights.
number of columns in the model matrix x of regressors.
number of columns in the instrumental variables model matrix z.
numeric rank of the model matrix for the stage-2 regression.
residual degrees of freedom for fitted model.
unscaled covariance matrix for the coefficients.
residual standard deviation.
QR decomposition for the stage-2 regression.
QR decomposition for the stage-1 regression.
numeric rank of the model matrix for the stage-1 regression.
matrix of coefficients from the stage-1 regression.
residual degrees of freedom for the stage-1 regression.
columns of the "regressors" matrix that are exogenous.
columns of the "regressors" matrix that are endogenous.
columns of the "instruments" matrix that are instruments for the endogenous variables.
the method used for the stage-1 and stage-2 regressions, one of "OLS", "M", or "MM".
a matrix of robustness weights with columns for each of the stage-1 regressions and for the stage-2 regression (in the last column) if the fitting method is "M" or "MM", NULL if the fitting method is "OLS".
a matrix of hatvalues. For method = "OLS", the matrix consists of two columns, one for each of the stage-1 and stage-2 regressions; for method = "M" or "MM", there is one column for each stage-1 regression and for the stage-2 regression.
residual degrees of freedom for fitted model.
the original function call.
the model formula.
function applied to missing values in the model fit.
a list with elements "regressors" and "instruments" containing the terms objects for the respective components.
levels of the categorical regressors.
the contrasts used for categorical regressors.
the full model frame (if model = TRUE).
the response vector (if y = TRUE).
a list with elements "regressors", "instruments", "projected", containing the model matrices from the respective components (if x = TRUE). "projected" is the matrix of regressors projected on the image of the instruments.
ivreg is the high-level interface to the workhorse function ivreg.fit. A set of standard methods (including print, summary, vcov, anova, predict, residuals, terms, model.matrix, bread, estfun) is available and described in ivregMethods. For methods related to regression diagnostics, see ivregDiagnostics.
Regressors and instruments for ivreg are most easily specified in a formula with two parts on the right-hand side, e.g., y ~ x1 + x2 | z1 + z2 + z3, where x1 and x2 are the explanatory variables and z1, z2, and z3 are the instrumental variables. Note that exogenous regressors have to be included as instruments for themselves.
For example, if there is one exogenous regressor ex and one endogenous regressor en with instrument in, the appropriate formula would be y ~ en + ex | in + ex. Alternatively, a formula with three parts on the right-hand side can also be used: y ~ ex | en | in. The latter is typically more convenient if there is a large number of exogenous regressors.
Moreover, two further equivalent specification strategies are possible that are typically less convenient than the strategies above. One option is to use an update formula with a . in the second part of the formula: y ~ en + ex | . - en + in. Another option is to use a separate formula for the instruments (only for backward compatibility with earlier versions): formula = y ~ en + ex, instruments = ~ in + ex.
Internally, all specifications are converted to the version with two parts on the right-hand side.
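The four specification strategies described above can be compared directly. The sketch below uses simulated data with hypothetical variable names; the instrument is called iv rather than in, because in is a reserved word in R and cannot be used as a variable name in a formula. It assumes the ivreg package is installed and skips the comparison otherwise.

```r
# Simulated data; all variable names are hypothetical.
set.seed(42)
n  <- 200
ex <- rnorm(n)                               # exogenous regressor
iv <- rnorm(n)                               # instrument for en
u  <- rnorm(n)                               # structural error
en <- iv + 0.5 * ex + 0.5 * u + rnorm(n)     # endogenous regressor
d  <- data.frame(y = 1 + 2 * en - ex + u, en = en, ex = ex, iv = iv)

same <- NULL
if (requireNamespace("ivreg", quietly = TRUE)) {
  fits <- list(
    # two-part right-hand side: regressors | instruments
    two_part   = ivreg::ivreg(y ~ en + ex | iv + ex, data = d),
    # three-part right-hand side: exogenous | endogenous | instruments
    three_part = ivreg::ivreg(y ~ ex | en | iv, data = d),
    # update formula: "." stands for the regressors in the first part
    update     = ivreg::ivreg(y ~ en + ex | . - en + iv, data = d),
    # separate instruments formula (backward compatibility)
    separate   = ivreg::ivreg(y ~ en + ex, instruments = ~ iv + ex, data = d)
  )
  ref  <- coef(fits$two_part)
  # Compare coefficients by name, since the ordering may differ.
  same <- vapply(fits,
                 function(f) isTRUE(all.equal(coef(f)[names(ref)], ref)),
                 logical(1))
  print(same)
}
```

Matching coefficients by name rather than by position avoids relying on how each specification orders the regressors after the internal conversion.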
Greene, W.H. (1993) Econometric Analysis, 2nd ed., Macmillan.
ivreg.fit, ivregDiagnostics, ivregMethods, lm, lm.fit
## data
data("CigaretteDemand", package = "ivreg")

## model
m <- ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome),
  data = CigaretteDemand)
summary(m)
#>
#> Call:
#> ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | salestax +
#>     log(rincome), data = CigaretteDemand)
#>
#> Residuals:
#>       Min        1Q    Median        3Q       Max
#> -0.611000 -0.086072  0.009423  0.106912  0.393159
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept)    9.4307     1.3584   6.943 1.24e-08 ***
#> log(rprice)   -1.1434     0.3595  -3.181  0.00266 **
#> log(rincome)   0.2145     0.2686   0.799  0.42867
#>
#> Diagnostic tests:
#>                  df1 df2 statistic  p-value
#> Weak instruments   1  45    45.158 2.65e-08 ***
#> Wu-Hausman         1  44     1.102      0.3
#> Sargan             0  NA        NA       NA
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.1896 on 45 degrees of freedom
#> Multiple R-Squared: 0.4189, Adjusted R-squared: 0.3931
#> Wald test: 6.534 on 2 and 45 DF,  p-value: 0.003227
#>

#>
#> Call:
#> ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | salestax +
#>     log(rincome), data = CigaretteDemand)
#>
#> Residuals:
#>       Min        1Q    Median        3Q       Max
#> -0.611000 -0.086072  0.009423  0.106912  0.393159
#>
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)
#> (Intercept)    9.4307     1.2194   7.734 1.04e-14 ***
#> log(rprice)   -1.1434     0.3605  -3.172  0.00151 **
#> log(rincome)   0.2145     0.3018   0.711  0.47729
#>
#> Diagnostic tests:
#>                  df1 df2 statistic p-value
#> Weak instruments   1  45    47.713 1.4e-08 ***
#> Wu-Hausman         1  44     1.287   0.263
#> Sargan             0  NA        NA      NA
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.1896 on Inf degrees of freedom
#> Multiple R-Squared: 0.4189, Adjusted R-squared: 0.3931
#> Wald test: 2 on NA DF,  p-value: NA
#>

#> Analysis of Variance Table
#>
#> Model 1: log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome)
#> Model 2: log(packs) ~ log(rprice) | salestax
#>   Res.Df    RSS Df Sum of Sq      F Pr(>F)
#> 1     45 1.6172
#> 2     46 1.6668 -1 -0.049558 0.6379 0.4287

#> Analysis of Deviance Table (Type II tests)
#>
#> Response: log(packs)
#>              Df       F   Pr(>F)
#> log(rprice)   1 10.1161 0.002662 **
#> log(rincome)  1  0.6379 0.428667
#> Residuals    45
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

## same model specified by formula with three-part right-hand side
ivreg(log(packs) ~ log(rincome) | log(rprice) | salestax, data = CigaretteDemand)
#>
#> Call:
#> ivreg(formula = log(packs) ~ log(rincome) | log(rprice) | salestax, data = CigaretteDemand)
#>
#> Coefficients:
#>  (Intercept)   log(rprice)  log(rincome)
#>       9.4307       -1.1434        0.2145
#>

# Robust 2SLS regression
data("Kmenta", package = "ivreg")
Kmenta1 <- Kmenta
Kmenta1[20, "Q"] <- 95 # corrupted data
deq <- ivreg(Q ~ P + D | D + F + A, data=Kmenta) # demand equation, uncorrupted data
deq1 <- ivreg(Q ~ P + D | D + F + A, data=Kmenta1) # standard 2SLS, corrupted data
deq2 <- ivreg(Q ~ P + D | D + F + A, data=Kmenta1, subset=-20) # standard 2SLS, removing bad case
deq3 <- ivreg(Q ~ P + D | D + F + A, data=Kmenta1, method="MM") # 2SLS MM estimation
car::compareCoefs(deq, deq1, deq2, deq3)
#> Calls:
#> 1: ivreg(formula = Q ~ P + D | D + F + A, data = Kmenta)
#> 2: ivreg(formula = Q ~ P + D | D + F + A, data = Kmenta1)
#> 3: ivreg(formula = Q ~ P + D | D + F + A, data = Kmenta1, subset = -20)
#> 4: ivreg(formula = Q ~ P + D | D + F + A, data = Kmenta1, method = "MM")
#>
#>             Model 1 Model 2 Model 3 Model 4
#> (Intercept)   94.63  117.96   92.42   91.09
#> SE             7.92   11.64    9.67   10.62
#>
#> P           -0.2436 -0.4054 -0.2300 -0.2374
#> SE           0.0965  0.1417  0.1047  0.1135
#>
#> D            0.3140  0.2351  0.3233  0.3468
#> SE           0.0469  0.0690  0.0527  0.0569
#>

#>         P stage_2
#> 1922 0.97    0.98
#> 1923 0.97    0.98
#> 1924 1.00    0.87
#> 1925 1.00    0.96
#> 1926 0.98    0.90
#> 1927 1.00    0.98
#> 1928 0.97    0.95
#> 1929 0.64    0.53
#> 1930 0.80    0.91
#> 1931 0.89    0.77
#> 1932 0.98    1.00
#> 1933 1.00    0.91
#> 1934 0.97    0.92
#> 1935 0.89    1.00
#> 1936 0.72    0.88
#> 1937 0.84    0.53
#> 1938 0.94    1.00
#> 1939 0.53    0.69
#> 1940 1.00    0.98
#> 1941 0.98    0.00