GM-estimates for Regression
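In the following we will assume a regression model with random explanatory variables. Let $(y_1, x_1), \ldots, (y_n, x_n)$ be random vectors in $\mathbb{R}^{p+1}$ satisfying the model
$$ y_i = x_i^\top \beta + \epsilon_i, \tag{8} $$
where $x_i \in \mathbb{R}^p$ and $i = 1, \ldots, n$.
We will assume that the errors $\epsilon_i$ are independent of $x_i$ and that they have a symmetric distribution around zero, with variance $\sigma^2$.
The parameter of interest is $\beta$.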
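If we denote the density of the $\epsilon_i$ by g, and k is the density (in $\mathbb{R}^p$) of the explanatory variables, then for each $\beta$ the joint distribution of the vector $(y_i, x_i)$ is given by the density
$$ f_\beta(y, x) = g(y - x^\top \beta)\, k(x). $$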
For each vector $\beta \in \mathbb{R}^p$, denote the corresponding residuals by
$$ r_i(\beta) = y_i - x_i^\top \beta, \qquad i = 1, \ldots, n. \tag{9} $$
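Define the GM-estimate $\beta_n$ of $\beta$ (see Krasker and Welsch, 1982, Maronna and Yohai, 1981 and Huber et al., 1986) as the solution for $\beta$ of
$$ \sum_{i=1}^n \eta\left(x_i, \frac{r_i(\beta)}{S_n}\right) x_i = 0, \tag{10} $$
where the function $\eta : \mathbb{R}^p \times \mathbb{R} \to \mathbb{R}$ satisfies:
- for all $x \in \mathbb{R}^p$, $\eta(x, \cdot)$ is odd, uniformly continuous and
- $\eta(x, r) \ge 0$ for $r \ge 0$.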
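As before, $S_n$ is an estimate of $\sigma$. It can be defined by an equation of the form
$$ \frac{1}{n} \sum_{i=1}^n \chi\left(\frac{r_i(\beta)}{S_n}\right) = 0 \tag{11} $$
for a suitable function $\chi$.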
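As a rough illustration of how an estimating equation like (10) might be solved numerically, the sketch below uses iteratively reweighted least squares for a straight-line fit $y = b_0 + b_1 x$. Everything specific in it is an assumption made for the example and is not taken from the text: the Mallows-type choice $\eta(x, r) = w(x)\psi(r)$ with Huber's $\psi$, the tuning constants, the leverage weight based on a robust distance of $x$, and the normalized MAD of the residuals standing in for $S_n$ instead of a solution of (11). The names `gm_fit`, `weighted_ls` and `mad_scale` are hypothetical, and the loop runs a fixed number of iterations without a convergence check.

```python
import statistics

def huber_psi(u, c=1.345):
    """Huber's psi: the identity on [-c, c], constant outside."""
    return max(-c, min(c, u))

def mad_scale(values):
    """Normalized median absolute deviation (assumed stand-in for S_n)."""
    med = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - med) for v in values])

def weighted_ls(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def gm_fit(x, y, c=1.345, cx=2.0, n_iter=50):
    """IRLS sketch for sum_i w(x_i) * psi(r_i / S_n) * (1, x_i) = 0."""
    # Leverage weights (assumption): w(x) = min(1, cx / d(x)),
    # where d(x) is a robust standardized distance of x from its median.
    mx, sx = statistics.median(x), mad_scale(x)
    wx = []
    for xi in x:
        d = abs(xi - mx) / sx
        wx.append(1.0 if d <= cx else cx / d)
    b0, b1 = weighted_ls(x, y, wx)  # starting value
    for _ in range(n_iter):
        r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        s = mad_scale(r)  # scale is re-estimated at every step
        # IRLS weight: eta(x, u) / u with u = r / S_n (limit w(x) at u = 0).
        omega = []
        for wi, ri in zip(wx, r):
            u = ri / s
            omega.append(wi * (huber_psi(u, c) / u if u != 0 else 1.0))
        b0, b1 = weighted_ls(x, y, omega)
    return b0, b1
```

Calling `gm_fit(x, y)` on data with one high-leverage outlier and comparing the slope with `weighted_ls(x, y, [1.0] * len(x))` gives a quick feel for the effect of the two kinds of downweighting.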
By varying $\eta$ we obtain different types of estimates. For example, if $\eta$ only depends on the residuals $r_i$, i.e. $\eta(x, r) = \psi(r)$ for some function $\psi$, we get the class of M-estimates for regression.
More generally, all the proposals can be written as
$$ \eta(x, r) = w(x)\, \psi(r\, v(x)) $$
for different functions $w$ and $v$.
The main idea is to penalize not only those observations with large residuals but also the ones with high leverage (see Weisberg, 1985, page 111).
Mallows' and Andrews' proposal corresponds to $v(x) = 1$ (see Hill, 1977).
Schweppe's function is obtained when $v(x) = 1/w(x)$ (see Merrill and Schweppe, 1971).
See also Hampel et al. (1986) for a more detailed discussion.
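To make the difference between these proposals concrete, here is a small sketch of the general form $\eta(x, r) = w(x)\,\psi(r\,v(x))$. It assumes Huber's $\psi$ with the conventional constant 1.345, and it takes the values `w` $= w(x)$ and `v` $= v(x)$ as plain numbers instead of computing them from $x$. The function names are hypothetical.

```python
def huber_psi(u, c=1.345):
    """Huber's psi: the identity on [-c, c], constant outside."""
    return max(-c, min(c, u))

def eta_general(w, v, r, psi=huber_psi):
    """eta(x, r) = w(x) * psi(r * v(x)), given the values w = w(x), v = v(x)."""
    return w * psi(r * v)

def eta_m(r):
    """Plain M-estimate: the carrier x plays no role (w = v = 1)."""
    return eta_general(1.0, 1.0, r)

def eta_mallows(w, r):
    """Mallows-type: v(x) = 1, so psi(r) is shrunk by the leverage weight."""
    return eta_general(w, 1.0, r)

def eta_schweppe(w, r):
    """Schweppe-type: v(x) = 1 / w(x); requires w > 0."""
    return eta_general(w, 1.0 / w, r)
```

With a leverage weight of `w = 0.5`, the Mallows form halves even a small residual, while the Schweppe form returns a small residual unchanged and only caps it once `|r|` exceeds `c * w`. For large residuals both reach the same bound `c * w`.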
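Maronna and Yohai (1981) showed that if the system of equations
$$ E\left[\eta\left(x, \frac{y - x^\top \beta}{\sigma}\right) x\right] = 0 \tag{12} $$
$$ E\left[\chi\left(\frac{y - x^\top \beta}{\sigma}\right)\right] = 0 \tag{13} $$
has a unique solution $(\beta_0, \sigma_0)$, then the GM estimates defined by (10) and (11) are consistent for $(\beta_0, \sigma_0)$ and asymptotically normal, with covariance matrix
$$ V = J^{-1}\, E\left[\Psi \Psi^\top\right] \left(J^{-1}\right)^\top \tag{14} $$
where $\Psi$ is a p+1 vector with the functions involved in the equations (12) and (13):
$$ \Psi(y, x; \beta, \sigma) = \left( \eta\left(x, \frac{y - x^\top \beta}{\sigma}\right) x^\top,\ \chi\left(\frac{y - x^\top \beta}{\sigma}\right) \right)^\top $$
and J is the $(p+1) \times (p+1)$ matrix of derivatives of $E[\Psi]$ with respect to $(\beta, \sigma)$, evaluated at $(\beta_0, \sigma_0)$.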
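This formula simplifies when the distribution G of the errors is symmetric. In this case, $\beta_n$ and $S_n$ are asymptotically independent and the block of the covariance matrix (14) corresponding to $\beta_n$ is
$$ \sigma_0^2\, A^{-1} B A^{-1}, $$
where $A = E\left[\eta'(x, u)\, x x^\top\right]$ and $B = E\left[\eta^2(x, u)\, x x^\top\right]$. Here $u = \epsilon / \sigma_0$, $\eta'$ is the derivative of $\eta$ with respect to its second argument, and $\sigma_0$ is the scale component of the solution to (12) and (13).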
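A plug-in version of the simplified covariance can be sketched as follows. The sketch assumes the usual sandwich form $\sigma^2 A^{-1} B A^{-1}$ with $A = E[\eta'(x, u)\, x x^\top]$ and $B = E[\eta^2(x, u)\, x x^\top]$, replaces the expectations by sample averages, and restricts itself to a single predictor with no intercept so that $A$ and $B$ are scalars. The function name and its arguments are hypothetical.

```python
def sandwich_variance(x, r, s, eta, eta_prime):
    """Plug-in estimate of s^2 * B / A^2 for a one-predictor model y = beta * x + e.

    x         : predictor values
    r         : residuals y_i - beta_n * x_i at the fitted beta_n
    s         : scale estimate S_n
    eta       : function eta(x, u)
    eta_prime : derivative of eta with respect to u
    """
    n = len(x)
    # A: average of eta'(x_i, u_i) * x_i^2, with u_i = r_i / s.
    a = sum(eta_prime(xi, ri / s) * xi * xi for xi, ri in zip(x, r)) / n
    # B: average of eta(x_i, u_i)^2 * x_i^2.
    b = sum(eta(xi, ri / s) ** 2 * xi * xi for xi, ri in zip(x, r)) / n
    return s * s * b / (a * a)
```

The returned value approximates the asymptotic variance of $\sqrt{n}(\beta_n - \beta)$, so dividing by `n` gives an approximate variance for $\beta_n$ itself. With `eta = lambda x, u: u` and `eta_prime = lambda x, u: 1.0` the expression has the same structure as the heteroscedasticity-robust variance for least squares, which is a convenient sanity check.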
I want to stress that the uniqueness of the solution to equations (12) and (13) is a strong condition.
Two conditions that together are sufficient for this property to hold are that the distribution of the errors is symmetric and that $\eta(x, \cdot)$ is increasing for each $x$ (Maronna and Yohai, 1981). See also Yohai and Maronna (1979) for the case when the explanatory variables are fixed (symmetry of the distribution of the errors is also needed to obtain consistency here).
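A global measure of robustness is the breakdown point (BP).
Donoho and Huber (1983) gave the following definition for finite samples.
Let $Z_n = \{z_1, \ldots, z_n\}$ be a random sample and $T_n = T(Z_n)$ be the estimate calculated with the sample $Z_n$. For each integer m let
$$ b(m; T, Z_n) = \sup_{W_m} \left\| T(Z_n \cup W_m) - T(Z_n) \right\|, $$
where the supremum is calculated over all the samples $W_m$ of size m. Define the breakdown point of T for the sample $Z_n$ as
$$ \epsilon^*(T, Z_n) = \min\left\{ \frac{m}{n + m} : b(m; T, Z_n) = \infty \right\}. $$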
That is, the breakdown point is the smallest proportion of arbitrary observations that can carry the estimate beyond all bounds.
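The finite-sample definition can be explored numerically with a crude proxy. The sketch below assumes the contamination version of the definition, in which m arbitrary points are added to the sample and the breakdown point is the smallest fraction m/(n+m) that can ruin the estimate. Instead of the supremum over all added samples it tries a single configuration (m copies of one huge value), and it replaces "unbounded" by "larger than a threshold", so the result is only an upper bound on the breakdown point. All names and constants are hypothetical.

```python
import statistics

def shift_after_contamination(estimator, sample, m, bad_value=1e12):
    """Change in the estimate after adding m copies of one huge value."""
    contaminated = list(sample) + [bad_value] * m
    return abs(estimator(contaminated) - estimator(sample))

def empirical_breakdown(estimator, sample, threshold=1e6, max_m=None):
    """Smallest m / (n + m) for which the tried contamination moves the
    estimate by more than `threshold`; None if no m up to max_m does."""
    n = len(sample)
    max_m = 10 * n if max_m is None else max_m
    for m in range(1, max_m + 1):
        if shift_after_contamination(estimator, sample, m) > threshold:
            return m / (n + m)
    return None
```

For the sample `[1.0, 2.0, ..., 9.0]`, passing `statistics.mean` shows the mean failing after a single added point, while `statistics.median` holds until the added points make up half of the enlarged sample.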
We have the following result for the breakdown of the
estimators defined by
(10) and (11):
if and only if
.
In this case the breakdown point is positive, but it decreases to zero as the number of predictors increases, roughly as 1/(p+1) (see Maronna, Bustos, and Yohai, 1979).
We see then that the BP of the GM-estimates cannot be 0.5 when p > 1. It is of interest to have a class of regression estimates with a high breakdown point that does not depend on the number of explanatory variables. The S-estimates for regression have this property.
Department Web Master
2000-05-29