
### GM-estimates for Regression

In the following we will assume a regression model with random explanatory variables. Let $(\mathbf{x}_i, y_i)$, $i = 1, \dots, n$, be random vectors in $\mathbb{R}^p \times \mathbb{R}$ satisfying the model

$$ y_i = \mathbf{x}_i' \boldsymbol{\theta}_0 + u_i, \qquad i = 1, \dots, n, \tag{8} $$

where $\boldsymbol{\theta}_0 \in \mathbb{R}^p$. We will assume that the errors $u_i$ are independent of the $\mathbf{x}_i$ and that they have a symmetric distribution around zero, with variance $\sigma_0^2$. The parameter of interest is $\boldsymbol{\theta}_0$. If we denote the density of the $u_i$ by $g$, and $k$ is the density (in $\mathbb{R}^p$) of the explanatory variables, then for each $i$ the joint density of the vector $(\mathbf{x}_i', y_i)$ is given by

$$ h(\mathbf{x}, y) = g(y - \mathbf{x}' \boldsymbol{\theta}_0)\, k(\mathbf{x}). $$

For each vector $\boldsymbol{\theta} \in \mathbb{R}^p$, denote the corresponding residuals by

$$ r_i(\boldsymbol{\theta}) = y_i - \mathbf{x}_i' \boldsymbol{\theta}, \qquad i = 1, \dots, n. \tag{9} $$
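For concreteness, model (8) and the residuals (9) are easy to simulate; the dimensions, parameter values and error scale below are illustrative choices only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
theta0 = np.array([1.0, -2.0, 0.5])      # illustrative "true" parameter
x = rng.normal(size=(n, p))              # random explanatory variables
u = rng.normal(scale=1.5, size=n)        # errors: independent of x, symmetric
y = x @ theta0 + u                       # model (8)

def residuals(theta, x, y):
    """r_i(theta) = y_i - x_i' theta, as in (9)."""
    return y - x @ theta

# at the true parameter the residuals recover the errors exactly
print(np.allclose(residuals(theta0, x, y), u))   # True
```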

Define the GM-estimate of $\boldsymbol{\theta}_0$ (see Krasker and Welsch, 1982, Maronna and Yohai, 1981, and Huber et al., 1986) as the solution for $\boldsymbol{\theta}$ of

$$ \sum_{i=1}^{n} \eta\!\left(\mathbf{x}_i, \frac{r_i(\boldsymbol{\theta})}{S_n}\right) \mathbf{x}_i = \mathbf{0}, \tag{10} $$

where the function $\eta(\mathbf{x}, u)$ satisfies:
• for all $\mathbf{x}$, $\eta(\mathbf{x}, \cdot)$ is odd, uniformly continuous and bounded;
• $\eta(\mathbf{x}, u) \ge 0$ for $u \ge 0$.
As before, $S_n$ is an estimate of $\sigma_0$. It can be defined by an equation of the form

$$ \frac{1}{n} \sum_{i=1}^{n} \chi\!\left(\frac{r_i(\boldsymbol{\theta})}{S_n}\right) = b. \tag{11} $$
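In practice, systems of the form (10)–(11) are typically solved by iteratively reweighted least squares. The sketch below is one illustrative choice, not the method of any particular paper cited here: it uses Huber's $\psi$, a leverage-based weight $w(\mathbf{x})$, and the normalized MAD of the residuals as the scale $S_n$:

```python
import numpy as np

def gm_estimate(X, y, c=1.345, n_iter=100, tol=1e-8):
    """Sketch of a GM-estimate computed by iteratively reweighted
    least squares.  eta(x, u) = w(x) * psi(u) with Huber's psi and a
    leverage-based weight w(x); S_n is the normalized MAD of the
    residuals.  All of these are illustrative choices."""
    h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))  # hat-matrix diagonal
    w_x = np.sqrt(1.0 - h)                          # downweight high leverage
    theta = np.linalg.lstsq(X, y, rcond=None)[0]    # least-squares start
    for _ in range(n_iter):
        r = y - X @ theta
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # robust scale S_n
        u = r / s
        # for Huber's psi, psi(u)/u = min(1, c/|u|)
        wi = w_x * np.minimum(1.0, c / np.maximum(np.abs(u), 1e-12))
        WX = X * wi[:, None]
        theta_new = np.linalg.solve(X.T @ WX, WX.T @ y)
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta

# outliers in the response barely move the estimate
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
y[:20] += 30.0
print(gm_estimate(X, y))            # close to the true (1, 2)
```

Each weighted least-squares step solves $\sum_i w_i \mathbf{x}_i (y_i - \mathbf{x}_i'\boldsymbol{\theta}) = 0$ with $w_i = w(\mathbf{x}_i)\,\psi(u_i)/u_i$, so a fixed point of the iteration satisfies (10) with $\eta(\mathbf{x}, u) = w(\mathbf{x})\,\psi(u)$.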

By varying $\eta$ we obtain different types of estimates. For example, if $\eta$ depends only on the residuals $r_i$, i.e. $\eta(\mathbf{x}, u) = \psi(u)$ for some function $\psi$, we get the class of M-estimates for regression. More generally, all the proposals can be written as

$$ \eta(\mathbf{x}, u) = w(\mathbf{x})\, \psi\big(u\, v(\mathbf{x})\big) $$

for different functions $w$ and $v$. The main idea is to penalize not only those observations with large residuals but also the ones with high leverage (see Weisberg, 1985, page 111). Mallows' and Andrews' proposals correspond to $v(\mathbf{x}) \equiv 1$ (see Hill, 1977). Schweppe's function is obtained when $v(\mathbf{x}) = 1/w(\mathbf{x})$ (see Merrill and Schweppe, 1971). See also Hampel et al. (1986) for a more detailed discussion. Maronna and Yohai (1981) showed that if the system of equations

$$ E\left[\eta\!\left(\mathbf{x}, \frac{y - \mathbf{x}'\boldsymbol{\theta}}{s}\right) \mathbf{x}\right] = \mathbf{0} \tag{12} $$

$$ E\left[\chi\!\left(\frac{y - \mathbf{x}'\boldsymbol{\theta}}{s}\right)\right] = b \tag{13} $$

has a unique solution $(\boldsymbol{\theta}_0, \sigma_0)$, then the GM-estimates defined by (10) and (11) are consistent for $\boldsymbol{\theta}_0$ and asymptotically normal, with covariance matrix

$$ V = J^{-1}\, E\left[\boldsymbol{\Psi}\boldsymbol{\Psi}'\right] \left(J^{-1}\right)', \tag{14} $$

where $\boldsymbol{\Psi} = \boldsymbol{\Psi}(\mathbf{x}, y, \boldsymbol{\theta}, s)$ is a $(p+1)$-vector containing the functions involved in equations (12) and (13):

$$ \boldsymbol{\Psi}(\mathbf{x}, y, \boldsymbol{\theta}, s) = \begin{pmatrix} \eta\!\left(\mathbf{x}, \dfrac{y - \mathbf{x}'\boldsymbol{\theta}}{s}\right)\mathbf{x} \\[2ex] \chi\!\left(\dfrac{y - \mathbf{x}'\boldsymbol{\theta}}{s}\right) \end{pmatrix}, $$

and $J$ is the matrix of derivatives of $E[\boldsymbol{\Psi}]$ with respect to $(\boldsymbol{\theta}, s)$. This formula simplifies when the distribution $G$ of the errors is symmetric. In this case, $\hat{\boldsymbol{\theta}}_n$ and $S_n$ are asymptotically independent and the covariance matrix (14) reduces to

$$ V(\hat{\boldsymbol{\theta}}_n) = \sigma_0^2\, A^{-1} B\, A^{-1}, $$

where

$$ A = E\left[\dot{\eta}\!\left(\mathbf{x}, \frac{u}{\sigma_0}\right)\mathbf{x}\mathbf{x}'\right], \qquad B = E\left[\eta^2\!\left(\mathbf{x}, \frac{u}{\sigma_0}\right)\mathbf{x}\mathbf{x}'\right], $$

and $\dot{\eta}(\mathbf{x}, u) = \partial \eta(\mathbf{x}, u)/\partial u$. I want to stress that the uniqueness of the solution to equations (12) and (13) is a strong condition. Two sufficient conditions for this property to hold are that the distribution of the errors is symmetric and that $\eta(\mathbf{x}, u)$ is increasing in $u$ for each $\mathbf{x}$ (Maronna and Yohai, 1981). See also Yohai and Maronna (1979) for the case when the explanatory variables are fixed (symmetry of the distribution of the errors is also needed to obtain consistency there).

A global measure of robustness is the breakdown point (BP). Donoho and Huber (1983) gave the following definition for finite samples. Let $Z_n = \{\mathbf{z}_1, \dots, \mathbf{z}_n\}$ be a random sample and $T_n = T(Z_n)$ be the estimate calculated with the sample $Z_n$. For each integer $m$ let

$$ b(m, T, Z_n) = \sup_{W_m} \left\| T(Z_n \cup W_m) - T(Z_n) \right\|, $$

where the supremum is calculated over all the samples $W_m$ of size $m$. Define the breakdown point of $T$ for the sample $Z_n$ as

$$ \varepsilon^*(T, Z_n) = \min\left\{ \frac{m}{n + m} : b(m, T, Z_n) = \infty \right\}. $$

The breakdown point is the smallest proportion of arbitrary observations that can carry the estimate beyond all bounds. We have the following result for the breakdown of the estimators defined by (10) and (11): $\varepsilon^*(T, Z_n) > 0$ if and only if $\eta$ is bounded. In this case the breakdown point is positive, but it decreases to zero as the number of predictors increases (roughly as $1/(p+1)$; see Maronna, Bustos, and Yohai, 1979). We see then that the BP of the GM-estimates cannot be 0.5 when $p > 1$. It is of interest to have a class of regression estimates with a high breakdown point independent of the number of explanatory variables. The S-estimates for regression have this property.
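The finite-sample definition can be checked numerically for two familiar location estimates: the sample mean breaks down after a single added observation, while the median (in this addition model) stays bounded until the added points make up half of the enlarged sample. A minimal sketch, where the value `1e12` stands in for the supremum over $W_m$:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=100)                 # clean sample Z_n with n = 100

def bias_after_adding(estimator, z, m, big=1e12):
    """Shift of the estimate after appending m identical bad points;
    placing them all at `big` approximates the supremum over W_m."""
    return abs(estimator(np.concatenate([z, np.full(m, big)])) - estimator(z))

print(bias_after_adding(np.mean, z, m=1) > 1e9)      # True: one point breaks the mean
print(bias_after_adding(np.median, z, m=99) < 10)    # True: median still bounded
print(bias_after_adding(np.median, z, m=100) > 1e9)  # True: breaks at m/(n+m) = 1/2
```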
