
   
GM-estimates for Regression

In the following we will assume a regression model with random explanatory variables. Let $\left( y_i,
{\bf x}_i \right)$ be random vectors in ${\mbox{I}\!\mbox{R}}^{p+1}$ satisfying the model

 \begin{displaymath}
y_i = {\bf x}_i' \theta + \sigma \epsilon_i, \hspace{.5in}
i = 1, \ldots, n,
\end{displaymath} (8)

where ${\bf x}_i \in {\mbox{I}\!\mbox{R}}^p$. We will assume that the errors $\epsilon_i$ are independent of ${\bf x}_i$ and that they have a symmetric distribution around zero, with variance $\sigma^2 < \infty$. The parameter of interest is $\theta$. If we denote the density of the $\epsilon_i$ by $g$ and the density (in ${\mbox{I}\!\mbox{R}}^p$) of the explanatory variables by $k$, then for each $\theta$ the joint density of the vector $\left( y, {\bf x}\right)$ is given by

\begin{displaymath}f_\theta \left(y, {\bf x}\right) =
\sigma^{-1} \ g \left( \left( y - {\bf x}'\theta \right) / \sigma \right) \ k \left( {\bf x}\right).
\end{displaymath}

For each vector $\beta \in {\mbox{I}\!\mbox{R}}^p$, denote the corresponding residuals by

 \begin{displaymath}
r_i \left( \beta \right) = y_i - {\bf x}_i' \beta.
\end{displaymath} (9)

Define the GM-estimate of $\theta$ (see Krasker and Welsch, 1982; Maronna and Yohai, 1981; and Huber et al., 1986) as the solution for ${\bf T}_n$ of

 \begin{displaymath}
\sum_{i=1}^n\phi \left( {\bf x}_i, \frac{y_i - {\bf T}_n' {\bf x}_i}{S_n}
\right)
{\bf x}_i = 0
\end{displaymath} (10)

where the function $\phi \left( \cdot, \cdot \right) : {\mbox{I}\!\mbox{R}}^p \times
{\mbox{I}\!\mbox{R}}\rightarrow {\mbox{I}\!\mbox{R}}$ satisfies:

As before, $S_n$ is an estimate of $\sigma$. It can be defined by an equation of the form

 \begin{displaymath}
\frac1n
\sum_{i=1}^n\chi \left( \frac{y_i - {\bf T}_n' {\bf x}_i}{S_n} \right) =
E_G \chi \left( e \right).
\end{displaymath} (11)
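To make equations (10) and (11) concrete, the following is a minimal sketch in Python (with NumPy) of how one might compute such an estimate by iteratively reweighted least squares. Several choices here are our own assumptions and are not taken from the text: $\phi$ is given the product form $\phi \left( {\bf x}, r \right) = w \left( {\bf x}\right) \psi \left( r \right)$ with Huber's $\psi$; the weights $w$ come from a non-robust leverage measure with a hypothetical cutoff; and $S_n$ is the normalized median absolute deviation of the residuals, used in place of a solution of (11). The names `huber_weight`, `leverage_weights`, `mad_scale` and `gm_estimate` are illustrative only.

```python
import numpy as np

def huber_weight(u, c=1.345):
    """psi(u) / u for Huber's psi, i.e. min(1, c / |u|)."""
    return np.minimum(1.0, c / np.maximum(np.abs(u), 1e-12))

def leverage_weights(X, b=None):
    """Assumed weights w(x) = min(1, b / d(x)), d(x)^2 = x' (X'X / n)^{-1} x."""
    n, p = X.shape
    M_inv = np.linalg.inv(X.T @ X / n)
    d = np.sqrt(np.einsum("ij,jk,ik->i", X, M_inv, X))
    if b is None:
        b = np.sqrt(2.0 * p)  # hypothetical cutoff; d^2 averages p over the sample
    return np.minimum(1.0, b / d)

def mad_scale(r):
    """Normalized MAD of the residuals (stands in for equation (11))."""
    return 1.4826 * np.median(np.abs(r - np.median(r)))

def gm_estimate(X, y, c=1.345, max_iter=500, tol=1e-8):
    """IRLS for sum_i w(x_i) psi(r_i / S) x_i = 0, with S updated by the MAD."""
    w_x = leverage_weights(X)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares start (not robust)
    for _ in range(max_iter):
        r = y - X @ beta
        scale = mad_scale(r)
        if scale <= 0:  # degenerate fit: more than half the residuals coincide
            break
        # Weighted least-squares step with weights w(x_i) * psi(u_i) / u_i.
        omega = w_x * huber_weight(r / scale, c)
        WX = X * omega[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, mad_scale(y - X @ beta)

# Illustration on simulated data with ten high-leverage outliers.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=n)
x[:10], y[:10] = 10.0, -20.0
X = np.column_stack([np.ones(n), x])
print("least squares:", np.linalg.lstsq(X, y, rcond=None)[0])
print("GM sketch:    ", gm_estimate(X, y)[0])
```

At a fixed point of the loop the weighted normal equations reduce to (10) for this particular $\phi$. Both the least-squares start and the leverage measure are themselves sensitive to outliers, so this is only meant to show the structure of the computation.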

By varying $\phi$ we obtain different types of estimates. For example, if $\phi$ only depends on the residuals $r_i$, i.e. $\phi \left(
{\bf x}, r \right) = \psi \left( r \right)$ for some function $\psi$, we get the class of M-estimates for regression. More generally, all the proposals can be written as

\begin{displaymath}\phi \left( {\bf x}, r \right) = w \left( {\bf x}\right) \cdot \psi \left(
r \ v \left( {\bf x}\right) \right)
\end{displaymath}

for different functions $w:{\mbox{I}\!\mbox{R}}^p \rightarrow {\mbox{I}\!\mbox{R}}$ and $v: {\mbox{I}\!\mbox{R}}^p \rightarrow {\mbox{I}\!\mbox{R}}$. The main idea is to penalize not only those observations with large residuals $r_i \left( {\bf T}_n
\right) = y_i - {\bf T}_n' {\bf x}_i$ but also the ones with high leverage (see Weisberg, 1985, page 111). Mallows' and Andrews' proposal corresponds to $v \left( {\bf x}\right) = 1$ (see Hill, 1977). Schweppe's function $\phi \left( {\bf x}, r \right) = w \left( {\bf x}\right) \ \psi \left(
r / w \left( {\bf x}\right) \right)$ is obtained when \( v \left( {\bf x}\right) =
1 / w \left( {\bf x}\right) \) (see Merrill and Schweppe, 1971). See also Hampel et al. (1986) for a more detailed discussion.

Maronna and Yohai (1981) showed that if the system of equations

 \begin{displaymath}
E_F \phi \left( {\bf x}, \frac{y - \beta' {\bf x}}{\sigma} \right)
{\bf x}= {\bf0}
\end{displaymath} (12)


 \begin{displaymath}
%
E_F \chi \left( \frac{y - \beta' {\bf x}}{\sigma} \right) =
E_G \chi \left( e \right)
\end{displaymath} (13)

has a unique solution $\eta_0 = \left( \beta_0,
\sigma_0 \right)$, then the GM-estimates defined by (10) and (11) are consistent for $\eta_0$ and asymptotically normal, with covariance matrix

 \begin{displaymath}
A^{-1} \
E_F \left( \Psi \left({\bf x},y,\eta_0\right) \Psi \left({\bf x},y,\eta_0\right)' \right) \
\left( A^{-1} \right)^t,
\end{displaymath} (14)

where

\begin{displaymath}A = E_F J \left({\bf x},y,\eta_0 \right),
\end{displaymath}

$\Psi$ is the $(p+1)$-vector of the functions involved in equations (12) and (13):

\begin{displaymath}\Psi \left( {\bf x}, y, \eta \right) = \left( \begin{array}{c}
\phi \left( {\bf x}, r \left( \eta \right) / \sigma \right) {\bf x} \\
\chi \left( r \left( \eta \right) / \sigma \right)
\end{array} \right)
\end{displaymath}

and $J$ is the $(p+1) \times (p+1)$ matrix of derivatives of $\Psi$ with respect to $\eta = \left( \beta,\sigma\right)$. This formula simplifies when the distribution $G$ of the errors is symmetric. In this case, ${\bf T}_n$ and $S_n$ are asymptotically independent and the covariance matrix (14) is

\begin{displaymath}B^{-1}
E \left( \phi \left( {\bf x}, \frac{y-\beta_0' {\bf x}}{\sigma_0}
\right)^2 {\bf x}{\bf x}' \right)
B^{-t},
\end{displaymath}

where

\begin{displaymath}B = E \left( \phi'
\left( {\bf x}, \frac{y-\beta_0'{\bf x}}{\sigma_0}
\right) {\bf x}{\bf x}' \right)
\end{displaymath}

and $\phi' \left( {\bf x}, r \right) =
\partial \phi / \partial r \left( {\bf x}, r \right)$. I want to stress that the uniqueness of the solution to equations (12) and (13) is a strong condition. Sufficient conditions for this property to hold are that the distribution of the errors is symmetric and that $\phi \left( {\bf x}, \cdot \right)$ is increasing for each ${\bf x}$ (Maronna and Yohai, 1981). See also Yohai and Maronna (1979) for the case when the explanatory variables are fixed (symmetry of the distribution of the errors is also needed to obtain consistency here).

A global measure of robustness is the breakdown-point (BP). Donoho and Huber (1983) gave the following definition for finite samples. Let $Z_n = \left( z_1, \ldots, z_n \right)$ be a random sample and $T_n \left( Z_n \right)$ be the estimate calculated with the sample $Z_n$. For each integer $m$ let

\begin{displaymath}b \left( m, T, Z_n \right) = \sup \left\vert T_{m+n} \left(
Z_n, W_m \right) - T_n \left( Z_n \right) \right\vert
\end{displaymath}

where the supremum is calculated over all the samples $W_m$ of size $m$. Define the breakdown-point of $T$ for the sample $Z_n$ as

\begin{displaymath}\epsilon^*_n \left( T, Z_n \right) = \min \left\{
m / (m+n) \ : \ b \left( m, T, Z_n \right) = \infty
\right\}.
\end{displaymath}
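As a rough numerical illustration of this definition, the sketch below (Python with NumPy) searches for the smallest $m$ at which an estimate is carried far away from its original value. The supremum over all samples $W_m$ cannot be computed directly, so we make the simplifying assumption that it is approached by placing all $m$ added points at one very large value; the function name `addition_breakdown` and the constants `big` and `threshold` are hypothetical choices of ours.

```python
import numpy as np

def addition_breakdown(estimator, z, big=1e12, threshold=1e6):
    """Smallest m / (m + n) at which adding m points equal to `big`
    moves the estimate by more than `threshold` (a proxy for b = infinity)."""
    z = np.asarray(z, dtype=float)
    n = z.size
    base = estimator(z)
    for m in range(1, 2 * n + 1):
        contaminated = np.concatenate([z, np.full(m, big)])
        if abs(estimator(contaminated) - base) > threshold:
            return m / (m + n)
    return None  # no breakdown detected for m up to 2n

rng = np.random.default_rng(0)
z = rng.normal(size=20)
print("mean:  ", addition_breakdown(np.mean, z))
print("median:", addition_breakdown(np.median, z))
```

For a sample of size $n = 20$ a single added point is enough to move the sample mean, which gives $1/21$, while the median only moves once $m = n$ points have been added, which gives $1/2$.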

The breakdown point is the smallest proportion of arbitrary observations that, when added to the sample, can carry the estimate beyond all bounds. We have the following result for the breakdown of the estimators defined by (10) and (11): $\mbox{BP} > 0$ if and only if $ \sup_{{\bf x},r} \left\vert {\bf x}\phi \left( {\bf x}, r
\right) \right\vert <
\infty$. In this case the breakdown point is positive, but it decreases to zero as the number of predictors increases, roughly as $1/(p+1)$ (see Maronna, Bustos, and Yohai, 1979). We see then that the BP of the GM-estimates cannot be 0.5 when $p > 1$. It is of interest to have a class of regression estimates with high breakdown point independent of the number of explanatory variables. The S-estimates for regression have this property.
Department Web Master
2000-05-29