# PROJECTION PURSUIT REGRESSION

### 谢益辉 / 2006-03-19

PROJECTION PURSUIT REGRESSION

Jerome H. Friedman
Stanford Linear Accelerator Center
Stanford University
Stanford, California 94305

Werner Stuetzle
Stanford Linear Accelerator Center
and
Department of Statistics
Stanford University
Stanford, California 94305

A new method for nonparametric multiple regression is presented. The procedure models the regression surface as a sum of general-smooth functions of linear combinations of the predictor variables in an iterative manner. It is more general than standard stepwise and stagewise regression procedures, does not require the definition of a metric in the predictor space, and lends itself to graphical interpretation.

(Submitted to Journal of the American Statistical Association)

1981年的论文，25年后我才看到……啥叫“学海无涯”？体会体会吧……

R中的ppr函数说明如下：

ppr                  package:stats                  R Documentation

**Projection Pursuit Regression**

**Description:**
Fit a projection pursuit regression model.

**Usage:**
ppr(x, ...)
## S3 method for class 'formula':
ppr(formula, data, weights, subset, na.action,
contrasts = NULL, ..., model = FALSE)
## Default S3 method:
ppr(x, y, weights = rep(1,n),
ww = rep(1,q), nterms, max.terms = nterms, optlevel = 2,
sm.method = c("supsmu", "spline", "gcvspline"),
bass = 0, span = 0, df = 5, gcvpen = 1, ...)

**Arguments:**
formula: a formula specifying one or more numeric response variables
and the explanatory variables.
x: numeric matrix of explanatory variables.  Rows represent
observations, and columns represent variables.  Missing
values are not accepted.
y: numeric matrix of response variables.  Rows represent
observations, and columns represent variables.  Missing
values are not accepted.
nterms: number of terms to include in the final model.
data: data frame from which variables specified in 'formula' are
preferentially to be taken.
weights: a vector of weights 'w_i' for each _case_.
ww: a vector of weights for each _response_, so the fit criterion
is the sum over case 'i' and responses 'j' of 'w_i ww_j (y_ij
- fit_ij)^2' divided by the sum of 'w_i'.
subset: an index vector specifying the cases to be used in the
training sample.  (NOTE: If given, this argument must be
named.)
na.action: a function to specify the action to be taken if 'NA's are
found. The default action is given by
'getOption("na.action")'. (NOTE: If given, this argument must
be named.)
contrasts: the contrasts to be used when any factor explanatory
variables are coded.
max.terms: maximum number of terms to choose from when building the
model.
optlevel: integer from 0 to 3 which determines the thoroughness of an
optimization routine in the SMART program. See the *Details*
section.
sm.method: the method used for smoothing the ridge functions.  The
default is to use Friedman's super smoother 'supsmu'.  The
alternatives are to use the smoothing spline code underlying
'smooth.spline', either with a specified (equivalent) degrees
of freedom for each ridge functions, or to allow the
smoothness to be chosen by GCV.
bass: super smoother bass tone control used with automatic span
selection (see 'supsmu'); the range of values is 0 to 10,
with larger values resulting in increased smoothing.
span: super smoother span control (see 'supsmu').  The default,
'0', results in automatic span selection by local cross
validation. 'span' can also take a value in '(0, 1]'.
df: if 'sm.method' is '"spline"' specifies the smoothness of each
ridge term via the requested equivalent degrees of freedom.
gcvpen: if 'sm.method' is '"gcvspline"' this is the penalty used in
the GCV selection for each degree of freedom used.
...: arguments to be passed to or from other methods.
model: logical.  If true, the model frame is returned.

**Details:**
The basic method is given by Friedman (1984), and is essentially
the same code used by S-PLUS's 'ppreg'.  This code is extremely
sensitive to the compiler used.
The algorithm first adds up to 'max.terms' ridge terms one at a
time; it will use less if it is unable to find a term to add that
makes sufficient difference.  It then removes the least
"_important_" term at each step until 'nterms' terms are left.
The levels of optimization (argument 'optlevel') differ in how
thoroughly the models are refitted during this process. At level 0
the existing ridge terms are not refitted.  At level 1 the
projection directions are not refitted, but the ridge functions
and the regression coefficients are. Levels 2 and 3 refit all the
terms and are equivalent for one response; level 3 is more careful
to re-balance the contributions from each regressor at each step
and so is a little less likely to converge to a saddle point of
the sum of squares criterion.

**Value:**
A list with the following components, many of which are for use by
the method functions.
call: the matched call
p: the number of explanatory variables (after any coding)
q: the number of response variables
mu: the argument 'nterms'
ml: the argument 'max.terms'
gof: the overall residual (weighted) sum of squares for the
selected model
gofn: the overall residual (weighted) sum of squares against the
number of terms, up to 'max.terms'.  Will be invalid (and
zero) for less than 'nterms'.
df: the argument 'df'
edf: if 'sm.method' is '"spline"' or '"gcvspline"' the equivalent
number of degrees of freedom for each ridge term used.
xnames: the names of the explanatory variables
ynames: the names of the response variables
alpha: a matrix of the projection directions, with a column for each
ridge term
beta: a matrix of the coefficients applied for each response to the
ridge terms: the rows are the responses and the columns the
ridge terms
yb: the weighted means of each response
ys: the overall scale factor used: internally the responses are
divided by 'ys' to have unit total weighted sum of squares.
fitted.values: the fitted values, as a matrix if 'q > 1'.
residuals: the residuals, as a matrix if 'q > 1'.
smod: internal work array, which includes the ridge functions
evaluated at the training set points.
model: (only if 'model=TRUE') the model frame.

**References:**
Friedman, J. H. and Stuetzle, W. (1981) Projection pursuit
regression. _Journal of the American Statistical Association_,
*76*, 817-823.
Friedman, J. H. (1984) SMART User's Guide. Laboratory for
Computational Statistics, Stanford University Technical Report No.
1.
Venables, W. N. &amp; Ripley, B. D. (2002) _Modern Applied Statistics
with S._  Springer.

'plot.ppr', 'supsmu', 'smooth.spline'

**Examples:**
# Note: your numerical values may differ
attach(rock)
area1 <- area/10000; peri1 <- peri/10000
rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape,
data = rock, nterms = 2, max.terms = 5)
rock.ppr
# Call:
# ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock,
#     nterms = 2, max.terms = 5)
#
# Goodness of fit:
#  2 terms  3 terms  4 terms  5 terms
# 8.737806 5.289517 4.745799 4.490378
summary(rock.ppr)
# .....  (same as above)
# .....
#
# Projection direction vectors:
#       term 1      term 2
# area1  0.34357179  0.37071027
# peri1 -0.93781471 -0.61923542
# shape  0.04961846  0.69218595
#
# Coefficients of ridge terms:
#    term 1    term 2
# 1.6079271 0.5460971
par(mfrow=c(3,2))# maybe: , pty="s")
plot(rock.ppr, main="ppr(log(perm)~ ., nterms=2, max.terms=5)")
plot(update(rock.ppr, bass=5), main = "update(..., bass = 5)")
plot(update(rock.ppr, sm.method="gcv", gcvpen=2),
main = "update(..., sm.method=\"gcv\", gcvpen=2)")
cbind(perm=rock$perm, prediction=round(exp(predict(rock.ppr)), 1)) detach()  S-Plus中的ppreg函数说明如下： **Projection Pursuit Regression** **DESCRIPTION:** Computes an exploratory nonlinear regression method that models y as a sum of nonparametric functions of projections of the x variables. **USAGE:** ppreg(x, y, min.term, max.term=min.term, wt=rep(1, nrow(x)), rwt=rep(1, ncol(y)), xpred=NULL, optlevel=2, bass=0, span="cv") **REQUIRED ARGUMENTS:** x matrix of explanatory variables. Rows represent observations, and columns represent variables. Missing values are not accepted. The ppreg function is not very useful if x contains only one column. y vector or matrix of response variables. Rows represent observations, and columns represent variables. Missing values are not accepted. min.term minimum number of terms to include in the model; ppreg will return complete results only for this minimum number of terms. **OPTIONAL ARGUMENTS:** max.term maximum number of terms to choose from in the model. wt vector of weights for the observations. The length must be the same as the number of rows in x. Missing values are not accepted. rwt vector of weights for the responses. The length must be the same as the number of columns in y. Missing values are not accepted. xpred= vector or matrix of explanatory variables for which responses are to be estimated. If xpred is omitted, then the original x data will be regressed on, and the residuals will be returned in ypred. Missing values are not accepted. optlevel= integer from 0 to 3 which determines the throughness of an optimization routine in ppreg. A higher number means more optimization. bass= super smoother bass tone control used with automatic span selection (see supsmu); the range of values is 0 to 10, with increasing values resulting in increased smoothing. span= super smoother span control (see supsmu). The default is "cv", which results in automatic span selection by local cross validation. span can also take a value from 0 < span <= 1. **VALUE:** a list containing the following components: ypred matrix of predicted values for y given the matrix xpred. If xpred was not input, then ypred contains the residuals for the model fit. fl2 the sum of squared residuals divided by the total corrected sums of squares. alpha a minterm by ncol(x) matrix of the direction vectors, alpha[m,j] contains the j-th component of the direction in the m-th term. beta a minterm by ncol(y) matrix of term weights, beta[m,k] contains the value of the term weight for the m-th term and the k-th response variable. z a matrix of values to be plotted against zhat. z[i,m] contains the z value of the i-th observation in the m-th model term, i.e., z equals x %*% t(alpha). The columns of z have been sorted. zhat a matrix of function values to be plotted. zhat[i,m] is the smoothed ordinate value (phi) of the i-th observation in the m-th model term evaluated at z[i,m]. allalpha a three dimensional array, the [m,j,M] element contains the j-th component of the direction in the m-th model term for the solution consisting of M terms. Values are zero for M less than minterm. allbeta a three dimensional array, the [m,k,M] element contains the term weight for the m-th term and the k-th response variable for the solution consisting of M terms. Values are zero for M less than minterm. esq esq[M] contains the fraction of unexplained variance for the solution consisting of M terms. Values are zero for M less than minterm. esqrsp matrix that is ncol(y) by maxterm containing the fraction of unexplained variance for each response. esqrsp[k,M] is for the k-th response variable for the solution consisting of M terms, for M ranging from min.term to max.term. Other columns are zero. DETAILS: The z component of the result is sorted, thus it can not be compared with the original data. **REFERENCES:** Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association 76, 817-823. The chapter "Regression and Smoothing for Continous Response Data" in the S-PLUS Guide to Statistical and Mathematical Analysis. **SEE ALSO:** ace , avas , supsmu . **EXAMPLES:** x1 <- rnorm(100) ; x2 <- rnorm(100) ; eps <- rnorm(100, 0, .1) x <- matrix(c(x1, x2), 100, 2) y <- x1*x2 + eps # Set up a matrix of predictor values. xpred <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1), 4, 2, byrow=T) # Use ppreg with unit weights for both the observations and # the response, and a 2 term regression model (picked from 3 terms). a <- ppreg(x, y, 2, 3, xpred=xpred) # Plot the function values versus their abscissas, to look for structure. matplot(a$z, a\$zhat)