SEARCH is a binary segmentation procedure used to develop a predictive model for a dependent variable. It searches among a set of predictor variables for those that most increase the researcher's ability to account for the variance or distribution of a dependent variable. The question "What dichotomous split on which single predictor variable will give the maximum improvement in our ability to predict values of the dependent variable?", embedded in an iterative scheme, is the basis for the algorithm used in this command.

SEARCH divides the sample, through a series of binary splits, into a mutually exclusive series of subgroups.

The subgroups are chosen so that, at each step in the procedure, the split into the two new subgroups accounts for more of the variance or distribution (reduces the predictive error more) than a split into any other pair of subgroups. The predictor variables may be ordinally or nominally scaled; the dependent variable may be continuous or categorical.

Research questions are often of the type "What is the effect of X on Y?"
But the answer requires answering a larger question "What set of variables
and their combinations seems to affect Y?" With SEARCH a variable X that
seems to have an overall effect may have its apparent influence disappear
after a few splits, with the final groups, while varying greatly as to their
levels of Y, showing no effect of X. The implication is that, given other
things, X does not really affect Y.

Conversely, while X may seem to have no overall effect on Y, after
splitting the sample into groups that take account of other powerful
factors, there may be some groups in which X has a substantial effect. Think
of economists' notion of the actor at the margin. A motivating factor might
affect those not constrained or compelled by other forces. Those who, other
things considered, have a 40-60 percent probability of acting, might show
substantial response to some motivator. Or a group with very high or very
low likelihood of acting might be discouraged or encouraged by some
motivator. But if X has no effect on any of the subgroups generated by
SEARCH, one has pretty good evidence that it does not matter, even in an
interactive way.

SEARCH makes a sequence of binary divisions of a dataset in such a way
that each split maximally reduces the error variance or increases the
information (chi-square or rank correlation). It finds the best split on
each predictor and takes the best of the best.
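
One step of that scheme can be sketched as follows. This is a minimal illustration under the "means" criterion, not the SEARCH implementation itself; the predictor, data, and function names are hypothetical.

```python
# A sketch of one SEARCH step: try every dichotomous split on a single
# ordinal predictor and keep the one that most reduces the error sum of
# squares of the dependent variable y. All data below are hypothetical.

def sse(y):
    """Sum of squared deviations of y around its mean."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)

def best_split(x, y):
    """Best dichotomous split on one ordinal predictor x.

    Cases with x <= cut form one subgroup, the rest the other.
    Returns (cut, gain), where gain is the error-variance reduction.
    """
    parent = sse(y)
    best_cut, best_gain = None, 0.0
    for cut in sorted(set(x))[:-1]:          # every candidate split point
        left = [v for xv, v in zip(x, y) if xv <= cut]
        right = [v for xv, v in zip(x, y) if xv > cut]
        gain = parent - (sse(left) + sse(right))
        if gain > best_gain:
            best_cut, best_gain = cut, gain
    return best_cut, best_gain

# SEARCH repeats this over all predictors, splits on the best of the
# best, and then recurses into the two new subgroups.
cut, gain = best_split([1, 1, 2, 2, 3, 3], [2.0, 3.0, 2.5, 3.5, 9.0, 10.0])
```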

Splitting criteria:

There can be four splitting criteria, based on the dependent variable
type:

Means

Regressions

Classifications

Ranks

The splitting criterion in each case is the reduction
in ignorance (error variance, etc.) or increase in information. Terms like
classification and regression trees should be replaced by binary
segmentation or unrestricted analysis of variance components, or searching
for structure. With rich bodies of data, many possible non-linearities and
non-additivities, and many competing theories, the usual restrictions and
assumptions that one is testing a single model are not appropriate. What
does remain, however, is a systematic, pre-stated searching strategy that is
reproducible, not a free ransacking.

Means. For means the splitting criterion is the
reduction in error variance, that is, the sum of squares around the mean,
using two subgroup means instead of one parent group mean.
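
For a single split this reduction has a standard closed form. Writing $n_1, n_2$ for the subgroup sizes and $\bar y_1, \bar y_2$ for the subgroup means, the error-variance reduction (the between-group sum of squares) is

```latex
\Delta SS \;=\; \frac{n_1 n_2}{n_1 + n_2}\,\left(\bar y_1 - \bar y_2\right)^2
```

so the criterion favors splits that separate the subgroup means widely while keeping both subgroups reasonably large.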

Regressions. For regressions (y=a+bx) the splitting
criterion is the reduction in error variance from using two regressions
rather than one.
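
The idea can be sketched in a few lines; this is an illustration with hypothetical data, not the SEARCH code, and it relates to the Chow (1960) test cited in the references.

```python
# A sketch of the regression criterion: hypothetical data in which the
# two subgroups have opposite slopes for the line y = a + b*x.

def resid_ss(x, y):
    """Residual sum of squares from the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))

x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [1.0, 2.1, 2.9, 4.2, 5.0, 4.1, 2.8, 2.0]   # rising, then falling
half = len(x) // 2

# One pooled regression fits poorly; two subgroup regressions fit well,
# so this split explains most of the residual variation.
gain = resid_ss(x, y) - (resid_ss(x[:half], y[:half]) +
                         resid_ss(x[half:], y[half:]))
```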

Classifications (Chi option). For classifications (categorical dependent variable), the splitting criterion is the likelihood-ratio chi-square for dividing the parent group into two subgroups.
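
The likelihood-ratio chi-square for a candidate split can be computed directly; the counts below are hypothetical (rows are the two candidate subgroups, columns are categories of the dependent variable).

```python
# A sketch of the Chi criterion: G^2 = 2 * sum O * ln(O / E), where E is
# the expected count under independence of subgroup and category.
import math

def g2(table):
    """Likelihood-ratio chi-square for a 2 x k contingency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            if obs:                      # empty cells contribute nothing
                exp = rows[i] * cols[j] / total
                stat += 2 * obs * math.log(obs / exp)
    return stat

# Hypothetical split of a parent group over a 3-category dependent variable:
stat = g2([[30, 10, 10],
           [10, 20, 20]])
```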

Ranks (Tau option). For rankings (ordered dependent variable), the splitting criterion is Kendall's tau-b, a rank correlation measure.
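
Tau-b can be computed by direct pair counting with the usual tie corrections; the data below are hypothetical (a dichotomous split indicator against an ordered outcome), and this sketch is illustrative rather than SEARCH's own routine.

```python
# A sketch of the Tau criterion: Kendall's tau-b by brute-force pair
# counting, tau_b = (C - D) / sqrt((n0 - Tx) * (n0 - Ty)), where Tx and
# Ty count pairs tied on x and on y.
import math

def tau_b(x, y):
    """Kendall's tau-b rank correlation."""
    conc = disc = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
                if dy == 0:
                    ties_y += 1
            elif dy == 0:
                ties_y += 1
            elif (dx > 0) == (dy > 0):
                conc += 1                # concordant pair
            else:
                disc += 1                # discordant pair
    n0 = n * (n - 1) // 2
    return (conc - disc) / math.sqrt((n0 - ties_x) * (n0 - ties_y))

t = tau_b([0, 0, 0, 1, 1, 1], [1, 2, 2, 2, 3, 3])
```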

The major components of output:

The analysis of variance or distribution on final groups (except for “analysis=tau”)

The split summary

The final group summary

Summary table of best splits for each predictor for each group (except for “analysis=tau”)

The predictor summary table. You may request the first group (PRINT=FIRST), the final groups (PRINT=FINAL), or all groups (PRINT=TABLE). The tables are printed in reverse group order, i.e., last group first and first group last.

__Group Tree Structure__

A structure table with entries for each group, numbered in order and indented, so that one can easily see the pedigree of each final group and its detail.

References:

Agresti, Alan (1996), *An Introduction to Categorical Data Analysis*,
New York: John Wiley & Sons, Inc.

Chow, G. (1960), "Tests of Equality between Sets of
Coefficients in Two Linear Regressions," *Econometrica*, 28:591-605.

Dunn, Olive Jean, and Virginia A. Clark (1974), *Applied
Statistics: Analysis of Variance and Regression*, New York: Holt, Rinehart
and Winston.

Gibbons, Jean Dickinson (1997), *Nonparametric Methods for
Quantitative Analysis*, 3rd edition, Syracuse: American Sciences Press.

Hays, William (1988), *Statistics*, 4th edition, New York:
Holt, Rinehart, & Winston.

Klem, Laura (1974), "Formulas and Statistical
References," in *Osiris III*, Volume 5, Ann Arbor: Institute for
Social Research.

Sonquist, J. A., E. L. Baker and J. N. Morgan (1974), *Searching for Structure*,
revised edition, Ann Arbor: Institute for Social Research, The University of
Michigan.

Example: investigating income (V268)

ANALYSIS TYPE: MEANS

Dependent variable: 268 Income

Predictor variables: 32 37 251 30

The number of cases is 326

The partitioning ends with 9 final groups

The variation explained is 38.2 percent

One-way Analysis of Final Groups

Source       Variation      DF

Explained    .701177E+10     8

Error        .113438E+11   317

Total        .183555E+11   325
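
The 38.2 percent figure above can be checked directly from the analysis-of-variance output: it is the explained variation as a share of the total.

```python
# The "variation explained" is the explained sum of squares divided by
# the total sum of squares, taken from the one-way analysis of final groups.
explained = 0.701177e10
total = 0.183555e11
pct = 100 * explained / total   # about 38.2 percent
```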

Split Summary Table

Group 1, N=326

Mean(Y)=10451.0, Var(Y)=.564786E+08, Variation=.183555E+11

Split on V37: RACE, Var expl=.216040E+08, Significance=.544344

Into Group 2, Codes 1

And Group 3, Codes 0,2-9

Group 2, N=299

Mean(Y)=10528.3, Var(Y)=.570540E+08, Variation=.170021E+11

Split on V30: MARITAL STATUS, Var expl=.312812E+10, Significance=0.000100

Into Group 4, Codes 1

And Group 5, Codes 2-5

Group 4, N=221

Mean(Y)=12449.9, Var(Y)=.571999E+08, Variation=.125840E+11

Split on V32: EDUC OF HEAD, Var expl=.173944E+10, Significance=0.000100

Into Group 6, Codes 1-5

And Group 7, Codes 6-8

Group 6, N=171

Mean(Y)=10932.9, Var(Y)=.430128E+08, Variation=.731217E+10

Split on V251: OCCUPATION B, Var expl=.140900E+10, Significance=0.000100

Into Group 8, Codes 0

And Group 9, Codes 1-9

Group 9, N=142

Mean(Y)=12230.1, Var(Y)=.402303E+08, Variation=.567247E+10

Split on V251: OCCUPATION B, Var expl=.423362E+09, Significance=0.001380

Into Group 10, Codes 1-3

And Group 11, Codes 4-9

Group 11, N=115

Mean(Y)=11393.4, Var(Y)=.249652E+08, Variation=.284603E+10

Split on V251: OCCUPATION B, Var expl=.495146E+08, Significance=.156284

Into Group 12, Codes 4-6

And Group 13, Codes 7-9

Group 12, N=69

Mean(Y)=11929.2, Var(Y)=.212965E+08, Variation=.144816E+10

Split on V32: EDUC OF HEAD, Var expl=.571610E+08, Significance=0.097853

Into Group 14, Codes 1-3

And Group 15, Codes 4,5

Group 5, N=78

Mean(Y)=5083.86, Var(Y)=.167531E+08, Variation=.128999E+10

Split on V251: OCCUPATION B, Var expl=.183562E+09, Significance=0.000992

Into Group 16, Codes 0

And Group 17, Codes 1,2,4-9

Final Group Summary Table

Group 3, N=27

Mean(Y)=9594.30, Var(Y)=.512249E+08, Variation=.133185E+10

Group 7, N=50

Mean(Y)=17638.2, Var(Y)=.720890E+08, Variation=.353236E+10

Group 8, N=29

Mean(Y)=4580.97, Var(Y)=.823915E+07, Variation=.230696E+09

Group 10, N=27

Mean(Y)=15793.6, Var(Y)=.924261E+08, Variation=.240308E+10

Group 13, N=46

Mean(Y)=10589.8, Var(Y)=.299634E+08, Variation=.134835E+10

Group 14, N=28

Mean(Y)=13030.6, Var(Y)=.309307E+08, Variation=.835128E+09

Group 15, N=41

Mean(Y)=11177.0, Var(Y)=.138968E+08, Variation=.555873E+09

Group 16, N=35

Mean(Y)=3383.49, Var(Y)=.515942E+07, Variation=.175420E+09

Group 17, N=43

Mean(Y)=6467.88, Var(Y)=.221668E+08, Variation=.931006E+09

Percent Total Variation Explained by Best Split for Each Group (*=Final Groups)

          1      2     3*     4      5      6     7*     8*     9    10*
V32   12.00  11.90   0.00   9.48   0.86   3.62   0.00   0.00   0.68   0.00
V37    0.12   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
V251  18.12  16.90   0.00   9.14   1.00   7.68   0.00   0.00   2.31   0.00
V30   17.92  17.04   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00

Percent Total Variation Explained by Best Split for Each Group (*=Final Groups) - continued

         11     12    13*    14*    15*    16*    17*
V32    0.16   0.31   0.00   0.00   0.00   0.00   0.00
V37    0.00   0.00   0.00   0.00   0.00   0.00   0.00
V251   0.27   0.01   0.00   0.00   0.00   0.00   0.00
V30    0.00   0.00   0.00   0.00   0.00   0.00   0.00

Group TREE Structure

Group 1: All Cases
  N=326, Mean(Y)=10451.0
  Group 2 V37: RACE, Codes 1
    N=299, Mean(Y)=10528.3
    Group 4 V30: MARITAL STATUS, Codes 1
      N=221, Mean(Y)=12449.9
      Group 6 V32: EDUC OF HEAD, Codes 1-5
        N=171, Mean(Y)=10932.9
        Group 8 V251: OCCUPATION B, Codes 0
          N=29, Mean(Y)=4580.97
        Group 9 V251: OCCUPATION B, Codes 1-9
          N=142, Mean(Y)=12230.1
          Group 10 V251: OCCUPATION B, Codes 1-3
            N=27, Mean(Y)=15793.6
          Group 11 V251: OCCUPATION B, Codes 4-9
            N=115, Mean(Y)=11393.4
            Group 12 V251: OCCUPATION B, Codes 4-6
              N=69, Mean(Y)=11929.2
              Group 14 V32: EDUC OF HEAD, Codes 1-3
                N=28, Mean(Y)=13030.6
              Group 15 V32: EDUC OF HEAD, Codes 4,5
                N=41, Mean(Y)=11177.0
            Group 13 V251: OCCUPATION B, Codes 7-9
              N=46, Mean(Y)=10589.8
      Group 7 V32: EDUC OF HEAD, Codes 6-8
        N=50, Mean(Y)=17638.2
    Group 5 V30: MARITAL STATUS, Codes 2-5
      N=78, Mean(Y)=5083.86
      Group 16 V251: OCCUPATION B, Codes 0
        N=35, Mean(Y)=3383.49
      Group 17 V251: OCCUPATION B, Codes 1,2,4-9
        N=43, Mean(Y)=6467.88
  Group 3 V37: RACE, Codes 0,2-9
    N=27, Mean(Y)=9594.30