MCA — MULTIPLE CLASSIFICATION ANALYSIS
MCA examines the relationships between several categorical independent variables and a single dependent variable, and determines the effects of each predictor before and after adjustment for its inter-correlations with other predictors in the analysis. It also provides information about the bivariate and multivariate relationships between the predictors and the dependent variable.
The dependent variable must be measured on an interval scale or be a dichotomy. Predictor variables must be categorical, preferably with six or fewer categories.
See Andrews, F. M., J. N. Morgan, J. A. Sonquist and L. Klem. Multiple Classification Analysis. Second edition. Ann Arbor: Institute for Social Research, The University of Michigan, 1973 for a complete description of the methodology used.
MCA produces:
Dependent Variable Statistics: For the dependent variable (Y):
Grand mean
Standard deviation (square root of the unbiased estimator of the population variance)
Sum of Y
Sum of Y-squared
Total sum of squares
Explained sum of squares
Residual sum of squares
Number of cases used in the analysis
The sum of weights
Independent Variable Category Statistics: For each category of an independent variable:
The number of cases (raw, weighted, and percentages)
Mean and standard deviation
Deviation of the category mean (unadjusted and adjusted)
Adjusted class mean and MCA coefficient
Eta and eta squared
Partial beta and beta-squared coefficients
Unadjusted and adjusted sum of squares
Bivariate frequency tables for every pair of predictors (optional)
One-Way Analysis of Variance Summary Statistics: If only one independent variable is specified, the following are printed:
Eta squared
Adjustment factor
Adjusted eta and eta squared
Total sum of squares
Between-mean sum of squares
Within-groups sum of squares
F value (degrees of freedom are printed)
Interpretation of Results
(from Multiple Classification Analysis, Andrews, Morgan, et al., 1973)
The major interpretation in a MCA is of the adjusted and unadjusted coefficients printed out for each subclass. In a population where there was no correlation among the predictors, the observations in one class of characteristic A would be distributed over all classes of the other characteristics in a fashion identical to the way in which those in other classes of A were distributed. Hence, the unadjusted mean Y for each subclass of A would be an unbiased estimate of the effect of belonging to that class of characteristic A. In the real world, however, characteristics are correlated. Young people are more likely to be in lower income groups, and in higher education groups than are older people. The multivariate process is essentially one of adjusting for these “non-orthogonalities.” The adjusted means are estimates of what the mean would have been if the group had been exactly like the total population in its distribution over all the other predictor classifications. It is useful not only to have the “pure” effects of each class adjusted for all the other characteristics, but also to see how these adjusted effects differ from the unadjusted effects.
The adjusted coefficients for any predictor may be considered an estimate of the effect of that predictor alone “holding constant” all other predictors in the analysis. Differences between the adjusted and unadjusted coefficients can be analyzed, and explanations for these differences may often be found in the two-way tables of predictors. It is often valuable to compare the coefficients within a predictor to see whether there is a pattern or, possibly, a lack of pattern which is of theoretical interest.
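The adjustment described above can be illustrated with a small sketch. It assumes the additive model is fitted by ordinary least squares on dummy variables and that each predictor's coefficients are then recentred so that their case-weighted mean is zero, which expresses them as deviations from the grand mean. The function name mca_deviations and the toy data are hypothetical; this is not the program's own code, and it ignores case weights.

```python
import numpy as np

def mca_deviations(y, predictors):
    """Return (grand mean, unadjusted deviations, adjusted deviations).

    predictors maps a predictor name to an array of category codes.
    """
    y = np.asarray(y, dtype=float)
    grand_mean = float(y.mean())
    columns = [np.ones(len(y))]  # intercept column
    layout = []
    unadjusted = {}
    for name, codes in predictors.items():
        codes = np.asarray(codes)
        cats = np.unique(codes)
        # Unadjusted deviation: class mean minus grand mean.
        unadjusted[name] = {
            c: float(y[codes == c].mean() - grand_mean) for c in cats.tolist()
        }
        # One dummy column per category except the first (reference coding).
        for c in cats[1:]:
            columns.append((codes == c).astype(float))
        layout.append((name, codes, cats))

    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    adjusted = {}
    pos = 1  # position 0 holds the intercept
    for name, codes, cats in layout:
        k = len(cats) - 1
        raw = np.concatenate(([0.0], coef[pos:pos + k]))
        pos += k
        # Recentre so the case-weighted mean of the coefficients is zero.
        share = np.array([(codes == c).mean() for c in cats])
        centred = raw - share @ raw
        adjusted[name] = dict(zip(cats.tolist(), centred.tolist()))
    return grand_mean, unadjusted, adjusted

# Toy data in which characteristics A and B are correlated.
a = np.array([0] * 5 + [1] * 5)
b = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
y = 10.0 * a + 20.0 * b
grand, unadj, adj = mca_deviations(y, {"A": a, "B": b})

# Balanced (uncorrelated) design built from the same effects.
a_bal = np.array([0, 0, 0, 0, 1, 1, 1, 1])
b_bal = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_bal = 10.0 * a_bal + 20.0 * b_bal
grand_bal, unadj_bal, adj_bal = mca_deviations(y_bal, {"A": a_bal, "B": b_bal})
```

In the correlated toy data the unadjusted deviations for A overstate its effect because class 1 of A is mostly made up of class 1 of B; the adjusted deviations recover the smaller effect built into the data. In the balanced data the two sets of deviations coincide, as the text says they would when there is no correlation among the predictors.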
Presentation of Results
It is most informative to present first the etas and betas, measures of the relative importance of each predictor singly and in competition with the others, and then to present the unadjusted and adjusted subgroup averages, together with a detailed description of what the subclasses represent and the number of cases in each. The number of cases is an indicator of the potential variability of the estimates. Multiple R-squared (unadjusted) and multiple R-squared (adjusted) are also usually reported.
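One rough way to see why the number of cases matters is the standard error of an unadjusted class mean, which is the class standard deviation divided by the square root of the class size. The sketch below applies this to three classes from the occupation (V251) listing in the example later in this section. It is an illustration only: it assumes simple random sampling, and it does not give the standard error of an adjusted coefficient, which depends on the other predictors as well.

```python
import math

# (number of cases, standard deviation) for three V251 classes,
# transcribed from the example listing.
classes = {
    0: (68, 4161.586),
    3: (14, 11944.88),
    9: (8, 2196.427),
}

# Standard error of each unadjusted class mean: sd / sqrt(n).
std_error = {c: sd / math.sqrt(n) for c, (n, sd) in classes.items()}

for c, se in std_error.items():
    print(f"class {c}: n = {classes[c][0]:3d}, standard error = {se:.1f}")
```

Class 3, with only 14 cases and a large spread, has a far less precise mean than class 0 with 68 cases, which is why the case counts belong next to the subgroup averages.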
Examples of presentation of MCA results can be found in Barfield and Morgan (1969), Blumenthal, Kahn, Andrews and Head (1972), Johnston and Bachman (1972), Johnston (1973), Katona, Strumpel and Zahn (1971), Morgan, David, Cohen and Brazer (1962), Mueller (1969), and Pelz and Andrews (1966).
Example: Predicting income (V268) from occupation, marital status, and education.
DEPENDENT VARIABLE (Y) = V268 TOTAL FAMILY INC
MEAN 10528.32
STANDARD DEVIATION 7553.407
SUM OF Y 3147968.
SUM OF Y SQUARE .5014490E+11
TOTAL SUM OF SQUARES .1700208E+11
EXPLAINED SUM OF SQUARES .8352816E+10
RESIDUAL SUM OF SQUARES .8649263E+10
NUMBER OF CASES 299
PREDICTOR V251 OCCUPATION B
                                             UNADJUSTED
       NO OF   SUM OF             CLASS  DEVIATION FROM                               STANDARD
CLASS  CASES  WEIGHTS      %       MEAN      GRAND MEAN  COEFFICIENT  ADJUSTED MEAN  DEVIATION
    0     68       68   22.7   4592.206       -5936.115    -4256.094       6272.228   4161.586
    1     30       30   10.0   16396.07        5867.746     1165.547       11693.87   9158.358
    2     22       22    7.4   19716.09        9187.770     7577.927       18106.25   6896.417
    3     14       14    4.7   15615.71        5087.393     3987.124       14515.45   11944.88
    4     22       22    7.4   9988.636       -539.6847     547.4017       11075.72   5269.902
    5     42       42   14.0   12596.05        2067.727     1663.999       12192.32   5372.033
    6     36       36   12.0   10407.06       -121.2655     461.7471       10990.07   4254.318
    7     36       36   12.0   7910.333       -2617.988    -1574.841       8953.480   5063.992
    8     21       21    7.0   11960.00        1431.679     1774.740       12303.06   6163.097
    9      8        8    2.7   4009.000       -6519.321    -5901.890       4626.431   2196.427
ETA-SQUARE = .380238 BETA-SQUARE .195452
ETA = .616634 BETA .442099
ETA-SQUARE (ADJ) = .360938
ETA (ADJ) = .600781
UNADJUSTED DEVIATION SS = .646484E+10
ADJUSTED DEVIATION SS = .332309E+10
PREDICTOR V30 MARITAL STATUS
                                             UNADJUSTED
       NO OF   SUM OF             CLASS  DEVIATION FROM                               STANDARD
CLASS  CASES  WEIGHTS      %       MEAN      GRAND MEAN  COEFFICIENT  ADJUSTED MEAN  DEVIATION
    1    221      221   73.9   12449.90        1921.575     1123.470       11651.79   7563.060
    2     17       17    5.7   7115.882       -3412.439    -2828.932       7699.389   4465.809
    3     41       41   13.7   3732.463       -6795.858    -2956.380       7571.941   2752.520
    4     16       16    5.4   5748.750       -4779.571    -4603.841       5924.480   4340.339
    5      4        4    1.3   7640.000       -2888.321    -1330.495       9197.826   8306.206
ETA-SQUARE = .194470 BETA-SQUARE .658475E-01
ETA = .440988 BETA .256608
ETA-SQUARE (ADJ) = .183511
ETA (ADJ) = .428382
UNADJUSTED DEVIATION SS = .330640E+10
ADJUSTED DEVIATION SS = .111955E+10
PREDICTOR SUMMARY STATISTICS
PREDICTOR V32 EDUC OF HEAD
                                             UNADJUSTED
       NO OF   SUM OF             CLASS  DEVIATION FROM                               STANDARD
CLASS  CASES  WEIGHTS      %       MEAN      GRAND MEAN  COEFFICIENT  ADJUSTED MEAN  DEVIATION
    1     16       16    5.4   5973.375       -4554.946    -564.7311       9963.590   6006.004
    2     71       71   23.7   6579.493       -3948.828    -2085.182       8443.139   4868.404
    3     44       44   14.7   11013.86        485.5426     397.8526       10926.17   8730.284
    4     70       70   23.4   10257.70       -270.6211    -789.0604       9739.261   6009.121
    5     37       37   12.4   11210.03        681.7060    -1273.955       9254.365   5760.727
    6     30       30   10.0   14161.87        3633.546     2836.744       13365.06   7470.542
    7     17       17    5.7   16022.71        5494.385     3034.737       13563.06   6769.267
    8     14       14    4.7   19327.71        8799.393     7518.277       18046.60   12470.24
ETA-SQUARE = .203802 BETA-SQUARE .949135E-01
ETA = .451445 BETA .308080
ETA-SQUARE (ADJ) = .184650
ETA (ADJ) = .429709
UNADJUSTED DEVIATION SS = .346507E+10
ADJUSTED DEVIATION SS = .161373E+10
ANALYSIS SUMMARY STATISTICS
DEPENDENT VARIABLE (Y) = V268 TOTAL FAMILY INC
R-SQUARED(UNADJUSTED) = PROP. OF VARIATION EXPLAINED BY FITTED MODEL: .49128
ADJUSTMENT FOR DEGREES OF FREEDOM = 1.07194
*** MULTIPLE R (ADJUSTED) = .67430 MULTIPLE R-SQUARED (ADJUSTED) = .45468
LISTING OF BETAS IN DESCENDING ORDER
RANK VAR. NO. NAME BETA
1 V251 OCCUPATION B .442099
2 V32 EDUC OF HEAD .308080
3 V30 MARITAL STATUS .256608
*** MULTIPLE R (ADJUSTED) = .67430 MULTIPLE R-SQUARED (ADJUSTED) = .45468
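The summary statistics in this listing can be cross-checked from the printed sums of squares. The sketch below uses relationships that are inferred from the numbers in the example and that reproduce them to the printed precision. They are not taken from the program's source, so treat the formulas as assumptions. In particular, the degrees-of-freedom adjustment is assumed to be (N - 1) / (N + P - C - 1), where P is the number of predictors and C the total number of classes across predictors.

```python
import math

# Values transcribed from the example output.
N_CASES = 299
TOTAL_SS = 0.1700208e11
EXPLAINED_SS = 0.8352816e10

# Per predictor: (number of classes, unadjusted deviation SS, adjusted deviation SS).
PREDICTORS = {
    "V251": (10, 0.646484e10, 0.332309e10),
    "V30": (5, 0.330640e10, 0.111955e10),
    "V32": (8, 0.346507e10, 0.161373e10),
}

def predictor_stats(n_classes, unadj_ss, adj_ss):
    """Eta and beta statistics for one predictor (assumed formulas)."""
    eta_sq = unadj_ss / TOTAL_SS
    beta_sq = adj_ss / TOTAL_SS
    # Assumed degrees-of-freedom correction for eta-squared.
    eta_sq_adj = 1.0 - (1.0 - eta_sq) * (N_CASES - 1) / (N_CASES - n_classes)
    return {
        "eta_sq": eta_sq,
        "eta": math.sqrt(eta_sq),
        "beta_sq": beta_sq,
        "beta": math.sqrt(beta_sq),
        "eta_sq_adj": eta_sq_adj,
        "eta_adj": math.sqrt(eta_sq_adj),
    }

stats = {name: predictor_stats(*values) for name, values in PREDICTORS.items()}

# Whole-model statistics.
r_sq = EXPLAINED_SS / TOTAL_SS
total_classes = sum(values[0] for values in PREDICTORS.values())
adjustment = (N_CASES - 1) / (N_CASES + len(PREDICTORS) - total_classes - 1)
r_sq_adj = 1.0 - (1.0 - r_sq) * adjustment
multiple_r_adj = math.sqrt(r_sq_adj)

# Betas in descending order, as in the listing.
ranking = sorted(stats, key=lambda name: stats[name]["beta"], reverse=True)
```

Under these assumptions the computed values agree with the listing: for example, the adjustment factor is 298/278, and the ranking by beta places occupation first, education second, and marital status third.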