MNA — MULTIVARIATE NOMINAL ANALYSIS
MNA performs a multivariate analysis of nominal-scale dependent variables. It runs a series of parallel dummy-variable regressions, one for each code of the dependent variable, with that code dichotomized into a 0-1 variable. The program’s major use is to produce an additive multivariate model that relates a set of predictors to the dependent variable through a set of coefficients analogous to MCA (Multiple Classification Analysis) coefficients.
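As a rough illustration of the parallel regressions just described, the Python sketch below dichotomizes each dependent-variable code into a 0-1 variable and fits one additive dummy-variable model per code. It is not the program’s own solver: it uses simple iterative backfitting (each predictor’s coefficients are set to the mean partial residual within each of its codes) as a stand-in for a direct least-squares solution, and the function names and toy data are hypothetical. Coefficients here are proportions; multiply by 100 to get percent-scale coefficients like those MNA prints.

```python
from collections import defaultdict


def mna_coefficients(cases, y, n_iter=200):
    """Fit one additive 0-1 dummy-variable regression per dependent code.

    cases: list of tuples holding each case's code on every predictor.
    y: list of dependent-variable codes, one per case.
    Returns (overall, coef): overall[k] is the overall proportion of code k,
    and coef[k][j][c] is the coefficient of code c of predictor j in the
    regression for dependent code k (both on a 0-1 scale, not percents).
    """
    n = len(y)
    n_pred = len(cases[0])
    overall, coef = {}, {}
    for k in sorted(set(y)):
        # Dichotomize the dependent variable: 1 if the case has code k, else 0.
        d = [1.0 if yi == k else 0.0 for yi in y]
        grand = sum(d) / n
        overall[k] = grand
        a = [defaultdict(float) for _ in range(n_pred)]
        for _ in range(n_iter):
            for j in range(n_pred):
                sums, counts = defaultdict(float), defaultdict(int)
                for x, di in zip(cases, d):
                    # Partial residual: remove the overall proportion and the
                    # current contributions of all other predictors.
                    other = sum(a[m][x[m]] for m in range(n_pred) if m != j)
                    sums[x[j]] += di - grand - other
                    counts[x[j]] += 1
                # Backfitting step: each code's coefficient becomes the mean
                # partial residual of the cases having that code.
                a[j] = defaultdict(float, {c: sums[c] / counts[c] for c in sums})
        coef[k] = [dict(aj) for aj in a]
    return overall, coef


def forecast(overall, coef, x):
    """Overall proportion plus the case's coefficients, for every dependent code."""
    return {k: overall[k] + sum(coef[k][j][c] for j, c in enumerate(x))
            for k in overall}


# Hypothetical toy data: two predictors with codes 1-2, dependent codes A/B/C.
cases = [(1, 1), (1, 2), (2, 1), (2, 2), (1, 1), (2, 2), (1, 2), (2, 1)]
y = ["A", "B", "A", "C", "A", "C", "B", "B"]
overall, coef = mna_coefficients(cases, y)
scores = forecast(overall, coef, (1, 1))
print({k: round(v, 3) for k, v in scores.items()})
print("predicted:", max(scores, key=scores.get))
```

Because every case has exactly one dependent code, the 0-1 variables sum to 1 for each case, so a predictor code’s coefficients sum to zero across the dependent codes and every forecast sums to 1 (100%). Additive forecasts are not confined to the 0-1 range: in this toy data the forecast for C at codes (1, 1) is negative. The fixed iteration count is an arbitrary choice for the sketch; strongly correlated predictors converge more slowly.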
The advantage MNA has over other techniques applicable to the same data is the simplicity and direct interpretability of the MNA coefficients and of the categorical prediction algorithm. See Andrews and Messenger, Multivariate Nominal Scale Analysis (1973), for a complete description of the MNA technique.
Statistics: MNA computes the univariate distribution of the dependent variable, gives (in effect) the bivariate distribution of the dependent variable with each predictor, and computes and prints the multivariate “MNA coefficients.” The bivariate statistics are the bivariate theta and the code-specific and generalized eta-squared; they provide two alternative measures of the strength of the simple bivariate relationship between a specific predictor and the dependent variable. The program also prints a series of statistics for each predictor called “beta-squared.” These indicate the relative importance of the predictor when all other independent variables are held constant. The multivariate statistics are the multivariate theta and the code-specific and generalized R-squared.
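The bivariate statistics can be sketched from a crosstab of one predictor against the dependent variable. The Python sketch below assumes these definitions: code-specific eta-squared is the between-group share of the variance of that code’s 0-1 variable; generalized eta-squared is the average of the code-specific values weighted by each 0-1 variable’s variance p(1 - p); and bivariate theta is the proportion classified correctly when each case is assigned the modal dependent code of its predictor code (with one predictor, the forecast for a predictor code is just its row of proportions). The variance weighting is our assumption rather than something this write-up states, though the same weighting applied to the code-specific R-squared values in the example reproduces the printed generalized R-squared of .3207. Names and data are hypothetical; beta-squared needs the full multivariate fit and is not shown.

```python
def bivariate_stats(table):
    """Bivariate statistics for one predictor from a predictor-by-dependent crosstab.

    table[i][k]: number of cases with predictor code i and dependent code k.
    Returns (eta2, generalized_eta2, theta).
    """
    n = sum(map(sum, table))
    row_n = [sum(row) for row in table]
    col_n = [sum(col) for col in zip(*table)]
    eta2 = []
    between_all = total_all = 0.0
    for k, nk in enumerate(col_n):
        p = nk / n
        # Total sum of squares of the 0-1 variable for dependent code k.
        ss_total = n * p * (1 - p)
        # Between-group sum of squares: predictor-code proportions vs. overall p.
        ss_between = sum(ni * (row[k] / ni - p) ** 2
                         for row, ni in zip(table, row_n) if ni > 0)
        eta2.append(ss_between / ss_total if ss_total > 0 else 0.0)
        between_all += ss_between
        total_all += ss_total
    # Weighting each code's eta-squared by its variance p(1 - p) is the same as
    # dividing the summed between-group SS by the summed total SS.
    generalized = between_all / total_all if total_all > 0 else 0.0
    # Each predictor code predicts its modal dependent code, so the number
    # classified correctly is the largest count in each row.
    theta = sum(max(row) for row in table) / n
    return eta2, generalized, theta


# Hypothetical crosstab: 3 predictor codes (rows) by 3 dependent codes (columns).
table = [[5, 1, 0],
         [1, 4, 3],
         [0, 1, 5]]
eta2, gen, theta = bivariate_stats(table)
print([round(e, 4) for e in eta2], round(gen, 4), round(theta, 4))
```

Ties for the modal code do not affect theta, since any tied code yields the same number of correct classifications.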
References:
Andrews, F. M., J. N. Morgan, J. A. Sonquist and L. Klem. Multiple Classification Analysis. Second edition. Ann Arbor: Institute for Social Research, The University of Michigan, 1973.
Andrews, F. M. and R. C. Messenger. Multivariate Nominal Scale Analysis. Ann Arbor: Institute for Social Research, The University of Michigan, 1973.
Information on the analysis.
Numbers of cases eliminated due to missing data on the dependent and independent variables and due to predictor codes outside the range of valid codes
Non-empty predictor codes
Minimum number of significant digits in solution vectors
Dependent Variable Statistics.
Frequency distribution
Weighted frequency distribution
Weighted frequency distribution expressed as a percent
R-squared (for each dependent variable code)
Adjusted R-squared (for each dependent variable code)
Predictor Variable Statistics.
Frequency for each code
Weighted frequency for each code
Weighted frequency expressed as a percent for each code
For each predictor code:
Weighted frequency marginal for each code of the dependent variable (Y) expressed as percents
Adjusted percents (sums of percents and coefficients) for each code of the dependent variable
Coefficients for each code of the dependent variable
Theta
Eta-squared (for each dependent variable code)
Beta-squared (for each dependent variable code)
Generalized eta-squared
Joint and Multivariate Prediction.
Generalized R-squared
Joint theta (proportion of cases correctly classed)
Classification matrix. Rows of the matrix indicate actual codes; columns indicate predicted codes.
INTERPRETING MNA OUTPUT
Consult the example at the end of this write-up as noted in the following discussions. See Multivariate Nominal Scale Analysis (Andrews and Messenger, 1973) for a complete description of how to interpret MNA results.
Examination Strategies
When looking at the many detailed statistics MNA produces, two things are of particular interest: 1) large coefficients, and 2) large differences between the percents and the adjusted percents.
If an independent variable is ordinal-scale, monotonic change across successive coefficients or percentages may also be of interest. This occurs in the example in the way V46, “Better or worse a year from now,” affects the likelihood of the first car being a compact.
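These checks are easy to automate. The Python sketch below takes “adjusted percents” to be the overall percent for a dependent code plus the coefficient for each predictor code, which is how the printed-output list describes them (sums of percents and coefficients) and matches the forecast arithmetic later in this write-up. The function names, the threshold, and all figures are hypothetical illustrations, not values from the example.

```python
def adjusted_percents(overall_pct, coefs):
    """Adjusted percent for each predictor code: overall percent + coefficient."""
    return [overall_pct + c for c in coefs]


def large_shifts(percents, adjusted, threshold):
    """Predictor codes whose raw and adjusted percents differ by more than threshold."""
    return [i for i, (p, a) in enumerate(zip(percents, adjusted))
            if abs(p - a) > threshold]


def is_monotonic(values):
    """True if the values never decrease, or never increase, across ordered codes."""
    steps = list(zip(values, values[1:]))
    return all(a <= b for a, b in steps) or all(a >= b for a, b in steps)


# Hypothetical figures for one dependent code (overall percent 8.7) across the
# four ordered codes of an ordinal predictor.
percents = [18.0, 9.0, 7.0, 2.0]   # raw percents within each predictor code
coefs = [6.0, 1.5, -2.0, -4.5]     # MNA coefficients for the same codes
adjusted = adjusted_percents(8.7, coefs)
print([round(a, 1) for a in adjusted])
print(large_shifts(percents, adjusted, threshold=2.0))
print(is_monotonic(coefs))
```

A large gap between a raw percent and its adjusted percent signals that much of the simple relationship is shared with other predictors; a steadily rising or falling run of coefficients across ordered codes is the monotonic pattern described above.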
Theta Statistic
The multivariate statistic Theta indicates the proportion of cases correctly classified after taking into account each respondent’s scores on all independent variables. In the example, Theta is .8043, indicating that 80% of the cases could be correctly classified. This is a gain of more than 10 percentage points over the mode of the overall percentage distribution (69.6% for “Large” car).
Identifying the mode is important: it shows that even if you knew nothing about the respondents, you could predict the first car for everyone to be large and be correct 69.6% of the time. Relationships of the independent variables to the dependent variable increase predictability above this 69.6% level.
The bivariate Theta statistic indicates the proportion of cases correctly classified when only a single independent variable is taken into account.
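The multivariate Theta in the example, and the modal baseline it is compared with, can both be read off the classification matrix in the example listing: Theta is the sum of the diagonal (cases whose predicted code equals their actual code) divided by N, and the baseline is the largest actual-code total divided by N. A minimal Python check using the example’s counts (the function names are our own):

```python
# Classification matrix from the example: rows are actual codes, columns predicted
# codes, both in the order Small, Compact, Mid-Size, Large.
matrix = [
    [2, 1, 1, 6],
    [0, 7, 0, 5],
    [0, 1, 9, 10],
    [0, 1, 2, 93],
]


def theta(m):
    """Proportion of cases on the diagonal (predicted code == actual code)."""
    n = sum(map(sum, m))
    return sum(m[i][i] for i in range(len(m))) / n


def modal_baseline(m):
    """Proportion correct if every case is assigned the most frequent actual code."""
    n = sum(map(sum, m))
    return max(sum(row) for row in m) / n


t = theta(matrix)
b = modal_baseline(matrix)
print(f"theta = {t:.4f}, baseline = {b:.4f}, gain = {100 * (t - b):.1f} points")
```

This prints theta = 0.8043 (111 of 138 cases) and a baseline of 0.6957 (96 of 138), a gain of about 10.9 percentage points.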
Forecasts and the Proportion Classed Correctly
A forecast can be derived for any case. The forecast consists of a set of probabilities (expressed as percents) giving the likelihood that the case falls into each category of the dependent variable. You compute the value for each category by summing the coefficients relevant to that case and adding the overall percent for that category. Assume we have a person who earns $20,000 a year, is 28 years old, single, has a college degree, expects to be about as well off next year, expects his/her income to be a little more next year, and holds a professional position. The forecast is computed as shown in the table below:
Size of First Car                Small  Compact Mid-Size    Large
Overall Percents                   7.2      8.7     14.5     69.6
Coeff: $20,000/yr                -5.05     8.10     2.30    -5.35
Coeff: 28 Years old              11.41    -0.25    13.13   -24.29
Coeff: Single                    10.40    -2.10    -1.11    -7.19
Coeff: College Degree            15.64    -4.17     1.23   -12.69
Coeff: About the Same             6.73    -4.91    -4.25     2.44
Coeff: A Little More Income       2.03    -0.97    -2.74     1.68
Coeff: Professional             -19.31     1.05    15.73     2.52
Forecast                         29.10     5.45    38.78    26.69
The forecast gives a set of predicted scores for each case; you predict a case to be in the category of the dependent variable for which the forecast is highest. The person represented in the table above would be assigned to the “Mid-Size” category (38.78).
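The forecast arithmetic can be checked with a short Python sketch using the coefficients from the table. One detail: the program appears to add the unrounded overall percents (for example 10/138 = 7.246 for Small) rather than the rounded 7.2 shown. With those, the sums reproduce the printed forecast row (29.10, 5.45, 38.78, 26.69), whereas the rounded 7.2 would give 29.05 for Small. The dictionary labels and function name are our own shorthand.

```python
CATEGORIES = ["Small", "Compact", "Mid-Size", "Large"]
FREQUENCIES = [10, 12, 20, 96]  # from the dependent-variable table, N = 138

# Coefficients for this person's predictor codes, one list per predictor,
# in the category order above.
COEFFICIENTS = {
    "$20,000/yr":           [-5.05,  8.10,  2.30,  -5.35],
    "28 years old":         [11.41, -0.25, 13.13, -24.29],
    "Single":               [10.40, -2.10, -1.11,  -7.19],
    "College degree":       [15.64, -4.17,  1.23, -12.69],
    "About the same":       [6.73,  -4.91, -4.25,   2.44],
    "A little more income": [2.03,  -0.97, -2.74,   1.68],
    "Professional":         [-19.31, 1.05, 15.73,   2.52],
}


def forecast(frequencies, coefficient_rows):
    """Unrounded overall percent plus the sum of the case's coefficients, per category."""
    rows = list(coefficient_rows)
    n = sum(frequencies)
    overall = [100.0 * f / n for f in frequencies]
    return [o + sum(row[k] for row in rows) for k, o in enumerate(overall)]


scores = forecast(FREQUENCIES, COEFFICIENTS.values())
# Predict the category with the highest forecast.
predicted = CATEGORIES[scores.index(max(scores))]
for name, s in zip(CATEGORIES, scores):
    print(f"{name:9s} {s:6.2f}")
print("Predicted:", predicted)
```

Because each predictor code’s coefficients sum to about zero across the four categories, the forecasts sum to about 100.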
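Example: Explaining size of first car for childless families. Predictors are income (bracketed), age of head of household (bracketed), marital status, education, feelings of “well-offness” (better or worse off a year from now; smaller or larger income next year), and occupation.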
*** MNA - MULTIVARIATE NOMINAL SCALE ANALYSIS ***
EXPLAINING SIZE OF FIRST CAR FOR CHILDLESS FAMILIES
Number of variables: 8
The data are not weighted
Transforming the data by RECODE number 1
For the dependent variable, cases with MD1 or MD2 values will be deleted
Number of cases = 138
0 cases deleted due to missing data on the dependent variable.
0 cases deleted due to missing data on the independent variables.
0 cases deleted due to predictor codes outside range -99 to 999.
PREDICTOR NON-EMPTY CODES
R1 BRACKETED INCOME 2 3 4 5 6
R2 BRACKETED AGE 1 2 3 4
V30 MARITAL STATUS 1 2 3 4 5
V32 EDUC OF HEAD 1 2 3 4 5 6 7 8
V46 B/W YEAR FROM NOW 1 3 5 8
V49 SM/LG INC NEXT YEAR 0 1 3 5 8 9
V251 OCCUPATION B 0 1 2 3 4 5 6 7 8 9
*** THE MINIMUM NUMBER OF SIGNIFICANT DIGITS IN THE SOLUTION VECTORS IS 4
DEPENDENT VARIABLE V193 SIZE OF CAR
Code               1         2         3         5
               Small   Compact  Mid-Size     Large    Totals
Frequency         10        12        20        96       138
Percent          7.2       8.7      14.5      69.6     100.0
R-squared      .2532     .3325     .3349     .3295
Adjusted       .0000     .1035     .1066     .0994
*** MULTIVARIATE STATISTICS ***
GENERALIZED R-SQUARED .3207 MULTIVARIATE THETA .8043
CASES CORRECTLY CLASSED
                    1         2         3         5
                Small   Compact  Mid-Size     Large
N               2.000     7.000     9.000    93.000
PROPORTION       .200      .583      .450      .969
ACTUAL(rows) vs. PREDICTED(columns) CLASSIFICATION MATRIX
            |       1|       2|       3|       5|
            |   Small| Compact|Mid-Size|   Large|  Totals
            |--------|--------|--------|--------|
Small     1 |       2|       1|       1|       6|      10
ROW %       |    20.0|    10.0|    10.0|    60.0|   100.0
            |--------|--------|--------|--------|
Compact   2 |       0|       7|       0|       5|      12
ROW %       |      .0|    58.3|      .0|    41.7|   100.0
            |--------|--------|--------|--------|
Mid-Size  3 |       0|       1|       9|      10|      20
ROW %       |      .0|     5.0|    45.0|    50.0|   100.0
            |--------|--------|--------|--------|
Large     5 |       0|       1|       2|      93|      96
ROW %       |      .0|     1.0|     2.1|    96.9|   100.0
            |--------|--------|--------|--------|
Totals              2       10       12      114      138
ROW %             1.4      7.2      8.7     82.6    100.0
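Two summary figures in the listing can be reproduced from other printed numbers, which helps in reading them. The Python sketch below rests on two assumptions of ours, neither stated in this write-up: generalized R-squared is the average of the code-specific R-squared values weighted by the variance p(1 - p) of each code’s 0-1 variable, and adjusted R-squared uses the usual degrees-of-freedom correction with one parameter per non-empty predictor code minus one per predictor (42 - 7 = 35 here), with negative results printed as zero. Under these assumptions the sketch gives .3207 for generalized R-squared and .0000, .1035, .1067, .0994 for the adjusted row; the printed Mid-Size value is .1066, a difference consistent with the R-squared values being rounded to four places in the listing.

```python
FREQUENCIES = [10, 12, 20, 96]                # Small, Compact, Mid-Size, Large
R_SQUARED = [0.2532, 0.3325, 0.3349, 0.3295]  # code-specific, as printed
NON_EMPTY_CODES = [5, 4, 5, 8, 4, 6, 10]      # R1, R2, V30, V32, V46, V49, V251


def generalized_r_squared(freqs, r2):
    """Variance-weighted mean of code-specific R-squared (assumed definition)."""
    n = sum(freqs)
    # n_k * (n - n_k) is proportional to the variance p(1 - p) of each 0-1 variable.
    weights = [f * (n - f) for f in freqs]
    return sum(w * r for w, r in zip(weights, r2)) / sum(weights)


def adjusted_r_squared(r2, n, n_params):
    """Usual degrees-of-freedom adjustment, floored at zero (assumed definition)."""
    value = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)
    return max(0.0, value)


n = sum(FREQUENCIES)                                    # 138 cases
n_params = sum(NON_EMPTY_CODES) - len(NON_EMPTY_CODES)  # 42 codes - 7 predictors
print(f"generalized R-squared: {generalized_r_squared(FREQUENCIES, R_SQUARED):.4f}")
print("adjusted R-squared:",
      [f"{adjusted_r_squared(r, n, n_params):.4f}" for r in R_SQUARED])
```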