The Multinomial distribution generalizes the Binomial to outcomes with \(J > 2\) categories:
\[Prob \{ \mathbf{Y} = \mathbf{y} \} = {n \choose {y_1 \ldots y_J}} \prod_{j=1}^{J} {p_j}^{y_j} ~ ~ \text{with} ~~ \sum_{j=1}^{J} y_j = n\]
In the multinomial (baseline-category) logit model, the linear predictors for observation \(i\) are
\[\eta_{ij} = \mathbf{x}_i^T \boldsymbol{\beta}_j = \log \frac{p_{ij}}{p_{i1}}, \quad j = 2, \ldots, J\]
Setting \(\eta_{i1} = 0\) for the reference category, we have
\[p_{ij} = \frac{\exp(\eta_{ij})}{1 + \sum_{k=2}^{J} \exp(\eta_{ik})}, \quad j = 1, \ldots, J\]
This model can be given a conventional logistic regression interpretation: each equation compares outcome \(j\) with the reference outcome.
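As a numerical sanity check of the probability formula, a minimal sketch with made-up linear predictors:

```r
# Hypothetical linear predictors for one observation; eta_1 = 0 is the reference
eta <- c(0, 0.8, -0.3)
p <- exp(eta) / sum(exp(eta))  # same as exp(eta_j) / (1 + sum_{k>=2} exp(eta_k))
sum(p)                         # probabilities sum to 1
log(p[2] / p[1])               # recovers eta_2 = 0.8
```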
High School and Beyond study (USA), subset of 200 students:
- outcome = program (academic, vocational, or general) chosen for high school
- predictor variables = socio-economic status, sex, and test scores (writing, etc.)
Initial data examination is complicated by the structure of the data, so we trichotomize the writing score for an initial examination of the relationships.
library(foreign)  # read.dta() for Stata files
hsb <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
hsb$write.c <- qcut(hsb$write, nint=3)  # qcut(): local helper cutting write into 3 quantile groups
unisum(hsb[c(2,3,5,7)])                 # unisum(): local univariate summary helper
## hsb[c(2, 3, 5, 7)] female ses
## N :200 male : 91(45.5%) low :47(23.5%)
## Complete cases :200 female:109(54.5%) middle:95(47.5%)
## % missing items: 0 high :58(29.0%)
## Variables : 4
## Factors : 3
##
##
##
##
## prog write
## general : 45(22.5%) Mean : 52.775000
## academic:105(52.5%) S.D. : 9.478586
## vocation: 50(25.0%) Min. : 31.000000
## 1st Qu.: 45.750000
## Median : 54.000000
## 3rd Qu.: 60.000000
## Max. : 67.000000
## n :200.000000
##
prelim.stats( prog ~ female + ses + write.c, data=hsb)
## prog
## general : 45(22.5%)
## academic:105(52.5%)
## vocation: 50(25.0%)
##
##
## female
## general academic vocation
## male 21/91(23.1%) 47/91(51.6%) 23/91(25.3%)
## female 24/109(22.0%) 58/109(53.2%) 27/109(24.8%)
##
##
## ses
## general academic vocation
## low 16/47(34.0%) 19/47(40.4%) 12/47(25.5%)
## middle 20/95(21.1%) 44/95(46.3%) 31/95(32.6%)
## high 9/58(15.5%) 42/58(72.4%) 7/58(12.1%)
##
## write.c
## general academic vocation
## <=49 19/72(26.4%) 20/72(27.8%) 33/72(45.8%)
## (49,59] 19/75(25.3%) 45/75(60.0%) 11/75(14.7%)
## >59 7/53(13.2%) 40/53(75.5%) 6/53(11.3%)
##
Two separate fits are required to examine all 3 pairwise comparisons. The first compares outcomes \(academic\) and \(vocation\) versus \(general\) (the reference outcome).
library(nnet)
multiFit <- multinom( prog ~ female + ses + write, data=hsb)
## # weights: 18 (10 variable)
## initial value 219.722458
## iter 10 value 178.928854
## final value 178.858982
## converged
summary(multiFit)
## Call:
## multinom(formula = prog ~ female + ses + write, data = hsb)
##
## Coefficients:
## (Intercept) femalefemale sesmiddle seshigh write
## academic -2.853727 -0.07993539 0.5190018 1.1472866 0.05896798
## vocation 2.489106 0.52635457 0.9258690 0.3281005 -0.06559949
##
## Std. Errors:
## (Intercept) femalefemale sesmiddle seshigh write
## academic 1.166499 0.3967916 0.4506565 0.5229045 0.02236771
## vocation 1.184545 0.4629303 0.5015360 0.6628617 0.02507210
##
## Residual Deviance: 357.718
## AIC: 377.718
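The fitted coefficients are log odds of each outcome relative to the reference (\(general\)); exponentiating gives relative-risk ratios. For example:

```r
exp(coef(multiFit))
# e.g. exp(1.147) ~ 3.15: relative to low SES, high SES roughly triples the
# odds of choosing academic over general, holding the other predictors fixed
```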
We can change the reference outcome to academic.
hsb$prog2 <- relevel(hsb$prog, ref = "academic")
multiFit <- multinom( prog2 ~ female + ses + write, data=hsb)
## # weights: 18 (10 variable)
## initial value 219.722458
## iter 10 value 179.133277
## final value 178.858982
## converged
summary(multiFit)
## Call:
## multinom(formula = prog2 ~ female + ses + write, data = hsb)
##
## Coefficients:
## (Intercept) femalefemale sesmiddle seshigh write
## general 2.853832 0.07994116 -0.5190013 -1.1472955 -0.05896978
## vocation 5.342915 0.60630081 0.4068678 -0.8192049 -0.12456908
##
## Std. Errors:
## (Intercept) femalefemale sesmiddle seshigh write
## general 1.166501 0.3967915 0.4506564 0.5229043 0.02236774
## vocation 1.176285 0.4225739 0.4863964 0.6070177 0.02390867
##
## Residual Deviance: 357.718
## AIC: 377.718
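summary() for multinom() does not report p-values; Wald \(z\)-tests can be computed from the coefficient and standard-error matrices:

```r
s <- summary(multiFit)
z <- s$coefficients / s$standard.errors
p <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided Wald p-values
round(p, 4)
```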
As usual we can check for significance of predictor variables using analysis of deviance.
multiFit2 <- multinom( prog2 ~ ses + write, data=hsb)
## # weights: 15 (8 variable)
## initial value 219.722458
## iter 10 value 179.982880
## final value 179.981726
## converged
anova(multiFit2,multiFit)
## Model Resid. df Resid. Dev Test Df LR stat.
## 1 ses + write 392 359.9635 NA NA
## 2 female + ses + write 390 357.7180 1 vs 2 2 2.245489
## Pr(Chi)
## 1 NA
## 2 0.3253855
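The likelihood-ratio statistic above is just the difference in residual deviances, referred to a \(\chi^2\) distribution:

```r
lr <- deviance(multiFit2) - deviance(multiFit)  # 359.96 - 357.72, about 2.25
pchisq(lr, df = 2, lower.tail = FALSE)          # about 0.325, matching anova()
```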
Ordinal regression is a form of multinomial regression appropriate when the response categories are ordered.
- e.g., for the data above, suppose we wish to predict socio-economic status from test scores, including writing, math, and science.
- the proportional odds model arises from dichotomizing the outcome as follows:
\[\operatorname{logit}(Prob\{Y_i > j\}) = \alpha_j + \mathbf{x}_i^T \boldsymbol{\beta}\]
To illustrate, we can fit two binary logistic models to the dichotomized outcomes \(Y > 1\) (middle or high vs. low SES) and \(Y > 2\) (high vs. low or middle SES) as follows:
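The two fits can be reconstructed from the Call lines in the output below (the object names fit.gt1 and fit.gt2 are our choice):

```r
# Dichotomized binary logistic fits
fit.gt1 <- glm(I(as.integer(ses) > 1) ~ write + math + science + socst,
               family = binomial, data = hsb)
fit.gt2 <- glm(I(as.integer(ses) > 2) ~ write + math + science + socst,
               family = binomial, data = hsb)
fit.gt1
fit.gt2
```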
##
## Call: glm(formula = I(as.integer(ses) > 1) ~ write + math + science +
## socst, family = binomial, data = hsb)
##
## Coefficients:
## (Intercept) write math science socst
## -2.69172 -0.04138 0.02028 0.04280 0.05574
##
## Degrees of Freedom: 199 Total (i.e. Null); 195 Residual
## Null Deviance: 218.1
## Residual Deviance: 198.1 AIC: 208.1
##
## Call: glm(formula = I(as.integer(ses) > 2) ~ write + math + science +
## socst, family = binomial, data = hsb)
##
## Coefficients:
## (Intercept) write math science socst
## -5.596828 -0.002574 0.017054 0.024608 0.048587
##
## Degrees of Freedom: 199 Total (i.e. Null); 195 Residual
## Null Deviance: 240.9
## Residual Deviance: 220.5 AIC: 230.5
##
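The proportional-odds model is fit with polr() from the MASS package; the call is visible in the output below (the object name polrFit is our choice):

```r
library(MASS)
polrFit <- polr(ses ~ write + math + science + socst, data = hsb)
summary(polrFit)  # the "Re-fitting to get Hessian" message comes from summary()
```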
## Re-fitting to get Hessian
## Call:
## polr(formula = ses ~ write + math + science + socst, data = hsb)
##
## Coefficients:
## Value Std. Error t value
## write -0.02081 0.02072 -1.0042
## math 0.02011 0.02064 0.9743
## science 0.03382 0.01902 1.7786
## socst 0.05158 0.01706 3.0243
##
## Intercepts:
## Value Std. Error t value
## low|middle 3.0936 0.9247 3.3456
## middle|high 5.4322 0.9856 5.5113
##
## Residual Deviance: 391.0894
## AIC: 403.0894
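As in binary logistic regression, exponentiated coefficients are odds ratios; under proportional odds, each applies to both cumulative splits simultaneously (assuming the fit object is named polrFit):

```r
exp(coef(polrFit))  # e.g. exp(0.0516) ~ 1.05: each extra socst point
                    # multiplies the odds of higher SES by about 1.05
```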
##
## Test for proportional odds
## --------------------------------------------
## Test for X2 df probability
## --------------------------------------------
## Omnibus 1.74 4 0.78
## write 1.55 1 0.21
## math 0.01 1 0.92
## science 0.42 1 0.52
## socst 0.08 1 0.78
## --------------------------------------------
##
## H0: Parallel Regression Assumption holds
Note: another motivation for this and other ordinal response models is to assume that the observed categories arise from thresholding an underlying but unobserved (latent) continuous variable.
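This latent-variable view can be illustrated by a small simulation (the cutpoints here are made up):

```r
set.seed(1)
z <- rlogis(10000)                          # unobserved continuous trait, logistic errors
y <- cut(z, breaks = c(-Inf, -1, 1, Inf),   # thresholds play the role of the alpha_j
         labels = c("low", "middle", "high"))
prop.table(table(y))                        # observed ordinal categories
```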