Multinomial outcome models

The Multinomial distribution generalizes the Binomial to more than two outcome categories:

  • \(n\) independent trials, each with the same \(J\) distinct possible outcomes, \(1, 2, \ldots, J\).
  • Probabilities of the \(J\) outcomes are \(p_j\), \(j = 1, \ldots, J\).
  • Record the vector of counts for each outcome, i.e. \(\mathbf{Y} = \left( Y_1, Y_2, \ldots, Y_J \right)\).

Multinomial probability mass function

\[Prob \{ \mathbf{Y} = \mathbf{y} \} = {n \choose {y_1 \ldots y_J}} \prod_{j=1}^{J} {p_j}^{y_j} ~ ~ \text{with} ~~ \sum_{j=1}^{J} y_j = n\]
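As a quick numerical check of this pmf, base R's `dmultinom()` evaluates it directly; the numbers below are illustrative, not drawn from any dataset in these notes.

```r
# Illustrative multinomial pmf evaluation: n = 4 trials, J = 3 outcomes
p <- c(0.5, 0.3, 0.2)   # outcome probabilities (sum to 1)
y <- c(2, 1, 1)         # counts for each outcome (sum to n = 4)

# By hand: (4! / (2! * 1! * 1!)) * 0.5^2 * 0.3 * 0.2 = 12 * 0.015 = 0.18
factorial(sum(y)) / prod(factorial(y)) * prod(p^y)

# dmultinom() gives the same value
dmultinom(y, prob = p)
```

Base R also provides `rmultinom()` for simulating multinomial counts, so no extra packages are needed here.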

Multinomial logit model (a.k.a. polytomous regression)

  • Observe \(m\) sets of counts, \(Y_i\), \(i = 1, \ldots, m\)
  • Model the dependence of the outcome probabilities \(p_{ij}\), \(j = 1, \ldots, J\), \(i = 1, \ldots, m\), on \(\mathbf{x}_i\), vectors of explanatory variables
  • Define logits

\[\eta_{ij} = \mathbf{x}_i^T \boldsymbol{\beta}_j = \log \frac{p_{ij}}{p_{i1}}, \quad j = 2, \ldots, J\]

We have

\[p_{ij} = \frac{\exp \left( \eta_{ij} \right)}{1 + \sum_{k=2}^{J} \exp \left( \eta_{ik} \right)}\]

by setting \(\eta_{i1} = 0\), so that the \(p_{ij}\) sum to one over \(j\).
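The inverse mapping from logits back to probabilities can be sketched in a few lines of R; the \(\eta\) values here are made up purely for illustration.

```r
# Linear predictors for one observation with J = 3 outcomes;
# the reference outcome (j = 1) has eta = 0
eta <- c(0, 0.5, -1)

# p_j = exp(eta_j) / (1 + sum over the non-reference exp(eta_k));
# since exp(0) = 1, the denominator is just sum(exp(eta))
p <- exp(eta) / sum(exp(eta))

round(p, 3)
sum(p)   # probabilities sum to 1
```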

This model can be given a conventional logistic regression interpretation

  • Consider the \(i^{th}\) observation \(Y_i\) in the unaggregated case (i.e. \(n_i = 1\))
  • Condition on \(Y_{ij} + Y_{ij'} = 1\) for two particular outcomes \(j\) and \(j'\)
    • i.e. we know that we’ve observed either outcome \(j\) or \(j'\)
  • It is easily shown that \[ Prob\{ ~ \mathrm{observe} ~ j ~ | ~ \mathrm{observe ~ one ~ of} ~ j ~ \mathrm{or} ~ j' ~ \} = \frac{\exp(\eta_{ij} - \eta_{ij'})} {1 + \exp(\eta_{ij} - \eta_{ij'})}\]
    • Log odds = \(\mathbf{x}_i^T (\boldsymbol{\beta}_j - \boldsymbol{\beta}_{j'})\)
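The conditional probability above follows because the common normalizing denominators of \(p_{ij}\) and \(p_{ij'}\) cancel:

\[\frac{p_{ij}}{p_{ij} + p_{ij'}} = \frac{\exp(\eta_{ij})}{\exp(\eta_{ij}) + \exp(\eta_{ij'})} = \frac{\exp(\eta_{ij} - \eta_{ij'})}{1 + \exp(\eta_{ij} - \eta_{ij'})},\]

which is exactly a binary logistic model with coefficient vector \(\boldsymbol{\beta}_j - \boldsymbol{\beta}_{j'}\).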

Faraway example

  • Voter affiliation classified as Democratic, Independent, or Republican.
  • \(x\)’s are age, education level, and income group for each respondent.
  • If we model individual respondents we have \(n_i = 1\) for each \(Y_i\).

UCLA Institute for Digital Research and Education example

High School and Beyond study (USA), subset of 200 students

  • outcome = program (academic, vocational, or general) chosen for high school
  • predictor variables = socioeconomic status, sex, and test scores (writing, etc.)

Initial data examination is complicated by the structure of the data. Trichotomize the writing scores for an initial examination of relationships.

library(foreign)   # read.dta() reads the Stata-format data file
hsb <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
hsb$write.c <- qcut(hsb$write, nint=3)   # qcut: course helper (not base R) cutting write into 3 quantile groups
unisum(hsb[c(2,3,5,7)])                  # unisum: course helper giving univariate summaries
##  hsb[c(2, 3, 5, 7)]    female              ses               
##  N              :200   male  : 91(45.5%)   low   :47(23.5%)  
##  Complete cases :200   female:109(54.5%)   middle:95(47.5%)  
##  % missing items:  0                       high  :58(29.0%)  
##  Variables      :  4                                         
##  Factors        :  3                                         
##                                                              
##                                                              
##                                                              
##                                                              
##  prog                  write               
##  general : 45(22.5%)   Mean   : 52.775000  
##  academic:105(52.5%)   S.D.   :  9.478586  
##  vocation: 50(25.0%)   Min.   : 31.000000  
##                        1st Qu.: 45.750000  
##                        Median : 54.000000  
##                        3rd Qu.: 60.000000  
##                        Max.   : 67.000000  
##                        n      :200.000000  
## 
prelim.stats( prog ~ female + ses + write.c, data=hsb)   # course helper: outcome vs. each predictor
##  prog                  
##  general : 45(22.5%)   
##  academic:105(52.5%)   
##  vocation: 50(25.0%)   
##                        
##                        
##  female                                             
##            general       academic      vocation     
##  male    21/91(23.1%)  47/91(51.6%)  23/91(25.3%)   
##  female 24/109(22.0%) 58/109(53.2%) 27/109(24.8%)   
##                                                     
##                                                     
##  ses                                             
##            general     academic     vocation     
##  low    16/47(34.0%) 19/47(40.4%) 12/47(25.5%)   
##  middle 20/95(21.1%) 44/95(46.3%) 31/95(32.6%)   
##  high    9/58(15.5%) 42/58(72.4%)  7/58(12.1%)   
##                                                  
##  write.c                                          
##             general     academic     vocation     
##  <=49    19/72(26.4%) 20/72(27.8%) 33/72(45.8%)   
##  (49,59] 19/75(25.3%) 45/75(60.0%) 11/75(14.7%)   
##  >59      7/53(13.2%) 40/53(75.5%)  6/53(11.3%)   
## 

Two separate fits are required to examine the three possible pairwise comparisons. The first compares outcomes \(academic\) and \(vocation\) versus \(general\) (the reference outcome).

library(nnet)
multiFit <- multinom( prog ~ female + ses + write, data=hsb)
## # weights:  18 (10 variable)
## initial  value 219.722458 
## iter  10 value 178.928854
## final  value 178.858982 
## converged
summary(multiFit)
## Call:
## multinom(formula = prog ~ female + ses + write, data = hsb)
## 
## Coefficients:
##          (Intercept) femalefemale sesmiddle   seshigh       write
## academic   -2.853727  -0.07993539 0.5190018 1.1472866  0.05896798
## vocation    2.489106   0.52635457 0.9258690 0.3281005 -0.06559949
## 
## Std. Errors:
##          (Intercept) femalefemale sesmiddle   seshigh      write
## academic    1.166499    0.3967916 0.4506565 0.5229045 0.02236771
## vocation    1.184545    0.4629303 0.5015360 0.6628617 0.02507210
## 
## Residual Deviance: 357.718 
## AIC: 377.718

We can change the reference outcome to academic.

hsb$prog2 <- relevel(hsb$prog, ref = "academic")
multiFit <- multinom( prog2 ~ female + ses + write, data=hsb)
## # weights:  18 (10 variable)
## initial  value 219.722458 
## iter  10 value 179.133277
## final  value 178.858982 
## converged
summary(multiFit)
## Call:
## multinom(formula = prog2 ~ female + ses + write, data = hsb)
## 
## Coefficients:
##          (Intercept) femalefemale  sesmiddle    seshigh       write
## general     2.853832   0.07994116 -0.5190013 -1.1472955 -0.05896978
## vocation    5.342915   0.60630081  0.4068678 -0.8192049 -0.12456908
## 
## Std. Errors:
##          (Intercept) femalefemale sesmiddle   seshigh      write
## general     1.166501    0.3967915 0.4506564 0.5229043 0.02236774
## vocation    1.176285    0.4225739 0.4863964 0.6070177 0.02390867
## 
## Residual Deviance: 357.718 
## AIC: 377.718
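The fitted coefficients are log relative-risk ratios versus the reference outcome `academic`. A sketch of how one might exponentiate them and obtain fitted probabilities, assuming `multiFit` from the fit above is still in the workspace:

```r
# Relative risk ratios: multiplicative effect on Prob(outcome)/Prob(academic)
# per unit increase in each predictor
exp(coef(multiFit))

# Fitted probabilities for each student across the three programs;
# each row sums to 1
head(predict(multiFit, type = "probs"))
```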

As usual we can check for significance of predictor variables using analysis of deviance.

hsb$prog2 <- relevel(hsb$prog, ref = "academic")
multiFit2 <- multinom( prog2 ~ ses + write, data=hsb)
## # weights:  15 (8 variable)
## initial  value 219.722458 
## iter  10 value 179.982880
## final  value 179.981726 
## converged
anova(multiFit2,multiFit)
##                  Model Resid. df Resid. Dev   Test    Df LR stat.
## 1          ses + write       392   359.9635           NA       NA
## 2 female + ses + write       390   357.7180 1 vs 2     2 2.245489
##     Pr(Chi)
## 1        NA
## 2 0.3253855
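`summary.multinom()` reports estimates and standard errors but no p-values; Wald z statistics for individual coefficients can be computed by hand (the UCLA writeup takes the same approach), again assuming `multiFit` is in the workspace:

```r
# Wald tests for each coefficient in the multinomial fit
s <- summary(multiFit)
z <- s$coefficients / s$standard.errors
pvals <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-values
round(pvals, 4)
```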

Ordinal logistic regression - Proportional odds model

A form of multinomial regression appropriate when the response categories are ordered.

  • e.g. for the data above, suppose we wish to predict socioeconomic status from test scores, including writing, math, and science.
  • The proportional odds model arises from considering dichotomizations of the outcome as follows:

\[\mathrm{logit}(Prob\{Y_i > j\}) = \alpha_j + \mathbf{x}_i^T \boldsymbol{\beta}\]

To illustrate, we can fit separate binary logistic models to the two dichotomies of SES, \(Y > 1\) (middle or high vs. low) and \(Y > 2\) (high vs. low or middle), as follows:
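The output below was produced by calls of roughly the following form (reconstructed from the `Call:` lines; `polr()` is from `MASS`, and the proportional-odds test at the end comes from a helper function not shown in these notes):

```r
library(MASS)   # polr() fits the proportional odds model

# Binary logistic fits at the two SES cut points
glm(I(as.integer(ses) > 1) ~ write + math + science + socst,
    family = binomial, data = hsb)
glm(I(as.integer(ses) > 2) ~ write + math + science + socst,
    family = binomial, data = hsb)

# Proportional odds model: one slope vector, one intercept per cut point
polrFit <- polr(ses ~ write + math + science + socst, data = hsb)
summary(polrFit)
```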

## 
## Call:  glm(formula = I(as.integer(ses) > 1) ~ write + math + science + 
##     socst, family = binomial, data = hsb)
## 
## Coefficients:
## (Intercept)        write         math      science        socst  
##    -2.69172     -0.04138      0.02028      0.04280      0.05574  
## 
## Degrees of Freedom: 199 Total (i.e. Null);  195 Residual
## Null Deviance:       218.1 
## Residual Deviance: 198.1     AIC: 208.1
## 
## Call:  glm(formula = I(as.integer(ses) > 2) ~ write + math + science + 
##     socst, family = binomial, data = hsb)
## 
## Coefficients:
## (Intercept)        write         math      science        socst  
##   -5.596828    -0.002574     0.017054     0.024608     0.048587  
## 
## Degrees of Freedom: 199 Total (i.e. Null);  195 Residual
## Null Deviance:       240.9 
## Residual Deviance: 220.5     AIC: 230.5
## 
## Re-fitting to get Hessian
## Call:
## polr(formula = ses ~ write + math + science + socst, data = hsb)
## 
## Coefficients:
##            Value Std. Error t value
## write   -0.02081    0.02072 -1.0042
## math     0.02011    0.02064  0.9743
## science  0.03382    0.01902  1.7786
## socst    0.05158    0.01706  3.0243
## 
## Intercepts:
##             Value   Std. Error t value
## low|middle   3.0936  0.9247     3.3456
## middle|high  5.4322  0.9856     5.5113
## 
## Residual Deviance: 391.0894 
## AIC: 403.0894
## 
## Test for proportional odds
## -------------------------------------------- 
## Test for X2  df  probability 
## -------------------------------------------- 
## Omnibus      1.74    4   0.78
## write        1.55    1   0.21
## math         0.01    1   0.92
## science      0.42    1   0.52
## socst        0.08    1   0.78
## -------------------------------------------- 
## 
## H0: Parallel Regression Assumption holds

Note: another motivation for this and other ordinal response models is to assume the categories arise from an underlying but unobserved (latent) variable.
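Concretely, suppose \(Y_i^* = \mathbf{x}_i^T \boldsymbol{\beta} + \epsilon_i\) with \(\epsilon_i\) standard logistic, and we observe category \(j\) when \(Y_i^*\) falls between consecutive cutpoints:

\[Y_i = j \quad \Longleftrightarrow \quad \alpha_{j-1} < Y_i^* \le \alpha_j, \qquad -\infty = \alpha_0 < \alpha_1 < \cdots < \alpha_J = \infty.\]

Then \(Prob\{Y_i > j\} = Prob\{\epsilon_i > \alpha_j - \mathbf{x}_i^T \boldsymbol{\beta}\}\), whose logit is \(\mathbf{x}_i^T \boldsymbol{\beta} - \alpha_j\): the proportional odds form above (with the sign of the intercepts flipped), with the same slope \(\boldsymbol{\beta}\) at every cutpoint.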