Yihui Xie

Logit Loglinear Analysis

谢益辉 / 2006-03-27


The Logit Loglinear Analysis procedure analyzes the relationship between dependent (or response) variables and independent (or explanatory) variables. The dependent variables are always categorical, while the independent variables can be categorical (factors). Other independent variables, cell covariates, can be continuous, but they are not applied on a case-by-case basis. The weighted covariate mean for a cell is applied to that cell. The logarithm of the odds of the dependent variables is expressed as a linear combination of parameters. A multinomial distribution is automatically assumed; these models are sometimes called multinomial logit models. This procedure estimates parameters of logit loglinear models using the Newton-Raphson algorithm.

You can select from 1 to 10 dependent and factor variables combined. A cell structure variable allows you to define structural zeros for incomplete tables, include an offset term in the model, fit a log-rate model, or implement the method of adjustment of marginal tables. Contrast variables allow computation of generalized log-odds ratios (GLOR). The values of the contrast variable are the coefficients for the linear combination of the logs of the expected cell counts.

SPSS automatically displays model information and goodness-of-fit statistics. You can also display a variety of statistics and plots or save residuals and predicted values in the working data file.

Example. A study in Florida included 219 alligators. How does the alligators’ food type vary with their size and the four lakes in which they live? The study found that the odds of a smaller alligator preferring reptiles to fish is 0.70 times lower than for larger alligators; also, the odds of selecting primarily reptiles instead of fish were highest in lake 3.

Statistics. Observed and expected frequencies; raw, adjusted, and deviance residuals; design matrix; parameter estimates; generalized log-odds ratio; Wald statistic; and confidence intervals. Plots: adjusted residuals, deviance residuals, and normal probability plots.

多说几句:红色标记的前三个词是提醒注意统计专业术语的表达的(不要只认识因变量而不认识response variable,不要只认识概率而不认识odds);最后一个红色标记的是强调统计计算的地位(这个模型需要用Newton-Raphson算法求解);也可以看看老外们的例子,便会觉得好玩儿,他们研究的是佛罗里达州的鳄鱼吃虫子还是吃鱼与鳄鱼的体型大小和所生长的湖的关系。在中国估计不会有人研究这个吧,毕竟填饱肚子要紧(还是能看出经济文化差异的)。