Learning and Cognitive Systems

Purpose of Model

The overall purpose of the model is to identify the underlying latent classes of a set of i.i.d. binomial distributed frequency counts. These frequencies could be the number of a set of i.i.d. bernouilli distributed 0-1, no-yes, false-true answers to test or questionionaire item.

The model is designed (1) to detect the number ng of latent classes, (2) to estimate the group-specific ability parameter to generate 'true' or 'yes' responses, (3) to identify for each of the n subjects the class zmax[i] with the greatest membership probability, (4) to estimate the person-specific ability parameter theta[i] to generate 'true' or 'yes' responses, and (5) the person-specific parameter theta[i] should be a mixture of the group-specific ability parameters ability[g], where the person-specific class-membership probabilities z[i,g] are the mixture coefficients.

BUGS-Code for Models with two or three Hypothized Latent Classes and Latent Dirichlet Allocation (LDA)

This is a code walk-through. The group-specific ability parameters ability[g] are sampled from a beta distribution with priors from a beta(1,1) which is a uniform(0,1). These samples are constrained like conventional probabilities: 0 <= ability[g] <= 1 and sum(ability[]) = 1.

The class membership probabilities pr_class[g] are sampled from a Dirichlet distribution with priors alpha_g[g] = 1. This is a multivariate generalization of the beta(1,1). This means that we assume, that all membership probabilities are samled from a maximal uninformed prior.

For each person i we sample a set class membership probabilities z[i,1:ng] from a Dirichlet distribution with priors which are biased by the class membership probabilities pr_class[g] and the total number of subjects p.

Next we compute the person-specific ability parameter theta[i] as an expected value of the group-specific ability parameters ability[g]. Mixture coefficients are the person-specific class-membership probabilities z[i,g]. As a side effect we get the class membership z_max[i] for person i which is the group g whith highest z[i,g].

The third last line of the program describes that the frequency count k[i] of person i is a binomial distributed variable with person-specific ability parameter theta[i] and n, which denotes the number of items.

The first data set contains the original data from Lee & Wagenmakers Exam Scores problem but one exception. The second datum is changed from 17 to 16 to make the results more concise. We assume two latent classes (ng = 2). To our surprise the ability parameters have a tendency toward the extremes (Fig.01). There is a low tendency group 1 with mean(ability[1] ) = 0.0732 and Pr(Class=1)=0.2107 and a high tendency group 2 with mean(ability[2] ) = 0.9745 and Pr(Class=2)=0.7893 (Fig.04). By mixtures both ability parameters generate the person-specific ability parameters theta[i]. Pr(Class) and all theta[i] are within our expectations. When we look at the class assignments z_max (Fig.02-03) we see that subjects 6-15 are classified without any doubt into latent class 2 (high tendency class). The classification of subjects 1-5 is done with uncertainty. All these subjects (except subject 2) have a higher classification probability for class 2 but at the same time a nonzero proability for class 1. Now we see the reason that we diminished the score of subject 2.

When we analyze the same data set under the assumption of ng=3 latent classes, the posteriori distribution of the ability parameters are bimodal with nearly the same shape (Fig.05). We suspect that the hypothesis of ng=3 should be abandoned in favor of ng=2. The class assignments z_max[i] support this conclusion by no clear class preference besides class 2.

A second data set was generated by copying the first five data, adding 10 to n and to the data of subjects 11-15. From our intuition we would expect three classes Pr(Class=1)=0.50, Pr(Class=2)=0.25, and Pr(Class=3)=0.25.The results (Fig.09-13) demonstrate the clear separation of two latent classes.

The same is true for the analysis of the same data set under the hypothesis ng=3 (Fig.14-19). The classes 2 and 3 could not be clearly discriminated. This means that subjects 1-5 and 16-20 belong to one latent class and subjects 6-15 to the other.

What has to be reflected is the Pr(Class). These results are not so clear cut. But as a summary the results are very promising.

To our knowledge this model is new. It does not need the EM-algorithm. In the next future we will study what is the relation between our model to the LDA-Topic Model (Blei, D.M.; Ng, A.Y.; Jordan, M.I.; Lafferty, J.; Latent Dirichlet Allocation, Journal of Machine Learning Research, 3 (4-5), pp. 993-1022).

JAGS-Code for Models with two or three Hypothized Latent Classes and Latent Dirichlet Allocation (LDA)

Various authors prefer JAGS to BUGS (e.g. Kruschke, J.K., Doing Bayesian Data Analysis, 2015, 2/e, Academic Press, ISBN 978-0-12-405888-0). The main differences are described in Lunn et al. in chapter 12.6  (Lunn, D.; Jackson, Chr.; Best, N.; Thomas, A., Spiegelhalter, D.; The BUGS Book: A Practical Introduction to Bayesian Analysis, Boca Raton, FL USA: CRC Press, 2013, ISBN 978-1-58488-849-9). We wanted to explore whether there are significant differences in modelling. We found out that are subtle differences in the modelling languages. The R-integration of JAGS is better than that of OpenBUGS. Models can be described as a user-defined parameterless R-function. So all editing, debugging and testing can be done with the excellent R-IDE RStudio.

We had to revise our OpenBUGS-model in three places: (1) we replaced the model{...} BUGS-code-snippet by the R-code-snippet function(){...}; (2) because the definition of rank(...)-function differ in BUGS and JAGS we had to recode the computation of the class assignment z_max[]; (3) all data are encoded as R-statements and used as arguments in the jags-function of the R2jags-library.

The results for two latent classes and the slightly modified data of Lee & Wagenmakers (k[2]=16 instead of 17, 15 subjects, 40 items) are nearly identical to those of the OpenBugs analysis (Fig.01-04). The means of the posteriors for abilities and class probabilities differ only by 0.08.

Nearly the same is true for the slightly modified problem ((k[2]=16 instead of 17, k[16]-k[20] added, 20 subjects, 50 items) (Fig.05-08).