Learning and Cognitive Systems

BCM-Ch04.2: The Seven Scientists

David MacKay presents in his book 'Information Theory, Inference, and Learning Algorithms' a non-Bayesian exercise, which Lee & Wagenmakers (2013) reused for a Bayesian modeling demonstration in WinBUGS. We discuss their model and demonstrate that their WinBUGS modelling approach has shortcomings.

"N datapoints {x(n)} are drawn from N distributions, all of which are Gaussian with a common mean mu but with di fferent unknown standard deviations sigma(n). What are the maximum likelihood parameters mu, {sigma(n)} given the data? For example, seven scientists (A, B, C, D, E, F, G) with wildly-di ffering experimental skills measure mu. You expect some of them to do accurate work (i.e., to have small sigma(n), and some of them to turn in wildly inaccurate answers (i.e.,to have enormous sigma(n). Figure 22.9 shows their seven results. What is mu, and how reliable is each scientist? I hope you agree that, intuitively, it looks pretty certain that A and B are both inept measurers, that D-G are better, and that the true value of mu is somewhere close to 10. But what does maximizing the likelihood tell you?" (MacKay, David J.C., Information Theory, Inference, and Learning Algorithms, Cambridge University Press 2003, Exercise 22.15, p.309).

We disagree with MacKay about the 'ineptness' of the two scientists A and B. The data can be interpreted in different ways. We see two extreme scientific hypotheses: (1) All seven scientists have the same measurement skill with the same noise level; the coincidence of the measurements of scientists D-G is due to chance. (2) The seven scientists have individual measurement skills with different noise levels; the coincidence of the measurements of scientists D-G is due partly to chance and partly to their small noise distributions.

BCM-Ch04.2: Originally published BUGS-Code for the 'Seven Scientists'

Lee & Wagenmakers were inspired by MacKay's exercise 22.15: "This problem is from MacKay (2003, p.309) where it is, among other things, treated to a Bayesian solution, but not quite using a graphical modeling approach, nor relying on computational sampling methods" (Lee & Wagenmakers, Bayesian Cognitive Modeling, 2013, p.56). The specification for their WinBUGS code (p.56f and below) was the probabilistic graphical model in Fig. 4.2 (p.56).

By inspection alone, without running any simulation, we found several issues in the probabilistic graphical model and the BUGS code:

  • Graphical model and code are inconsistent: the graph contains only one plate over the data, but the model has two loops over i=1:n.
  • The datum x(i) depends on the precision parameter lambda (the BUGS developers use the symbol tau for this), which, unlike the standard deviation sigma, has no intuitive semantics on the scale of the measurements.
  • For precisions near zero the corresponding sigmas approach infinity.
  • With n+1 parameters but only n independent measurements, the model has more parameters than data points: it seems to be underconstrained.

Model runs were done in OpenBUGS. Code and output are shown below. The results tend to support MacKay's intuitions.

The priors for the precision parameter lambda(i) = 1/variance(i) were declared by the authors as lambda(i) ~ dgamma(.001, .001). What is lacking in their book is the meaning of these abstract parameters in terms of the more interpretable parameter sigma of the scientist-specific measurement errors. In a first step we determined the mean and the variance of the precision lambda. For the hyperparameters a = 0.001 and b = 0.001 of the gamma distribution we obtain mean(lambda(i)) = a/b = 0.001/0.001 = 1.0, variance(lambda(i)) = a/b^2 = 0.001/0.001^2 = 1000.0, and std(lambda(i)) = sqrt(variance(lambda(i))) = sqrt(a)/b = 31.6228. The next step would be the determination of mean(sigma), var(sigma), and std(sigma) of the prior sigma distribution. We could not obtain these values by an OpenBUGS simulation run; the runs were aborted with error messages.
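These moment formulas can be checked mechanically. A minimal Python sketch (our illustration), just encoding the rate-parameterized Gamma(a, b) moments used above:

```python
# Moments of the Gamma(a, b) precision prior in the BUGS rate parameterization:
# mean = a/b, variance = a/b^2, std = sqrt(a)/b.
a = b = 0.001

mean_lambda = a / b              # 1.0
var_lambda = a / b ** 2          # 1000.0
std_lambda = var_lambda ** 0.5   # ~31.6228

print(mean_lambda, var_lambda, std_lambda)
```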

Lee & Wagenmakers do not provide these parameters either. Instead they write: "This raises the issue of whether inference is sensitive to the essentially arbitrary value 0.001, and it is sometimes the case that using other small values such as 0.01 or 0.1 leads to more stable sampling in WinBUGS." (Lee & Wagenmakers, 2013, p.57).

But why work with precisions instead of standard deviations when using a Gaussian prior? "Since the very first version of BUGS, the normal has been parameterised by the precision tau = 1/sigma^2 rather than the commoner variance sigma^2 or standard deviation sigma. When used as a prior, for example, smaller precisions give vaguer priors. In retrospect this was an unwise decision - although using tau gives tidier expressions for the posterior distributions of the parameters under conjugate priors, it has caused a lot of confusion. However, changing the parameterisation at this stage would be likely to redouble the confusion for existing users!" (Lunn et al., The BUGS Book, 2013, p.343f).

It is surprising that Lee & Wagenmakers neither present nor discuss their simulation results.

We do that here for our OpenBUGS simulation and below for our WebCHURCH models. The posterior mean of mu is 9.908, which deviates substantially from the non-Bayesian sample mean of 3.47. The sample mean is so low because of the two 'outliers' with measurements -27.020 and 3.570. Both can easily be identified without Bayesian analysis from visual inspection of the data plot alone (MacKay, 2003, Fig. 22.9, p.310). For these two 'bumblers' the precisions lambda[1] and lambda[2] are the lowest and their sigmas the highest. Because of this divergence of the Bayesian and non-Bayesian means, it is evident that the careful selection of priors has an important influence on the Bayesian results.
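For reference, the non-Bayesian sample mean can be recomputed directly from the seven measurements as listed in Lee & Wagenmakers (2013, section 4.2); a quick Python check:

```python
# The seven measurements (scientists A..G), as listed in Lee & Wagenmakers
# (2013, section 4.2); the first two are the 'outliers' discussed in the text.
x = [-27.020, 3.570, 8.191, 9.898, 9.603, 9.945, 10.056]

sample_mean = sum(x) / len(x)   # close to the 3.47 quoted above
print(round(sample_mean, 2))
```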

It is far more difficult to rank the non-outliers with respect to their measurement precision; this cannot be done by visual inspection alone, so the sigmas and lambdas of the Bayesian model become indispensable. The mean sigma[i] ranges from 0.8636 for scientist D to 386.7 for scientist A. Though scientist A is an outlier, such a large standard deviation for his personal posterior distribution of measurement errors is somewhat irritating: it would mean that this scientist has lost all control over his measurements.

This motivates remodelling the problem in WebCHURCH. We do that in several steps. First, we examine whether it is wise to first sample the precisions lambda[i] and then derive the sigma[i] by the deterministic assignment sigma[i] <- 1/sqrt(lambda[i]). Second, we make a small grid search in the parameter space to obtain more plausible parameters for the gamma priors. Third, we study the plausibility of two alternative research hypotheses: (1) 'All seven scientists measure the same mu with the same precision' and (2) 'All seven scientists measure the same mu with different precisions'.

BCM-Ch04.2: Preliminary Reflections for Compiling the BUGS Code to WebCHURCH

We could try to compile the BUGS code line by line into WebCHURCH code. But we have to be careful because of the different parameterizations of the Gaussians and the Gammas.

In BUGS the Gaussian distribution dnorm is parameterized with the mean mu and the precision tau (Lee & Wagenmakers use the symbol lambda instead of tau). The relation between the precision and the variance is:

  • tau = 1/variance
  • variance = 1/tau

(Lunn et al, The BUGS Book, 2013, p.343). In WebCHURCH the Gaussian is parameterized more intuitively with mu and std.

In BUGS the Gamma distribution dgamma(a, b) is parameterized with shape a and rate b, so that the mean is m = a/b and the variance is var = a/b^2. WebCHURCH instead uses the scale b' = 1/b, so the mean is m = a*b', the variance is var = a*b'^2, and the standard deviation is std = sqrt(a)*b'.
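The rate-to-scale conversion is easy to get wrong, so here is a small NumPy check (NumPy's gamma, like WebCHURCH's, takes a scale parameter; the concrete values a = 2 and rate = 0.5 are just an example):

```python
import numpy as np

rng = np.random.default_rng(0)

# BUGS dgamma(a, b) uses the rate b; NumPy and WebCHURCH use the scale 1/b.
a, rate = 2.0, 0.5
scale = 1.0 / rate

samples = rng.gamma(shape=a, scale=scale, size=200_000)

mean_est = samples.mean()   # should be near a/rate = a*scale = 4.0
std_est = samples.std()     # should be near sqrt(a)/rate = sqrt(a)*scale ~ 2.83
print(mean_est, std_est)
```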

This parameterization is the only reason why BUGS code works with the precision instead of the more intuitive standard deviation. But what are the parameters of the priors of the standard deviations when the hyperparameters of the priors of the precisions are:

  • lambda[i] ~ dgamma(0.001, 0.001) ?

According to a rough calculation we expect for the gamma-distributed lambda[i] a mean m = a/b = 0.001/0.001 = 1 and a variance s^2 = a/b^2 = 0.001/0.001^2 = 1000.0, so the std is s = sqrt(1000) = 31.62.

But what are the mean and std of the sigmas? An exact answer would require the transformation of the lambda-pdf into the sigma-pdf with the help of Jacobians (e.g. ch 2.4 "Transformation of Random Variables", in: Lunn et al., The BUGS Book, 2013, p.24). "However, transformations are straightforward when using a simulation approach" (Lunn et al., The BUGS Book, 2013, p.24). When trying that, we ran into new problems.

In the BUGS code we find the statement sigma[i] <- 1/sqrt(lambda[i]). The disturbing thing is that a gamma-distributed variable with mean 1 and such a large standard deviation has many values close or equal to zero, so that the generated sample values of sigma[i] come close to or become infinite. We tried to compute the priors for lambda ~ dgamma(0.001, 0.001) in OpenBUGS; this was not possible, and the same was true for lambda ~ dgamma(0.01, 0.01). We were successful for lambda ~ dgamma(0.1, 0.1), which has the same mean(lambda) = 1.0 but a smaller std(lambda) = 3.162. The corresponding values for sigma <- 1/sqrt(lambda) are a bit mysterious.
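The numerical pathology is easy to reproduce outside of BUGS. The following NumPy sketch (our illustration, not the book's code) samples lambda ~ Gamma(0.001, rate 0.001) and applies sigma = 1/sqrt(lambda):

```python
import numpy as np

rng = np.random.default_rng(42)

a = rate = 0.001
lam = rng.gamma(shape=a, scale=1.0 / rate, size=100_000)

# With shape 0.001, a large fraction of the draws underflows to exactly 0.0
# in double precision; 1/sqrt(0) then yields an infinite sigma.
frac_zero = float(np.mean(lam == 0.0))

with np.errstate(divide="ignore"):
    sigma = 1.0 / np.sqrt(lam)

print(frac_zero)                 # close to one half, as in the 47% reported below
print(np.isinf(sigma.mean()))    # mean(sigma) is infinite
```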

We compiled the following BUGS code snippet to WebCHURCH (code and output below):

  • lambda ~ dgamma(0.001, 0.001)
  • sigma <- 1/sqrt(lambda)

We see that 47% (!) of the sampled lambda values are equal to 0. mean(lambda) and std(lambda) are nearly equal to the expected values 1 and 31.62, but mean(sigma) and std(sigma) could not be computed because of the zero-valued lambdas. We did the same computations with (code and output below):

  • lambda ~ dgamma(0.01, 0.01)
  • sigma <- 1/sqrt(lambda)

and (code and output below):

  • lambda ~ dgamma(0.1, 0.1)
  • sigma <- 1/sqrt(lambda)

Results are shown in the table:

Parameters of Precision and Sigma Priors
(a = shape and b = scale in the WebCHURCH parameterization; pr = precision lambda, sig = sigma = 1/sqrt(lambda), %(0) = percentage of zero-valued samples)

  a     | b    | max(pr) | min(pr) | %(0)  | m(pr) | sd(pr) | max(sig) | min(sig) | mean(sig) | sd(sig)
  0.1   | 10   | 62.00   | 3.7e-56 | 0     | 1.00  | 3.15   | 5.2e+27  | 0.13     | 8.7e+22   | 2.1e+25
  0.01  | 100  | 810.42  | 0       | 4.83  | 0.98  | 10.74  | Infinity | 0.04     | Infinity  | NaN
  0.001 | 1000 | 3806.01 | 0       | 47.00 | 0.91  | 31.87  | Infinity | 0.02     | Infinity  | NaN

We see that the BUGS prior precision ~ dgamma(0.001, 0.001) or precision ~ dgamma(0.01, 0.01), which corresponds to WebCHURCH (define precision (gamma 0.001 1000)) or (define precision (gamma 0.01 100)), leads to improper priors for sigma. Only the hyperparameters of BUGS precision ~ dgamma(0.1, 0.1), which corresponds to WebCHURCH (define precision (gamma 0.1 10)) (first line in the table), lead to a proper uninformative prior for sigma. But the range of this prior is far too wide: we know that the sd of the individual measurement noise is far smaller.

We could not compute the priors of sigma in OpenBUGS for precision ~ dgamma(0.001, 0.001) or precision ~ dgamma(0.01, 0.01); the runs stopped with error messages. Astonishingly, the whole model nevertheless produced posterior values for mu and sigma[i]. We suspect that some 'magic hand' or 'expert system' inside BUGS plugs in a proper prior in those cases.

Instead of the detour via the 'precision' one should sample the sigmas directly, as we do in the WebCHURCH model. This agrees with a recommendation of the BUGS developers: "It is preferable to construct a prior distribution on a scale on which one has a good interpretation of magnitude, such as standard deviation, rather than one which may be convenient for mathematical purposes but is fairly incomprehensible, such as the logarithm of the precision." (Lunn et al., The BUGS Book, 2013, p.82).
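To illustrate sampling the sigmas directly, here is a minimal random-walk Metropolis sketch in Python. It is our own illustration, not the WebCHURCH model: the Normal(0, 30) prior on mu and the Gamma(shape 1, scale 1) prior placed directly on each sigma[i] are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(7)

# Measurements as listed in Lee & Wagenmakers (2013, section 4.2).
x = np.array([-27.020, 3.570, 8.191, 9.898, 9.603, 9.945, 10.056])
n = len(x)

SHAPE, SCALE = 1.0, 1.0   # assumed Gamma prior directly on each sigma[i]

def log_post(mu, log_sig):
    """Unnormalized log posterior; the sigmas are updated on the log scale."""
    sig = np.exp(log_sig)
    lp = -0.5 * (mu / 30.0) ** 2                          # Normal(0, 30) prior on mu
    # Gamma(SHAPE, scale SCALE) prior on sigma plus the log-Jacobian of exp().
    lp += np.sum((SHAPE - 1.0) * log_sig - sig / SCALE + log_sig)
    lp += np.sum(-log_sig - 0.5 * ((x - mu) / sig) ** 2)  # Gaussian likelihood
    return lp

mu, log_sig = float(x.mean()), np.zeros(n)
lp = log_post(mu, log_sig)
mus = []
for it in range(20_000):
    mu_p = mu + 0.5 * rng.normal()          # random-walk update of mu
    lp_p = log_post(mu_p, log_sig)
    if np.log(rng.uniform()) < lp_p - lp:
        mu, lp = mu_p, lp_p
    for i in range(n):                      # componentwise update of log sigma[i]
        prop = log_sig.copy()
        prop[i] += 1.0 * rng.normal()
        lp_p = log_post(mu, prop)
        if np.log(rng.uniform()) < lp_p - lp:
            log_sig, lp = prop, lp_p
    if it >= 4_000:
        mus.append(mu)

post_mean_mu = float(np.mean(mus))
print(post_mean_mu)   # lands near 10 rather than near the sample mean ~3.5
```

A posterior mean of mu near 10 agrees with MacKay's intuition; the point of the sketch is only that no precision variable appears anywhere.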

BCM-Ch04.2: WebCHURCH-code for Single-Sigma-Model of the 'Seven Scientists'

According to our second research hypothesis ('All seven scientists measure the same mu but with different precisions sigma(i)') we estimated the Bayesian model for various combinations of alpha and beta. For each model we computed the ratio of the smallest to the largest mean(sigma) (table below). If this ratio is equal to 1.0, all seven sigmas are equal; in that case the first research hypothesis ('All seven scientists measure the same mu with the same precision sigma') is an adequate explanation of the data.

 

Prior and Posterior Parameters of Multiple-Sigma Models

  alpha | beta  | prior mean m | prior std s | min(mean(sigma)) | max(mean(sigma)) | min-to-max ratio | posterior mean mu
   0.20 |  0.20 |     0.04     |     0.09    |       0.69       |        1.93      |       0.36       |       6.79
   0.30 |  0.30 |     0.09     |     0.16    |       1.01       |        2.43      |       0.42       |       6.72
   0.40 |  0.40 |     0.16     |     0.25    |       0.38       |        3.45      |       0.11       |       8.92
   0.50 |  0.50 |     0.25     |     0.35    |       0.58       |        3.85      |       0.15       |       9.16
   0.60 |  0.60 |     0.36     |     0.46    |       0.22       |        5.42      |       0.04       |       9.87
   0.70 |  0.70 |     0.49     |     0.59    |       0.27       |        6.16      |       0.04       |       9.85
   0.80 |  0.80 |     0.64     |     0.72    |       0.32       |        7.79      |       0.04       |       9.85
   0.90 |  0.90 |     0.81     |     0.85    |       0.47       |        9.84      |       0.05       |       9.77
   1.00 |  1.00 |     1.00     |     1.00    |       0.46       |        8.75      |       0.05       |       9.85
   2.00 |  2.00 |     4.00     |     2.83    |       2.59       |       15.22      |       0.17       |       9.24
   3.00 |  3.00 |     9.00     |     5.20    |       6.70       |       18.92      |       0.35       |       8.29
   4.00 |  4.00 |    16.00     |     8.00    |      12.86       |       23.23      |       0.55       |       7.26
   5.00 |  5.00 |    25.00     |    11.18    |      20.87       |       28.88      |       0.72       |       5.78
   6.00 |  6.00 |    36.00     |    14.70    |      30.95       |       36.57      |       0.85       |       4.77
   7.00 |  7.00 |    49.00     |    18.52    |      42.88       |       46.67      |       0.92       |       3.65
   8.00 |  8.00 |    64.00     |    22.63    |      56.92       |       59.44      |       0.96       |       3.00
   9.00 |  9.00 |    81.00     |    27.00    |      72.76       |       74.49      |       0.98       |       2.10
  10.00 | 10.00 |   100.00     |    31.62    |      90.70       |       92.07      |       0.99       |       1.82
  20.00 | 20.00 |   400.00     |    89.44    |     379.73       |      380.99      |       1.00       |       0.63
  30.00 | 30.00 |   900.00     |   164.32    |     867.90       |      871.68      |       1.00       |      -0.60

For alpha = 7 and beta = 7 we found a model under the second hypothesis with nearly equal mean(sigma) values for all scientists (min-to-max ratio 0.92). For this parameter combination we simplified the model to one common mu and one common sigma. Code and output are below.
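The two prior columns of the table are simply the moments of a Gamma(alpha, beta) distribution with beta read as a scale: mean alpha*beta and std sqrt(alpha)*beta. A quick Python check for a few of the rows:

```python
import math

# Prior mean and std of sigma ~ Gamma(shape=alpha, scale=beta), rounded as in
# the 'prior mean m' and 'prior std s' columns of the table.
rows = {}
for alpha, beta in [(0.2, 0.2), (1.0, 1.0), (7.0, 7.0), (30.0, 30.0)]:
    m = alpha * beta
    s = math.sqrt(alpha) * beta
    rows[(alpha, beta)] = (round(m, 2), round(s, 2))

print(rows)
```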

The WebCHURCH-code can be found here.

BCM-Ch04.2: WebCHURCH-code for Multiple-Sigma-Model (a=0.6, b=0.6) of the 'Seven Scientists'


For alpha = 0.6 and beta = 0.6 we found a model under the second hypothesis that shows the greatest range of individual measurement precisions across the scientists. Code and output are below.

The WebCHURCH-code can be found here.

BCM-Ch04.2: WebCHURCH-code for Multiple-Sigma-Model (a=7, b=7) of the 'Seven Scientists'


For alpha = 7 and beta = 7 we found a model under the second hypothesis with nearly equal mean(sigma) values for all scientists (min-to-max ratio 0.92). For this parameter combination we simplified the model to one common mu and one common sigma. Code and output are below.