#### Learning and Cognitive Systems

# BCM-Ch04.2: The Seven Scientists

David MacKay presents in his book 'Information Theory, Inference, and Learning Algorithms' a non-Bayesian exercise, which Lee & Wagenmakers (2013) reused for a Bayesian modeling demonstration in WinBUGS. We discuss their model and demonstrate that their WinBUGS modeling approach has shortcomings.

"N datapoints {x(n)} are drawn from N distributions, all of which are Gaussian with a common mean *mu* but with different unknown standard deviations *sigma(n)*. What are the maximum likelihood parameters *mu*, {*sigma(n)*} given the data? For example, seven scientists (A, B, C, D, E, F, G) with wildly-differing experimental skills measure *mu*. You expect some of them to do accurate work (i.e., to have small sigma(n), and some of them to turn in wildly inaccurate answers (i.e.,to have enormous *sigma(n)*. Figure 22.9 shows their seven results. What is *mu*, and how reliable is each scientist? I hope you agree that, intuitively, it looks pretty certain that A and B are both inept measurers, that D-G are better, and that the true value of *mu* is somewhere close to 10. But what does maximizing the likelihood tell you?" (MacKay, David J.C., Information Theory, Inference, and Learning Algorithms, Cambridge University Press 2003, Exercise 22.15, p.309).

We disagree with MacKay about the 'ineptness' of the two scientists A and B. The data could be interpreted in different ways. We think there are two extreme scientific hypotheses: (1) *All* seven scientists have the *same* measurement skills with the *same* noise level; the coincidence of the measurements from scientists D-G is *by chance*. (2) *All* seven scientists have *individual* measurement skills with *different* noise levels; the coincidence of the measurements from scientists D-G is due *partly to chance* and *partly to their small noise distributions*.

# BCM-Ch04.2: Originally published BUGS-Code for the 'Seven Scientists'

Lee & Wagenmakers were inspired by MacKay's exercise 22.15: "This problem is from MacKay (2003, p.309) where it is, among other things, treated to a Bayesian solution, but not quite using a graphical modeling approach, nor relying on computational sampling methods" (Lee & Wagenmakers, Bayesian Cognitive Modeling, 2013, p.56). The specification for their WinBUGS code (p.56f and below) was the probabilistic graphical model in Fig. 4.2 (p.56).

Even without a single simulation run, inspection alone revealed several issues in the probabilistic graphical model and the BUGS code:

- Graphical model and code are not consistent: the graph contains only *one* set of plates for the data, but the model has *two* loops over i=1:n.
- The datum i depends on the precision parameter *lambda* (the BUGS developers use the symbol *tau* for this), which, compared to the standard deviation sigma, has no intuitive semantics on the scale of the measurement.
- For precisions near zero the corresponding sigmas tend toward infinity.
- The model contains n+1 parameters but only n independent measurements, i.e. *more* parameters than measurements: the model seems to be underconstrained.

Model runs were done in OpenBUGS; code and output are shown below. The results tend to support MacKay's intuitions.

The priors for the precision parameters lambda(i) = 1/variance(i) were declared by the authors as lambda(i) ~ dgamma(.001, .001). What is lacking in their book is the meaning of these abstract parameters in terms of the more easily interpretable parameter sigma of the scientist-specific measurement errors. In a first step we determined the mean and the variance of the precision lambda: for the hyperparameters a = 0.001 and b = 0.001 of the gamma distribution we calculated mean(lambda(i)) = a/b = 0.001/0.001 = 1.0, variance(lambda(i)) = a/b^2 = 0.001/0.001^2 = 1000.0, and std(lambda(i)) = sqrt(variance(lambda(i))) = sqrt(a)/b = 31.6228. The next step should be the determination of mean(sigma), var(sigma), and std(sigma) of the *prior* sigma distribution. We could not obtain these values by an OpenBUGS simulation run; the runs were aborted with error messages.
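These moment calculations are easy to check with a short script (a sketch in plain Python, not part of the original BUGS code):

```python
from math import sqrt

def gamma_moments(a, b):
    """Mean, variance, and std of a Gamma prior with shape a and rate b,
    as parameterized in BUGS: mean = a/b, variance = a/b^2."""
    mean = a / b
    var = a / b ** 2
    return mean, var, sqrt(var)

m, v, s = gamma_moments(0.001, 0.001)
print(round(m, 4), round(v, 4), round(s, 4))  # 1.0 1000.0 31.6228
```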

Lee & Wagenmakers did not provide these parameters either. Instead they write: "This raises the issue of whether inference is sensitive to the essentially arbitrary value 0.001, and it is sometimes the case that using other small values such as 0.01 or 0.1 leads to more stable sampling in WinBUGS." (Lee & Wagenmakers, 2013, p.57).

But why work with precisions instead of standard deviations when using a Gaussian prior? "Since the very first version of BUGS, the normal has been parameterised by the precision tau = 1/sigma^2 rather than the commoner variance sigma^2 or standard deviation sigma. When used as a prior, for example, smaller precisions give vaguer priors. In retrospect this was an unwise decision - although using tau gives tidier expressions for the posterior distributions of the parameters under conjugate priors, it has caused a lot of confusion. However, changing the parameterisation at this stage would be likely to redouble the confusion for existing users!" (Lunn et al., The BUGS Book, 2013, p.343f).

It is stunning that Lee & Wagenmakers neither present nor discuss their simulation results.

We do that here for our OpenBUGS simulation and below for our WebCHURCH models. The posterior mean of mu is 9.908 and deviates substantially from the non-Bayesian sample mean of 3.47. The sample mean is so low because of the two 'outliers' with measurements -27.020 and 3.570. Both can easily be identified without Bayesian analysis, by visual inspection of the data plot alone (MacKay, 2003, Fig. 22.9, p.310). For these two 'bumblers' the precisions lambda[1] and lambda[2] are lowest and their sigmas are highest. This divergence of Bayesian and non-Bayesian means makes it evident that the careful selection of priors has an important influence on the Bayesian results.
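Assuming the seven data values from MacKay's Fig. 22.9 (they are not reprinted in this text, so treat them as an assumption of this sketch), a two-liner confirms the outlier-dominated sample mean:

```python
# The seven measurements of scientists A..G; values assumed from MacKay (2003, Fig. 22.9).
x = [-27.020, 3.570, 8.191, 9.898, 9.603, 9.945, 10.056]

sample_mean = sum(x) / len(x)
print(round(sample_mean, 2))  # dragged down to about 3.5 by the outlier -27.020
```

Up to rounding this reproduces the non-Bayesian sample mean quoted above; dropping the two suspect values -27.020 and 3.570 would give a mean close to 9.5, near the Bayesian estimate.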

It is far more difficult to rank the non-outliers with respect to their measurement precision; this cannot be done by visual inspection alone. Here the sigmas and lambdas of the Bayesian model become indispensable. The sigma[i] range from 0.8636 for scientist D to 386.7 for scientist A. Though scientist A is an outlier, such a large standard deviation for his personal posterior distribution of measurement errors is a bit irritating: it would mean that the scientist had completely lost control over his measurements.

This motivates remodeling the problem in WebCHURCH. We do that in several steps. *First* we examine the question whether it is wise to first sample the precisions lambda[i] and then derive the sigma[i] by the deterministic assignment sigma[i] <- 1/sqrt(lambda[i]). *Second*, we make a small grid search in the parameter space to obtain more plausible parameters for the gamma priors. *Third*, we study the plausibility of two alternative research hypotheses: (1) 'All seven scientists measure the *same* mu with the *same* precision' and (2) 'All seven scientists measure the *same* mu with *different* precisions'.

# BCM-Ch04.2: Preliminary Reflections for Compiling the BUGS Code to WebCHURCH

We could try to compile the BUGS code line by line into WebCHURCH code, but we have to be careful because of the different parameterizations of the Gaussians and the Gammas.

In BUGS the Gaussian distribution *dnorm* is parameterized with the mean *mu* and the precision *tau* (Lee & Wagenmakers use instead of *tau* the term *lambda*). The relation between the precision and the variance is:

*tau* = 1/variance, i.e. variance = 1/*tau*

(Lunn et al, The BUGS Book, 2013, p.343). In WebCHURCH the Gaussian is parameterized more intuitively with *mu* and *std*.

In BUGS the Gamma distribution dgamma(a, b) is parameterized with shape *a* and rate *b*, so that the *mean* is m = a/b and the *variance* is var = a/b^2. WebCHURCH uses the *scale* 1/b instead of the rate: after substituting b <- 1/b, the *mean* is m = a*b, the *variance* is var = a*b^2, and the *standard deviation* is std = sqrt(a)*b.
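The rate-vs-scale conversion can be checked by simulation. A minimal sketch in plain Python (whose `random.gammavariate` happens to use the scale convention, like WebCHURCH; the concrete values a = 2, b = 4 are arbitrary):

```python
import random
from math import sqrt

random.seed(1)

a, b_rate = 2.0, 4.0      # BUGS: dgamma(a, b) with rate b, so mean = a/b = 0.5
b_scale = 1.0 / b_rate    # WebCHURCH-style scale parameter, so mean = a*scale = 0.5

# Python's gammavariate is parameterized with a *scale* (like WebCHURCH),
# so the BUGS rate must be inverted before sampling.
samples = [random.gammavariate(a, b_scale) for _ in range(200_000)]
m = sum(samples) / len(samples)
v = sum((s - m) ** 2 for s in samples) / len(samples)

print(round(m, 3), round(v, 3))   # close to a/b = 0.5 and a/b^2 = 0.125
```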

This parameterization is the only reason for using the *precision* instead of the more intuitive *standard deviation* in BUGS code. But what are the parameters of the priors of the standard deviations when the hyperparameters of the priors of the precisions are:

- lambda[i] ~ dgamma(0.001, 0.001) ?

According to a rough calculation we expect for the gamma-distributed lambda[i] a mean m = a/b = 0.001/0.001 = 1 and a variance s^2 = a/b^2 = 0.001/0.001^2 = 1000.0, so the std is s = sqrt(1000) = 31.62.

But what are the mean and std of the sigmas? An exact answer would require the transformation of the lambda-pdf into the sigma-pdf with the help of Jacobians (e.g. ch. 2.4 "Transformation of Random Variables", in: Lunn et al., The BUGS Book, 2013, p.24). "However, transformations are straightforward when using a simulation approach" (Lunn et al., The BUGS Book, 2013, p.24). When trying that we ran into new problems.
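For the *mean* of sigma the Jacobian route can actually be carried out in closed form, which already shows where the simulation runs must fail. With lambda ~ Gamma(shape a, rate b) and sigma = lambda^(-1/2):

```latex
E[\sigma] \;=\; E\!\left[\lambda^{-1/2}\right]
\;=\; \int_0^\infty \lambda^{-1/2}\,\frac{b^{a}}{\Gamma(a)}\,\lambda^{a-1}e^{-b\lambda}\,d\lambda
\;=\; \sqrt{b}\;\frac{\Gamma\!\left(a-\tfrac{1}{2}\right)}{\Gamma(a)},
\qquad \text{finite only for } a > \tfrac{1}{2}.
```

For a = 0.001, 0.01, and even 0.1 the prior mean of sigma is therefore *infinite*: huge or Infinity/NaN sample estimates of mean(sigma) are exactly what one expects when Monte Carlo averaging a quantity whose true mean does not exist.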

In the BUGS code we find the statement sigma[i] <- 1/sqrt(lambda[i]). The disturbing thing is that a gamma-distributed variable with mean 1 and such a large standard deviation has many values close or equal to zero, so that the generated sample values of sigma[i] come close to or equal infinity. We tried to compute in OpenBUGS the priors for lambda ~ dgamma(0.001, 0.001). This was not possible, and the same was true for lambda ~ dgamma(0.01, 0.01). We were successful only for lambda ~ dgamma(0.1, 0.1), with the same mean(lambda) = 1.0 but a smaller std(lambda) = 3.162. The corresponding values for sigma <- 1/sqrt(lambda) are a bit mysterious.

We compiled the following BUGS code snippet to WebCHURCH (code and output below):

- lambda ~ dgamma(0.001, 0.001)
- sigma <- 1/sqrt(lambda)

We see that 47% (!) of the sampled lambda values are equal to 0. The mean(lambda) and std(lambda) are nearly equal to the expected values 1 and 31.62, but mean(sigma) and std(sigma) could not be computed due to the zero-valued lambdas. We did the same computations with (code and output below):

- lambda ~ dgamma(0.01, 0.01)
- sigma <- 1/sqrt(lambda)

and (code and output below):

- lambda ~ dgamma(0.1, 0.1)
- sigma <- 1/sqrt(lambda).
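The same effect can be reproduced outside BUGS/WebCHURCH with a short Monte Carlo sketch in plain Python (the exact underflow percentages depend on the floating-point path of the particular gamma sampler, so they will not match our WebCHURCH results digit for digit):

```python
import random
from math import sqrt

random.seed(42)

def sigma_prior_summary(a, n=100_000):
    """Sample lambda ~ dgamma(a, a) in BUGS terms, i.e. shape a and rate a
    (Python's gammavariate takes a *scale*, hence the 1/a), then transform
    sigma = 1/sqrt(lambda) and count lambdas that underflow to exactly 0."""
    lams = [random.gammavariate(a, 1.0 / a) for _ in range(n)]
    zero_frac = sum(1 for lam in lams if lam == 0.0) / n
    sigmas = [1.0 / sqrt(lam) for lam in lams if lam > 0.0]
    return zero_frac, max(sigmas)

for a in (0.1, 0.01, 0.001):
    zero_frac, sig_max = sigma_prior_summary(a)
    print(f"a={a}: {100 * zero_frac:.1f}% of lambdas underflow to 0, max(sigma)={sig_max:.2e}")
```

For a = 0.001 roughly half of the sampled lambdas underflow to exactly 0 in double precision, so sigma = 1/sqrt(lambda) is infinite for them; for a = 0.1 no lambda underflows, but the largest sigmas are still astronomically large.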

Results are shown in the table (pr = precision lambda, sig = sigma; b is the WebCHURCH scale parameter, i.e. the reciprocal of the BUGS rate):

a | b | max(pr) | min(pr) | %(0) | m(pr) | sd(pr) | max(sig) | min(sig) | mean(sig) | sd(sig) |
---|---|---|---|---|---|---|---|---|---|---|
0.1 | 10 | 62.00 | 3.7e-56 | 0 | 1.00 | 3.15 | 5.2e+27 | 0.13 | 8.7e+22 | 2.1e+25 |
0.01 | 100 | 810.42 | 0 | 4.83 | 0.98 | 10.74 | Infinity | 0.04 | Infinity | NaN |
0.001 | 1000 | 3806.01 | 0 | 47.00 | 0.91 | 31.87 | Infinity | 0.02 | Infinity | NaN |

We see that the BUGS prior precision ~ dgamma(0.001, 0.001) or precision ~ dgamma(0.01, 0.01), which corresponds to WebCHURCH (define precision (gamma 0.001 1000)) or (define precision (gamma 0.01 100)), leads to *improper* priors for sigma. Only the hyperparameters of BUGS precision ~ dgamma(0.1, 0.1), which correspond to WebCHURCH (define precision (gamma 0.1 10)) (first line in the table), lead to a **proper uninformed** prior for sigma. But the range of this prior is far too wide: we know that the sd of the individual measurement noise is far smaller.

We could not compute in OpenBUGS the priors of sigma for precision ~ dgamma(0.001, 0.001) or precision ~ dgamma(0.01, 0.01); the runs stopped with error messages. Astonishingly, the whole model nevertheless produced posterior values for mu and sigma[i]. We suspect that some *'magic hand' or 'expert system'* inside BUGS plugs in a proper prior in those cases.

Instead of taking the detour via the 'precision', one should sample the sigmas *directly*, as we do in the WebCHURCH model. This is in agreement with a recommendation of the BUGS developers: "It is preferable to construct a prior distribution on a scale on which one has a good interpretation of magnitude, such as standard deviation, rather than one which may be convenient for mathematical purposes but is fairly incomprehensible, such as the logarithm of the precision." (Lunn et al., The BUGS Book, 2013, p.82).
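What sampling sigma directly looks like can be sketched with a small componentwise Metropolis sampler in plain Python. Everything here is an illustration, not the WebCHURCH code: the data values are assumed from MacKay's Fig. 22.9, and the flat prior on mu plus Uniform(0, 50) priors on each sigma[i] are our own choices.

```python
import random
from math import log, inf

random.seed(7)

# Measurements of scientists A..G; values assumed from MacKay (2003, Fig. 22.9).
x = [-27.020, 3.570, 8.191, 9.898, 9.603, 9.945, 10.056]
n = len(x)

SIG_MAX = 50.0  # Uniform(0, SIG_MAX) prior on every sigma[i] (our illustrative choice)

def log_post(mu, sig):
    """Log posterior up to a constant: flat prior on mu, uniform priors on the sigmas."""
    if any(s <= 0.0 or s >= SIG_MAX for s in sig):
        return -inf
    return sum(-log(s) - 0.5 * ((xi - mu) / s) ** 2 for xi, s in zip(x, sig))

# Componentwise random-walk Metropolis over (mu, sigma[0..6]).
mu, sig = 0.0, [10.0] * n
lp = log_post(mu, sig)
mu_trace = []
for sweep in range(30_000):
    cand = mu + random.gauss(0.0, 0.5)
    lp_new = log_post(cand, sig)
    if log(random.random()) < lp_new - lp:
        mu, lp = cand, lp_new
    for i in range(n):
        old = sig[i]
        sig[i] = old + random.gauss(0.0, 2.0)
        lp_new = log_post(mu, sig)
        if log(random.random()) < lp_new - lp:
            lp = lp_new
        else:
            sig[i] = old
    if sweep >= 5_000:  # discard burn-in
        mu_trace.append(mu)

post_mean_mu = sum(mu_trace) / len(mu_trace)
print(round(post_mean_mu, 2))  # should come out near 10 (the text reports 9.908 for BUGS)
```

Because the priors live directly on the sigma scale, no zero-precision pathologies can occur, and the outliers A and B simply acquire large sigmas.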

# BCM-Ch04.2: WebCHURCH-code for Single-Sigma-Model of the 'Seven Scientists'

According to our *second* research hypothesis ('All seven scientists measure the *same* mu but with *different* precisions sigma(i)') we estimated the Bayesian model for various combinations of alpha and beta. For each model we computed the ratio of the smallest to the largest mean(sigma) (table below). If this ratio equals 1.0, all seven sigmas are equal; in that case the *first* research hypothesis ('All seven scientists measure the *same* mu with the *same* precision sigma') is an adequate explanation of the data.

alpha | beta | prior mean m | prior std s | min(mean(sigma)) | max(mean(sigma)) | min-to-max-ratio | posterior mean mu |
---|---|---|---|---|---|---|---|
0.20 | 0.20 | 0.04 | 0.09 | 0.69 | 1.93 | 0.36 | 6.79 |
0.30 | 0.30 | 0.09 | 0.16 | 1.01 | 2.43 | 0.42 | 6.72 |
0.40 | 0.40 | 0.16 | 0.25 | 0.38 | 3.45 | 0.11 | 8.92 |
0.50 | 0.50 | 0.25 | 0.35 | 0.58 | 3.85 | 0.15 | 9.16 |
0.60 | 0.60 | 0.36 | 0.46 | 0.22 | 5.42 | 0.04 | 9.87 |
0.70 | 0.70 | 0.49 | 0.59 | 0.27 | 6.16 | 0.04 | 9.85 |
0.80 | 0.80 | 0.64 | 0.72 | 0.32 | 7.79 | 0.04 | 9.85 |
0.90 | 0.90 | 0.81 | 0.85 | 0.47 | 9.84 | 0.05 | 9.77 |
1.00 | 1.00 | 1.00 | 1.00 | 0.46 | 8.75 | 0.05 | 9.85 |
2.00 | 2.00 | 4.00 | 2.83 | 2.59 | 15.22 | 0.17 | 9.24 |
3.00 | 3.00 | 9.00 | 5.20 | 6.70 | 18.92 | 0.35 | 8.29 |
4.00 | 4.00 | 16.00 | 8.00 | 12.86 | 23.23 | 0.55 | 7.26 |
5.00 | 5.00 | 25.00 | 11.18 | 20.87 | 28.88 | 0.72 | 5.78 |
6.00 | 6.00 | 36.00 | 14.70 | 30.95 | 36.57 | 0.85 | 4.77 |
7.00 | 7.00 | 49.00 | 18.52 | 42.88 | 46.67 | 0.92 | 3.65 |
8.00 | 8.00 | 64.00 | 22.63 | 56.92 | 59.44 | 0.96 | 3.00 |
9.00 | 9.00 | 81.00 | 27.00 | 72.76 | 74.49 | 0.98 | 2.10 |
10.00 | 10.00 | 100.00 | 31.62 | 90.70 | 92.07 | 0.99 | 1.82 |
20.00 | 20.00 | 400.00 | 89.44 | 379.73 | 380.99 | 1.00 | 0.63 |
30.00 | 30.00 | 900.00 | 164.32 | 867.90 | 871.68 | 1.00 | -0.60 |

For alpha = **7** and beta = **7** we found a model under the *second* hypothesis that has nearly equal mean(sigma) values for all scientists. For this parameter combination we simplified the model to *one common* mu and *one common* sigma. Code and output are below.

The WebCHURCH-code can be found here.
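To illustrate what the simplified common-sigma model implies, here is a small grid-approximation sketch in Python. It is our own illustration, not the WebCHURCH code referenced above: the data values are assumed from MacKay's Fig. 22.9, and the flat priors on mu and sigma (and the grid bounds) are choices of the sketch.

```python
from math import exp, log

# Measurements assumed from MacKay (2003, Fig. 22.9).
x = [-27.020, 3.570, 8.191, 9.898, 9.603, 9.945, 10.056]

# Grid over a single common mu and a single common sigma,
# with flat priors on mu in [-40, 40] and sigma in (0, 60].
mus = [-40.0 + 0.2 * i for i in range(401)]
sigmas = [0.2 * (j + 1) for j in range(300)]

# Log likelihood of the common-(mu, sigma) model at every grid point.
lls = {}
for mu in mus:
    for s in sigmas:
        lls[(mu, s)] = sum(-log(s) - (xi - mu) ** 2 / (2 * s * s) for xi in x)

top = max(lls.values())                        # stabilize before exponentiating
w = {k: exp(v - top) for k, v in lls.items()}  # unnormalized posterior weights
z = sum(w.values())

mean_mu = sum(mu * wk for (mu, s), wk in w.items()) / z
mean_sigma = sum(s * wk for (mu, s), wk in w.items()) / z
print(round(mean_mu, 2), round(mean_sigma, 2))
```

With one common sigma the posterior mean of mu falls back to roughly the outlier-dominated sample mean of about 3.5, with a large common sigma: once all scientists are forced to share one noise level, the distinction between 'inept' and 'accurate' measurers disappears.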

# BCM-Ch04.2: WebCHURCH-code for Multiple-Sigma-Model (a=0.6, b=0.6) of the 'Seven Scientists'


For alpha = **0.6** and beta = **0.6** we found a model under the *second* hypothesis that shows the greatest range of individual measurement precisions for all scientists. Code and output are below.

The WebCHURCH-code (PCM_BCM_Ch04.2_ChurchCode_a_b_0_6, 7.7 KB) can be found here.

# BCM-Ch04.2: WebCHURCH-code for Multiple-Sigma-Model (a=7, b=7) of the 'Seven Scientists'


For alpha = **7** and beta = **7** we found a model under the *second* hypothesis that has nearly equal mean(sigma) values for all scientists. For this parameter combination we simplified the model to *one common* mu and *one common* sigma. Code and output are below.