#### Learning and Cognitive Systems

# DAG of Bayesian Network 'Student Model'

Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. 2014. Probabilistic programming. In *Proceedings of the on Future of Software Engineering* (FOSE 2014). ACM, New York, NY, USA, 167-181. DOI=10.1145/2593882.2593900 http://doi.acm.org/10.1145/2593882.2593900

# Example 6a: PROB-Code for Bayesian Network 'Student Model' with evidence

"Bayesian Networks can be used to pose and answer conditional queries, and this can be encoded in probabilistic programs using observe statements. For example, we can ask the question P(L | G = g1) which asks for the probability distribution (or the expected value) of L, given that we observe G = g1. Such a question can be encoded as a probabilistic program shown left. At line 12 the observe statement observe(g = 1) conditions the value of g to 1. Then, at line 21 the program returns l. Thus, the meaning of the program is equal to P(L | G = g1).

In general, every Bayesian Network can be encoded as an acyclic probabilistic program in a straightforward manner, and conditioning can be modeled using observe statements in the probabilistic program." (Gordon et al., 2014)

Here is a summary of the domains due to the modifications of Gordon et al. (2014):

Val(D) = <d0, d1> = <easy, hard>

Val(I) = <i0, i1> = <non smart, smart>

Val(G) = <**g0, g1**> = <A, **B+C**> = <excellent, **good+average**>

Val(S) = <s0, s1> = <low score, high score>

Val(L) = <l0, l1> = <**strong_letter, weak_letter**>

# Ex6a: CHURCH-Code for Bayesian Network 'Student Model' with inference P(L | G=1)

We translated the PROB code snippet from Gordon et al. into a functional CHURCH program to clarify its semantics. The generative model is contained in the CHURCH function "take-a-sample". It is somewhat puzzling that in example 6a the authors chose a query, P(L | G=1), whose direction of inference is the same as the direction of the edges in the generative Bayesian network. The conditional probability to be inferred, P(L | G=1) = P(Letter | Grade=good+average), could therefore be obtained by a simple look-up in the local CPD P(L | G). Nevertheless, we implemented the CHURCH program with the same inference direction to demonstrate the usefulness of the simple rejection sampling scheme.
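
The look-up argument can be written out explicitly. The derivation below is a sketch that assumes the usual structure of the student network (D → G ← I, I → S, G → L), which is not spelled out in this section. Because G is the only parent of L, summing out D, I and S leaves exactly the local CPD entry:

```latex
\begin{aligned}
P(L \mid G = g_1)
  &= \frac{\sum_{d,i,s} P(d)\,P(i)\,P(g_1 \mid d,i)\,P(s \mid i)\,P(L \mid g_1)}
          {\sum_{d,i,s,l} P(d)\,P(i)\,P(g_1 \mid d,i)\,P(s \mid i)\,P(l \mid g_1)} \\
  &= \frac{P(L \mid g_1)\,\sum_{d,i} P(d)\,P(i)\,P(g_1 \mid d,i)}
          {\sum_{d,i} P(d)\,P(i)\,P(g_1 \mid d,i)} \\
  &= P(L \mid g_1)
\end{aligned}
```

The second line uses \(\sum_{s} P(s \mid i) = 1\) and \(\sum_{l} P(l \mid g_1) = 1\), so the sampling-based estimate should agree with the CPD entry up to Monte Carlo error.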

The number of samples was set to 20000 in this run. In principle, this number could be increased to obtain more precise estimates. The sampling method used is the simple-to-understand 'forward sampling'. The screenshot presented was generated using the PlaySpace environment of WebCHURCH.
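
The sampling scheme can be sketched in a few lines. The block below is an illustration in Python, not the CHURCH program from the screenshot: the function name `take_a_sample` only mirrors the CHURCH function mentioned above, the helper names are hypothetical, and all CPD entries are placeholder values that are not taken from Gordon et al. Only P(L=1 | G=1) = 0.6 is chosen to agree with the estimate reported on this page. The sketch also assumes that 20000 is the number of forward samples drawn before rejection.

```python
import random

# Placeholder CPD entries (assumptions, NOT the values from Gordon et al.).
# Only P_L1_GIVEN_G[1] = 0.6 is chosen to agree with the estimate reported in the text.
P_D1 = 0.4                          # P(D = hard)
P_I1 = 0.3                          # P(I = smart)
P_G1_GIVEN_DI = {                   # P(G = g1 | D, I), g1 = good+average
    (0, 0): 0.7, (0, 1): 0.1,
    (1, 0): 0.95, (1, 1): 0.5,
}
P_S1_GIVEN_I = {0: 0.05, 1: 0.8}    # P(S = high score | I)
P_L1_GIVEN_G = {0: 0.1, 1: 0.6}     # P(L = weak_letter | G)


def flip(p, rng):
    """Return 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0


def take_a_sample(rng):
    """Forward-sample all variables in topological order D, I, G, S, L."""
    d = flip(P_D1, rng)
    i = flip(P_I1, rng)
    g = flip(P_G1_GIVEN_DI[(d, i)], rng)
    s = flip(P_S1_GIVEN_I[i], rng)
    l = flip(P_L1_GIVEN_G[g], rng)
    return {"d": d, "i": i, "g": g, "s": s, "l": l}


def estimate_l_given_g1(n_samples=20000, seed=0):
    """Rejection sampling: keep only samples with g == 1, then average l."""
    rng = random.Random(seed)
    draws = (take_a_sample(rng) for _ in range(n_samples))
    accepted = [x["l"] for x in draws if x["g"] == 1]
    return sum(accepted) / len(accepted), len(accepted)


if __name__ == "__main__":
    estimate, n_accepted = estimate_l_given_g1()
    print(f"P(L=1 | G=1) ~ {estimate:.3f} from {n_accepted} accepted samples")
```

Samples that violate the evidence G=1 are simply discarded, so the precision of the estimate depends on the number of accepted samples rather than on the total number drawn.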

The inferred E(L | G=1) = P(L=1 | G=1) is close to 0.60, so the verbal interpretation is: "If you have a good or average grade, the probability of a weak recommendation letter is approximately 0.60".