# Bayesian Reasoning and Machine Learning

## David Barber

Language: English

Pages: 735

ISBN: 0521518148

Format: PDF / Kindle (mobi) / ePub

Machine learning methods extract value from vast data sets quickly and with modest resources. They are established tools in a wide range of industrial applications, including search engines, DNA sequencing, stock market analysis, and robot locomotion, and their use is spreading rapidly. People who know the methods have their choice of rewarding jobs. This hands-on text opens these opportunities to computer science students with modest mathematical backgrounds. It is designed for final-year undergraduates and master's students with limited background in linear algebra and calculus. Comprehensive and coherent, it develops everything from basic reasoning to advanced techniques within the framework of graphical models. Students learn more than a menu of techniques: they develop analytical and problem-solving skills that equip them for the real world. Numerous examples and exercises, both computer based and theoretical, are included in every chapter. Resources for students and instructors, including a MATLAB toolbox, are available online.

Markov networks in Chapter 4.

Path, ancestors, descendants. A path A → B from node A to node B is a sequence of nodes that connects A to B. That is, a path is of the form $A_0, A_1, \ldots, A_{n-1}, A_n$, with $A_0 = A$ and $A_n = B$ and each edge $(A_{k-1}, A_k)$, $k = 1, \ldots, n$, being in the graph. A directed path is a sequence of nodes which, when we follow the direction of the arrows, leads us from A to B. In directed graphs, the nodes A such that A → B and B ↛ A are the ancestors of B. The
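The definition of ancestors in the excerpt above can be illustrated with a short Python sketch. It assumes the graph is a directed acyclic graph given as a list of `(parent, child)` pairs; the function names `ancestors` and `descendants` and the example edges are illustrative choices, not taken from the book or its MATLAB toolbox.

```python
from collections import defaultdict


def ancestors(edges, target):
    """Return every node that has a directed path to `target`.

    Assumes `edges` describes a directed acyclic graph, so the
    condition "no path back from target" holds automatically.
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    found = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def descendants(edges, source):
    """Return every node reachable from `source` by reversing each edge."""
    reversed_edges = [(child, parent) for parent, child in edges]
    return ancestors(reversed_edges, source)


# Hypothetical example graph: A -> B -> C and D -> C.
edges = [("A", "B"), ("B", "C"), ("D", "C")]
print(sorted(ancestors(edges, "C")))    # ['A', 'B', 'D']
print(sorted(descendants(edges, "A")))  # ['B', 'C']
```

Reusing `ancestors` on the reversed edge list keeps the two notions symmetric: the descendants of a node are exactly its ancestors in the reversed graph.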

clique table size (the product of the size of all the state dimensions of the neighbours of node i). See Fig. 6.8 for an example, and Fig. 6.9 for the resulting junction tree.

Procedure 6.1 (Variable elimination). In variable elimination, one simply picks any non-deleted node x in the graph, and then adds links between all the neighbours of x. Node x is then deleted. One repeats this until all nodes have been deleted [250].

Definition 6.9 (Perfect elimination order). Let the n variables in a Markov
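Procedure 6.1 as quoted above can be sketched in Python. This is a minimal illustration, not the book's implementation: it reads "adds links" as connecting every pair of neighbours of the eliminated node, and the function name `eliminate`, the adjacency-dictionary representation, and the four-node example are all assumptions made for the sketch.

```python
def eliminate(adjacency, order):
    """Graph-based variable elimination on an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours.
    order:     sequence listing every node once, in elimination order.
    Returns (fill_in, cliques): the links added along the way and the
    set {x} plus neighbours formed each time a node x is eliminated.
    """
    # Work on a copy so the caller's graph is left untouched.
    graph = {node: set(nbrs) for node, nbrs in adjacency.items()}
    fill_in, cliques = [], []

    for x in order:
        nbrs = sorted(graph[x])
        cliques.append({x, *nbrs})
        # Link every pair of neighbours of x that is not yet linked.
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in graph[a]:
                    graph[a].add(b)
                    graph[b].add(a)
                    fill_in.append((a, b))
        # Delete x from the graph.
        for n in nbrs:
            graph[n].discard(x)
        del graph[x]

    return fill_in, cliques


# Hypothetical example: the four-cycle a - b - c - d - a.
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
fill_in, cliques = eliminate(square, ["a", "b", "c", "d"])
print(fill_in)                       # [('b', 'd')]
print(max(len(c) for c in cliques))  # 3
```

An order that produces no fill-in links is what the excerpt goes on to call a perfect elimination order; the four-cycle has none, which is why one link must be added here.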

variables X1 whose states are revealed before the second decision D2. Subsequently the set of variables Xt is

Figure 7.4 An influence diagram for the Party–Friend problem, Example 7.2 (nodes: Party, Visit, Rain, Friend, U_party, U_visit). The partial ordering is Party∗ ≺ Rain ≺ Visit∗ ≺ Friend. The dashed link from party to visit is not strictly necessary but retained in order to satisfy the convention that there is a directed path connecting all decision nodes.

7.3 Extending Bayesian networks for

$$(x_1, x_2, d_1)^* = \rho_3(x_1, x_2, d_1)\,\rho_{2-3}(x_2)^* = p(x_2 | x_1, d_1) \tag{7.4.26}$$

and

$$\mu_3(x_1, x_2, d_1)^* = \mu_3(x_2, x_1, d_1) + \frac{\mu_{2-3}(x_2)^*}{\rho_{2-3}(x_2)^*} \tag{7.4.27}$$

$$= \max_{d_2} \sum_{x_3} p(x_3 | x_2, d_2) \left[ u(x_2) + u(x_3) + \max_{d_3} \sum_{x_4} p(x_4 | x_3, d_3)\, u(x_4) \right]. \tag{7.4.28}$$

Figure 7.8 Markov decision process (decision nodes d1, d2, d3; state nodes x1, x2, x3, x4; utility nodes u2, u3, u4). These can be used to model planning problems of the form ‘how do I get to where I want to be incurring the lowest total cost?’.
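The nested maximisations and summations in the excerpt above can be evaluated from the last decision backwards. The following Python sketch shows that backward recursion for a small Markov decision process. It is an illustration under stated assumptions: the transition probabilities and utilities are taken to be the same at every time step, and the function name `optimal_expected_utility`, the nested-list layout, and the two-state example are hypothetical rather than drawn from the book.

```python
def optimal_expected_utility(transition, utility, horizon):
    """Backward recursion for a finite-horizon Markov decision process.

    transition[d][x][y]: probability of moving to state y from state x
                         under decision d (assumed time-independent).
    utility[y]:          utility received on arriving in state y.
    Returns value[x]: the best expected total utility collected over the
    next `horizon` transitions when starting in state x.
    """
    n_states = len(utility)
    n_decisions = len(transition)
    value = [0.0] * n_states  # nothing left to collect after the last step

    for _ in range(horizon):
        # One step of: max over decisions of the expected
        # (immediate utility + value of what follows).
        value = [
            max(
                sum(
                    p * (utility[y] + value[y])
                    for y, p in enumerate(transition[d][x])
                )
                for d in range(n_decisions)
            )
            for x in range(n_states)
        ]
    return value


# Hypothetical two-state example: only state 1 carries utility.
stay = [[1.0, 0.0], [0.0, 1.0]]  # decision 0: remain in the current state
move = [[0.5, 0.5], [0.5, 0.5]]  # decision 1: jump to either state equally
utility = [0.0, 1.0]

print(optimal_expected_utility([stay, move], utility, horizon=2))  # [1.25, 2.0]
```

With `horizon=2` the starting state plays the role of x2, and the result is the part of the bracketed expression that depends on the decisions d2 and d3; the utility of the starting state itself is a constant that does not change which decisions are optimal.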

states that are not observed. This seemingly innocuous extension of the MDP case can, however, lead to computational difficulties. Let’s consider the situation in Fig. 7.13, and attempt to compute the optimal expected utility based on the sequence of summations and maximisations. The sum over the hidden variables couples all the decisions and observations, meaning that we no longer have a simple chain structure for the remaining maximisations. For a POMDP of length t, this leads to an intractable