# Machine Learning: The Art and Science of Algorithms that Make Sense of Data

Language: English

Pages: 409

ISBN: 1107422221

Format: PDF / Kindle (mobi) / ePub

As one of the most comprehensive machine learning texts around, this book does justice to the field's incredible richness, but without losing sight of the unifying principles. Peter Flach's clear, example-based approach begins by discussing how a spam filter works, which gives an immediate introduction to machine learning in action, with a minimum of technical fuss. Flach provides case studies of increasing complexity and variety with well-chosen examples and illustrations throughout. He covers a wide range of logical, geometric and statistical models and state-of-the-art topics such as matrix factorisation and ROC analysis. Particular attention is paid to the central role played by features. The use of established terminology is balanced with the introduction of new and useful concepts, and summaries of relevant background material are provided with pointers for revision if necessary. These features ensure Machine Learning will set a new standard as an introductory textbook.

class. The key question in machine learning is how to model the relationship between X and Y. The statistician’s approach is to assume that there is some underlying random process that generates the values for these variables, according to a well-defined but unknown probability distribution. We want to use the data to find out more about this distribution. Before we look into that, let’s consider how we could use that distribution once we have learned it. Since X is known for a particular instance
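The idea of learning a distribution from data and then using it for prediction can be sketched in a few lines. This is a minimal illustration, not the book's method: the toy data set and function names are hypothetical, and the "learning" is simply counting to estimate an empirical joint distribution over discrete values.

```python
from collections import Counter

# Hypothetical training data: x is a discrete feature value, y is the class label.
data = [("lottery", "spam"), ("lottery", "spam"), ("lottery", "ham"),
        ("meeting", "ham"), ("meeting", "ham"), ("lottery", "spam")]

# Estimate the joint distribution P(X, Y) by counting.
joint = Counter(data)
n = len(data)

def p_y_given_x(y, x):
    """Conditional probability P(Y=y | X=x) from empirical counts."""
    p_x = sum(c for (xv, _), c in joint.items() if xv == x) / n
    return (joint[(x, y)] / n) / p_x

def predict(x):
    """Once the distribution is learned, predict the most probable class for x."""
    return max({"spam", "ham"}, key=lambda y: p_y_given_x(y, x))

print(predict("lottery"))  # prints "spam": most examples with x="lottery" are spam
```

Since X is known for a given instance while Y is not, the learned distribution is used through the conditional P(Y | X), as in the `predict` function above.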

Table of contents (excerpt):

- Further reading (p. 228)
- 8 Distance-based models (p. 231)
  - 8.1 So many roads . . . (p. 231)
  - 8.2 Neighbours and exemplars (p. 237)
  - 8.3 Nearest-neighbour classification (p. 242)
  - 8.4 Distance-based clustering (p. 245)
    - K-means algorithm

recognition. Since this is a classification task, we need to learn an appropriate classifier from training data. Many different types of classifiers exist: linear classifiers, Bayesian classifiers, distance-based classifiers, to name a few. We will refer to these different types as models; they are the subject of Chapters 4–9. Classification is just one of a range of possible tasks for which we can learn a model: other tasks reviewed in this chapter are class probability estimation and
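To make one of these model types concrete, here is a minimal sketch of a distance-based classifier (1-nearest-neighbour) on made-up two-dimensional data. The data and function names are illustrative assumptions, not taken from the book:

```python
import math

# Hypothetical labelled training data: ((feature1, feature2), class).
training = [((1.0, 1.0), "pos"), ((1.5, 2.0), "pos"),
            ((5.0, 5.0), "neg"), ((6.0, 4.5), "neg")]

def nearest_neighbour(x):
    """Classify x by the label of its closest training example (Euclidean distance)."""
    closest = min(training, key=lambda ex: math.dist(ex[0], x))
    return closest[1]

print(nearest_neighbour((1.2, 1.4)))  # prints "pos": the point lies near the "pos" cluster
```

A linear or Bayesian classifier would make its decision differently (via a separating hyperplane or via estimated probabilities), but all of them are learned from the same kind of labelled training data.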

f (x), after which terms can be rearranged to yield Equation 3.2.

Figure 3.3 (dartboard illustration omitted). A dartboard metaphor illustrating the concepts of bias and variance. Each dartboard corresponds to a different learning algorithm, and each dart signifies a different training sample. The top-row learning algorithms exhibit low bias, staying close to the bull’s eye (the true function value for a particular x) on average, while the ones on
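The dartboard picture can be mimicked numerically: repeatedly "throw darts" (estimates obtained from different training samples), then measure how far their average lands from the bull's eye (bias) and how much they scatter (variance). A minimal sketch with made-up numbers, not taken from the book:

```python
import random
import statistics

random.seed(0)
bulls_eye = 10.0  # the true function value for a particular x

# Simulated estimates from many training samples, for two hypothetical learners:
# one with low bias but high variance, one with high bias but low variance.
low_bias = [bulls_eye + random.gauss(0.0, 2.0) for _ in range(1000)]
high_bias = [bulls_eye + 3.0 + random.gauss(0.0, 0.3) for _ in range(1000)]

def bias_and_variance(estimates):
    mean = statistics.mean(estimates)
    bias = mean - bulls_eye                     # how far the average dart is off-centre
    variance = statistics.pvariance(estimates)  # how much the darts scatter
    return bias, variance

b1, v1 = bias_and_variance(low_bias)
b2, v2 = bias_and_variance(high_bias)
print(f"low-bias learner:  bias={b1:.2f}, variance={v1:.2f}")
print(f"high-bias learner: bias={b2:.2f}, variance={v2:.2f}")
```

The first learner's darts centre on the bull's eye but spread widely; the second's cluster tightly around the wrong spot, matching the two rows of the dartboard figure.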

Formally, a subgroup is a mapping ĝ : D → {true, false} and is learned from a set of labelled examples (xᵢ, l(xᵢ)), where l : X → C is the true labelling function. Note that ĝ is the characteristic function of the set G = {x ∈ D | ĝ(x) = true}, which is called the extension of the subgroup. Note also that we used the given data D rather than the whole instance space X for the domain of a subgroup, since it is a descriptive model.

Example 3.11 (Subgroup discovery). Imagine you want to
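These definitions translate directly into code: a subgroup is a boolean function over the data, and its extension is simply the subset it maps to true. A minimal sketch on hypothetical data (the instances, the condition in `g_hat`, and the labelling rule are all invented for illustration):

```python
# Hypothetical data set D of instances, each a (feature1, feature2) pair.
D = [(3.0, 88), (3.6, 90), (2.5, 100), (4.1, 80), (3.2, 95)]

def l(x):
    """A hypothetical true labelling function l : X -> C."""
    return "positive" if x[1] >= 90 else "negative"

def g_hat(x):
    """A candidate subgroup g-hat : D -> {True, False} (an illustrative condition)."""
    return x[0] <= 3.5

# The extension of the subgroup: G = {x in D | g_hat(x) = True}.
G = [x for x in D if g_hat(x)]
print(G)  # prints [(3.0, 88), (2.5, 100), (3.2, 95)]

# Subgroup discovery looks for conditions whose class distribution
# differs markedly from that of the data set as a whole.
pos_in_G = sum(l(x) == "positive" for x in G) / len(G)
pos_overall = sum(l(x) == "positive" for x in D) / len(D)
print(pos_in_G, pos_overall)
```

Note that `g_hat` is defined only over the given data D, in line with the text: as a descriptive model, a subgroup describes the data at hand rather than the whole instance space X.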