Commercial Data Mining: Processing, Analysis and Modeling for Predictive Analytics Projects (The Savvy Manager's Guides)
Format: PDF / Kindle (mobi) / ePub
Whether you are brand new to data mining or working on your tenth predictive analytics project, Commercial Data Mining will be there for you as an accessible reference outlining the entire process and related themes. In this book, you'll learn that your organization does not need a huge volume of data or a Fortune 500 budget to generate business using existing information assets. Expert author David Nettleton guides you through the process from beginning to end, covering everything from business objectives and data sources to data selection, analysis, and predictive modeling.
Commercial Data Mining includes case studies and practical examples from Nettleton's more than 20 years of commercial experience. Real-world cases covering customer loyalty, cross-selling, and audience prediction in industries including insurance, banking, and media illustrate the concepts and techniques explained throughout the book.
- Illustrates cost-benefit evaluation of potential projects
- Includes vendor-agnostic advice on what to look for in off-the-shelf solutions as well as tips on building your own data mining tools
- Approachable reference can be read from cover to cover by readers of all experience levels
- Includes practical examples and case studies as well as actionable business insights from the author's own experience
commodities is a specific genre that is out of the scope of this book. For further reading, see: Azoff, E. M. 1994. Neural Network Time Series Forecasting of Financial Markets. Hoboken, NJ: John Wiley and Sons Ltd. Wikipedia: Stock Market Prediction. See: http://en.wikipedia.org/wiki/Stock_market_prediction. Major composite indices are barometers of national economies and, often, of the world economic situation. However, certain sectors, such as finance or technology, can perform above or below
same scale must be considered. Alternatively, a non-normalized version may be used in part of the analysis phase, with normalization performed before the data is input to a predictive model.
Distribution of the Values of a Variable
The distribution of a variable is related to the normalization process just discussed. It shows how the values of a variable are distributed, from the smallest to the largest. For example, consider the variable “number of visits of a given customer by quarter”: the minimum value
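As a rough illustration of the two ideas in this excerpt, the sketch below rescales a variable to a common 0 to 1 range and then tabulates how its values are distributed. It assumes min-max scaling, which is only one common normalization method (the excerpt does not say which one the book uses), and the visit counts and function names are hypothetical.

```python
from collections import Counter


def min_max_normalize(values):
    """Rescale values to the range 0..1 (min-max scaling, an assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero and map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def value_distribution(values):
    """Return (value, frequency) pairs ordered from smallest to largest value."""
    return sorted(Counter(values).items())


# Hypothetical "number of visits by quarter" for seven customers.
visits = [0, 2, 2, 5, 10, 2, 0]

normalized = min_max_normalize(visits)
distribution = value_distribution(visits)

print(normalized)
print(distribution)
```

Inspecting the distribution on the raw values and normalizing only before modeling mirrors the option described in the excerpt of using a non-normalized version during part of the analysis.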
is the time zone (-5 means five hours behind Greenwich Mean Time); the fourth is the web page action; the fifth contains the user session ID, action details, and parameters; and the sixth field contains the action code. The records are ordered chronologically by the second field. Table 5.5 shows six records with four different IP addresses. In the action detail column, the second record contains the ISBN 1558607528 and the user session ID z21q49j12b95. The user session ID is made up of
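A minimal sketch of how a six-field log record of this kind might be parsed is shown below. The excerpt does not show the record layout, so several things here are assumptions: that the first field is the IP address and the second a date and time, that fields are separated by semicolons, and the format of the action detail string. The sample values are hypothetical apart from the ISBN and session ID quoted in the excerpt.

```python
from typing import NamedTuple


class LogRecord(NamedTuple):
    ip_address: str        # assumed content of field 1
    timestamp: str         # assumed content of field 2 (sortable date and time)
    tz_offset_hours: int   # field 3: offset from Greenwich Mean Time
    page_action: str       # field 4: web page action
    action_detail: str     # field 5: session ID, action details, parameters
    action_code: str       # field 6: action code


def parse_record(line, sep=";"):
    """Split one log line into its six fields (separator is an assumption)."""
    parts = [p.strip() for p in line.split(sep)]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    ip, ts, tz, action, detail, code = parts
    return LogRecord(ip, ts, int(tz), action, detail, code)


# Hypothetical lines; only the ISBN and session ID come from the excerpt.
lines = [
    "192.0.2.11; 2004-01-20 19:15:02; -5; search; session=a1b2c3; 200",
    "192.0.2.10; 2004-01-20 19:14:43; -5; view_book; "
    "session=z21q49j12b95&isbn=1558607528; 200",
]

# Order chronologically by the second field, as the excerpt describes.
records = sorted((parse_record(l) for l in lines), key=lambda r: r.timestamp)
distinct_ips = {r.ip_address for r in records}

print(records[0])
print(len(distinct_ips))
```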
variables, “home owner” and “time as customer,” have been incorporated, which were not present in the initial list in Table 6.1. Both variables could easily be obtained by deriving them from existing data. If the home owner information is not available for a significant number of clients, the bank could consider designing and launching a survey questionnaire to obtain it. (See Chapter 3 for details about obtaining data from various sources.)
CHAPTER 6 Selection of Variables and Factor
usually stored in a coded form, such as A, B, C, D or 1, 2, 3, 4, to save memory space. Typically, the codes and descriptions are stored in a set of small auxiliary tables related to the main data file by secondary indexes. Table 3.3 shows the data from the survey in Example 2, where customers have the option to identify themselves. All of the data consists of categories except for field three, “info utility,” and the last field, “customer name,” which is free-format text. This last type of field
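The coded storage described in this excerpt can be sketched as a main data file that holds only short codes, plus a small auxiliary table that maps each code to its description. The field name, codes, and descriptions below are hypothetical and are not taken from Table 3.3.

```python
# Auxiliary table: code -> description (hypothetical categories).
SATISFACTION_LOOKUP = {
    1: "very satisfied",
    2: "satisfied",
    3: "dissatisfied",
    4: "very dissatisfied",
}

# Main data file: each record stores only the compact code.
survey_records = [
    {"customer_id": 101, "satisfaction": 2},
    {"customer_id": 102, "satisfaction": 4},
    {"customer_id": 103, "satisfaction": 9},  # code missing from the lookup
]


def decode(records, field, lookup):
    """Return copies of the records with the coded field replaced by its description."""
    return [{**r, field: lookup.get(r[field], "unknown")} for r in records]


decoded = decode(survey_records, "satisfaction", SATISFACTION_LOOKUP)

for row in decoded:
    print(row)
```

Mapping unrecognized codes to "unknown" rather than raising an error is a design choice made for this sketch; a stricter pipeline might reject such records instead.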