The Nature of Statistical Learning Theory

by Vladimir N. Vapnik
Edition: 2nd
Format: Hardcover
Pub. Date: 1999-12-01
Publisher(s): Springer-Verlag New York Inc
List Price: $169.99

Digital

Rent Digital Options
Online: 30 days access | Downloadable: 30 days | $100.44
Online: 60 days access | Downloadable: 60 days | $133.92
Online: 90 days access | Downloadable: 90 days | $167.40
Online: 120 days access | Downloadable: 120 days | $200.88
Online: 180 days access | Downloadable: 180 days | $217.62
Online: 1825 days access | Downloadable: Lifetime Access | $334.80
*To support the delivery of the digital material to you, a non-refundable digital delivery fee of $3.99 will be charged on each digital item.

New Textbook: Sold Out

Used Textbook: Sold Out

Summary

The aim of this book is to discuss the fundamental ideas that lie behind the statistical theory of learning and generalization. It considers learning as a general problem of function estimation based on empirical data. Omitting proofs and technical details, the author concentrates on discussing the main results of learning theory and their connections to fundamental problems in statistics. These include:

* the setting of learning problems based on the model of minimizing the risk functional from empirical data
* a comprehensive analysis of the empirical risk minimization principle, including necessary and sufficient conditions for its consistency
* non-asymptotic bounds for the risk achieved using the empirical risk minimization principle
* principles for controlling the generalization ability of learning machines using small sample sizes based on these bounds
* the Support Vector methods that control the generalization ability when estimating functions using small sample sizes.

The second edition of the book contains three new chapters devoted to further development of the learning theory and SVM techniques. These include:

* the theory of the direct method of learning based on solving multidimensional integral equations for density, conditional probability, and conditional density estimation
* a new inductive principle of learning.

Written in a readable and concise style, the book is intended for statisticians, mathematicians, physicists, and computer scientists.

Vladimir N. Vapnik is a Technology Leader at AT&T Labs-Research and a Professor at London University. He is one of the founders of statistical learning theory and the author of seven books published in English, Russian, German, and Chinese.
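The summary frames learning as minimizing a risk functional from empirical data. In Vapnik's notation, the expected risk is R(α) = ∫ Q(z, α) dF(z). The empirical risk minimization (ERM) principle replaces it with the empirical risk R_emp(α) = (1/ℓ) Σ Q(z_i, α), averaged over ℓ training examples.

The sketch below is a minimal illustration and is not code from the book. It assumes the following, none of which comes from the book:

* a 0-1 loss;
* a hypothetical family of one-dimensional threshold classifiers;
* a small invented sample;
* a finite grid of candidate thresholds.

It returns the threshold with the lowest empirical risk.

```python
from typing import Sequence, Tuple


def zero_one_loss(y_true: int, y_pred: int) -> int:
    """Q(z, alpha) for pattern recognition: 1 if the prediction is wrong, else 0."""
    return int(y_true != y_pred)


def threshold_classifier(x: float, theta: float) -> int:
    """Hypothetical indicator function f(x, theta): 1 if x >= theta, else 0."""
    return 1 if x >= theta else 0


def empirical_risk(data: Sequence[Tuple[float, int]], theta: float) -> float:
    """R_emp(theta) = (1/l) * sum of Q(z_i, theta) over the l training pairs."""
    errors = sum(zero_one_loss(y, threshold_classifier(x, theta)) for x, y in data)
    return errors / len(data)


def erm(data: Sequence[Tuple[float, int]],
        thetas: Sequence[float]) -> Tuple[float, float]:
    """Exhaustive ERM over a finite candidate set.

    Returns the first candidate with minimal empirical risk, and that risk.
    """
    best = min(thetas, key=lambda t: empirical_risk(data, t))
    return best, empirical_risk(data, best)


if __name__ == "__main__":
    # Invented toy sample of pairs z_i = (x_i, y_i), for illustration only.
    sample = [(0.1, 0), (0.4, 0), (0.35, 1), (0.8, 1), (0.9, 1), (0.6, 0)]
    candidates = [i / 10 for i in range(11)]  # thresholds 0.0, 0.1, ..., 1.0
    theta, risk = erm(sample, candidates)
    print(theta, risk)  # prints 0.7 0.16666666666666666
```

With a finite set of candidates, ERM reduces to exhaustive search. The book's theory of consistency and bounds addresses when a small empirical risk R_emp also guarantees a small expected risk R(α).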

Table of Contents

Preface to the Second Edition vii
Preface to the First Edition ix
Introduction: Four Periods in the Research of the Learning Problem 1(16)
Rosenblatt's Perceptron (The 1960s) 1(6)
Construction of the Fundamentals of Learning Theory (The 1960s–1970s) 7(4)
Neural Networks (The 1980s) 11(3)
Returning to the Origin (The 1990s) 14(3)
Setting of the Learning Problem 17(18)
Function Estimation Model 17(1)
The Problem of Risk Minimization 18(1)
Three Main Learning Problems 18(2)
Pattern Recognition 19(1)
Regression Estimation 19(1)
Density Estimation (Fisher-Wald Setting) 19(1)
The General Setting of the Learning Problem 20(1)
The Empirical Risk Minimization (ERM) Inductive Principle 20(1)
The Four Parts of Learning Theory 21(2)
Informal Reasoning and Comments --- 1 23(1)
The Classical Paradigm of Solving Learning Problems 23(4)
Density Estimation Problem (Maximum Likelihood Method) 24(1)
Pattern Recognition (Discriminant Analysis) Problem 24(1)
Regression Estimation Model 25(1)
Narrowness of the ML Method 26(1)
Nonparametric Methods of Density Estimation 27(3)
Parzen's Windows 27(1)
The Problem of Density Estimation Is Ill-Posed 28(2)
Main Principle for Solving Problems Using a Restricted Amount of Information 30(1)
Model Minimization of the Risk Based on Empirical Data 31(2)
Pattern Recognition 31(1)
Regression Estimation 31(1)
Density Estimation 32(1)
Stochastic Approximation Inference 33(2)
Consistency of Learning Processes 35(34)
The Classical Definition of Consistency and the Concept of Nontrivial Consistency 36(2)
The Key Theorem of Learning Theory 38(2)
Remark on the ML Method 39(1)
Necessary and Sufficient Conditions for Uniform Two-Sided Convergence 40(5)
Remark on Law of Large Numbers and Its Generalization 41(1)
Entropy of the Set of Indicator Functions 42(1)
Entropy of the Set of Real Functions 43(2)
Conditions for Uniform Two-Sided Convergence 45(1)
Necessary and Sufficient Conditions for Uniform One-Sided Convergence 45(2)
Theory of Nonfalsifiability 47(2)
Kant's Problem of Demarcation and Popper's Theory of Nonfalsifiability 47(2)
Theorems on Nonfalsifiability 49(6)
Case of Complete (Popper's) Nonfalsifiability 50(1)
Theorem on Partial Nonfalsifiability 50(2)
Theorem on Potential Nonfalsifiability 52(3)
Three Milestones in Learning Theory 55(5)
Informal Reasoning and Comments --- 2 59(1)
The Basic Problems of Probability Theory and Statistics 60(3)
Axioms of Probability Theory 60(3)
Two Modes of Estimating a Probability Measure 63(2)
Strong Mode Estimation of Probability Measures and the Density Estimation Problem 65(1)
The Glivenko-Cantelli Theorem and Its Generalization 66(1)
Mathematical Theory of Induction 67(2)
Bounds on the Rate of Convergence of Learning Processes 69(24)
The Basic Inequalities 70(2)
Generalization for the Set of Real Functions 72(3)
The Main Distribution-Independent Bounds 75(1)
Bounds on the Generalization Ability of Learning Machines 76(2)
The Structure of the Growth Function 78(2)
The VC Dimension of a Set of Functions 80(3)
Constructive Distribution-Independent Bounds 83(2)
The Problem of Constructing Rigorous (Distribution-Dependent) Bounds 85(2)
Informal Reasoning and Comments --- 3 87(1)
Kolmogorov-Smirnov Distributions 87(2)
Racing for the Constant 89(1)
Bounds on Empirical Processes 90(3)
Controlling the Generalization Ability of Learning Processes 93(30)
Structural Risk Minimization (SRM) Inductive Principle 94(3)
Asymptotic Analysis of the Rate of Convergence 97(2)
The Problem of Function Approximation in Learning Theory 99(2)
Examples of Structures for Neural Nets 101(2)
The Problem of Local Function Estimation 103(1)
The Minimum Description Length (MDL) and SRM Principles 104(8)
The MDL Principle 106(1)
Bounds for the MDL Principle 107(1)
The SRM and MDL Principles 108(2)
A Weak Point of the MDL Principle 110(1)
Informal Reasoning and Comments --- 4 111(1)
Methods for Solving Ill-Posed Problems 112(1)
Stochastic Ill-Posed Problems and the Problem of Density Estimation 113(2)
The Problem of Polynomial Approximation of the Regression 115(1)
The Problem of Capacity Control 116(3)
Choosing the Degree of the Polynomial 116(1)
Choosing the Best Sparse Algebraic Polynomial 117(1)
Structures on the Set of Trigonometric Polynomials 118(1)
The Problem of Features Selection 119(1)
The Problem of Capacity Control and Bayesian Inference 119(4)
The Bayesian Approach in Learning Theory 119(2)
Discussion of the Bayesian Approach and Capacity Control Methods 121(2)
Methods of Pattern Recognition 123(58)
Why Can Learning Machines Generalize? 123(2)
Sigmoid Approximation of Indicator Functions 125(1)
Neural Networks 126(5)
The Back-Propagation Method 126(4)
The Back-Propagation Algorithm 130(1)
Neural Networks for the Regression Estimation Problem 130(1)
Remarks on the Back-Propagation Method 130(1)
The Optimal Separating Hyperplane 131(2)
The Optimal Hyperplane 131(1)
Δ-Margin Hyperplanes 132(1)
Constructing the Optimal Hyperplane 133(5)
Generalization for the Nonseparable Case 136(2)
Support Vector (SV) Machines 138(8)
Generalization in High-Dimensional Space 139(1)
Convolution of the Inner Product 140(1)
Constructing SV Machines 141(1)
Examples of SV Machines 141(5)
Experiments with SV Machines 146(8)
Example in the Plane 146(1)
Handwritten Digit Recognition 147(4)
Some Important Details 151(3)
Remarks on SV Machines 154(2)
SVM and Logistic Regression 156(7)
Logistic Regression 156(3)
The Risk Function for SVM 159(1)
The SVMn Approximation of the Logistic Regression 160(3)
Ensemble of the SVM 163(8)
The AdaBoost Method 164(3)
The Ensemble of SVMs 167(4)
Informal Reasoning and Comments --- 5 171(1)
The Art of Engineering Versus Formal Inference 171(3)
Wisdom of Statistical Models 174(2)
What Can One Learn from Digit Recognition Experiments? 176(5)
Influence of the Type of Structures and Accuracy of Capacity Control 177(1)
SRM Principle and the Problem of Feature Construction 178(1)
Is the Set of Support Vectors a Robust Characteristic of the Data? 179(2)
Methods of Function Estimation 181(44)
ε-Insensitive Loss Function 181(2)
SVM for Estimating Regression Function 183(7)
SV Machine with Convolved Inner Product 186(2)
Solution for Nonlinear Loss Functions 188(2)
Linear Optimization Method 190(1)
Constructing Kernels for Estimating Real-Valued Functions 190(4)
Kernels Generating Expansion on Orthogonal Polynomials 191(2)
Constructing Multidimensional Kernels 193(1)
Kernels Generating Splines 194(2)
Spline of Order d With a Finite Number of Nodes 194(1)
Kernels Generating Splines With an Infinite Number of Nodes 195(1)
Kernels Generating Fourier Expansions 196(2)
Kernels for Regularized Fourier Expansions 197(1)
The Support Vector ANOVA Decomposition for Function Approximation and Regression Estimation 198(2)
SVM for Solving Linear Operator Equations 200(4)
The Support Vector Method 201(3)
Function Approximation Using the SVM 204(4)
Why Does the Value of ε Control the Number of Support Vectors? 205(3)
SVM for Regression Estimation 208(11)
Problem of Data Smoothing 209(1)
Estimation of Linear Regression Functions 209(7)
Estimation of Nonlinear Regression Functions 216(3)
Informal Reasoning and Comments --- 6 219(1)
Loss Functions for the Regression Estimation Problem 219(2)
Loss Functions for Robust Estimators 221(2)
Support Vector Regression Machine 223(2)
Direct Methods in Statistical Learning Theory 225(42)
Problem of Estimating Densities, Conditional Probabilities, and Conditional Densities 226(3)
Problem of Density Estimation: Direct Setting 226(1)
Problem of Conditional Probability Estimation 227(1)
Problem of Conditional Density Estimation 228(1)
Solving an Approximately Determined Integral Equation 229(1)
Glivenko-Cantelli Theorem 230(3)
Kolmogorov-Smirnov Distribution 232(1)
Ill-Posed Problems 233(2)
Three Methods of Solving Ill-Posed Problems 235(2)
The Residual Principle 236(1)
Main Assertions of the Theory of Ill-Posed Problems 237(3)
Deterministic Ill-Posed Problems 237(1)
Stochastic Ill-Posed Problem 238(2)
Nonparametric Methods of Density Estimation 240(4)
Consistency of the Solution of the Density Estimation Problem 240(1)
The Parzen's Estimators 241(3)
SVM Solution of the Density Estimation Problem 244(5)
The SVM Density Estimate: Summary 247(1)
Comparison of the Parzen's and the SVM Methods 248(1)
Conditional Probability Estimation 249(7)
Approximately Defined Operator 251(2)
SVM Method for Conditional Probability Estimation 253(2)
The SVM Conditional Probability Estimate: Summary 255(1)
Estimation of Conditional Density and Regression 256(2)
Remarks 258(3)
One Can Use a Good Estimate of the Unknown Density 258(1)
One Can Use Both Labeled (Training) and Unlabeled (Test) Data 259(1)
Method for Obtaining Sparse Solutions of the Ill-Posed Problems 259(2)
Informal Reasoning and Comments --- 7 261(1)
Three Elements of a Scientific Theory 261(2)
Problem of Density Estimation 262(1)
Theory of Ill-Posed Problems 262(1)
Stochastic Ill-Posed Problems 263(4)
The Vicinal Risk Minimization Principle and the SVMs 267(24)
The Vicinal Risk Minimization Principle 267(4)
Hard Vicinity Function 269(1)
Soft Vicinity Function 270(1)
VRM Method for the Pattern Recognition Problem 271(4)
Examples of Vicinal Kernels 275(4)
Hard Vicinity Functions 276(3)
Soft Vicinity Functions 279(1)
Nonsymmetric Vicinities 279(2)
Generalization for Estimating Real-Valued Functions 281(3)
Estimating Density and Conditional Density 284(7)
Estimating a Density Function 284(1)
Estimating a Conditional Probability Function 285(1)
Estimating a Conditional Density Function 286(1)
Estimating a Regression Function 287(2)
Informal Reasoning and Comments --- 8 289(2)
Conclusion: What Is Important in Learning Theory? 291(10)
What Is Important in the Setting of the Problem? 291(3)
What Is Important in the Theory of Consistency of Learning Processes? 294(1)
What Is Important in the Theory of Bounds? 295(1)
What Is Important in the Theory of Controlling the Generalization Ability of Learning Machines? 296(1)
What Is Important in the Theory for Constructing Learning Algorithms? 297(1)
What Is the Most Important? 298(3)
References 301(10)
Remarks on References 301(1)
References 302(9)
Index 311

An electronic version of this book is available through VitalSource.

This book is viewable on PC, Mac, iPhone, iPad, iPod Touch, and most smartphones.

By purchasing, you will be able to view this book online, as well as download it, for the chosen number of days.

Digital License

You are licensing a digital product for a set duration. Durations are set forth in the product description, with "Lifetime" typically meaning five (5) years of online access and permanent download to a supported device. All licenses are non-transferable.

A downloadable version of this book is available through the eCampus Reader or compatible Adobe readers.

Applications are available on iOS, Android, PC, Mac, and Windows Mobile platforms.