The Nature of Statistical Learning Theory

by Vapnik, Vladimir Naumovich

Edition: 2nd

ISBN13: 9780387987804

ISBN10: 0387987800

Format: Hardcover

Pub. Date: 1999-12-01

Publisher(s): Springer-Verlag New York Inc

Summary

The aim of this book is to discuss the fundamental ideas which lie behind the statistical theory of learning and generalization. It considers learning as a general problem of function estimation based on empirical data. Omitting proofs and technical details, the author concentrates on discussing the main results of learning theory and their connections to fundamental problems in statistics. These include:

* the setting of learning problems based on the model of minimizing the risk functional from empirical data
* a comprehensive analysis of the empirical risk minimization principle, including necessary and sufficient conditions for its consistency
* non-asymptotic bounds for the risk achieved using the empirical risk minimization principle
* principles for controlling the generalization ability of learning machines using small sample sizes based on these bounds
* the Support Vector methods that control the generalization ability when estimating functions using small sample sizes.

The second edition of the book contains three new chapters devoted to further development of the learning theory and SVM techniques. These include:

* the theory of the direct method of learning based on solving multidimensional integral equations for density, conditional probability, and conditional density estimation
* a new inductive principle of learning.

Written in a readable and concise style, the book is intended for statisticians, mathematicians, physicists, and computer scientists.

Vladimir N. Vapnik is Technology Leader at AT&T Labs-Research and Professor of London University. He is one of the founders of statistical learning theory, and the author of seven books published in English, Russian, German, and Chinese.

Table of Contents

Preface to the Second Edition
Preface to the First Edition
Introduction: Four Periods in the Research of the Learning Problem
Rosenblatt's Perceptron (The 1960s)
Construction of the Fundamentals of Learning Theory (The 1960s-1970s)
Neural Networks (The 1980s)
Returning to the Origin (The 1990s)
Setting of the Learning Problem
Function Estimation Model
The Problem of Risk Minimization
Three Main Learning Problems
Pattern Recognition
Regression Estimation
Density Estimation (Fisher-Wald Setting)
The General Setting of the Learning Problem
The Empirical Risk Minimization (ERM) Inductive Principle
The Four Parts of Learning Theory
Informal Reasoning and Comments - 1
The Classical Paradigm of Solving Learning Problems
Density Estimation Problem (Maximum Likelihood Method)
Pattern Recognition (Discriminant Analysis) Problem
Regression Estimation Model
Narrowness of the ML Method
Nonparametric Methods of Density Estimation
Parzen's Windows
The Problem of Density Estimation Is Ill-Posed
Main Principle for Solving Problems Using a Restricted Amount of Information
Model Minimization of the Risk Based on Empirical Data
Pattern Recognition
Regression Estimation
Density Estimation
Stochastic Approximation Inference
Consistency of Learning Processes
The Classical Definition of Consistency and the Concept of Nontrivial Consistency
The Key Theorem of Learning Theory
Remark on the ML Method
Necessary and Sufficient Conditions for Uniform Two-Sided Convergence
Remark on the Law of Large Numbers and Its Generalization
Entropy of the Set of Indicator Functions
Entropy of the Set of Real Functions
Conditions for Uniform Two-Sided Convergence
Necessary and Sufficient Conditions for Uniform One-Sided Convergence
Theory of Nonfalsifiability
Kant's Problem of Demarcation and Popper's Theory of Nonfalsifiability
Theorems on Nonfalsifiability
Case of Complete (Popper's) Nonfalsifiability
Theorem on Partial Nonfalsifiability
Theorem on Potential Nonfalsifiability
Three Milestones in Learning Theory
Informal Reasoning and Comments - 2
The Basic Problems of Probability Theory and Statistics
Axioms of Probability Theory
Two Modes of Estimating a Probability Measure
Strong Mode Estimation of Probability Measures and the Density Estimation Problem
The Glivenko-Cantelli Theorem and Its Generalization
Mathematical Theory of Induction
Bounds on the Rate of Convergence of Learning Processes
The Basic Inequalities
Generalization for the Set of Real Functions
The Main Distribution-Independent Bounds
Bounds on the Generalization Ability of Learning Machines
The Structure of the Growth Function
The VC Dimension of a Set of Functions
Constructive Distribution-Independent Bounds
The Problem of Constructing Rigorous (Distribution-Dependent) Bounds
Informal Reasoning and Comments - 3
Kolmogorov-Smirnov Distributions
Racing for the Constant
Bounds on Empirical Processes
Controlling the Generalization Ability of Learning Processes
Structural Risk Minimization (SRM) Inductive Principle
Asymptotic Analysis of the Rate of Convergence
The Problem of Function Approximation in Learning Theory
Examples of Structures for Neural Nets
The Problem of Local Function Estimation
The Minimum Description Length (MDL) and SRM Principles
The MDL Principle
Bounds for the MDL Principle
The SRM and MDL Principles
A Weak Point of the MDL Principle
Informal Reasoning and Comments - 4
Methods for Solving Ill-Posed Problems
Stochastic Ill-Posed Problems and the Problem of Density Estimation
The Problem of Polynomial Approximation of the Regression
The Problem of Capacity Control
Choosing the Degree of the Polynomial
Choosing the Best Sparse Algebraic Polynomial
Structures on the Set of Trigonometric Polynomials
The Problem of Feature Selection
The Problem of Capacity Control and Bayesian Inference
The Bayesian Approach in Learning Theory
Discussion of the Bayesian Approach and Capacity Control Methods
Methods of Pattern Recognition
Why Can Learning Machines Generalize?
Sigmoid Approximation of Indicator Functions
Neural Networks
The Back-Propagation Method
The Back-Propagation Algorithm
Neural Networks for the Regression Estimation Problem
Remarks on the Back-Propagation Method
The Optimal Separating Hyperplane
The Optimal Hyperplane
δ-Margin Hyperplanes
Constructing the Optimal Hyperplane
Generalization for the Nonseparable Case
Support Vector (SV) Machines
Generalization in High-Dimensional Space
Convolution of the Inner Product
Constructing SV Machines
Examples of SV Machines
Experiments with SV Machines
Example in the Plane
Handwritten Digit Recognition
Some Important Details
Remarks on SV Machines
SVM and Logistic Regression
Logistic Regression
The Risk Function for SVM
The SVMn Approximation of the Logistic Regression
Ensemble of the SVM
The AdaBoost Method
The Ensemble of SVMs
Informal Reasoning and Comments - 5
The Art of Engineering Versus Formal Inference
Wisdom of Statistical Models
What Can One Learn from Digit Recognition Experiments?
Influence of the Type of Structures and Accuracy of Capacity Control
SRM Principle and the Problem of Feature Construction
Is the Set of Support Vectors a Robust Characteristic of the Data?
Methods of Function Estimation
ε-Insensitive Loss Function
SVM for Estimating Regression Function
SV Machine with Convolved Inner Product
Solution for Nonlinear Loss Functions
Linear Optimization Method
Constructing Kernels for Estimating Real-Valued Functions
Kernels Generating Expansion on Orthogonal Polynomials
Constructing Multidimensional Kernels
Kernels Generating Splines
Spline of Order d With a Finite Number of Nodes
Kernels Generating Splines With an Infinite Number of Nodes
Kernels Generating Fourier Expansions
Kernels for Regularized Fourier Expansions
The Support Vector ANOVA Decomposition for Function Approximation and Regression Estimation
SVM for Solving Linear Operator Equations
The Support Vector Method
Function Approximation Using the SVM
Why Does the Value of ε Control the Number of Support Vectors?
SVM for Regression Estimation
Problem of Data Smoothing
Estimation of Linear Regression Functions
Estimation of Nonlinear Regression Functions
Informal Reasoning and Comments - 6
Loss Functions for the Regression Estimation Problem
Loss Functions for Robust Estimators
Support Vector Regression Machine
Direct Methods in Statistical Learning Theory
Problem of Estimating Densities, Conditional Probabilities, and Conditional Densities
Problem of Density Estimation: Direct Setting
Problem of Conditional Probability Estimation
Problem of Conditional Density Estimation
Solving an Approximately Determined Integral Equation
Glivenko-Cantelli Theorem
Kolmogorov-Smirnov Distribution
Ill-Posed Problems
Three Methods of Solving Ill-Posed Problems
The Residual Principle
Main Assertions of the Theory of Ill-Posed Problems
Deterministic Ill-Posed Problems
Stochastic Ill-Posed Problem
Nonparametric Methods of Density Estimation
Consistency of the Solution of the Density Estimation Problem
Parzen's Estimators
SVM Solution of the Density Estimation Problem
The SVM Density Estimate: Summary
Comparison of the Parzen and SVM Methods
Conditional Probability Estimation
Approximately Defined Operator
SVM Method for Conditional Probability Estimation
The SVM Conditional Probability Estimate: Summary
Estimation of Conditional Density and Regression
Remarks
One Can Use a Good Estimate of the Unknown Density
One Can Use Both Labeled (Training) and Unlabeled (Test) Data
Method for Obtaining Sparse Solutions of the Ill-Posed Problems
Informal Reasoning and Comments - 7
Three Elements of a Scientific Theory
Problem of Density Estimation
Theory of Ill-Posed Problems
Stochastic Ill-Posed Problems
The Vicinal Risk Minimization Principle and the SVMs
The Vicinal Risk Minimization Principle
Hard Vicinity Function
Soft Vicinity Function
VRM Method for the Pattern Recognition Problem
Examples of Vicinal Kernels
Hard Vicinity Functions
Soft Vicinity Functions
Nonsymmetric Vicinities
Generalization for Estimating Real-Valued Functions
Estimating Density and Conditional Density
Estimating a Density Function
Estimating a Conditional Probability Function
Estimating a Conditional Density Function
Estimating a Regression Function
Informal Reasoning and Comments - 8
Conclusion: What Is Important in Learning Theory?
What Is Important in the Setting of the Problem?
What Is Important in the Theory of Consistency of Learning Processes?
What Is Important in the Theory of Bounds?
What Is Important in the Theory of Controlling the Generalization Ability of Learning Machines?
What Is Important in the Theory for Constructing Learning Algorithms?
What Is the Most Important?
References
Remarks on References