Statistical Learning Theory

by Vapnik, Vladimir N.

Edition: 1st

ISBN13: 9780471030034

ISBN10: 0471030031

Format: Hardcover

Pub. Date: 1998-09-30

Publisher(s): Wiley-Interscience

Summary

A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data samples, the application of these estimates to real-life problems, and much more.

Author Biography

Vladimir Naumovich Vapnik is one of the main developers of the Vapnik-Chervonenkis theory of statistical learning and a co-inventor of the support vector machine method and the support vector clustering algorithm.

Table of Contents

Preface
Introduction: The Problem of Induction and Statistical Inference
0.1 Learning Paradigm in Statistics
0.2 Two Approaches to Statistical Inference: Particular (Parametric Inference) and General (Nonparametric Inference)
0.3 The Paradigm Created by the Parametric Approach
0.4 Shortcoming of the Parametric Paradigm
0.5 After the Classical Paradigm
0.6 The Renaissance
0.7 The Generalization of the Glivenko-Cantelli-Kolmogorov Theory
0.8 The Structural Risk Minimization Principle
0.9 The Main Principle of Inference from a Small Sample Size
0.10 What This Book Is About

I THEORY OF LEARNING AND GENERALIZATION

1 Two Approaches to the Learning Problem
1.1 General Model of Learning from Examples
1.2 The Problem of Minimizing the Risk Functional from Empirical Data
1.3 The Problem of Pattern Recognition
1.4 The Problem of Regression Estimation
1.5 Problem of Interpreting Results of Indirect Measuring
1.6 The Problem of Density Estimation (the Fisher-Wald Setting)
1.7 Induction Principles for Minimizing the Risk Functional on the Basis of Empirical Data
1.8 Classical Methods for Solving the Function Estimation Problems
1.9 Identification of Stochastic Objects: Estimation of the Densities and Conditional Densities
1.9.1 Problem of Density Estimation. Direct Setting
1.9.2 Problem of Conditional Probability Estimation
1.9.3 Problem of Conditional Density Estimation
1.10 The Problem of Solving an Approximately Determined Integral Equation
1.11 Glivenko-Cantelli Theorem
1.11.1 Convergence in Probability and Almost Sure Convergence
1.11.2 Glivenko-Cantelli Theorem
1.11.3 Three Important Statistical Laws
1.12 Ill-Posed Problems
1.13 The Structure of the Learning Theory
Appendix to Chapter 1: Methods for Solving Ill-Posed Problems
A1.1 The Problem of Solving an Operator Equation
A1.2 Problems Well-Posed in Tikhonov's Sense
A1.3 The Regularization Method
A1.3.1 Idea of Regularization Method
A1.3.2 Main Theorems About the Regularization Method

2 Estimation of the Probability Measure and Problem of Learning
2.1 Probability Model of a Random Experiment
2.2 The Basic Problem of Statistics
2.2.1 The Basic Problems of Probability and Statistics
2.2.2 Uniform Convergence of Probability Measure Estimates
2.3 Conditions for the Uniform Convergence of Estimates to the Unknown Probability Measure
2.3.1 Structure of Distribution Function
2.3.2 Estimator that Provides Uniform Convergence
2.4 Partial Uniform Convergence and Generalization of Glivenko-Cantelli Theorem
2.4.1 Definition of Partial Uniform Convergence
2.4.2 Generalization of the Glivenko-Cantelli Problem
2.5 Minimizing the Risk Functional Under the Condition of Uniform Convergence of Probability Measure Estimates
2.6 Minimizing the Risk Functional Under the Condition of Partial Uniform Convergence of Probability Measure Estimates
2.7 Remarks About Modes of Convergence of the Probability Measure Estimates and Statements of the Learning Problem

3 Conditions for Consistency of Empirical Risk Minimization Principle
3.1 Classical Definition of Consistency
3.2 Definition of Strict (Nontrivial) Consistency
3.2.1 Definition of Strict Consistency for the Pattern Recognition and the Regression Estimation Problems
3.2.2 Definition of Strict Consistency for the Density Estimation Problem
3.3 Empirical Processes
3.3.1 Remark on the Law of Large Numbers and Its Generalization
3.4 The Key Theorem of Learning Theory (Theorem About Equivalence)
3.5 Proof of the Key Theorem
3.6 Strict Consistency of the Maximum Likelihood Method
3.7 Necessary and Sufficient Conditions for Uniform Convergence of Frequencies to Their Probabilities
3.7.1 Three Cases of Uniform Convergence
3.7.2 Conditions of Uniform Convergence in the Simplest Model
3.7.3 Entropy of a Set of Functions
3.7.4 Theorem About Uniform Two-Sided Convergence
3.8 Necessary and Sufficient Conditions for Uniform Convergence of Means to Their Expectations for a Set of Real-Valued Bounded Functions
3.8.1 Entropy of a Set of Real-Valued Functions
3.8.2 Theorem About Uniform Two-Sided Convergence
3.9 Necessary and Sufficient Conditions for Uniform Convergence of Means to Their Expectations for Sets of Unbounded Functions
3.9.1 Proof of Theorem 3.5
3.10 Kant's Problem of Demarcation and Popper's Theory of Nonfalsifiability
3.11 Theorems About Nonfalsifiability
3.11.1 Case of Complete Nonfalsifiability
3.11.2 Theorem About Partial Nonfalsifiability
3.11.3 Theorem About Potential Nonfalsifiability
3.12 Conditions for One-Sided Uniform Convergence and Consistency of the Empirical Risk Minimization Principle
3.13 Three Milestones in Learning Theory

4 Bounds on the Risk for Indicator Loss Functions
4.1 Bounds for the Simplest Model: Pessimistic Case
4.1.1 The Simplest Model
4.2 Bounds for the Simplest Model: Optimistic Case
4.3 Bounds for the Simplest Model: General Case
4.4 The Basic Inequalities: Pessimistic Case
4.5 Proof of Theorem 4.1
4.5.1 The Basic Lemma
4.5.2 Proof of Basic Lemma
4.5.3 The Idea of Proving Theorem 4.1
4.5.4 Proof of Theorem 4.1
4.6 Basic Inequalities: General Case
4.7 Proof of Theorem 4.2
4.8 Main Nonconstructive Bounds
4.9 VC Dimension
4.9.1 The Structure of the Growth Function
4.9.2 Constructive Distribution-Free Bounds on Generalization Ability
4.9.3 Solution of Generalized Glivenko-Cantelli Problem
4.10 Proof of Theorem 4.3
4.11 Example of the VC Dimension of the Different Sets of Functions
4.12 Remarks About the Bounds on the Generalization Ability of Learning Machines
4.13 Bound on Deviation of Frequencies in Two Half-Samples
Appendix to Chapter 4: Lower Bounds on the Risk of the ERM Principle
A4.1 Two Strategies in Statistical Inference
A4.2 Minimax Loss Strategy for Learning Problems
A4.3 Upper Bounds on the Maximal Loss for the Empirical Risk Minimization Principle
A4.3.1 Optimistic Case
A4.3.2 Pessimistic Case
A4.4 Lower Bound for the Minimax Loss Strategy in the Optimistic Case
A4.5 Lower Bound for Minimax Loss Strategy for the Pessimistic Case

5 Bounds on the Risk for Real-Valued Loss Functions
5.1 Bounds for the Simplest Model: Pessimistic Case
5.2 Concepts of Capacity for the Sets of Real-Valued Functions
5.2.1 Nonconstructive Bounds on Generalization for Sets of Real-Valued Functions
5.2.2 The Main Idea
5.2.3 Concepts of Capacity for the Set of Real-Valued Functions
5.3 Bounds for the General Model: Pessimistic Case
5.4 The Basic Inequality
5.4.1 Proof of Theorem 5.2
5.5 Bounds for the General Model: Universal Case
5.5.1 Proof of Theorem 5.3
5.6 Bounds for Uniform Relative Convergence
5.6.1 Proof of Theorem 5.4 for the Case p > 2
5.6.2 Proof of Theorem 5.4 for the Case 1 < p ≤ 2
5.7 Prior Information for the Risk Minimization Problem in Sets of Unbounded Loss Functions
5.8 Bounds on the Risk for Sets of Unbounded Nonnegative Functions
5.9 Sample Selection and the Problem of Outliers
5.10 The Main Results of the Theory of Bounds

6 The Structural Risk Minimization Principle
6.1 The Scheme of the Structural Risk Minimization Induction Principle
6.1.1 Principle of Structural Risk Minimization
6.2 Minimum Description Length and Structural Risk Minimization Inductive Principles
6.2.1 The Idea About the Nature of Random Phenomena
6.2.2 Minimum Description Length Principle for the Pattern Recognition Problem
6.2.3 Bounds for the Minimum Description Length Principle
6.2.4 Structural Risk Minimization for the Simplest Model and Minimum Description Length Principle
6.2.5 The Shortcoming of the Minimum Description Length Principle
6.3 Consistency of the Structural Risk Minimization Principle and Asymptotic Bounds on the Rate of Convergence
6.3.1 Proof of the Theorems
6.3.2 Discussions and Example
6.4 Bounds for the Regression Estimation Problem
6.4.1 The Model of Regression Estimation by Series Expansion
6.4.2 Proof of Theorem 6.4
6.5 The Problem of Approximating Functions
6.5.1 Three Theorems of Classical Approximation Theory
6.5.2 Curse of Dimensionality in Approximation Theory
6.5.3 Problem of Approximation in Learning Theory
6.5.4 The VC Dimension in Approximation Theory
6.6 Problem of Local Risk Minimization
6.6.1 Local Risk Minimization Model
6.6.2 Bounds for the Local Risk Minimization Estimator
6.6.3 Proofs of the Theorems
6.6.4 Structural Risk Minimization Principle for Local Function Estimation
Appendix to Chapter 6: Estimating Functions on the Basis of Indirect Measurements
A6.1 Problems of Estimating the Results of Indirect Measurements
A6.2 Theorems on Estimating Functions Using Indirect Measurements
A6.3 Proofs of the Theorems
A6.3.1 Proof of Theorem A6.1
A6.3.2 Proof of Theorem A6.2
A6.3.3 Proof of Theorem A6.3

7 Stochastic Ill-Posed Problems
7.1 Stochastic Ill-Posed Problems
7.2 Regularization Method for Solving Stochastic Ill-Posed Problems
7.3 Proofs of the Theorems
7.3.1 Proof of Theorem 7.1
7.3.2 Proof of Theorem 7.2
7.3.3 Proof of Theorem 7.3
7.4 Conditions for Consistency of the Methods of Density Estimation
7.5 Nonparametric Estimators of Density: Estimators Based on Approximations of the Distribution Function by an Empirical Distribution Function
7.5.1 The Parzen Estimators
7.5.2 Projection Estimators
7.5.3 Spline Estimate of the Density. Approximation by Splines of the Odd Order
7.5.4 Spline Estimate of the Density. Approximation by Splines of the Even Order
7.6 Nonclassical Estimators
7.6.1 Estimators for the Distribution Function
7.6.2 Polygon Approximation of Distribution Function
7.6.3 Kernel Density Estimator
7.6.4 Projection Method of the Density Estimator
7.7 Asymptotic Rate of Convergence for Smooth Density Functions
7.8 Proof of Theorem 7.4
7.9 Choosing a Value of Smoothing (Regularization) Parameter for the Problem of Density Estimation
7.10 Estimation of the Ratio of Two Densities
7.10.1 Estimation of Conditional Densities
7.11 Estimation of Ratio of Two Densities on the Line
7.12 Estimation of a Conditional Probability on a Line

8 Estimating the Values of Function at Given Points
8.1 The Scheme of Minimizing the Overall Risk
8.2 The Method of Structural Minimization of the Overall Risk
8.3 Bounds on the Uniform Relative Deviation of Frequencies in Two Subsamples
8.4 A Bound on the Uniform Relative Deviation of Means in Two Subsamples
8.5 Estimation of Values of an Indicator Function in a Class of Linear Decision Rules
8.6 Sample Selection for Estimating the Values of an Indicator Function
8.7 Estimation of Values of a Real Function in the Class of Functions Linear in Their Parameters
8.8 Sample Selection for Estimation of Values of Real-Valued Functions
8.9 Local Algorithms for Estimating Values of an Indicator Function
8.10 Local Algorithms for Estimating Values of a Real-Valued Function
8.11 The Problem of Finding the Best Point in a Given Set
8.11.1 Choice of the Most Probable Representative of the First Class
8.11.2 Choice of the Best Point of a Given Set

II SUPPORT VECTOR ESTIMATION OF FUNCTIONS

9 Perceptrons and Their Generalizations
9.1 Rosenblatt's Perceptron
9.2 Proofs of the Theorems
9.2.1 Proof of Novikoff Theorem
9.2.2 Proof of Theorem 9.3
9.3 Method of Stochastic Approximation and Sigmoid Approximation of Indicator Functions
9.3.1 Method of Stochastic Approximation
9.3.2 Sigmoid Approximations of Indicator Functions
9.4 Method of Potential Functions and Radial Basis Functions
9.4.1 Method of Potential Functions in Asymptotic Learning Theory
9.4.2 Radial Basis Function Method
9.5 Three Theorems of Optimization Theory
9.5.1 Fermat's Theorem (1629)
9.5.2 Lagrange Multipliers Rule (1788)
9.5.3 Kuhn-Tucker Theorem (1951)
9.6 Neural Networks
9.6.1 The Back-Propagation Method
9.6.2 The Back-Propagation Algorithm
9.6.3 Neural Networks for the Regression Estimation Problem
9.6.4 Remarks on the Back-Propagation Method

10 The Support Vector Method for Estimating Indicator Functions
10.1 The Optimal Hyperplane
10.2 The Optimal Hyperplane for Nonseparable Sets
10.2.1 The Hard Margin Generalization of the Optimal Hyperplane
10.2.2 The Basic Solution. Soft Margin Generalization
10.3 Statistical Properties of the Optimal Hyperplane
10.4 Proofs of the Theorems
10.4.1 Proof of Theorem 10.3
10.4.2 Proof of Theorem 10.4
10.4.3 Leave-One-Out Procedure
10.4.4 Proof of Theorem 10.5 and Theorem 9.2
10.4.5 Proof of Theorem 10.6
10.4.6 Proof of Theorem 10.7
10.5 The Idea of the Support Vector Machine
10.5.1 Generalization in High-Dimensional Space
10.5.2 Hilbert-Schmidt Theory and Mercer Theorem
10.5.3 Constructing SV Machines
10.6 One More Approach to the Support Vectors Method
10.6.1 Minimizing the Number of Support Vectors
10.6.2 Generalization for the Nonseparable Case
10.6.3 Linear Optimization Method for SV Machines
10.7 Selection of SV Machine Using Bounds
10.8 Examples of SV Machines for Pattern Recognition
10.8.1 Polynomial Support Vector Machines
10.8.2 Radial Basis Function SV Machines
10.8.3 Two-Layer Neural SV Machines
10.9 Support Vector Method for Transductive Inference
10.10 Multiclass Classification
10.11 Remarks on Generalization of the SV Method

11 The Support Vector Method for Estimating Real-Valued Functions
11.1 Epsilon-Insensitive Loss Functions
11.2 Loss Functions for Robust Estimators
11.3 Minimizing the Risk with Epsilon-Insensitive Loss Functions
11.3.1 Minimizing the Risk for a Fixed Element of the Structure
11.3.2 The Basic Solutions
11.3.3 Solution for the Huber Loss Function
11.4 SV Machines for Function Estimation
11.4.1 Minimizing the Risk for a Fixed Element of the Structure in Feature Space
11.4.2 The Basic Solutions in Feature Space
11.4.3 Solution for Huber Loss Function in Feature Space
11.4.4 Linear Optimization Method
11.4.5 Multi-Kernel Decomposition of Functions
11.5 Constructing Kernels for Estimation of Real-Valued Functions
11.5.1 Kernels Generating Expansion on Polynomials
11.5.2 Constructing Multidimensional Kernels
11.6 Kernels Generating Splines
11.6.1 Spline of Order d with a Finite Number of Knots
11.6.2 Kernels Generating Splines with an Infinite Number of Knots
11.6.3 B(d) Spline Approximations
11.6.4 B(d) Splines with an Infinite Number of Knots
11.7 Kernels Generating Fourier Expansions
11.7.1 Kernels for Regularized Fourier Expansions
11.8 The Support Vector ANOVA Decomposition (SVAD) for Function Approximation and Regression Estimation
11.9 SV Method for Solving Linear Operator Equations
11.9.1 The SV Method
11.9.2 Regularization by Choosing Parameters of Epsilon(i) Insensitivity
11.10 SV Method of Density Estimation
11.10.1 Spline Approximation of a Density
11.10.2 Approximation of a Density with Gaussian Mixture
11.11 Estimation of Conditional Probability and Conditional Density Functions
11.11.1 Estimation of Conditional Probability Functions
11.11.2 Estimation of Conditional Density Functions
11.12 Connections Between the SV Method and Sparse Function Approximation
11.12.1 Reproducing Kernel Hilbert Spaces
11.12.2 Modified Sparse Approximation and Its Relation to SV Machines

12 SV Machines for Pattern Recognition
12.1 The Quadratic Optimization Problem
12.1.1 Iterative Procedure for Specifying Support Vectors
12.1.2 Methods for Solving the Reduced Optimization Problem
12.2 Digit Recognition Problem. The U.S. Postal Service Database
12.2.1 Performance for the U.S. Postal Service Database
12.2.2 Some Important Details
12.2.3 Comparison of Performance of the SV Machine with Gaussian Kernel to the Gaussian RBF Network
12.2.4 The Best Results for U.S. Postal Service Database
12.3 Tangent Distance
12.4 Digit Recognition Problem. The NIST Database
12.4.1 Performance for NIST Database
12.4.2 Further Improvement
12.4.3 The Best Results for NIST Database
12.5 Future Racing
12.5.1 One More Opportunity. The Transductive Inference

13 SV Machines for Function Approximations, Regression Estimation, and Signal Processing
13.1 The Model Selection Problem
13.1.1 Functional for Model Selection Based on the VC Bound
13.1.2 Classical Functionals
13.1.3 Experimental Comparison of Model Selection Methods
13.1.4 The Problem of Feature Selection Has No General Solution
13.2 Structure on the Set of Regularized Linear Functions
13.2.1 The L-Curve Method
13.2.2 The Method of Effective Number of Parameters
13.2.3 The Method of Effective VC Dimension
13.2.4 Experiments on Measuring the Effective VC Dimension
13.3 Function Approximation Using the SV Method
13.3.1 Why Does the Value of Epsilon Control the Number of Support Vectors?
13.4 SV Machine for Regression Estimation
13.4.1 Problem of Data Smoothing
13.4.2 Estimation of Linear Regression Functions
13.4.3 Estimation of Nonlinear Regression Functions
13.5 SV Method for Solving the Positron Emission Tomography (PET) Problem
13.5.1 Description of PET
13.5.2 Problem of Solving the Radon Equation
13.5.3 Generalization of the Residual Principle of Solving PET Problems
13.5.4 The Classical Methods of Solving the PET Problem
13.5.5 The SV Method for Solving the PET Problem
13.6 Remark About the SV Method

III STATISTICAL FOUNDATION OF LEARNING THEORY

14 Necessary and Sufficient Conditions for Uniform Convergence of Frequencies to Their Probabilities
14.1 Uniform Convergence of Frequencies to Their Probabilities
14.2 Basic Lemma
14.3 Entropy of the Set of Events
14.4 Asymptotic Properties of the Entropy
14.5 Necessary and Sufficient Conditions of Uniform Convergence. Proof of Sufficiency
14.6 Necessary and Sufficient Conditions of Uniform Convergence. Proof of Necessity
14.7 Necessary and Sufficient Conditions. Continuation of Proving Necessity

15 Necessary and Sufficient Conditions for Uniform Convergence of Means to Their Expectations
15.1 Epsilon Entropy
15.1.1 Proof of the Existence of the Limit
15.1.2 Proof of the Convergence of the Sequence
15.2 The Quasicube
15.3 Epsilon-Extension of a Set
15.4 An Auxiliary Lemma
15.5 Necessary and Sufficient Conditions for Uniform Convergence. The Proof of Necessity
15.6 Necessary and Sufficient Conditions for Uniform Convergence. The Proof of Sufficiency
15.7 Corollaries from Theorem 15.1

16 Necessary and Sufficient Conditions for Uniform One-Sided Convergence of Means to Their Expectations
16.1 Introduction
16.2 Maximum Volume Sections
16.3 The Theorem on the Average Logarithm
16.4 Theorem on the Existence of the Corridor
16.5 Theorem on the Existence of Functions Close to the Corridor Boundaries (Theorem on Potential Nonfalsifiability)
16.6 The Necessary Conditions
16.7 The Necessary and Sufficient Conditions

Comments and Bibliographical Remarks
References
Index
