Optimal estimators in learning theory

V. N. Temlyakov

doi:10.4064/bc72-0-23

Abstract

This paper is a survey of recent results on some problems of supervised learning in the setting formulated by Cucker and Smale. Supervised learning, or learning from examples, refers to a process that, on the basis of available data consisting of inputs $x_i$ and outputs $y_i$, $i=1,\dots,m$, builds a function that best represents the relation between the inputs $x\in X$ and the corresponding outputs $y\in Y$. The goal is to find an estimator $f_{\bf z}$, on the basis of the given data ${\bf z}:=((x_1,y_1),\dots,(x_m,y_m))$, that approximates well the regression function $f_\rho$ of an unknown Borel probability measure $\rho$ defined on $Z=X\times Y$. We assume that the pairs $(x_i,y_i)$, $i=1,\dots,m$, are independent and distributed according to $\rho$. We discuss the problem of finding optimal (in the sense of order) estimators for different classes $\Theta$ (we assume $f_\rho\in\Theta$). It is known from previous work that the behavior of the entropy numbers $\epsilon_n(\Theta,B)$ of $\Theta$ in a Banach space $B$ plays an important role in this problem. The standard way of measuring the error between a target function $f_\rho$ and an estimator $f_{\bf z}$ is to use the $L_2(\rho_X)$ norm, where $\rho_X$ is the marginal probability measure on $X$ generated by $\rho$. The usual way in regression theory to evaluate the performance of the estimator $f_{\bf z}$ is to study its convergence in expectation, i.e. the rate of decay of the quantity $E(\|f_\rho-f_{\bf z}\|^2_{L_2(\rho_X)})$ as the sample size $m$ increases. Here the expectation is taken with respect to the product measure $\rho^m$ defined on $Z^m$. A more accurate and more delicate way of evaluating the performance of $f_{\bf z}$ was put forward in [CS]: instead of the expectation $E(\|f_\rho-f_{\bf z}\|^2_{L_2(\rho_X)})$, the authors study the probability distribution function $$ \rho^m\{{\bf z}:\|f_\rho-f_{\bf z}\|_{L_2(\rho_X)}\ge \eta\}. $$
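The abstract does not spell out the entropy numbers $\epsilon_n(\Theta,B)$. For the reader's convenience, here is the definition commonly used in this line of work (given as background, not quoted from the survey), where $U(B)$ denotes the closed unit ball of $B$:

$$ \epsilon_n(\Theta,B) := \inf\Big\{\epsilon>0 : \exists\, f_1,\dots,f_{2^n}\in B \ \text{such that}\ \Theta\subset\bigcup_{j=1}^{2^n}\big(f_j+\epsilon U(B)\big)\Big\}. $$

Informally, $\epsilon_n(\Theta,B)$ is the smallest radius at which $2^n$ balls of $B$ suffice to cover $\Theta$, so a slower decay in $n$ indicates a "larger" class.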
In this survey we mainly discuss the optimization problem formulated in terms of the probability distribution function.
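To make the two performance measures concrete, the following is a minimal Monte Carlo sketch that estimates both $E(\|f_\rho-f_{\bf z}\|^2_{L_2(\rho_X)})$ and $\rho^m\{{\bf z}:\|f_\rho-f_{\bf z}\|_{L_2(\rho_X)}\ge\eta\}$ for a fixed estimator. Everything in it is an illustrative assumption, not taken from the survey: the toy measure $\rho$ (uniform $\rho_X$ on $[0,1]$, $f_\rho(x)=\sin(2\pi x)$, additive Gaussian noise), the polynomial least-squares estimator, and the helper names `sample_z`, `least_squares_estimator`, `l2_error` and `evaluate`, which are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical toy model (not from the survey): rho_X is uniform on [0, 1],
# f_rho(x) = sin(2*pi*x), and y = f_rho(x) + Gaussian noise.
def f_rho(x):
    return np.sin(2 * np.pi * x)

def sample_z(m, rng, noise=0.3):
    """Draw z = ((x_1, y_1), ..., (x_m, y_m)) i.i.d. from the toy rho."""
    x = rng.uniform(0.0, 1.0, size=m)
    y = f_rho(x) + noise * rng.standard_normal(m)
    return x, y

def least_squares_estimator(x, y, degree=5):
    """f_z: least-squares fit by polynomials of degree <= `degree` (illustrative choice)."""
    coeffs = P.polyfit(x, y, degree)
    return lambda t: P.polyval(t, coeffs)

def l2_error(f_z, grid_size=2000):
    """||f_rho - f_z||_{L_2(rho_X)} for uniform rho_X, approximated by the midpoint rule."""
    t = (np.arange(grid_size) + 0.5) / grid_size
    return float(np.sqrt(np.mean((f_rho(t) - f_z(t)) ** 2)))

def evaluate(m, eta, trials=500, seed=0):
    """Monte Carlo estimates, over independent samples z ~ rho^m, of
    E ||f_rho - f_z||^2 and rho^m{z : ||f_rho - f_z|| >= eta}."""
    rng = np.random.default_rng(seed)
    errors = np.array([l2_error(least_squares_estimator(*sample_z(m, rng)))
                       for _ in range(trials)])
    return float(np.mean(errors ** 2)), float(np.mean(errors >= eta))

if __name__ == "__main__":
    for m in (50, 200, 800):
        mse, tail = evaluate(m, eta=0.1)
        print(f"m={m:4d}  E||f_rho - f_z||^2 ~ {mse:.4f}  P(error >= 0.1) ~ {tail:.3f}")
```

The expectation summarizes the whole error distribution by one number, while the tail probability at a chosen $\eta$ shows how often a single draw of ${\bf z}$ gives a poor estimator; running the loop for increasing $m$ shows both quantities shrinking for this toy model.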


Banach Center Publications