This book explores minimum divergence methods in statistical machine learning for estimation, regression, prediction, and related tasks, drawing on information geometry to elucidate the intrinsic properties of the corresponding loss functions, learning algorithms, and statistical models. One of the most elementary examples is Gauss's least squares estimator in a linear regression model, which is obtained by minimizing the sum of squared differences between the response vector and a vector in the linear subspace spanned by the explanatory vectors.
This extends to Fisher's maximum likelihood estimator (MLE) for an exponential model, in which the estimator is obtained by minimizing an empirical analogue of the Kullback-Leibler (KL) divergence between the data distribution and a parametric distribution in the exponential model. We thus arrive at a geometric interpretation of such minimization procedures, in which a right triangle satisfies a Pythagorean identity in the sense of the KL divergence. This understanding reveals a dualistic interplay between statistical estimation and the statistical model, resting on dual geodesic paths, called m-geodesics and e-geodesics, in the framework of information geometry.
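As a rough sketch of the two minimization problems and of the Pythagorean identity (the symbols y, X, \beta, \tilde p, p_\theta, q, r below are assumed notation for this illustration, not the book's own):
\[
  \hat\beta = \arg\min_{\beta}\,\lVert y - X\beta\rVert^{2},
  \qquad
  \hat\theta = \arg\min_{\theta}\, D_{\mathrm{KL}}(\tilde p,\, p_\theta),
  \qquad
  D_{\mathrm{KL}}(p,q) = \int p(x)\log\frac{p(x)}{q(x)}\,dx,
\]
and, when the m-geodesic from \(p\) to \(q\) meets the e-geodesic from \(q\) to \(r\) orthogonally at \(q\),
\[
  D_{\mathrm{KL}}(p,r) = D_{\mathrm{KL}}(p,q) + D_{\mathrm{KL}}(q,r).
\]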
We extend this dualistic structure of the MLE and the exponential model to that of the minimum divergence estimator and the maximum entropy model, and apply it to robust statistics, maximum entropy, density estimation, principal component analysis, independent component analysis, regression analysis, manifold learning, boosting algorithms, clustering, dynamic treatment regimes, and so forth.
We consider a variety of information divergence measures, with the KL divergence as the typical example, to express the departure of one probability distribution from another. An information divergence decomposes into the cross-entropy and the (diagonal) entropy, as sketched below: the entropy is associated with a generative model as a family of maximum entropy distributions, while the cross-entropy is associated with a statistical estimation method via minimization of its empirical analogue based on given data. Thus any statistical divergence intrinsically pairs a generative model with an estimation method. Typically, the KL divergence leads to the exponential model and maximum likelihood estimation. It is shown that any information divergence induces a Riemannian metric and a pair of linear connections in the framework of information geometry.
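In a common formulation (the notation here is assumed for this sketch and may differ in conventions from the book's), the decomposition and its KL instance read
\[
  D(p,q) = C(p,q) - H(p), \qquad H(p) = C(p,p),
\]
\[
  C_{\mathrm{KL}}(p,q) = -\int p(x)\log q(x)\,dx, \qquad
  H_{\mathrm{KL}}(p) = -\int p(x)\log p(x)\,dx,
\]
so that minimizing the empirical cross-entropy \(-\tfrac{1}{n}\sum_{i}\log q(x_i)\) over an exponential model recovers the MLE.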
We focus on a class of information divergences generated by an increasing and convex function U, called U-divergences. Any generator function U yields a U-entropy and a U-divergence, with a dualistic structure between the minimum U-divergence method and the maximum U-entropy model. We observe that a suitable choice of U leads to a robust statistical procedure via the minimum U-divergence method. If U is chosen to be the exponential function, the corresponding U-entropy and U-divergence reduce to the Boltzmann-Shannon entropy and the KL divergence, and the minimum U-divergence estimator coincides with the MLE.
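One common way to write this construction (a sketch under conventions found in the literature; the exact normalization is an assumption, not necessarily the book's) uses \(u = U'\) and \(\xi = u^{-1}\):
\[
  D_U(p,q) = \int \Bigl\{ U\bigl(\xi(q(x))\bigr) - U\bigl(\xi(p(x))\bigr)
             - p(x)\bigl(\xi(q(x)) - \xi(p(x))\bigr) \Bigr\}\,dx \;\ge\; 0 ,
\]
and for \(U(t) = \exp(t)\), so that \(\xi(s) = \log s\),
\[
  D_U(p,q) = \int \Bigl\{ q(x) - p(x) - p(x)\log\frac{q(x)}{p(x)} \Bigr\}\,dx ,
\]
which equals the KL divergence whenever \(p\) and \(q\) are probability densities.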
For robust supervised learning to predict a class label, we observe that the U-boosting algorithm performs well under contamination by mislabeled examples if U is appropriately selected. We present such maximum U-entropy and minimum U-divergence methods, in particular selecting a power function as U to provide flexible performance in statistical machine learning.
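As an illustration of the power-function case, a widely used power-type divergence in robust estimation is the density power divergence (the parameter \(\beta\) and the normalization below are assumptions for this sketch; the book's power parameterization may differ):
\[
  D_\beta(p,q) = \int \Bigl\{ q(x)^{1+\beta}
    - \frac{1+\beta}{\beta}\, p(x)\, q(x)^{\beta}
    + \frac{1}{\beta}\, p(x)^{1+\beta} \Bigr\}\,dx ,
  \qquad \beta > 0 ,
\]
which tends to the KL divergence as \(\beta \to 0\). In the resulting estimating equations each observation's score contribution is weighted by \(q(x_i)^{\beta}\), which downweights outlying observations relative to the MLE and is the source of the robustness.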