# Neural Information Processing

The goal of structured prediction is to build machine learning models that predict relational information that itself has structure, such as being composed of multiple interrelated parts. These models, which reflect prior knowledge, task-specific relations, and constraints, are used in fields including computer vision, speech recognition, natural language processing, and computational biology. They can carry out such tasks as predicting a natural language sentence, or segmenting an image into meaningful components.

Sparse modeling is a rapidly developing area at the intersection of statistical learning and signal processing, motivated by the age-old statistical problem of selecting a small number of predictive variables in high-dimensional datasets. This collection describes key approaches in sparse modeling, focusing on its applications in fields including neuroscience, computational biology, and computer vision.

The interplay between optimization and machine learning is one of the most important developments in modern computational science. Optimization formulations and methods are proving to be vital in designing algorithms to extract essential knowledge from huge volumes of data. Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. This book captures the state of the art of the interaction between optimization and machine learning in a way that is accessible to researchers in both fields.

Dataset shift is a common problem in predictive modeling that occurs when the joint distribution of inputs and outputs differs between training and test stages. Covariate shift, a particular case of dataset shift, occurs when only the input distribution changes. Dataset shift is present in most practical applications, for reasons ranging from the bias introduced by experimental design to the irreproducibility of the testing conditions at training time.

The Internet gives us access to a wealth of information in languages we don't understand. The investigation of automated or semi-automated approaches to translation has become a thriving research field with enormous commercial potential. This volume investigates how machine learning techniques can improve statistical machine translation, currently at the forefront of research in the field.

Pervasive and networked computers have dramatically reduced the cost of collecting and distributing large datasets. In this context, machine learning algorithms that scale poorly could simply become irrelevant. We need learning algorithms that scale linearly with the volume of the data while maintaining enough statistical efficiency to outperform algorithms that simply process a random subset of the data.

Machine learning develops intelligent computer systems that are able to generalize from previously seen examples. A new domain of machine learning, in which the prediction must satisfy the additional constraints found in structured data, poses one of machine learning’s greatest challenges: learning functional dependencies between arbitrary input and output domains. This volume presents and analyzes the state of the art in machine learning algorithms and theory in this novel field.

Interest in developing an effective communication interface connecting the human brain and a computer has grown rapidly over the past decade. The brain-computer interface (BCI) would allow humans to operate computers, wheelchairs, prostheses, and other devices, using brain signals only.

Signal processing and neural computation have separately and significantly influenced many disciplines, but the cross-fertilization of the two fields has begun only recently. Research now shows that each has much to teach the other, as we see highly sophisticated kinds of signal processing and elaborate hierachical levels of neural computation performed side by side in the brain. In *New Directions in Statistical Signal Processing*, leading researchers from both signal processing and neural computation present new work that aims to promote interaction between the two disciplines.

Regression and classification methods based on similarity of the input to stored examples have not been widely used in applications involving very large sets of high-dimensional data. Recent advances in computational geometry and machine learning, however, may alleviate the problems in using these methods on large data sets. This volume presents theoretical and practical discussions of nearest-neighbor (NN) methods in machine learning and examines computer vision as an application domain in which the benefit of these advanced methods is often dramatic.

The process of inductive inference—to infer general laws and principles from particular instances—is the basis of statistical modeling, pattern recognition, and machine learning. The Minimum Descriptive Length (MDL) principle, a powerful method of inductive inference, holds that the best explanation, given a limited set of observed data, is the one that permits the greatest compression of the data—that the more we are able to compress the data, the more we learn about the regularities underlying the data.

Neurophysiological, neuroanatomical, and brain imaging studies have helped to shed light on how the brain transforms raw sensory information into a form that is useful for goal-directed behavior. A fundamental question that is seldom addressed by these studies, however, is why the brain uses the types of representations it does and what evolutionary advantage, if any, these representations confer. It is difficult to address such questions directly via animal experiments.

A major problem in modern probabilistic modeling is the huge computational complexity involved in typical calculations with multivariate probability distributions when the number of random variables is large. Because exact computations are infeasible in such cases and Monte Carlo sampling techniques may reach their limits, there is a need for methods that allow for efficient approximate computations. One of the simplest approximations is based on the mean field method, which has a long history in statistical physics.

The concept of large margins is a unifying principle for the analysis of many different approaches to the classification of data from examples, including boosting, mathematical programming, neural networks, and support vector machines. The fact that it is the margin, or confidence level, of a classification—that is, a scale parameter—rather than a raw training error that matters has become a key tool for dealing with classifiers. This book shows how this idea applies to both the theoretical analysis and the design of algorithms.