2009 Workshop on Information Theoretic Methods in Science and Engineering

August 17 - 19, 2009 | Tampere, Finland

*Monday August 17th*

09:30-10:30 Plenary Talk: Shun-ichi Amari, **RIKEN Brain Science Institute**

*Divergence Function, Information Monotonicity, and Information Geometry*

10:30-11:00 Coffee break

11:00-12:00 Information Theoretic Methods (invited session)

11:00-11:30 Fabio Benatti, *University of Trieste*

“Quantum algorithmic complexity”

11:30-12:00 Kenji Yamanishi, *University of Tokyo*

“Dynamic Model Selection with Its Applications to Data Mining Issues”

12:00-13:30 Lunch break

13:30-14:30 Plenary Talk: Bin Yu, **University of California, Berkeley**

*Seeking Interpretable Models for High Dimensional Data*

14:30-15:00 Coffee break

15:00-17:30 ITM in Engineering (invited session)

15:00-15:30 Petri Myllymäki, *University of Helsinki*

“A Framework for MDL Clustering”

15:30-16:00 Po-Hsiang Lai, Joseph A. O'Sullivan, Robert Pless, *Washington University in St. Louis*

“Minimum Description Length for Clustering with Exemplars”

16:00-16:30 Teemu Roos, *University of Helsinki*

“Recovering sparse models by parameter transformations: applications in Markov models and logistic regression”

16:30-17:00 Jaakko Astola, Radomir Stankovic, *Tampere University of Technology*

“Covering codes and sparse representations of switching functions”

17:00-17:30 Ciprian Doru Giurcaneanu and Seyed Alireza Razavi, *Tampere University of Technology*

“Stochastic complexity and its applications in signal processing: some new results”

18:00 Dinner

*Tuesday August 18th - Room TB 109*

09:00-10:00 Plenary Talk: Vladimir Vovk, **University of London**

*Emergence of objective probabilities with increasing information*

10:00-10:30 Coffee break

10:30-12:00 ITM in Statistics and Engineering (invited session)

10:30-11:00 Vela Velupillai, *University of Trento*

“The Algorithmic Revolution in the Social Sciences: Mathematical Economics, Game Theory and Statistics”

11:00-11:30 Peter Harremoes, *CWI Amsterdam*

“Regret versus Redundancy for Exponential Families”

11:30-12:00 Sanjoy Mitter, *Massachusetts Institute of Technology*

“Towards a Unified View of Control and Communication”

12:00-13:30 Lunch break

13:30-14:30 Plenary Talk: Paul Vitanyi, **CWI Amsterdam**

*Similarity by Compression*

14:30-15:00 Coffee break

15:00-17:30 ITM in Engineering (invited and contributed session)

15:00-15:30 Alfred M. Bruckstein, *Technion*

“An Information Theoretic Approach to Translation Estimation in Image-Based Metrology”

15:30-16:00 Mikhail B. Malioutov, *Northeastern University*

“The MDL-principle in testing homogeneity between literary texts”

16:00-16:30 Krista Lagus, Sami Virpioja – *Helsinki University of Technology*

“Morpheme segmentation by optimizing two-part MDL codes”

16:30-17:00 Adriana Vasilache, Ioan Tabus, Septimia Sarbu – *Nokia Research Center, Tampere University of Technology*

“Low bit rate coding with binary Golay codes”

17:00-17:30 Ioan Tabus, Florin Ghido, Yinghua Yang, *Tampere University of Technology*

“Interleaved optimization-quantization for lossless coding”

18:00 Dinner

*Wednesday August 19th - Room TB 109*

09:00-10:00 Plenary Talk: Jorma Rissanen, **Tampere University of Technology**

*Optimal estimation*

10:00-10:30 Coffee break

10:30-12:00 ITM in Statistics (organized by Hannu Oja with partners from the University of Dortmund)

10:30-11:00 Hannu Oja – *University of Tampere*

“Robust Independent Component Analysis (ICA) based on two scatter matrices”

11:00-11:20 Klaus Nordhausen – *University of Dortmund*

“Multivariate models and first four moments”

11:20-11:40 Daniel Vogel – *University of Dortmund*

“Robust estimation in Gaussian graphical models”

11:40-12:00 Sami Helle – *University of Tampere*

“Experimental Designs for Model Discrimination”

12:00-13:30 Lunch break

13:30-15:00 ITM in Statistics and Machine Learning (Contributed session)

13:30-14:00 Shigeki Miyake, *NTT, Japan*

“An information theoretic approach to construction of search system indices”

14:00-14:20 Antti Liski, Ioan Tabus and Reijo Sund, *Tampere University of Technology*

“Model selection with sNML in logistic regression”

14:20-14:40 Esa Elovaara and Petri Myllymäki, *University of Helsinki*

“MDL-based attribute models in naive Bayes classification”

14:40-15:00 Toni Merivuori and Teemu Roos, *University of Helsinki*

“Some observations on the applicability of normalized compression distance to stemmatology”

15:30-22:00 Free discussions at a sauna by Näsijärvi lake

Shun-ichi Amari, **RIKEN Brain Science Institute**

**Divergence Function, Information Monotonicity and Information Geometry**

A divergence, which measures the discrepancy between two points, is used in many problems in information theory, statistics, mathematical programming, computational vision, and neural networks. The Kullback-Leibler divergence, defined between two probability distributions, is a typical example. It generalizes to the class of Csiszar f-divergences, which are monotone under information reduction. The Bregman divergence is another type of divergence, often used in optimization and signal processing; Bregman divergences form a class with a dually flat geometrical structure.

A divergence function induces a geometrical structure on the underlying manifold, and we investigate its differential-geometric background. We show that the class of f-divergences is the unique class giving invariant divergences on the manifolds of probability distributions and positive measures. The class of Bregman divergences gives a dually flat geometrical structure. The Kullback-Leibler divergence is the unique divergence having both the invariant and the dually flat structure, while the alpha divergences are unique in this sense in the space of positive measures.
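The relationship among these classes can be checked numerically. A minimal sketch (hypothetical distributions; all names illustrative) verifying that the Kullback-Leibler divergence arises both as the Csiszar f-divergence with f(t) = t log t and as the Bregman divergence generated by negative Shannon entropy:

```python
import math

p = [0.5, 0.3, 0.2]  # hypothetical probability distributions
q = [0.4, 0.4, 0.2]

# Kullback-Leibler divergence, computed directly
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# As a Csiszar f-divergence with f(t) = t log t:
#   D_f(p||q) = sum_i q_i f(p_i / q_i)
f = lambda t: t * math.log(t)
kl_f = sum(qi * f(pi / qi) for pi, qi in zip(p, q))

# As a Bregman divergence generated by phi(v) = sum_i v_i log v_i
# (negative Shannon entropy):
#   B_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>
phi = lambda v: sum(vi * math.log(vi) for vi in v)
grad_phi = lambda v: [math.log(vi) + 1.0 for vi in v]
kl_b = phi(p) - phi(q) - sum(g * (pi - qi)
                             for g, pi, qi in zip(grad_phi(q), p, q))

# On the probability simplex all three computations agree
```

On the simplex the extra term of the Bregman form, sum(q) - sum(p), vanishes, which is why the three values coincide there.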

We further study the geometrical structures of hierarchical statistical models, such as multilayer perceptrons and Gaussian mixtures, proving that they have geometrical singularities. The singular structure gives rise to strange behaviors of MDL and AIC. In the dynamics of learning, such a singular model has a Milnor-type attractor, which affects the speed of learning. We study its topological properties.

Vladimir Vovk, **University of London**

**Emergence of objective probabilities with increasing information**

A standard way to measure the success of a sequence of probability forecasts is to look at the length of code resulting from using these forecasts for encoding the sequence of actual outcomes (as in the predictive MDL principle). Suppose we believe that the sequence of forecasts leads to a code length that is not far off the optimum value.

For example, the forecasts may be produced by a universal distribution constructed from a statistical model that we trust, in which case we expect the code length to be close to the stochastic complexity of the sequence of outcomes. Can we say that the individual forecasts have an objective meaning? In some sense, the answer is "yes": any two sequences of forecasts that do not asymptotically agree with each other can be transformed into a third sequence of forecasts that leads to a much smaller code length. This phenomenon has been dubbed "Jeffreys's law" by A. P. Dawid. In this talk I will review several formalizations of Jeffreys's law obtained in probability theory, statistics, machine learning, and algorithmic information theory. Some of these formalizations have been extended to prediction games other than coding.
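The code-length measure of forecasting success can be made concrete. A minimal sketch (hypothetical binary outcomes and forecasters; all names illustrative) computing the ideal code length, minus log base 2 of the probability each forecast assigned to the realized outcome:

```python
import math

def code_length(forecasts, outcomes):
    """Ideal code length in bits when the probability forecasts are
    used to encode the binary outcomes: each outcome costs -log2 of
    the probability the corresponding forecast assigned to it."""
    return sum(-math.log2(p if y == 1 else 1.0 - p)
               for p, y in zip(forecasts, outcomes))

outcomes = [1, 0, 1, 1, 0, 1, 0, 1]                # hypothetical outcomes
sharp = [0.8, 0.2, 0.7, 0.9, 0.3, 0.8, 0.2, 0.7]   # informed forecaster
vague = [0.5] * len(outcomes)                       # uninformative forecaster

# The uninformative forecaster pays exactly 1 bit per outcome; forecasts
# that track the outcomes achieve a strictly shorter code.
```

In this sense a forecaster is judged by how far its code length is from the optimum, exactly as in the predictive MDL principle mentioned above.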

Paul Vitanyi, **CWI Amsterdam**

**Similarity by Compression**

We survey a new area of parameter-free similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction.

Given a family of distances on a set of objects, a distance is universal up to a certain precision for that family if it minorizes every distance in the family between every two objects in the set, up to the stated precision (we do not require the universal distance to be an element of the family).

We consider similarity distances for two types of objects: literal objects that as such contain all of their meaning, like genomes or books, and names for objects.

The latter may have literal embodiments like the first type, but may also be abstract, like “red” or “Christianity.” For the first type we consider a family of computable distance measures corresponding to parameters expressing similarity according to particular features between pairs of literal objects. For the second type we consider similarity distances generated by web users, corresponding to particular semantic relations between (the names for) the designated objects.

For both families we give universal similarity distance measures, incorporating all particular distance measures in the family. In the first case the universal distance is based on compression and in the second case it is based on Google page counts related to search terms.

In both cases experiments on a massive scale give evidence of the viability of the approaches.
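For literal objects, the compression-based universal distance is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(x) is the compressed length of x. A minimal sketch using zlib as an (imperfect) stand-in for an ideal compressor, with made-up example strings:

```python
import zlib

def C(x):
    """Approximate the (uncomputable) Kolmogorov complexity of a byte
    string by its zlib-compressed length."""
    return len(zlib.compress(x, 9))

def ncd(x, y):
    """Normalized compression distance between two byte strings."""
    cx, cy = C(x), C(y)
    return (C(x + y) - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox jumps over the lazy cat " * 20
c = bytes(range(256)) * 4  # unrelated, structurally different data

# ncd(a, b) is small (near-identical texts); ncd(a, c) is close to 1
```

The better the real compressor approximates the ideal one, the closer this computable distance comes to the universal similarity distance described in the abstract.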

Bin Yu, **University of California, Berkeley**

**Seeking Interpretable Models for High Dimensional Data**

Extracting useful information from high-dimensional data is the focus of today's statistical research and practice. After the broad success of statistical machine learning on prediction through regularization, interpretability is gaining attention, and sparsity has been used as its proxy. With the virtues of both regularization and sparsity, L1-penalized minimization (e.g., the Lasso) has recently become very popular.

In this talk, I would like to cover both the theory and practice of L1-penalized minimization. First, I will give a brief overview of recent theoretical results on the model selection consistency (when p >> n) of the Lasso and the graphical Lasso. Second, I will present on-going collaborative research with the Gallant Lab at Berkeley on understanding the visual pathway. In particular, sparse models (linear, non-linear, and graphical) have been built to relate natural images to fMRI responses in the human primary visual cortex area V1. Issues of model validation will be discussed.
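The sparsity-inducing effect of the L1 penalty can be seen in its simplest setting. With an orthonormal design (X^T X = I), the Lasso solution is obtained by soft-thresholding the least-squares coefficients coordinate by coordinate; a minimal sketch with hypothetical coefficient values:

```python
def soft_threshold(z, lam):
    """Proximal operator of lam * |.|: shrink z toward zero by lam,
    and set it exactly to zero when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# With an orthonormal design, the Lasso estimate is the soft-thresholded
# least-squares estimate, coordinate by coordinate.
ols = [2.5, -0.3, 0.0, 1.1, -0.05]  # hypothetical least-squares coefficients
lam = 0.4                           # L1 penalty weight
lasso = [soft_threshold(b, lam) for b in ols]
# coefficients with |b| <= lam become exactly zero: a sparse model
```

The exact zeros are what make the L1-penalized fit interpretable: the surviving coordinates name the selected variables.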

Jorma Rissanen, **Helsinki Institute for Information Technology, Tampere University of Technology, and University of London**

**Optimal estimation**

Dropping the "truth" assumption changes the perspective of estimation. We can no longer estimate the "true" parameters in isolation, without regard to the other models; we need to take into account all the models in the family, which define the estimation capacity. This generalizes to interval estimation and defines an estimated interval of optimal size. The center points of these intervals, in turn, define the set of optimally distinguishable models, which are the relevant models for hypothesis testing and give an idea of the degree of confidence we can have in the selected winner.

Last updated: Thursday, August 13, 2009