Tampere International Center for Signal Processing IEEE SP/CAS Chapter of Finland
2009 Workshop on Information Theoretic Methods in Science and Engineering
August 17 - 19, 2009 | Tampere, Finland

Technical Program


Monday August 17th - Room TB 109


09:30-10:30 Plenary Talk: Shun-ichi Amari, RIKEN Brain Science Institute

Divergence Function, Information Monotonicity and Information Geometry


10:30-11:00 Coffee break


11:00-12:00  Information Theoretic Methods (invited session)


11:00-11:30 Fabio Benatti, University of Trieste

“Quantum algorithmic complexity”

11:30-12:00 Kenji Yamanishi, University of Tokyo

“Dynamic Model Selection with Its Applications to Data Mining Issues”


12:00-13:30 Lunch break


13:30-14:30 Plenary Talk: Bin Yu, University of California, Berkeley 

Seeking Interpretable Models for High Dimensional Data


14:30-15:00 Coffee break


15:00-17:30  ITM in Engineering (invited session)


15:00-15:30 Petri Myllymäki, University of Helsinki

“A Framework for MDL Clustering”

15:30-16:00 Po-Hsiang Lai, Joseph A. O'Sullivan, Robert Pless, Washington University in St. Louis

“Minimum Description Length for Clustering with Exemplars”

16:00-16:30 Teemu Roos, University of Helsinki

“Recovering sparse models by parameter transformations: applications in Markov models and logistic regression”

16:30-17:00 Jaakko Astola, Radomir Stankovic, Tampere University of Technology

“Covering codes and sparse representations of switching functions”

17:00-17:30 Ciprian Doru Giurcaneanu and Seyed Alireza Razavi, Tampere University of Technology

“Stochastic complexity and its applications in signal processing: some new results”


18:00 Dinner



Tuesday August 18th - Room TB 109


09:00-10:00 Plenary Talk: Vladimir Vovk, University of London

Emergence of objective probabilities with increasing information


10:00-10:30 Coffee break


10:30-12:00   ITM in Statistics and Engineering (invited session)


10:30-11:00 Vela Velupillai, University of Trento

“The Algorithmic Revolution in the Social Sciences: Mathematical Economics, Game Theory and Statistics”

11:00-11:30 Peter Harremoes, CWI Amsterdam

“Regret versus Redundancy for Exponential Families”

11:30-12:00 Sanjoy Mitter, Massachusetts Institute of Technology

“Towards a Unified View of Control and Communication”


12:00-13:30 Lunch break


13:30-14:30 Plenary Talk: Paul Vitanyi, CWI Amsterdam

Similarity by Compression


14:30-15:00 Coffee break


15:00-17:30   ITM in Engineering (invited and contributed session)


15:00-15:30 Alfred M. Bruckstein, Technion 

“An Information Theoretic Approach to Translation Estimation in Image-Based Metrology”

15:30-16:00 Mikhail B. Malioutov,  Northeastern University

“The MDL-principle in testing homogeneity between literary texts”

16:00-16:30 Krista Lagus, Sami Virpioja – Helsinki University of Technology

“Morpheme segmentation by optimizing two-part MDL codes”

16:30-17:00 Adriana Vasilache, Ioan Tabus, Septimia Sarbu – Nokia Research Center, Tampere University of Technology

“Low bit rate coding with binary Golay codes”

17:00-17:30 Ioan Tabus, Florin Ghido, Yinghua Yang, Tampere University of Technology

“Interleaved optimization-quantization for lossless coding”


18:00 Dinner



Wednesday August 19th - Room TB 109


09:00-10:00 Plenary Talk: Jorma Rissanen, Tampere University of Technology

Optimal estimation


10:00-10:30 Coffee break


10:30-12:00   ITM in Statistics (organized by Hannu Oja with partners from University of Dortmund)


10:30-11:00 Hannu Oja – University of Tampere 

“Robust Independent Component Analysis (ICA) based on two scatter matrices”

11:00-11:20 Klaus Nordhausen – University of Dortmund

“Multivariate models and first four moments”

11:20-11:40 Daniel Vogel – University of Dortmund

“Robust estimation in Gaussian graphical models”

11:40-12:00 Sami Helle – University of Tampere 

“Experimental Designs for Model Discrimination”


12:00-13:30 Lunch break 


13:30-15:00  ITM in Statistics and Machine Learning (Contributed session)


13:30-14:00 Shigeki Miyake, NTT, Japan

“An information theoretic approach to construction of search system indices”

14:00-14:20 Antti Liski, Ioan Tabus and Reijo Sund, Tampere University of Technology

“Model selection with sNML in logistic regression” 

14:20-14:40 Esa Elovaara and Petri Myllymäki, University of Helsinki

“MDL-based attribute models in naive Bayes classification”

14:40-15:00 Toni Merivuori and Teemu Roos, University of Helsinki

“Some observations on the applicability of normalized compression distance to stemmatology”


15:30-22:00 Free discussions at a sauna by Lake Näsijärvi







Shun-ichi Amari, RIKEN Brain Science Institute

Divergence Function, Information Monotonicity and Information Geometry

A divergence, which measures the discrepancy between two points, is used in many problems in information theory, statistics, mathematical programming, computational vision, and neural networks.  The Kullback-Leibler divergence is a typical example, defined between two probability distributions.  It generalizes to the class of Csiszar f-divergences, which are monotone under information reduction.  The Bregman divergence is another type of divergence, often used in optimization and signal processing; it belongs to a class of divergences having a dually flat geometrical structure.
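For reference, the divergence classes mentioned above admit the following standard definitions (for discrete distributions p, q; f convex with f(1) = 0; and a strictly convex, differentiable function phi):

```latex
% Kullback-Leibler divergence
D_{\mathrm{KL}}(p \,\|\, q) = \sum_i p_i \log \frac{p_i}{q_i}

% Csiszar f-divergence (monotone under information reduction)
D_f(p \,\|\, q) = \sum_i q_i \, f\!\left(\frac{p_i}{q_i}\right)

% Bregman divergence induced by a convex function \varphi
D_\varphi(x, y) = \varphi(x) - \varphi(y) - \langle \nabla \varphi(y),\, x - y \rangle
```

Choosing f(u) = u log u recovers the Kullback-Leibler divergence, as does the Bregman divergence of the negative entropy phi(x) = sum_i x_i log x_i restricted to the probability simplex, which is consistent with its role as the intersection of the two classes.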

A divergence function induces a geometrical structure on the underlying manifold.  We search for its differential-geometric background.  We show that the class of f-divergences is unique in giving invariant divergences on the manifolds of probability distributions and positive measures.  The class of Bregman divergences gives a dually flat geometrical structure.  The Kullback-Leibler divergence is unique in having both the invariant and the dually flat structure, while the alpha divergences are unique in this sense in the space of positive measures.

We further study the geometrical structure of hierarchical statistical models such as multilayer perceptrons and Gaussian mixtures, proving that they have geometrical singularities.  The singular structure gives rise to strange behaviors of MDL and AIC.  In the dynamics of learning, such a singular model has a Milnor-type attractor, which affects the speed of learning.  We study its topological properties.




Vladimir Vovk, University of London


Emergence of objective probabilities with increasing information


A standard way to measure the success of a sequence of probability forecasts is to look at the length of code resulting from using these forecasts for encoding the sequence of actual outcomes (as in the predictive MDL principle). Suppose we believe that the sequence of forecasts leads to a code length that is not far off the optimum value.

For example, the forecasts may be produced by a universal distribution constructed from a statistical model that we trust, in which case we expect the code length to be close to the stochastic complexity of the sequence of outcomes. Can we say that the individual forecasts have an objective meaning? In some sense, the answer is "yes": any two sequences of forecasts that do not asymptotically agree with each other can be transformed into a third sequence of forecasts that leads to a much smaller code length. This phenomenon has been dubbed "Jeffreys's law" by A. P. Dawid. In this talk I will review several formalizations of Jeffreys's law obtained in probability theory, statistics, machine learning, and algorithmic information theory. Some of these formalizations have been extended to prediction games other than coding.
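As a concrete reading of the code length in question: encoding each outcome x_t with the forecast distribution p_t announced before seeing it yields the cumulative (prequential) code length

```latex
L(x_1, \dots, x_n) = \sum_{t=1}^{n} -\log p_t\!\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

and a sequence of forecasts is judged by how close this quantity comes to the optimum, e.g. to the stochastic complexity of the outcomes under a trusted model class.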




Paul Vitanyi, CWI Amsterdam


    Similarity by Compression


We survey a new area of parameter-free similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction.

Given a family of distances on a set of objects, a distance is universal up to a certain precision for that family if it minorizes every distance in the family between every two objects in the set, up to the stated precision (we do not require the universal distance to be an element of the family).

We consider similarity distances for two types of objects: literal objects that as such contain all of their meaning, like genomes or books, and names for objects.

The latter may have literal embodiments like the first type, but may also be abstract, like "red" or "Christianity." For the first type we consider a family of computable distance measures corresponding to parameters expressing similarity according to particular features between pairs of literal objects. For the second type we consider similarity distances generated by web users, corresponding to particular semantic relations between the (names for the) designated objects.

For both families we give universal similarity distance measures, incorporating all particular distance measures in the family. In the first case the universal distance is based on compression and in the second case it is based on Google page counts related to search terms.

In both cases experiments on a massive scale give evidence of the viability of the approaches.
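As an illustration of the compression-based universal distance for literal objects, the normalized compression distance (NCD) can be sketched with an off-the-shelf compressor standing in for the (uncomputable) Kolmogorov complexity; zlib here is an arbitrary choice of compressor:

```python
import zlib

def C(x: bytes) -> int:
    """Approximate the Kolmogorov complexity of x by its zlib-compressed length."""
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar objects,
    near (or slightly above) 1 for unrelated ones."""
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Two near-identical texts versus one unrelated, poorly compressible string.
a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox jumps over the lazy cat " * 20
c = bytes(range(256)) * 4

print(f"ncd(a, b) = {ncd(a, b):.3f}")   # small: the texts share most content
print(f"ncd(a, c) = {ncd(a, c):.3f}")   # large: nothing in common
```

The better the compressor captures the regularities shared by the two objects, the closer this approximation comes to the universal, information-based distance discussed in the talk.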




Bin Yu, University of California, Berkeley 


Seeking Interpretable Models for High Dimensional Data


Extracting useful information from high-dimensional data is the focus of today's statistical research and practice. After broad success of statistical machine learning on prediction through regularization, interpretability is gaining attention and sparsity has been used as its proxy.  With the virtues of both regularization and sparsity, L1 penalized minimization (e.g. Lasso) has been very popular recently.

In this talk, I would like to cover both the theory and practice of L1 penalized minimization.  First, I will give a brief overview of recent theoretical results on the model selection consistency (when p>>n) of Lasso and graphical Lasso.  Second, I will present on-going collaborative research with the Gallant Lab at Berkeley on understanding the visual pathway.  In particular, sparse models (linear, non-linear, and graphical) have been built to relate natural images to fMRI responses in the human primary visual cortex area V1. Issues of model validation will be discussed.
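The L1 penalized minimization referred to above is, in its best-known (Lasso) form, the convex program

```latex
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2n} \, \|y - X\beta\|_2^2 + \lambda \, \|\beta\|_1
```

where the L1 penalty simultaneously regularizes and drives coordinates of beta exactly to zero, which is the sense in which sparsity serves as a proxy for interpretability.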




Jorma Rissanen, Helsinki Institute for Information Technology, Tampere University of Technology, and University of London


Optimal estimation


Dropping the "truth" assumption changes the perspective of estimation. We can no longer estimate the "true" parameters in isolation, without regard to the other models; we need to take into account all the models in the family, which define the estimation capacity. This framework generalizes to interval estimation and defines the estimated interval of optimal size. The center points, in turn, define the set of optimally distinguishable models, which form the relevant models for hypothesis testing and give an idea of the degree of confidence we can have in the selected winner.



Last updated: Thursday, August 13, 2009