Joint Nobots and Computational Learning Seminar


 
Probabilistic Clustering of Sequences, Curves, and Other Non-Vector Data

Padhraic Smyth
Dept. of Information and Computer Science
University of California, Irvine
http://www.ics.uci.edu/~smyth/

Clustering has a long history in exploratory data analysis and data-driven discovery. Probabilistic model-based clustering focuses on the use of mixture models as an underlying generative model for observed data. This talk will focus on one particularly useful aspect of the model-based approach, namely the ability to generalize from clustering in vector spaces to clustering sequences, trajectories, and other "dynamic" data that we commonly observe from individuals and/or systems. The probabilistic approach solves in a coherent manner the dual difficulties of (a) how to define distance metrics between non-vector observations (e.g., sequences of different lengths), and (b) how to weight different individuals for whom we have different amounts of data. Illustrative applications include unsupervised learning of gestures from video data, clustering of individuals based on Web browsing behavior, modeling of gene expression data, modeling of cyclone trajectories, clustering locusts based on observed motor behavior, and clustering of medical patients based on histograms of red blood cells. A subset of these applications will be discussed, time permitting. The talk will conclude with a brief discussion of some apparently useful connections between mixture modeling in this context and Bayesian hierarchical models.


Date: Thurs., Nov 2

Time: 4:15-5:30PM

Place: Gates 104

