Joint Nobots and Computational Learning Seminar
Probabilistic Clustering of Sequences, Curves,
and Other Non-Vector Data
Padhraic Smyth
Dept. of Information and Computer Science
University of California, Irvine
http://www.ics.uci.edu/~smyth/
Clustering has a long history in exploratory data analysis and
data-driven discovery. Probabilistic model-based clustering focuses on
the use of mixture models as an underlying generative model for
observed data. This talk will focus on one particularly useful aspect
of the model-based approach, namely the ability to generalize from
clustering in vector spaces to clustering sequences, trajectories, and
other "dynamic" data that we commonly observe from individuals and/or
systems. The probabilistic approach solves in a coherent manner the
dual difficulties of (a) how to define distance metrics between
non-vector observations (e.g., sequences of different lengths), and
(b) how to weight different individuals for whom we have different
amounts of data. Illustrative applications include unsupervised
learning of gestures from video data, clustering of individuals based
on Web browsing behavior, modeling of gene expression data, modeling
of cyclone trajectories, clustering locusts based on observed motor
behavior, and clustering of medical patients based on histograms of
red blood cells. A subset of these applications will be discussed,
time permitting. The talk will conclude with a brief discussion of
some apparently useful connections between mixture modeling in this
context and Bayesian hierarchical models.
Date: Thurs., Nov 2
Time: 4:15-5:30PM
Place: Gates 104