Text Mining for Bioinformatics

Slides from a modified version of this text mining tutorial at ICDM '01 are available.

Intended Audience

Our goal is to make this tutorial a practical guide to using text mining in bioinformatics while also highlighting some of the interesting research issues that arise when mining techniques are applied in this field. Participants who work in bioinformatics (drug discovery, pharmaceutical companies, etc.) will broaden the set of tools they are comfortable with. Data miners currently working on non-biological problems will learn about one of the most exciting application areas for data discovery and analysis techniques.

Previous exposure to biology will be helpful, but the tutorial will be accessible to those who have no biology background. We will assume familiarity with basic statistical and probabilistic concepts.

Time

November 29, 2001, Morning

Short Biographies of the Presenters

Russ B. Altman, MD, PhD

Russ Altman is an Associate Professor in the Medical Informatics group at the Stanford University Medical Center. His work is widely recognized for advancing biological research through the use of machine learning. The data-intensive research areas he has published on include protein function prediction, microarray and gene expression assay analysis and, more recently, text mining for bioinformatics. Prof. Altman is Director of the Biomedical Informatics Training Program at Stanford, Thrust Leader for Molecular Science at the San Diego Supercomputer Center, and President of the International Society for Computational Biology.

Hinrich Schütze, PhD

After receiving a Ph.D. in Natural Language Processing from Stanford University in 1995, Hinrich Schütze joined the Xerox Palo Alto Research Center, where he developed a scalable approach to semantic analysis of natural language based on mining of association data. He then co-founded Outride, a search personalization company, and led the development of personalization software that learns user preferences from surfing behavior. He is the author of the best-selling textbook on data-driven natural language processing (with Chris Manning, MIT Press) and holds a dozen issued and pending patents. Dr. Schütze is currently CTO of Novation Biosciences, a bioinformatics company focused on text and data mining of biological data. He is also Consulting Faculty at Stanford.