Seminar on Computational Learning and Adaptation




Knowledge-Lean Word Sense Disambiguation


Ted Pedersen
Department of Computer Science and Engineering
Southern Methodist University
Dallas, TX 75275
pedersen@seas.smu.edu
[joint work with Rebecca Bruce, also of SMU]



Natural language processing applications often require that the meanings of ambiguous words be resolved. Automatic methods of word sense disambiguation usually depend on costly knowledge sources such as manually annotated text, semantic networks, or machine readable dictionaries. This limits such approaches to domains where this type of knowledge is already available. This presentation discusses several knowledge-lean alternatives that make word sense distinctions using only features found in the raw text surrounding the ambiguous words. McQuitty's and Ward's agglomerative clustering algorithms and the EM algorithm are evaluated on their disambiguation accuracy. The results show that (1) McQuitty's algorithm is more accurate when the underlying sense distribution is very skewed, while the EM algorithm is more accurate when the sense distribution is somewhat balanced, and (2) features that occur within 1 or 2 positions of the ambiguous word may be sufficient to attain reasonable levels of disambiguation accuracy.
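The abstract names McQuitty's agglomerative clustering without describing it. The sketch below illustrates the general idea of McQuitty's method as it is commonly defined: every instance starts in its own cluster, the two closest clusters are merged repeatedly, and the distance from any other cluster to the merged one is the average of its distances to the two parts. The feature tuples, the mismatch-count dissimilarity, and the names `mismatch` and `mcquitty_cluster` are hypothetical choices for illustration, not the authors' implementation.

```python
from itertools import combinations

def mismatch(a, b):
    """Hypothetical dissimilarity: number of feature positions whose values differ."""
    return sum(x != y for x, y in zip(a, b))

def mcquitty_cluster(instances, k):
    """Agglomerative clustering with McQuitty's update rule
    d(c, a+b) = (d(c, a) + d(c, b)) / 2, merging until k clusters remain.
    Returns the clusters as sorted lists of instance indices."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = {i: [i] for i in range(len(instances))}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {(i, j): mismatch(instances[i], instances[j])
            for i, j in combinations(clusters, 2)}

    def d(a, b):
        return dist[(a, b) if a < b else (b, a)]

    next_id = len(instances)
    while len(clusters) > k:
        # Closest pair; ties broken by the smallest pair of cluster ids.
        a, b = min(dist, key=lambda p: (dist[p], p))
        new_id = next_id
        next_id += 1
        # McQuitty update: average the distances to the two merged parts.
        for c in clusters:
            if c not in (a, b):
                dist[(c, new_id)] = (d(c, a) + d(c, b)) / 2
        clusters[new_id] = clusters.pop(a) + clusters.pop(b)
        # Discard distances that refer to the clusters just merged.
        dist = {p: v for p, v in dist.items() if a not in p and b not in p}
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical local-context features for five occurrences of an ambiguous word.
instances = [
    ("the", "rate", "of"),
    ("the", "rate", "on"),
    ("a", "rate", "of"),
    ("an", "in", "the"),
    ("no", "in", "the"),
]
print(mcquitty_cluster(instances, 2))  # [[0, 1, 2], [3, 4]]
```

Setting `k` to the number of senses stops the merging at one cluster per sense; each resulting cluster would then still need to be mapped to a sense before accuracy can be measured.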




Date: Thurs., February 5; Time: 4:15-5:30PM; Place: Gates 100


The goal of this seminar is to increase communication among local researchers with interests in computational approaches to learning and adaptation. If you would like to be added to (or removed from) the mailing list, or if you are interested in giving a talk in the seminar, please send email to iba@isle.org.

