SPCH07: Maximum entropy language modeling for speech recognition
Maximum entropy (ME) modeling is a powerful statistical learning paradigm for classification and prediction tasks. It has been applied successfully in a wide range of domains, including econometrics, astrophysics and medicine. The power of ME models lies in their inherent flexibility: there are practically no limits on the statistical feature set, so knowledge from various sources can easily be integrated. However, ME models generally require considerable computational resources.
In this project we want to apply ME modeling to the construction of statistical language models (LMs). Language models are an essential part of speech recognition engines, as they determine which word strings are likely in a language and which are not. ME modeling should allow us to introduce advanced linguistic and world knowledge into our language models so as to improve their accuracy and generality.
The specific aims of this project are the construction of a general ME toolkit, the construction of an ME language model with this toolkit, and finally the evaluation of the model in a speech recognition task. Challenges include the efficient implementation of an ME training algorithm and the selection of informative features for the language model. In addition, the student will have to gain insight into the complete speech recognition process.
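To give an impression of what such a model involves, the sketch below trains a tiny conditional ME language model of the standard form p(w | h) = exp(sum_i lambda_i f_i(h, w)) / Z(h), where h is the word history and Z(h) normalizes over the vocabulary. It is only an illustration under stated assumptions: the feature set (unigram and bigram indicators), the toy corpus and the function names are all hypothetical, and plain gradient ascent on the log-likelihood is used for simplicity where a real toolkit might prefer iterative scaling or a quasi-Newton method.

```python
import math
from collections import defaultdict


def features(history, word):
    """Hypothetical feature set: one unigram and one bigram indicator."""
    return [("uni", word), ("bi", history, word)]


def probs(weights, history, vocab):
    """Return p(w | history) for every w in vocab under the ME model."""
    scores = [sum(weights[f] for f in features(history, w)) for w in vocab]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)      # normalization term Z(history)
    return [e / z for e in exps]


def train(pairs, vocab, epochs=200, lr=0.5):
    """Gradient ascent on the average conditional log-likelihood.

    The gradient for each feature is its observed count minus its
    expected count under the current model.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for history, word in pairs:
            for f in features(history, word):
                grad[f] += 1.0                      # observed count
            for cand, p in zip(vocab, probs(weights, history, vocab)):
                for f in features(history, cand):
                    grad[f] -= p                    # expected count
        for f, g in grad.items():
            weights[f] += lr * g / len(pairs)
    return weights


# Toy corpus, invented for this example only.
corpus = "the cat sat on the mat the cat ate".split()
vocab = sorted(set(corpus))
pairs = list(zip(corpus[:-1], corpus[1:]))  # (history, next word) pairs

weights = train(pairs, vocab)
p_after_the = dict(zip(vocab, probs(weights, "the", vocab)))
print({w: round(p, 3) for w, p in p_after_the.items()})
```

Even in this toy setting the cost is visible: every training pair requires a sum over the whole vocabulary to compute Z(h), which is one reason why efficient training is listed among the challenges. Richer knowledge sources would enter the model simply as additional entries returned by `features`.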