- FLaVoR -- Flexible Large Vocabulary Recognition: Incorporating Linguistic Knowledge Sources Through a Modular Recogniser Architecture
- 1 October 2002
- 4 years
- Project leader:
- ESAT/PSI speech
Current speech recognition research programmes aim at the recognition of
unconstrained speech input, higher accuracy, less domain dependency and
richer transcription output [EARS].
Yet, the introduction of powerful techniques
necessary to realize these goals is often hampered by the standard,
monolithic recognition framework: as all knowledge sources --lexicon,
acoustic model and language model-- are combined into a single search space,
they must be kept extremely simple. This has particularly inhibited progress
at the linguistic level. Consequently, almost all recognizers still employ
suboptimal linguistic knowledge components such as static lexica
(which lexicalize morphological processes) and N-gram language models.
In this project we deviate from the standard framework and investigate a
novel, flexible speech recognition architecture. We believe that more
sophisticated linguistic models are indispensable in order to meet the
current challenges in speech recognition. Therefore we opt for a framework
which allows for the direct integration of such complex knowledge sources.
The key aspect of the proposed framework consists of splitting up the search
engine into two separate layers. The first layer performs phoneme recognition
and outputs a dense phoneme network, which acts as an interface to the second
layer. In this second layer, the actual word decoding is accomplished by
means of sophisticated probabilistic morpho-phonological and morpho-syntactic
models. These models can be made more complex because the decoupling of the
acoustic-phonemic decoding from the word decoding eliminates most of the
traditional constraints on them.
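
The two-layer split described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy phoneme network, the phoneme symbols, the `LEXICON` and `LM_LOG_PROB` tables and the `decode_words` function are hypothetical and not part of the project. In particular, the second layer is represented here by a static pronunciation lexicon with unigram scores, which is far simpler than the morpho-phonological and morpho-syntactic models the project proposes. The sketch only shows how a phoneme network can act as the interface between the layers.

```python
import math
from collections import defaultdict

# Layer 1 output (hypothetical toy data): a phoneme network stored as a DAG.
# Each edge is (start_node, end_node, phoneme, acoustic_log_score).
PHONEME_NETWORK = [
    (0, 1, "h", -0.1),
    (1, 2, "E", -0.4),
    (1, 2, "I", -0.7),   # competing phoneme hypothesis for the same span
    (2, 3, "t", -0.2),
    (3, 4, "h", -0.1),
    (4, 5, "UI", -0.3),
    (5, 6, "s", -0.2),
]

# Layer 2 knowledge sources (hypothetical stand-ins for the richer models).
LEXICON = {
    "het": [("h", "E", "t")],
    "hit": [("h", "I", "t")],
    "huis": [("h", "UI", "s")],
}
LM_LOG_PROB = {"het": math.log(0.5), "hit": math.log(0.2), "huis": math.log(0.3)}


def decode_words(edges, lexicon, lm_log_prob, start, final):
    """Return (score, words) for the best word sequence from start to final,
    or None if no word sequence spans the network.

    Assumes node ids are topologically ordered (every edge goes low -> high).
    """
    outgoing = defaultdict(list)
    for src, dst, phoneme, score in edges:
        outgoing[src].append((dst, phoneme, score))

    def match(node, phonemes):
        # Yield (end_node, acoustic_score) for each path spelling `phonemes`.
        if not phonemes:
            yield node, 0.0
            return
        for dst, phoneme, score in outgoing[node]:
            if phoneme == phonemes[0]:
                for end, rest in match(dst, phonemes[1:]):
                    yield end, score + rest

    nodes = sorted({n for src, dst, _, _ in edges for n in (src, dst)})
    best = {start: (0.0, [])}  # best hypothesis ending at each word boundary
    for node in nodes:
        if node not in best:
            continue
        score, words = best[node]
        for word, pronunciations in lexicon.items():
            for pronunciation in pronunciations:
                for end, acoustic in match(node, pronunciation):
                    candidate = score + acoustic + lm_log_prob[word]
                    if end not in best or candidate > best[end][0]:
                        best[end] = (candidate, words + [word])
    return best.get(final)


score, words = decode_words(PHONEME_NETWORK, LEXICON, LM_LOG_PROB, start=0, final=6)
print(words)  # ['het', 'huis']
```

Because the word decoder only sees the network, the lexicon and scoring tables could in principle be replaced by more complex models without touching the phoneme recognizer, which is the point of the decoupling.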
- A morpho-phonological model for Dutch, capable of deriving all common pronunciation
variants of Dutch words and capable of predicting the word stress.
This model should also be able to cope with word formation processes such as inflection
- A morpho-syntactic model for Dutch, capable of explaining word formation processes
such as inflection and compounding, and capable of describing the word usage in
- A new layered recognition framework which allows for the direct integration of the
above-mentioned complex knowledge sources, and which provides rich output (words,
underlying phonemes, prosodic information, syntactic analysis, ...).
- Detailed project planning:
- See here.