General Info

2006-11-17: FLaVoR workshop
A one-day workshop on flexible architectures for large vocabulary recognition.
Faculty Club, Leuven, Belgium
Please register in time.

Participants
FLaVoR is a project with funding from the Flemish government (IWT) and is jointly carried out by the Katholieke Universiteit Leuven (ESAT/PSI speech group) and the University of Antwerp (CNTS research center).

Project info
Title:
FLaVoR -- Flexible Large Vocabulary Recognition: Incorporating Linguistic Knowledge Sources Through a Modular Recogniser Architecture
Start:
1 October 2002
Duration:
4 years
Project leader:
ESAT/PSI speech
Abstract:
Current speech recognition research programmes aim at the recognition of unconstrained speech input, higher accuracy, less domain dependency and richer transcription output [EARS]. Yet the introduction of the powerful techniques needed to realize these goals is often hampered by the standard, monolithic recognition framework: because all knowledge sources (lexicon, acoustic model and language model) are combined into a single search space, they must be kept extremely simple. This has particularly inhibited progress at the linguistic level. Consequently, almost all recognizers still employ suboptimal linguistic knowledge components such as static lexica (lexicalization of morphological processes) and N-gram language models.

In this project we deviate from the standard framework and investigate a novel, flexible speech recognition architecture. We believe that more sophisticated linguistic models are indispensable in order to meet the current challenges in speech recognition. Therefore we opt for a framework which allows for the direct integration of such complex knowledge sources.

The key aspect of the proposed framework consists of splitting up the search engine into two separate layers. The first layer performs phoneme recognition and outputs a dense phoneme network, which acts as an interface to the second layer. In this second layer, the actual word decoding is accomplished by means of sophisticated probabilistic morpho-phonological and morpho-syntactic models. These models can be made more complex because the decoupling of the acoustic-phonemic decoding from the word decoding eliminates most of the traditional constraints on them.
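
The two-layer idea can be illustrated with a minimal sketch. It is not the FLaVoR implementation: the phoneme network, the phoneme symbols, the tiny lexicon and the function names below are all hypothetical, and the second layer here uses only a pronunciation lexicon where the project envisages probabilistic morpho-phonological and morpho-syntactic models. The first layer is represented by a hard-coded phoneme network; the second layer searches that network for the best-scoring word sequence.

```python
from collections import defaultdict

# Hypothetical output of layer 1: a phoneme network given as edges
# (start_node, end_node, phoneme, log_prob). Nodes are assumed to be
# numbered so that every edge goes from a lower to a higher node.
PHONEME_NETWORK = [
    (0, 1, "d", -0.1), (0, 1, "t", -1.5),
    (1, 2, "@", -0.2),
    (2, 3, "k", -0.1), (2, 3, "g", -2.0),
    (3, 4, "A", -0.3), (3, 4, "a", -1.2),
    (4, 5, "t", -0.1), (4, 5, "d", -1.8),
]

# Hypothetical lexicon for layer 2: word -> list of pronunciation variants.
LEXICON = {
    "de": [("d", "@")],
    "te": [("t", "@")],
    "kat": [("k", "A", "t")],
    "gat": [("g", "A", "t")],
}


def build_outgoing(network):
    """Index the network edges by their start node."""
    outgoing = defaultdict(list)
    for start, end, phoneme, log_prob in network:
        outgoing[start].append((end, phoneme, log_prob))
    return outgoing


def match_pronunciation(outgoing, node, phonemes):
    """Return all (end_node, log_prob) pairs reached by reading `phonemes` from `node`."""
    states = [(node, 0.0)]
    for target in phonemes:
        states = [
            (end, score + log_prob)
            for current, score in states
            for end, phoneme, log_prob in outgoing[current]
            if phoneme == target
        ]
    return states


def decode_words(network, lexicon, start, final):
    """Layer 2: best-scoring word sequence spanning the network from `start` to `final`."""
    outgoing = build_outgoing(network)
    best = {start: (0.0, [])}  # node -> (best log_prob, word sequence)
    # Visiting nodes in increasing order is valid because edges only go forward.
    for node in sorted({edge[0] for edge in network} | {start}):
        if node not in best:
            continue
        score, words = best[node]
        for word, variants in lexicon.items():
            for phonemes in variants:
                for end, log_prob in match_pronunciation(outgoing, node, phonemes):
                    candidate = (score + log_prob, words + [word])
                    if end not in best or candidate[0] > best[end][0]:
                        best[end] = candidate
    return best.get(final)


result = decode_words(PHONEME_NETWORK, LEXICON, start=0, final=5)
print(result)
```

Because the word decoder only sees the phoneme network, a richer linguistic model could in principle replace the lexicon lookup in `decode_words` without touching the acoustic-phonemic layer, which is the decoupling the paragraph above describes.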

Objectives:
  • A morpho-phonological model for Dutch, capable of deriving all common pronunciation variants of Dutch words and of predicting word stress. This model should also be able to cope with word formation processes such as inflection and compounding.
  • A morpho-syntactic model for Dutch, capable of explaining word formation processes such as inflection and compounding, and of describing word usage in sentences.
  • A new layered recognition framework which allows for the direct integration of the above-mentioned complex knowledge sources, and which provides rich output (words, underlying phonemes, prosodic information, syntactic analysis, ...).
Detailed project planning:
See here.
 

Contact
Prof. Dr. ir. Dirk Van Compernolle
K.U.Leuven - ESAT/PSI
Kasteelpark Arenberg 10
3001 Heverlee
BELGIUM

Tel:  +32-16-32.1055
Fax:  +32-16-32.1723
E-mail:  Dirk.VanCompernolle@esat.kuleuven.ac.be