The amount of data available to molecular biologists is exploding because of the high-throughput automation made possible by large-scale investments in the pharmaceutical sector. To manage and analyze these mountains of distributed and heterogeneous data, biologists have turned to computer science and data mining and have created the field of bioinformatics. While the basic bioinformatics tasks (maintenance of databases, searches, and sequence comparisons) are classical computer science tasks, new sources of data (gene and protein expression data, and data on protein-protein and protein-DNA interactions) and new applications (gene prediction, functional genomics, diagnostics) have moved bioinformatics into the realm of machine learning and data mining. Microarray technology, which measures the expression of thousands of genes in parallel in cells, is a perfect illustration of high-throughput technologies in molecular biology. Microarrays capture information with which we can build models of the dynamics of the genetic mechanisms of cells, of pathologies, or of drug action. From a data-mining perspective, these tasks can be formulated as visualization, clustering, detection of frequent patterns, discovery of association rules, classification, and modeling of Boolean networks or dynamic neural networks. We present a number of applications we are developing, including discovery of genetic switches in plant genomes, diagnosis of cancer, modeling of the interactions between bacteria, and a generic toolkit for the processing of microarray data.
Bart De Moor and Yves Moreau