Quality of Data and Multi-Source Information Systems

ARA Masses de Données 2006-2009 (ANR)

Data quality problems in databases, data warehouses and, more generally, multi-source information systems are endemic across all types of data and all application domains: commercial, biomedical, industrial, scientific and geographical data. Among the many problems found in the massive structured and semi-structured data sets now available are data errors, outliers, duplicates, inconsistencies, missing values, and incomplete, uncertain, obsolete or unreliable data. These problems seriously degrade the results of information retrieval, even when the retrieval process itself is effective, as well as the results of the data analysis that precedes any decision-making.

QUADRIS is a project funded by the ARA «Masses de Données» research program of the French Agence Nationale de la Recherche (ANR).

The objective of the QUADRIS project (duration: 36 months) is to solve the various data quality problems that arise when modelling and designing information systems, when integrating and querying multi-source information, and when evaluating multi-source information systems. QUADRIS will provide theoretical solutions to these data and information system quality problems, validated in real situations on very large data volumes in three representative disciplinary fields.

The multi-disciplinary QUADRIS project will tackle these problems by organizing its research work along four main axes:

1. The methodological axis (carried out mainly by the CEDRIC Lab of CNAM) aims to adapt current methods for conceptual analysis and design, engineering, reverse engineering and migration of multi-source information systems so that they include the evaluation and control of the various facets of data quality together with the evaluation of system quality.
2. The theoretical and technical axis (carried out jointly by the IRISA and PRISM Labs) pursues two objectives: i) proposing metrics, methods and algorithmic approaches to analyze, detect, control and continuously "clean" the various data quality problems in multi-source information systems; ii) revisiting multi-source information mediation and integration, and the optimization of multi-source queries, so as to incorporate data quality control methods through adaptive query processing based on a negotiated trade-off between query cost and the quality of the data retrieved from the multiple sources.
3. The technological axis aims to develop an original, experimental and configurable middleware prototype that makes it possible: i) to detect, measure, control and correct the quality of large data volumes in any type of multi-source information system; ii) to evaluate the quality of a multi-source information system; iii) to study mediation and integration driven by data quality, and the optimization of multi-source queries based on data quality control and system quality control.
4. In the applicative axis, QUADRIS will validate the research proposals above in three application areas that are representative in terms of their huge data volumes, their complex underlying models and their numerous and specific data quality problems. These application domains are: the biomedical domain (medical records collected by health professionals, Curie Institute), the commercial domain (data from EDF's Customer Relationship Management (CRM) system) and the geographical domain (LSIS).
Created by Administrator on 2007/10/01 14:47
