Optimizing Approximations of DNF Query Lineage in Probabilistic XML

Abstract :

Probabilistic XML is a probabilistic model for uncertain tree-structured data, with applications to data integration, information extraction, or uncertain version control. We explore in this work efficient algorithms for evaluating tree-pattern queries with joins over probabilistic XML or, more specifically, for listing the answers to a query along with their computed or approximated probability. The approach relies on, first, producing the lineage query by evaluating it over the probabilistic XML document, and, second, looking for an optimal strategy to compute the probability of the lineage formula. This latter part relies on a query-optimizer–like approach: exploring different evaluation plans for different parts of the formula and estimating the cost of each plan, using a cost model for the various evaluation algorithms. We demonstrate the efficiency of this approach on datasets used in previous research on probabilistic XML querying, as well as on synthetic data. We also compare the performance of our query engine with EvalDP, Trio, and MayBMS/SPROUT.

Type de document :
Communication dans un congrès
ICDE (International Conference on Data Engineering), Apr 2013, Brisbane, Australia. pp.721-732, 2013
Liste complète des métadonnées

https://hal-imt.archives-ouvertes.fr/hal-00874442
Contributeur : Admin Télécom Paristech <>
Soumis le : jeudi 17 octobre 2013 - 18:23:19
Dernière modification le : samedi 3 mars 2018 - 15:12:01

Identifiants

  • HAL Id : hal-00874442, version 1

Citation

Asma Souihli, Pierre Senellart. Optimizing Approximations of DNF Query Lineage in Probabilistic XML. ICDE (International Conference on Data Engineering), Apr 2013, Brisbane, Australia. pp.721-732, 2013. 〈hal-00874442〉

Partager

Métriques

Consultations de la notice

77