Enhancing Aggregation over Uncertain Databases

Publication Date

Summer 10-28-2015


Queries with aggregation are an important part of database systems. They are widely used in online analytical processing, decision support systems, and data analytics. An aggregate function typically performs a calculation over the set of values in a particular column and returns a single summarized value. Handling aggregate functions becomes challenging with uncertain data, however, because there can be an exponential number of possible instances, each with a potentially different aggregation result. This paper aims to enhance aggregate queries over uncertain databases in two ways. First, it proposes a Probability-Based Aggregation (PBA) technique that considers the probability of each instance in the database. Second, it proposes a Probability-Based Entropy (PBE) technique that introduces a new class of aggregate functions for measuring the level of uncertainty in a database. Entropy and information gain are two well-known measures that stem from information theory and can also be applied to uncertain databases. Used as aggregate functions in uncertain databases, the two measures enable further data analytics and mining. Experimental results show that the proposed aggregation technique (PBA) outperforms other similar techniques in terms of precision, recall, uncertainty density, and answer decisiveness. Moreover, the proposed probabilistic entropy function (PBE), which considers the probability of each instance when calculating the entropy, helps in identifying the threshold that gives the maximum information gain.
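To make the possible-instances problem concrete, the following is a minimal Python sketch. It is not the paper's PBA or PBE technique, whose details are not given in this abstract. It assumes a simple tuple-independent model in which each row exists with a stated probability; the data and the names `uncertain_rows`, `possible_worlds`, `sum_distribution`, `expected_value`, and `shannon_entropy` are hypothetical. It enumerates every possible instance, weights each SUM result by the probability of its instance, and computes the Shannon entropy of the resulting answer distribution.

```python
from itertools import product
from math import log2

# Hypothetical uncertain relation: (value, probability that the row exists).
# Rows are assumed independent; this model is an assumption, not the paper's.
uncertain_rows = [(10, 0.5), (20, 0.5)]

def possible_worlds(rows):
    """Yield (values_present, instance_probability) for every subset of rows."""
    for mask in product([False, True], repeat=len(rows)):
        prob = 1.0
        values = []
        for present, (value, p) in zip(mask, rows):
            prob *= p if present else (1.0 - p)
            if present:
                values.append(value)
        yield values, prob

def sum_distribution(rows):
    """Map each possible SUM result to the total probability of producing it."""
    dist = {}
    for values, prob in possible_worlds(rows):
        total = sum(values)
        dist[total] = dist.get(total, 0.0) + prob
    return dist

def expected_value(dist):
    """Probability-weighted mean of the aggregate results."""
    return sum(result * prob for result, prob in dist.items())

def shannon_entropy(dist):
    """Entropy in bits of the answer distribution (higher means less certain)."""
    return -sum(prob * log2(prob) for prob in dist.values() if prob > 0)

dist = sum_distribution(uncertain_rows)
print(dist)
print(expected_value(dist), shannon_entropy(dist))
```

With n uncertain rows the enumeration visits 2^n instances, which is the exponential blow-up the abstract refers to; the sketch is therefore only practical for very small inputs.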