|The increasing amount of electronically available information in society today is undeniable. Examples include the numbers of general web pages, scientific publications, and items in online auctions. From a user's perspective, this trend will lead to information overflow. Moreover, information publishers are compromised by this situation, as users have greater difficulty in identifying useful information.
Publish-subscribe systems can be applied to cope with the reality of information overflow. In these systems, users specify their information interests as subscriptions and, subsequently, only matching information (event messages) is delivered; uninteresting information is filtered out before reaching users. In this dissertation, we consider content-based publish-subscribe systems, a sophisticated example of these systems. They perform the information-filtering task based on the content of provided information. In order to deal with high numbers of subscriptions and frequencies of event messages, publish-subscribe systems are realized as distributed systems. Advertisements---publisher specifications of potential future event messages---are optionally applied in these systems to reduce the internal distribution of subscriptions.
Existing work on content-based publish-subscribe concepts mainly focuses on subscriptions and advertisements as pure conjunctive expressions. Therefore, subscriptions or advertisements using operators other than conjunction need to be canonically converted to disjunctive normal form by these systems. Each conjunctive component is then treated as individual subscription or advertisement. Unfortunately, the size of converted expressions is exponential in the worst case.
In this dissertation, we show that the direct support of general Boolean subscriptions and advertisements improves the time and space efficiency of general-purpose content-based publish-subscribe systems. For this purpose, we develop suitable approaches for the filtering and routing of general Boolean expressions in these systems. Our approaches represent solutions to exactly those components of content-based publish-subscribe systems that currently restrict subscriptions and advertisements to conjunctive expressions.
On the subscription side, we present an effective generic filtering algorithm, and a novel approach to optimize event routing tables, which we call subscription pruning. To support advertisements, we show how to calculate the overlap between subscriptions and advertisements, and introduce the first designated subscription routing optimization, which we refer to as advertisement pruning. We integrate these approaches into our prototype BoP (BOolean Publish-subscribe) which allows for the full support of general Boolean expressions in its filtering and routing components.
In the evaluation part of this dissertation, we empirically analyze our prototypical implementation BoP and compare its algorithms to existing conjunctive solutions. We firstly show that our general-purpose Boolean filtering algorithm is more space- and time-efficient than a general-purpose conjunctive filtering algorithm. Secondly, we illustrate the effectiveness of the subscription pruning routing optimization and compare it to the existing covering optimization approach. Finally, we demonstrate the optimization effect of advertisement pruning while maintaining the existing overlapping relationships in the system.