Timely analysis of activity and operational data is critical for companies to stay competitive. Activity data from a company’s website includes page and content views, searches, and advertisements shown and clicked. Operational data includes monitoring data collected from web applications (e.g., request latency) and from cluster resources (e.g., CPU usage).
The vast majority of analysis over activity and operational data involves continuous queries. A continuous query Q is issued once over data D that is constantly updated. Q runs continuously over D and lets users get new results as D changes, without having to issue the same query repeatedly. Continuous queries arise naturally over activity and operational data for two reasons: (i) the data is generated continuously in the form of append-only streams; (ii) the data has a time component, so recent data is usually more relevant than older data.
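To make the definition concrete, here is a minimal Python sketch of a continuous query over an append-only stream of page views. The class `SlidingWindowCount` is a hypothetical illustration, not part of Cyclops or of any system named on this page; it keeps per-URL view counts for the last `window` seconds and assumes events arrive in non-decreasing timestamp order.

```python
from collections import Counter, deque
from typing import Deque, Tuple


class SlidingWindowCount:
    """Hypothetical continuous query: page views per URL over the last `window` seconds.

    The query is registered once; each call to `append` updates the result
    incrementally instead of re-running the query over all past data.
    Assumes timestamps arrive in non-decreasing order.
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self.events: Deque[Tuple[float, str]] = deque()  # (timestamp, url) in arrival order
        self.counts: Counter = Counter()

    def append(self, ts: float, url: str) -> Counter:
        # New data arrives on an append-only stream.
        self.events.append((ts, url))
        self.counts[url] += 1
        # Evict events that have fallen out of the time window (older data matters less).
        while self.events and self.events[0][0] <= ts - self.window:
            _, old_url = self.events.popleft()
            self.counts[old_url] -= 1
            if self.counts[old_url] == 0:
                del self.counts[old_url]
        # Return a snapshot of the current query result.
        return Counter(self.counts)


q = SlidingWindowCount(window=60)
q.append(0, "/home")
q.append(30, "/search")
print(q.append(70, "/home"))  # the view of "/home" at t=0 has left the window
```

Each call to `append` returns the up-to-date answer, so the user sees new results as the stream grows without reissuing the query.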
The growing interest in continuous queries is reflected in the engineering resources that companies have recently invested in building continuous query execution platforms. Esper, Storm, and Hadoop are examples of systems that can run continuous queries. Each of these systems is usually designed to work well for a particular type of workload, so no single system outperforms all others across all types of workload.
The large number of systems that can run continuous queries poses several challenges for application developers and system administrators.
The Cyclops project addresses these challenges. Cyclops is a management system for executing and optimizing continuous queries. It abstracts away the underlying systems that run continuous queries by giving users a common interface to create and run them. Its optimizer selects the most appropriate execution plan for a given query, which includes both the algorithm to use and the system to run it on. The picture below compares the optimizer's estimated latency for processing an example continuous query under different execution plans against the actual latency. A high-level overview of Cyclops' system architecture can be found here.
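Choosing among execution plans by estimated latency can be sketched as cost-based selection. The following Python example is a hypothetical illustration, not Cyclops' actual optimizer or cost model: `Plan`, `choose_plan`, and `make_toy_estimator` are invented names, and the toy linear cost model uses placeholder constants rather than measurements of Esper, Storm, or Hadoop.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Plan:
    algorithm: str  # how the query is evaluated (hypothetical labels)
    system: str     # which underlying engine runs it


def choose_plan(plans: Iterable[Plan], estimate_latency: Callable[[Plan], float]) -> Plan:
    # Cost-based selection: pick the plan with the lowest estimated latency.
    return min(plans, key=estimate_latency)


def make_toy_estimator(window_size: int) -> Callable[[Plan], float]:
    # Toy linear cost model: fixed startup overhead + per-tuple cost.
    # All constants are illustrative placeholders, NOT measurements of any real system.
    overhead = {"Esper": 0.001, "Storm": 0.05, "Hadoop": 20.0}
    per_tuple = {"Esper": 1e-5, "Storm": 2e-6, "Hadoop": 1e-7}
    algo_factor = {"recompute": 1.0, "incremental": 0.2}

    def estimate(plan: Plan) -> float:
        work = per_tuple[plan.system] * window_size * algo_factor[plan.algorithm]
        return overhead[plan.system] + work

    return estimate


plans = [Plan(a, s) for s in ("Esper", "Storm", "Hadoop") for a in ("recompute", "incremental")]
print(choose_plan(plans, make_toy_estimator(1_000)))
print(choose_plan(plans, make_toy_estimator(1_000_000_000)))
```

With these placeholder constants, a small window favors the engine with the lowest fixed overhead, while a very large window favors the engine with the lowest per-tuple cost, mirroring the observation that no single system wins for every workload.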
Shivnath Babu, Associate Professor, Duke Computer Science
Harold Lim, Ph.D., Duke Computer Science
Howard Chung, Undergraduate, Duke Computer Science
On Cyclops' vision: Harold Lim, Yuzhang Han, Shivnath Babu. “How to Fit when No One Size Fits.” In Proc. of the Sixth Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA, January 2013.