Stream computing in LarKC

EU Research FP7
Start date: 2008-04-01
Length: 24 months
Project abstract
LarKC is an EU FP7 project that aims to develop a platform for reasoning on massive heterogeneous information such as social media data. The role of Politecnico and Cefriel, which form a JRU (Joint Research Unit) within LarKC, is to address Stream Computing, i.e., to support querying and reasoning tasks on top of RDF streams.
Stream-based data sources such as sensors, feeds, click streams, and stock quotations have become increasingly important in many application domains. Streaming data are received continuously and in real time, either implicitly ordered by arrival time or explicitly associated with timestamps. Because it is typically impossible to store a stream in its entirety, Data Stream Management Systems (DSMS) allow one to register continuously running queries that return new results as new data flow within the streams.
At the same time, reasoning upon very large RDF data collections is advancing fast, and SPARQL has become the standard query language for RDF data. SPARQL engines are now also capable of querying integrated repositories and collecting data from multiple sources. Still, the large knowledge bases now accessible via SPARQL are static, and knowledge evolution is not adequately supported. There is a strong need for the convergence of SPARQL engines and stream data management.
Project results
The combination of static RDF data with streaming information leads to stream reasoning, an important step towards logical reasoning in real time on huge and noisy data streams, in order to support the decision process of extremely large numbers of concurrent users. So far, this step has received little attention from the Semantic Web community.
C-SPARQL, which we introduced in 2008, is an extension of SPARQL designed to express continuous queries, i.e., queries registered over both RDF repositories and RDF streams. C-SPARQL queries can be considered as inputs to specialized reasoners that use their knowledge about a domain to make real-time decisions. In such applications, reasoners operate upon knowledge snapshots, which are continuously refreshed by registered queries. It is important to note that, in this view, reasoners can be unaware of time changes and of the existence of streams. In 2010, we also explored the use of a reasoner that is aware of the time-dependent nature of data, proposing an algorithm for the incremental maintenance of snapshots.
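The snapshot idea described above can be illustrated with a minimal Python sketch. It is not the C-SPARQL engine or its API: the class SlidingWindowQuery, the function notifications, and the "posts"/"follows" vocabulary are hypothetical names chosen for this example. The sketch assumes that stream triples arrive in non-decreasing timestamp order and that a time-based window of fixed range is used. The registered query merges the static triples with the triples still inside the window, and a reasoner that knows nothing about time or streams is applied to each refreshed snapshot.

```python
from collections import deque
from typing import Deque, Iterable, Set, Tuple

Triple = Tuple[str, str, str]        # (subject, predicate, object)
TimedTriple = Tuple[int, Triple]     # (timestamp in seconds, triple)


class SlidingWindowQuery:
    """Hypothetical continuous query over one RDF stream plus static data.

    Assumption: triples are pushed in non-decreasing timestamp order.
    """

    def __init__(self, range_s: int, static_triples: Iterable[Triple]) -> None:
        self.range_s = range_s
        self.static: Set[Triple] = set(static_triples)
        self.window: Deque[TimedTriple] = deque()

    def push(self, timestamp: int, triple: Triple) -> None:
        """Receive one timestamped triple from the stream."""
        self.window.append((timestamp, triple))

    def snapshot(self, now: int) -> Set[Triple]:
        """Return static triples plus stream triples newer than now - range_s."""
        cutoff = now - self.range_s
        # Expired triples sit at the left end because input is time-ordered.
        while self.window and self.window[0][0] <= cutoff:
            self.window.popleft()
        return self.static | {triple for _, triple in self.window}


def notifications(snapshot: Set[Triple]) -> Set[Tuple[str, str]]:
    """Time-unaware 'reasoner': join recent posts with the follower graph."""
    posts = [(s, o) for s, p, o in snapshot if p == "posts"]
    follows = [(s, o) for s, p, o in snapshot if p == "follows"]
    return {
        (follower, topic)
        for follower, followee in follows
        for author, topic in posts
        if followee == author
    }


# Static background knowledge and a 60-second window over the stream.
query = SlidingWindowQuery(
    range_s=60,
    static_triples=[("bob", "follows", "alice"), ("carol", "follows", "alice")],
)
query.push(0, ("alice", "posts", "rdf"))
query.push(50, ("alice", "posts", "sparql"))

early = notifications(query.snapshot(now=55))   # both posts are in the window
late = notifications(query.snapshot(now=70))    # the post at t=0 has expired
```

Here the reasoner only ever sees a plain set of triples, so it stays unaware of timestamps. This sketch rebuilds every snapshot from scratch; an incremental approach would instead update the previous result as triples enter and leave the window.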
Publications:
- Emanuele Della Valle, Stefano Ceri, Frank van Harmelen, Dieter Fensel: It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009)
- Emanuele Della Valle, Stefano Ceri, Davide Francesco Barbieri, Daniele Braga, Alessandro Campi: A First Step Towards Stream Reasoning. FIS 2008: 72-81
- Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: C-SPARQL: a Continuous Query Language for RDF Data Streams. Int. J. Semantic Computing 4(1): 3-25 (2010)
- Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: Querying RDF streams with C-SPARQL. SIGMOD Record 39(1): 20-26 (2010)
- Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Michael Grossniklaus: An Execution Environment for C-SPARQL Queries. EDBT 2010
- Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: Continuous Queries and Real-time Analysis of Social Semantic Data with C-SPARQL. SDoW 2009, co-located with ISWC 2009
- Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: C-SPARQL: SPARQL for continuous querying. WWW 2009: 1061-1062
- Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Yi Huang, Volker Tresp, Achim Rettinger, Hendrik Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems 25(6): 32-41 (2010)
- Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: Incremental Reasoning on Streams and Rich Background Knowledge. 7th Extended Semantic Web Conference (ESWC 2010)
- Heiner Stuckenschmidt, Stefano Ceri, Emanuele Della Valle, Frank van Harmelen: Towards Expressive Stream Reasoning. Proceedings of the Dagstuhl Seminar on Semantic Aspects of Sensor Networks, 2010