Prof. P. Sreenivasa Kumar, Prof. Srikanta Bedathur, and Dr. Jyoti Leeka after the defense talk

Congratulations, Dr. Jyoti Leeka!

We have a freshly minted Ph.D. in our midst!

Dr. Jyoti Leeka earned her doctorate after defending her Ph.D. thesis, titled "Indexing and Query Processing in RDF Quad-Stores", in front of an examination committee consisting of Prof. Sudarshan S. (IIT-B), Prof. P. Sreenivasa Kumar (IIT-M), and Prof. Katja Hose (Aalborg University, Denmark). Hearty congratulations to Dr. Leeka!

Abstract: RDF data management has received a lot of attention in the past decade due to the widespread growth of Semantic Web and Linked Open Data initiatives. RDF data is expressed in the form of triples (Subject, Predicate, Object), with SPARQL used for querying it. Many novel database systems, such as RDF-3X and TripleBit, which store RDF either in its native form or within traditional relational storage, have demonstrated their ability to scale to large volumes of RDF content. However, it is becoming increasingly obvious from the knowledge representation applications of RDF that it is equally important to integrate additional information with RDF triples, such as source, time and place of occurrence, and uncertainty. Consider the RDF fact (BarackObama, isPresidentOf, UnitedStates). While this fact is useful for finding information about the president of the United States, it does not provide sufficient information for answering many challenging questions, such as "What is the temporal validity of this fact?" or "Where did this fact come from?". Annotations such as confidence, geolocation, and time can be modeled in RDF through a technique called reification, which is also a W3C recommendation. Reification retains the triple nature of RDF and associates annotations using blank nodes.
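To make the idea of reification concrete, here is a minimal sketch in plain Python, not taken from the thesis. It rewrites the example fact as four triples about a fresh blank node, using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object), and then attaches annotations to that node. The function name `reify`, the `ex:` annotation properties, and the annotation values are all hypothetical and chosen only for illustration.

```python
from itertools import count

# Generator of fresh blank-node labels (the labels themselves are arbitrary).
_blank_ids = count(1)


def reify(triple, annotations):
    """Describe `triple` through a blank node and attach annotations to it.

    Everything stays in triple form: four triples restate the original
    fact, and one extra triple is added per annotation.
    """
    subject, predicate, obj = triple
    node = f"_:stmt{next(_blank_ids)}"
    triples = [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]
    triples += [(node, prop, value) for prop, value in annotations.items()]
    return triples


fact = ("BarackObama", "isPresidentOf", "UnitedStates")
# Hypothetical annotations: the property names and values are made up.
reified = reify(fact, {"ex:confidence": 0.9, "ex:source": "ex:SomeDocument"})

for t in reified:
    print(t)
```

One fact with two annotations becomes six triples, so annotation-heavy graphs grow quickly and every annotated lookup has to join through the blank node.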

The focus of this thesis is on the database aspects of storing and querying RDF graphs that carry annotations, such as confidence, on RDF triples. We start by developing an RDF database, named RQ-RDF-3X, for efficiently querying RDF graphs containing annotations over native RDF triples. Next, we noticed that more than 62% of the facts in real-world RDF datasets such as YAGO and DBpedia have numerical object values, which suggests the use of queries that add an ORDER BY clause to the traditional graph pattern queries of SPARQL. State-of-the-art RDF processing systems such as Virtuoso and Jena handle such queries by first collecting the results and then sorting them in memory based on the user-specified function, which makes them not very scalable. In order to efficiently retrieve the results of top-𝑘 queries, i.e., queries returning the top 𝑘 results ordered by a user-defined scoring function, we developed a top-𝑘 query processing database named Quark-X. In Quark-X we propose indexing and query processing techniques for making top-𝑘 querying efficient. Motivated by the importance of geo-spatial data in critical applications such as emergency response, transportation, and agriculture, in addition to its widespread use in knowledge bases such as YAGO, WikiData, and LinkedGeoData, we developed STREAK, an RDF data management system designed to support a wide range of queries with spatial filters, including complex joins, together with top-𝑘 queries over spatially enriched databases. While developing STREAK we realized that, to make effective use of this rich data, it is crucial to efficiently evaluate queries combining topological and spatial operators (e.g., overlap, distance) with the traditional graph pattern queries of SPARQL. While there have been research efforts on efficient processing of spatial data in RDF/SPARQL, very little effort has gone into building systems that can handle both complex SPARQL queries and spatial filters.
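The scalability concern with "collect everything, then sort" can be illustrated with a small Python sketch. This is not Quark-X's indexing or query processing technique, which the abstract does not detail; it only contrasts materializing and sorting all results with keeping a bounded set of the best 𝑘 seen so far. The function names, the sample rows, and the scoring function are hypothetical.

```python
import heapq


def top_k_by_full_sort(rows, score, k):
    """Baseline: materialize and sort every result, then keep the first k."""
    return sorted(rows, key=score, reverse=True)[:k]


def top_k_by_heap(rows, score, k):
    """Keep only the k best results seen so far while scanning the rows."""
    return heapq.nlargest(k, rows, key=score)


# Hypothetical query results: (subject, numerical object value).
rows = [
    ("ex:a", 12.0),
    ("ex:b", 47.5),
    ("ex:c", 3.2),
    ("ex:d", 30.1),
    ("ex:e", 47.0),
]


def score(row):
    """User-defined scoring function; here simply the numerical value."""
    return row[1]


print(top_k_by_full_sort(rows, score, 2))
print(top_k_by_heap(rows, score, 2))
```

Both functions return the same answer, but the second never holds more than 𝑘 candidates for ranking at a time. A database system can go further and avoid producing most low-scoring results at all, which is where dedicated indexing and query processing techniques come in.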

Srikanta Bedathur
DS Chair of Artificial Intelligence