A Distributed Graph Approach For Retrieving Linked RDF Data Using Supercomputing Systems
Thesis posted on 01.02.2019, 00:00, authored by Michael J Lewis
Many RDF data systems can query different types of connected data structures across a scalable range of input sizes. Partitioning techniques, graph algorithms, and memory-based indexing schemes have been heavily researched and integrated into these systems to produce faster query results as data sizes grow and query types vary. This work focuses on two types of powerful clustered systems (top tier in aggregate processing capacity and bandwidth capacity) to show conditional, definable time improvements covering dataset preprocessing and query retrieval. Two algorithmic approaches are used to evaluate query retrieval. The first uses a distributed linked data path indexing system to help retrieve queries; the second is graph exploration, which finds the linked data at query time according to the connected query patterns. Graph exploration is a common and effective approach used by a number of large-scale proprietary RDF systems. To implement and evaluate both approaches, this work develops a system called Mantona. By generating a preprocessed cache-file, Mantona also makes it possible to evaluate performance based on the contents of the cache-file and the type of query retrieval algorithm used. This dissertation includes a review of effective RDF query systems and shows the implementation and ramifications of creating a cache-file dataset, from which the Mantona experiments are conducted over varied processor sizes and query types.
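As a rough illustration of the two retrieval approaches named above, the following single-machine Python sketch contrasts graph exploration (walking a chain of predicates at query time) with a precomputed path index (looking the same chain up directly). It is not Mantona's implementation and is not distributed; the toy triples and the function names `build_adjacency`, `explore`, `build_path_index`, and `lookup` are assumptions made for this example only.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); all names are illustrative.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("alice", "knows", "carol"),
    ("carol", "worksAt", "initech"),
    ("dave", "worksAt", "acme"),
]


def build_adjacency(triples):
    """Index outgoing edges as subject -> predicate -> [objects]."""
    adj = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        adj[s][p].append(o)
    return adj


def explore(adj, predicates):
    """Graph exploration: follow the predicate chain at query time."""
    paths = [(s,) for s in adj]  # start from every known subject
    for p in predicates:
        paths = [
            path + (o,)
            for path in paths
            for o in adj.get(path[-1], {}).get(p, [])
        ]
    return sorted(paths)


def build_path_index(adj, max_len=2):
    """Path indexing: precompute node chains for every predicate path
    of up to max_len hops (a hypothetical, simplified index layout)."""
    index = defaultdict(list)
    frontier = [((), (s,)) for s in adj]
    for _ in range(max_len):
        next_frontier = []
        for preds, nodes in frontier:
            for p, objs in adj.get(nodes[-1], {}).items():
                for o in objs:
                    entry = (preds + (p,), nodes + (o,))
                    index[entry[0]].append(entry[1])
                    next_frontier.append(entry)
        frontier = next_frontier
    return index


def lookup(index, predicates):
    """Answer a chain query with a single index lookup."""
    return sorted(index.get(tuple(predicates), []))


if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    index = build_path_index(adj)
    query = ["knows", "worksAt"]  # ?x knows ?y . ?y worksAt ?z
    print(explore(adj, query))
    print(lookup(index, query))
```

Both functions return the same chains for a given query. The sketch only shows the trade-off in miniature: exploration does its work per query, while the index pays a preprocessing and storage cost up front so that retrieval becomes a lookup.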