The Web of Data is a huge information space with URIs and links among them (in RDF triples) as first-class citizens. RDF triples having URIs in the subject and object positions define semantic relationships between entities. As an example, the triple (dbpedia:Rome, dbpedia-onto:BirthPlaceOf, dbpedia:Enrico_Fermi) states a semantic relationship between resources belonging to the DBPedia domain. On the other hand, the triple (dbpedia:Rome, owl:sameAs, freebase:Rome), defined in DBPedia, states that the resource identified by dbpedia:Rome is the same as the resource defined in a different database (i.e., Freebase) and identified by freebase:Rome. This makes information about Rome accessible from both DBPedia and Freebase.

From the previous example, it can be noted that URIs in RDF triples enable linking pieces of information that reside not only on the same information provider (i.e., share the same URI prefix) but also on external information providers.

It would be interesting to have a language and tool capable of (recursively) “navigating” chains of URIs in order to access information residing on possibly scattered information providers. In this respect, a central notion is that of a dereferenceable URI. Dereferencing a URI amounts to obtaining the description (e.g., a set of RDF triples) of the resource it identifies. This is achieved by performing a simple HTTP GET.
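The dereferencing step described above can be sketched in a few lines of Python. This is a conceptual illustration only, not NautiLOD or swget code: `dereference` performs an HTTP GET with content negotiation asking for N-Triples, and `parse_ntriples` is a deliberately minimal reader that handles only URI-to-URI triples. The Freebase URI in the sample is a placeholder.

```python
import urllib.request

def dereference(uri):
    """Fetch the RDF description of a resource via a plain HTTP GET,
    requesting N-Triples through content negotiation."""
    req = urllib.request.Request(uri, headers={"Accept": "application/n-triples"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

def parse_ntriples(text):
    """Tiny N-Triples reader: returns (subject, predicate, object)
    tuples for lines whose three terms are all URIs."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.rstrip(" .").split(None, 2)
        if len(parts) == 3:
            triples.append(tuple(p.strip("<>") for p in parts))
    return triples

# The object URI below is illustrative, standing in for freebase:Rome.
sample = ('<http://dbpedia.org/resource/Rome> '
          '<http://www.w3.org/2002/07/owl#sameAs> '
          '<http://example.org/freebase/Rome> .')
print(parse_ntriples(sample))
```

Following a chain of URIs then means repeating this get-and-parse step on the URIs found in object position of the retrieved triples.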

We defined such a language, called NautiLOD, and provided an implementation of it in swget. The NautiLOD language offers the following features:

  • it provides a way to recursively navigate the Web of Data and retrieve data sources
  • it exploits regular expressions to declaratively specify fragments of the Web of Data
  • regular expressions can be intertwined with triggers in the form of SPARQL ASK queries to control and orient the navigation
  • it enables actions to be issued during the navigation, such as sending a specific piece of information by email when it is encountered.

NautiLOD has been implemented in swget, which offers the following additional features:

  • it enables the user to limit the amount of data retrieved, in terms of maximum time and/or size (in MB)
  • for developers, it works as a non-interactive command-line tool that can easily be called from scripts and terminals
  • it is available both as a command-line tool and with a GUI.
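The time/size limits can be pictured as a simple budget check around each fetch: stop as soon as either the elapsed time or the cumulative number of bytes would exceed its cap. This is only a sketch of the general idea, not swget's actual implementation; `fetch` is a stand-in for a real HTTP GET.

```python
import time

def crawl(uris, fetch, max_seconds=60, max_bytes=10 * 1024 * 1024):
    """Fetch URIs in order, stopping when either budget is exhausted."""
    start, total, results = time.monotonic(), 0, {}
    for uri in uris:
        if time.monotonic() - start > max_seconds:
            break                       # time budget exhausted
        data = fetch(uri)
        if total + len(data) > max_bytes:
            break                       # size budget exhausted
        total += len(data)
        results[uri] = data
    return results

# usage with a fake fetcher returning 1 KB per resource:
fake = lambda uri: b"x" * 1024
print(len(crawl(["u1", "u2", "u3"], fake, max_bytes=2048)))  # → 2
```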
