Introduction

The Web of Data is a huge information space in which URIs, RDF triples and the links among them are first-class citizens.

RDF triples that form links (i.e., whose subject and object are both URIs) make it possible to create references between URIs belonging to the same domain or to different domains. For example, the triple (dbpedia:Rome, dbpedia-onto:BirthPlaceOf, dbpedia:Enrico_Fermi) creates a link between two resources in DBPedia. On the other hand, the triple (dbpedia:Rome, owl:sameAs, freebase:Rome), defined in DBPedia, states that the resource identified by dbpedia:Rome is the same as the resource defined in a different database (i.e., Freebase) and identified by freebase:Rome. This makes it possible to access information about Rome in both DBPedia and Freebase.
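
The two example triples can be written down as plain data to make the distinction between internal and external links concrete. The sketch below is only an illustration: the Triple type and the helper names are hypothetical, and the part of each identifier before the colon is assumed to stand for the information provider.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A link-forming RDF triple, written with prefixed names (illustrative type)."""
    subject: str
    predicate: str
    obj: str


def prefix(name: str) -> str:
    # Assumption: the text before the first ':' identifies the provider.
    return name.split(":", 1)[0]


def is_external_link(t: Triple) -> bool:
    # A link is external when subject and object have different prefixes.
    return prefix(t.subject) != prefix(t.obj)


# The two triples discussed in the text.
triples = [
    Triple("dbpedia:Rome", "dbpedia-onto:BirthPlaceOf", "dbpedia:Enrico_Fermi"),
    Triple("dbpedia:Rome", "owl:sameAs", "freebase:Rome"),
]

for t in triples:
    kind = "external" if is_external_link(t) else "internal"
    print(t.subject, "->", t.obj, "is an", kind, "link")
```

The first triple stays within DBPedia, while the owl:sameAs triple crosses over to Freebase.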

From the previous example, it can be noted that URIs in RDF triples make it possible to link pieces of information residing not only on the same information provider (i.e., having the same URI prefix) but also on external information providers. It would therefore be useful to have a language and a tool capable of (recursively) “following” chains of URIs in order to reach a large number of pieces of information and information providers. swget aims to provide a set of low-level functionalities that address these needs.
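
The idea of recursively following chains of URIs can be sketched as a bounded breadth-first traversal. This is not how swget is implemented; it is a minimal illustration over a toy in-memory link table, where a real tool would dereference each URI over HTTP. The follow function, the TOY_LINKS table and the freebase:Italy entry are all made up for the example.

```python
from collections import deque

# Toy link table (illustrative only): URI -> URIs it links to.
# The first entry mirrors the example triples; freebase:Italy is invented.
TOY_LINKS = {
    "dbpedia:Rome": ["dbpedia:Enrico_Fermi", "freebase:Rome"],
    "freebase:Rome": ["freebase:Italy"],
}


def follow(links, seed, max_depth):
    """Return the URIs reachable from seed in at most max_depth hops, in visit order."""
    visited = {seed}
    order = [seed]
    queue = deque([(seed, 0)])
    while queue:
        uri, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in links.get(uri, []):
            if nxt not in visited:  # avoid revisiting URIs (and looping on cycles)
                visited.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


print(follow(TOY_LINKS, "dbpedia:Rome", 1))
print(follow(TOY_LINKS, "dbpedia:Rome", 2))
```

With a depth of 1 the traversal stops at the resources directly linked from dbpedia:Rome; with a depth of 2 it also crosses into the second provider's links.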

Specifically, swget implements the NautiLOD language, which offers the following features:

  • it provides a way to recursively navigate and retrieve data sources from the Web of Data
  • it uses regular expressions to declaratively specify which portion of the Web of Data has to be addressed
  • regular expressions can be intertwined with triggers in the form of ASK-SPARQL queries to control and orient the navigation
  • it makes it possible to command actions during the navigation, for example: send me by email a specific piece of information if it is encountered during the navigation, and then continue.
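
To give a feel for what selecting a portion of the Web of Data with a regular expression might mean, the sketch below matches a sequence of predicates against an ordinary Python regular expression. It does not use NautiLOD syntax and does not model ASK-SPARQL triggers or actions; path_matches and PATTERN are hypothetical names chosen for the illustration.

```python
import re


def path_matches(pattern: str, predicates: list) -> bool:
    """Check whether a whole path of predicates matches the given regular expression."""
    # The path is encoded as the predicate names joined by single spaces.
    return re.fullmatch(pattern, " ".join(predicates)) is not None


# Zero or more owl:sameAs steps, followed by exactly one BirthPlaceOf step.
PATTERN = r"(owl:sameAs )*dbpedia-onto:BirthPlaceOf"

print(path_matches(PATTERN, ["dbpedia-onto:BirthPlaceOf"]))
print(path_matches(PATTERN, ["owl:sameAs", "dbpedia-onto:BirthPlaceOf"]))
print(path_matches(PATTERN, ["owl:sameAs"]))
```

A navigation would only continue along paths that can still match the expression, which is what keeps the retrieved portion of the Web of Data declaratively bounded.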

swget extends the NautiLOD language with the following additional features:

  • it enables the user to specify limits on the amount of data retrieved, in terms of maximum time, maximum size (in MB), or both
  • for developers, it can be used as a non-interactive command-line tool that is easy to call from scripts and terminals
  • it is also available as an off-the-shelf tool thanks to an intuitive GUI.
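
The time and size limits can be pictured as a budget that is checked before each retrieval. The sketch below is an assumption about how such a check could look, not swget's actual mechanism or options; the Budget class and its parameter names are hypothetical.

```python
import time


class Budget:
    """Tracks elapsed time and downloaded bytes against optional limits (illustrative)."""

    def __init__(self, max_seconds=None, max_megabytes=None, clock=time.monotonic):
        self.max_seconds = max_seconds
        # Convert the size limit from MB to bytes; None means "no limit".
        self.max_bytes = None if max_megabytes is None else int(max_megabytes * 1024 * 1024)
        self.clock = clock
        self.start = clock()
        self.bytes_used = 0

    def record(self, n_bytes: int) -> None:
        # Call after each retrieval with the size of the data received.
        self.bytes_used += n_bytes

    def exhausted(self) -> bool:
        # The budget is exhausted as soon as either limit is reached.
        if self.max_seconds is not None and self.clock() - self.start >= self.max_seconds:
            return True
        if self.max_bytes is not None and self.bytes_used >= self.max_bytes:
            return True
        return False


budget = Budget(max_seconds=60, max_megabytes=1)
budget.record(512 * 1024)      # half a megabyte retrieved so far
print(budget.exhausted())
budget.record(512 * 1024)      # size limit reached
print(budget.exhausted())
```

A navigation loop would stop following links once exhausted() returns True.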