Introduction
The Web of Data is a huge information space in which URIs, RDF triples, and the links among them are first-class citizens.
RDF triples that form links (i.e., whose subject and object are both URIs) make it possible to create references between URIs belonging to the same domain or to different domains. As an example, the triple (dbpedia:Rome, dbpedia-onto:BirthPlaceOf, dbpedia:Enrico_Fermi) creates a link between two resources in DBpedia. On the other hand, the triple (dbpedia:Rome, owl:sameAs, freebase:Rome), defined in DBpedia, states that the resource identified by dbpedia:Rome is the same as the resource identified by freebase:Rome in a different database (i.e., Freebase). This makes it possible to access information about Rome from both DBpedia and Freebase.
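As a minimal sketch of the distinction drawn above, the following Python models triples as tuples, treats plain strings as URIs, and wraps literals in a hypothetical `RDFLiteral` type. Approximating the "information provider" by the URI's host name and the extra `rdfs:label` triple (added only as a non-link counterexample) are assumptions for illustration; the Freebase namespace in particular is given for illustration only.

```python
from typing import NamedTuple, Union
from urllib.parse import urlparse

# Prefixes used in the example triples (Freebase namespace: illustrative only).
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-onto": "http://dbpedia.org/ontology/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "freebase": "http://rdf.freebase.com/ns/",
}


class RDFLiteral(NamedTuple):
    """Hypothetical wrapper for a literal; any plain str is treated as a URI."""
    value: str


Term = Union[str, RDFLiteral]


def expand(qname: str) -> str:
    """Turn a prefixed name such as 'dbpedia:Rome' into a full URI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


def is_link(triple: tuple[str, str, Term]) -> bool:
    """A triple is a link when both its subject and its object are URIs."""
    subject, _, obj = triple
    return isinstance(subject, str) and isinstance(obj, str)


def provider(uri: str) -> str:
    """Approximate the information provider by the URI's host name."""
    return urlparse(uri).netloc


def classify(triple: tuple[str, str, Term]) -> str:
    """Return 'literal', 'internal link', or 'external link'."""
    if not is_link(triple):
        return "literal"
    subject, _, obj = triple
    return "internal link" if provider(subject) == provider(obj) else "external link"


triples = [
    (expand("dbpedia:Rome"), expand("dbpedia-onto:BirthPlaceOf"), expand("dbpedia:Enrico_Fermi")),
    (expand("dbpedia:Rome"), expand("owl:sameAs"), expand("freebase:Rome")),
    (expand("dbpedia:Rome"), expand("rdfs:label"), RDFLiteral("Rome")),
]

for t in triples:
    print(classify(t))
```

Running it prints `internal link`, `external link`, and `literal`, one per line.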
The Rome example shows that URIs in RDF triples can link pieces of information residing not only at the same information provider (i.e., sharing the same URI prefix) but also at external information providers. It would therefore be useful to have a language and a tool capable of (recursively) “following” chains of URIs in order to access a large number of pieces of information and information providers.
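"Following" chains of URIs can be pictured as a graph traversal in which dereferencing a URI yields new URIs to visit. The sketch below simulates dereferencing with a hypothetical in-memory dictionary (`FAKE_WEB`) instead of HTTP requests, and uses an iterative breadth-first search with a depth bound and a visited set; these are illustrative choices, not a description of how swget works.

```python
from collections import deque

# Hypothetical stand-in for the Web of Data: dereferencing a URI returns the
# triples published about it. All objects below are URIs.
FAKE_WEB = {
    "dbpedia:Rome": [
        ("dbpedia:Rome", "dbpedia-onto:BirthPlaceOf", "dbpedia:Enrico_Fermi"),
        ("dbpedia:Rome", "owl:sameAs", "freebase:Rome"),
    ],
    "dbpedia:Enrico_Fermi": [],
    "freebase:Rome": [
        ("freebase:Rome", "owl:sameAs", "dbpedia:Rome"),  # cycle back to DBpedia
    ],
}


def dereference(uri):
    """Return the triples describing `uri` (empty if nothing is published)."""
    return FAKE_WEB.get(uri, [])


def follow(seed, max_depth=3):
    """Breadth-first traversal of URI chains starting from `seed`.

    Returns URIs in visiting order; the visited set prevents loops such as
    the mutual owl:sameAs links above, and `max_depth` bounds the chain length.
    """
    visited = {seed}
    order = [seed]
    queue = deque([(seed, 0)])
    while queue:
        uri, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth bound
        for _, _, obj in dereference(uri):
            if obj not in visited:
                visited.add(obj)
                order.append(obj)
                queue.append((obj, depth + 1))
    return order


print(follow("dbpedia:Rome"))
```

The call prints `['dbpedia:Rome', 'dbpedia:Enrico_Fermi', 'freebase:Rome']`; the visited set stops the mutual `owl:sameAs` links from looping forever.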
We defined such a language, called NautiLOD, and implemented it in swget.
The NautiLOD language offers the following features:
- it provides a way to recursively navigate and retrieve data sources from the Web of Data
- it exploits regular expressions to declaratively specify fragments of the Web of Data
- regular expressions can be intertwined with triggers in the form of SPARQL ASK queries to control and direct the navigation
- it makes it possible to trigger actions during the navigation, such as having a specific piece of information sent to the user by email when it is encountered
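NautiLOD's own syntax is not shown in this introduction, so the following is only a sketch of the general idea behind the features above: a regular path expression evaluated over the product of the graph and an automaton, an ASK-style test on reached nodes, and an action callback. It is not NautiLOD's semantics. The toy graph, the hand-built automaton for `owl:sameAs* / dbpedia-onto:BirthPlaceOf`, the `rdf:type dbpedia-onto:Person` triple, and all function names are hypothetical, and the test is a Python predicate rather than a real SPARQL query.

```python
from collections import deque

# Hypothetical toy graph: subject -> list of (predicate, object).
GRAPH = {
    "freebase:Rome": [("owl:sameAs", "dbpedia:Rome")],
    "dbpedia:Rome": [
        ("owl:sameAs", "freebase:Rome"),
        ("dbpedia-onto:BirthPlaceOf", "dbpedia:Enrico_Fermi"),
    ],
    "dbpedia:Enrico_Fermi": [("rdf:type", "dbpedia-onto:Person")],
}

# Hand-built automaton for  owl:sameAs* / dbpedia-onto:BirthPlaceOf
# state -> list of (predicate, next state); state 1 is accepting.
TRANSITIONS = {
    0: [("owl:sameAs", 0), ("dbpedia-onto:BirthPlaceOf", 1)],
    1: [],
}
ACCEPTING = {1}


def ask_is_person(node):
    """Stand-in for ASK { <node> rdf:type dbpedia-onto:Person }."""
    return ("rdf:type", "dbpedia-onto:Person") in GRAPH.get(node, [])


def navigate(seed, test, action):
    """Breadth-first search over (node, automaton state) pairs.

    Calls `action` on every node reached in an accepting state that passes
    `test`, and returns those nodes in the order they were found.
    """
    start = (seed, 0)
    seen = {start}
    queue = deque([start])
    results = []
    while queue:
        node, state = queue.popleft()
        if state in ACCEPTING and test(node):
            results.append(node)
            action(node)
        for pred, obj in GRAPH.get(node, []):
            for label, nxt in TRANSITIONS[state]:
                if label == pred and (obj, nxt) not in seen:
                    seen.add((obj, nxt))
                    queue.append((obj, nxt))
    return results


# The action only prints here; a real tool might send an email instead.
print(navigate("freebase:Rome", ask_is_person, lambda n: print("found", n)))
```

Starting from `freebase:Rome`, the search crosses the `owl:sameAs` link to DBpedia, follows `dbpedia-onto:BirthPlaceOf`, and accepts `dbpedia:Enrico_Fermi`, so the script prints `found dbpedia:Enrico_Fermi` followed by `['dbpedia:Enrico_Fermi']`. Tracking (node, state) pairs rather than nodes alone keeps the evaluation correct when the same resource is reached in different automaton states.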
NautiLOD has been implemented in swget, which offers the following additional features:
- it lets the user limit the amount of data retrieved in terms of maximum time, maximum size (in MB), or both
- for developers, it can be used as a non-interactive command-line tool that is easily called from scripts and terminals
- it is available both as a command-line tool and with a GUI
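The time and size limits can be pictured as a budget check inside the navigation loop. The sketch below is not swget's code and does not reflect its command-line options; the `fetch` callback, the injectable `clock`, and reading "MB" as 1024 * 1024 bytes are assumptions made for illustration.

```python
import time
from collections import deque


def crawl(seed, fetch, max_seconds=None, max_mb=None, clock=time.monotonic):
    """Breadth-first crawl that stops when a time or size budget is exhausted.

    `fetch(uri)` is a hypothetical callback returning (payload_bytes, linked_uris).
    Returns (visited URIs, total bytes retrieved, reason for stopping).
    """
    max_bytes = None if max_mb is None else max_mb * 1024 * 1024  # assumed MiB
    start = clock()
    visited, total = [], 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        if max_seconds is not None and clock() - start >= max_seconds:
            return visited, total, "time limit"
        uri = queue.popleft()
        payload, links = fetch(uri)
        total += len(payload)
        visited.append(uri)
        # Checked after retrieval, so the last document may overshoot the budget.
        if max_bytes is not None and total >= max_bytes:
            return visited, total, "size limit"
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, total, "done"


def fake_fetch(uri):
    """Hypothetical in-memory Web: ten 400 KiB documents chained by links."""
    n = int(uri.rsplit("/", 1)[1])
    links = [f"http://example.org/doc/{n + 1}"] if n < 9 else []
    return b"x" * (400 * 1024), links


visited, total, reason = crawl("http://example.org/doc/0", fake_fetch, max_mb=1)
print(len(visited), total, reason)
```

Here the crawl stops after the third 400 KiB document and prints `3 1228800 size limit`. Passing a fake `clock` makes the time limit testable without actually waiting.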