A Brief Introduction to RDF

SpotterBase annotations are based on the Resource Description Framework (RDF). To an extent, you can use SpotterBase without understanding RDF, but eventually it will be helpful to have at least a basic understanding of it.

This is not the place for a full introduction to RDF, but we will try to give a brief overview of the relevant concepts. This should enable you to read up on different aspects as needed. For a more thorough treatment, take a look at the W3C RDF 1.1 Primer.

Representing data as triples

RDF can be used to represent data as a set of triples of the form subject predicate object. For example, we could represent the fact that the document doc01 has the topic topic01 as the triple doc01 hasTopic topic01.

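In RDF, the subject, predicate and object are usually URIs (strictly speaking, RDF 1.1 uses IRIs, the internationalized generalization of URIs). The predicate must always be a URI. The subject may also be an anonymous node (called a “blank node”), and the object may be a URI, a blank node, or a literal. Using URIs, we could represent the above fact as the triple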

<http://example.org/doc01> <http://example.org/hasTopic> <http://example.org/topic01> .

We can use more triples to represent further facts:

<http://example.org/doc01> <http://example.org/hasAuthor> <http://example.org/authorX> .
# for the name, we use a string literal
<http://example.org/authorX> <http://example.org/hasName> "John Doe" .
# similarly, we can use a number literal for the birth year
<http://example.org/authorX> <http://example.org/birthYear> 1987 .

In general, we can use any URIs we want. However, there are some well-known vocabularies that define URIs for various use cases. For example, the FOAF vocabulary defines URIs for people, their names, etc. So instead of using our made-up <http://example.org/hasName> predicate, we could use the FOAF predicate <http://xmlns.com/foaf/0.1/name>.
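To make the data model concrete, here is a minimal Python sketch that stores the example facts as plain tuples in a set. It uses only the standard library. The names `EX`, `FOAF` and `triples` are chosen for illustration and are not part of SpotterBase or of any RDF library.

```python
# Illustrative only: each triple is a (subject, predicate, object) tuple.
# Real RDF libraries use dedicated types to tell URIs and literals apart;
# here both are plain Python values.
EX = "http://example.org/"            # made-up namespace from the examples
FOAF = "http://xmlns.com/foaf/0.1/"   # FOAF namespace

triples = {
    (EX + "doc01", EX + "hasTopic", EX + "topic01"),
    (EX + "doc01", EX + "hasAuthor", EX + "authorX"),
    (EX + "authorX", FOAF + "name", "John Doe"),   # string literal
    (EX + "authorX", EX + "birthYear", 1987),      # number literal
}

# A set has no order and no duplicates, so adding a known fact changes nothing.
triples.add((EX + "doc01", EX + "hasTopic", EX + "topic01"))

# Everything we know about doc01, as (predicate, object) pairs.
about_doc01 = sorted((p, o) for s, p, o in triples if s == EX + "doc01")
```

Using a set mirrors the fact that RDF data is a set of triples: stating the same fact twice does not add information.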

Prefixes

In the above examples, we wrote out the full URIs. This can get tedious, so many RDF formats allow us to declare prefixes that abbreviate URIs. For example, we could define the prefix foaf: for the FOAF vocabulary (foaf: abbreviates <http://xmlns.com/foaf/0.1/>) and then write foaf:name instead of the full URI. This is particularly helpful because RDF vocabularies are usually based on a namespace URI, with the actual URIs formed by appending a name to it. For example, we could now use foaf:workInfoHomepage to specify the work website of a person.
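Prefix handling is essentially string substitution. The following Python sketch shows the idea under simple assumptions: the `PREFIXES` table and the `expand` and `shorten` helpers are hypothetical names for this illustration, and real RDF libraries ship their own namespace mechanisms.

```python
# Hypothetical prefix table mapping each prefix to its namespace URI.
PREFIXES = {
    "ex": "http://example.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'foaf:name' into a full URI."""
    prefix, _, local_name = prefixed_name.partition(":")
    return PREFIXES[prefix] + local_name


def shorten(uri: str) -> str:
    """Abbreviate a full URI if a known namespace matches."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"  # no matching prefix: keep the full URI in brackets


full_uri = expand("foaf:workInfoHomepage")
short_name = shorten("http://example.org/doc01")
```

Here `full_uri` is the namespace URI with the name appended, which is exactly how vocabulary URIs are formed.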

Graphs

We can also think of a set of triples as a graph, where the subjects and objects are nodes and the predicates are edges. The above example would look like this:

Turtle:
@prefix ex: <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:doc01 ex:hasTopic ex:topic01 .
ex:doc01 ex:hasAuthor ex:authorX .
ex:authorX foaf:name "John Doe" .
ex:authorX ex:birthYear 1987 .
N-Triples:
<http://example.org/doc01> <http://example.org/hasTopic> <http://example.org/topic01> .
<http://example.org/authorX> <http://example.org/birthYear> "1987"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example.org/authorX> <http://xmlns.com/foaf/0.1/name> "John Doe" .
<http://example.org/doc01> <http://example.org/hasAuthor> <http://example.org/authorX> .
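The graph view of this example can be sketched in a few lines of Python: every subject and object becomes a node, and every triple becomes a labeled edge from subject to object. This is only an illustration with made-up variable names (`outgoing`, `nodes`); the prefixed names are kept as plain strings.

```python
from collections import defaultdict

# The example triples, written with prefixed names for brevity.
triples = [
    ("ex:doc01", "ex:hasTopic", "ex:topic01"),
    ("ex:doc01", "ex:hasAuthor", "ex:authorX"),
    ("ex:authorX", "foaf:name", '"John Doe"'),
    ("ex:authorX", "ex:birthYear", "1987"),
]

# Adjacency list: node -> list of (edge label, target node).
outgoing = defaultdict(list)
for subject, predicate, obj in triples:
    outgoing[subject].append((predicate, obj))

# Nodes are all subjects and objects; predicates only label the edges.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
```

In this sketch the graph has five nodes (two of them literals) and four edges, one per triple.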

RDF formats

There are various formats for representing RDF data. The most established one is RDF/XML, but it is not the most human-readable one. N-Triples is a very simple line-based format where each triple is written out in full on its own line. Turtle is a superset of N-Triples that allows you to use prefixes and some other shortcuts.
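Because N-Triples is so simple, producing it by hand is straightforward. The Python sketch below shows roughly how a single line is assembled. The helper names `uri`, `literal` and `ntriples_line` are assumptions for this illustration, the escaping covers only the most common cases, and in practice you would use an RDF library's serializer instead.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def uri(value: str) -> str:
    """Write a URI in angle brackets."""
    return f"<{value}>"


def literal(value) -> str:
    """Write a literal; integers get an explicit xsd:integer datatype."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def ntriples_line(subject: str, predicate: str, obj: str) -> str:
    """Join three already formatted terms into one N-Triples line."""
    return f"{subject} {predicate} {obj} ."


line = ntriples_line(
    uri("http://example.org/authorX"),
    uri("http://example.org/birthYear"),
    literal(1987),
)
```

The resulting `line` spells out the datatype of the number, which Turtle lets you abbreviate to a bare `1987`.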

Databases and queries

RDF triples can be stored in specialized databases, often called “triple stores”. They can then be queried using the SPARQL query language. For example, we could query for all documents by “John Doe” with the following SPARQL query:

PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?doc WHERE {
    ?doc ex:hasAuthor ?author .
    ?author foaf:name "John Doe" .
}
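Conceptually, the query matches each triple pattern against the data and keeps the variable bindings that are consistent across all patterns. The following Python sketch imitates this for the query above. It is a toy illustration, not how a triple store or SPARQL engine is actually implemented, and the `match` and `select` helpers are hypothetical.

```python
# Example data with prefixed names kept as plain strings.
triples = {
    ("ex:doc01", "ex:hasTopic", "ex:topic01"),
    ("ex:doc01", "ex:hasAuthor", "ex:authorX"),
    ("ex:authorX", "foaf:name", "John Doe"),
    ("ex:authorX", "ex:birthYear", 1987),
}


def match(pattern, bindings):
    """Yield extended bindings for every triple that fits the pattern."""
    for triple in triples:
        new = dict(bindings)
        for pat, value in zip(pattern, triple):
            if isinstance(pat, str) and pat.startswith("?"):
                # Variable: bind it, or check it against an earlier binding.
                if new.setdefault(pat, value) != value:
                    break
            elif pat != value:
                break  # constant term does not match
        else:
            yield new


def select(variable, patterns):
    """Return the distinct values of one variable over all solutions."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for b in match(pattern, s)]
    return sorted({s[variable] for s in solutions})


docs = select("?doc", [
    ("?doc", "ex:hasAuthor", "?author"),
    ("?author", "foaf:name", "John Doe"),
])
```

Unlike a plain SPARQL SELECT, this sketch removes duplicate results, which corresponds to SELECT DISTINCT.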