LOD Service for OpenAIRE:

The OpenAIRE Linked Open Data (LOD) Services and their integration with the OpenAIRE information space have been released as a beta version. The LOD exporting process started with a specification of the OpenAIRE data model as an RDF vocabulary, followed by a mapping of the OpenAIRE data to the graph-based RDF data model. To interlink the OpenAIRE data with related data on the Web, we have identified a list of candidate datasets to interlink with, including the DBpedia dataset extracted from Wikipedia and the publication databases DBLP and CiteSeer.

LOD for non-technical people:

Think about the many kinds of data we use: images, videos, text, graphs, and websites containing pictures, text and links to other websites. The Web of Documents started by weaving HTML documents together with hyperlinks, which allows us as humans to follow a subject from one document to another. Now think about weaving individual bits of data together and explicitly expressing the relationships between them to guide, e.g., intelligent search engines. This requires an increased level of explicitness because machines cannot interpret the meaning of webpage content in the same way as humans. A link with an explicit description of its relationship can be understood as a subject-predicate-object sentence (e.g. “Harry Potter was authored by Joanne K. Rowling”), or, in more technical terms, a triple. The process of making data from documents or other data sources machine-comprehensible in this form is called triplification.


Triplification turns documents into graphs.

As anything can be the subject of multiple triples at the same time, and often also the object of other triples, things become connected with each other in a network structure called a graph. Best practices for publishing such graphs on the Web in a way that is as reusable as possible are subsumed under the term “Linked Data”. Linked Data technology involves standards such as URIs, HTTP and RDF. It primarily enables machines to explore the Web of Data and, in a second step, also benefits humans who use machine services such as search engines. The legal aspect of maximising reuse is making data available under open licenses. Linked Data that is openly licensed is called Linked Open Data.
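To make the step from triples to a graph concrete, here is a minimal Python sketch. It is only an illustration: the plain-text labels stand in for URIs, the extra "is a" triples are made up for the example, and the helper name build_graph is hypothetical rather than part of any OpenAIRE tool.

```python
from collections import defaultdict

# Each triple is a (subject, predicate, object) tuple. The labels are
# illustrative placeholders, not real OpenAIRE URIs.
triples = [
    ("Harry Potter", "was authored by", "Joanne K. Rowling"),
    ("Harry Potter", "is a", "Book"),
    ("Joanne K. Rowling", "is a", "Person"),
]


def build_graph(triples):
    """Group triples by subject: each subject maps to its outgoing edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)

# Walk the graph edge by edge. "Joanne K. Rowling" is the object of one
# triple and the subject of another, which is what links the triples together.
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```

In real Linked Data, subjects and predicates are URIs, so the same node can be recognised and extended across datasets published by different parties.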


The following picture shows the workflow of RDF production in the context of the OpenAIRE project:


So far, we provide the following ways of accessing the OpenAIRE LOD: descriptions of each entity downloadable from its URI, a downloadable dump, and a SPARQL endpoint. An updated version of the OpenAIRE RDF dump is always available on Zenodo: https://zenodo.org/record/53077#.V6M34Pn5jRa

Due to Zenodo's upload restrictions, the dataset is split into small files, but the whole dataset can be downloaded with the following command:

wget -r -l1 -H -nd -N -np -A gz -e robots=off LinkToDataSet

(LinkToDataSet stands for the dataset URL; at the time of writing this description: https://zenodo.org/record/53077#.V6M34Pn5jRa)

The data is also available through a SPARQL endpoint: http://lod.openaire.eu/sparql
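As a rough sketch of how a program could talk to this endpoint, the Python snippet below builds a GET request following the standard SPARQL Protocol, which passes the query text in a "query" parameter. The function names build_query_url and run_query are hypothetical, the sample query is a generic placeholder, and whether the endpoint is reachable or returns JSON for the requested media type is an assumption that is not verified here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://lod.openaire.eu/sparql"


def build_query_url(query, endpoint=ENDPOINT):
    """Return a GET URL carrying the query in the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(query, endpoint=ENDPOINT):
    """Send the query and parse a JSON result set.

    Assumes the endpoint is online and honours the standard
    SPARQL JSON results media type; neither is guaranteed.
    """
    request = Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return json.load(response)


# Generic placeholder query: any five triples from the store.
query = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"
url = build_query_url(query)
print(url)
```

Only the URL is built when the snippet runs; calling run_query(query) performs the actual network request.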


You can use SPARQL queries to retrieve the results you want. SPARQL is a standard query language for retrieving and manipulating data stored in RDF format. SPARQL queries consist of triple patterns, conjunctions, disjunctions and optional patterns. SPARQL defines four query types: SELECT, CONSTRUCT, ASK and DESCRIBE.

Some example SPARQL queries over OpenAIRE are as follows:

  • Number of Publications:

    PREFIX oav: <http://lod.openaire.eu/vocab/>
    # Count each distinct entity whose result type is "publication"
    SELECT (COUNT(DISTINCT ?instance) AS ?count)
    WHERE {
        ?instance oav:resultType "publication"
    }

    Result at the time of running this: 16700625

  • This query retrieves the first and last names of 10 people whose first name starts with "Robin":

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT DISTINCT ?firstName ?lastName
    WHERE {
        ?s a foaf:Person.
        ?s foaf:firstName ?firstName.
        # "^Robin" anchors the match at the start; "i" makes it case-insensitive
        FILTER REGEX(?firstName, "^Robin", "i")
        ?s foaf:lastName ?lastName.
    }
    LIMIT 10