This R package provides helpers for splitting, decompressing and parsing OpenAIRE Research Graph dumps, large scholarly data dumps comprising metadata about various kinds of grant-supported research outputs and the relationships between them. The {openairegraph} package targets users who wish to conduct their own analyses with the OpenAIRE Research Graph but are wary of handling its large data dumps.

More information about the OpenAIRE Research Graph, the dumps and the documentation of their structure can be found in:

Manghi, Paolo, Atzori, Claudio, Bardi, Alessia, Schirrwagen, Jochen, Dimitropoulos, Harry, La Bruzzo, Sandro, … Summan, Friedrich. (2019). OpenAIRE Research Graph Dump (Version 1.0.0-beta). Zenodo.

The data model is more thoroughly explained here:

Manghi, Paolo, Bardi, Alessia, Atzori, Claudio, Baglioni, Miriam, Manola, Natalia, Schirrwagen, Jochen, & Principe, Pedro. (2019, April 17). The OpenAIRE Research Graph Data Model (Version 1.3). Zenodo.

Currently implemented methods

So far, openairegraph has been tested with the H2020 dump, h2020_results.gz. The package provides two sets of functions. The first set consists of helpers that split a large OpenAIRE Research Graph data dump into separate, decoded XML records that can be stored individually. The second set consists of parsers that convert data from these XML files to a tibble.
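The parsing step can be illustrated with a minimal sketch that turns one XML record into a tibble. It does not use the package's own parsers; it relies only on the xml2 and tibble packages. The toy record and its element names (`record`, `title`, `project`, `code`, `acronym`) are assumptions made for illustration and do not reflect the real OpenAIRE schema.

```r
library(xml2)
library(tibble)

# Hypothetical toy record: real OpenAIRE records are larger, namespaced,
# and use different element names.
record <- read_xml(
  "<record>
     <title>Example title</title>
     <project><code>777541</code><acronym>OpenAIRE-Advance</acronym></project>
     <project><code>123456</code><acronym>DEMO</acronym></project>
   </record>"
)

# Select every linked project node in the record.
projects <- xml_find_all(record, ".//project")

# One row per linked project; the record-level title is recycled across rows.
linked_projects <- tibble(
  title = xml_text(xml_find_first(record, ".//title")),
  project_code = xml_text(xml_find_first(projects, "./code")),
  project_acronym = xml_text(xml_find_first(projects, "./acronym"))
)

print(linked_projects)
```

Returning one row per linked entity keeps the result tidy, so tibbles from many records can be stacked row-wise for analysis.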

For long-form documentation including a use case, see:


You can install the development version of openairegraph from GitHub using the remotes package.
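A sketch of the installation call, assuming the usual `remotes::install_github()` workflow. The repository owner is not stated here, so `<owner>` is a placeholder that must be replaced with the actual GitHub account before the call will work.

```r
# Install remotes first if it is not yet available.
# install.packages("remotes")

# "<owner>" is a placeholder, not a real account name; substitute the
# GitHub account that hosts the openairegraph repository.
remotes::install_github("<owner>/openairegraph")
```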



This work is supported by OpenAIRE-Advance. OpenAIRE-Advance receives funding from the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No. 777541.