We present hoaddata, an experimental R package that combines open scholarly data from the German Open Access Monitor, Crossref and OpenAlex. Using this package, we illustrate the progress made in publishing open access content in hybrid journals included in nationwide transformative agreements in Germany across journal portfolios and countries.
Assessing the volume and share of open access articles in hybrid journals is crucial for the planning and implementation of transformative agreements, an evolving type of contract between libraries and publishers in which subscription spending is repurposed for open access publishing (Schimmer, Geschuhn, and Vogler 2015). In particular, library consortia, which mainly negotiate transformative agreements with large publishers, require such publication data according to the recently published ESAC Reference Guide to Transformative Agreements. Here, we present hoaddata, an experimental R package in which openly available journal-level data about nationwide transformative agreements in Germany are combined with article-level open access status information and country affiliations. Accordingly, hoaddata provides essential data for the monitoring and benchmarking of transformative agreements across hybrid journal portfolios and countries.
Interacting with data through R packages like hoaddata makes data analytics more transparent because R packages meet generic principles for computational reproducibility: coherent file organisation, separation of data, methods and results, and specification of the computational environment (Marwick, Boettiger, and Mullen 2018). For data science practitioners, R packages thus provide a reliable way to re-use data and code. In our specific case, hoaddata contains not only datasets about hybrid open access but also the code used to compile them by interfacing with our cloud-based Google BigQuery data warehouse, where we store open scholarly data from Crossref, OpenAlex and Unpaywall. hoaddata is automatically built and updated with GitHub Actions, a continuous integration service. Each merge event into the main branch triggers the execution of code to obtain up-to-date data about transformative agreements from the most recent open scholarly data snapshots available in our data warehouse. Data changes, including updates, are incorporated into the package and tracked with Git, which makes it possible to reproduce different versions of the data contained in hoaddata.
In this blog post, we describe the data analytics workflow behind hoaddata. The main purpose of hoaddata is to ship data for monitoring dashboards about the progress of nationwide transformative agreements, which we are currently developing in the HOAD project with the support of the Deutsche Forschungsgemeinschaft, but everyone can install the package from GitHub and use it in R. To demonstrate its potential, we will use hoaddata to illustrate the current state of the transition to full open access for hybrid journals included in nationwide transformative agreements in Germany.
hoaddata focuses on nationwide transformative agreements in Germany. As a first step, we draw on the work of the German Open Access Monitor (OAM) to obtain a list of journals under these agreements (Pollack et al. 2022). We merged all journals into a single data file and enriched it with missing ISSN variants. Because publishers register journal-level metadata in Crossref when they first deposit metadata for a given journal including ISSNs, we furthermore matched the OAM journal list with Crossref’s title list to link ISSN variants to journals as they were represented in Crossref metadata.
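The ISSN matching step can be sketched in base R as follows. This is a minimal illustration, not the code shipped with hoaddata: the toy data, the made-up ISSNs and the column names (`agreement`, `issn`, `journal_id`) are assumptions chosen for the example.

```r
# Toy OAM journal list: one ISSN per journal as listed in an agreement
# (hypothetical column names and made-up ISSNs)
oam <- data.frame(
  agreement = c("Agreement A", "Agreement B"),
  issn = c("1111-1111", "2222-2222"),
  stringsAsFactors = FALSE
)

# Toy Crossref title list in long format: all ISSN variants of a journal
# (e.g. print and electronic) share the same hypothetical journal identifier
crossref_titles <- data.frame(
  journal_id = c("J1", "J1", "J2", "J2"),
  issn = c("1111-1111", "1111-222X", "2222-2222", "2222-3333"),
  stringsAsFactors = FALSE
)

# Step 1: find the Crossref journal behind each ISSN from the OAM list
oam_jn <- merge(oam, crossref_titles, by = "issn")

# Step 2: attach every ISSN variant registered for that journal
issn_variants <- merge(
  oam_jn[, c("agreement", "journal_id")],
  crossref_titles,
  by = "journal_id"
)
issn_variants <- issn_variants[order(issn_variants$journal_id, issn_variants$issn), ]
print(issn_variants)
```

In this sketch, each journal ends up with both of its ISSN variants, although the agreement list mentioned only one of them.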
After obtaining a list of hybrid journals linked to nationwide transformative agreements, we determined the article volume by journal and year using Crossref. Following Unpaywall’s approach, Crossref metadata records considered as front matter were excluded. Open access articles were identified through Creative Commons license URLs in Crossref metadata records. License URLs were mapped to the different license variants like CC BY.
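Mapping license URLs to license variants could look roughly like the following base R sketch. The helper `cc_label()` and the example URLs are hypothetical and only illustrate the idea; the actual mapping used for hoaddata lives in the package's SQL queries and may differ. Public domain dedications such as CC0 use a different URL path and are not covered here.

```r
# Hypothetical helper: map a Creative Commons license URL to a short label.
# Returns NA for URLs that do not point to a Creative Commons license.
cc_label <- function(url) {
  # The second capture group holds the license code, e.g. "by" or "by-nc-nd"
  pattern <- "^https?://(www\\.)?creativecommons\\.org/licenses/([a-z-]+)(/.*)?$"
  url <- tolower(trimws(url))
  code <- ifelse(grepl(pattern, url), sub(pattern, "\\2", url), NA_character_)
  ifelse(is.na(code), NA_character_, paste("CC", toupper(code)))
}

# Example license URLs as they might appear in metadata records
license_urls <- c(
  "http://creativecommons.org/licenses/by/4.0/",
  "https://creativecommons.org/licenses/by-nc-nd/4.0",
  "https://creativecommons.org/licenses/by-nc/3.0/legalcode",
  "https://www.example.org/tdm-license"
)

labels <- cc_label(license_urls)
print(labels)
```

The first three URLs resolve to CC BY, CC BY-NC-ND and CC BY-NC regardless of license version or trailing path, while the non-Creative-Commons URL yields a missing value.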
Because country affiliations are a key data point for nationwide transformative agreements, we used OpenAlex to determine the country share per journal and publisher portfolio. To our knowledge, OpenAlex does not provide information about corresponding authors and their affiliation, which is a key data point in most transformative agreements. Instead, we made use of first-author affiliations. A first author is often regarded as the lead author who has usually undertaken most of the research presented in the article, although author roles can vary across disciplines. Where OpenAlex did not record any country affiliation, we extracted country names from the metadata field display_name using regular expressions. We applied full counting to account for multiple country affiliations.
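To make the counting method concrete, here is a small base R sketch of full counting. The toy data frame and its column names are assumptions for illustration only.

```r
# Toy first-author affiliations: one row per article and country
# (hypothetical data; the first author of article "a2" has two countries)
affiliations <- data.frame(
  article = c("a1", "a2", "a2", "a3", "a4"),
  country = c("DE", "DE", "US", "US", "CN"),
  stringsAsFactors = FALSE
)

# Drop duplicate article-country pairs so that an article counts
# at most once per country
affiliations <- unique(affiliations)

# Full counting: each country receives one full credit per article
full_counts <- table(affiliations$country)
print(full_counts)

# Because of multiple affiliations, credits can exceed the number of articles
n_articles <- length(unique(affiliations$article))
sum(full_counts) > n_articles
```

Under full counting, country totals therefore do not add up to the number of articles, which should be kept in mind when comparing country shares.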
As a result, hoaddata provides the following datasets:
oam_hybrid_jns: Hybrid journals included in the Open Access Monitor. Data were gathered from Pollack et al. (2022), validated and mapped to Crossref-indexed journals.
cc_jn_ind: Prevalence of Creative Commons license variants by year and hybrid journal as obtained from Crossref.
cc_openalex_inst_jn_ind: First author country affiliations per journal, year and Creative Commons license. Country affiliations were gathered from OpenAlex.
Article-level data
hoaddata can be installed from GitHub:
# install.packages("remotes")
remotes::install_github("subugoe/hoaddash", dependencies = "Imports")
You can also directly download the data as CSV files from GitHub. The files are stored in the data-raw folder of the package together with the code used to create the datasets. Specific SQL queries can be found in inst/sql/.
At the time of writing, hoaddata comprised information about 5,562 hybrid journals included in twenty consortial transformative agreements in Germany. Since 2017, these journals have published 348,978 open access articles under a Creative Commons license, representing a share of 10% of all articles.
Using OpenAlex’s affiliation information, we can break down the performance of transformative agreements by country. For the publication year 2021, the following interactive table compares the global publication volume, including open access, with that of lead authors based in Germany by hybrid journal portfolio. The table highlights the dominant position of transformative agreements negotiated by the DEAL consortium in terms of articles published and open access. Note that DEAL has so far reached no agreement with Elsevier, one of the largest scholarly publishers; Elsevier journals are therefore not included in the analysis.
Interestingly, Germany’s open access share in 2021 was below 80% in most cases, suggesting that not all authors made use of open access options or were eligible to publish open access, likely because their institution was not part of a consortium. Some article types might also not be included in agreements. For instance, many article types in medical specialist journals from Springer Medizin, which are targeted at medical practitioners, are not covered by DEAL. The low open access uptake in the hybrid journal portfolios from ACM, AIP, Hogrefe and SPIE suggests that these publishers did not share Creative Commons license information through Crossref metadata records.
The table furthermore shows large discrepancies between the global open access uptake in hybrid journals and Germany’s. In 2021, 16% of articles published in hybrid journals included in nationwide transformative agreements in Germany were open access, while the open access percentage among articles published by lead authors based in Germany was 65%. Overall, 20% of open access articles published in hybrid journals included in transformative agreements in 2021 were from Germany, although its total publication volume accounted for 4.9% of articles published.
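As a quick plausibility check, the three percentages reported above are consistent with each other. The following base R snippet uses only the rounded figures from the text, so the result is approximate.

```r
# Rounded figures for 2021 as reported in the text
share_volume_de <- 0.049  # Germany's share of all articles published
oa_rate_de <- 0.65        # open access share among articles by lead authors from Germany
oa_rate_global <- 0.16    # open access share among all articles

# German open access articles as a fraction of all articles
oa_de_of_all <- share_volume_de * oa_rate_de

# Germany's share of all open access articles
share_oa_de <- oa_de_of_all / oa_rate_global
round(100 * share_oa_de, 1)
```

The result is about 19.9%, which matches the roughly 20% share of open access articles attributed to Germany.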
As can be seen from the following figure, which highlights the 20 most productive countries in terms of articles published, open access uptake in hybrid journals included in nationwide transformative agreements in Germany varies across countries. Notably, lead authors based in the US, China or India, which together accounted for roughly 40% of articles published, used open access options to a much lesser extent than authors from European countries. Together with Germany, the UK, the Netherlands, Sweden and Switzerland have reached an open access share above 50% over the years, most likely because these countries have nationwide transformative agreements with similar journal coverage.
Over the past years, Germany’s library consortia have successfully negotiated transformative agreements with commercial publishers, resulting in an increase in open access articles from lead authors based in Germany. But this growth of open access is balanced neither across hybrid journal portfolios nor across countries.
As illustrated, there are substantial variations across journal portfolios in terms of open access articles published by lead authors based in Germany. They can be explained by different agreement terms such as the number and types of institutions involved or the restriction of open access publishing options to specific article types. In its recent Reference Guide to Transformative Agreements, and in line with previous research (Borrego, Anglada, and Abadal 2020), the ESAC initiative points out that agreement terms affecting the scope of contracts can have a large impact on the performance of transformative agreements.
In terms of country variations, although the ESAC Registry of Transformative Agreements discloses an increasing number of similar national-level agreements, more than 80% of articles published in hybrid journals included in nationwide transformative agreements in Germany are still behind a paywall. Tracking open access across country affiliations reveals that uptake rates are particularly low in the most productive countries: the USA, China and India. It remains to be seen whether a transition of hybrid journal portfolios to fully open access through transformative agreements is feasible given these observed global differences.
In the future, we want to use hoaddata as a data source for monitoring dashboards about the progress of nationwide transformative agreements in Germany, which we are currently developing in the HOAD project with the support of the Deutsche Forschungsgemeinschaft. We want to extend the R package’s scope beyond its current focus on publication volume and open access uptake across agreements and countries. The aim is to highlight open metadata gaps, not only with regard to Creative Commons license information. Following up on our work on metacheck, an email tool to check metadata compliance, we will also present information about the coverage of Text and Data Mining support, funder information, ORCIDs and Open Abstracts in Crossref metadata records; these data are critical for an open and reproducible monitoring of transformative agreements.
The launch of dashboards specific to nationwide transformative agreements in Germany is targeted for September 2022.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/subugoe/scholcomm_analytics, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Jahn & Haupka (2022, June 7). Scholarly Communication Analytics: How open are hybrid journals included in nationwide transformative agreements in Germany?. Retrieved from https://subugoe.github.io/scholcomm_analytics/posts/oam_hybrid/
BibTeX citation
@misc{jahn2022how,
  author = {Jahn, Najko and Haupka, Nick},
  title = {Scholarly Communication Analytics: How open are hybrid journals included in nationwide transformative agreements in Germany?},
  url = {https://subugoe.github.io/scholcomm_analytics/posts/oam_hybrid/},
  year = {2022}
}