This post presents a new dataset for analysing transformative agreements. Data on these much-discussed agreements are scattered across different sources and are only partially available. To address this, we preserved and combined open metadata from the cOAlition S Journal Checker Tool and OpenAlex, resulting in a unified dataset for large-scale bibliometric studies.
Since their initial proposal (Schimmer, Geschuhn, and Vogler 2015), transformative agreements have become a predominant model to finance open access in scholarly journals (Dér 2025). Measuring their impact, however, remains challenging as data about these agreements are scattered across different sources (Kramer 2024).
The cOAlition S Public Transformative Agreement Data dump, which powers the Journal Checker Tool, is an important step towards transparency. This resource is based on transformative agreements recorded in the ESAC Registry. However, the Journal Checker Tool only presents current agreements and removes data on expired agreements. A further obstacle to analysing transformative agreements is that bibliometric databases have not integrated agreement data such as those provided by ESAC or cOAlition S, making it difficult to identify articles published under these agreements (Bakker, Langham-Putrow, and Riegelman 2024). Open access monitoring services also lack comprehensive coverage of this data point.
To close this gap, this blog post introduces an open data release about transformative agreements developed as part of the initial OPENBIB data release of the German Competence Network for Bibliometrics (Haupka et al. 2025) at the SUB Göttingen. This dataset, licensed under CC0, combines cOAlition S data with OpenAlex to improve transparency and enable estimates of articles published under these agreements.
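The dataset comprises historic cOAlition S transformative agreement data, a snapshot of the ESAC Registry, and estimates of articles published under transformative agreements.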
Preliminary versions of this dataset were used in the SUB Göttingen’s Hybrid Open Access Dashboard, a comprehensive monitoring effort based on 13,000 hybrid journals in transformative agreements (Achterberg and Jahn 2023), and in studies on the impact of transformative agreements on open access in hybrid journals (Jahn 2025b). The data were also used to compare findings obtained from open metadata with those from the proprietary bibliometric databases Scopus and Web of Science (Jahn 2025a). Using papers funded by the Dutch Research Council (NWO), de Jonge, Kramer, and Sondervan (2025) validated an open method based on transformative agreement data and OpenAlex and were able to accurately identify the majority of articles under these agreements.
This blog post describes the methods used to compile the dataset and presents a use case based on Google BigQuery to help with the first steps using this new open data source.
A dedicated bot has preserved weekly snapshots of the cOAlition S Public Transformative Agreement Data dump since December 2022. These snapshots, available on GitHub, were merged using a custom script that retains only the most recent data for each agreement.
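The merge step can be illustrated with a minimal Python sketch. It is not the custom script itself: the snapshot layout and the field names `esac_id` and `issn` are assumptions made for illustration, and the rule "keep the rows from the latest snapshot in which an agreement appears" is one plausible reading of "most recent data for each agreement".

```python
from datetime import date

# Hypothetical weekly snapshots: snapshot date -> rows of the data dump.
# Field names (esac_id, issn) are illustrative, not the real schema.
snapshots = {
    date(2022, 12, 5): [
        {"esac_id": "agr-a", "issn": "1111-1111"},
        {"esac_id": "agr-b", "issn": "2222-2222"},
    ],
    date(2023, 1, 9): [
        # agr-b has expired and is no longer listed in this snapshot
        {"esac_id": "agr-a", "issn": "1111-1111"},
        {"esac_id": "agr-a", "issn": "3333-3333"},
    ],
}


def merge_snapshots(snapshots):
    """Keep, for each agreement, only the rows from its most recent snapshot."""
    latest = {}  # esac_id -> (snapshot date, rows)
    for snap_date in sorted(snapshots):
        by_agreement = {}
        for row in snapshots[snap_date]:
            by_agreement.setdefault(row["esac_id"], []).append(row)
        # Later snapshots overwrite earlier ones. Agreements that have
        # disappeared from the dump keep their last known rows.
        for esac_id, rows in by_agreement.items():
            latest[esac_id] = (snap_date, rows)
    return latest


merged = merge_snapshots(snapshots)
for esac_id, (snap_date, rows) in sorted(merged.items()):
    print(esac_id, snap_date, len(rows))
```

In this toy example the expired agreement `agr-b` survives with its December 2022 rows, which is the property that makes historic analysis possible once the Journal Checker Tool drops it.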
The original data link agreements to journals through names and ISSNs. After ISSNs were mapped to the linking ISSN (ISSN-L), journals were associated with publishers using the ESAC Registry. To improve institutional coverage, the data were enriched with ROR IDs from OpenAlex’s institution data.
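Mapping ISSNs to ISSN-L collapses print and electronic variants of the same journal into one identifier. The following Python sketch shows the idea under stated assumptions: the mapping table, the ISSN values, and the field names are invented for illustration and do not reflect the actual pipeline.

```python
# Hypothetical excerpt of an ISSN to ISSN-L table (illustrative values only).
issn_to_issnl = {
    "1111-1111": "1111-1111",  # print ISSN, here also the linking ISSN
    "1111-222X": "1111-1111",  # electronic ISSN of the same journal
    "3333-3333": "3333-3333",
}

# Hypothetical journal rows as they might appear in the agreement data.
journal_rows = [
    {"esac_id": "agr-a", "issn": "1111-1111"},
    {"esac_id": "agr-a", "issn": "1111-222x"},  # lower-case check character
    {"esac_id": "agr-a", "issn": "3333-3333"},
    {"esac_id": "agr-a", "issn": "9999-9999"},  # missing from the mapping
]


def to_issn_l(rows, mapping):
    """Return unique (esac_id, issn_l) pairs and the ISSNs that did not map."""
    pairs, unmapped = set(), []
    for row in rows:
        issn = row["issn"].strip().upper()  # normalise before the lookup
        issn_l = mapping.get(issn)
        if issn_l is None:
            unmapped.append(issn)
        else:
            pairs.add((row["esac_id"], issn_l))
    return sorted(pairs), unmapped


pairs, unmapped = to_issn_l(journal_rows, issn_to_issnl)
print(pairs)
print(unmapped)
```

Reporting unmapped ISSNs separately, rather than dropping them silently, makes gaps in journal coverage visible.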
Because OpenAlex does not fully support corresponding authors, articles enabled by transformative agreements were estimated by matching first author affiliations with participating institutions, considering agreement durations from the ESAC Registry, as described in Jahn (2025b).
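The matching logic can be sketched in Python as follows. This is a simplified illustration, not the method from Jahn (2025b): the record layout, the placeholder identifiers, and the choice to compare the publication date against the agreement duration are assumptions.

```python
from datetime import date

# Hypothetical agreement records combining journals (ISSN-L), participating
# institutions (placeholder ROR IDs) and the duration from the ESAC Registry.
agreements = [
    {
        "esac_id": "agr-a",
        "issn_l": {"1111-1111"},
        "ror": {"ror:inst-1", "ror:inst-2"},
        "start": date(2021, 1, 1),
        "end": date(2023, 12, 31),
    },
]


def match_article(article, agreements):
    """Return the IDs of agreements that may have enabled the article.

    An article matches if its journal is covered, at least one first author
    affiliation participates, and it was published during the agreement.
    """
    first_author_rors = set(article["first_author_rors"])
    return [
        a["esac_id"]
        for a in agreements
        if article["issn_l"] in a["issn_l"]
        and first_author_rors & a["ror"]
        and a["start"] <= article["published"] <= a["end"]
    ]


eligible = {
    "issn_l": "1111-1111",
    "first_author_rors": ["ror:inst-2"],
    "published": date(2022, 6, 1),
}
too_late = dict(eligible, published=date(2024, 2, 1))
other_institution = dict(eligible, first_author_rors=["ror:inst-9"])

print(match_article(eligible, agreements))
print(match_article(too_late, agreements))
print(match_article(other_institution, agreements))
```

Because only first author affiliations are checked, the result is an estimate: articles whose corresponding author is not the first author can be missed or wrongly included.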
Processing was performed on Google BigQuery, with dataset compilation for the initial version completed in April 2025. Data files are available from Zenodo and programmatically via the Open Scholarly Data warehouse (dataset openbib).
The dataset comprises four main files:
Historic cOAlition S Transformative Agreement Data
- jct_journals: links journals to transformative agreements
- jct_institutions: links participating institutions to agreements
ESAC snapshot
- jct_esac: metadata about agreements including publisher name and duration
Articles under Transformative Agreements
- jct_articles: links OpenAlex articles to agreements through journals, institutions, and duration
Full documentation of data files is available in the data documentation.
The following use case is based on Google BigQuery. Anyone with a Google Cloud account can view and query the data; standard usage fees apply for queries. The dataset is also available on Zenodo.
This query retrieves annual counts for articles enabled by transformative agreements, focusing on articles and reviews as classified by OpenAlex:
SELECT
  publication_year,
  esac.publisher,
  COUNT(DISTINCT(jct.id)) AS n
FROM
  `subugoe-collaborative.openbib.jct_articles` AS jct
INNER JOIN
  `subugoe-collaborative.openalex.works` AS oalex
ON
  oalex.doi = jct.doi
INNER JOIN
  `subugoe-collaborative.openbib.jct_esac` AS esac
ON
  esac.id = jct.esac_id
WHERE oalex.type IN ('article', 'review') AND is_paratext = FALSE
GROUP BY
  publication_year,
  publisher
ORDER BY
  publication_year DESC,
  n DESC
bq_df
#> # A tibble: 292 × 3
#> publication_year publisher n
#> <int> <chr> <int>
#> 1 2025 Wiley 13741
#> 2 2025 Elsevier 11194
#> 3 2025 Springer Nature 7930
#> 4 2025 Taylor & Francis 6035
#> 5 2025 Sage 3194
#> 6 2025 Oxford University Press 2700
#> 7 2025 American Chemical Society 1202
#> 8 2025 Royal Society of Chemistry 697
#> 9 2025 American Physical Society 581
#> 10 2025 Cambridge University Press 548
#> # ℹ 282 more rows
Figure 1: Growth of articles enabled by transformative agreements between 2020 and 2024, showing the dominance of the five largest commercial publishers in the scholarly publishing market.
Figure 1 shows the growth of articles enabled by transformative agreements between 2020 and 2024, highlighting the dominance of five major commercial publishers, with Elsevier, Springer Nature and Wiley leading.
Transformative agreements vary in structure and implementation. Journal bundles may include open access journals, hybrid journals, and subscription journals, with varying document types allowed and potential limitations on open access article numbers. The following query examines the open access status of articles enabled by transformative agreements:
SELECT
  publication_year,
  esac.publisher,
  oalex.open_access.oa_status,
  COUNT(DISTINCT(jct.id)) AS n
FROM
  `subugoe-collaborative.openbib.jct_articles` AS jct
INNER JOIN
  `subugoe-collaborative.openalex.works` AS oalex
ON
  oalex.doi = jct.doi
INNER JOIN
  `subugoe-collaborative.openbib.jct_esac` AS esac
ON
  esac.id = jct.esac_id
WHERE oalex.type IN ('article', 'review') AND is_paratext = FALSE
GROUP BY
  publication_year,
  publisher,
  oalex.open_access.oa_status
ORDER BY
  publication_year DESC,
  n DESC
bq_oa_df
#> # A tibble: 1,243 × 4
#> publication_year publisher oa_status n
#> <int> <chr> <chr> <int>
#> 1 2025 Wiley hybrid 7761
#> 2 2025 Elsevier hybrid 5417
#> 3 2025 Springer Nature hybrid 4682
#> 4 2025 Elsevier closed 4491
#> 5 2025 Taylor & Francis hybrid 3597
#> 6 2025 Wiley closed 3326
#> 7 2025 Sage closed 2495
#> 8 2025 Wiley gold 2188
#> 9 2025 Taylor & Francis closed 2110
#> 10 2025 Springer Nature gold 1562
#> # ℹ 1,233 more rows
Figure 2: Articles covered by transformative agreements by open access status
Figure 2 shows the open access status of articles covered by transformative agreements. Most articles were published open access in hybrid journals. The notable number of closed articles may reflect matching limitations or complexities of transformative agreements regarding journal inclusion, article caps, and document type restrictions. It may also signal issues with open access tagging in OpenAlex.
While the data presented allow analysis of transformative agreements using bibliometric databases, some shortcomings must be acknowledged. The public Transformative Agreement Data Dumps from the Journal Checker Tool and the ESAC Registry are voluntary, crowd-sourced efforts. The information is subject to change.
Due to limited publicly available invoice data, the article dataset only represents estimates based on first author affiliations according to OpenAlex. Although transformative agreement guidelines typically refer to corresponding authors (data not fully available in OpenAlex), research has shown a strong correlation between first and corresponding authorship at the level of publishers and countries.
This work was supported by the Federal Ministry of Education and Research of Germany (BMBF) under grants 16WIK2301E / 16WIK2101F.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/subugoe/scholcomm_analytics, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Jahn (2025, May 9). Scholarly Communication Analytics: Introducing Open Metadata about Transformative Agreements. Retrieved from https://subugoe.github.io/scholcomm_analytics/posts/openbib_ta_release/
BibTeX citation
@misc{jahn2025introducing,
  author = {Jahn, Najko},
  title = {Scholarly Communication Analytics: Introducing Open Metadata about Transformative Agreements},
  url = {https://subugoe.github.io/scholcomm_analytics/posts/openbib_ta_release/},
  year = {2025}
}