Introducing Open Metadata about Transformative Agreements

This post presents a new open dataset for analysing transformative agreements. Data on these much-discussed agreements are scattered across different sources and are only partially available. To address this, we preserved and combined open metadata from the cOAlition S Journal Checker Tool and OpenAlex, resulting in a unified dataset for large-scale bibliometric studies.

Najko Jahn (State and University Library Göttingen, https://www.sub.uni-goettingen.de/)
2025-05-09

Since their initial proposal (Schimmer, Geschuhn, and Vogler 2015), transformative agreements have become a predominant model to finance open access in scholarly journals (Dér 2025). Measuring their impact, however, remains challenging as data about these agreements are scattered across different sources (Kramer 2024).

The cOAlition S Public Transformative Agreement Data dump, which powers the Journal Checker Tool, is an important step towards transparency. This resource is based on transformative agreements recorded in the ESAC Registry. However, the Journal Checker Tool only presents current agreements and removes data on expired agreements. Another shortcoming is that bibliometric databases have not integrated data on transformative agreements, such as those provided by ESAC or cOAlition S, making it difficult to identify articles published under these agreements (Bakker, Langham-Putrow, and Riegelman 2024). Open access monitoring services also lack comprehensive coverage of this data point.

To close this gap, this blog post introduces an open data release about transformative agreements, developed at the SUB Göttingen as part of the initial OPENBIB data release of the German Competence Network for Bibliometrics (Haupka et al. 2025). This dataset, licensed under CC0, combines cOAlition S data with OpenAlex to improve transparency and enable estimates of articles published under these agreements.

The dataset comprises:

Preliminary versions of this dataset were used in the SUB Göttingen’s Hybrid Open Access Dashboard, a comprehensive monitoring effort based on 13,000 hybrid journals in transformative agreements (Achterberg and Jahn 2023), and in studies on the impact of transformative agreements on open access in hybrid journals (Jahn 2025b). The data were also used to compare findings obtained from open metadata with those from the proprietary bibliometric databases Scopus and Web of Science (Jahn 2025a). Using papers funded by the Dutch Research Council (NWO), de Jonge, Kramer, and Sondervan (2025) validated an open method based on transformative agreement data and OpenAlex and were able to accurately identify the majority of articles under these agreements.

This blog post describes the methods used to compile the dataset and presents a use case based on Google BigQuery to help with the first steps using this new open data source.

Methods

Data retrieval and curation

A dedicated bot has preserved weekly snapshots of the cOAlition S Public Transformative Agreement Data dump since December 2022. These snapshots, available on GitHub, were merged using a custom script that retains only the most recent data for each agreement.
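
The merging step can be pictured with a minimal Python sketch. It is not the custom script mentioned above: the field names (`snapshot`, `esac_id`, `issn`) and all values are hypothetical, and the sketch only assumes that each row carries the date of the snapshot it came from. For every agreement, it keeps the rows from the most recent snapshot in which that agreement still appears, so agreements that later drop out of the dump are preserved.

```python
from datetime import date

# Hypothetical snapshot rows; field names and values are illustrative only
# and do not reflect the schema of the actual data dump.
snapshots = [
    {"snapshot": date(2022, 12, 5), "esac_id": "agreement-a", "issn": "1111-1111"},
    {"snapshot": date(2022, 12, 5), "esac_id": "agreement-a", "issn": "2222-2222"},
    {"snapshot": date(2023, 1, 9), "esac_id": "agreement-a", "issn": "1111-1111"},
    {"snapshot": date(2023, 1, 9), "esac_id": "agreement-a", "issn": "3333-3333"},
    # agreement-b expired and is missing from the later snapshot
    {"snapshot": date(2022, 12, 5), "esac_id": "agreement-b", "issn": "4444-4444"},
]


def latest_per_agreement(rows):
    """Keep the rows from the newest snapshot in which each agreement appears."""
    newest = {}
    for row in rows:
        key = row["esac_id"]
        if key not in newest or row["snapshot"] > newest[key]:
            newest[key] = row["snapshot"]
    return [row for row in rows if row["snapshot"] == newest[row["esac_id"]]]


merged = latest_per_agreement(snapshots)
for row in merged:
    print(row["esac_id"], row["snapshot"], row["issn"])
```

In this toy input, the expired agreement survives with its last known journal list, while the ongoing agreement is represented only by its latest snapshot.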

The original data links agreements to journals through names and ISSNs. After mapping to linking ISSN (ISSN-L), journals were associated with publishers using the ESAC Registry. To improve institutional coverage, the data were enriched with ROR-IDs from OpenAlex’s institution data.
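
As an illustration of the ISSN-L mapping, the following Python sketch collapses print and electronic ISSNs of one journal onto a single linking ISSN. The lookup table, the identifiers, and the helper name `to_issn_l` are assumptions made for this example; the actual processing was performed on Google BigQuery and may differ in detail.

```python
# Hypothetical excerpt of an ISSN to ISSN-L lookup table. The identifiers are
# made up for illustration and do not refer to real journals.
ISSN_TO_ISSN_L = {
    "1234-5678": "1234-5678",  # print ISSN, which is also the linking ISSN
    "8765-4321": "1234-5678",  # electronic ISSN of the same journal
}

# Hypothetical (agreement, ISSN) pairs as they might appear in the source data.
journal_rows = [
    ("agreement-a", "8765-4321"),
    ("agreement-a", "1234-5678"),
    ("agreement-b", "0000-0000"),  # not present in the lookup table
]


def to_issn_l(issn):
    """Return the linking ISSN for an ISSN, or None if it is unknown."""
    return ISSN_TO_ISSN_L.get(issn.strip().upper())


linked = set()   # distinct (agreement, ISSN-L) pairs
unmatched = []   # rows that need manual inspection
for agreement, issn in journal_rows:
    issn_l = to_issn_l(issn)
    if issn_l is None:
        unmatched.append((agreement, issn))
    else:
        linked.add((agreement, issn_l))

print(linked)
print(unmatched)
```

Working at the ISSN-L level means a journal is counted once per agreement regardless of which ISSN variant the source data lists.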

Because OpenAlex does not fully support corresponding authors, articles enabled by transformative agreements were estimated by matching first author affiliations with participating institutions, considering agreement durations from the ESAC Registry, as described in Jahn (2025b).
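
The matching logic can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation from Jahn (2025b): all records, field names, DOIs, and ROR-like identifiers are hypothetical, and the sketch assumes that an article counts as enabled when its journal is part of the agreement, a first author affiliation belongs to a participating institution, and the publication date falls within the agreement duration.

```python
from datetime import date

# Hypothetical participation records: one row per agreement, journal and institution.
participants = [
    {
        "esac_id": "agreement-a",
        "issn_l": "1234-5678",
        "ror": "https://ror.org/example1",
        "start": date(2021, 1, 1),
        "end": date(2023, 12, 31),
    },
]

# Hypothetical articles with the affiliations of the first author.
articles = [
    {"doi": "10.1234/a1", "issn_l": "1234-5678",
     "first_author_rors": ["https://ror.org/example1"], "published": date(2022, 5, 1)},
    # published after the agreement ended
    {"doi": "10.1234/a2", "issn_l": "1234-5678",
     "first_author_rors": ["https://ror.org/example1"], "published": date(2024, 2, 1)},
    # first author is not affiliated with a participating institution
    {"doi": "10.1234/a3", "issn_l": "1234-5678",
     "first_author_rors": ["https://ror.org/example2"], "published": date(2022, 5, 1)},
]


def match_articles(articles, participants):
    """Return (doi, esac_id) pairs for articles estimated to fall under an agreement."""
    matches = []
    for article in articles:
        for p in participants:
            same_journal = article["issn_l"] == p["issn_l"]
            same_institution = p["ror"] in article["first_author_rors"]
            in_duration = p["start"] <= article["published"] <= p["end"]
            if same_journal and same_institution and in_duration:
                matches.append((article["doi"], p["esac_id"]))
    return matches


matched = match_articles(articles, participants)
print(matched)
```

Only the first toy article satisfies all three conditions, which shows why both the institution list and the agreement duration matter for the estimate.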

Processing was performed on Google BigQuery, with dataset compilation for the initial version completed in April 2025. Data files are available from Zenodo and programmatically via the Open Scholarly Data warehouse (dataset openbib).

Data files

The dataset comprises four main files:

Historic cOAlition S Transformative Agreement Data

ESAC snapshot

Articles under Transformative Agreements

Full documentation of data files is available in the data documentation.

Use case

The following use case is based on Google BigQuery. Anyone with a Google Cloud account can view and query the data; standard usage fees apply for queries. The dataset is also available on Zenodo.

How many articles were enabled by transformative agreements?

This query retrieves annual counts for articles enabled by transformative agreements, focusing on articles and reviews as classified by OpenAlex:

SELECT
  publication_year,
  esac.publisher,
  COUNT(DISTINCT(jct.id)) AS n,
FROM
  `subugoe-collaborative.openbib.jct_articles` AS jct
INNER JOIN
  `subugoe-collaborative.openalex.works` AS oalex
ON
  oalex.doi = jct.doi
INNER JOIN
  `subugoe-collaborative.openbib.jct_esac` AS esac
ON
  esac.id = jct.esac_id
WHERE oalex.type IN ('article', 'review') AND is_paratext = FALSE
GROUP BY
  publication_year,
  publisher
ORDER BY
  publication_year DESC,
  n DESC
bq_df
#> # A tibble: 292 × 3
#>    publication_year publisher                      n
#>               <int> <chr>                      <int>
#>  1             2025 Wiley                      13741
#>  2             2025 Elsevier                   11194
#>  3             2025 Springer Nature             7930
#>  4             2025 Taylor & Francis            6035
#>  5             2025 Sage                        3194
#>  6             2025 Oxford University Press     2700
#>  7             2025 American Chemical Society   1202
#>  8             2025 Royal Society of Chemistry   697
#>  9             2025 American Physical Society    581
#> 10             2025 Cambridge University Press   548
#> # ℹ 282 more rows

Figure 1: Growth of articles enabled by transformative agreements between 2020 and 2024, showing the dominance of the five largest commercial publishers in the scholarly publishing market.

Figure 1 shows the growth of articles enabled by transformative agreements between 2020 and 2024, highlighting the dominance of five major commercial publishers, with Elsevier, Springer Nature and Wiley leading.

How many articles were made open access by transformative agreements?

Transformative agreements vary in structure and implementation. Journal bundles may include open access journals, hybrid journals, and subscription journals, with varying document types allowed and potential limitations on open access article numbers. The following query examines the open access status of articles enabled by transformative agreements:

SELECT
  publication_year,
  esac.publisher,
  oalex.open_access.oa_status,
  COUNT(DISTINCT(jct.id)) AS n,
FROM
  `subugoe-collaborative.openbib.jct_articles` AS jct
INNER JOIN
  `subugoe-collaborative.openalex.works` AS oalex
ON
  oalex.doi = jct.doi
INNER JOIN
  `subugoe-collaborative.openbib.jct_esac` AS esac
ON
  esac.id = jct.esac_id
WHERE oalex.type IN ('article', 'review') AND is_paratext = FALSE
GROUP BY
  publication_year,
  publisher, 
  oalex.open_access.oa_status
ORDER BY
  publication_year DESC,
  n DESC
bq_oa_df
#> # A tibble: 1,243 × 4
#>    publication_year publisher        oa_status     n
#>               <int> <chr>            <chr>     <int>
#>  1             2025 Wiley            hybrid     7761
#>  2             2025 Elsevier         hybrid     5417
#>  3             2025 Springer Nature  hybrid     4682
#>  4             2025 Elsevier         closed     4491
#>  5             2025 Taylor & Francis hybrid     3597
#>  6             2025 Wiley            closed     3326
#>  7             2025 Sage             closed     2495
#>  8             2025 Wiley            gold       2188
#>  9             2025 Taylor & Francis closed     2110
#> 10             2025 Springer Nature  gold       1562
#> # ℹ 1,233 more rows

Figure 2: Articles covered by transformative agreements by open access status

Figure 2 shows open access by business model. The majority of articles were made available in hybrid journals. The notable number of closed articles may reflect matching limitations or complexities of transformative agreements regarding journal inclusion, article caps, and document type restrictions. It may also signal issues with OpenAlex open access tagging.
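
To make the breakdown more tangible, the following Python sketch tallies the ten rows printed above by open access status. It is only an illustration: it covers the displayed 2025 rows, not the full 1,243-row result, so the resulting shares should not be read as overall figures.

```python
from collections import Counter

# The ten rows printed above (publication year 2025): publisher, oa_status, n.
rows = [
    ("Wiley", "hybrid", 7761),
    ("Elsevier", "hybrid", 5417),
    ("Springer Nature", "hybrid", 4682),
    ("Elsevier", "closed", 4491),
    ("Taylor & Francis", "hybrid", 3597),
    ("Wiley", "closed", 3326),
    ("Sage", "closed", 2495),
    ("Wiley", "gold", 2188),
    ("Taylor & Francis", "closed", 2110),
    ("Springer Nature", "gold", 1562),
]

# Sum article counts per open access status across the displayed rows.
totals = Counter()
for _publisher, status, n in rows:
    totals[status] += n

grand_total = sum(totals.values())
shares = {status: n / grand_total for status, n in totals.items()}

for status, n in totals.most_common():
    print(f"{status}: {n} ({shares[status]:.1%})")
```

Within these displayed rows, hybrid is the largest category, followed by closed and gold, consistent with the pattern described for Figure 2.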

Responsible use

While the data presented allow analysis of transformative agreements using bibliometric databases, some shortcomings must be acknowledged. The public Transformative Agreement Data Dumps from the Journal Checker Tool and the ESAC Registry are voluntary, crowd-sourced efforts. The information is subject to change.

Due to limited publicly available invoice data, the article dataset only represents estimates based on first author affiliations according to OpenAlex. Although transformative agreement guidelines typically refer to corresponding authors (data not fully available in OpenAlex), research has shown a strong correlation between first and corresponding authorship at the level of publishers and countries.

Funding information

This work was supported by the Federal Ministry of Education and Research of Germany (BMBF) under grants 16WIK2301E / 16WIK2101F.

References

Achterberg, Inke, and Najko Jahn. 2023. “Introducing the Hybrid Open Access Dashboard (HOAD).” cOAlition S. https://www.coalition-s.org/blog/introducing-the-hybrid-open-access-dashboard-hoad/.
Bakker, Caitlin, Allison Langham-Putrow, and Amy Riegelman. 2024. “Impact of Transformative Agreements on Publication Patterns: An Analysis Based on Agreements from the ESAC Registry.” International Journal of Librarianship 8 (4): 67–96. https://doi.org/10.23974/ijol.2024.vol8.4.341.
de Jonge, Hans, Bianca Kramer, and Jeroen Sondervan. 2025. “Tracking Transformative Agreements Through Open Metadata: Method and Validation Using Dutch Research Council NWO Funded Papers.” MetaArXiv. https://doi.org/10.31222/osf.io/tz6be_v1.
Dér, Ádám. 2025. “What Gets Missed in the Discourse on Transformative Agreements.” Katina Magazine, February. https://doi.org/10.1146/katina-20250212-1.
Haupka, Nick, Jack Culbert, Paul Donner, Najko Jahn, Christopher Lenke, Philipp Mayr, Andreas Meier, et al. 2025. “OPENBIB: Selected Curated Open Metadata Based on OpenAlex.” Kompetenznetzwerk Bibliometrie. https://doi.org/10.5281/zenodo.15308680.
Jahn, Najko. 2025a. “Estimating Transformative Agreement Impact on Hybrid Open Access: A Comparative Large-Scale Study Using Scopus, Web of Science and Open Metadata.” https://arxiv.org/abs/2504.15038.
———. 2025b. “How Open Are Hybrid Journals Included in Transformative Agreements?” Quantitative Science Studies 6 (January): 242–62. https://doi.org/10.1162/qss_a_00348.
Kramer, Bianca. 2024. Study on Scientific Publishing in Europe: Development, Diversity, and Transparency of Costs. Publications Office of the European Union. https://doi.org/10.2777/89349.
Schimmer, Ralf, Kai Geschuhn, and Andreas Vogler. 2015. “Disrupting the Subscription Journals’ Business Model for the Necessary Large-Scale Transformation to Open Access.” Max Planck Digital Library. https://doi.org/10.17617/1.3.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/subugoe/scholcomm_analytics, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Jahn (2025, May 9). Scholarly Communication Analytics: Introducing Open Metadata about Transformative Agreements. Retrieved from https://subugoe.github.io/scholcomm_analytics/posts/openbib_ta_release/

BibTeX citation

@misc{jahn2025introducing,
  author = {Jahn, Najko},
  title = {Scholarly Communication Analytics: Introducing Open Metadata about Transformative Agreements},
  url = {https://subugoe.github.io/scholcomm_analytics/posts/openbib_ta_release/},
  year = {2025}
}