About Hybrid Open Access Dashboards (HOAD)

Open source data analytics to measure hybrid open access through transformative agreements

Aim

As library consortia increasingly negotiate transformative agreements, the goal of HOAD is to highlight the extent to which hybrid journals that are part of these agreements have transitioned to full open access.

Our resources can help libraries and consortia to model for a fully open access future and derive key insights such as:

  • How much content in hybrid journals is already published openly under a Creative Commons license?
  • What is the distribution of Germany’s output in hybrid journals compared to that of other countries?
  • Do publishers make metadata openly available and if not, where are the gaps?

HOAD is open source and licensed under CC0. We believe that in order to overcome cost barriers and the growing ownership of data analytics by academic publishers, Open Access monitoring needs to be as open and reproducible as possible.

Data and methods

To engage with everyone, our open source dashboards combine open data from multiple sources as follows:

  • Journals included in transformative agreements were obtained from the cOAlition S Transformative Agreements Public Data (July 2021, July 2022, May 2023, and December 2023 snapshots). Agreements with German consortia were derived from Germany’s Open Access Monitor (March 2022). We enriched both journal data sources with ISSN-L, a linking identifier for ISSN variants. Then, we excluded fully open access journals from the cOAlition S Transformative Agreements Public Data by cross-checking with several datasets of fully open access journals: the Directory of Open Access Journals (DOAJ), Bielefeld GOLD OA and OpenAlex.
  • Crossref is our main data source for determining the publication volume in hybrid journals including the uptake of Open Access through Creative Commons licenses. The ESAC guidelines, a community recommendation for negotiating transformative agreements, require publishers to make key metadata elements publicly available through Crossref, such as Creative Commons licensing information for Open Access articles. As a Crossref Metadata Plus subscriber, we used monthly Crossref snapshots to derive publication data.
  • OpenAlex is our data source for determining the country affiliation of lead authors. A lead author is the first named author of a scholarly article who has usually carried out most of the research presented in the article, although author roles can vary across disciplines. We used OpenAlex monthly snaphots as a data source to determine lead author affiliations.
  • To highlight potential gaps between Crossref open licensing metadata and information provided via journal webpages, we used OpenAlex as a complementary data source for Open Access articles in hybrid journals.

This compiled data is openly available through an R package, hoaddata. hoaddata contains not only datasets about hybrid open access. It also includes code (mostly SQL) used to compile the data by connecting to our cloud-based Google Big Query data warehouse, where we store big scholarly data from Crossref, OpenAlex and Unpaywall. hoaddata is automatically built and updated per month using GitHub Actions, a continuous integration service.

Dashboard implementation

HOAD is made with Quarto, an open source tool for scientific and technical publishing. Quarto documents contain text and analytical source code, in our case R, and are rendered to HTML. More specifically, we created parameterised reports from which Quarto generates publisher-specific views. The resulting website is published on GitHub pages.

As statistical tools, we used a bunch of tidyverse packages. Visualisations were made interactive through ggiraph and plotly. Interactive tables were implemented using reactable. These packages have great documentation with lots of examples that has helped us to build HOAD 😀.

Limitations

Our work should be used in a responsible way. Limitations include:

  • Journal-level data associated with publishers and transformative agreements are highly dynamic and are subject of change.
  • Although we were able to exclude most full open access journals from our data, there are a few subscription-only journals in our dataset that do not offer open access publishing options.
  • Crossref metadata does not distinguish between article types. We used an extended version of the Unpwaywall paratext recognition approach to exclude non-scientific content from our data. However, the annual publication volume of some journals may be inflated by proceedings supplements.
  • OpenAlex, from which we obtained our affiliation data, is a relatively new scholarly data source. There is ongoing activity to improve the parsing, normalisation, and disambiguation of institutions (see here). We used pattern matching to increase the recall of country affiliations.

To better understand the strengths and limitations of our data work, we commissioned a study from the Deutsche Zentrum für Hochschul- und Wissenschaftsforschung. The aim is to compare our open data with information contained in the proprietary bibliometrics database Scopus in relation to hybrid open access. Key findings will be made available.

Funding

HOAD is a scholarly communication analytics project funded by the German Research Foundation (DFG) to support the implementation of consortial transformative agreements in Germany. The work is carried out by the scholarly communication analytics team at the Göttingen State and University Library (SUB Göttingen).

Contribute

This dashboard has been developed in the open using open tools. There are a number of ways you can help make the dashboard better:

  • If you don’t understand something or find a bug, please let us know and submit an issue.
  • Do you have an idea of what meaningful data analysis is missing? Please let everyone know and submit an issue.
  • Want to provide more context about your agreement? Please email us!
  • Let us know if we can help you to provide a detailed view of your consortial transformative agreements. We are not limited to Germany and can extend our dashboards to other countries and consortia. Please email us!

Contributors

SUB Göttingen Team: Dr. Inke Achterberg, Nick Haupka, Dr. Anne Hobert, Najko Jahn, Dr. Birgit Schmidt.

License

This work is license under CC0.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute these materials in any form, for any purpose, commercial or non-commercial, and by any means.

HOAD reflects our current understanding and is likely to change. We work hard to keep the dashboards and the underlying data up to date and accurate, but we do not accept any liability in relation to any errors or omissions.

This work re-used the following dataset:

Pollack, Philipp; Lindstrot, Barbara; Barbers, Irene, Stanzel, Franziska, 2022, “Open Access Monitor: Zeitschriftenlisten (V2)”, https://doi.org/10.26165/JUELICH-DATA/VTQXLM

published under CC BY 4.0.

Contact

Feel free to contact Najko Jahn .