Background

Rich metadata ensure that open access (OA) articles can be found and reused. To manage OA publication fees and negotiate transformative agreements, it is therefore crucial to monitor metadata compliance (Borrego et al., 2020; Geschuhn & Stone, 2017; Marques et al., 2019).

metacheck lets you automatically check the quality of publisher-provided metadata for OA journal articles. The tool mainly targets libraries and consortia that manage OA publishing funds and negotiate transformative agreements.

metacheck focuses on DOIs from the registration agency Crossref. Crossref is used by all major publishers (Hendricks et al., 2020). As a consequence, Crossref is a key data source for leading open access discovery services like Unpaywall (Piwowar et al., 2018) and open access transparency initiatives like Open APC (Jahn & Tullney, 2016; Pieper & Broschinski, 2018).

metacheck uses Crossref Metadata Plus. A service-level agreement ensures performant access to the Crossref REST API.
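
As a rough illustration of what a request against the Crossref REST API looks like, the following Python sketch builds the URL and headers for a single DOI without sending anything. The function name `crossref_request` and its parameters are hypothetical and not part of metacheck. The sketch assumes that Metadata Plus subscribers authenticate with a `Crossref-Plus-API-Token` header, and the token value is a placeholder.

```python
from urllib.parse import quote


def crossref_request(doi, plus_token=None, mailto=None):
    """Build (url, headers) for a Crossref works lookup; nothing is sent."""
    # DOIs contain a slash between prefix and suffix, so keep "/" unescaped.
    url = "https://api.crossref.org/works/" + quote(doi.strip(), safe="/")
    headers = {}
    if plus_token:
        # Assumption: Metadata Plus subscribers pass their token like this.
        headers["Crossref-Plus-API-Token"] = f"Bearer {plus_token}"
    if mailto:
        # Identifying yourself is part of Crossref's API etiquette.
        headers["User-Agent"] = f"metadata-check-sketch (mailto:{mailto})"
    return url, headers


url, headers = crossref_request("10.1002/leap.1347", plus_token="PLACEHOLDER")
```

The returned pair can be handed to any HTTP client.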

For more details on compliance checking of licensing information in Crossref in the context of OA transformative agreements, see Voigt (2020).

Example

Pretest

You have provided 284 DOIs.

We first check whether these DOIs could have compliant open access metadata. DOIs that do not meet one or more of the following criteria are excluded from the analysis below.

  • 284 (100%) thereof fulfill the criterion not_missing: DOIs must not be missing values. (0 dropped.)
  • 284 (100%) thereof fulfill the criterion unique: DOIs must be unique. (0 dropped.)
  • 284 (100%) thereof fulfill the criterion within_limits: Number of DOIs must be within the allowed limit. (0 dropped.)
  • 284 (100%) thereof fulfill the criterion doi_org_found: DOIs must be resolvable on DOI.org. (0 dropped.)
  • 284 (100%) thereof fulfill the criterion from_cr: DOIs must have been registered by the Crossref registration agency (RA). (0 dropped.)
  • 284 (100%) thereof fulfill the criterion cr_md: DOIs must have metadata on Crossref. (0 dropped.)
  • 284 (100%) thereof fulfill the criterion article: DOIs must resolve to a journal article. (0 dropped.)
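
The first three criteria need no network access, so they can be sketched in a few lines of Python. This is only an illustration of the idea, not metacheck's implementation: the function name `pretest`, the report layout, and the limit of 1000 DOIs are assumptions made for the example.

```python
from typing import Iterable, Optional

MAX_DOIS = 1000  # hypothetical limit; the real limit is not stated here


def pretest(dois: Iterable[Optional[str]], limit: int = MAX_DOIS) -> dict:
    """Apply the local criteria in order and count how many DOIs each drops."""
    remaining = list(dois)
    dropped = {}

    # not_missing: drop None and blank strings.
    kept = [d for d in remaining if d is not None and d.strip()]
    dropped["not_missing"] = len(remaining) - len(kept)
    remaining = kept

    # unique: DOIs are case-insensitive, so compare lower-cased values
    # and keep the first occurrence.
    seen = set()
    kept = []
    for d in remaining:
        key = d.strip().lower()
        if key not in seen:
            seen.add(key)
            kept.append(d.strip())
    dropped["unique"] = len(remaining) - len(kept)
    remaining = kept

    # within_limits: keep only the first `limit` DOIs.
    kept = remaining[:limit]
    dropped["within_limits"] = len(remaining) - len(kept)

    return {"dois": kept, "dropped": dropped}


result = pretest(["10.1000/a", None, "10.1000/A", " ", "10.1000/b"], limit=1)
```

The remaining criteria (doi_org_found, from_cr, cr_md, article) require lookups against DOI.org and Crossref and are left out of the sketch.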

Test Results

The following analysis covers only the remaining 284 DOIs.

Overview

The table below summarises the test results.

Test            Articles  Share
CC License           284   100%
Compliant CC         213    75%
TDM Support          133    47%
Funder info          188    66%
ORCID                200    70%
Open Abstracts       231    81%
Open Citations       275    97%

The table shows absolute counts and relative shares.
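
The relative share is simply the count divided by the number of DOIs that passed the pretest, rounded to a whole percent. A minimal Python sketch that reproduces the shares above from the counts (the function name `share_table` is made up for this example):

```python
def share_table(counts, total):
    """Return (label, count, percent) rows, percent rounded to a whole number."""
    if total <= 0:
        raise ValueError("total must be positive")
    return [(label, n, round(100 * n / total)) for label, n in counts.items()]


# Counts taken from the overview table; 284 DOIs passed the pretest.
overview = {
    "CC License": 284,
    "Compliant CC": 213,
    "TDM Support": 133,
    "Funder info": 188,
    "ORCID": 200,
    "Open Abstracts": 231,
    "Open Citations": 275,
}
rows = share_table(overview, total=284)
```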

Tests include:

  • CC License: Availability of Creative Commons (CC) licenses
  • Compliant CC: Availability of compliant Creative Commons (CC) licenses
  • TDM Support: Support for text and data mining (TDM)
  • Funder info: Information regarding bodies funding the publication
  • ORCID: Open Researcher and Contributor ID for authors
  • Open Abstracts: Open Abstracts metadata
  • Open Citations: Open Citations metadata

Below are the results for each test in greater detail.

Creative-Commons Licenses

Open content licenses are essential to govern access to and re-use of open access journal articles.

metacheck normalizes Creative Commons licenses and maps them to the different variants in use such as CC BY. We also check if license statements are compliant.
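
To give an idea of what such a normalization can look like, here is a small Python sketch that maps a license URL to its Creative Commons variant. It is an assumption-laden illustration rather than metacheck's actual logic: the name `normalize_cc` and the regular expression are ours, and the pattern only covers the six standard license variants (not, for example, CC0).

```python
import re
from typing import Optional

# Matches http(s)://creativecommons.org/licenses/<variant>/..., where
# <variant> is by, by-sa, by-nd, by-nc, by-nc-sa or by-nc-nd.
CC_PATTERN = re.compile(
    r"^https?://(?:www\.)?creativecommons\.org/licenses/"
    r"(by(?:-nc)?(?:-nd|-sa)?)(?:/|$)",
    re.IGNORECASE,
)


def normalize_cc(url: str) -> Optional[str]:
    """Return the lower-cased CC variant for a license URL, or None."""
    match = CC_PATTERN.match(url.strip())
    return match.group(1).lower() if match else None
```

For instance, `normalize_cc("http://creativecommons.org/licenses/by/4.0/")` yields `"by"`, while a publisher-specific license URL yields `None`.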

Used CC Licenses

The table below shows the number and share of articles by Creative Commons variant.

License   Articles  Share
by             253    89%
by-nc           25     9%
by-nc-nd         4     1%
by-sa            2     1%

Many research funders recommend CC BY, in particular those supporting the Plan S Principles. An NA row, where present, indicates articles for which no license metadata was found.

Compliance of CC Licensing Information

The table below displays the number and share of articles with compliant license metadata.

Result                                                                                             Articles  Share
All fine!                                                                                               213    75%
Difference between publication date and the CC license's start_date suggests delayed OA provision        65    23%
No Creative Commons license metadata found for version of record                                          6     2%

We consider license metadata compliant if and only if:

  • the license URL represents a Creative Commons license URL,
  • the Creative Commons license refers to the publisher version, the so-called “version-of-record”, and
  • the Creative Commons license came into effect at the date of publication (exclusion of delayed OA).
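
The three conditions can be sketched as a small Python function that classifies the `license` entries of one Crossref record. This is a simplified illustration, not metacheck's code: the function name and the status labels are hypothetical, and we assume that a `delay-in-days` value of 0 is an adequate proxy for "came into effect at the date of publication".

```python
def license_status(licenses):
    """Classify one article's Crossref `license` list against the criteria."""
    # 1. Keep only entries whose URL points to a Creative Commons license.
    cc = [
        lic for lic in licenses
        if "creativecommons.org/licenses/" in lic.get("URL", "").lower()
    ]
    if not cc:
        return "no_cc"

    # 2. The license must apply to the version of record ("vor").
    vor = [lic for lic in cc if lic.get("content-version") == "vor"]
    if not vor:
        return "no_cc_vor"

    # 3. Assumption: no delay between publication and license start.
    if any(lic.get("delay-in-days") == 0 for lic in vor):
        return "compliant"
    return "delayed"


example = [{
    "URL": "https://creativecommons.org/licenses/by/4.0/",
    "content-version": "vor",
    "delay-in-days": 0,
}]
status = license_status(example)
```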

We noted that while more and more publishers provide Creative Commons license metadata through Crossref, some metadata is still incomplete. This can negatively affect the discovery and re-use of open access articles. If you observe license metadata issues, we strongly encourage you to contact the publisher.

Text and Data Mining (TDM)

The table below summarises which file formats (MIME types) are supported for text and data mining (TDM).

Format                              Articles  Share
application/pdf                          130    46%
application/xml                           14     5%
text/html                                115    40%
text/plain                                 3     1%
text/xml                                   3     1%
No TDM links for version of record       151    53%

Crossref metadata can include more than one format for each DOI. Therefore, percentages can add to more than 100%.
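
The following Python sketch shows why the shares can exceed 100%: each article is counted once for every distinct format it offers. It assumes Crossref-style records whose `link` entries carry `content-type`, `content-version` and `intended-application` fields; the function name `tdm_format_counts` and the `"none"` label are made up for the example.

```python
from collections import Counter


def tdm_format_counts(records):
    """Count articles per TDM format; an article may count for several."""
    counts = Counter()
    for rec in records:
        formats = {
            link.get("content-type")
            for link in rec.get("link", [])
            if link.get("intended-application") == "text-mining"
            and link.get("content-version") == "vor"
        }
        formats.discard(None)
        if formats:
            counts.update(formats)  # one count per distinct format
        else:
            counts["none"] += 1     # no TDM link for the version of record
    return counts


records = [
    {"link": [
        {"content-type": "application/pdf", "content-version": "vor",
         "intended-application": "text-mining"},
        {"content-type": "application/xml", "content-version": "vor",
         "intended-application": "text-mining"},
    ]},
    {"link": []},
]
counts = tdm_format_counts(records)
```

With these two records, each of the three rows has a count of 1, so the shares add up to 150%.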

We recommend that publishers provide not only PDF files for TDM purposes but also XML, to promote automated re-use.

Funding Information

The funding context of research articles is becoming increasingly important in open access monitoring. A growing number of publishers use Crossref to share such information.

The table below displays the share of DOIs with such funding information, along with the three most frequent funders among the DOIs submitted.

Funder                                       Articles  Share
Deutsche Forschungsgemeinschaft                    94    33%
Technische Universität Berlin                      30    11%
Bundesministerium für Bildung und Forschung        12     4%
Other funders                                     100    35%
No funding info                                    96    34%

Here again, Crossref metadata can include more than one funder for each DOI, and the percentages can add to more than 100%.
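
A table of this shape could be derived roughly as in the Python sketch below. It is an illustration under assumptions, not metacheck's implementation: the helper names are hypothetical, records are assumed to carry a Crossref-style `funder` list with a `name` field, and "Other funders" is taken to mean articles with at least one funder outside the most frequent ones.

```python
from collections import Counter


def _names(rec):
    """Distinct funder names attached to one record."""
    return {f["name"] for f in rec.get("funder", []) if f.get("name")}


def funder_rows(records, top_n=3):
    """Return (label, article count) rows for the top funders and the rest."""
    per_funder = Counter()
    for rec in records:
        per_funder.update(_names(rec))  # one count per article and funder
    top = [name for name, _ in per_funder.most_common(top_n)]
    rows = [(name, per_funder[name]) for name in top]
    # Assumption: an article counts here if any of its funders is not in `top`.
    rows.append(("Other funders",
                 sum(1 for rec in records if _names(rec) - set(top))))
    rows.append(("No funding info",
                 sum(1 for rec in records if not _names(rec))))
    return rows


records = [
    {"funder": [{"name": "A"}, {"name": "B"}]},
    {"funder": [{"name": "A"}]},
    {"funder": []},
    {},
]
rows = funder_rows(records, top_n=1)
```

Here the first record is counted both under funder A and under "Other funders", which is why the row counts sum to 5 for only 4 records.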

Please note that, because funding information is still incompletely covered in Crossref, the results may not be suitable for a comprehensive analysis of the funding context of publications.

References

Borrego, Á., Anglada, L., & Abadal, E. (2020). Transformative agreements: Do they pave the way to open access? Learned Publishing, 34(2), 216–232. https://doi.org/10.1002/leap.1347

Geschuhn, K., & Stone, G. (2017). It’s the workflows, stupid! What is required to make “offsetting” work for the open access transition. Insights: The UKSG Journal, 30(3), 103–114. https://doi.org/10.1629/uksg.391

Hendricks, G., Tkaczyk, D., Lin, J., & Feeney, P. (2020). Crossref: The sustainable source of community-owned scholarly metadata. Quantitative Science Studies, 1(1), 414–427. https://doi.org/10.1162/qss_a_00022

Jahn, N., & Tullney, M. (2016). A study of institutional spending on open access publication fees in Germany. PeerJ, 4, e2323. https://doi.org/10.7717/peerj.2323

Marques, M., Woutersen-Windhouwer, S., & Tuuliniemi, A. (2019). Monitoring agreements with open access elements: Why article-level metadata are important. Insights: The UKSG Journal, 32. https://doi.org/10.1629/uksg.489

Pieper, D., & Broschinski, C. (2018). OpenAPC: A contribution to a transparent and reproducible monitoring of fee-based open access publishing across institutions and nations. Insights: The UKSG Journal, 31. https://doi.org/10.1629/uksg.439

Piwowar, H., Priem, J., Larivière, V., Alperin, J. P., Matthias, L., Norlander, B., Farley, A., West, J., & Haustein, S. (2018). The state of OA: A large-scale analysis of the prevalence and impact of open access articles. PeerJ, 6, e4375. https://doi.org/10.7717/peerj.4375

Voigt, M. (2020). DEAL Open-Access-Option optimal nutzen: Ein Bibliothekspraxisbericht. LIBREAS. Library Ideas, 38. https://libreas.eu/ausgabe38/voigt/