Changes in evidence for green open access in Scopus

In March 2024, Scopus announced changes to its open access tagging policy to better align with the Unpaywall definitions. In this blog post, I examine the impact of the policy change by comparing three Scopus snapshots that share a corpus of around 20 million records. Although the overall share of open access did not change, the analysis found a decrease in evidence for copies in repositories (green open access), affecting about 2 million items, that cannot be explained by changes in the Unpaywall data.

Sophia Dörner (Göttingen State and University Library, https://www.sub.uni-goettingen.de/)
2024-12-16

On 23 March 2024, Scopus announced changes to its open access (OA) tagging policy to better align with the definitions of Unpaywall, the OA evidence source that Scopus uses for its bibliometric database. According to the announcement, these changes affected approximately 2 million items with publisher-provided OA, whose OA status changed from bronze to hybrid or gold, or from hybrid to gold. Scopus also stated that no items previously tagged as OA lost any OA status tags, and that no articles published in closed access wrongly received an OA tag.

In this blog post, I investigate the extent of the OA tagging policy update using three Scopus snapshots provided by the German Competence Network of Bibliometrics. Using a shared corpus of more than 19 million items, I contrasted the situations before and after the policy changes came into effect. The results confirm shifts between the tags for publisher-provided OA. However, a total of 1,879,531 items lost their green open access tags. Checking a sample against the Unpaywall API suggests that these changes were made only on the Scopus side.

Data and Method

To analyse the effects of open access tagging policy changes in Scopus, I retrieved 19,519,565 items indexed in Scopus that were published between 2019 and 2023. Data were obtained from the German Competence Network of Bibliometrics using the Scopus April 2024, July 2024 and October 2024 snapshots. The three snapshots were matched by Scopus item ID to build a shared corpus.
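Matching the snapshots amounts to an inner join on the Scopus item ID: only items present in all three snapshots enter the shared corpus. The Python sketch below illustrates this under a hypothetical data layout (a dict from item ID to the list of raw OA tags for that item); the function name and toy IDs are illustrative assumptions, not the code from the GitHub repository.

```python
from typing import Dict, List

# Hypothetical layout: Scopus item ID -> list of raw open access tags.
Snapshot = Dict[str, List[str]]


def shared_item_ids(*snapshots: Snapshot) -> set:
    """Return the Scopus item IDs present in every snapshot (an inner join on ID)."""
    if not snapshots:
        return set()
    ids = set(snapshots[0])
    for snap in snapshots[1:]:
        ids &= set(snap)  # keep only IDs that also occur in this snapshot
    return ids


# Toy snapshots with made-up IDs
april = {"1": ["publisherfree2read"], "2": [], "3": ["repositoryam"]}
july = {"1": ["publisherfullgold"], "2": [], "4": ["repositoryvor"]}
october = {"1": ["publisherfullgold"], "2": ["repositoryam"], "3": []}

print(sorted(shared_item_ids(april, july, october)))  # ['1', '2']
```

Items "3" and "4" are dropped because each is missing from at least one snapshot.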

Data preparation also involved renaming the open access tags provided by Scopus to allow for comparison across the snapshots.

| Scopus open access tag | Renamed value |
| --- | --- |
| publisherfullgold | gold |
| publisherhybridgold | hybrid |
| publisherfree2read | bronze |
| repositoryvor | green (vor) |
| repositoryam | green (am) |
| NULL | none |

In some cases, items had no open access tag in one or more of the investigated snapshots; these entries had NULL values in the open access status column. They were relabelled as none so that they were retained during data transformation and analysis. The two green open access tags distinguish the manuscript version available in the repository: vor stands for version of record and am for accepted manuscript.
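The relabelling can be sketched in Python as a lookup table plus a rule for NULL values. This is a minimal illustration assuming each item's raw tags arrive as a list of strings, or None when the database value is NULL; the function name relabel is hypothetical, and the actual preparation code on GitHub may work differently.

```python
from typing import List, Optional

# Mapping from the raw Scopus open access tags to the labels used in this post.
TAG_LABELS = {
    "publisherfullgold": "gold",
    "publisherhybridgold": "hybrid",
    "publisherfree2read": "bronze",
    "repositoryvor": "green (vor)",
    "repositoryam": "green (am)",
}


def relabel(raw_tags: Optional[List[str]]) -> List[str]:
    """Rename raw Scopus tags; a missing (NULL) value becomes ["none"].

    An unexpected raw tag raises KeyError, so it is noticed rather than dropped.
    """
    if not raw_tags:  # None (NULL) or an empty list: no OA tag recorded
        return ["none"]
    return [TAG_LABELS[tag] for tag in raw_tags]


print(relabel(["publisherhybridgold", "repositoryam"]))  # ['hybrid', 'green (am)']
print(relabel(None))  # ['none']
```

Keeping NULL as an explicit none label means items without OA evidence still appear as a category when tags are compared across snapshots.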

Results

Overall, Scopus recorded open access tags for approximately 46% of the items in the shared corpus, all published between 2019 and 2023. Table 1 shows that the number and proportion of items with open access tags remain almost unchanged across the three snapshots.

Table 1: Scopus records representing journal articles published between 2019 and 2023 with open access evidence across three different database snapshots.
| Snapshot | Records with OA tag | Share (in %) |
| --- | ---: | ---: |
| April 2024 | 8,975,035 | 45.98 |
| July 2024 | 8,959,215 | 45.90 |
| October 2024 | 8,943,508 | 45.82 |
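Each share in Table 1 is the number of records with an OA tag divided by the 19,519,565 items in the shared corpus. The Python snippet below simply recomputes the shares from the published counts as a consistency check; it uses no data beyond what the table reports.

```python
# Recompute the "Share (in %)" column of Table 1 from the reported counts.
TOTAL_ITEMS = 19_519_565  # size of the shared corpus

records_with_oa_tag = {
    "April 2024": 8_975_035,
    "July 2024": 8_959_215,
    "October 2024": 8_943_508,
}

shares = {
    snapshot: round(count / TOTAL_ITEMS * 100, 2)
    for snapshot, count in records_with_oa_tag.items()
}

for snapshot, share in shares.items():
    print(f"{snapshot}: {share:.2f} %")

# Net change in tagged records between the first and the last snapshot
print(records_with_oa_tag["April 2024"] - records_with_oa_tag["October 2024"])  # 31527
```

The net decline of 31,527 tagged records is small relative to the corpus, which is why the overall share barely moves.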

Figure 1 illustrates the distribution of open access tags in each of the three snapshots. The number of items with a bronze open access tag declined between the April and July snapshots, while the number of items with a gold open access tag increased. These changes are in line with the Scopus announcement. However, the number of items tagged as green open access also dropped between the April and July snapshots, and this drop was not offset by an increase in any other OA type.

Figure 1: Open access tag distribution in descending order.

To explore possible shifts between OA categories, Figure 2 shows the flows between snapshots, i.e. how the tags assigned to each item changed from one snapshot to the next. Like Unpaywall, Scopus assigns multiple open access tags to a single item if more than one open access location was found. For the 19,519,565 items analysed here, the number of open access tags per item varied between zero and three; nodes labelled Missing represent items whose number of tags differed between snapshots.
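The flows can be thought of as counts of tag transitions per item. The Python sketch below treats the sorted combination of tags an item carries as its state and counts state changes between two snapshots. This is an assumed simplification with a hypothetical data layout and function name; it does not reproduce the Missing nodes of Figure 2, which depend on how the original code aligned tags when their number differed.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical layout: item ID -> tuple of relabelled tags in one snapshot.
Tags = Tuple[str, ...]


def tag_flows(before: Dict[str, Tags], after: Dict[str, Tags]) -> Counter:
    """Count (tags before, tags after) transitions for items in both snapshots.

    Tags are sorted so that the same combination is always counted as one state,
    regardless of the order in which the tags were stored.
    """
    flows: Counter = Counter()
    for item_id in before.keys() & after.keys():
        state_before = tuple(sorted(before[item_id]))
        state_after = tuple(sorted(after[item_id]))
        flows[(state_before, state_after)] += 1
    return flows


# Toy data: item 1 moves from bronze to gold, item 2 loses its green (am) tag
april = {"1": ("bronze",), "2": ("hybrid", "green (am)"), "3": ("green (vor)",)}
july = {"1": ("gold",), "2": ("hybrid",), "3": ("green (vor)",)}

for (src, dst), n in tag_flows(april, july).items():
    print(src, "->", dst, n)
```

Summing the counts whose source state contains a bronze tag and whose target state contains gold would give a figure comparable to the bronze-to-gold flow discussed next.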

Figure 2: Open Access Tag Comparison of Scopus April 24, July 24 and October 24 snapshots

There was a notable change regarding publisher-provided OA: 386,876 items with a bronze open access tag in the April snapshot received a gold open access tag in the July snapshot, accounting for most of the changes in bronze status. This suggests an improved identification of full OA journals.

Substantial changes can also be observed for repository-provided OA. Comparing the April and July snapshots (see the green highlighted flows on the left in Figure 2), a total of 1,329,606 items with one of the green open access tags in the April snapshot lost this status in the July snapshot. Between the July and October snapshots (see the green highlighted flows on the right in Figure 2), the corresponding number is 549,925.
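Counting items that lost green evidence between two snapshots amounts to checking, for each shared item, whether any green tag was present before and none is present after. A minimal Python sketch under that assumed definition follows; the function names and toy data are hypothetical.

```python
from typing import Dict, Iterable

GREEN_TAGS = {"green (vor)", "green (am)"}


def has_green(tags: Iterable[str]) -> bool:
    """True if at least one of the item's tags is a green open access tag."""
    return any(tag in GREEN_TAGS for tag in tags)


def count_lost_green(before: Dict[str, list], after: Dict[str, list]) -> int:
    """Count items with a green tag in `before` but no green tag in `after`."""
    return sum(
        1
        for item_id in before.keys() & after.keys()
        if has_green(before[item_id]) and not has_green(after[item_id])
    )


# Toy data: only item 1 loses its green tag (item 3 never had one)
april = {"1": ["gold", "green (am)"], "2": ["green (vor)"], "3": ["bronze"]}
july = {"1": ["gold"], "2": ["green (vor)"], "3": ["gold"]}
print(count_lost_green(april, july))  # 1

# The two counts reported above add up to the total given in the introduction.
print(1_329_606 + 549_925)  # 1879531
```

Under this definition, an item that keeps a publisher-based tag (like item 1) still counts as having lost green evidence, which matches the focus on repository copies rather than on overall OA status.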

Although, according to Scopus, items did not lose their open access status through the changes to the tagging policy, my analysis found a decrease in evidence for copies in repositories, as indicated by the tags green (am) and green (vor).

To better understand this difference, I took two samples of 10,000 DOIs each, one of items that lost green open access status between the April and July snapshots and one of items that lost it between the July and October snapshots, and queried the Unpaywall API to retrieve open access status information for these DOIs.
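A minimal Python sketch of such a lookup, using Unpaywall's public v2 REST endpoint (https://api.unpaywall.org/v2/ followed by the DOI, with a required email parameter). It assumes that a repository copy is indicated by an entry in oa_locations whose host_type is "repository", and that the primary status is the record's oa_status field. The helper names are hypothetical, and this is not necessarily how the samples in this post were processed.

```python
import json
import urllib.parse
import urllib.request


def fetch_unpaywall(doi: str, email: str) -> dict:
    """Fetch the Unpaywall v2 record for one DOI (the API requires a contact email)."""
    url = (
        "https://api.unpaywall.org/v2/"
        + urllib.parse.quote(doi)
        + "?"
        + urllib.parse.urlencode({"email": email})
    )
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def summarise(record: dict) -> dict:
    """Extract the two properties compared in this post from an Unpaywall record."""
    locations = record.get("oa_locations") or []
    return {
        # Any OA location hosted by a repository counts as a repository copy.
        "has_repository_copy": any(
            loc.get("host_type") == "repository" for loc in locations
        ),
        # oa_status is Unpaywall's single, primary OA status for the DOI.
        "green_is_primary": record.get("oa_status") == "green",
    }


def sample_shares(records: list) -> dict:
    """Percentage of (non-empty) sampled records with each property."""
    summaries = [summarise(record) for record in records]
    return {
        key: 100 * sum(s[key] for s in summaries) / len(summaries)
        for key in ("has_repository_copy", "green_is_primary")
    }


# Offline example with a hand-made record; for a live lookup, call
# fetch_unpaywall(doi, "you@example.org") for each sampled DOI instead.
example = {
    "oa_status": "bronze",
    "oa_locations": [{"host_type": "publisher"}, {"host_type": "repository"}],
}
print(summarise(example))  # {'has_repository_copy': True, 'green_is_primary': False}
```

Applying sample_shares to the records fetched for one sample yields the two percentages reported for that sample.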

For the sample of items that lost green tags between the April and July snapshots, Unpaywall recorded a green open access version for 85% of the items and assigned green as the primary open access status to 3.4%. The sample from the July and October comparison shows a similar result: Unpaywall found a repository copy for 85% of the items and assigned green as the primary open access status to 2.5%.
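The gap between the share of items with a repository copy and the share with green as primary status follows from how Unpaywall reports a single oa_status per DOI: green only applies when no publisher-hosted OA version exists, so publisher-based statuses take precedence. The Python sketch below illustrates that precedence with a hypothetical helper; it is a simplification of Unpaywall's actual classification, not its implementation.

```python
from typing import Iterable

# Assumed precedence: publisher-hosted statuses outrank a repository copy.
PRIORITY = ["gold", "hybrid", "bronze", "green"]


def primary_status(statuses: Iterable[str]) -> str:
    """Return the highest-priority status among an item's OA locations, else 'closed'."""
    found = set(statuses)
    for status in PRIORITY:
        if status in found:
            return status
    return "closed"


print(primary_status(["green", "bronze"]))  # bronze
print(primary_status(["green"]))  # green
print(primary_status([]))  # closed
```

An item with both a bronze publisher copy and a repository copy is therefore reported as bronze, even though Unpaywall still lists the repository location.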

Although the share of articles for which Unpaywall identified a repository copy decreased slightly between the two comparisons, Unpaywall tracked a repository copy in most cases. This suggests that the absence of green OA tags in Scopus cannot be explained by changes in the Unpaywall data.

Furthermore, after manually checking the raw data used to populate the bibliometrics database of the German Competence Network of Bibliometrics and the Scopus online database, I was unable to detect any errors that could explain the decline in green open access evidence in Scopus.

Discussion

Recent changes to the Scopus open access tagging policy are not fully consistent with the Scopus documentation. An examination of a shared corpus of around 20 million records representing journal articles published between 2019 and 2023 suggests that green open access evidence was removed from Scopus after the policy changes were introduced, affecting around 2 million records. This decline cannot be explained by changes in Unpaywall, the source of open access evidence used by Scopus. Although the overall share of open access remained nearly constant across the three snapshots examined, analyses of open access that include green open access need to take these changes into account.

Code availability

The code used for data preparation, analysis and visualisation is available on GitHub.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/subugoe/scholcomm_analytics, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Dörner (2024, Dec. 16). Scholarly Communication Analytics: Changes in evidence for green open access in Scopus. Retrieved from https://subugoe.github.io/scholcomm_analytics/posts/scopus_oa_tagging_changes/

BibTeX citation

@misc{dörner2024changes,
  author = {Dörner, Sophia},
  title = {Scholarly Communication Analytics: Changes in evidence for green open access in Scopus},
  url = {https://subugoe.github.io/scholcomm_analytics/posts/scopus_oa_tagging_changes/},
  year = {2024}
}