In March 2024, Scopus announced changes to its open access tagging policy to better align with the Unpaywall definitions. In this blog post, I examine the impact of the policy change by comparing three Scopus snapshots, comprising around 20 million records. Although the overall share of open access did not change, the analysis found a decrease in the number of copies in repositories, affecting about 2 million items, that cannot be explained by the Unpaywall changes.
On 23 March 2024 Scopus announced changes to its open access (OA) tagging policy to better align with the definitions of Unpaywall, the OA evidence source that Scopus uses for its bibliometric database. According to the announcement, these changes affected approximately 2 million items relative to publisher-provide OA where the OA status has changed from bronze to hybrid or gold, or from hybrid to gold. Despite this, Scopus claims that no items previously tagged as OA lost any OA status tags, or that articles published in closed access wrongly received an OA tag.
In this blogpost, I investigate the extent of the OA tagging policy update using three different Scopus snapshots provided by the German Competence Network of Bibliometrics. Using a shared corpus of more than 19 million items, I contrasted the situations before and after the policy changes came into effect. Results confirm shifts between tags relative to publisher-provided OA. However, a total of 1,879,531 items lost evidence about green open access tags. Checking a sample against the Unpaywall API suggests that these changes were only made on the Scopus side.
To analyse the effects of open access tagging policy changes in Scopus, I retrieved 19,519,565 items indexed in Scopus that were published between 2019 and 2023. Data were obtained from the German Competence Network of Bibliometrics using the Scopus April 2024, July 2024 and October 2024 snapshots. The three snapshots were matched by Scopus item ID to build a shared corpus.
Data preparation also involved labelling the Scopus provided open access tags to allow for comparision across the snapshots.
Scopus provided open access tag | renamed value |
---|---|
publisherfullgold | gold |
publisherhybridgold | hybrid |
publisherfree2read | bronze |
repositoryvor | green (vor) |
repositoryam | green (am) |
NULL | none |
In some cases items did not have any open access tag assigned in one or several of the investigated snapshots.
The respective entries had NULL
values in the open access status column.
Those were renamed to none in order to keep the values during data transformation and analysis.
The differentiation between the two available green open access tags indicates the manuscript version published, where vor stands for version of record and am stands for accepted manuscript.
Overall, Scopus recorded open access tags for approximately 46% of the publications it indexed between 2019 and 2023. The following table shows that the number and proportion of open access is consistent across the three snapshots.
Snapshot | Records with OA tag | Share (in %) |
---|---|---|
April 24 | 8,975,035 | 45.98 |
July 24 | 8,959,215 | 45.90 |
October 24 | 8,943,508 | 45.82 |
Figure 1 illustrates the distribution of open access tags for each of the three snapshots. Item numbers with a bronze open access tag declined between the April and the July snapshots, while the number of items with a gold open access tag increased. These changes are in accordance with the Scopus announcement. However, the number of items tagged as green open access dropped between the April and July snapshots without this being offset by any other OA type.
To explore possible shifts between OA categories, Figure 2 shows the flows between snapshots, i.e. the number of tags assigned between snapshots. Like Unpaywall, Scopus assigns multiple open access tags to a single item in case more than one open access location could be found. For the 19,519,565 items analysed here, the number of open access tags per item varied between none and three; nodes titled Missing indicate items where this number of tags varied between snapshots for the same item.
There was a notable change relative to publisher-provided OA: 386,876 items with a bronze open access tag in the April snapshot were tagged with a gold open access tag in the July snapshot, which accounts for most of the bronze status changes. This suggests an improved identification of full OA journals.
But also substantial changes regarding repository-provided OA can be observed: Comparing the April and July snapshots (see green highlighted flows on the left in Figure 2), a total of 1,329,606 items with one of the green open access tags in the April snapshot lost this status in the July snapshot. This number decreases to 549,925 when comparing the July and October snapshots (see green highlighted flows on the right in Figure 2).
Although items did not lost their open access status through the changes of the tagging policy, my analysis found a decrease of evidence for copies in repositories as indicate by the tags green (am)
and green (vor)
.
To better understand this difference, I took two samples of 10,000 DOIs representing items loosing green open access status between the April and July or July and October snapshots and queried the Unpaywall API to retrieve open access status information for these DOIs.
Between the April and July snapshots, Unpaywall recorded a green open access version for 85%. Furthermore, Unpaywall assigned green as primary open access status to 3.4%. Comparing the July and October snapshots samples shows a similar result: Unpaywall found a repository copy for 85%. Here, Unpaywall assigned green as primary open access status to 2.5% of the investigated items.
Although the number of articles for which Unpaywall identified a repository copy decreased slightly between the two comparisons, in most cases Unpaywall tracked a copy in a repository. The analysis suggests that the underlying reason for the absence of green OA tags in Scopus cannot be explained by changes in the Unpaywall data.
Furthermore, after manually checking the raw data used to populate the bibliometrics database of the German Competence Network of Bibliometrics and the Scopus online database, I was unable to detect any errors that could explain the decline in green open access evidence in Scopus.
Recent changes to the Scopus open access tagging policy are not fully consistent with Scopus documentation. Examining a shared corpus of around 20 million records representing journal articles between 2019 and 2023, the results suggest that green open access evidence was removed from Scopus after the policy changes were introduced, affecting around 2 million records. This decline cannot be explained by changes to Unpaywall, the source of open access evidence used by Scopus. Although the overall share of open access remained constant over the three snapshots examined, analyses of open access that include green open access need to take these changes into account.
The code used for data preparation, analysis and visualisation is available on GitHub.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/subugoe/scholcomm_analytics, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Dörner (2024, Dec. 16). Scholarly Communication Analytics: Changes in evidence for green open access in Scopus. Retrieved from https://subugoe.github.io/scholcomm_analytics/posts/scopus_oa_tagging_changes/
BibTeX citation
@misc{dörner2024changes, author = {Dörner, Sophia}, title = {Scholarly Communication Analytics: Changes in evidence for green open access in Scopus}, url = {https://subugoe.github.io/scholcomm_analytics/posts/scopus_oa_tagging_changes/}, year = {2024} }