Decreasing affiliation metadata coverage in OpenAlex

Author: Najko Jahn

Published: December 15, 2025

DOI: 10.59350/z3c5x-bfk63

Abstract
This blog post examines the decrease in affiliation metadata coverage in OpenAlex. An analysis of over 13 million articles published by major commercial publishers between 2018 and 2025 suggests that this is probably because publishers have not provided Crossref with sufficient affiliation metadata. However, technical issues also seem to have occurred during the transition to Walden, the new version of OpenAlex. Elsevier, the largest publisher, stands out in particular.

Introduction

Following the release of a new version of OpenAlex called ‘Walden’ in November 2025, the community reported a decrease in affiliation metadata coverage. While investigating country affiliations for articles published in journals included in transformative agreements to compile data for the OA2020 WG Financial flows and cost modelling for the open access transition, I also observed records with missing affiliation metadata, even though OpenAlex provided author information.

This observation is particularly concerning given SUB Göttingen’s work with the Hybrid Open Access Dashboard, where I found that OpenAlex affiliation metadata is comparable to that of Scopus and Web of Science for monitoring open access uptake in hybrid journals (Jahn 2025). Related studies (Alperin et al. 2024; van Eck, Waltman, and Neijssel 2024) have similarly shown that country affiliation coverage in OpenAlex is on par with that in proprietary bibliometric databases, while an increasing number of bibliometric rankings, institutions and transformative agreement monitoring activities (de Jonge, Kramer, and Sondervan 2025) make use of OpenAlex. A decrease in country affiliations would therefore not only make figures about who is publishing in hybrid journals less reliable, but also affect all studies using affiliation metadata from OpenAlex, including institutional monitoring exercises.

This blog post quantifies the lack of affiliation metadata in OpenAlex using a set of journals from major commercial publishers that are part of current transformative agreements of the Swedish BIBSAM consortium. An investigation of more than 13 million records representing articles published between 2018 and 2025 in these journals confirms the decrease. It is partly attributable to some publishers, including Elsevier and Springer Nature, not sharing affiliation metadata via Crossref, and partly to potential technical issues encountered by OpenAlex.

Data and methods

To better understand reports about declining affiliation metadata coverage in OpenAlex, and to relate them to the journal portfolios of major commercial publishers, the current journal portfolio of the Swedish BIBSAM consortium was examined, based on journal data retrieved from the cOAlition S Journal Checker Tool as of 8 December 2025.

Affiliation coverage was compared across two OpenAlex snapshots, the September 2025 snapshot and the most recent Walden snapshot, to investigate whether the decrease is due to issues associated with the recent Walden release.

Both snapshots are publicly accessible via the SUB Göttingen Open Scholarly Data warehouse based on Google BigQuery. The following SQL query shows how the data was retrieved.

SQL code
CREATE OR REPLACE TABLE `subugoe-collaborative.resources.au_data_check`
AS (
  SELECT DISTINCT
    oalex.doi,
    oalex.publication_year,
    oalex.primary_location.source.issn_l AS issn_l,
    jct_jn.esac_id AS esac_id,
    esac.publisher AS esac_publisher,
    oalex.primary_location.source.host_organization_name AS oalex_publisher,
    oalex.primary_location.source.is_in_doaj AS is_in_doaj,
    oalex.open_access.oa_status AS oa_status,
    au.author_position,
    CASE
      WHEN au.author IS NULL THEN 'Empty'
      ELSE 'Filled'
      END AS author_status,
    CASE
      WHEN ARRAY_LENGTH(au.institutions) = 0 THEN 'Empty'
      ELSE 'Filled'
      END AS institutions_status,
    CASE
      WHEN ARRAY_LENGTH(au.countries) = 0 THEN 'Empty'
      ELSE 'Filled'
      END AS countries_status,
    CASE
      WHEN au.is_corresponding IS NULL THEN 'Empty'
      ELSE 'Filled'
      END AS is_corresponding_status,
    CASE
      WHEN au.raw_author_name IS NULL THEN 'Empty'
      ELSE 'Filled'
      END AS raw_author_name_status,
    CASE
      WHEN ARRAY_LENGTH(au.raw_affiliation_strings) = 0 THEN 'Empty'
      ELSE 'Filled'
      END AS raw_affiliation_strings_status,
    CASE
      WHEN ARRAY_LENGTH(au.affiliations) = 0 THEN 'Empty'
      ELSE 'Filled'
      END AS affiliations_status,
    walden_au.author_position AS walden_author_position,
    CASE
      WHEN walden_au.author IS NULL THEN 'Empty'
      ELSE 'Filled'
      END AS walden_author_status,
    CASE
      WHEN ARRAY_LENGTH(walden_au.institutions) = 0 THEN 'Empty'
      ELSE 'Filled'
      END AS walden_institutions_status,
    CASE
      WHEN ARRAY_LENGTH(walden_au.countries) = 0 THEN 'Empty'
      ELSE 'Filled'
      END AS walden_countries_status,
    CASE
      WHEN walden_au.is_corresponding IS NULL THEN 'Empty'
      ELSE 'Filled'
      END AS walden_is_corresponding_status,
    CASE
      WHEN walden_au.raw_author_name IS NULL THEN 'Empty'
      ELSE 'Filled'
      END AS walden_raw_author_name_status,
    CASE
      WHEN ARRAY_LENGTH(walden_au.raw_affiliation_strings) = 0 THEN 'Empty'
      ELSE 'Filled'
      END AS walden_raw_affiliation_strings_status,
    CASE
      WHEN ARRAY_LENGTH(walden_au.affiliations) = 0 THEN 'Empty'
      ELSE 'Filled'
      END AS walden_affiliations_status
  FROM `subugoe-collaborative.openalex.works` AS oalex
  LEFT JOIN UNNEST(oalex.authorships) AS au
  LEFT JOIN UNNEST(au.countries) AS country
  INNER JOIN `subugoe-collaborative.openalex_walden.works` AS walden
    ON oalex.doi = walden.doi
  LEFT JOIN UNNEST(walden.authorships) AS walden_au
  INNER JOIN `subugoe-collaborative.oa2020.jct_jn` AS jct_jn
    ON oalex.primary_location.source.issn_l = jct_jn.issn_l
  INNER JOIN `subugoe-collaborative.oa2020.esac` AS esac
    ON jct_jn.esac_id = esac.id
  INNER JOIN
    `subugoe-collaborative.resources.document_classification_september25`
      AS doctype_classifier
    ON oalex.doi = doctype_classifier.doi
  WHERE
    (
      esac.organization = "Bibsam Consortium"
      AND end_date > '2024-12-31 00:00:00 UTC')
    AND oalex.is_paratext = FALSE
    AND is_research = TRUE
    AND (
      NOT REGEXP_CONTAINS(oalex.biblio.issue, '^[a-zA-Z]')
      OR oalex.biblio.issue IS NULL)
    AND (NOT (REGEXP_CONTAINS(oalex.title, '[0-9]{3} pp.')))
    AND oalex.publication_year BETWEEN 2018 AND 2025
)

To focus on original articles and reviews, paratexts were excluded, and results from the document type classifier provided by the German Competence Network for Bibliometrics (Haupka 2025) were used to refine the selection. To check for potential data loading issues in Google BigQuery, a random sample of records was compared with the raw data dump files.
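The filter conditions of the query can also be illustrated outside BigQuery. The following Python sketch mirrors the WHERE clause, including the SQL behaviour that a missing title or flag drops the row. The function name `is_candidate_article` and the sample values are hypothetical; this is an illustration under these assumptions, not the code used for the analysis.

```python
import re

# Patterns copied from the WHERE clause of the SQL query.
ISSUE_STARTS_WITH_LETTER = re.compile(r"^[a-zA-Z]")
PAGE_COUNT_IN_TITLE = re.compile(r"[0-9]{3} pp.")


def is_candidate_article(issue, title, is_paratext, is_research, publication_year):
    """Return True if a record passes the article filters of the query.

    Hypothetical helper: the arguments mirror the OpenAlex fields and the
    classifier flag referenced in the SQL.
    """
    # `is_paratext = FALSE` and `is_research = TRUE`: missing flags do not pass.
    if is_paratext is not False or is_research is not True:
        return False
    if not 2018 <= publication_year <= 2025:
        return False
    # Issue labels starting with a letter are excluded; a missing issue is kept.
    if issue is not None and ISSUE_STARTS_WITH_LETTER.search(issue):
        return False
    # In SQL, a NULL title makes the condition NULL, so such rows are dropped.
    if title is None or PAGE_COUNT_IN_TITLE.search(title):
        return False
    return True


# Hypothetical records
print(is_candidate_article("3", "A study of X", False, True, 2023))
print(is_candidate_article("S1", "A study of X", False, True, 2023))
```

The first call passes all filters, while the second is rejected because the made-up issue label `S1` starts with a letter.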

The resulting dataset can be found on Google BigQuery. After compiling the article-level data, the absolute and relative coverage of author and affiliation metadata were calculated. Any data point that was not empty was counted as available.
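The counting rule can be sketched in a few lines of Python: authorship-level rows are rolled up to the DOI level, and a DOI counts as covered if at least one of its authorships has a non-empty value. The function name `coverage_by_year`, the row layout and the DOIs are hypothetical, and treating `None` like an empty list is an assumption of this sketch.

```python
from collections import defaultdict


def coverage_by_year(rows, field):
    """Percentage of DOIs per publication year with a non-empty `field`.

    `rows` is an iterable of dicts, one per authorship, with the keys
    "doi", "publication_year" and `field` (a list, possibly empty).
    """
    all_dois = defaultdict(set)
    filled_dois = defaultdict(set)
    for row in rows:
        year = row["publication_year"]
        all_dois[year].add(row["doi"])
        if row.get(field):  # an empty list or None counts as missing
            filled_dois[year].add(row["doi"])
    return {
        year: round(100.0 * len(filled_dois[year]) / len(all_dois[year]), 2)
        for year in sorted(all_dois)
    }


# Hypothetical authorship-level rows (the DOIs are made up).
rows = [
    {"doi": "10.1234/a", "publication_year": 2023, "institutions": ["I1"]},
    {"doi": "10.1234/a", "publication_year": 2023, "institutions": []},
    {"doi": "10.1234/b", "publication_year": 2023, "institutions": []},
    {"doi": "10.1234/c", "publication_year": 2025, "institutions": []},
    {"doi": "10.1234/d", "publication_year": 2025, "institutions": ["I2"]},
]
print(coverage_by_year(rows, "institutions"))
```

Counting distinct DOIs rather than rows keeps articles with many authors from dominating the percentages.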

SQL code
-- Absolute coverage by publication year
SELECT
  publication_year,
  COUNT(DISTINCT doi) AS total_dois,
  
  COUNT(DISTINCT CASE WHEN author_status = 'Filled' THEN doi END) AS with_author,
  COUNT(DISTINCT CASE WHEN institutions_status = 'Filled' THEN doi END) AS with_institutions,
  COUNT(DISTINCT CASE WHEN countries_status = 'Filled' THEN doi END) AS with_countries,
  COUNT(DISTINCT CASE WHEN is_corresponding_status = 'Filled' THEN doi END) AS with_is_corresponding,
  COUNT(DISTINCT CASE WHEN raw_author_name_status = 'Filled' THEN doi END) AS with_raw_author_name,
  COUNT(DISTINCT CASE WHEN raw_affiliation_strings_status = 'Filled' THEN doi END) AS with_raw_affiliation_strings,
  COUNT(DISTINCT CASE WHEN affiliations_status = 'Filled' THEN doi END) AS with_affiliations,
  
  COUNT(DISTINCT CASE WHEN walden_author_status = 'Filled' THEN doi END) AS with_walden_author,
  COUNT(DISTINCT CASE WHEN walden_institutions_status = 'Filled' THEN doi END) AS with_walden_institutions,
  COUNT(DISTINCT CASE WHEN walden_countries_status = 'Filled' THEN doi END) AS with_walden_countries,
  COUNT(DISTINCT CASE WHEN walden_is_corresponding_status = 'Filled' THEN doi END) AS with_walden_is_corresponding,
  COUNT(DISTINCT CASE WHEN walden_raw_author_name_status = 'Filled' THEN doi END) AS with_walden_raw_author_name,
  COUNT(DISTINCT CASE WHEN walden_raw_affiliation_strings_status = 'Filled' THEN doi END) AS with_walden_raw_affiliation_strings,
  COUNT(DISTINCT CASE WHEN walden_affiliations_status = 'Filled' THEN doi END) AS with_walden_affiliations

FROM `subugoe-collaborative.resources.au_data_check`
GROUP BY publication_year
ORDER BY publication_year;

-- Relative coverage (percentage of DOIs) by publication year
WITH doi_level_data AS (
  SELECT 
    doi,
    publication_year,
    MAX(CASE WHEN author_position IS NOT NULL THEN 1 ELSE 0 END) AS has_author_position,
    MAX(CASE WHEN author_status = 'Filled' THEN 1 ELSE 0 END) AS has_author,
    MAX(CASE WHEN institutions_status = 'Filled' THEN 1 ELSE 0 END) AS has_institutions,
    MAX(CASE WHEN countries_status = 'Filled' THEN 1 ELSE 0 END) AS has_countries,
    MAX(CASE WHEN is_corresponding_status = 'Filled' THEN 1 ELSE 0 END) AS has_is_corresponding,
    MAX(CASE WHEN raw_author_name_status = 'Filled' THEN 1 ELSE 0 END) AS has_raw_author_name,
    MAX(CASE WHEN raw_affiliation_strings_status = 'Filled' THEN 1 ELSE 0 END) AS has_raw_affiliation_strings,
    MAX(CASE WHEN affiliations_status = 'Filled' THEN 1 ELSE 0 END) AS has_affiliations,
    MAX(CASE WHEN walden_author_position IS NOT NULL THEN 1 ELSE 0 END) AS has_walden_author_position,
    MAX(CASE WHEN walden_author_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_author,
    MAX(CASE WHEN walden_institutions_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_institutions,
    MAX(CASE WHEN walden_countries_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_countries,
    MAX(CASE WHEN walden_is_corresponding_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_is_corresponding,
    MAX(CASE WHEN walden_raw_author_name_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_raw_author_name,
    MAX(CASE WHEN walden_raw_affiliation_strings_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_raw_affiliation_strings,
    MAX(CASE WHEN walden_affiliations_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_affiliations
  FROM `subugoe-collaborative.resources.au_data_check`
  GROUP BY doi, publication_year
)

SELECT 
  publication_year,
  COUNT(DISTINCT doi) AS total_dois,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_author_position = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS author_position_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_author = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS author_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_institutions = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS institutions_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_countries = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS countries_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_is_corresponding = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS is_corresponding_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_raw_author_name = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS raw_author_name_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_raw_affiliation_strings = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS raw_affiliation_strings_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_affiliations = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS affiliations_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_author_position = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_author_position_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_author = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_author_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_institutions = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_institutions_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_countries = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_countries_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_is_corresponding = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_is_corresponding_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_raw_author_name = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_raw_author_name_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_raw_affiliation_strings = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_raw_affiliation_strings_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_affiliations = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_affiliations_pct
FROM doi_level_data
GROUP BY publication_year
ORDER BY publication_year;

-- Absolute coverage by publication year and publisher
SELECT
  publication_year,
  esac_publisher,
  COUNT(DISTINCT doi) AS total_dois,
  
  COUNT(DISTINCT CASE WHEN author_status = 'Filled' THEN doi END) AS with_author,
  COUNT(DISTINCT CASE WHEN institutions_status = 'Filled' THEN doi END) AS with_institutions,
  COUNT(DISTINCT CASE WHEN countries_status = 'Filled' THEN doi END) AS with_countries,
  COUNT(DISTINCT CASE WHEN is_corresponding_status = 'Filled' THEN doi END) AS with_is_corresponding,
  COUNT(DISTINCT CASE WHEN raw_author_name_status = 'Filled' THEN doi END) AS with_raw_author_name,
  COUNT(DISTINCT CASE WHEN raw_affiliation_strings_status = 'Filled' THEN doi END) AS with_raw_affiliation_strings,
  COUNT(DISTINCT CASE WHEN affiliations_status = 'Filled' THEN doi END) AS with_affiliations,
  
  COUNT(DISTINCT CASE WHEN walden_author_status = 'Filled' THEN doi END) AS with_walden_author,
  COUNT(DISTINCT CASE WHEN walden_institutions_status = 'Filled' THEN doi END) AS with_walden_institutions,
  COUNT(DISTINCT CASE WHEN walden_countries_status = 'Filled' THEN doi END) AS with_walden_countries,
  COUNT(DISTINCT CASE WHEN walden_is_corresponding_status = 'Filled' THEN doi END) AS with_walden_is_corresponding,
  COUNT(DISTINCT CASE WHEN walden_raw_author_name_status = 'Filled' THEN doi END) AS with_walden_raw_author_name,
  COUNT(DISTINCT CASE WHEN walden_raw_affiliation_strings_status = 'Filled' THEN doi END) AS with_walden_raw_affiliation_strings,
  COUNT(DISTINCT CASE WHEN walden_affiliations_status = 'Filled' THEN doi END) AS with_walden_affiliations

FROM `subugoe-collaborative.resources.au_data_check`
GROUP BY publication_year, esac_publisher
ORDER BY publication_year, esac_publisher;

-- Relative coverage (percentage of DOIs) by publisher and publication year
WITH doi_level_data AS (
  SELECT 
    doi,
    publication_year,
    esac_publisher,
    MAX(CASE WHEN author_position IS NOT NULL THEN 1 ELSE 0 END) AS has_author_position,
    MAX(CASE WHEN author_status = 'Filled' THEN 1 ELSE 0 END) AS has_author,
    MAX(CASE WHEN institutions_status = 'Filled' THEN 1 ELSE 0 END) AS has_institutions,
    MAX(CASE WHEN countries_status = 'Filled' THEN 1 ELSE 0 END) AS has_countries,
    MAX(CASE WHEN is_corresponding_status = 'Filled' THEN 1 ELSE 0 END) AS has_is_corresponding,
    MAX(CASE WHEN raw_author_name_status = 'Filled' THEN 1 ELSE 0 END) AS has_raw_author_name,
    MAX(CASE WHEN raw_affiliation_strings_status = 'Filled' THEN 1 ELSE 0 END) AS has_raw_affiliation_strings,
    MAX(CASE WHEN affiliations_status = 'Filled' THEN 1 ELSE 0 END) AS has_affiliations,
    MAX(CASE WHEN walden_author_position IS NOT NULL THEN 1 ELSE 0 END) AS has_walden_author_position,
    MAX(CASE WHEN walden_author_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_author,
    MAX(CASE WHEN walden_institutions_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_institutions,
    MAX(CASE WHEN walden_countries_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_countries,
    MAX(CASE WHEN walden_is_corresponding_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_is_corresponding,
    MAX(CASE WHEN walden_raw_author_name_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_raw_author_name,
    MAX(CASE WHEN walden_raw_affiliation_strings_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_raw_affiliation_strings,
    MAX(CASE WHEN walden_affiliations_status = 'Filled' THEN 1 ELSE 0 END) AS has_walden_affiliations
  FROM `subugoe-collaborative.resources.au_data_check`
  GROUP BY doi, publication_year, esac_publisher
)

SELECT 
  publication_year,
  esac_publisher,
  COUNT(DISTINCT doi) AS total_dois,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_author_position = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS author_position_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_author = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS author_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_institutions = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS institutions_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_countries = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS countries_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_is_corresponding = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS is_corresponding_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_raw_author_name = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS raw_author_name_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_raw_affiliation_strings = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS raw_affiliation_strings_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_affiliations = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS affiliations_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_author_position = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_author_position_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_author = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_author_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_institutions = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_institutions_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_countries = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_countries_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_is_corresponding = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_is_corresponding_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_raw_author_name = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_raw_author_name_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_raw_affiliation_strings = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_raw_affiliation_strings_pct,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN has_walden_affiliations = 1 THEN doi END) / COUNT(DISTINCT doi), 2) AS walden_affiliations_pct
FROM doi_level_data
GROUP BY esac_publisher, publication_year
ORDER BY esac_publisher, publication_year;

Summary tables were created for the entire dataset and by publisher journal portfolio for subsequent analyses. The resulting summary tables can be downloaded here:

All:

By publisher

Results

Overall, this investigation covers 13,462,902 journal articles published between 2018 and 2025 in journals covered by transformative agreements the BIBSAM consortium negotiated with 25 publishers, including the largest commercial publishers Elsevier, Springer Nature, and Wiley.

Figure 1 – Availability of institution entity in OpenAlex works by snapshot. Note that only journals included in current transformative agreements negotiated by the Swedish BIBSAM consortium were investigated.

Figure 1 shows the affiliation metadata availability in OpenAlex by year and snapshot. Figure 1A highlights a massive decrease from 2024 onwards. According to the September 2025 snapshot, 95% of records published in 2023 (n = 1,665,864) contained specific institution metadata, whereas this share decreased to 61% (n = 924,034) for 2025.

In Walden the decrease is even more pronounced: while 84% of records published in 2023 (n = 1,475,406) contained specific institution metadata, the share decreased to 54% (n = 816,018) for 2025. Overall, for the same set of articles published between 2018 and 2025, the Walden release contained 1,276,633 fewer articles with affiliation metadata than the previous snapshot.
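A net difference between two snapshots can be decomposed into records that lost and records that gained affiliation metadata. The following Python sketch shows this with two small, made-up sets of DOIs; the function name `compare_snapshots` is hypothetical, and only the net figure is reported in this post, so the split into lost and gained is illustrative.

```python
def compare_snapshots(previous_with_affiliation, walden_with_affiliation):
    """Compare two sets of DOIs that carry institution metadata."""
    lost = previous_with_affiliation - walden_with_affiliation
    gained = walden_with_affiliation - previous_with_affiliation
    return {
        "lost": len(lost),
        "gained": len(gained),
        # A negative value means fewer covered articles in the newer snapshot.
        "net_change": len(walden_with_affiliation) - len(previous_with_affiliation),
    }


# Hypothetical DOIs with institution metadata in each snapshot.
previous = {"10.1234/a", "10.1234/b", "10.1234/c"}
walden = {"10.1234/a", "10.1234/d"}
print(compare_snapshots(previous, walden))
```

In this toy example two records lose their metadata and one gains it, which yields a net change of -1.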

Comparing the availability of raw affiliation strings (Panel B) and author metadata (Panel C) shows that the decrease relates to availability of affiliation metadata, but does not extend to data about authors.

By publisher

A closer look at the distribution by publisher shows that Elsevier was particularly affected. Figure 2 shows the absolute number of OpenAlex records by publisher and snapshot, with the three largest publishers distinguished by color from the others. While the number of articles published in Wiley journals with institution metadata remains stable across snapshots, the number of Elsevier articles with affiliation metadata decreased across all years. For Springer Nature the situation was mixed, with a decrease in 2024, followed by an increase for articles published in 2025 after the introduction of Walden.

Figure 2 – Availability of institution entity in OpenAlex works by publisher and snapshot. Note that only journals included in current transformative agreements negotiated by the Swedish BIBSAM consortium were investigated.

A potential explanation is that Elsevier and Springer Nature do not share affiliation metadata via Crossref while Wiley does, according to the Crossref participation report (see Elsevier, Springer Nature, Wiley). However, for the same set of Elsevier articles, the decrease in records with affiliation metadata was even greater following the release of Walden, and it spans all publication years. A noticeable decrease can also be observed in the September 2025 version, beginning with the publication year 2024. This suggests that OpenAlex is having issues with providing affiliation metadata for Elsevier articles, in particular after the Walden release.
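Whether a publisher deposited affiliations for an individual article can be spot-checked against the Crossref REST API, whose work records list authors with an `affiliation` array. The following Python sketch is such a check under stated assumptions: the function names and the sample records are hypothetical, and a single-record check does not replace the aggregated participation reports.

```python
import json
import urllib.parse
import urllib.request


def fetch_crossref_work(doi):
    """Fetch the Crossref metadata record (the "message" object) for a DOI."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def crossref_has_affiliation(message):
    """True if at least one author in a Crossref record has an affiliation."""
    return any(author.get("affiliation") for author in message.get("author", []))


# Hypothetical records shaped like the "message" object of the API response.
with_affiliation = {
    "author": [
        {"family": "Doe", "affiliation": [{"name": "Example University"}]},
        {"family": "Roe", "affiliation": []},
    ]
}
without_affiliation = {"author": [{"family": "Doe", "affiliation": []}]}

print(crossref_has_affiliation(with_affiliation))
print(crossref_has_affiliation(without_affiliation))
```

For a real article, `crossref_has_affiliation(fetch_crossref_work(doi))` would perform the same check on live data, which requires network access.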

A closer look into Elsevier

Figure 3 shows the relative decrease in coverage over the years for Elsevier. In the most recent Walden snapshot, OpenAlex recorded institution metadata for just 6.3% of articles published in 2025 in Elsevier journals. Elsevier thus contributes the most to the overall decrease, although other publishers are also affected. The September 2025 version also shows a marked decline from 2024 onwards. For earlier publication years, the September 2025 OpenAlex snapshot recorded affiliation metadata for 94% of Elsevier articles published in 2023 and as much as 96% for the publication year 2020. This trend aligns with the availability of raw affiliation strings in OpenAlex.

Figure 3 – Elsevier: Availability of institution entity in OpenAlex works by snapshot. Note that only journals included in the current transformative agreement negotiated by the Swedish BIBSAM consortium were investigated.

Looking ahead

To account for possible changes since the Walden snapshot release, 500 Elsevier articles published in 2025 without affiliation metadata were queried using the OpenAlex API on 15 December. The results show that around one-fourth of the articles now have affiliation metadata recorded in OpenAlex, so the situation is likely to change with the next release. However, this percentage is close to that reported for the September 2025 snapshot, which is still much lower than for previous years and only comprises a third of articles from this major publisher published in 2025.
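Such a re-check against the live API can be sketched in Python as follows. The sketch assumes the OpenAlex works endpoint with a `doi` filter, the `select` parameter and `|` as the OR operator; the batch size of 50, the function names and the sample records are assumptions of this illustration rather than the exact procedure used here.

```python
import json
import urllib.parse
import urllib.request


def build_openalex_url(dois):
    """Build a works query for a batch of DOIs joined with the OR operator."""
    if not 1 <= len(dois) <= 50:  # conservative batch size assumed here
        raise ValueError("expected between 1 and 50 DOIs per request")
    params = {
        "filter": "doi:" + "|".join(dois),
        "select": "doi,authorships",
        "per-page": "50",
    }
    return "https://api.openalex.org/works?" + urllib.parse.urlencode(params)


def fetch_works(dois):
    """Fetch one batch of works from the OpenAlex API (needs network access)."""
    with urllib.request.urlopen(build_openalex_url(dois), timeout=30) as response:
        return json.load(response)["results"]


def has_institutions(work):
    """True if at least one authorship of a work lists an institution."""
    return any(a.get("institutions") for a in work.get("authorships") or [])


def share_with_institutions(works):
    """Fraction of works with institution metadata (0.0 for an empty list)."""
    if not works:
        return 0.0
    return sum(has_institutions(work) for work in works) / len(works)


# Hypothetical API results; the DOIs and institution IDs are made up.
sample = [
    {"doi": "https://doi.org/10.1234/a", "authorships": [{"institutions": [{"id": "I1"}]}]},
    {"doi": "https://doi.org/10.1234/b", "authorships": [{"institutions": []}]},
    {"doi": "https://doi.org/10.1234/c", "authorships": []},
    {"doi": "https://doi.org/10.1234/d", "authorships": [{"institutions": []}]},
]
print(share_with_institutions(sample))
```

In the sample, one of four records carries institution metadata, so the share is 0.25.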

Discussion and conclusion

Analysing the coverage of affiliation metadata in OpenAlex revealed a substantial decrease, presumably due to some publishers not providing affiliation metadata to Crossref, as well as potential technical issues within OpenAlex itself, which can partly be attributed to the transition to Walden. In particular, articles published in 2025 lacked affiliation metadata in Crossref. Elsevier, the largest publisher in our sample, is the most affected.

The observed decrease is critical for numerous institutional and country-specific research and monitoring activities making use of OpenAlex. In the case of the Hybrid Open Access Dashboard, I must caution users against relying on this data from 2024 onwards. I will carefully observe developments and hope that this issue can be resolved, which is why I have written this long-form issue description. Everyone is free to replicate these findings, as the two latest OpenAlex snapshots used for this analysis are publicly available on Google BigQuery, as provided by SUB Göttingen.

References

Alperin, Juan Pablo, Jason Portenoy, Kyle Demes, Vincent Larivière, and Stefanie Haustein. 2024. “An Analysis of the Suitability of OpenAlex for Bibliometric Analyses.” https://arxiv.org/abs/2404.17663.
de Jonge, Hans, Bianca Kramer, and Jeroen Sondervan. 2025. “Tracking Transformative Agreements Through Open Metadata: Method and Validation Using Dutch Research Council NWO Funded Papers.” Quantitative Science Studies 6: 1215–27. https://doi.org/10.1162/qss.a.24.
Haupka, Nick. 2025. “Presenting a Classifier to Detect Research Contributions in OpenAlex.” https://arxiv.org/abs/2507.22479.
Jahn, Najko. 2025. “Estimating Transformative Agreement Impact on Hybrid Open Access: A Comparative Large-Scale Study Using Scopus, Web of Science and Open Metadata.” Scientometrics. https://doi.org/10.1007/s11192-025-05390-3.
van Eck, Nees Jan, Ludo Waltman, and Mark Neijssel. 2024. “Launch of the CWTS Leiden Ranking Open Edition 2024,” October. https://doi.org/10.59350/r512t-r8h93.

Citation

BibTeX citation:
@article{jahn2025,
  author = {Jahn, Najko},
  title = {Decreasing Affiliation Metadata Coverage in {OpenAlex}},
  journal = {Scholarly Communication Analytics},
  date = {2025-12-15},
  url = {https://subugoe.github.io/scholcomm_analytics/posts/openalex_affiliation_md/},
  doi = {10.59350/z3c5x-bfk63},
  langid = {en},
  abstract = {This blog post examines the decrease in affiliation
    metadata coverage in OpenAlex. An analysis of over 13 million
    articles published by major commercial publishers between 2018 and
    2025 suggests that this is probably because publishers have not
    provided Crossref with sufficient affiliation metadata. However,
    technical issues also seem to have occurred during the transition to
    Walden, the new version of OpenAlex. Elsevier, the largest
    publisher, stands out in particular.}
}
For attribution, please cite this work as:
Jahn, Najko. 2025. “Decreasing Affiliation Metadata Coverage in OpenAlex.” Scholarly Communication Analytics, December. https://doi.org/10.59350/z3c5x-bfk63.