Evaluation of OpenAlex
We assessed whether replacing traditional databases with OpenAlex could save time and resources.
Key message
In evidence syntheses, the aim is for the literature search to identify all relevant studies. A new source for identifying studies is OpenAlex, which is a continuation of the Microsoft Academic Graph (MAG) dataset. OpenAlex collects and makes available records from various sources, including some of the traditional databases we use. We therefore assessed whether replacing traditional databases with OpenAlex could save time and resources.
Our study had two parts. In part 1, we mapped and described studies that had investigated searching for literature in OpenAlex or MAG. In part 2, we compared the searches in evidence syntheses published by the Norwegian Institute of Public Health that had conducted both a traditional search and a search in OpenAlex/MAG.
For part 1 (mapping), we conducted a literature search in September 2023. We included 19 studies. None of the studies compared a traditional search with a search in OpenAlex. For part 2 (comparison), we included three evidence syntheses with 860 included studies. Neither the search in OpenAlex/MAG nor the traditional search identified all the included studies. We found that 24 (3%) of the 860 included studies are not in OpenAlex, and 700 (81%) of the 860 included studies were not identified by the search in OpenAlex/MAG.
We found that OpenAlex cannot be used as the only source if the goal is to identify as many relevant studies as possible. Using OpenAlex in addition to a traditional search will save neither time nor resources.
Summary
Introduction
In 2023, a project group within the Division for Health Services at the Norwegian Institute of Public Health (NIPH) published the document "Aims, findings, and suggested target areas for automation of information retrieval: final report 2022". The project group assessed 82 digital tools with machine learning elements and identified four tools that could potentially enhance, and possibly alter, parts of the search process for evidence syntheses. The group concluded that none of the tools could significantly enhance the search process and suggested continued investigation of automated information retrieval tools for evidence syntheses. The catalog (dataset) OpenAlex, which utilizes machine learning, was mentioned in the report but not evaluated. OpenAlex is searchable from its own website and via EPPI-Reviewer, using either seed articles or a combination of search terms. EPPI-Reviewer, from the EPPI Centre, is an online tool for preparing evidence syntheses. OpenAlex is a continuation of Microsoft Academic Graph (MAG), which was created in 2015 and terminated in 2021.
When developing evidence syntheses, an information specialist (librarian) conducts systematic literature searches to identify as many relevant studies as possible on a given topic. When conducting literature searches for evidence syntheses, we usually prioritize sensitivity. This means tolerating noise (irrelevant records) to reduce the risk of missing relevant studies. Developing evidence syntheses is resource-intensive and might involve screening thousands of records from sensitive search strategies.
OpenAlex collects and makes available records from various sources, including some of the traditional databases we usually search to inform evidence syntheses. OpenAlex also collects records from sources that may contain document types other than those typically found in traditional databases. Therefore, we investigated whether searching OpenAlex instead of traditional databases could save time and resources when conducting searches to inform evidence syntheses.
Objective
We wanted to investigate three questions: whether searching OpenAlex instead of performing a traditional literature search would reduce the time or resources spent on the search process for evidence syntheses; whether OpenAlex contains the studies included in the evidence syntheses we examined; and whether those included studies were identified by the search in OpenAlex.
We aimed to:
- Identify and map studies that have investigated literature searches in OpenAlex/MAG and present the results of these studies.
- Examine published evidence syntheses from the Cluster for reviews and health technology assessments that used two search strategies (a traditional literature search and a search in OpenAlex) to determine 1) whether each of the two independent search strategies identified all the included studies, 2) whether all the included studies are in OpenAlex, 3) the characteristics of included studies not found in OpenAlex, 4) differences in the number of records retrieved by the two search strategies, and 5) differences in precision between the two search strategies.
Part 1: Mapping of previous studies
Method
We conducted a literature search in various sources in September 2023. We included studies that had investigated literature searches in OpenAlex/MAG and that were published in English or a Scandinavian language after 2015.
Results
We included 19 studies published in 20 publications. None of the studies compared searches in OpenAlex/MAG with traditional literature searches the way we did in our study. Of the 19 studies, 17 examined MAG. Fifteen of these reported that MAG has good coverage but cannot be used as the only source if the aim is to identify as many relevant studies as possible. Two concluded that MAG can be used as the only source for a living research map. The remaining two studies are ongoing and will compare searches in OpenAlex with traditional searches.
Discussion and conclusion
We conducted literature searches in many sources and contacted known individuals and organizations involved in conducting evidence syntheses, nationally and internationally. However, it is still possible that we missed relevant studies.
All 17 published studies reported that MAG has good coverage, but only two concluded that MAG can be used as the only source, and then only for a living research map. The other 15 concluded that it cannot be the only source if the aim is to identify as many relevant studies as possible.
Part 2: Comparison of searches in OpenAlex/MAG with traditional searches
Method
We used data from published evidence syntheses that had described and independently conducted both a traditional literature search and a search in OpenAlex/MAG, searched at least two traditional sources, and saved the records from the two search strategies separately. From the included studies in the included evidence syntheses, we extracted the following data: document type, topic, number of included studies found in OpenAlex, number of included studies identified among the records retrieved from the search in OpenAlex, and number of included studies identified among the records retrieved from traditional searches. We used these numbers to calculate the sensitivity of the search in OpenAlex and the precision of both the search in OpenAlex and the traditional search.
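The summary does not spell out the formulas used. As a sketch, assuming the definitions commonly used for search evaluation in evidence syntheses (the wording of the terms below is ours, not taken from the report), sensitivity and precision for a single search strategy can be written as:

```latex
\[
\text{sensitivity} =
  \frac{\text{included studies retrieved by the search}}
       {\text{all included studies in the evidence synthesis}},
\qquad
\text{precision} =
  \frac{\text{included studies retrieved by the search}}
       {\text{all records retrieved by the search}}
\]
```

Under these definitions, a search that retrieves few records can have high precision while still missing most of the included studies, so the two measures need to be read together.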
Results
We included three evidence syntheses with a total of 860 studies included: 802 studies in Ames (2022), 25 studies in Bergsund (2023) and 33 studies in Johansen (2023). Alone, neither the search in OpenAlex/MAG nor the traditional search identified all the included studies. We found that 24 (3%) of the 860 studies were not found in OpenAlex; 700 (81%) of the 860 included studies were not identified among the records retrieved from the search in OpenAlex/MAG. In total, traditional searches yielded 24,870 records, and searches in OpenAlex/MAG yielded 4,308 records. Searches in OpenAlex/MAG had higher precision than traditional searches.
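The percentages above can be reproduced from the reported counts. The following is a minimal sketch in Python; the variable names are ours, and the pooled sensitivity and pooled precision are back-of-envelope figures derived by combining all three evidence syntheses, which may differ from how the report itself calculated them.

```python
# Counts reported in the Results paragraph above.
included_total = 860               # included studies across the three evidence syntheses
not_in_openalex = 24               # included studies with no record in OpenAlex
missed_by_openalex_search = 700    # included studies not retrieved by the OpenAlex/MAG search
records_traditional = 24_870       # records retrieved by the traditional searches
records_openalex = 4_308           # records retrieved by the OpenAlex/MAG searches


def share(part: int, whole: int) -> float:
    """Return part / whole expressed as a percentage."""
    return 100 * part / whole


# Included studies that the OpenAlex/MAG search did retrieve.
found_by_openalex_search = included_total - missed_by_openalex_search

pct_not_in_openalex = share(not_in_openalex, included_total)
pct_missed = share(missed_by_openalex_search, included_total)

# Assumption: pooling the three syntheses into one set. The pooled values are
# dominated by the largest synthesis and are illustrative only.
pooled_sensitivity = share(found_by_openalex_search, included_total)
pooled_precision_openalex = share(found_by_openalex_search, records_openalex)

# How many times more records the traditional searches produced.
record_ratio = records_traditional / records_openalex

print(f"Not in OpenAlex:            {pct_not_in_openalex:.1f}%")
print(f"Missed by OpenAlex search:  {pct_missed:.1f}%")
print(f"Pooled sensitivity:         {pooled_sensitivity:.1f}%")
print(f"Pooled OpenAlex precision:  {pooled_precision_openalex:.1f}%")
print(f"Traditional/OpenAlex ratio: {record_ratio:.2f}")
```

The same calculation cannot be done for the traditional searches from this summary alone, because it does not state how many included studies the traditional searches retrieved.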
Discussion and conclusion
Only three evidence syntheses met our inclusion criteria. All were published before we started to plan this evaluation. We found that neither of the two search strategies (OpenAlex/MAG and traditional literature search) individually identified all the included studies. The two strategies complemented each other, and OpenAlex should not be used as the only source if the aim is to identify as many studies as possible. Because OpenAlex should be used only in combination with traditional sources, searching it will save neither time nor resources. Thus, we cannot recommend changing current practice.