The Open Doors of Open Science: Discover OpenAlex

Reading time: 7 minutes

Introduction

There is no doubt that academic research is an exceptionally resource-intensive endeavor. Researchers face a long and demanding process: selecting a topic, formulating a research problem, choosing the appropriate methodology, searching for and processing the literature, collecting and analyzing data, publishing results, and managing the administrative burdens associated with research — all of which require meticulous planning and countless hours of work.
At the same time, researchers must adhere to professional and ethical standards, meet the expectations of funding agencies, and navigate the current trends in science communication — not to mention the constant pressure to maximize visibility and impact.
It is clear, therefore, that there is an urgent need for tools and infrastructure that can help mitigate this “tsunami of expectations” weighing on researchers.

What Exactly Is OpenAlex?

OpenAlex is an open-source platform that collects scholarly research data and makes it accessible, supporting many of the tasks mentioned above. Its primary goal is to make scientific research more transparent and accessible to researchers, funders, and the general public alike. As such, OpenAlex fits naturally within the Open Science movement and its objectives. The platform’s data can be used to track research progress, identify emerging trends, and measure research outcomes, making it a versatile tool for a wide range of scholarly activities.

The History of OpenAlex

Launched in January 2022 as the successor to the Microsoft Academic Graph (MAG), OpenAlex inherited a massive dataset containing 209 million scientific publications and 213 million authors. Its name pays tribute to the famous Library of Alexandria, symbolizing the preservation and sharing of knowledge. Initially, the platform was accessible exclusively through a REST API, yet within just one year it had already reached 1 million registered users. The web interface was recently introduced, built directly on the API, offering the fastest and easiest way to access OpenAlex. All data in the OpenAlex database is available under a CC0 license, meaning it can be freely used and distributed without restrictions. The scale of the available data is best illustrated by the following recent statistics:

Figure 1. – OpenAlex statistics in December 2024. Source: https://openalex.org/stats

Source of Data

OpenAlex provides free access to an immense amount of data. But where does all this information come from? And can we consider these sources reliable?
The platform aggregates and standardizes data from several major projects, with its two primary data sources being the Microsoft Academic Graph (MAG) and Crossref. In addition to these, several other key sources contribute to the richness of the database, including:

  • ORCID (unique researcher identifiers),
  • ROR (Research Organization Registry),
  • DOAJ (Directory of Open Access Journals),
  • Unpaywall (open access content),
  • PubMed and PubMed Central,
  • The ISSN International Centre,
  • Internet Archive,
  • Web crawling (data collection via bots),
  • Subject-specific and institutional repositories (from arXiv to Zenodo).

As of December 2024, OpenAlex indexes 260,810 distinct data sources, making it a comprehensive and ever-growing resource for scientific research.

Figure 2. – The way data is collected and published in OpenAlex. Source: https://help.openalex.org/hc/en-us/articles/24397285563671-About-the-data

Data Credibility

The sources listed above are generally considered verified and reliable databases. The only potential exception is the data gathered through web crawling, because such data is harder to validate and verify than data from well-established sources. OpenAlex employs both manual and automated processes to ensure the quality of its data. It removes incorrect or incomplete entries from the database and continually expands its dataset with new data from additional sources.

Is It Competing?

The database of OpenAlex can be considered “young,” which makes it difficult to directly compare with those of other providers. However, in terms of raw numbers, OpenAlex does not fall short compared to subscription-based competitors. According to its own claims, its coverage is on par with other major players in the market. One of its significant advantages is that it integrates data from multiple sources, including grey literature and preprints, helping to offer a more comprehensive view. In addition to this, OpenAlex provides analytical features, allowing users to uncover trends, compare researchers and institutions, and even explore potential funding sources.

Figure 3. – Number of documents indexed per database (million). Data retrieval date: WoS: August 2024; Scopus: December 2024; Dimensions: December 2024; OpenAlex: December 2024.

Citation Quantity and Quality

When it comes to the quantity and quality of recorded citations, OpenAlex holds its ground. Naturally, the citation data indexed by “rival databases” overlaps to some extent with the data captured by OpenAlex. It’s also possible that certain services (primarily Google Scholar) show more extensive citation lists for particular publications. However, the citation numbers provided by OpenAlex can be trusted to the same degree as those from Web of Science or Scopus.

Figure 4. – Aggregated figures for the number of references to documents indexed in each database. Data retrieval date: OpenCitations:*; Dimensions: December 2024; WoS: August 2024; Scopus: July 2024; OpenAlex: December 2024.
* Data from the OpenCitations website, undated.

What is OpenAlex’s Biggest Competitive Advantage?

Its free accessibility and the linguistic diversity of its sources! In 2023, about 20% of the indexed content in Scopus was in languages other than English, whereas OpenAlex contained approximately 30% non-English content. As of December 2024, OpenAlex contains around 187.5 million English-language documents, and the total dataset includes approximately 261.9 million records. This makes OpenAlex a powerful tool not only for English-language research but also for a broader, more diverse global academic community.
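The “approximately 30%” figure can be cross-checked against the two December 2024 counts quoted above. The following is a small sketch of that arithmetic, under the assumption that every record not counted as English (including records with no language tag at all) is treated as non-English:

```python
# Figures quoted in the text (December 2024), in millions of records.
total_records = 261.9
english_records = 187.5

# Assumption: everything that is not counted as English is non-English,
# including records that carry no language tag.
non_english_records = total_records - english_records
non_english_share = non_english_records / total_records

print(f"Non-English records: {non_english_records:.1f} million")
print(f"Non-English share: {non_english_share:.1%}")
```

With these inputs the share comes out at roughly 28%, which is consistent with the rounded figure in the text.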

The Database Interface

On the OpenAlex website (https://openalex.org), you’ll encounter a minimalist yet elegant GUI that provides a clean and user-friendly experience.

Figure 5. – OpenAlex search interface

Upon clicking the search box that appears, you can search for various entities within the database:

  • Works (scientific articles, books, datasets, dissertations, etc.)
  • Authors (authors of the “Works”)
  • Sources (journals and repositories that contain the “Works”)
  • Institutions (organizations to which the “Authors” are affiliated)
  • Topics (topics assigned to individual “Works”)
  • Publishers (publishers and other organizations that release the “Works”)
  • Funders (organizations that fund the research)
  • GEO (a key filtering option in OpenAlex, allowing geographic/regional grouping of data)
  • Concepts (previously used instead of “Topics,” no longer maintained)
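Because the web interface is built on the API, most of the entity types above also have a counterpart there. The sketch below only builds search URLs and sends nothing. The paths and the `search` and `per-page` parameters are written as assumptions about the public API at https://api.openalex.org, and GEO is left out because it is a grouping option rather than an entity sketched here.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.openalex.org"

# Entity names from the list above mapped to assumed API paths.
ENTITY_PATHS = {
    "Works": "works",
    "Authors": "authors",
    "Sources": "sources",
    "Institutions": "institutions",
    "Topics": "topics",
    "Publishers": "publishers",
    "Funders": "funders",
    "Concepts": "concepts",
}


def search_url(entity, query, per_page=25):
    """Build a search URL for one entity type without sending the request."""
    params = urlencode({"search": query, "per-page": per_page})
    return f"{API_ROOT}/{ENTITY_PATHS[entity]}?{params}"


print(search_url("Works", "open science"))
```

Opening the printed URL in a browser is one quick way to compare the raw API answer with what the search box shows.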

As you type your search query, you will automatically receive suggestions based on the categories listed above:

Figure 6. – The search box and suggestions — by default, the search will query multiple entity sets

After running the search, we can view and organize the results and reports. Here, we can modify and filter our search query. There are 44 different filters to choose from, and in the Stats section, we can request statistics based on more than 30 criteria. Additionally, the displayed reports can be further refined by clicking on the values of individual slicing fields.

Figure 7. – Search results list with statistics and the possibility to add additional search terms
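The filtering and statistics described above have rough equivalents in the API. The sketch below assumes that filters are passed as comma-separated `key:value` pairs in a `filter` parameter and that statistics correspond to a `group_by` parameter. The field names used (`publication_year`, `open_access.is_oa`, `type`) are examples, not the full set of filters offered in the interface.

```python
from typing import Optional
from urllib.parse import urlencode


def works_url(filters: dict, group_by: Optional[str] = None) -> str:
    """Combine several filters (joined with commas) and an optional grouping."""
    params = {"filter": ",".join(f"{key}:{value}" for key, value in filters.items())}
    if group_by:
        params["group_by"] = group_by
    # safe=":," keeps the filter readable instead of percent-encoding it.
    return "https://api.openalex.org/works?" + urlencode(params, safe=":,")


url = works_url({"publication_year": "2024", "open_access.is_oa": "true"}, group_by="type")
print(url)
```

Grouping returns counts per value rather than a list of works, which is the same kind of data the Stats section exports.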

The result set can be saved in txt (WoS format), ris (EndNote format), and CSV (for importing into spreadsheets) formats. The data from the Stats section can also be saved in CSV format, allowing us to create custom charts later on.

We also have the possibility to view citation data:

Figure 8. – Publication list of an author, in descending order by cited by count

We can examine this data in more detail:

Figure 9. – Details of a publication

Finally, we can request a detailed list of citations for any given publication. This list can be further filtered using the previously mentioned slicing options available in the Stats section.

Figure 10. – List of publications citing a selected publication
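The two citation views shown above can also be approximated through the API. This is a sketch that assumes the `author.id` and `cites` filters and the `cited_by_count:desc` sort key; the identifiers `A0000000000` and `W0000000000` are hypothetical placeholders, not real records.

```python
from urllib.parse import urlencode

WORKS = "https://api.openalex.org/works"


def author_works_by_citations(author_id):
    """Works of one author, most cited first (compare Figure 8)."""
    params = {"filter": f"author.id:{author_id}", "sort": "cited_by_count:desc"}
    return WORKS + "?" + urlencode(params, safe=":")


def citing_works(work_id):
    """Works that cite the given work (compare Figure 10)."""
    return WORKS + "?" + urlencode({"filter": f"cites:{work_id}"}, safe=":")


# Hypothetical placeholder identifiers, for illustration only.
print(author_works_by_citations("A0000000000"))
print(citing_works("W0000000000"))
```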

In most cases, the web interface provides users with sufficient information. In addition to the web-based user interface and API calls, OpenAlex offers the option to download snapshots of the entire database in JSON Lines format. These snapshots are typically updated monthly.
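JSON Lines files hold one JSON record per line, so a snapshot can be processed as a stream without loading it into memory. The sketch below assumes that the snapshot part files are gzip-compressed and that Works records carry a `publication_year` field; the file path is left to the caller, and the demonstration runs on a small in-memory sample instead of a real file.

```python
import gzip
import json
from collections import Counter


def read_json_lines(lines):
    """Yield one record per non-empty line of JSON Lines text."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_year(records):
    """Count records per publication_year (assumed field name for Works)."""
    return Counter(record.get("publication_year") for record in records)


def count_file(path):
    """Stream one gzip-compressed part file; the path is supplied by the caller."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_by_year(read_json_lines(handle))


# Invented sample lines standing in for a real snapshot file.
sample = [
    '{"id": "W1", "publication_year": 2023}',
    "",
    '{"id": "W2", "publication_year": 2024}',
    '{"id": "W3", "publication_year": 2024}',
]
print(count_by_year(read_json_lines(sample)))
```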

Figure 11. – Number of documents indexed with PTE affiliation and the number of citations to them

Is It Really Free?

OpenAlex is developed and maintained by OurResearch, a nonprofit organization with a decade of experience in providing sustainable, free tools (such as Unpaywall). To achieve their goals of providing free access and ensuring sustainability, they employ a freemium business model.

At this point, it’s natural to raise an eyebrow. Many of us have had negative experiences or frustrations with the freemium model, widely used in software, mobile apps, games, and streaming services. However, it’s important to emphasize that OpenAlex’s API, website, and published data are entirely free to access, and users who prefer the free version won’t face significant disadvantages compared to those who opt for paid services.

That said, to ensure sustainability, OurResearch currently offers two premium, value-added services:

  1. Paid Consulting Services:
    • Affiliation and Author Curation Services
    • Custom Research Classification Services
    • Disambiguation Services (merging author names)
    • Custom Bibliometric Analyses and Reports
  2. Service Parameter Enhancements: more frequent updates, higher daily API query limits, user support and training sessions, workshops.

Comparison of the free and premium versions:

  • Data update frequency: Free Version: monthly, in larger batches; Premium Version 1: hourly; Premium Version 2: hourly
  • API limit: Free Version: 100,000 requests/day, max 10 requests/second; Premium Version 1: as needed; Premium Version 2: as needed
  • Support: Free Version: based on capacity (paid customers get priority); Premium Version 1: priority support (up to 5 users); Premium Version 2: priority support (all users)
  • Other extra services: training, consulting services, recommendations
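A script that calls the API in a loop should stay within the free-tier limits quoted above. The following is a minimal client-side pacing sketch, assuming the limits count requests (100,000 per day and at most 10 per second). The `Throttle` class is a hypothetical helper, not part of any OpenAlex library, and it does not reset its daily counter at midnight.

```python
import time


class Throttle:
    """Client-side pacing for the free-tier limits quoted above."""

    def __init__(self, per_second=10, per_day=100_000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self.per_day = per_day
        self.clock = clock
        self.sleep = sleep
        self.used_today = 0
        self.last_call = None

    def wait(self):
        """Block until the next request is allowed; raise once the budget is spent."""
        if self.used_today >= self.per_day:
            raise RuntimeError("daily request budget exhausted")
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
        self.used_today += 1


throttle = Throttle()
for _ in range(3):
    throttle.wait()  # call this right before each API request
```

The clock and sleep functions are injectable so that the pacing logic can be checked without real waiting.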

Shall we join?

OpenAlex data can easily be accessed and used in other software. Several Python libraries (for example openalex and pyalex) fetch data from the API for analysis. For statistical analysis and visualization, the R packages openalexR and bibliometrix provide excellent support. The openalex-js JavaScript library helps create interactive web applications and visualizations. With VOSviewer, we can cluster and map the scientific landscape based on data fetched via the OpenAlex API. In addition, many other applications can work with data exported in one of the database’s formats (e.g., spreadsheets, business intelligence platforms, AI data analysis tools).

Figure 12. – Analysis of OpenAlex data in RStudio, using Bibliometrix and biblioshiny (institution = University of Pécs, publication year = 2020-2024)
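Even without a dedicated library, an API response can be processed with the Python standard library alone. The sketch below assumes that list responses contain a `meta` object and a `results` array, and that each work has `display_name`, `publication_year`, and `cited_by_count` fields. The `fetch` helper is shown for completeness but is not called here, and the sample values are invented. The wrapper libraries named above offer more convenient interfaces for the same task.

```python
import json
from urllib.request import urlopen


def fetch(url):
    """Download and decode one API response (a network call, not run here)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def summarize(payload):
    """Return (year, citations, title) rows, most cited first."""
    rows = [
        (work.get("publication_year"), work.get("cited_by_count", 0), work.get("display_name"))
        for work in payload.get("results", [])
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hand-made sample in the assumed shape of a list response; values are invented.
sample = {
    "meta": {"count": 2, "page": 1, "per_page": 25},
    "results": [
        {"display_name": "Example work A", "publication_year": 2021, "cited_by_count": 3},
        {"display_name": "Example work B", "publication_year": 2023, "cited_by_count": 12},
    ],
}

for year, citations, title in summarize(sample):
    print(year, citations, title)
```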

Success Story – Vive la France

A notable milestone for OpenAlex came from France: in 2024, Sorbonne University discontinued its subscription to the Web of Science publication database and Clarivate bibliometric tools. The university has long been dedicated to advancing open science, and as part of this commitment it signed a partnership agreement with OpenAlex, moving its publication database and bibliometric analyses to open-source foundations. Moreover, the French Ministry of Higher Education and Research has committed to collaborating closely with OpenAlex to improve the platform’s data in general, with a special focus on enriching data related to French research.

Closing word

If you want to learn more about OpenAlex, visit https://openalex.org and try out the service. You can also read the project documentation, join the mailing list, or follow the OpenAlex profile on X.

 


PTE Egyetemi Könyvtár és Tudásközpont | 2023
