
OpenAIRE
Founded Year
2018
About OpenAIRE
OpenAIRE is a non-profit organization focused on promoting open scholarship and enhancing the discoverability and accessibility of data-driven research results across various scientific disciplines. The organization operates a European e-infrastructure that provides over 15 public services to facilitate the adoption of open science practices and is a key implementor of the European Open Science Cloud (EOSC). OpenAIRE's services are designed to support researchers, policy makers, research organizations, SMEs, universities, libraries, and citizen scientists in integrating and leveraging open science methodologies. It was founded in 2018 and is based in Marousi, Greece.
Latest OpenAIRE News
Jun 3, 2024
Authors:
(1) Arcangelo Massari, Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy;
(2) Fabio Mariani, Institute of Philosophy and Sciences of Art, Leuphana University, Lüneburg, Germany;
(3) Ivan Heibi, Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy and Digital Humanities Advanced Research Centre (/DH.arc), Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy;
(4) Silvio Peroni, Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy and Digital Humanities Advanced Research Centre (/DH.arc), Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy;
(5) David Shotton, Oxford e-Research Centre, University of Oxford, Oxford, United Kingdom.

2. Related works

In this section, we review the most important scholarly publishing datasets to which access does not require subscription, i.e. publicly available datasets holding scholarly bibliographic metadata. Since OpenCitations Meta uses Semantic Web technologies to represent data, special attention is given to RDF datasets, namely Wikidata, Springer Nature SciGraph, BioTea, the Open Research Knowledge Graph and ScholarlyData. In addition, the OpenAIRE Research Graph, OpenAlex and Semantic Scholar are described, as they are the most extensive datasets in terms of the number of works, although they do not represent data semantically.

OpenAlex (Priem et al., 2022) rose from the ashes of the Microsoft Academic Graph on January 1st 2022, and inherited all its metadata.
It includes data from Crossref (Hendricks et al., 2020), PubMed (Maloney et al., 2013), ORCID (Haak et al., 2012), ROR (Lammey, 2020), DOAJ (Morrison, 2017), Unpaywall (Dhakal, 2019), arXiv (Sigurdsson, 2020), Zenodo (Research & OpenAIRE, 2013), the ISSN International Centre[1], and the Internet Archive’s General Index[2]. In addition, web crawls are used to add missing metadata. With over 240 million works[3], OpenAlex is the most extensive bibliographic metadata dataset to date. OpenAlex assigns persistent identifiers to each resource. In addition, authors are disambiguated through heuristics based on co-authors, citations, and other features of the bibliographic resources. The data are distributed under a CC0 licence and can be accessed via an API, via a web interface, or by downloading a full snapshot copy of the OpenAlex database.

The OpenAIRE project started in 2008 to support the adoption of the European Commission Open Access mandates (Manghi et al., 2010), and it is now the flagship organisation within the Horizon 2020 research and innovation programme to realise the European Open Science Cloud (European Commission, Directorate General for Research and Innovation, 2016). One of its primary outcomes is the OpenAIRE Research Graph, which includes metadata about scholarly outputs (e.g. literature, datasets and software), organisations, research funders, funding streams, projects, and communities, together with provenance information. Data are harvested from a variety of sources (Atzori et al., 2017): archives, e.g. arXiv (Sigurdsson, 2020), Europe PMC (The Europe PMC Consortium, 2015), Software Heritage (Abramatic et al., 2018) and Zenodo (Research & OpenAIRE, 2013); aggregator services, e.g. DOAJ (Morrison, 2017) and OpenCitations (Peroni & Shotton, 2020); and other research graphs, e.g. Crossref (Hendricks et al., 2020) and DataCite (Brase, 2009). As of June 2023, this OpenAIRE dataset consisted of 232,174,001 research products[4].
The deduplication process implemented by OpenAIRE takes into account not only PIDs but also other heuristics, such as the number of authors and the Levenshtein distance of titles. However, the internal identifiers OpenAIRE associates with entities are not persistent and may change when the data are updated. Data of the OpenAIRE Research Graph can be accessed via an API and the Explore interface. Dumps are also available under a Creative Commons Attribution 4.0 International Licence.

Semantic Scholar was introduced by the Allen Institute for Artificial Intelligence in 2015 (Fricke, 2018). It is a search engine that uses artificial intelligence to select only the papers most relevant to the user’s search and to simplify exploration, e.g. by producing automatic summaries. Semantic Scholar sources its content via web indexing and partnerships with scientific journals, indexes, and content providers. Among those are the Association for Computational Linguistics, Cambridge University Press, IEEE, PubMed, Springer Nature, The MIT Press, Wiley, arXiv, and HAL. As of June 2023, it indexes 212,605,886 scholarly works[5]. Authors are disambiguated via an artificial intelligence model (Subramanian et al., 2021) and associated with a Semantic Scholar ID, and a page is automatically generated for each author, which the real person can claim. Semantic Scholar provides a web interface and APIs, and the complete dataset is downloadable under the Open Data Commons Attribution Licence (ODC-By) v1.0.

Wikidata was introduced in 2012 by Wikimedia Deutschland as an open knowledge base to store in RDF data from other Wikimedia projects, such as Wikipedia, Wikivoyage, Wiktionary, and Wikisource (Mora-Cantallops et al., 2019). Owing to Wikidata's success, in 2014 Google closed Freebase, which had been intended to become the “Wikipedia for structured data”, and migrated its content to Wikidata (Tanon et al., 2016).
Since 2016, the WikiCite project has contributed significantly to the evolution of Wikidata as a bibliographic database, such that, by June 2023, Wikidata contained descriptions of 39,864,447 academic articles[6]. The internal Wikidata identifier referring to any entity (including bibliographic resources) is associated with numerous external identifiers, e.g. DOI, PMID, PMCID, arXiv, ORCID, Google Scholar, VIAF, Crossref funder ID, ZooBank and Twitter. The data are released under a CC0 licence as RDF dumps in Turtle and N-Triples. Users can browse them via SPARQL, a web interface and, as of 2017, via Scholia, a web service which performs real-time SPARQL queries to generate profiles on researchers, organisations, journals, publishers, academic works and research topics, while also generating valuable infographics (Nielsen et al., 2017).

While the OpenAIRE Research Graph and Wikidata aggregate many heterogeneous sources, Springer Nature SciGraph (Hammond et al., 2017) aggregates only data from Springer Nature and its partners. It contains entities concerning publications, affiliations, research projects, funders and conferences, totalling more than 14 million research products[7]. There is no current plan to offer a public SPARQL endpoint, but the data can be explored via a browser interface, and a dump is released monthly in JSON-LD format under a CC-BY licence.

BioTea is also a domain-oriented dataset, and represents the annotated full-text open-access subset of PubMed Central (PMC-OA) (Garcia et al., 2018) using RDF technologies. At the time of that 2018 paper, the dataset contained 1.5 million bibliographic resources. Unlike other datasets, BioTea not only describes metadata and citations but also defines the annotated full texts semantically. Named-entity recognition analysis is adopted to identify expressions and terminology related to biomedical ontologies that are then recorded as annotations (e.g. about biomolecules, drugs, and diseases). BioTea data are released as dumps in RDF/XML and JSON-LD formats under the Creative Commons Attribution Non-Commercial 4.0 International licence, while the SPARQL endpoint is currently offline.

A noteworthy approach is that adopted by the Open Research Knowledge Graph (ORKG) (Auer et al., 2020). Metadata are mainly collected either by trusted agents via crowdsourcing or automatically from Crossref. However, ORKG’s primary purpose is not to organise metadata but to provide services. The main scope of these services is to perform a literature comparison analysis using word embeddings to enable a similarity analysis and foster the exploration and linking of related works. To enable such sophisticated analyses, metadata from Crossref is insufficient; therefore, structured annotations on the topic, result, method, educational context and evaluator must be manually specified for each resource. As of June 2023, the dataset contains 25,680 papers[8], 5153 datasets, 1364 software and 71 reviews. Given the importance of human contribution to the creation of the ORKG dataset, the platform keeps track of changes and provenance, although not in RDF format. The data can be explored through a web interface, SPARQL, and an API, and can also be downloaded under a CC BY-SA licence.

ScholarlyData collects information only about conferences and workshops on the topic of the Semantic Web (Nuzzolese et al., 2016). Data are modelled following the Conference Ontology, which describes typical entities in an academic conference, such as accepted papers, authors, their affiliations, and the organising committee, but not bibliographic references. Up to June 2023, the dataset stored information about 5678 conference papers. The dataset is updated by employing the Conference Linked Open Data generator software, which outputs RDF starting from CSV files (Gentile & Nuzzolese, 2015).
The deduplication of the agents is based only on their URIs using a supervised classification method (Zhang et al., 2017), while ORCIDs are added in a further step. This methodology does not address the existence of homonyms. However, this is a minor issue for ScholarlyData, since only a few thousand people are involved in the conferences being indexed. ScholarlyData can be explored via a SPARQL endpoint, and dumps are available in RDF/XML format under a Creative Commons Attribution 3.0 Unported licence.

To conclude, we would like to point out that none of the datasets mentioned above exposes change-tracking data and the related provenance information in RDF. Table 1 summarises all the considerations made on each dataset.

This paper is available on arXiv under a CC 4.0 DEED license.

[1] https://www.issn.org/
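
The excerpt above says that OpenAIRE's deduplication considers PIDs together with heuristics such as the number of authors and the Levenshtein distance of titles, but it does not say how those signals are combined. The following is only a minimal sketch of one plausible combination, not OpenAIRE's actual implementation: the `Record` class, the `likely_duplicates` function, the rule order and the `max_ratio` threshold of 0.1 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical minimal bibliographic record (not an OpenAIRE data model)."""
    title: str
    n_authors: int
    pids: set[str] = field(default_factory=set)  # e.g. DOIs, PMIDs


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming over two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling rows as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def likely_duplicates(x: Record, y: Record, max_ratio: float = 0.1) -> bool:
    """Assumed rule: a shared PID wins; otherwise compare authors and titles."""
    if x.pids & y.pids:
        return True
    if x.n_authors != y.n_authors:
        return False
    tx, ty = x.title.casefold().strip(), y.title.casefold().strip()
    longest = max(len(tx), len(ty))
    if longest == 0:
        return False  # two empty titles carry no evidence
    # Normalise the distance by title length (threshold is an assumption).
    return levenshtein(tx, ty) / longest <= max_ratio
```

For instance, two records with no shared PID, the same author count, and titles that differ only in case and a trailing full stop would be flagged as likely duplicates under these assumed rules, whereas a differing author count would rule the pair out. A production system would need blocking or indexing to avoid comparing every pair of records.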
OpenAIRE Frequently Asked Questions (FAQ)
When was OpenAIRE founded?
OpenAIRE was founded in 2018.
Where is OpenAIRE's headquarters?
OpenAIRE's headquarters is located at 6 Artemidos Street & Epidavrou, Marousi.
Who are OpenAIRE's competitors?
Competitors of OpenAIRE include Springer Nature and 5 more.
Compare OpenAIRE to Competitors

DeepDyve is a literature management solution provider in the academic and research sector. The company offers search, reference management, and access to a collection of academic papers and ebooks. DeepDyve serves the academic and research community with its suite of literature management and research tools. It is based in Sunnyvale, California.

SciSpace provides research assistance tools for the academic sector. Its offerings include a platform that supports the understanding of research papers, literature reviews, and writing and citation generation. SciSpace serves students and researchers in academic fields. The company was formerly known as Typeset. It was founded in 2015 and is based in Milpitas, California.
IntechOpen is an Open Access publisher that focuses on the dissemination of scientific research across various academic disciplines. The company provides a platform for authors to publish peer-reviewed books and journals. IntechOpen serves the academic and research community. It was founded in 2004 and is based in London, England.
Technology Networks is a company that provides a platform for scientific content and community engagement. The company offers resources including news articles, educational webinars, podcasts, and how-to guides across various scientific disciplines. Technology Networks primarily serves the research community by offering insights and information relevant to their fields of study. It is based in Sudbury, England.
Scholars is a digital platform that allows researchers in the academic sector to collaborate and discuss research papers. The company provides tools for annotating and reviewing research papers. Scholars serves the academic and research community, offering private and public reading rooms. It is based in Seattle, Washington.
Universiteitsbibliotheken & Koninklijke Bibliotheek focuses on providing access to scientific information and supporting academic research and education. The company offers services to make scientific information optimally accessible and findable for researchers, educators, and students, and contributes to the management and dissemination of knowledge collections. It primarily serves the academic and research sectors. It was founded in 1798 and is based in Utrecht, Netherlands.