
Publications

Peer-reviewed scientific publications 

1. Heibi, I., Moretti, A., Peroni, S., Soricetti, M. (2024). The OpenCitations Index: description of a database providing open citation data. In: Scientometrics 129, 7923–7942. https://link.springer.com/article/10.1007/s11192-024-05160-7

Abstract
This article presents the OpenCitations Index, a collection of open citation data maintained by OpenCitations, an independent, not-for-profit infrastructure organisation for open scholarship dedicated to publishing open bibliographic and citation data using Semantic Web and Linked Open Data technologies. The collection involves citation data harvested from multiple sources. Because different sources may provide citation data for bibliographic entities represented with different identifiers, and may therefore represent the same citation, a deduplication mechanism has been implemented. This ensures that each citation integrated into the OpenCitations Index is accurately and uniquely identified, even when different identifiers are used. The mechanism follows a specific workflow that encompasses preprocessing of the original source data, management of the provided bibliographic metadata, and generation of new citation data to be integrated into the OpenCitations Index. The process relies on another data collection, OpenCitations Meta, and on a new globally persistent identifier, the OMID (OpenCitations Meta Identifier). As of July 2024, the OpenCitations Index stores over 2 billion unique citation links, harvested from Crossref, the National Institutes of Health Open Citation Collection (NIH-OCC), DataCite, OpenAIRE, and the Japan Link Center (JaLC). The OpenCitations Index can be systematically accessed and queried through several services, including a SPARQL endpoint, REST APIs, and web interfaces. Additionally, dataset dumps are available for free download and reuse (under a CC0 waiver) in various formats (CSV, N-Triples, and Scholix), including provenance and change tracking information.

2. Corcho, O., Ekaputra, F. J., Heibi, I., Jonquet, C., Micsik, A., Peroni, S., & Storti, E. (2024). A maturity model for catalogues of semantic artefacts. In: Sci Data 11, 479. https://doi.org/10.1038/s41597-024-03185-4. Also available in Open Access at: https://doi.org/10.48550/arXiv.2305.06746

Abstract
This work presents a maturity model for assessing catalogues of semantic artefacts, one of the keystones that permit semantic interoperability of systems. We defined the dimensions and related features to include in the maturity model by analysing the current literature and existing catalogues of semantic artefacts provided by experts. In addition, we assessed 26 different catalogues to demonstrate the effectiveness of the maturity model, which includes 12 different dimensions (Metadata, Openness, Quality, Availability, Statistics, PID, Governance, Community, Sustainability, Technology, Transparency, and Assessment) and 43 related features (or sub-criteria) associated with these dimensions. Such a maturity model is one of the first attempts to provide recommendations for governance and processes for preserving and maintaining semantic artefacts, and it helps assess and address interoperability challenges.

3. Rizzetto, E., Peroni, S. (2024). Mapping bibliographic metadata collections: the case of OpenCitations Meta and OpenAlex. In: CEUR Workshop Proceedings, vol 3643, 20th Conference on Information and Research Science Connecting to Digital and Library Science (IRCDL 2024), Bressanone, Italy. https://ceur-ws.org/Vol-3643/paper15.pdf. Also available in Open Access at https://arxiv.org/abs/2312.16523

Abstract
This study describes the methodology and analyses the results of the process of mapping entities between two large open bibliographic metadata collections, OpenCitations Meta and OpenAlex. The primary objective of this mapping is to integrate OpenAlex internal identifiers into the existing metadata of bibliographic resources in OpenCitations Meta, thereby interlinking and aligning these collections. Furthermore, analysing the output of the mapping provides a unique perspective on the consistency and accuracy of bibliographic metadata, offering a valuable tool for identifying potential inconsistencies in the processed data.

4. Massari, A., Mariani, F., Heibi, I., Peroni, S., Shotton, D. (2024). OpenCitations Meta. In: Quantitative Science Studies, 1-26. https://doi.org/10.1162/qss_a_00292. Also available in Open Access at https://arxiv.org/abs/2306.16191.

Abstract
OpenCitations Meta is a new database for open bibliographic metadata of scholarly publications involved in the citations indexed by the OpenCitations infrastructure, adhering to Open Science principles and published under a CC0 license to promote maximum reuse. It presently incorporates bibliographic metadata for publications recorded in Crossref, DataCite, and PubMed, making it the largest bibliographic metadata source using Semantic Web technologies. It assigns new globally persistent identifiers (PIDs), known as OpenCitations Meta Identifiers (OMIDs), to all bibliographic resources, enabling it both to disambiguate publications described using different external PIDs (e.g., a DOI in Crossref and a PMID in PubMed) and to handle citations involving publications lacking external PIDs. By hosting bibliographic metadata internally, OpenCitations Meta eliminates its former reliance on API calls to external resources and thus enhances performance in response to user queries. Its automated data curation, following the OpenCitations Data Model, includes deduplication, error correction, metadata enrichment, and full provenance tracking, ensuring transparency and traceability of data and bolstering confidence in data integrity, a feature unparalleled in other bibliographic databases. Its commitment to Semantic Web standards ensures superior interoperability compared to other machine-readable formats, with availability via a SPARQL endpoint, REST APIs, and data dumps.

5. Moretti, A., Soricetti, M., Heibi, I., Massari, A., Peroni, S., & Rizzetto, E. (2024). The Integration of the Japan Link Center’s Bibliographic Data into OpenCitations - The production of bibliographic and citation data structured according to the OpenCitations Data Model, originating from an Anglo-Japanese dataset. In: Journal of Open Humanities Data, 10(1), p. 21. https://openhumanitiesdata.metajnl.com/articles/10.5334/johd.178 

Abstract
This article presents OpenCitations’ main data collections: the unified index of citation data (OpenCitations Index), and the bibliographic data corpus (OpenCitations Meta) in view of the integration of a new dataset provided by the Japan Link Center (JaLC). Based on a computational analysis of the titles of the publications performed in October 2023, 8.6% of the bibliographic metadata stored in OpenCitations Meta are not in English. Nevertheless, the ingestion of an Anglo-Japanese dataset represents the first opportunity to test the soundness of a language-agnostic metadata crosswalk process for collecting data from multilingual sources, aiming to preserve bibliodiversity and to minimize information loss considering the constraints imposed by the OpenCitations data model, which does not allow the acceptance of multiple values in different translations for the same metadata field. The JaLC dataset is set to join OpenCitations’ collections in November 2023, and it will be made available in RDF, CSV, and SCHOLIX formats. Data will be produced using open-source software and provided under a CC0 license via API services, web browsing interfaces, Figshare data dumps, and SPARQL endpoints, ensuring high interoperability, reuse, and semantic exploitation.

6. Koloveas, P., Chatzopoulos, S., Tryfonopoulos, C., Vergoulis, T. (2023). BIP! NDR (NoDoiRefs): A Dataset of Citations from Papers Without DOIs in Computer Science Conferences and Workshops. In: Alonso, O., Cousijn, H., Silvello, G., Marrero, M., Teixeira Lopes, C., Marchesin, S. (eds) Linking Theory and Practice of Digital Libraries. TPDL 2023. Lecture Notes in Computer Science, vol 14241. Springer, Cham. https://doi.org/10.1007/978-3-031-43849-3_9. Also available in Open Access at https://arxiv.org/abs/2307.12794

Abstract
In the field of Computer Science, conference and workshop papers serve as important contributions and carry more weight in research assessment processes than in other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI); hence their citations are not reported in widely used citation datasets like OpenCitations and Crossref, which limits citation analysis. While the Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, its discontinuation has created a void in available data. BIP! NDR aims to alleviate this issue and enhance research assessment processes within the field of Computer Science. To accomplish this, it leverages a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP Corpus and, by performing text analysis, extracts citation information directly from their full text. The current version of the dataset contains more than 510K citations made by approximately 60K open access Computer Science conference or workshop papers that, according to DBLP, do not have a DOI.

7. Chatzopoulos, S., Vichos, K., Kanellos, I., Vergoulis, T. (2023). Piloting Topic-Aware Research Impact Assessment Features in BIP! Services. In: Pesquita, C., et al. (eds) The Semantic Web: ESWC 2023 Satellite Events. ESWC 2023. Lecture Notes in Computer Science, vol 13998. Springer, Cham. https://doi.org/10.1007/978-3-031-43458-7_15. Also available in Open Access at https://arxiv.org/abs/2305.06047.

Abstract
Various research activities rely on citation-based impact indicators. However, these indicators are usually computed globally, hindering their proper interpretation in applications like research assessment and knowledge discovery. In this work, we advocate the use of topic-aware categorical impact indicators to alleviate this problem. In addition, we extend BIP! Services to support these indicators and showcase their benefits in real-world research activities.

8. Santos, E.A.d., Peroni, S., & Mucheroni, M.L. (2023). An analysis of citing and referencing habits across all scholarly disciplines: approaches and trends in bibliographic referencing and citing practices. In: Journal of Documentation, Vol. 79 No. 7, pp. 196-224. https://doi.org/10.1108/JD-10-2022-0234. Also available in Open Access at https://doi.org/10.48550/arXiv.2202.08469

Abstract
Purpose: In this study, the authors aim to identify current possible causes of citing and referencing errors in scholarly literature and to determine whether anything has changed since the snapshot provided by Sweetland in his 1989 paper.
Design/methodology/approach: The authors analysed reference elements, i.e. bibliographic references, mentions, quotations and the respective in-text reference pointers, from 729 articles published in 147 journals across 27 subject areas.
Findings: The outcomes of the analysis show that bibliographic errors have been perpetuated for decades and that their possible causes have increased, despite the encouraged use of technological tools such as reference managers.
Originality/value: As far as the authors know, this study is the best available recent analysis of errors in referencing and citing practices in the literature since Sweetland (1989).

9. Đorđević, A. (2023). GraspOS - Responsible Research Assessment in the Practice of Open Science: Best Practices (original title: GraspOS - Oдговорна процена научних истраживања у доброј пракси отворене науке). In: Pozitron, 29-30, pp. 60-61 (In Serbian). https://hdl.handle.net/21.15107/rcub_cherry_6504

Abstract (translated from Serbian)
The European project "next Generation Research Assessment to Promote Open Science" (GraspOS) brings together eighteen partner institutions from ten countries to explore new and responsible ways to evaluate scientific research, especially through the practice of open science. The University of Belgrade - Faculty of Chemistry also participates in this three-year project, supported by the European Commission, and the project leader at our Faculty is Ana Đorđević, a librarian.
Through the GraspOS project, a team of librarians and researchers at the Faculty of Chemistry will seek a balance between the quantity and quality of scientific results in the Cherry repository, with the main emphasis on testing and evaluating new services to be implemented by the project consortium. Alongside testing and evaluation, more effective use of Cherry's external application Authors, Projects, Publications is planned, which would develop a new reward system for the responsible evaluation of research in the form of open researcher profiles. The new environment developed by the project could be adopted at the European level, together with good practices in applying open science and guidelines supporting the reform of research evaluation.

Media articles and blog posts
