
Publications

Peer-reviewed scientific publications 

1. Rizzetto, E., Peroni, S. (2024). Mapping bibliographic metadata collections: the case of OpenCitations Meta and OpenAlex. In: CEUR Workshop Proceedings, vol 3643, 20th Conference on Information and Research Science Connecting to Digital and Library Science (IRCDL 2024), Bressanone, Italy. https://ceur-ws.org/Vol-3643/paper15.pdf. Also available in Open Access at https://arxiv.org/abs/2312.16523

Abstract
This study describes the methodology and analyses the results of the process of mapping entities between two large open bibliographic metadata collections, OpenCitations Meta and OpenAlex. The primary objective of this mapping is to integrate OpenAlex internal identifiers into the existing metadata of bibliographic resources in OpenCitations Meta, thereby interlinking and aligning these collections. Furthermore, analysing the output of the mapping provides a unique perspective on the consistency and accuracy of bibliographic metadata, offering a valuable tool for identifying potential inconsistencies in the processed data.


2. Massari, A., Mariani, F., Heibi, I., Peroni, S., Shotton, D. (2024). OpenCitations Meta. Quantitative Science Studies 1-26. https://doi.org/10.1162/qss_a_00292. Also available in Open Access at https://arxiv.org/abs/2306.16191.  

Abstract
OpenCitations Meta is a new database for open bibliographic metadata of scholarly publications involved in the citations indexed by the OpenCitations infrastructure, adhering to Open Science principles and published under a CC0 license to promote maximum reuse. It presently incorporates bibliographic metadata for publications recorded in Crossref, DataCite, and PubMed, making it the largest bibliographic metadata source using Semantic Web technologies. It assigns new globally persistent identifiers (PIDs), known as OpenCitations Meta Identifiers (OMIDs), to all bibliographic resources, enabling it both to disambiguate publications described using different external PIDs (e.g., a DOI in Crossref and a PMID in PubMed) and to handle citations involving publications lacking external PIDs. By hosting bibliographic metadata internally, OpenCitations Meta eliminates its former reliance on API calls to external resources and thus enhances performance in response to user queries. Its automated data curation, following the OpenCitations Data Model, includes deduplication, error correction, metadata enrichment, and full provenance tracking, ensuring transparency and traceability of data and bolstering confidence in data integrity, a feature unparalleled in other bibliographic databases. Its commitment to Semantic Web standards ensures superior interoperability compared to other machine-readable formats, with availability via a SPARQL endpoint, REST APIs, and data dumps.

3. Koloveas, P., Chatzopoulos, S., Tryfonopoulos, C., Vergoulis, T. (2023). BIP! NDR (NoDoiRefs): A Dataset of Citations from Papers Without DOIs in Computer Science Conferences and Workshops. In: Alonso, O., Cousijn, H., Silvello, G., Marrero, M., Teixeira Lopes, C., Marchesin, S. (eds) Linking Theory and Practice of Digital Libraries. TPDL 2023. Lecture Notes in Computer Science, vol 14241. Springer, Cham. https://doi.org/10.1007/978-3-031-43849-3_9. Also available in Open Access at https://arxiv.org/abs/2307.12794

Abstract
In the field of Computer Science, conference and workshop papers serve as important contributions, carrying substantial weight in research assessment processes compared to other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI), and hence their citations are not reported in widely used citation datasets like OpenCitations and Crossref, limiting citation analysis. While the Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, its discontinuation has created a void in available data. BIP! NDR aims to alleviate this issue and enhance the research assessment processes within the field of Computer Science. To accomplish this, it leverages a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP Corpus and, by performing text analysis, extracts citation information directly from their full text. The current version of the dataset contains more than 510K citations made by approximately 60K open access Computer Science conference or workshop papers that, according to DBLP, do not have a DOI.

4. Chatzopoulos, S., Vichos, K., Kanellos, I., Vergoulis, T. (2023). Piloting Topic-Aware Research Impact Assessment Features in BIP! Services. In: Pesquita, C., et al. The Semantic Web: ESWC 2023 Satellite Events. ESWC 2023. Lecture Notes in Computer Science, vol 13998. Springer, Cham. https://doi.org/10.1007/978-3-031-43458-7_15. Also available in Open Access at https://arxiv.org/abs/2305.06047. 

Abstract
Various research activities rely on citation-based impact indicators. However, these indicators are usually globally computed, hindering their proper interpretation in applications like research assessment and knowledge discovery. In this work, we advocate for the use of topic-aware categorical impact indicators to alleviate the aforementioned problem. In addition, we extend BIP! Services to support those indicators and showcase their benefits in real-world research activities.

5. Santos, E.A.d., Peroni, S. and Mucheroni, M.L. (2023). An analysis of citing and referencing habits across all scholarly disciplines: approaches and trends in bibliographic referencing and citing practices. Journal of Documentation, Vol. 79 No. 7, pp. 196-224. https://doi.org/10.1108/JD-10-2022-0234. Also available in Open Access at https://doi.org/10.48550/arXiv.2202.08469

Abstract
Purpose: In this study, the authors aim to identify the current possible causes of citing and referencing errors in the scholarly literature and to assess whether anything has changed since the snapshot provided by Sweetland in his 1989 paper.
Design/Methodology/Approach: The authors analysed reference elements, i.e. bibliographic references, mentions, quotations and respective in-text reference pointers, from 729 articles published in 147 journals across 27 subject areas.
Findings: The outcomes of the analysis pointed out that bibliographic errors have been perpetuated for decades and that their possible causes have increased, despite the encouraged use of technological facilities, i.e. the reference managers.
Originality/value: As far as the authors know, this study is the most recent comprehensive analysis of errors in referencing and citing practices in the literature since Sweetland (1989).


Media articles and blog posts

Do you want to know more?
We would be happy to hear from you. Your needs and ideas are invaluable for building a collaborative infrastructure.