Published in DataCite Blog - DataCite

Standards & More Standards

Metadata standards are fundamental to the semantic web and database management; they form the basis for data discovery, sharing, and organization across a range of domains.

FAIR for digital twins

Published in CEAS Space Journal

Abstract: The continuing drive towards digitization in manufacturing leads to an increasing number of digital twins for monitoring and controlling all kinds of processes. While these capture crucial data of all individual steps and allow for analysis and optimization, more often than not the underlying models are confined to individual systems or organizations. This hinders data exchange, especially across institutional borders, and thus represents an important barrier to economic success. Similar challenges in the scientific community led to the emergence of the FAIR principles (Findable, Accessible, Interoperable, and Reusable) as guidelines towards a sustainable data landscape. Despite their growing presence within academia, their transfer to industry has not yet received similar attention. We argue that the existing efforts and experiences in science can be exploited to address current data management challenges in industry as well. Improved data exchange within organizations and beyond can not only lower costs, but also open up new opportunities, ranging from discovering new suppliers or partners to improving existing value chains.

Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems

The FAIR Guiding Principles for scientific data management and stewardship

Abstract: There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them and some exemplar implementations in the community.

FAIR Digital Twins for Data-Intensive Research

Published in Frontiers in Big Data
Authors Erik Schultes, Marco Roos, Luiz Olavo Bonino da Silva Santos, Giancarlo Guizzardi, Jildau Bouwman, Thomas Hankemeier, Arie Baak, Barend Mons

Although all the technical components supporting fully orchestrated Digital Twins (DT) currently exist, what remains missing is a conceptual clarification and analysis of a more generalized concept of a DT that is made FAIR, that is, universally machine actionable. This methodological overview is a first step toward this clarification. We present a review of previously developed semantic artifacts and how they may be used to compose a higher-order data model referred to here as a FAIR Digital Twin (FDT). We propose an architectural design to compose, store and reuse FDTs supporting data-intensive research, with emphasis on privacy by design and their use in GDPR-compliant open science.

Biospecimens in FDO world

Published in Research Ideas and Outcomes
Authors Sara El-Gebali, Rory Macneil, Rorie Edmunds, Parul Tewatia, Jens Klump

With the advent of technological advances in research settings, scientific collections including sample material became on par with big data. Consequently, there is a widespread need to highlight and recognise the inherent value of samples, coupled with efforts to unlock sample potential as resources for new scientific discovery. Samples with informative metadata can be more easily discovered, more readily shared and reused, allowing reanalysis of associated datasets, avoiding duplicate efforts, and enabling meta-analysis that yields considerably enhanced insight. Metadata provides the framework for a consistent, systematic and standardized collection of sample information, enabling users to identify the availability of research output from the samples and its relevance to their intended use, and offering a way to conveniently identify sample material as well as access provenance information related to the physical samples. Researchers need this essential information to aid their decision-making on the quality, usability and accessibility of the samples and associated datasets.

We propose to explore the practical implementation of FAIR Digital Objects (FDOs) for biological life science physical samples and, practically, how to create an FDO framework centred on biospecimen samples, linked datasets, sample information and PIDs (Persistent Identifiers) (Klump et al. 2021). This effort is highly relevant to enhancing the portability of sample information between multiple repositories and other kinds of resources (e.g. e-infrastructures). In this session we would like to present our current work in order to mobilize the community to define the FAIR Digital Object Architecture for biospecimens in life science, including all infrastructure components (e.g. metadata, PIDs) and their integration with technical solutions. To that end, in our community of practice we aim to:

What:
- Identify the minimum set of attributes required for describing biospecimens in biological life science (Minimal Information About a Biological Sample, MIABS), with ontological mapping for semantic unambiguity and machine actionability.
- Identify the required attributes for registering PIDs for biospecimens and how that will operate in an FDO ecosystem. This will pave the way for a framework coupling the descriptive metadata to the digital object in a FAIR and comprehensive manner.

How:
- Define a semantic FDO model for biospecimens.
- Define the role of biospecimen PID registration information and kernel attributes, and how that translates to machine actionability and programmatic decisions.
- Define the implementation specifics for integrating biospecimen FDOs with operational infrastructure, e.g. e-infrastructures, repositories and machines.

Relevant technologies include: RO-Crate, persistent identifiers, and metadata schemas.

The recent partnership between IGSN and DataCite described below is a catalyst in this call to action to the FDO community to build a Community of Practice (CoP) specifically focused on biospecimen samples.

Community of practice: IGSN e.V. announced a partnership with DataCite, in which DataCite’s registration services and supporting technology for Digital Object Identifiers (another type of PID) are now being leveraged to register IGSN IDs and thus ensure the ongoing sustainability of the IGSN ID infrastructure. Importantly, the two organizations are also focusing the community’s efforts on advocacy of PIDs for physical samples and expanding the global sample ecosystem. Assisted by the DataCite Samples Community Manager, the IGSN e.V. is establishing working groups (Communities of Practice) within different research domains to support the development and promotion of standardized methods for identifying, citing, and locating physical samples. In particular, the partnership wishes to work with the Biosamples community to elaborate the necessary information (metadata) such that those within the community have a full understanding of a physical sample when its descriptive webpage is accessed via its PID (see this example).
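To make the "minimum set of attributes" idea above more concrete, here is a deliberately small, hypothetical sketch of what a biospecimen metadata record with a PID and linked datasets could look like in code. The `BiospecimenRecord` class, every attribute name, and the identifier URLs are illustrative assumptions for this example only; they are not the (yet-to-be-defined) MIABS profile, the IGSN kernel metadata, or any existing API.

```python
# Hypothetical sketch only: attribute names and identifiers below are placeholders,
# not the MIABS profile or IGSN kernel metadata.
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class BiospecimenRecord:
    pid: str                      # persistent identifier for the physical sample
    material_type: str            # kind of sample material
    collection_date: str          # ISO 8601 date of collection
    collecting_institution: str   # who collected / curates the sample
    related_datasets: List[str] = field(default_factory=list)  # DOIs of derived data


sample = BiospecimenRecord(
    pid="https://doi.org/10.1234/placeholder-sample-id",  # placeholder, not a real IGSN ID
    material_type="blood plasma",
    collection_date="2022-05-17",
    collecting_institution="Example Biobank",
    related_datasets=["https://doi.org/10.1234/example-dataset"],
)

print(asdict(sample))
```

In an FDO setting, a record of this kind would sit behind the sample's PID so that both humans and machines can retrieve the descriptive metadata and follow the links to derived datasets.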

Metadata schema, schema mapping, schema.org, XML, DataCite

DataCite Metadata Schema 4.4 to Schema.org Mapping

Published

Building bridges to other domains

The DataCite Metadata Schema is a general, domain-agnostic metadata schema used for DataCite DOI registration. To improve interoperability, the DataCite Metadata Schema can be mapped, or crosswalked, to commonly used or domain-specific metadata standards.
This mapping from the DataCite Metadata Schema to Schema.org builds on existing efforts to produce crosswalks. For example, the DataCite Metadata Working Group has produced a mapping from DataCite to Dublin Core. DataCite Content Negotiation also returns DataCite DOI metadata in various formats, including Schema.org, JATS, and BibTeX. This mapping is based on the same mapping used by DataCite’s metadata conversion library (bolognese), with modifications.
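As a rough illustration of what such a crosswalk does, the sketch below maps a handful of DataCite properties (titles, creators, publisher, publicationYear, subjects, resourceTypeGeneral) onto commonly used Schema.org counterparts as JSON-LD. The function name and the simplified record layout are assumptions made for this example; the actual DataCite-to-Schema.org mapping and the bolognese library cover far more properties and edge cases.

```python
# Minimal, illustrative sketch of a DataCite -> Schema.org crosswalk.
# Covers only a handful of properties; not the official bolognese mapping.

def datacite_to_schema_org(record: dict) -> dict:
    """Map a small subset of DataCite-style metadata to Schema.org JSON-LD."""
    doi = record.get("doi", "")
    return {
        "@context": "https://schema.org",
        # resourceTypeGeneral (e.g. "Dataset") maps onto the Schema.org type
        "@type": record.get("types", {}).get("resourceTypeGeneral", "Dataset"),
        "@id": f"https://doi.org/{doi}" if doi else None,
        "identifier": doi,
        # titles[].title -> name
        "name": next((t.get("title") for t in record.get("titles", [])), None),
        # creators[].name -> creator
        "creator": [
            {"@type": "Person", "name": c.get("name")}
            for c in record.get("creators", [])
        ],
        "publisher": {"@type": "Organization", "name": record.get("publisher")},
        # publicationYear -> datePublished
        "datePublished": str(record.get("publicationYear", "")),
        # subjects[].subject -> keywords
        "keywords": ", ".join(s.get("subject", "") for s in record.get("subjects", [])),
    }


if __name__ == "__main__":
    example = {
        "doi": "10.1234/example-doi",  # placeholder DOI
        "types": {"resourceTypeGeneral": "Dataset"},
        "titles": [{"title": "Example dataset"}],
        "creators": [{"name": "Doe, Jane"}],
        "publisher": "Example Repository",
        "publicationYear": 2021,
        "subjects": [{"subject": "metadata"}],
    }
    print(datacite_to_schema_org(example))
```

The published crosswalk, or content negotiation against a DOI, remains the authoritative source; this snippet only shows the general shape of the transformation.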