Published in DataCite Blog - DataCite

We know that software is important in research, and some of us in the scholarly communications community, for example, in FORCE11, have been pushing the concept of software citation as a method to allow software developers and maintainers to get academic credit for their work: software releases are published and assigned DOIs, and software users then cite these releases when they publish research that uses the software.

References

General Earth and Planetary Sciences, General Environmental Science

Persistent Identification and Citation of Software

Published in International Journal of Digital Curation
Authors Catherine Mary Jones, Brian Matthews, Ian Gent, Tom Griffin, Jonathan Tedds

Software underpins the academic research process across disciplines. To be able to understand, use/reuse and preserve data, the software code that generated, analysed or presented the data will need to be retained and executed. An important part of this process is being able to persistently identify the software concerned. This paper discusses the reasons for doing so and introduces a model of software entities to enable better identification of what is being identified. The DataCite metadata schema provides a persistent identification scheme and we consider how this scheme can be applied to software. We then explore examples of persistent identification and reuse. The examples show the differences and similarities of software used in academic research, which has been written and reused at different scales. The key concepts of being able to identify what precisely is being used and provide a mechanism for appropriate credit are important to both of them.  

ropensci/codemetar: codemetar: Generate CodeMeta Metadata for R Packages

Published
Authors Carl Boettiger, Maëlle Salmon, Arfon Smith, Noam Ross, Katrin Leinweber, Anna Krystalli

New functions
- extract_badges for extracting information from all badges in a Markdown file.
- give_opinion giving opinionated advice about package metadata.

Changes to the create_codemeta output
- relatedLink field now includes the provider URL and URL(s) from DESCRIPTION that are not the code repository.
- maintainer is now a list, allowing for several maintainers since e.g. the BioConductor a4 package has two maintainers.
- if more than one CI service among Travis, Appveyor and Circle CI is used and shown via a README badge, they're all added to the contIntegration field. URLs from codecov and coveralls badges are also added to the contIntegration field.
- repo status inferred from the README now 1) is a URL instead of a word 2) recognizes either repostatus.org or Tidyverse lifecycle badges.
- if present, priority is given to the Repository and BugReports fields of DESCRIPTION for filling the codeRepository and issueTracker fields of codemeta.json (which means working on a fork won't change these).
- ability to parse all CRAN-allowed MARC roles.
- if there is a badge for an rOpenSci onboarding review and the review issue is closed, basic review metadata is added to codemeta.json.
- For dependencies, if the provider guessed is CRAN or BioConductor, their canonical CRAN/BioConductor URL is added to codemeta.json as sameAs, unless there's a GitHub repo mentioned for them in Remotes in DESCRIPTION, in which case sameAs is that GitHub repo.
- CRAN is now correctly translated as "Comprehensive R Archive Network".
- If codeRepository is guessed to be a GitHub repo (via the URL field of DESCRIPTION or via git remote URL), the repo topics are queried via GitHub API V3 and added to the keywords (in combination with keywords stored in the X-schema.org-keywords field of DESCRIPTION).
- SystemRequirements are now parsed using https://sysreqs.r-hub.io/, outputting URLs then stored in softwareRequirements.

Help to remind to update codemeta.json regularly
- Writing codemeta.json for the first time adds a git pre-commit hook and suggests adding a release question for devtools::release.

Internal changes
- Now uses desc to parse DESCRIPTION files.
- Package license changed to GPL because of code borrowed from usethis.
- Uses crul instead of httr and uses crul to check some URLs.
- write_codemeta only uses Rbuildignore and a pre-commit git hook if the function is called from a package folder directly and with the path argument equal to "codemeta.json".
- The calls to available.packages() for guess_provider now happen inside memoised functions.
- codemeta_readme function.
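The sameAs rule for dependencies described in this changelog can be sketched in a few lines. This is a Python illustration of the decision logic only, not codemetar's R implementation: the function name, the shape of the `remotes` argument, and the two URL templates are assumptions made for the example.

```python
from typing import Dict, Optional

# Assumed URL patterns; the changelog only says "canonic CRAN/BioConductor URL".
CRAN_URL = "https://CRAN.R-project.org/package={name}"
BIOC_URL = "https://bioconductor.org/packages/release/bioc/html/{name}.html"


def same_as(name: str, provider: Optional[str], remotes: Dict[str, str]) -> Optional[str]:
    """Return the sameAs URL for one dependency, or None if the rule does not apply.

    `remotes` maps a package name to the "owner/repo" entry listed under
    Remotes in DESCRIPTION (a simplified, hypothetical representation).
    """
    if provider not in ("CRAN", "BioConductor"):
        return None
    # A GitHub repo named in Remotes takes precedence over the canonical URL.
    if name in remotes:
        return f"https://github.com/{remotes[name]}"
    template = CRAN_URL if provider == "CRAN" else BIOC_URL
    return template.format(name=name)


# Hypothetical inputs: one plain CRAN dependency, one overridden by Remotes.
print(same_as("desc", "CRAN", {}))
print(same_as("somepkg", "CRAN", {"somepkg": "someowner/somepkg"}))
```

Checking Remotes before falling back to the provider URL mirrors the "unless" clause in the changelog entry.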

DataCite Metadata Schema Documentation for the Publication and Citation of Research Data v4.1

Published
Authors DataCite Metadata Working Group, Madeleine de Smaele, Joan Starr, Jan Ashton, Amy Barton, Noris Birt, Stefanie Dietiker, Jannean Elliot, Martin Fenner, Wim Hugo, Stefan Jakobsson, Isabel Bernal Martínez, Jessica Rücknagel, Mohamed Yahia, Frauke Ziedorn, Lisa Zolly

1 Introduction
1.1 The DataCite Consortium
1.2 DataCite Community Participation
1.3 The Metadata Schema
1.4 Version 4.1 Update
2 DataCite Metadata Properties
2.1 Overview
2.2 Citation
2.3 DataCite Properties
3 XML Example
4 XML Schema
5 Other DataCite Services
Appendices
Appendix 1: Controlled List Definitions
Appendix 2: Earlier Version Update Notes
Appendix 3: Standard values for unknown information
Appendix 4: Version 4.1 Changes in support of software citation
Appendix 5: FORCE11 Software Citation Principles Mapping
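To make the schema's use for software more concrete, here is a minimal sketch of a record for a software release, with a check that the schema's mandatory properties are present. The dictionary layout and key names are assumptions chosen for readability (the schema itself is defined in XML), and the DOI, creator, and publisher are placeholders rather than real values.

```python
from typing import Any, Dict, List

# The six properties the DataCite schema marks as mandatory,
# spelled here as hypothetical dictionary keys.
MANDATORY = [
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
]


def missing_mandatory(record: Dict[str, Any]) -> List[str]:
    """List the mandatory properties that are absent or empty in `record`."""
    return [key for key in MANDATORY if not record.get(key)]


# Placeholder software release; none of these values refer to a real DOI.
record = {
    "identifier": {"identifierType": "DOI", "value": "10.1234/example-software"},
    "creators": [{"creatorName": "Doe, Jane"}],
    "titles": [{"title": "example-tool"}],
    "publisher": "Example Repository",
    "publicationYear": 2018,
    "resourceType": {"resourceTypeGeneral": "Software"},
    "version": "1.0.0",  # optional property, useful for citing a specific release
}

print(missing_mandatory(record))
```

Setting resourceTypeGeneral to "Software" and recording a version is what lets a citation point at one specific release rather than at the project as a whole.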

featured

DOI Fabrica 1.0 is Here!

Published

The DataCite team is pleased to announce the release of DOI Fabrica 1.0! DataCite Providers and Clients can check it out at https://doi.datacite.org. DOI Fabrica is the new web interface for DataCite DOI registration services. With DOI Fabrica, you...

80609 Information Systems Management, FOS: Computer and information sciences, 80505 Web Technologies (excl. Web Search), 80612 Interorganisational Information Systems and Web Services

Codemeta: A Rosetta Stone for Software Metadata

Published

Software is critical to robust and efficient scientific discovery across disciplines, and yet is rarely valued or even understood. Researchers need to be able to discover and understand scientific software to apply it to their projects, but the approaches for documenting software are typically language specific and not interoperable. This project will have a broad impact on multiple disciplines by increasing the interoperability and consistency of software descriptions and by providing examples that illustrate the utility of interoperable software repositories for citation, discovery, archiving and preservation of scientific software. Research relies heavily on scientific software, and a large and growing fraction of researchers must now develop custom software to conduct their own research. Despite this, infrastructure to support the preservation, discovery, reuse, and attribution of software lags substantially behind that of other research products such as journal articles and research data. This frustrates the progress of science in several ways: lacking a way to discover and access software written by other researchers means that multiple teams must re-invent the same wheel. Limited re-use or accreditation of software also discourages researchers from investing more time to improve the performance, reliability or usability of the software they write. This lag is driven not so much by a lack of technology as it is by a lack of unity: existing mechanisms to archive, document, index, share, discover, and cite software contributions are varied among research disciplines and among software archives, and rarely consistent with best practices. The project will convene key stakeholders from software and data repositories to address this issue by aligning existing software metadata approaches. 
This alignment of software documentation will increase the efficiency and scale of research across disciplines, and simplify the process for researchers to collaborate on interdisciplinary projects.

This project will have three distinct phases:

1. Define a crosswalk table between existing metadata schemas for software

2. Develop prototype applications illustrating the value of crosswalk metadata

3. Assess and communicate impact of results.

The researchers will convene a meeting of repository and science stakeholders to harmonize approaches to software metadata. Rather than try to define yet another standard, they will map the correspondences between standards already in use -- a Rosetta stone of software metadata. In this process, the investigators will identify metadata use cases that have guided existing software metadata descriptions (e.g. more or different metadata may be needed to install software than to cite it, and even more to extend it), and then agree upon which metadata concepts are needed for each use case. This phase will identify some use cases that are not fully supported by existing software repositories (for instance, Zenodo is interested in associating software with funders as a use case but does not recognize funder identifiers yet). This will set the stage for the second phase, where the crosswalk table will be used to harmonize the implementation of software metadata in three major repositories that support software deposition (KNB, Zenodo, Figshare). The researchers will modify the software and provenance metadata terms used in the DataONE federation to be interoperable with the crosswalk, and create a tool for generating and uploading software with this metadata to the KNB repository (a member repository of DataONE). Collaborators will extend the existing integration between the software repository GitHub and the data repositories Zenodo and Figshare to provide interoperable software metadata. In the final phase, the team will conduct an assessment with researchers at a relevant scientific meeting to evaluate the effectiveness of the crosswalk for the identified software use cases and will summarize results in a scientific paper.
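The crosswalk idea can be illustrated with a small sketch: each schema's field names are mapped to a shared set of terms, and a record is translated from one schema to another by passing through those shared terms. The schema labels, the field mappings, and the function names below are illustrative assumptions, not the project's published crosswalk table.

```python
from typing import Dict

# Illustrative crosswalk: schema field name -> shared term (hypothetical mappings).
CROSSWALK: Dict[str, Dict[str, str]] = {
    "r_description": {"Package": "name", "Version": "version", "License": "license"},
    "datacite": {"Title": "name", "Version": "version", "Rights": "license"},
}


def to_shared(record: Dict[str, str], schema: str) -> Dict[str, str]:
    """Rename a record's fields to shared terms, dropping unmapped fields."""
    mapping = CROSSWALK[schema]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def from_shared(shared: Dict[str, str], schema: str) -> Dict[str, str]:
    """Rename shared terms back to one schema's own field names."""
    inverse = {term: field for field, term in CROSSWALK[schema].items()}
    return {inverse[term]: value for term, value in shared.items() if term in inverse}


def translate(record: Dict[str, str], source: str, target: str) -> Dict[str, str]:
    """Translate a record between schemas via the shared terms."""
    return from_shared(to_shared(record, source), target)


# Hypothetical package description; "Depends" has no crosswalk entry and is dropped.
description = {"Package": "exampletool", "Version": "1.0.0", "Depends": "R"}
print(translate(description, "r_description", "datacite"))
```

Routing every translation through shared terms means each schema needs only one mapping, instead of one mapping per pair of schemas.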

General Computer Science

Software citation principles

Published in PeerJ Computer Science
Authors Arfon M. Smith, Daniel S. Katz, Kyle E. Niemeyer, FORCE11 Software Citation Working Group

Software is a critical part of modern research and yet there is little support across the scholarly ecosystem for its acknowledgement and citation. Inspired by the activities of the FORCE11 working group focused on data citation, this document summarizes the recommendations of the FORCE11 Software Citation Working Group and its activities between June 2015 and April 2016. Based on a review of existing community practices, the goal of the working group was to produce a consolidated set of citation principles that may encourage broad adoption of a consistent policy for software citation across disciplines and venues. Our work is presented here as a set of software citation principles, a discussion of the motivations for developing the principles, reviews of existing community practice, and a discussion of the requirements these principles would place upon different stakeholders. Working examples and possible technical solutions for how these principles can be implemented will be discussed in a separate paper.