Rogue Scholar

Published October 29, 2024

Just a placeholder to mark the ongoing impact of the Internet Archive being attacked (see here, here and here for details). The impact of this on the Biodiversity Heritage Library (BHL) has been huge, and reveals the extent to which BHL depends on the Archive.

Computer and Information Sciences

Exploring BOLD's DNA barcode data releases: there's a fraction too much friction

https://doi.org/10.59350/6qepn-ge510

Published October 18, 2024

Author Roderic Page

Recently I’ve been exploring data downloaded from BOLD. Part of this was motivated by work done with David Schindel for a recent book: In this blog post I record some struggles I’ve had with the supposedly “Frictionless” data provided by BOLD. I list a series of issues, and make some recommendations as to how these can be fixed. Previous versions disappear from site The web page Data Packages lists datasets that can be downloaded.

Computer and Information Sciences

The Data Citation Corpus revisited

https://doi.org/10.59350/wvwva-v7125

Published October 8, 2024

Author Roderic Page

TL;DR These are some brief notes on the latest version (v. 2) of the Data Citation Corpus, released shortly before the Make Data Count Summit 2024, which also included a discussion on the practical uses of the corpus. I downloaded version 2 from Zenodo doi:10.5281/zenodo.13376773. The data is in JSON format, which I then loaded into CouchDB to play with.

Computer and Information Sciences

Why do museum and gallery displays ignore the web?

https://doi.org/10.59350/a83tn-c6t14

Published August 13, 2024

Author Roderic Page

This post is inspired by the Pharaoh exhibition at the NGV in Melbourne, Australia. This is a beautifully displayed exhibition of objects from the British Museum, London. It has all the trappings of a modern exhibition: beautiful lighting, a custom sound track, and lots of social media coverage. But I found it immensely frustrating to visit.

Computer and Information Sciences

A future for the Biodiversity Heritage Library

https://doi.org/10.59350/n3dkt-6xd05

Published July 2, 2024

Author Roderic Page

Following the 2024 BHL meeting, the departure of Martin Kalfatovic, and the uncertainty that the departure of such a pivotal person brings, perhaps it’s time to think about the future of BHL. Below I sketch some thoughts, which are hazy at best. I should say at the outset that I think BHL is an extraordinary project. My goal is to think about ways to enhance its utility and impact.

Computer and Information Sciences

Visualising big trees: a talk at the Systematics Association 2024

https://doi.org/10.59350/cf6n4-ch767

Published June 19, 2024

Author Roderic Page

This blog post has some notes in support of a talk given to the Systematics Association meeting in Reading on June 20th, 2024. Slides I will post a link to the slides here once I have given the talk. Page, Roderic (2024). Visualising big trees. figshare. Presentation.

FAIR, Identifiers, Nanopublication, Pensoft, RDF, Computer and Information Sciences

Nanopubs, a way to create even more silos

https://doi.org/10.59350/6nj85-7te92

Published June 18, 2024

Author Roderic Page

Pensoft have recently introduced “nanopubs”, small structured publications that can be thought of as containing the minimum possible statement that could be published. Nanopubs are promoted as FAIR, that is, findable, accessible, interoperable, and reusable. I like the idea of nanopubs, but the examples I have seen so far are problematic.

Computer and Information Sciences

Notes on transforming BHL images

https://doi.org/10.59350/2gpbb-98a53

Published April 19, 2024

Author Roderic Page

How to cite: Page, R. (2024). Notes on transforming BHL images https://doi.org/10.59350/2gpbb-98a53 I’ve been down this road before, e.g. BHL, DjVu, and reading the f*cking manual and Demo of full-text indexing of BHL using CouchDB hosted by Cloudant, but I’m revisiting converting BHL page scans to black and white images, partly to clean them up, to make them closer to what a modern reader might expect, and partly to reduce the

Computer and Information Sciences

Hugging Face Autotrain

https://doi.org/10.59350/7p1n4-wdv84

Published March 27, 2024

Author Roderic Page

How to cite: Page, R. (2024). Hugging Face Autotrain https://doi.org/10.59350/7p1n4-wdv84 These are notes to myself on using Hugging Face AutoTrain. The first version of this had a very nice interface where you could simply upload a folder of images and train a model. It was limited in the range of tasks and models, but made up for that in ease of use.

Computer and Information Sciences

Problems with the DataCite Data Citation Corpus

https://doi.org/10.59350/t80g1-xys37

Published February 20, 2024

Author Roderic Page

How to cite: Page, R. (2024). Problems with the DataCite Data Citation Corpus https://doi.org/10.59350/t80g1-xys37 DataCite have released the Data Citation Corpus, together with a dashboard that summarises the corpus. This is billed as: The goal is to build a citation database between scholarly articles and data, such as datasets in repositories, sequences in GenBank, protein structures in PDB, etc.

iPhylo

Internet Archive as a single point of failure
