Rogue Scholar

Model Context Protocol (MCP) and triple stores: natural language queries for knowledge graphs

Published November 19, 2025

Some quick notes based on experiments with Model Context Protocol (MCP) and Claude (https://claude.ai). MCP is all the rage right now, and I’ve been slow to take a look. Kingsley Idehen recently wrote The Semantic Web Project Didn’t Fail — It Was Waiting for AI (The Yin of its Yang), where he argued that Large Language Models (LLMs) finally provide a user-friendly way to query triple stores (i.e., knowledge graphs).

Computer and Information Sciences

Make Data Count Kaggle Competition

https://doi.org/10.59350/1kk3z-yq870

Published August 7, 2025

Author Roderic Page

I’ve written several times here about the Make Data Count project and its major output to date, the Data Citation Corpus, currently at version 4 (see The fourth release of the Data Citation Corpus incorporates data citations from Europe PMC and additions to affiliation metadata). In June Make Data Count launched a Kaggle Competition with the goal of developing a tool that will process articles (in either PDF or XML format), extract data

Computer and Information Sciences

How many times are DNA barcoding datasets cited?

https://doi.org/10.59350/s0c6z-2m608

Published July 8, 2025

Author Roderic Page

This note accompanies a dataset that I uploaded to Zenodo (https://doi.org/10.5281/zenodo.15824274). My goal in creating this dataset is to link data created on the Barcode of Life Data Systems to the DOIs for those datasets, and then to link those data DOIs to DOIs for the papers (if any) that created those datasets, and/or cited them.

Computer and Information Sciences

A metabarcoding mess and the importance of just looking at the data

https://doi.org/10.59350/q2v8n-wc488

Published June 5, 2025

Author Roderic Page

Here I summarise a few posts on Bluesky where I raised concerns about some metabarcoding datasets that were highlighted by GBIF. Looking at these datasets, it’s clear that something is wrong. Data: The datasets discussed are for CO1 Amplicon Sequence Variants from Madagascar, which are part of the Insect Biome Atlas project.

Computer and Information Sciences

Tracking changes in DNA barcode BINs

https://doi.org/10.59350/h97dq-dat02

Published May 16, 2025

Author Roderic Page

Following on from releasing BOLD View I’ve started to explore how the classification of DNA barcodes changes over time. BOLD uses the RESL algorithm described in Ratnasingham & Hebert (2013, 2016) to cluster barcodes into “BINs”. As the number of DNA barcodes grows over time these clusters may change.

Computer and Information Sciences

Future interfaces for the Biodiversity Heritage Library

https://doi.org/10.59350/gvfg4-cw420

Published April 11, 2025

Author Roderic Page

On Wednesday this week (April 9th, 2025) I gave a talk entitled “Future interface(s) for BHL” (the slides are on FigShare) at BHL Day 2025.

Computer and Information Sciences

BOLD View: exploring DNA barcodes

https://doi.org/10.59350/81kzw-qy18

Published February 26, 2025

Author Roderic Page

For a while now I’ve been exploring ways to navigate through DNA barcodes. Over the years I’ve built various “toys” to explore barcodes, such as Displaying a million DNA barcodes on Google Maps using CouchDB, built a small-scale browser using Elasticsearch that had some success, and discovered that Postgres can search for DNA sequences and it’s really fast.

Computer and Information Sciences

Internet Archive as a single point of failure

https://doi.org/10.59350/1r3m1-c5e22

Published October 29, 2024

Author Roderic Page

How to cite: Page, R. (2024). Internet Archive as a single point of failure https://doi.org/10.59350/1r3m1-c5e22 Just a placeholder to mark the ongoing impact of the Internet Archive being attacked (see here, here and here for details). The impact of this on the Biodiversity Heritage Library (BHL) has been huge, and reveals the extent to which BHL depends on the Archive.

Computer and Information Sciences

Exploring BOLD's DNA barcode data releases: there's a fraction too much friction

https://doi.org/10.59350/6qepn-ge510

Published October 18, 2024

Author Roderic Page

How to cite: Page, R. (2024). Exploring BOLD's DNA barcode data releases: there's a fraction too much friction https://doi.org/10.59350/6qepn-ge510 Recently I’ve been exploring data downloaded from BOLD. Part of this was motivated by work done with David Schindel for a recent book. In this blog post I record some struggles I’ve had with the supposedly “Frictionless” data provided by BOLD.

Computer and Information Sciences

The Data Citation Corpus revisited

https://doi.org/10.59350/wvwva-v7125

Published October 8, 2024

Author Roderic Page

How to cite: Page, R. (2024). The Data Citation Corpus revisited https://doi.org/10.59350/wvwva-v7125 TL;DR These are some brief notes on the latest version (v. 2) of the Data Citation Corpus, released shortly before the Make Data Count Summit 2024, which also included a discussion on the practical uses of the corpus. I downloaded version 2 from Zenodo doi:10.5281/zenodo.13376773.

Computer and Information Sciences

Why do museum and gallery displays ignore the web?

https://doi.org/10.59350/a83tn-c6t14

Published August 13, 2024

Author Roderic Page

How to cite: Page, R. (2024). Why do museum and gallery displays ignore the web? https://doi.org/10.59350/a83tn-c6t14 This post is inspired by the Pharaoh exhibition at the NGV in Melbourne, Australia. This is a beautifully displayed exhibition of objects from the British Museum, London. It has all the trappings of a modern exhibition, beautiful lighting, a custom sound track, and lots of social media coverage.

iPhylo
