Rogue Scholar

Published May 27, 2022

Note to self (basically rewriting last year's "Finding citations of specimens"). Bibliographic data supports going from identifier to citation string and back again, so we can do a "round trip."

1. Given a DOI we can get structured data with a simple HTTP fetch, then use a tool such as citation.js to convert that data into a human-readable string in a variety of formats.
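The first step can be sketched in Python. This is a minimal illustration rather than the post's own code: it assumes the DOI resolver's content negotiation returns CSL-JSON when asked for `application/vnd.citationstyles.csl+json`, and `format_citation` is a hypothetical stand-in for a real formatter such as citation.js, producing one made-up "Author (Year). Title. URL" style.

```python
import json
import urllib.request

# Media type for CSL-JSON, requested from doi.org via content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def fetch_csl(doi: str) -> dict:
    """Fetch structured metadata for a DOI as a CSL-JSON record."""
    request = urllib.request.Request(
        f"https://doi.org/{doi}", headers={"Accept": CSL_JSON}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def format_citation(record: dict) -> str:
    """Turn a CSL-JSON record into a plain string (hypothetical style)."""
    names = []
    for author in record.get("author", []):
        # CSL names usually carry "given" and "family"; fall back to "literal".
        parts = (author.get("given"), author.get("family"))
        name = " ".join(p for p in parts if p) or author.get("literal", "")
        if name:
            names.append(name)

    date_parts = record.get("issued", {}).get("date-parts") or [[]]
    year = date_parts[0][0] if date_parts[0] else None

    head_parts = []
    if names:
        head_parts.append(", ".join(names))
    if year:
        head_parts.append(f"({year})")
    head = " ".join(head_parts)

    title = record.get("title", "").strip()
    doi = record.get("DOI")
    link = f"https://doi.org/{doi}" if doi else ""

    # Skip any segment that is missing from the record.
    return ". ".join(s for s in (head, title, link) if s)
```

Calling `format_citation(fetch_csl("10.1101/469650"))` would fetch the record over the network and format it. A real workflow would hand the CSL-JSON to a citation processor so that any published style (APA, BibTeX, and so on) is available.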

BioRxiv, GBIF, Genbank, Geocoding, Specimen Codes, Computer and Information Sciences

Geocoding genomic databases using GBIF

https://doi.org/10.59350/35kwk-1ty15

Published November 15, 2018

Author Roderic Page

I've put a short note up on bioRxiv about ways to geocode nucleotide sequences in databases such as GenBank. The preprint is "Geocoding genomic databases using GBIF" https://doi.org/10.1101/469650.

Ross Mounce, Specimen Codes, Text Mining, Computer and Information Sciences

Text mining for museum specimen identifiers

https://doi.org/10.59350/xvdw8-nc818

Published May 19, 2015

Author Roderic Page

This post is a response to Ross Mounce's post Text mining for museum specimen identifiers. As Ross notes in that post, mining literature for specimen codes is something I've been interested in for a while (search for specimen codes on iPhylo), and @Aime Rankin (formerly an undergraduate student at Glasgow) did some work on this as well. It's great to see progress in this area.

GBIF, Google Docs, Material Examined, Specimen Codes, Web Services, Computer and Information Sciences

Looking up specimen codes in GBIF using Google Spreadsheet

https://doi.org/10.59350/c236y-8bn90

Published April 21, 2015

Author Roderic Page

Playing with the "material examined" tool I've been working on, I wondered whether I could make use of it in, say, a spreadsheet. Imagine that I have a spreadsheet of museum codes and want to look those up in GBIF. I could create a service for Open Refine, but Open Refine is a bit big and clunky: you have to fire up a Java application and point your browser at it, and it isn't as intuitive or as flexible as a spreadsheet.

GBIF, Genbank, Knowledge Graph, Specimen Codes, Computer and Information Sciences

Linking specimen codes to GBIF

https://doi.org/10.59350/g6gq1-crg31

Published April 15, 2015

Author Roderic Page

I've put together a working demo of some code I've been working on to discover GBIF records that correspond to museum specimen codes. The live demo is at http://bionames.org/~rpage/material-examined/ and code is on GitHub. To use the demo, simply paste in a specimen code (e.g., "MCZ 24351") and click Find, and it will do its best to parse the code, then go off to GBIF and see what it can find.
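As a rough illustration of the "parse, then ask GBIF" idea, here is a minimal Python sketch. It is not the demo's actual parser: the regular expression assumes the simplest shape of code (an upper-case institution acronym followed by digits), the function names are hypothetical, and it only builds a query URL for the GBIF occurrence search API using the `institutionCode` and `catalogNumber` parameters. Real records often store catalogue numbers in other forms, so a lookup like this will miss many matches.

```python
import re
from urllib.parse import urlencode

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"

# Assumed shape: acronym, optional separator characters, then digits.
CODE_PATTERN = re.compile(r"^\s*([A-Z]{2,})[\s:.-]*(\d+)\s*$")


def parse_specimen_code(code: str):
    """Split a code such as 'MCZ 24351' into (institution, catalogue number)."""
    match = CODE_PATTERN.match(code)
    if match is None:
        return None
    return match.group(1), match.group(2)


def gbif_search_url(code: str) -> str:
    """Build an occurrence search URL for a parsed specimen code."""
    parsed = parse_specimen_code(code)
    if parsed is None:
        raise ValueError(f"could not parse specimen code: {code!r}")
    institution, catalog_number = parsed
    query = urlencode(
        {"institutionCode": institution, "catalogNumber": catalog_number}
    )
    return f"{GBIF_SEARCH}?{query}"


if __name__ == "__main__":
    print(gbif_search_url("MCZ 24351"))
    # https://api.gbif.org/v1/occurrence/search?institutionCode=MCZ&catalogNumber=24351
```

Fetching that URL returns JSON whose `results` list holds any matching occurrences. A more forgiving tool would try several variants of the catalogue number before giving up.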

BHL, NHM, Pteralopex, Specimen Codes, Computer and Information Sciences

Linking data from the NHM portal with content in BHL

https://doi.org/10.59350/fwy42-qza35

Published December 18, 2014

Author Roderic Page

One reason I'm excited by the launch of the NHM data portal is that it opens up opportunities to link publications about specimens in the NHM to the records of the specimens themselves.

Crossref, DataCite, DOI, Identifiers, Specimen Codes, Computer and Information Sciences

Quick thoughts on specimen identifiers

https://doi.org/10.59350/8y7v3-6jc97

Published April 20, 2012

Author Roderic Page

Based on recent discussions my sense is that our community will continue to thrash the issue of identifiers to death, repeating many of the debates that have gone on (and will go on) in other areas. To be trite, it seems to me we have three criteria: cheap, resolvable, and persistent. We get to pick two.

Darwin Core Triplet, Duplicates, GBIF, Identifiers, Specimen Codes, Computer and Information Sciences

How many specimens does GBIF really have?

https://doi.org/10.59350/2d3dv-8q010

Published February 23, 2012

Author Roderic Page

Duplicate records are the bane of any project that aggregates data from multiple sources.

Darwin Core Triplet, Data Mining, Museum, Specimen Codes, Computer and Information Sciences

Extracting museum specimen codes from text

https://doi.org/10.59350/6qy4m-eg641

Published January 26, 2012

Author Roderic Page

Quick note about a tool I've cobbled together as part of the phyloinformatics course, which addresses a long-standing need I and others have to extract specimen codes from text. I've had this code kicking around for a while (as part of various never-finished data mining projects), but never got around to releasing it until now.
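To give a flavour of the problem, here is a minimal Python sketch of pulling candidate specimen codes out of running text. It is an assumption-laden toy, not the tool described in the post: the pattern only recognises an upper-case acronym of two to six letters followed by at least three digits, and `extract_specimen_codes` is a hypothetical name. Real specimen codes vary far more than this (prefixes, ranges, punctuation), and a pattern this loose will also pick up things that are not specimens.

```python
import re

# Assumed shape: 2-6 upper-case letters, optional space, 3 or more digits.
SPECIMEN_CODE = re.compile(r"\b([A-Z]{2,6})[ ]?(\d{3,})\b")


def extract_specimen_codes(text: str) -> list[str]:
    """Return candidate codes in order of first appearance, without repeats."""
    seen = set()
    codes = []
    for match in SPECIMEN_CODE.finditer(text):
        # Normalise to 'ACRONYM NUMBER' so 'MCZ24351' and 'MCZ 24351' agree.
        code = f"{match.group(1)} {match.group(2)}"
        if code not in seen:
            seen.add(code)
            codes.append(code)
    return codes
```

Each candidate would still need checking against a list of known institution acronyms or a lookup in GBIF before being treated as a genuine specimen.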

iPhylo

Round trip from identifiers to citations and back again
