
iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
Published

Continuing the theme of the failings of the GBIF classification, I've been playing further with cluster maps to visualise the problem (see this earlier post for an introduction). Browsing through bats in GBIF, I keep finding the same species appearing more than once, albeit in different genera.

Published

Readers of this blog will know that I'm sceptical about the current value of linked data and RDF in biodiversity informatics. But I came across an interesting paper on RDF and biocuration that suggests a good "use case" for RDF in constructing and curating taxonomic databases. The paper is "Catching inconsistencies with the semantic web: a biocuration case study" (PDF here) by Jerven Bolleman and Sebastien Gehant.

Published

I know I'm starting to sound like a broken record, but the more I look, the more taxonomic databases seem to be full of garbage. Databases such as the Catalogue of Life, which states that it is a "quality-assured checklist", have records that are patently wrong. Here's yet another example.

Published

This post arose from an ongoing email conversation with Tony Rees about extracting and annotating taxonomic names. In BioStor I use the GBIF classification to display the taxonomic names found in the OCR text in the form of a tree. The idea is to give the reader a sense of "what the paper is about". I also use the classification to help link to GBIF occurrence records.

Published

Revisiting an old idea (Clustering taxonomic names), I've added code to the phyloinformatics course site that clusters strings into sets of similar strings. This service (available at http://iphylo.org/~rpage/phyloinformatics/services/clusterstrings.php) takes a list of strings, one per line, and returns a list of clusters. For example, given the names Ferrusac 1821 Bonavita 1965 Ferussa 1821 Fer.
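To give a feel for what such a service does, here is a minimal Python sketch. It is not the code behind the service: the function name `cluster_strings`, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.8 threshold are all assumptions made for illustration. It links any two strings whose similarity reaches the threshold and returns the connected groups as clusters.

```python
from difflib import SequenceMatcher


def cluster_strings(text, threshold=0.8):
    """Group similar strings (one per line) into clusters.

    Hypothetical helper: the similarity measure and threshold are
    assumptions, not the method used by the service in the post.
    """
    names = [line.strip() for line in text.splitlines() if line.strip()]

    # Union-find: parent[i] points towards the representative of i's cluster.
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of strings that is similar enough.
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            ratio = SequenceMatcher(None, names[i], names[j]).ratio()
            if ratio >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smallest index as the representative.
                    parent[max(ri, rj)] = min(ri, rj)

    # Collect members of each cluster, preserving input order.
    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


if __name__ == "__main__":
    sample = "Ferrusac 1821\nBonavita 1965\nFerussa 1821"
    for cluster in cluster_strings(sample):
        print(cluster)
```

With the three complete names from the example, this sketch puts "Ferrusac 1821" and "Ferussa 1821" in one cluster and leaves "Bonavita 1965" on its own. Comparing every pair is quadratic in the number of strings, which is fine for short lists but would need a smarter strategy for large ones.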

Published

Google Refine is an elegant tool for data cleaning. One of its most powerful features is the ability to call "Reconciliation Services" to help clean data, for example by matching names to external identifiers. Google Refine comes with the ability to use Freebase reconciliation services, but you can also add external services. Inspired by this I've started to implement services to reconcile taxonomic names.

Published

Continuing with my exploration of the Biodiversity Heritage Library, one obstacle to linking BHL content with nomenclature databases is the lack of a consistent way to refer to the same bibliographic item (e.g., book or journal). For example, the Amphibia Species of the World (ASW) page for Gastrotheca aureomaculata gives the first reference for this name as: Gastrotheca aureomaculata Cochran and Goin, 1970, Bull. U.S. Natl.