Computer and Information Sciences
Blogger

iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
BHL, Code, DjVu, HOCR, JATS, Computer and Information Sciences
Published

A while ago I posted "BHL to PDF workflow", a sketch of a workflow to generate clean, searchable PDFs from Biodiversity Heritage Library (BHL) content. I've made some progress on putting this together, and have expanded the goal somewhat. In fact, there are several goals: BioStor articles need to be archived somewhere.

GBIF, Github, Glassella, GML, Pinnixa, Computer and Information Sciences
Published

Quick notes on yet another attempt to marry the task of editing a taxonomic classification with versioning it in GitHub. The idea of dumping the whole GBIF classification into GitHub as a series of nested folders looks untenable. So, maybe there's another way to tackle the problem.

ICZN, Nature, Taxonomy, Zootaxa, Computer and Information Sciences
Published

There is a fairly scathing editorial in Nature [The new zoo. (2013). Nature, 503(7476), 311–312. doi:10.1038/503311b] that reacts to a recent paper by Dubois et al.: To quote the editorial: Ouch! But Dubois et al.'s paper pretty much deserves this reaction - it's a reactionary rant that is breathtaking in its lack of perspective.

Index Fungorum, ION, IPNI, LSIDs, Nomenclators, Computer and Information Sciences
Published

Quick notes on taxonomic names (again). It's a continuing source of bafflement that the biodiversity community is making a dog's breakfast of names. It seems we are forever making it more complicated than it needs to be, forever minting new acronyms that pollute the landscape without actually contributing anything useful, and forever promising shiny new tools and services without ever actually delivering them.

GBIF, Github, ZooKeys, Computer and Information Sciences
Published

Here's another example of a Darwin Core Archive that is "broken" such that GBIF is missing some information. The GBIF data set "A checklist to the wasps of Peru (Hymenoptera, Aculeata)" comes from Pensoft and corresponds to a ZooKeys paper. As with the previous example, GBIF says there are 0 georeferenced records in this dataset. This is odd, because the ZooKeys page for this article lists three supplementary files, including KML files for Google Earth.

Darwin Core Archive, GBIF, Github, Computer and Information Sciences
Published

Following on from "Annotating and cleaning GBIF data: Darwin Core Archive, GitHub, ORCID, and DataCite", here's a quick and dirty example of using GitHub to help clean up a Darwin Core Archive. The dataset "3i - Cicadellinae Database" has 2,152 species and 4,749 taxa, but GBIF says it has no georeferenced data.
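
As a rough illustration of what "no georeferenced data" means in practice, here is a minimal Python sketch that counts how many records in a tab-delimited Darwin Core data file carry coordinates. It assumes the file uses the standard Darwin Core terms `decimalLatitude` and `decimalLongitude` as column headers; the function name `count_georeferenced` and the sample rows are made up for this example, and this is not a description of how GBIF itself indexes a dataset.

```python
import csv
import io


def count_georeferenced(rows):
    """Return (georeferenced, total) for an iterable of dict-like records.

    A record counts as georeferenced only if both decimalLatitude and
    decimalLongitude are present and non-empty.
    """
    total = 0
    georeferenced = 0
    for row in rows:
        total += 1
        lat = (row.get("decimalLatitude") or "").strip()
        lon = (row.get("decimalLongitude") or "").strip()
        if lat and lon:
            georeferenced += 1
    return georeferenced, total


# Made-up sample standing in for a data file from an archive.
SAMPLE = (
    "scientificName\tdecimalLatitude\tdecimalLongitude\n"
    "Example species a\t-12.05\t-77.04\n"
    "Example species b\t\t\n"
    "Example species c\t-3.75\t\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
georeferenced, total = count_georeferenced(reader)
print(f"{georeferenced} of {total} records have coordinates")
```

A count of zero from a check like this would suggest the coordinates are missing from the archive itself, or stored under different column names, rather than lost downstream.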

Catalogue Of Life, LSID, Computer and Information Sciences
Published

I have a love/hate relationship with the Catalogue of Life (CoL). On the one hand, it's an impressive achievement to have persuaded taxonomists to share names, and to bring those names together in one place. I suspect that Frank Bisby would feel that the social infrastructure he created is his lasting legacy.

DataCite, GBIF, Github, ORCID, Computer and Information Sciences
Published

This is a quick sketch of a way to combine existing tools to help clean and annotate data in GBIF, particularly (but not exclusively) occurrence data. GitHub: The data provider puts a Darwin Core Archive (expanded, not zipped) into a GitHub repository. GBIF forks the repository, cleans the data, and uploads that to GBIF to populate the database behind the portal.

GBIF, Computer and Information Sciences
Published

I've recently been appointed Chair of the Science Committee of the Global Biodiversity Information Facility (GBIF) http://www.gbif.org [1]. The committee is a small group of people with a range of backgrounds, and one of our roles is to advise GBIF on scientific matters (e.g., what kinds of data should GBIF collect, and what kinds of scientific questions should it help answer?). There have been formal surveys (see the papers in the journal

BHL, BioNames, Coverage, ISSN, Journals, Computer and Information Sciences
Published

One reason I was able to build BioNames is that a significant fraction of the taxonomic literature for animals is now online, thanks to the efforts of the Biodiversity Heritage Library, digital archives, commercial publishers, and individual institutions and scientific societies. However, there are still big gaps in literature availability.

GBIC, GBIF, GBIO, Computer and Information Sciences
Published

Wednesday saw the launch of the Global Biodiversity Informatics Outlook (GBIO), based in large part on the Global Biodiversity Informatics Conference (GBIC). The aim is to provide a framework for biodiversity informatics and its applications, in the hope that the field will unite around a shared vision of where we are and what needs to be done next. There is a web site http://www.biodiversityinformatics.org/ with more details and links to