Computer and Information Sciences | Blogger

iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
Published

Willi Egloff, Donat Agosti, Puneet Kishor, David Patterson, and Jeremy A. Miller have published an interesting preprint entitled “Copyright and the Use of Images as Biodiversity Data” DOI:10.1101/087015 in which they argue that taxonomic images aren't copyrightable. I'm not convinced, and have commented on the bioRxiv site.

Published

The following is a guest post by Bob Mesibov. According to w3techs, seven out of every eight websites in the Alexa top 10 million are UTF-8 encoded. This is good news for us screenscrapers, because it means that when we scrape data into a UTF-8 encoded document, the chances are good that all the characters will be correctly encoded and displayed. It's not quite good news for two reasons.

Published

The GBIF 2016 Ebbe Nielsen Challenge has received 15 submissions, which are available to view online. Unlike last year, when the topic was completely open, for the second challenge we've narrowed the focus to "Analysing and addressing gaps and biases in primary biodiversity data". As with last year, judging is limited to the jury (of which I'm a member), but anyone interested in biodiversity informatics can browse the submissions.

Published

This guest post by Tony Rees describes his quest to track all genus names ever published (plus a subset of the species…). A “holy grail” for biodiversity informatics is a suitably quality controlled, human- and machine-queryable list of all the world’s species, preferably arranged in a suitable taxonomic hierarchy such as kingdom-phylum-class-order-family-genus or other.

Published

David Schindel and colleagues recently published a paper in the Biodiversity Data Journal. The paper is a call for the community to help grow a database (GRBio) on biodiversity repositories, a database that "will require community input and curation". Reading this, I'm struck by the lack of a clear sense of what that community might be. In particular: who is this database for, and who is most likely to build it? I suspect that

Published

The goal of my BioNames project is to link every taxonomic name to its original description (initially focussing on animal names). The rationale is that taxonomy is based on evidence, and yet most of this evidence is buried in non-digitised and/or hard-to-find literature. Surfacing this information not only makes taxonomic evidence accessible (see Surfacing the deep data of taxonomy), it also surfaces a lot of basic biological information.

Published

In a classic paper Boggs (1949) appealed for an “atlas of ignorance”, an honest assessment of what we know we don’t know. This is the theme of this year's GBIF Challenge: Analysing and addressing gaps and biases in primary biodiversity data. "Gaps" can be gaps in geographic coverage, taxonomic group, or types of data. GBIF is looking for ways to assess the nature of the gaps in the data it is aggregating from its network of contributors.

Published

BioStor now has 150,000 articles. When I wrote a paper describing how BioStor worked it had 26,784 articles, so things have progressed somewhat! I continue to tweak the interface to BioStor, trying different ways to explore the articles. Spatial search: I've tweaked spatial search in BioStor.

Published

Some notes on containers, microservices, and data. The idea of packaging software into portable containers and running them either locally or in the cloud is very attractive (see Docker). Here are some use cases I'm interested in exploring. Microservices: in Towards a biodiversity knowledge graph (doi:10.3897/rio.2.e8767) I listed a number of services that are essentially self-contained, such as name parsers, reconciliation tools, resolvers, etc.