Continuing my on-again off-again relationship with the Semantic Web, I stumbled across a cool approach to visualising the results of SPARQL queries.
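Since much of the appeal of SPARQL results is how easily they can be handed to other tools, here's a minimal Python sketch of fetching SELECT results as JSON, the form most visualisation code wants. The endpoint URL and query are placeholders invented for illustration, not the actual approach described in the post.

```python
# Minimal sketch: run a SPARQL SELECT query and retrieve the results
# as JSON, ready to hand to a chart or graph library. The endpoint
# and query are hypothetical placeholders.
import requests

ENDPOINT = "https://example.org/sparql"  # hypothetical SPARQL endpoint

QUERY = """
SELECT ?name ?year WHERE {
  ?taxon <http://example.org/hasName> ?name ;
         <http://example.org/yearPublished> ?year .
}
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()

# The standard SPARQL JSON results format nests each value under
# results.bindings[i][variable]["value"].
for row in response.json()["results"]["bindings"]:
    print(row["name"]["value"], row["year"]["value"])
```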
Willi Egloff, Donat Agosti, Puneet Kishor, David Patterson, and Jeremy A. Miller have published an interesting preprint entitled “Copyright and the Use of Images as Biodiversity Data” (DOI:10.1101/087015), in which they argue that taxonomic images aren't copyrightable. I'm not convinced, and have commented on the bioRxiv site.
One of the most interesting aspects of EOL is "TraitBank", which has been described in a recent paper. TraitBank is available in JSON-LD, and so is potentially part of the Semantic Web.
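To give a flavour of why JSON-LD matters here, the sketch below reads a JSON-LD document with nothing but Python's standard library; the URL is a made-up placeholder, not TraitBank's actual API.

```python
# Sketch: JSON-LD is ordinary JSON plus an @context that maps keys to
# URIs, which is what lets plain JSON double as Linked Data.
# The URL below is hypothetical, not a real TraitBank endpoint.
import json
import urllib.request

URL = "https://example.org/traitbank/pages/1045608.jsonld"  # hypothetical

with urllib.request.urlopen(URL) as f:
    doc = json.load(f)

print(doc.get("@context"))          # the key-to-URI mapping
for node in doc.get("@graph", []):  # the data itself, as graph nodes
    print(node.get("@id"), node.get("@type"))
```

Because the @context resolves every key to a URI, the same document can be consumed by an ordinary JSON parser or by an RDF toolchain.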
The following is a guest post by Bob Mesibov. According to w3techs, seven out of every eight websites in the Alexa top 10 million are UTF-8 encoded. This is good news for us screenscrapers, because it means that when we scrape data into a UTF-8 encoded document, the chances are good that all the characters will be correctly encoded and displayed. It's not entirely good news, though, for two reasons.
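To make the encoding issue concrete, here's a small Python sketch of the decode step every scraper performs, whether it realises it or not (example.com stands in for whatever page is being scraped):

```python
# Bytes arrive from the server; nothing is "text" until we decode it.
import requests

resp = requests.get("https://example.com/")

# requests guesses an encoding from the HTTP headers; pages that omit
# or misdeclare their charset are where mojibake comes from.
print(resp.encoding)

# Decoding explicitly as UTF-8, with errors="replace", means one bad
# byte sequence yields a replacement marker instead of crashing the
# whole scrape.
text = resp.content.decode("utf-8", errors="replace")
print(text[:200])
```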
The GBIF 2016 Ebbe Nielsen Challenge has received 15 submissions, all of which can be viewed on the challenge site. Unlike last year, where the topic was completely open, for the second challenge we've narrowed the focus to "Analysing and addressing gaps and biases in primary biodiversity data". As with last year, judging is limited to the jury (of which I'm a member), but anyone interested in biodiversity informatics can browse the submissions.
This guest post by Tony Rees describes his quest to track all genus names ever published (plus a subset of the species…). A “holy grail” for biodiversity informatics is a suitably quality controlled, human- and machine-queryable list of all the world’s species, preferably arranged in a suitable taxonomic hierarchy such as kingdom-phylum-class-order-family-genus or similar.
David Schindel and colleagues recently published a paper in the Biodiversity Data Journal. The paper is a call for the community to help grow a database (GRBio) of biodiversity repositories, a database that "will require community input and curation". Reading this, I'm struck by the lack of a clear sense of what that community might be. In particular: who is this database for, and who is most likely to build it? I suspect that…
The goal of my BioNames project is to link every taxonomic name to its original description (initially focussing on animal names). The rationale is that taxonomy is based on evidence, and yet most of this evidence is buried in a non-digitised and/or hard to find literature. Surfacing this information not only makes taxonomic evidence accessible (see Surfacing the deep data of taxonomy), it also surfaces a lot of basic biological information.
In a classic paper, Boggs (1949) appealed for an “atlas of ignorance”, an honest assessment of what we know we don’t know. This is the theme of this year's GBIF Challenge: Analysing and addressing gaps and biases in primary biodiversity data. "Gaps" can be gaps in geographic coverage, taxonomic group, or types of data. GBIF is looking for ways to assess the nature of the gaps in the data it is aggregating from its network of contributors.
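As a toy illustration of what assessing a geographic gap might look like, the sketch below compares raw occurrence-record counts for a few countries via GBIF's public occurrence search API. Equating raw record counts with coverage is a deliberate oversimplification, but it shows the shape of the problem.

```python
# Compare GBIF occurrence-record counts across countries as a crude
# proxy for geographic coverage. Uses the public GBIF API; with
# limit=0 the response carries just the total count, no records.
import requests

API = "https://api.gbif.org/v1/occurrence/search"

for country in ["GB", "BR", "CD"]:  # ISO codes: UK, Brazil, DR Congo
    resp = requests.get(API, params={"country": country, "limit": 0})
    resp.raise_for_status()
    print(f"{country}: {resp.json()['count']:,} records")
```

A real gap analysis would normalise these counts by area, sampling effort, and expected species richness, which is exactly the kind of thinking the challenge is after.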
BioStor now has 150,000 articles. When I wrote a paper describing how BioStor worked it had 26,784 articles, so things have progressed somewhat! I continue to tweak the interface to BioStor, trying different ways to explore the articles. Spatial search: I've tweaked spatial search in BioStor.
Some notes on containers, microservices, and data. The idea of packaging software into portable containers and running them either locally or in the cloud is very attractive (see Docker). Here are some of the use cases I'm interested in exploring.
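As a starting point, here's a minimal sketch of the kind of thing that's natural to package in a container: a tiny HTTP microservice serving data as JSON. Flask is just one choice among many, and the route and data are invented for illustration.

```python
# A toy microservice: one resource, served as JSON. In a container
# this would be the entire application.
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in data; a real service would read from a database or a file
# mounted into the container.
NAMES = {1: "Apis mellifera", 2: "Homo sapiens"}

@app.route("/names/<int:name_id>")
def get_name(name_id):
    if name_id not in NAMES:
        return jsonify(error="not found"), 404
    return jsonify(id=name_id, name=NAMES[name_id])

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the service is reachable from outside the
    # container, not just from localhost inside it.
    app.run(host="0.0.0.0", port=8080)
```

Wrap that in an image with a short Dockerfile and the same service runs identically on a laptop or in the cloud, which is precisely the attraction.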