
iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
BHL, BioStor, Ngram, Timeline

Given a big corpus of literature, one of the fun things to do is look at how the use of a term has changed over time. When did people first use a particular word? When did one word start to replace another? Google's Ngram Viewer is perhaps the best-known tool for exploring these questions.
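The core of an Ngram-style timeline is simple: count how often a term appears each year and divide by the total amount of text from that year, so that years with more publications are not over-weighted. The sketch below is a minimal illustration under stated assumptions (a single-word term, naive word tokenisation, and a corpus supplied as hypothetical `(year, text)` pairs); it is not how Google's Ngram Viewer or any BHL/BioStor tool is actually implemented.

```python
import re
from collections import Counter


def term_frequency_by_year(documents, term):
    """Return {year: occurrences of `term` per 1,000 tokens} for (year, text) pairs.

    Assumes a single-word term and naive tokenisation into runs of word characters.
    """
    term = term.lower()
    hits = Counter()    # occurrences of the term per year
    totals = Counter()  # total tokens per year
    for year, text in documents:
        tokens = re.findall(r"\w+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(term)
    # Normalise by corpus size so that years with more text are not over-weighted
    return {year: 1000 * hits[year] / totals[year]
            for year in sorted(totals) if totals[year] > 0}


# Toy corpus, made up for illustration only
docs = [
    (1900, "the barcode of life"),
    (1900, "species names"),
    (2005, "DNA barcode barcode"),
]
result = term_frequency_by_year(docs, "barcode")
print(result)
```

Plotting these per-year values gives the familiar rising or falling curve; comparing two terms on the same axes shows when one began to replace the other.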

DNA Barcoding, GBIF, IBOL

I've uploaded all the COI barcodes in the iBOL public data dumps into GBIF. This is an update of an earlier project that uploaded only a small subset. The dataset (doi:10.15468/inygc6) has now been expanded to include some 2.7 million barcodes.

Bob Mesibov, Character Encoding, Guest Post, UTF-8

The following is a guest post by Bob Mesibov. According to w3techs, seven out of every eight websites in the Alexa top 10 million are UTF-8 encoded. This is good news for us screen scrapers, because it means that when we scrape data into a UTF-8 encoded document, the chances are good that all the characters will be correctly encoded and displayed. It's not entirely good news, however, for two reasons.
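A scraper can at least check whether the bytes it receives really are valid UTF-8 before trusting them. The sketch below is a minimal illustration, not the method described in the guest post: it tries UTF-8 first and, as an assumed fallback, decodes as Windows-1252 (cp1252) if that fails. The function name `decode_scraped` is hypothetical. The last line shows the classic "mojibake" produced when UTF-8 bytes are misread as cp1252.

```python
def decode_scraped(raw: bytes) -> tuple[str, bool]:
    """Decode scraped bytes, preferring UTF-8.

    Returns (text, was_utf8). If the bytes are not valid UTF-8 we fall back to
    Windows-1252 (an assumed fallback; the real source encoding may differ),
    replacing any byte that has no cp1252 mapping.
    """
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace"), False


# Valid UTF-8 decodes cleanly
print(decode_scraped("Müller".encode("utf-8")))    # ('Müller', True)
# Latin-1 bytes are not valid UTF-8, so the fallback is used
print(decode_scraped("Müller".encode("latin-1")))  # ('Müller', False)
# The failure in the other direction: UTF-8 bytes misread as cp1252
print("é".encode("utf-8").decode("cp1252"))         # Ã©
```

Note that a successful UTF-8 decode only proves the bytes are well-formed; it cannot tell you whether the site stored the right characters in the first place.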

Challenge, GBIF

The GBIF 2016 Ebbe Nielsen Challenge has received 15 submissions, which you can view online. Unlike last year, when the topic was completely open, for the second challenge we've narrowed the focus to "Analysing and addressing gaps and biases in primary biodiversity data". As with last year, judging is limited to the jury (of which I'm a member), but anyone interested in biodiversity informatics can browse the submissions.

Guest Post, IRMNG, Tony Rees

This guest post by Tony Rees describes his quest to track all genus names ever published (plus a subset of the species…). A “holy grail” for biodiversity informatics is a suitably quality-controlled, human- and machine-queryable list of all the world’s species, preferably arranged in a suitable taxonomic hierarchy such as kingdom-phylum-class-order-family-genus or similar.
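To make "machine-queryable" concrete, here is a minimal sketch of how a genus list arranged in a kingdom-phylum-class-order-family-genus hierarchy might be stored and queried. The record layout (`GenusRecord`) and the `genera_in` helper are hypothetical illustrations, not IRMNG's actual schema or API; the three sample records use well-known genera.

```python
from dataclasses import dataclass

RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


@dataclass(frozen=True)
class GenusRecord:
    """A genus name plus its higher classification (hypothetical record layout)."""
    kingdom: str
    phylum: str
    class_: str  # trailing underscore because 'class' is a Python keyword
    order: str
    family: str
    genus: str

    def classification(self) -> tuple[str, ...]:
        """Names in RANKS order, from kingdom down to genus."""
        return (self.kingdom, self.phylum, self.class_,
                self.order, self.family, self.genus)


def genera_in(records, rank, name):
    """Return the sorted genus names whose classification has `name` at `rank`."""
    i = RANKS.index(rank)  # raises ValueError for an unknown rank
    return sorted(r.genus for r in records if r.classification()[i] == name)


records = [
    GenusRecord("Animalia", "Chordata", "Mammalia", "Primates", "Hominidae", "Homo"),
    GenusRecord("Animalia", "Chordata", "Mammalia", "Primates", "Hominidae", "Pan"),
    GenusRecord("Animalia", "Chordata", "Mammalia", "Carnivora", "Canidae", "Canis"),
]
print(genera_in(records, "order", "Primates"))  # ['Homo', 'Pan']
print(genera_in(records, "class", "Mammalia"))  # ['Canis', 'Homo', 'Pan']
```

A flat record per genus keeps each name independently queryable, at the cost of repeating the higher ranks; a real list would also need to handle homonyms and genera whose placement is uncertain.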

Community, Curation, GRBio, Wikidata

David Schindel and colleagues recently published a paper in the Biodiversity Data Journal. The paper is a call for the community to help grow a database (GRBio) on biodiversity repositories, a database that "will require community input and curation". Reading this, I'm struck by the lack of a clear sense of what that community might be. In particular: who is this database for, and who is most likely to build it? I suspect that