It's been a while since I've blogged here. The last few months have been, um, interesting for so many reasons.
While working with linked data and ways to explore and visualise information, I keep coming back to the Haystack project, which is now over a decade old. Among the tools developed was the Haystack application, which enabled a user to explore all sorts of structured data. Below is a screen shot of Haystack showing a sequence for Homo sapiens cyclin T1 (CCNT1), transcript variant a, mRNA.
Quick notes on modelling taxonomic names in databases, as part of an ongoing discussion elsewhere about this topic. Simple model: one model that is widely used (e.g., ITIS, WoRMS), and which is explicit in Darwin Core Archive, is something like this: we have a table for taxa, and we don't distinguish between taxa and their names. The taxonomic hierarchy is represented by the parentID field, which points to the taxon's parent.
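As a rough illustration of that simple model, here is a minimal Python sketch of a single taxa table in which each row is both a taxon and its name, and parentID points to the parent row. The row contents, the `TAXA` dictionary, and the `lineage` helper are all assumptions made up for this example (and the hierarchy is heavily abbreviated); they are not taken from ITIS, WoRMS, or Darwin Core.

```python
from typing import Dict, List, Optional

# Hypothetical taxa table keyed by taxon ID. Each row is both the taxon
# and its name (the simple model does not separate the two).
# The hierarchy is abbreviated for illustration only.
TAXA: Dict[int, dict] = {
    1: {"parentID": None, "name": "Animalia"},
    2: {"parentID": 1, "name": "Chordata"},
    3: {"parentID": 2, "name": "Mammalia"},
    4: {"parentID": 3, "name": "Homo"},
    5: {"parentID": 4, "name": "Homo sapiens"},
}


def lineage(taxon_id: int, taxa: Dict[int, dict] = TAXA) -> List[str]:
    """Walk parentID pointers from a taxon up to the root.

    Returns the names ordered from root to the requested taxon.
    """
    path: List[str] = []
    seen = set()
    current: Optional[int] = taxon_id
    while current is not None:
        if current in seen:
            # Guard against a malformed table with a parentID cycle.
            raise ValueError(f"cycle detected at taxon {current}")
        seen.add(current)
        row = taxa[current]
        path.append(row["name"])
        current = row["parentID"]
    path.reverse()
    return path


if __name__ == "__main__":
    print(" > ".join(lineage(5)))
```

Everything about the classification lives in that one pointer per row, which is what makes the model simple: reconstructing a lineage is just a walk up the parentID chain.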
I have a love/hate relationship with the Catalogue of Life (CoL). On the one hand, it's an impressive achievement to have persuaded taxonomists to share names, and to bring those names together in one place. I suspect that Frank Bisby would feel that the social infrastructure he created is his lasting legacy.
I'll keep this short: LSIDs suck because they are so hard to set up that many LSIDs don't actually work. Because of this there seems to be no shame in publishing "fake" LSIDs (LSIDs that look like LSIDs but which don't resolve using the LSID protocol). Hey, it's hard work, so let's just stick them on a web page but not actually make them resolvable.
To much fanfare (e.g., Nature News, "Linnaeus meets the Internet" doi:10.1038/news.2010.221), on May 5th PLoS ONE published Sandy Knapp's "Four New Vining Species of Solanum (Dulcamaroid Clade) from Montane Habitats in Tropical America" doi:10.1371/journal.pone.0010502.
OK, really must stop avoiding what I'm supposed to be doing (writing a paper, already missed the deadline), but continuing the theme of LSIDs and short URLs, it occurs to me that LSIDs can be seen as a disaster (don't work in web browsers, nobody else uses them, hard to implement, etc.) or an opportunity.
The LSID discussion rumbles on (see my earlier post). One issue that has re-emerged is the use of HTTP proxies in RDF documents.
The LSID discussion has flared up (again) on the TDWG mailing lists. This discussion keeps coming around (I've touched on it here and here); this time it was sparked by the LSID SourceForge site being broken (the part where you get the code is OK). Some of the issues being raised include: Nobody uses LSIDs except the biodiversity informatics crowd, have we missed something?
In the wiki examples I've been developing I've been trying to model names using the TDWG LSID vocabularies, particularly TaxonName. Roger Hyam has obviously put a huge amount of work into developing these, and they handle just about everything I need. However, I think that there's one thing missing, namely a way to express the logical relationship between the parts of a multinomial taxonomic name.