Rogue Scholar

Encyclopedia Of Life, EOL, Sucks, Computer Science, English

Encyclopedia of Life - first impressions

Published 26 February 2008

Some thoughts on the first release of the Encyclopedia of Life. I am being deliberately critical. This is a high-profile project with tens of millions of dollars in funding and lots of people involved, and it is accompanied by some of the most overblown hype in organismal biology. In a sense I think EOL has set itself up by overpromising and underdelivering.

Encyclopedia Of Life, EOL, Computer Science, English

EOL live

https://doi.org/10.59350/8cc8h-bdq25

Published 26 February 2008

Author Roderic Page

The first release of the Encyclopedia of Life is officially live today. I have promised to be very good...

LSID, Publication, Computer Science, English

LSID Tester, a tool for testing Life Science Identifier resolution services

https://doi.org/10.59350/tfjdw-q3h90

Published 18 February 2008

Author Roderic Page

My short note on the LSID Tester tool has been published in the Open Access journal Source Code for Biology and Medicine. The article has just come out, so the DOI (doi:10.1186/1751-0473-3-2) isn't live yet; the direct link is http://www.scfbm.org/content/3/1/2/. Source code for the tester is available from Google Code.

Error, TBMap, Computer Science, English

TBMap errors

https://doi.org/10.59350/7tqct-bx723

Published 18 February 2008

Author Roderic Page

In the absence of a proper bug reporting system, I'm going to use this post to collect errors in the TBMap project, which maps taxonomic names in TreeBASE onto names in other databases.

TaxonID   TaxonName    Notes
T57654    Lycorideae   Erroneously matched by agrep to the spider family Lycosidae; this is a plant tribe.
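To see why an approximate matcher can confuse these two names, here is a minimal sketch that computes the Levenshtein edit distance between them. It is only an illustration of approximate string matching in general: the function name is hypothetical, and the error threshold TBMap actually passed to agrep is not stated in the post.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # previous[j] holds the distance between the current prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# The plant tribe and the spider family from the table above
distance = edit_distance("Lycorideae", "Lycosidae")
print(distance)
```

The two names differ by one substitution (r to s) and one deletion (an e), so any matcher that tolerates two errors and has no taxonomic context will treat them as the same name.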

Crossref, DOI, OpenURL, PaperID, Computer Science, English

CrossRef blogger tool for DOI lookup

https://doi.org/10.59350/y3t02-esp07

Published 17 February 2008

Author Roderic Page

CrossRef have released a tool for bloggers to look up DOIs and insert them into blog posts. So far the tool is only available for WordPress blogs. The idea is that bloggers can use DOIs to uniquely identify the papers they are discussing, while giving readers an easy way to reach the site hosting the article and letting aggregators such as postgenomic.com cluster posts about the same paper.
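The clustering idea can be sketched in a few lines. This is not CrossRef's tool or postgenomic.com's code; it is a minimal illustration that assumes posts arrive as (title, DOI string) pairs, and the function names and post titles are hypothetical. The only external fact relied on is that DOIs are case-insensitive and resolve via https://doi.org/.

```python
from collections import defaultdict


def normalise_doi(raw):
    """Strip common prefixes and lowercase, so that different
    spellings of the same DOI compare equal."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def resolver_url(raw):
    """Link a reader can follow to reach the site hosting the article."""
    return "https://doi.org/" + normalise_doi(raw)


def cluster_by_doi(posts):
    """Group post titles by the normalised DOI they cite."""
    clusters = defaultdict(list)
    for title, raw_doi in posts:
        clusters[normalise_doi(raw_doi)].append(title)
    return dict(clusters)


# Hypothetical posts; the first DOI is the one quoted in the LSID Tester entry
posts = [
    ("Hypothetical post A", "doi:10.1186/1751-0473-3-2"),
    ("Hypothetical post B", "https://doi.org/10.1186/1751-0473-3-2"),
    ("Hypothetical post C", "10.59350/8cc8h-bdq25"),
]
clusters = cluster_by_doi(posts)
print(clusters)
```

Posts A and B end up in the same cluster even though they write the DOI differently, which is the point of using a shared identifier rather than a URL or a title.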

"data Wars", Google, Mashup, Microsoft, Scraping, Computer Science, English

The Data Wars

https://doi.org/10.59350/q7e07-vys87

Published 5 February 2008

Author Roderic Page

Wired 16.01 has an article entitled The Data Wars by Josh McHugh. A quote from the printed version: It's a sobering read for those of us who advocate harvesting data from as many sources as possible, more so in light of Microsoft's bid to buy Yahoo. Yahoo provides free access to many of its tools via an API (such as the image search I use in iSpecies), and in this sense is much more open than Google. Might this change under Microsoft...?

Computer Science, English

How to visualize a phylogeny with thousands of tips?

https://doi.org/10.59350/vpk8f-a4m49

Published 4 February 2008

Author Roderic Page

Dave Lunt has a nice post, "How to visualize a phylogeny with thousands of tips?" Dave lists 12 things that his ideal phylogenetic tree viewing tool should do, and invites comments. It will be interesting to see what comes of this...

Citation, Pagerank, Computer Science, English

Incomplete citation and ranking

https://doi.org/10.59350/zdg2m-tv281

Published 4 February 2008

Author Roderic Page

Came across the paper "Using incomplete citation data for MEDLINE results ranking" (pmid:16779053, full text available in PMC). The authors applied PageRank (the algorithm Google use to rank search results) to papers in MEDLINE and found that PageRank is robust to information loss. In other words, even if a citation database is incomplete it will do a good job of ranking results.
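As a rough illustration of the idea, here is a minimal PageRank sketch using power iteration on a toy citation graph. It is not the paper's implementation or data: the five papers, the damping factor of 0.85, and the function name are all assumptions made for this example. It ranks the full graph, then ranks it again with one citation removed.

```python
def pagerank(citations, damping=0.85, iterations=100):
    """Power-iteration PageRank. citations maps each paper to the
    list of papers it cites; rank flows from citing to cited paper."""
    papers = sorted(set(citations) | {c for cited in citations.values() for c in cited})
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing spread their rank evenly over all papers
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            for c in cited:
                new_rank[c] += damping * rank[p] / len(cited)
        rank = new_rank
    return rank


# Toy graph (hypothetical): papers A-D all cite E, and B also cites A
full = {"A": ["E"], "B": ["E", "A"], "C": ["E"], "D": ["E"], "E": []}
# The same graph with one citation (D -> E) missing from the database
incomplete = {"A": ["E"], "B": ["E", "A"], "C": ["E"], "D": [], "E": []}

full_rank = pagerank(full)
incomplete_rank = pagerank(incomplete)
print(max(full_rank, key=full_rank.get))
print(max(incomplete_rank, key=incomplete_rank.get))
```

In this toy case the most-cited paper stays on top after a citation is dropped. That is only a small-scale intuition for the robustness the authors report, not a reproduction of their result.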

EAV, Md5, RDF, Computer Science, English

A database of everything

https://doi.org/10.59350/m04cw-m9011

Published 4 February 2008

Author Roderic Page

Nothing like a little hubris first thing Monday morning... After various experiments, such as a triple store for ants (documented on the Semant blog) and bioGUID (documented on the bioGUID blog), I'm starting from scratch and working on a "database of everything". Put another way, I'm working on a database that aggregates metadata about specimens, sequences, literature, images, taxonomic names, etc.

Connotea, HTTP URI, Identifiers, Integration, Computer Science, English

How Shall I Integrate Thee? Let Me Count the Ways...

https://doi.org/10.59350/zj9ds-v8023

Published 31 January 2008

Author Roderic Page

Leigh Dodds has a nice post, "How Shall I Integrate Thee? Let Me Count the Ways...", about different ways to integrate data.

Metacrap, Metadata, English

Metacrap

https://doi.org/10.59350/wrdz0-vmg69

Published 29 January 2008

Author Roderic Page

Time for a rant. I spend a lot of time fussing with records from sources such as GenBank and DiGIR providers, trying to extract strings that might be identifiers, with a view to linking sequences to specimens (and thus to localities), sequences to publications, publications to GUIDs, etc.
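The kind of string-fishing described here might look like the sketch below. It is an assumption-laden illustration, not the code behind the post: the regular expressions are a common heuristic for DOIs and "pmid:" mentions rather than a formal grammar, the function name is hypothetical, and the sample text is made up from identifiers quoted elsewhere on this page rather than taken from a real GenBank or DiGIR record.

```python
import re

# Heuristic patterns (assumptions): a DOI starts with "10.", a registrant
# code, and a slash; a PubMed ID is written here as "pmid:" plus digits.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)
PMID_PATTERN = re.compile(r"\bpmid:\s*(\d+)", re.IGNORECASE)


def extract_identifiers(text):
    """Return candidate DOIs and PMIDs found in a free-text field."""
    # Trim punctuation that commonly trails an identifier in running text
    dois = [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]
    pmids = PMID_PATTERN.findall(text)
    return {"doi": dois, "pmid": pmids}


# Made-up free-text note reusing identifiers quoted on this page
sample = "Published in SCFBM (doi:10.1186/1751-0473-3-2); see also pmid:16779053."
found = extract_identifiers(sample)
print(found)
```

Patterns like these only yield candidates; each extracted string still has to be checked against a resolver or database before it is treated as a link.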

iPhylo
