Rogue Scholar

Published September 13, 2011

Some quick half-baked thoughts on citation matching. One of the things I'd really like to add to BioStor is the ability to parse article text and extract the list of literature cited. Not only would this be another source of bibliographic data I can use to find more articles in BHL, but I could also build citation networks for articles in BioStor.

API, BHL, BioStor, Flickr, Interface, Computer and Information Sciences

More BHL app ideas

https://doi.org/10.59350/qbcrs-4b509

Published September 13, 2011

Author Roderic Page

Following on from my previous post on BHL apps and a Twitter discussion in which I appealed for a "sexier" interface for BHL (to which @elyw replied that is what BHL Australia were trying to do), here are some further thoughts on improving BHL's web interface. Build a new interface: a fun project would be to create a BHL website clone using just the BHL API.

BioStor, Lucene, Search, Solr, Computer and Information Sciences

Adding Solr to BioStor: searching for real

https://doi.org/10.59350/cd9pt-bb147

Published June 8, 2011

Author Roderic Page

Prompted by the appearance on the BHL blog of an article about BioStor, I've been thinking about how to improve what is basically a fairly clunky tool. One major weakness is searching the collection of nearly 40,000 articles extracted from BHL. Note the word "extracted." BioStor isn't a tool like PubMed or Google Scholar where the goal is to find articles on a topic.

BHL, BioStor, CouchDB, PubMed Central, Replication, Computer and Information Sciences

ZooBank on CouchDB: UUIDs, replication, and embedding the literature in taxonomic databases

https://doi.org/10.59350/pmwa9-40707

Published May 26, 2011

Author Roderic Page

Last December I released a web site called Australian Faunal Directory on CouchDB, which was part of my ongoing exploration of how to build a simple yet useful database of taxonomic names. In particular, I want to link names directly to the primary taxonomic literature.

BioStor, BMC Bioinformatics, Google Scholar, Published, Computer and Information Sciences

BioStor article published (finally)

https://doi.org/10.59350/g3xqe-m6618

Published May 23, 2011

Author Roderic Page

My article describing BioStor, "Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library", has finally seen the light of day in BMC Bioinformatics (doi:10.1186/1471-2105-12-187; the DOI is not working at the moment, so give it a little while to go live, and in the meantime you can access the article here). Getting this article published was more work than I expected.

Background, BHL, BioStor, DjVu, RTFM, Computer and Information Sciences

BHL, DjVu, and reading the f*cking manual

https://doi.org/10.59350/nmpja-1sh38

Published April 15, 2011

Author Roderic Page

One of the biggest challenges I've faced with the BioStor project, apart from dealing with messy metadata, has been handling page images. At present I get these from the Biodiversity Heritage Library. They are big (typically 1 MB in size), and have the caramel colour of old paper. Nothing fills up a server quicker than thousands of images.

BHL, BioStor, Microcitations, Names, Nomenclator Zoologicus, Computer and Information Sciences

Nomenclator Zoologicus meets Biodiversity Heritage Library: linking names directly to literature

https://doi.org/10.59350/kbesa-avt79

Published March 7, 2011

Author Roderic Page

Following on from my previous post on microcitations, I've blasted all the citations in Nomenclator Zoologicus through my microcitation service and created a simple web site where these results can be browsed.

BHL, BioStor, Twitter, Computer and Information Sciences

BioStor updates on Twitter

https://doi.org/10.59350/mdrrm-4b111

Published March 3, 2011

Author Roderic Page

BioStor has had a Twitter account @biostor_org for a while, but it's not been active. I finally got around to hooking it up to BioStor, so that now every time an article is added to BioStor, the title of that article and its URL appear in the @biostor_org Twitter feed. Activity on this feed will be variable, depending on whether articles are being added manually, or in bulk.

BHL, BioStor, Mendeley, OpenURL, Computer and Information Sciences