Rogue Scholar

Frankenplace, geospatial search, and discrete global grid systems

Published May 28, 2019

Quick note on Frankenplace, a cool search tool that displays the geographic distribution of documents matching the user's query as a heatmap. Details of how the tool works are given in: […] At the heart of the method is a discrete global grid that divides the world into small areas of equal size.
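To make the grid idea concrete, here is a minimal Python sketch of binning geocoded matches into equal-area cells and counting them for a heatmap. It is a stand-in under stated assumptions, not Frankenplace's code: it uses a simple latitude/longitude grid whose rows are equal steps of sin(latitude), which keeps every cell the same area, rather than a full discrete global grid system. The function names and the (lat, lon) input format are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers: a simple equal-area grid, not a full DGGS.


def cell_index(lat, lon, n_lon=360, n_band=180):
    """Return (row, col) of the equal-area cell containing a point.

    Columns are equal steps of longitude; rows are equal steps of
    sin(latitude). The area of a band of latitude is proportional to the
    difference in sin(latitude) across it, so every cell has the same area.
    """
    col = int((lon + 180.0) / 360.0 * n_lon)
    row = int((math.sin(math.radians(lat)) + 1.0) / 2.0 * n_band)
    # Clamp points lying exactly on the eastern edge (lon = 180) or the north pole
    return min(row, n_band - 1), min(col, n_lon - 1)


def cell_area_km2(n_lon=360, n_band=180, radius_km=6371.0):
    """Area of every cell: the sphere's surface divided by the number of cells."""
    return 4.0 * math.pi * radius_km ** 2 / (n_lon * n_band)


def heatmap(points, n_lon=360, n_band=180):
    """Count matching documents per cell; points are (lat, lon) pairs."""
    return Counter(cell_index(lat, lon, n_lon, n_band) for lat, lon in points)
```

Cells in this simple grid become long and thin towards the poles, which is one motivation for proper discrete global grid systems that keep areas equal while using more compact cell shapes.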

GBIF, Leaflet, Search, Computer and Information Sciences

Searching GBIF by drawing on a map

https://doi.org/10.59350/cybn5-ded07

Published April 21, 2016

Author Roderic Page

One of my frustrations with the GBIF portal is that it is hard to drill down and search in a specific area. You have to zoom in and then click for a list of occurrences in the current bounding box of the map. You can't, for example, draw a polygon such as the boundary of a protected area and search within that area.
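Searching within a hand-drawn polygon comes down to a point-in-polygon test. The sketch below applies the classic ray-casting algorithm to occurrence records that carry the Darwin Core fields `decimalLatitude` and `decimalLongitude` (as GBIF records do). The function names are hypothetical, and coordinates are treated as planar, so polygons that cross the antimeridian are not handled.

```python
# Hypothetical helpers; planar approximation, no antimeridian handling.


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon is a list of (lon, lat) vertices; the closing edge back to the
    first vertex is implied. Each edge crossed by a ray cast eastwards from
    the point toggles inside/outside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def filter_occurrences(occurrences, polygon):
    """Keep occurrence dicts whose coordinates fall inside the polygon."""
    return [
        occ for occ in occurrences
        if point_in_polygon(occ["decimalLongitude"], occ["decimalLatitude"], polygon)
    ]
```

Filtering like this only works on records already downloaded. The GBIF occurrence search API also accepts a polygon in WKT through its `geometry` parameter, which moves the filtering to the server.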

BHL, Cloudant, CouchDB, DjVu, Search, Computer and Information Sciences

Demo of full-text indexing of BHL using CouchDB hosted by Cloudant

https://doi.org/10.59350/4crdc-fm682

Published August 10, 2015

Author Roderic Page

One of the limitations of the Biodiversity Heritage Library (BHL) is that, unlike, say, Google Books, its search functions are limited to searching metadata (e.g., book and article titles) and taxonomic names. It doesn't support full-text search, by which I mean you can't just type in the name of a locality, a specimen code, or a phrase and expect to get back much in the way of results.
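Full-text search rests on an inverted index that maps each word to the pages it occurs on. This minimal Python sketch assumes OCR text keyed by hypothetical page IDs and answers a query by intersecting the page sets for its words, so every word must appear on a page. It does not handle exact phrases, stemming, or ranking, which a real engine (such as the Lucene-based search in Cloudant) provides.

```python
import re
from collections import defaultdict

# Hypothetical helpers: a toy inverted index over OCR text, AND queries only.


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of page IDs whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search(index, query):
    """Return the pages containing every token in the query (boolean AND)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

A query such as a specimen code split into two tokens matches any page containing both tokens, not necessarily side by side; phrase search needs token positions stored in the index as well.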

COinS, Greasemonkey, Metadata, OpenURL, Search, Computer and Information Sciences

In defence of OpenURL: making bibliographic metadata hackable

https://doi.org/10.59350/rt5rw-00c69

Published March 13, 2013

Author Roderic Page

This is not a post I thought I'd write, because OpenURL is an awful spec. But last week I ended up in a vigorous debate on Twitter after I posted what I thought was a casual remark: […] This ended up being a marathon thread about OpenURL, accessibility, bibliographic metadata, and more.
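Part of what makes OpenURL hackable is that a link is just key/value pairs in a query string. The sketch below builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a journal article and parses it back again. The resolver URL and article metadata are placeholders, the helper names are hypothetical, and only a handful of the `rft.` keys are shown.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical helpers; only a subset of the OpenURL 1.0 journal keys is used.


def openurl(resolver, atitle, jtitle, volume, spage, date):
    """Build an OpenURL 1.0 key/encoded-value link for a journal article."""
    params = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": atitle,   # article title
        "rft.jtitle": jtitle,   # journal title
        "rft.volume": volume,
        "rft.spage": spage,     # start page
        "rft.date": date,
    }
    return resolver + "?" + urlencode(params)


def parse_openurl(url):
    """Recover the metadata from an OpenURL link (first value of each key)."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}
```

The same key/value string is what COinS embeds in the `title` attribute of a `<span class="Z3988">`, so a tool such as a Greasemonkey script can read citation metadata straight out of a web page.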

Google, Metadata, Pollution, Search, Computer and Information Sciences

Bibliographic metadata pollution

https://doi.org/10.59350/ffpw6-tph57

Published March 13, 2013

Author Roderic Page

I spend a lot of time searching the web for bibliographic metadata and links to digitised versions of publications.

BBC, Google, Knowledge Graph, Search, Structured Data, Computer and Information Sciences

Google Knowledge Graph using data from BBC and Wikipedia

https://doi.org/10.59350/2kdav-4zn26

Published August 2, 2012

Author Roderic Page

Google's Knowledge Graph can enhance search results by displaying some structured information about a hit in your list of results. It's available in the US (i.e., you need to use www.google.com, although I have occasionally seen it appear for google.co.uk). Here is what Google displays for Eidolon helvum (the straw-coloured fruit bat): you get a snippet of text from Wikipedia, and also a map from the BBC Nature Wildlife site.

BioStor, Lucene, Search, Solr, Computer and Information Sciences

Adding Solr to BioStor: searching for real

https://doi.org/10.59350/cd9pt-bb147

Published June 8, 2011

Author Roderic Page

Prompted by the appearance of an article about BioStor on the BHL blog, I've been thinking about how to improve what is basically a fairly clunky tool. One major weakness is searching the collection of nearly 40,000 articles extracted from BHL. Note the word "extracted." BioStor isn't a tool like PubMed or Google Scholar where the goal is to find articles on a topic.
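For readers unfamiliar with Solr, a search is an HTTP GET against a core's `/select` handler, with the query and options passed as parameters. The sketch below builds such a URL and pulls titles out of the decoded JSON response. The host, the core name (`biostor`), the field names (`id`, `title`), and the helper names are assumptions for illustration, not BioStor's actual schema.

```python
from urllib.parse import urlencode

# Hypothetical helpers; core and field names are illustrative only.


def solr_select_url(base, core, query, fields=("id", "title"), rows=10):
    """Build a Solr /select URL that returns matching documents as JSON."""
    params = {
        "q": query,              # Solr/Lucene query syntax, e.g. title:"new species"
        "fl": ",".join(fields),  # fields to return for each hit
        "rows": rows,            # maximum number of hits
        "wt": "json",            # response format
    }
    return f"{base}/solr/{core}/select?{urlencode(params)}"


def hit_titles(response):
    """Extract titles from a decoded Solr JSON response ({"response": {"docs": [...]}})."""
    return [doc.get("title") for doc in response["response"]["docs"]]
```

Fetching the URL and decoding the body with `json.loads` gives the dictionary that `hit_titles` expects.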

Atlas Of Living Australia, Australian Faunal Directory, Google, Pagerank, Search, Computer and Information Sciences

Why is the Atlas of Living Australia invisible to Google?

https://doi.org/10.59350/j5sn7-kws35

Published February 6, 2011

Author Roderic Page

Jeff Atwood, one of the co-founders of Stack Overflow, recently wrote a blog post, "Trouble In the House of Google", in which he noted that several sites that scrape Stack Overflow content (which Stack Overflow's CC-BY-SA license permits) appear higher in Google's search rankings than the original Stack Overflow pages. When Stack Overflow chose the CC-BY-SA license they made the assumption that: […] Jeff Atwood's post goes on to argue …

Fungi, Google, Search, Wikipedia, Computer and Information Sciences

Fungi in Wikipedia

https://doi.org/10.59350/ye5r0-ta821

Published September 2, 2009

Author Roderic Page

One response to my analysis of how Wikipedia's mammal pages rank in Google was the suggestion that Wikipedia does well for mammals because they are charismatic, and that for other groups of taxa Wikipedia might not be so prominent in the search results. As a quick test I extracted the 1552 fungal species I could find in Wikipedia and repeated the analysis.

Clay Shirky, EOL, Google, Power Law, Search, Computer and Information Sciences

Google, Wikipedia, and EOL

https://doi.org/10.59350/qvzh4-v1988

Published September 1, 2009

Author Roderic Page

One assumption I've been making so far is that when people search for information on an organism using its scientific name, Wikipedia will dominate the search results (see my earlier post for an example of this assumption). I've decided to quantify this by doing a little experiment. I grabbed the Mammal Species of the World taxonomy and extracted the 5416 species names. I then used Google's AJAX search API to look up each name in Google.
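The core of this experiment is simple: for each name, find where (if anywhere) Wikipedia appears in the ranked results, then tally those ranks across all names. The sketch below assumes the ranked result URLs for each name have already been collected (it does not call any Google API); the helper names and any sample names or URLs are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical helpers; search results are assumed to be collected already.


def wikipedia_rank(result_urls):
    """1-based position of the first Wikipedia page in a ranked result list, or None."""
    for position, url in enumerate(result_urls, start=1):
        host = urlparse(url).netloc.lower()
        if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            return position
    return None


def rank_distribution(results_by_name):
    """Tally how often Wikipedia appears at each rank (None = not found)."""
    return Counter(wikipedia_rank(urls) for urls in results_by_name.values())


def fraction_top_hit(results_by_name):
    """Share of names for which Wikipedia is the first result."""
    if not results_by_name:
        return 0.0
    return rank_distribution(results_by_name)[1] / len(results_by_name)
```

Plotting the counts against rank is one way to see whether Wikipedia really dominates the top positions, and running the same code on a different list of names (for example, fungi rather than mammals) gives a like-for-like comparison.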

iPhylo

Frankenplace, geospatial search, and discrete global grid systems
