
A blog by Ross Mounce

Published

OpenCon 2015 Brussels was an amazing event. I’ll save a summary of it for the weekend, but in the meantime I urgently need to discuss something that came up at the conference. At OpenCon, it emerged that Elsevier have apparently been blocking Chris Hartgerink’s attempts to access relevant psychological research papers for content mining. No one can doubt that Chris’s research intent is legitimate – he’s not fooling around here.

Published

Anyone care to remember how awful and unusable the web interface for accessing the NHM’s specimen records used to be? Behold the horror below as it was in 2013, or visit the Web Archive to see just how bad it was. It’s not even the ‘look’ of it that was the major problem – it was more that it simply wouldn’t return results for many searches. No one I know actually used that web interface because of these issues.

Published

With a first commit to GitHub not so long ago (2015-04-13), getpapers is one of the newest tools in the ContentMine toolchain. It’s also the most readily accessible and perhaps the most immediately exciting – it does exactly what it says on the tin: it gets papers for you en masse without having to click around all those different publisher websites. A superb time-saver.

Published

To prove my point about the way that supplementary data files bury useful data, making it utterly undiscoverable to most, I decided to do a little experiment (in relation to text mining for museum specimen identifiers, but also perhaps with some relevance to the NHM Conservation Hackathon): I collected the links for all Biology Letters supplementary data files.

Published

I’ve been on holiday in Japan recently, so work on this came to a halt for a while, but I think I’ve largely ‘done’ PLOS ONE full text now (excluding supplementary materials). My results are on GitHub: https://github.com/rossmounce/NHM-specimens/tree/master/results – one prettier file without the exact provenance or in-sentence context of each putative specimen entity, and one more extensive file with provenance &

Published

In this post I’ll go through an illustrated example of what I plan to do with my text mining project: linking up biological specimens from the Natural History Museum, London (sometimes known as BMNH or NHMUK) to the published research literature with persistent identifiers. I’ve run some simple grep searches of the PMC open access subset already and, unsurprisingly, PLOS ONE articles make up a significant portion of the ‘hits’.
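For readers who want a feel for what a simple search of this kind might look like, here is a minimal sketch in Python rather than grep. The regular expression, the `find_specimens` helper and the example identifiers are all illustrative assumptions, not the actual patterns used in this project; real NHM specimen codes are more varied than this.

```python
import re

# Assumed pattern: an institution code (BMNH or NHMUK), an optional one- or
# two-letter collection prefix, then digits optionally joined by dots or hyphens.
SPECIMEN_RE = re.compile(r"\b(?:BMNH|NHMUK) ?(?:[A-Z]{1,2} ?)?\d+(?:[.\-]\d+)*")


def find_specimens(text, context=40):
    """Return (identifier, surrounding text) pairs for each putative match."""
    hits = []
    for match in SPECIMEN_RE.finditer(text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        hits.append((match.group(0), text[start:end]))
    return hits


if __name__ == "__main__":
    # Made-up sentence for demonstration only.
    sample = ("Holotype BMNH 1947.2.26.89 was compared with NHMUK R5764 "
              "and an unrelated USNM 12345.")
    for identifier, snippet in find_specimens(sample):
        print(identifier, "|", snippet)
```

Keeping a window of surrounding text with each hit makes it easier to check by eye whether a match really is a specimen mention or just a false positive.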

Published

Now that I’m at the Natural History Museum, London, I’ve started a new and ambitious text-mining project: to find, extract, publish, and link up all mentions of NHM, London specimens published in the recent research literature (born digital, published post-2000). Rod Page is already blazing a trail in this area with older BHL literature.

Published

Yesterday, I made a journal scraper for the International Journal of Systematic and Evolutionary Microbiology (IJSEM). Fortunately, Richard Smith-Unna and the ContentMine team have done most of the hard work in creating the general framework with quickscrape (open-source and available on GitHub); I just had to modify the available journal-scrapers to work with IJSEM.

Published

[Update: I’ve submitted this idea as a FORCE11 £1K Challenge research proposal 2015-01-13. I may be unemployed from April 2015 onwards (unsolicited job offers welcome!), so I might well find myself with plenty of time on my hands to properly get this done…!] Inspired by something I heard Stephen Curry say recently, and with a little bit of help from Jo McIntyre, I’ve started a project to compare EuropePMC author manuscripts