Rogue Scholar

Published June 10, 2016 in Martin Paul Eve

Yesterday, I wrote about a challenge I faced: working out which texts in a corpus have decent OCR and, then, which texts they actually are. This morning, I put together a small script that makes a first attempt at this. I include it below for anybody who is interested.

Initial parsing work on large JSON corpus
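
For readers who want a starting point, here is a minimal sketch of one way to rate OCR quality across a JSON corpus. It is not the script this post refers to. It assumes the corpus is stored as one JSON object per line, with hypothetical `id` and `text` keys, and it uses the share of tokens found in a word list as a rough stand-in for OCR quality. The tiny `COMMON_WORDS` set is a placeholder for a real dictionary file.

```python
import json
import re

# Hypothetical placeholder word list; a real run would load a full dictionary.
COMMON_WORDS = {"the", "of", "and", "a", "in", "to", "is", "was", "that", "it"}

# Runs of ASCII letters count as tokens (an assumption suited to English text).
TOKEN_RE = re.compile(r"[A-Za-z]+")


def ocr_quality(text, vocabulary=COMMON_WORDS):
    """Return the fraction of alphabetic tokens that appear in the vocabulary."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in vocabulary)
    return hits / len(tokens)


def score_corpus(lines, text_key="text", id_key="id"):
    """Yield (id, score) pairs, reading one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Treat a missing or null text field as an empty string.
        yield record.get(id_key), ocr_quality(record.get(text_key) or "")
```

Because `score_corpus` accepts any iterable of lines, an open file handle can be passed in directly, so a large corpus never has to be loaded into memory at once. Where to set the cut-off between "decent" and poor OCR is a judgment call that depends on the word list and the material.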