Computer and Information SciencesHugo

rOpenSci - open tools for open science

rOpenSci - open tools for open science
Open Tools and R Packages for Open Science
Home PageJSON Feed
language
Published
Author Sasha Goodman

The Apache Tika parser is like the Babel fish in Douglas Adam’s book, “The Hitchhikers’ Guide to the Galaxy” 1 . The Babel fish translates any natural language to any other. Although Tika does not yet translate natural language, it starts to tame the tower of babel of digital document formats. As the Babel fish allowed a person to understand Vogon poetry, Tika allows an analyst to extract text and objects from Microsoft Word.

Published
Author Scott Chamberlain

The problem Text-mining - the art of answering questions by extracting patterns, data, etc. out of the published literature - is not easy. It’s made incredibly difficult because of publishers. It is a fact that the vast majority of publicly funded research across the globe is published in paywall journals.