Computer and Information SciencesHugo

rOpenSci - open tools for open science

rOpenSci - open tools for open science
Open Tools and R Packages for Open Science
Home PageJSON Feed
language
Published
Author Jeroen Ooms

Last month we released a new version of pdftools and a new companion package qpdf for working with pdf files in R. This release introduces the ability to perform pdf transformations, such as splitting and combining pages from multiple files. Moreover, the pdf_data() function which was introduced in pdftools 2.0 is now available on all major systems.

Published
Author Jeroen Ooms

A new version of pdftools has been released to CRAN. Go get it while it’s hot: install.packages("pdftools") This version has two major improvements: low level text extraction and encoding improvements. About PDF textboxes A pdf document may seem to contain paragraphs or tables in a viewer, but this is not actually true.

Published
Author Thomas J. Leeper

There is no problem in science quite as frustrating as other peoples’ data . Whether it’s malformed spreadsheets, disorganized documents, proprietary file formats, data without metadata, or any other data scenario created by someone else, scientists have taken to Twitter to complain about it. As a political scientist who regularly encounters so-called “open data” in PDFs, this problem is particularly irritating.

Published
Author Jeroen Ooms

Scientific articles are typically locked away in PDF format, a format designed primarily for printing but not so great for searching or indexing. The new pdftools package allows for extracting text and metadata from pdf files in R. From the extracted plain-text one could find articles discussing a particular drug or species name, without having to rely on publishers providing metadata, or pay-walled search engines.