Biological Sciences, Substack

Paired Ends

Bioinformatics, computational biology, and data science updates from the field. Occasional posts on programming.
R, TIL, Biological Sciences
Published
Author Stephen Turner

Last year I wrote a post describing an R package I put together that fetches recent bioRxiv preprints from a given subject and summarizes them in a couple of sentences using a local LLM running through Ollama. That tool has a limitation: it uses the bioRxiv RSS feed to pull recent paper titles and abstracts, and the RSS feeds currently provide only the 30 most recent preprints in each subject area.

Biological Sciences
Published
Author Stephen Turner

I’ve been blogging about genetics, statistics, computational biology, data science, and science in general for over 15 years. I published my first blog post on Getting Genetics Done in 2009, and started this blog last year after taking a few years off. Science blogging has significantly contributed to my personal and professional growth as a scientist. Writing takes time and effort — time I could have spent elsewhere.

Papers, Biological Sciences
Published
Author Stephen Turner

I'm still catching up on papers from my late 2024 backlog. This week’s recap highlights a browser application for visualizing pathogen dispersal, a DNA language model evaluation benchmark on regulatory DNA, regularized ensemble polygenic risk prediction with GWAS summary statistics, multimodal analysis of RNA-seq data for complex trait genetics, and a deep dive on blastp’s E-value.

AI, Biological Sciences
Published
Author Stephen Turner

Something a little different for this week’s recap. I’ve been thinking a lot lately about the practice of data science education in this era of widely available (and really good!) LLMs for code. Commentary at the top based on my own data science teaching experience, with a deep dive into a few recent papers below.

TIL, AI, Biological Sciences
Published
Author Stephen Turner

The majority of developers use LLMs to help write code, present company included. When I’m working in languages I know well, they're fantastic at handling the grunt work: generating boilerplate, suggesting completions, and writing tedious tests and documentation.

Papers, Biological Sciences
Published
Author Stephen Turner

I'm still catching up on papers from my late 2024 backlog. This week’s recap highlights autonomous microbial sensors for detecting TNT in soil, genome size estimation from long reads, STABIX for indexing and compressing GWAS summary statistics, and Clair3-RNA for deep learning-based small variant calling on long-read RNA-seq data.

AI, Biological Sciences
Published
Author Stephen Turner

OpenAI introduced the ability to create custom GPTs back in November 2023. I wanted to try to create one of these, and in the spirit of learning in public this post describes how I made it. But first, what does it do? The Gene Info custom GPT takes a list of human gene symbols as input.

R, AI, Biological Sciences
Published
Author Stephen Turner

Background: Bluesky, atrrr, local LLMs. I’ve written a few posts lately about Bluesky: first, Bluesky for Science, about Bluesky as a home for Science Twitter expats after the mass eXodus, and another on using the atrrr package to expand your Bluesky network. I’ve also spent some time looking at R packages to provide an interface to Ollama.

AI, Biological Sciences
Published
Author Stephen Turner

I had good intentions to give NaNoWriMo a try this year but didn’t get very far. Instead I gave OpenAI’s Creative Writing Coach GPT a try for a (very) short story I had in mind, inspired by my frustration trying to access closed-access research articles for a review article I’m preparing.