
Paired Ends

Bioinformatics, computational biology, and data science updates from the field. Occasional posts on programming.
R Biological Sciences
Published

I saw this Tweet a few days ago from Jeremy Leipzig counting the number of GitHub repositories that sequencing-related companies have on their organization pages, and I was immediately curious how many of these had an open-source license versus a restrictive or unknown license. I tried to one-shot this with GPT-5 given a screenshot of Jeremy’s Tweet and a few instructions. It got me 90% of the way there, but I had to make a few tweaks here and there.

Biological Sciences
Published

Happy Friday, friends. This is my regular attempt to close out the browser tabs I’ve accumulated over the past week: blog posts, podcasts, papers, etc. in data science, genomics, public health, programming, scicomm, and other miscellany. Enjoy!

A new study in the European Heart Journal on Accelerated vascular ageing after COVID-19 infection, and Eric Topol’s coverage on Ground Truths, COVID and our arteries.

AI Biological Sciences
Published

I recently reviewed a manuscript for a biotechnology journal I won’t name. The paper was clearly written by AI. I don’t mean edits or revisions here and there; I mean the entire manuscript. It read as if they had copied and pasted the entire contents of a certain research organization’s website into ChatGPT and asked for a paper on the topic.

Papers Biological Sciences
Published

I’m trying something new this week. I’ve been publishing weekly recaps on this newsletter for over a year now where I take a small deep dive into a few papers I’ve read recently. This takes time, and there’s often so much more interesting and relevant research and news in the data science + biotech space than I could possibly write about here. So I’ll be trying something different here for the next few weeks.

Papers Biological Sciences
Published

This week’s recap highlights analysis of human de novo mutation rates from a four-generation pedigree reference, how LLMs internalize scientific literature and citation practices, the py_ped_sim forward pedigree and genetic simulator for complex family pedigree analysis, and a review on predicting gene expression from DNA sequence using deep learning models like Enformer and Borzoi.

Biological Sciences
Published

I recently wrote a piece about leaving academia for biotech. I left academia for industry in 2019. I spent four years at a consulting firm before joining Colossal Biosciences. This week I’m returning to the University of Virginia School of Data Science as a tenured associate professor and dean of research. The transition from academia to industry can be tricky, but it’s also increasingly common.

Papers Biological Sciences
Published

This week’s recap highlights nanoMDBG for metagenome assembly from nanopore reads, the SCassist AI-based workflow for single-cell analysis, discovery and characterization of GxE and GxG effects in a vertebrate model, the PIGEON framework for estimating gene-environment interaction for polygenic traits, and long-read alignment with multi-level parallelism.

AI Biological Sciences
Published

I’ve written a lot about Ollama here. Ollama lets you run open-weight models such as Llama, Gemma, Mistral, Qwen, and DeepSeek on your own computer. You don’t have to pay for a frontier model like ChatGPT, Claude, or Gemini, and all the inputs and outputs stay on your computer, minimizing privacy and security concerns. Until recently, Ollama was a command-line-only tool.

AI Biological Sciences
Published

I liked Steve Krouse’s essay, “Vibe code is legacy code.” It helped crystallize some half-baked thoughts I have on vibe coding. Here’s an excerpt.

Maintainability and vibe are inversely correlated

I’ve been using GitHub Copilot and chatbots for code for years, and I’ve written about them a lot here.

Papers Biological Sciences
Published

This week’s recap highlights Variant-EFFECTS for rewriting regulatory DNA to dissect and reprogram gene expression, zero-shot evaluation revealing the limitations of single-cell foundation models, EcoWeaver for large-scale prediction of gene functional associations from coevolutionary signals, and how assemblies of long-read metagenomes suffer from diverse errors.