Rogue Scholar

Packaging, Python, Tox, Just, Cookiecutter-snekpack, Natural Sciences

Switching from using Tox to Just

Published September 21, 2025

Author Charles Tapley Hoyt

I became aware of just while watching Hynek’s second video on uv a few months ago. I immediately fell in love with its elegance and simplicity, so I have begun replacing task running in my repositories that relied on tox with just. This post gives a bit of background and context, then walks through making the switch on one of my repositories that has some annoying dependencies.

NFDI, SPARQL, Bioregistry, Natural Sciences

Exploring an unfamiliar SPARQL endpoint with the Bioregistry - a case study from NFDI4Culture

https://doi.org/10.59350/56vkd-a9r60

Published September 11, 2025

Author Charles Tapley Hoyt

Earlier this week at the sixth NFDI4Chem consortium meeting, Torsten Schrade from the NFDI4Culture consortium gave a lovely and whimsical talk entitled "A Data Alchemist’s Journey through NFDI", which explored ways that we might federate and jointly query both consortia’s knowledge via their respective SPARQL endpoints.

CURIE, URI, URN, IRI, Identifiers, Natural Sciences

Validating the FAIRness of knowledge graphs and ontologies in RDF using the Bioregistry

https://doi.org/10.59350/36vqy-w6d88

Published September 4, 2025

Author Charles Tapley Hoyt

Using standard CURIE prefixes and URI prefixes in semantic web artifacts such as Resource Description Framework (RDF) promotes interoperability, enables reuse in downstream data integration, and makes data more FAIR. The Bioregistry defines a set of standard CURIE prefixes and URI prefixes against which RDF files can be validated/standardized.
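
To make the idea of validating against a standard concrete, here is a minimal sketch that compares the prefix declarations found in an RDF file against a standard prefix map. It does not use the Bioregistry's own API: the `STANDARD_PREFIX_MAP` dictionary and the `check_prefixes` function are hypothetical stand-ins, and the two entries in the map are illustrative assumptions rather than values copied from the Bioregistry.

```python
# Hypothetical, hand-written stand-in for a standard prefix map.
# The entries are illustrative assumptions, not copied from the Bioregistry.
STANDARD_PREFIX_MAP = {
    "chebi": "http://purl.obolibrary.org/obo/CHEBI_",
    "go": "http://purl.obolibrary.org/obo/GO_",
}


def check_prefixes(declared: dict[str, str]) -> dict[str, str]:
    """Return a message for each declared prefix that deviates from the standard map."""
    problems: dict[str, str] = {}
    for prefix, uri_prefix in declared.items():
        expected = STANDARD_PREFIX_MAP.get(prefix)
        if expected is None:
            problems[prefix] = "unknown prefix"
        elif uri_prefix != expected:
            problems[prefix] = f"non-standard URI prefix, expected {expected!r}"
    return problems


# Prefix declarations as they might appear at the top of an RDF file
declared = {
    "chebi": "http://purl.obolibrary.org/obo/CHEBI_",
    "go": "http://example.org/go/",
    "foo": "http://example.org/foo/",
}
problems = check_prefixes(declared)
for prefix, message in problems.items():
    print(f"{prefix}: {message}")
```

A real validator would also need to handle prefix synonyms and capitalization variants, which this sketch ignores.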

ChEMBL, Cheminformatics, Chemoinformatics, Chemistry, Bibliometrics, Natural Sciences

A historical analysis of ChEMBL

https://doi.org/10.59350/z1tb1-a5q29

Published August 26, 2025

Author Charles Tapley Hoyt

I’ve recently submitted an article to the Journal of Open Source Software (JOSS) describing chembl-downloader, a Python package for automating downloading and using ChEMBL data in a reproducible way. In this post, I use chembl-downloader to show how the numbers of compounds, assays, activities, and other entities in ChEMBL have changed over time.

CURIE, URI, URN, IRI, Identifiers, Natural Sciences

Measuring the impact of the Bioregistry

https://doi.org/10.59350/rffdx-en229

Published August 22, 2025

Author Charles Tapley Hoyt

The Bioregistry is a database and toolchain for standardization of prefixes, CURIEs, and URIs that appear in linked (open) data. While I created it in 2019 as a component of PyOBO in order to support parsing database cross-references appearing in biomedical ontologies, it has since become an independent project with a community-driven governance model and much broader applications. This post is a first attempt to quantify its usage and impact.

Biomarker, Semantic Spaces, Bioregistry, BiomarkerKB, Natural Sciences

The Bioregistry and BiomarkerKB

https://doi.org/10.59350/qys3w-gy425

Published August 22, 2025

Author Charles Tapley Hoyt

The Bioregistry is a community-driven registry of semantic spaces and their metadata. When I learned about BiomarkerKB at the International Society for Biocuration’s 18th Annual International Biocuration Conference, I was excited to curate new records (and prefixes) in the Bioregistry to cover BiomarkerKB’s semantic spaces on biomarkers.

Ontology, Embeddings, Bert, Sbert, Similarity, Natural Sciences

Text-based embeddings of ontology terms

https://doi.org/10.59350/fb2rw-f7w29

Published August 4, 2025

Author Charles Tapley Hoyt

The Ontology Lookup Service (OLS) is now indexing dense embeddings for ontology terms constructed from term labels, synonyms, and descriptions using LLMs. I maintain a Python client library for the OLS (ols-client) and was recently asked to implement a wrapper to the OLS’s API endpoint that exposes these embeddings.

Ontology Merging, Semantic Web, Semantic Mappings, Bioinformatics, Ontologies, Natural Sciences

Inference over Semantic Mappings with SeMRA

https://doi.org/10.59350/t965v-xtw11

Published April 28, 2025

Author Charles Tapley Hoyt

Assembling and inferring missing semantic mappings is a timely problem in biomedical data and knowledge integration. I’ve been developing the Semantic Mapping Assembler and Reasoner (SeMRA) as a generic toolkit for this. In this blog post, I highlight its inference capabilities. SeMRA implements the chaining and inference rules described in the SSSOM specification.
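
As a rough illustration of what chaining and inversion mean for semantic mappings, the sketch below closes a set of exact-match triples under both rules. It is not SeMRA's API: the `infer_exact_matches` function and the `a:1`, `b:1`, `c:1` identifiers are hypothetical, and it only handles `skos:exactMatch`, which is a simplification of the rules in the SSSOM specification.

```python
from itertools import product

EXACT = "skos:exactMatch"

Triple = tuple[str, str, str]


def infer_exact_matches(mappings: set[Triple]) -> set[Triple]:
    """Close a set of exact-match triples under inversion and chaining."""
    # Ignore any mapping that is not an exact match
    closed = {m for m in mappings if m[1] == EXACT}
    while True:
        # Inversion: if A exactMatch B, then B exactMatch A
        new = {(o, EXACT, s) for s, _, o in closed}
        # Chaining: if A exactMatch B and B exactMatch C, then A exactMatch C
        new |= {
            (s1, EXACT, o2)
            for (s1, _, o1), (s2, _, o2) in product(closed, repeat=2)
            if o1 == s2 and s1 != o2
        }
        # Stop once a full pass produces nothing that is not already known
        if new <= closed:
            return closed
        closed |= new


# Made-up identifiers, used only to show the shape of the inference
mappings = {
    ("a:1", EXACT, "b:1"),
    ("b:1", EXACT, "c:1"),
}
inferred = infer_exact_matches(mappings)
```

Here the mapping between `a:1` and `c:1` is never stated directly and only appears after chaining through `b:1`.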

Python, Mypy, Static Typing, Natural Sciences

I wish I could unpack Callables in Python type annotations

https://doi.org/10.59350/tcz2x-n4d84

Published April 23, 2025

Author Charles Tapley Hoyt

Following the theme of my previous two posts, I’ve run into another typing conundrum where I want to unpack a pre-existing Callable into a class with Generic[P, T], where P is a parameter specification type (i.e., ParamSpec). After figuring out the right way to declare a generic featuring a ParamSpec, I updated the class-resolver package to use the shiny new (and more accurate) annotations.

Python, Mypy, Static Typing, Natural Sciences

Using ParamSpec with Python Generics

https://doi.org/10.59350/a9srr-an019

Published April 22, 2025

Author Charles Tapley Hoyt

I’ve been working on applying strict static typing to my Python package class-resolver and ran into an interesting way of using generics in combination with parameter specification variables (i.e., ParamSpecs). Normally, if you want to type annotate a function, you use Callable, which works like the following: from collections.abc import Callable #: the [int] represents a function that takes in a single integer, #: and returns a single

Python, Mypy, Static Typing, Natural Sciences

A dilemma with PEP-696 default generics when using optional static typing in Python

https://doi.org/10.59350/3zq9w-my741

Published April 19, 2025

Author Charles Tapley Hoyt

This post describes an issue I’ve had with writing correct types when using PEP-696 defaults in typing.TypeVar. I posted the exploration in a companion repository on GitHub. The motivation behind this comes from my work in biomedical data integration and the semantic web.

Biopragmatics
