Rogue Scholar

Natural Sciences

Leave Beacons in Code

Published June 24, 2022

Leave beacons in your code. I would have avoided a silly error if a variable named xgb_train_data had instead been named, for example, xgb_train_data_filepath. When you can’t leave globally unique, persistent, resolvable identifiers (GUPRIs), mind your beacons. References: F. Hermans, The Programmer’s Brain: What Every Programmer Needs to Know About Cognition, pp. 28-30. Shelter Island, NY: Manning, 2021.
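A minimal sketch of the naming point, assuming a Python workflow that reads training data from a CSV file. Only the names xgb_train_data and xgb_train_data_filepath come from the note; the helper `load_rows`, the `_rows` suffix, and the file contents are hypothetical and for illustration only.

```python
import csv
import tempfile
from pathlib import Path


def load_rows(filepath: Path) -> list[dict[str, str]]:
    """Read a CSV file into a list of row dictionaries."""
    with filepath.open(newline="") as handle:
        return list(csv.DictReader(handle))


# Write a tiny throwaway CSV so the sketch runs on its own (hypothetical data).
tmp_dir = Path(tempfile.mkdtemp())

# Beacon: the "_filepath" suffix says this is a location on disk, not the data.
xgb_train_data_filepath = tmp_dir / "train.csv"
xgb_train_data_filepath.write_text("feature,label\n0.5,1\n0.1,0\n")

# Beacon: the "_rows" suffix says this is loaded content, not a path.
xgb_train_data_rows = load_rows(xgb_train_data_filepath)

print(type(xgb_train_data_filepath).__name__, len(xgb_train_data_rows))
```

With a bare name like xgb_train_data, a reader cannot tell from the call site whether a function expects the path or the loaded rows; the suffixes carry that information without needing a comment.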

Natural Sciences

CFF for Machine-Actionable Software Citations

https://doi.org/10.59350/3h71e-wsp52

Published June 23, 2022

Author Donny Winston

Add a CITATION.cff file to your git repository. The Citation File Format is automatically rendered on GitHub and usable by Zenodo and Zotero. Already have a DOI? Let’s see about a DOI-to-CFF tool. Looks like there’s doi2cff, but it’s currently restricted to DOIs on Zenodo that are tagged as software releases.
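As a rough illustration of what such a file contains, here is a small sketch that assembles the text of a minimal CITATION.cff. It assumes CFF schema version 1.2.0, whose required keys are cff-version, message, title, and authors. The function `minimal_cff`, the project title, the author name, and the DOI are all hypothetical placeholders, and this is not how doi2cff works.

```python
from typing import Optional


def minimal_cff(title: str, family_names: str, given_names: str,
                doi: Optional[str] = None) -> str:
    """Build the text of a minimal CITATION.cff file (assumes CFF 1.2.0).

    Values are wrapped in double quotes without escaping, so this sketch
    assumes the inputs contain no double quotes or backslashes.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        "authors:",
        f'  - family-names: "{family_names}"',
        f'    given-names: "{given_names}"',
    ]
    if doi is not None:
        # "doi" is an optional top-level key in CFF 1.2.0.
        lines.append(f'doi: "{doi}"')
    return "\n".join(lines) + "\n"


# Placeholder values only; replace with real metadata before committing.
cff_text = minimal_cff(
    title="example-project",
    family_names="Doe",
    given_names="Jane",
    doi="10.1234/placeholder",
)
print(cff_text)
```

Saving the returned text as CITATION.cff in the repository root is the usual placement; a schema validator is still worth running, since this sketch does no validation of its own.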

Natural Sciences

PageRank of Linked Open Vocabularies (LOV)

https://doi.org/10.59350/7fx2v-8f226

Published June 15, 2022

Author Donny Winston

Datasets are easier to reuse if they use standards that are well-established, particularly in a given domain. A first approach is to ask around – ask people with whom you coauthor, people you trust in your field, etc. A follow-on approach is to examine the “graph reputation” of relevant standards, particularly if they may be represented as resources with outbound links.
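One way to picture “graph reputation” is PageRank by power iteration over a graph of outbound links. The sketch below is an assumption-laden toy: the vocabulary names and their links are made up, the damping factor 0.85 is the conventional default, and this is not necessarily the computation used in the post.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links maps each node to the list of nodes it links to (outbound links).
    Nodes with no outbound links spread their rank evenly over all nodes.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical vocabularies and links, for illustration only.
toy_links = {
    "vocab_a": ["vocab_core"],
    "vocab_b": ["vocab_core"],
    "vocab_c": ["vocab_core", "vocab_a"],
    "vocab_core": [],
}
ranks = pagerank(toy_links)
for name, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

In this toy graph the vocabulary that the others link to ends up with the highest score, which is the intuition behind using inbound links as a proxy for how established a standard is.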

Natural Sciences

Lean Web - Principles of Lean Thinking applied to Web Development

https://doi.org/10.59350/tm820-mys68

Published June 9, 2022

Author Donny Winston

Lean manufacturing aims to reduce waste in production processes and to reduce response times to consumers from producers. Womack and Jones¹ authored five key principles for lean thinking in the context of manufacturing. Value: Identify the value of a product to a consumer. Value Stream: Identify the minimal process (steps, time, information, material) to produce the value.

Natural Sciences

Hallucinating Datasets Across Epochal Time

https://doi.org/10.59350/ca9bw-s1897

Published June 9, 2022

Author Donny Winston

“Dataset” is a derived notion, a psychological construct, where “versions” of the dataset are a succession of values that we perceive to be causally related. “Dataset” is a side effect.

Natural Sciences

¬ consistent ⇒ ¬ valid ⇒ ¬ accurate

https://doi.org/10.59350/dg9by-6gb12

Published June 3, 2022

Author Donny Winston

If it’s not consistent, it can’t be valid. If it’s not valid, it can’t be accurate. If it’s not accurate, who cares if it’s timely? Subscribe to get short notes like this on Machine-Centric Science delivered to your email.

Natural Sciences

W3C data recommendations -- there are many!

https://doi.org/10.59350/4yddb-wdx72

Published June 1, 2022

Author Donny Winston

The World Wide Web Consortium (W3C) publishes a range of specifications and guidelines that help move web standards forward. However, even when restricting scope to the Latest version of specifications with the status Recommendation and with the tag Data, there are currently 77 of them! See https://www.w3.org/TR/?tag=data&status=REC&version=latest

Natural Sciences

Data Stacks for FAIR

https://doi.org/10.59350/f9zfd-jmz47

Published May 30, 2022

Author Donny Winston

I noticed a pattern at the top of each case study listed by Stemma.ai, which provides data catalog software as a service based on the open-source Amundsen code. Each case study’s so-called “Data Stack” comprises up to four distinct categories of functionality – Data Catalog, Data Warehouse, ETL, and Business Intelligence.

Natural Sciences

A Sign Helps You Use It as Though It Were an X

https://doi.org/10.59350/j3yk7-1h854

Published May 29, 2022

Author Donny Winston

For evolvable data exchange, you need to be able to continually add qualified references galore so that participants can reason by analogy – i.e., each new thing resembles something known before. This is FAIR principle I3, which depends on I1 and I2 for robustness. M. Minsky, The Society of Mind. New York: Simon and Schuster, 1986, p.

Natural Sciences

Sending Signal-Signs

https://doi.org/10.59350/z3ebc-4kq78

Published May 29, 2022

Author Donny Winston

Sending signal-signs¹ to steer engines of compute, the wheel does no work.

Natural Sciences

FAIR Principle R1.1: (Meta)data are released with a clear and accessible data usage license

https://doi.org/10.59350/4az1n-62n91

Published May 29, 2022

Author Donny Winston

I’ve been recording introductions to each of the 15 FAIR Principles and releasing them as episodes of my Machine-Centric Science podcast (https://podcast.polyneme.xyz/). I just released the 13th one, featuring an overview of various data and code licenses. Listen here. Full transcript below (but also linked to via the episode landing page): ====== Hello, and welcome to Machine-Centric Science.