Tensions rise in the condaverse

GitHub has a tool called Dependabot that automatically finds outdated package versions pinned in project configuration files and issues a pull request to update them. Support for conda environment.yml files has long been one of the most requested features in the Dependabot repo. At long last, GitHub has now added partial support for conda to Dependabot, first as a beta announced last week, and now generally available. But there have been some issues with the rollout.

The main appeal of conda over something like Poetry, uv, or just plain-old requirements.txt is that conda can manage arbitrary dependencies, not just Python packages. You can conda create --no-default-packages git micro compilers to set up a Fortran dev environment if you want. Dependabot’s conda support includes only Python packages. A few folks grumbled about this limitation in the GitHub issue comments, but it’s understandable: The space of “all conda installable packages” is vast indeed, and the Dependabot devs had to start somewhere.

A more compelling criticism of the new feature stems from the fact that Dependabot determines the latest versions of Python packages by looking up the names given in environment.yml on PyPI. This is a problem because PyPI is an entirely different package ecosystem from conda. Some package versions are released on PyPI well before they appear in conda repos, and some packages have different names between the two.

For a nasty example, Ipopt is a nonlinear programming solver written in C++, and cyipopt provides Python bindings. conda install ipopt installs the C++ library, and conda install cyipopt installs the Python wrapper. But pip install ipopt actually refers to cyipopt. The upshot, if I understand correctly, is that if you pin ipopt in your environment.yml, then Dependabot will check its version number against that of the latest version of cyipopt, a flawed comparison.

Luckily, Ipopt/cyipopt is the only such case I could find in this Rosetta stone (the fact that this exists …) mapping package names across ecosystems. But anyone(ish) can post packages on PyPI, so the current behavior of Dependabot creates new opportunities for typo-squatting attacks on conda users. As Jannis Leidel (a conda maintainer) put it, “This premature rollout makes the conda ecosystem less secure and shouldn’t have occurred.”

I’m not sure what the right move is for Dependabot. For a start, they could use the Rosetta stone to map conda packages to the correct PyPI names, but this would only solve the naming issue, and not the possibility of different versions between the two repositories.
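To make the mapping idea concrete, here is a minimal sketch of a name lookup that refuses to guess. The dictionary and the pypi_name helper are hypothetical, and the two entries are illustrative only, not taken from the actual Rosetta stone or from Dependabot.

```python
# Hypothetical excerpt of a conda -> PyPI name mapping (illustrative entries only).
CONDA_TO_PYPI = {
    "cyipopt": "cyipopt",  # Python bindings, same name in both ecosystems
    "ipopt": None,         # the solver library itself; no PyPI counterpart to compare against
}

def pypi_name(conda_name, mapping=CONDA_TO_PYPI):
    """Return the PyPI name to query, or None if the package should be skipped."""
    if conda_name not in mapping:
        # Unknown package: skip it rather than assume the names match.
        return None
    return mapping[conda_name]
```

Skipping unmapped names trades coverage for safety: fewer update PRs, but no version comparison against an unrelated (or squatted) PyPI project.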

Is ABC-SMC just an evolutionary algorithm?

Suppose we have data D and a model that expresses D as a noisy function of a parameter vector θ. We want to determine a value of θ that fits the data. For the purposes of this post, we’re concerned with models that are “difficult,” meaning we cannot write down a simple expression for the likelihood function and maximize it, whether analytically (as in ordinary least squares) or numerically (as in nonlinear regression). In fact, all we really know how to do is sample data from the model when given an arbitrary θ. (We’ll get a different D every time, because the model is nondeterministic.)

If you enjoy Bayesian statistics, then you may have already pattern-matched this problem statement to the ABC-SMC algorithm. But if you are like me and view parameter estimation as an optimization problem (there is no reason to privilege this view; it’s just how I turned out), then you might instead apply an evolutionary algorithm. Below, I describe such an algorithm, then argue that ABC-SMC is a special case. This insight suggests improvements to the implementation and usage of both evolution and ABC-SMC.
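For orientation, here is a minimal sketch of plain rejection ABC, the simplest relative of ABC-SMC and not ABC-SMC itself. Everything in it is an assumption made for illustration: the toy Gaussian model, the flat prior on [-10, 10], the sample mean as summary statistic, and the tolerance value.

```python
import random

def simulate(theta, n, rng):
    # Toy model (an assumption): n noisy observations centered on theta.
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def abc_rejection(data, n_draws, tolerance, rng):
    """Keep prior draws whose simulated data lands close to the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-10.0, 10.0)        # draw a candidate from a flat prior
        fake = simulate(theta, len(data), rng)  # sample from the model; no likelihood needed
        if abs(mean(fake) - mean(data)) < tolerance:
            accepted.append(theta)
    return accepted

rng = random.Random(0)
observed = simulate(3.0, 50, rng)  # pretend the true theta is 3
posterior_sample = abc_rejection(observed, 5000, 0.2, rng)
```

The accepted values cluster around the θ that generated the data, and the only thing we ever asked of the model was to simulate.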

Read more →

Full of types

I’m obsessed with this essay “Raising a person in a culture full of types” by Dan Brooks. Before we get to the author’s message, let’s take a moment to appreciate his sense of comedic timing, e.g.:

My son talks incessantly about VSCO girls and Karens and other categories of people he has learned about from YouTube. He described a classmate as “the kind of person who borrows your pencil and doesn’t give it back,” i.e. she borrowed his pencil and didn’t give it back. For a while he tried to propagate a type of his own invention, “the Suzan,” whose behavior was ill-defined but tracked closely with that of my mother of the same name. It did not catch on, and eventually he concluded that he was not the kind of person who could come up with memes.

Is it just me, or is this a highly clever paragraph?

Brooks’s point, expressed better there than I will here, is that our culture’s emphasis on “being” over “doing” prevents us from separating people’s actions from their destiny. At best, the idea that people belong to fixed, inescapable categories is merely the antithesis of the growth mindset; it saps motivation from our desire to try new things (why learn programming if I am not a “math person”?). At worst, as Brooks points out, “The illusion of a fixed nature gives us an excuse to repeat bad behavior,” because every mistake is like a movie trailer for the rest of your life. And like a TV commercial, the fixed mindset is an illusion with a slope (sorry): it inclines us toward the most automatic decision instead of assessing the alternatives on their merits.

I will not pretend to be a paragon of the growth mindset. I talk myself out of good ideas, such as learning how to actually cook, all the time. But if I could choose to have one impact on the world, it would be to motivate those around me to reject limiting beliefs and embrace challenge—even at the risk of failure. And if linking you again to this essay, which has a wonderful subplot about fatherhood, is the way to do it, then, OK.

Jekyll plugin to recommend related posts

I wrote my first plugin for the Jekyll static website builder: a tool that recommends related posts at the end of each page. It determines the similarity between post pairs using a fairly unremarkable token-counting algorithm, so it’s fast enough to rerun on every site build. You can configure the number of posts to recommend and a parameter factor which determines the algorithm’s sensitivity to rare vs. common words.
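To give a flavor of what token-based similarity with a rare-vs-common knob can look like, here is a rough Python sketch. It is not the plugin's code (Jekyll plugins are Ruby), and the weighting rule, one over document frequency raised to the power factor, is my assumption for illustration rather than the plugin's actual formula.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase words only; a deliberately crude tokenizer.
    return re.findall(r"[a-z']+", text.lower())

def similarity_scores(posts, factor=1.0):
    """Score each pair of posts by their shared tokens.

    A token found in df posts contributes df ** -factor, so factor=0
    counts every shared token equally and larger values favor rare words.
    """
    tokens = {title: set(tokenize(text)) for title, text in posts.items()}
    df = Counter(tok for toks in tokens.values() for tok in toks)
    titles = list(posts)
    scores = {}
    for i, a in enumerate(titles):
        for b in titles[i + 1:]:
            shared = tokens[a] & tokens[b]
            scores[(a, b)] = sum(df[t] ** -factor for t in shared)
    return scores

posts = {"a": "the cat sat", "b": "the cat ran", "c": "the dog ran fast"}
scores = similarity_scores(posts, factor=1.0)
```

In this toy example, posts a and b share the relatively rare word “cat,” so they outscore a and c, which share only “the.”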

I made a little demo of the plugin with a fake blog whose posts are the articles of the UN Universal Declaration of Human Rights. You can also see a demo on the current version of this site if you click the “read more” link below to go to this post’s individual page. I think it works pretty well!

Read more →

Some things I tried recently

Kagi Search: It’s a paid search engine that promises to give better results than Google and friends. Indeed, the search results are a little more relevant, especially when researching technical topics. I made great use of the ability to filter and promote entire domains. However, the pricing doesn’t work for me: $5/month gets you 300 searches, which isn’t enough (I burned through the free 100 searches in a week), and for unlimited searches, you have to pay $10 for a bundle deal that also includes AI stuff I don’t care about. Kagi wants to become an everything app (probably adding email soon), which is a tough sell while claiming to be a privacy-focused company. (Same issue with Proton, by the way.)

Fender Studio: Fender, the guitar company, just kind of threw this over the fence in May. It’s a free (but not open source) digital audio workstation, so it competes with the likes of Ardour and GarageBand. But Fender Studio runs on Linux, and quite well at that. On my machine, it supports JACK with minimal configuration and achieves lower latency than Guitarix while doing a lot more. The vendored backing tracks are a bit cheesy but well engineered.

Proselint: It’s a prose … linter, i.e. you feed it your draft blog post and it complains about vague wording and common typography problems like curly vs. straight quotes. I like that Proselint uses regex instead of an LLM, so there’s no creative interference; it’s more like an automated style guide than a chatty editor. But my homegrown typography.py script (I need to upload this to GitHub sometime) enforces a few lesser irks, such as en dashes in numerical ranges, that Proselint lets be, so I’m still using both.
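As an illustration of the regex approach, here is a small sketch of an en-dash rule for numerical ranges. It is not the contents of typography.py; the pattern and the fix_ranges name are assumptions made up for this example.

```python
import re

# Digits, a plain hyphen, then digits (e.g. "10-20"), not touching another
# digit or hyphen on either side, so ISO dates like 2024-01-15 are left alone.
RANGE = re.compile(r"(?<![\d-])(\d+)-(\d+)(?![\d-])")

def fix_ranges(text):
    """Replace the hyphen in simple numeric ranges with an en dash."""
    return RANGE.sub("\\1\u2013\\2", text)
```

A rule this simple will also rewrite things that merely look like ranges, such as two-part phone numbers, so it is better suited to flagging candidates than to blind replacement.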