By Jonathan Corbet
October 30, 2020
Linux distributors are in the business of integrating software from multiple sources, packaging the result, and making it available to their users. It has long been true that some projects are easier to package than others. The Debian technical committee (TC) is currently being asked to make a decision in a dispute over how an especially hard-to-package project, Kubernetes, should be handled. Regardless of the eventual outcome, this disagreement clearly shows how the packaging model used by Linux distributors is increasingly mismatched to how software is often developed in the 2020s; what should replace that model is rather less clear, though.
A longstanding rule followed by most distributors is that there should be only one copy of any given library (or other dependency) in the system, and that said copy should usually be in its own package. To do otherwise would bloat the system and complicate the task of keeping things secure. As an extreme example, consider what would happen if every program carried its own copy of the C library in its package. Those thousands of copies would consume vast amounts of both storage space and memory. If a security vulnerability were found in that library, thousands of packages would have to be updated to fix it everywhere. A single library package shared by all users, instead, is more efficient and far easier to maintain.
This rule thus runs contrary to the practice of stuffing a program's dependencies into that program's own package, a practice often called "vendoring". Living up to this rule can be challenging, though, because many modern projects engage in a fair amount of vendoring. Projects written in certain languages appear to be especially prone to this sort of behavior; the Go language, for example, seems to encourage vendoring.
Kubernetes is written in Go, and it carries a long list of dependencies with it. It was maintained in Debian for a while by Dmitry Smirnov, but he orphaned Kubernetes in 2018, stating that packaging it is "a full time job, probably for more than one person". The Kubernetes package was eventually picked up by Janos Lenart, who has been supplying updated versions to the Debian Testing repository.
Kubernetes vendoring considered harmful
Back in March, though, Smirnov made it clear that he was far from happy with how Lenart had approached the task of packaging Kubernetes. Rather than working to build Kubernetes with independently packaged libraries from the Debian repository, Lenart has chosen to vendor those libraries into the Kubernetes package directly. The Kubernetes 1.19.3 package contains over 200 of these libraries; the directory of applicable licenses alone contains 3MB of text. A README file added by Lenart notes that this approach may not suit everybody:
However, I kindly ask purist aspirations that effectively halted Kubernetes' release and updates in Debian for YEARS to be kept at bay. I wholeheartedly agree that in an ideal world all the 200+ Go packages in vendor/ would be packaged separately in Debian, all of them following the excellent semantic versioning perfectly. It would also be awesome if there was a robust and meaningful(!) way to link Go binaries dynamically. That being said, I feel that the most important step at the moment is to have Kubernetes available in Debian instead of postponing until that perfect world arrives.
Smirnov denied being a purist, but was clearly upset about what had been done to the package he once maintained. It is, in his mind, a violation of Debian's policies. What, he asked, can be done in a situation like this?
The resulting discussion was lengthy and often heated, as one might expect. This being Debian, the developers devoted a long subthread to the question of whether Debian developers really have to verify the licenses for every vendored dependency (there was no definitive answer to that question). The reasons behind Debian's policies and the degree to which they make sense when applied to a project like Kubernetes were explored, also without any real conclusions.
Lenart posted exactly one message to the thread, defending the changes to how Kubernetes is packaged. There are other packages in Debian with vendored dependencies, though none, he acknowledged, have anywhere near the 200 found in his Kubernetes package. Independently packaging hundreds of dependencies is not feasible, he said; Smirnov's attempts to do so have a lot to do with why most Kubernetes releases never made it into Debian. Even if that effort were to succeed, Debian's package would not use the versions of the libraries tested by the Kubernetes developers and would thus essentially be a fork that "no sane cluster admin would dare to use". With that many separate libraries, it would never be possible to get security updates out in a timely manner. Go binaries are statically linked, so the resource-consumption benefits of shared libraries are not available in any case. And so on.
Smirnov, unsurprisingly, was not impressed with this list of justifications, and put some effort into casting Lenart as being too inexperienced to manage a package like Kubernetes. Many others argued for or against specific points until the conversation eventually wound down with nobody seemingly having budged from their initial positions.
To the technical committee
The topic then went quiet — on the public lists, at least — until the beginning of October, when Smirnov took the issue to the TC for resolution. The Debian TC exists to make decisions on technical disputes that Debian developers are unable to resolve on their own; it was this committee, for example, that finally answered the question of whether Debian would move to systemd or not. Now the TC is being asked to decide whether the level of vendoring seen in the Kubernetes package is acceptable.
There has been little public discussion since this request was filed, but a couple of interesting things have come out anyway. One was a message from Shengjing Zhu noting that Kubernetes, too, is a library that is depended upon by other packages. But Kubernetes is not packaged in a way that allows others to use it; doing so, Zhu said, would require decoupling all of its own vendored dependencies. Without that, every package that needs the Kubernetes library must vendor its own copy of Kubernetes, which does not seem like a rational path.
As part of the TC's deliberation, Sean Whitton asked the Debian security team about the security implications of that level of vendoring. Since security is one of the primary arguments against vendoring, one might expect the security team to dislike the idea; the actual response from Moritz Mühlenhoff was somewhat more nuanced than that. Supporting Kubernetes in a stable release is difficult in the best of situations, he said, because upstream only supports specific releases for one year, "and it would be presumptuous to pretend that we can seriously commit to fix security issues in k8s for longer than upstream". Given that, there are two options that Debian could consider for this package.
The first of those options would be to just not ship Kubernetes in a Debian stable release at all. Debian users would then obtain it either from the Testing repository (which does not receive security support) or from outside of Debian entirely. The alternative is to just update Kubernetes wholesale whenever a security problem is disclosed and upstream is no longer supporting the version shipped by Debian. That is an unusual practice for Debian, he allowed, but Kubernetes users are already used to it.
Crucially, he said that if Debian ships Kubernetes in a stable release (and thus goes with the second option above), vendoring the dependencies as is being done currently is the only realistic option. Otherwise, the chances of a newer Kubernetes release working with the older versions of its dependencies shipped by Debian are small at best. Rather than impeding the security effort in this case, vendored dependencies appear to be the only way that the Debian security team could support Kubernetes at all.
In the end, the options listed by Mühlenhoff are probably the only ones available to the TC. The committee could try to mandate that the Kubernetes package be managed like others, with few (if any) vendored dependencies, but it has no authority to order any developer to actually do the work to make that happen. So such a mandate is highly likely to be equivalent to saying that Debian does not ship Kubernetes at all.
Not just Kubernetes
The TC has not given any indication of when it will make a decision on this issue. Regardless of the outcome, though, this issue is one that is likely to come up again. There is a small but growing set of free-software projects that are simply too unwieldy for most distributors to handle on their own. Beyond Kubernetes, web browsers clearly fall into this category. Distributors have generally given up on trying to backport patches to older browser releases; they just move their users forward to new releases when they happen. The resources to do things any other way just do not exist.
The kernel might in some ways be the original example of this kind of package, but with some interesting differences. The kernel, too, is a huge and fast-moving project; most distributors have no hope of trying to maintain an older release on their own. The distributors that do maintain such versions — in "enterprise" distributions usually — dedicate massive resources to keeping those kernels working and secure. Others depend heavily on the fact that the kernel project itself is now maintaining releases for several years; the 4.4 kernel has received 241 updates (at last count) with 16,422 patches. Debian is an interesting exception in that it does maintain old kernels for a long time, but that support, too, benefits from the kernel's long-term support work. In the absence of that support, most distributors would have to choose between not even pretending to keep their kernels maintained (a favorite choice of embedded vendors) or upgrading users to current releases.
The kernel, at least, is self-contained; most projects of any size accumulate dependencies quickly, and many current programming environments encourage tying dependencies to specific versions of libraries — through a relative lack of concern about ABI compatibility if nothing else. Such applications will be painful to package; Kristoffer Grönlund's 2017 linux.conf.au talk on the subject is still highly relevant.
In other words, the Linux distribution model that was first worked out in the 1990s is increasingly unsuited to the way software is developed in the 2020s. Distributors understand that and are investigating ways to stay relevant, including new package-management techniques, immutable distributions, and more. Preserving the best of what distributions have to offer while taking advantage of the best of what the software-development community has to offer will prove challenging for some time. It is, as some might put it, a high-quality problem to have, but that doesn't make it easy to solve.